Proceedings of the 4th RapidMiner Community Meeting and Conference (RCOMM 2013) | ISBN 9783844021455

edited by Simon Fischer, Ingo Mierswa, João Mendes Moreira and Carlos Soares
EAN 9783844021455 | ISBN 3-8440-2145-0 | ISBN 978-3-8440-2145-5

Because of costs and scarcity, datasets are often highly imbalanced, with a large majority class and a far smaller minority class. Typical examples of imbalanced datasets are healthy versus diseased tissue measurements, lawful versus criminal banking transactions, and correctly priced versus mispriced financial instruments. Constructing classifiers from imbalanced data presents significant theoretical and practical challenges. Validation is also affected by imbalance, as a trivial classifier that ignores its input and always predicts the majority class will appear prescient when judged by accuracy alone. This presentation surveys class imbalance from a conceptual perspective, and empirically investigates several RapidMiner approaches to constructing classifiers from imbalanced data. Finally, the presentation describes a set of broadly applicable RapidMiner processes that detect class imbalance and that construct and evaluate classifiers on imbalanced data.
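
The validation pitfall mentioned in the abstract can be illustrated with a small sketch. It is not taken from the proceedings and does not use RapidMiner: the data (990 lawful and 10 fraudulent transactions), the function names, and the choice of balanced accuracy as the comparison metric are all assumptions made for illustration only.

```python
from collections import Counter


def majority_baseline(labels):
    """Return the most frequent label: a 'classifier' that ignores its input."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of examples whose prediction matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so each class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical, made-up data: 99% majority class, 1% minority class.
y_true = ["lawful"] * 990 + ["fraud"] * 10

# The trivial classifier predicts the majority class for every example.
y_pred = [majority_baseline(y_true)] * len(y_true)

acc = accuracy(y_true, y_pred)
bal = balanced_accuracy(y_true, y_pred)
print(f"accuracy = {acc:.2f}, balanced accuracy = {bal:.2f}")
```

On this made-up data, plain accuracy rewards the trivial classifier for the 990 majority examples it gets right, while the per-class average exposes that it never identifies a single minority example.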