The original ForestCover/Covertype dataset from the UCI Machine Learning Repository is a multiclass classification dataset used to predict forest cover type from cartographic variables only (no remotely sensed data). The study area includes four wilderness areas located in the Roosevelt National Forest of northern Colorado. These areas represent forests with minimal human-caused disturbances, so that existing forest cover types are more a result of ecological processes than of forest management practices. The dataset has 54 attributes (10 quantitative variables, 4 binary wilderness area indicators, and 40 binary soil type variables). Here, the outlier detection dataset is created using only the 10 quantitative attributes. Instances from class 2 are considered normal points and instances from class 4 are anomalies; instances from the other classes are omitted. The anomaly ratio is 0.9%.
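The construction described above can be sketched as a small filter. This is only an illustration, not the procedure actually used to build the dataset: it assumes each raw row holds the 54 attributes followed by the class label, with the 10 quantitative attributes in the first 10 positions, and the function name `build_outlier_dataset` is hypothetical.

```python
NORMAL_CLASS = 2      # treated as normal points
ANOMALY_CLASS = 4     # treated as anomalies
N_QUANTITATIVE = 10   # assumption: quantitative attributes come first


def build_outlier_dataset(rows):
    """Keep classes 2 and 4 only and return (features, labels).

    rows: iterable of sequences, 54 attributes followed by the class label
    (an assumed layout). labels use 0 for normal and 1 for anomaly.
    """
    features, labels = [], []
    for row in rows:
        cover_type = int(row[-1])
        if cover_type not in (NORMAL_CLASS, ANOMALY_CLASS):
            continue  # instances from the other classes are omitted
        # Drop the binary wilderness-area and soil-type columns.
        features.append([float(v) for v in row[:N_QUANTITATIVE]])
        labels.append(1 if cover_type == ANOMALY_CLASS else 0)
    return features, labels
```

Rows could come from `csv.reader` over the raw data file; the anomaly ratio is then `sum(labels) / len(labels)`.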
ForestCover is available on Aftershock, and normal observations are provided in the included training dataset, with 10 dimensions per observation. During evaluation, the main program of your submission is expected to read /ingress/covertype/testing.csv, which has the same form as the development dataset, and to write sequentially aligned anomaly confidence values (in [0, 1]) to /egress/covertype/predictions.csv, that is, one value per test observation in the same order.
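A minimal sketch of such a main program follows, under several assumptions that the description does not confirm: the CSV files contain one observation per line with 10 numeric columns and an optional header row, the predictions file holds one value per line with no header, and the training file path is supplied by the caller. The scoring rule (a root-mean-square z-score squashed into [0, 1]) is a placeholder baseline, not a recommended detector, and all function names are hypothetical.

```python
import csv
import math


def _is_numeric(record):
    try:
        [float(v) for v in record]
        return True
    except ValueError:
        return False


def read_rows(path):
    """Read numeric rows; a non-numeric first line is treated as a header."""
    with open(path, newline="") as f:
        records = [r for r in csv.reader(f) if r]
    if records and not _is_numeric(records[0]):
        records = records[1:]
    return [[float(v) for v in r] for r in records]


def fit_stats(rows):
    """Per-column mean and population standard deviation of normal data."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    stds = []
    for j in range(dims):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var) or 1.0)  # avoid division by zero
    return means, stds


def anomaly_confidence(row, means, stds):
    """Map the RMS z-score d >= 0 to [0, 1] via 1 - exp(-d)."""
    z2 = [((x - m) / s) ** 2 for x, m, s in zip(row, means, stds)]
    d = math.sqrt(sum(z2) / len(z2))
    return 1.0 - math.exp(-d)


def write_predictions(train_path, test_path, out_path):
    """Score every test row and write the scores in the same order."""
    means, stds = fit_stats(read_rows(train_path))
    scores = [anomaly_confidence(r, means, stds) for r in read_rows(test_path)]
    with open(out_path, "w", newline="") as f:
        for s in scores:
            f.write(f"{s:.6f}\n")
    return scores
```

In a submission, the entry point would call something like `write_predictions(train_path, "/ingress/covertype/testing.csv", "/egress/covertype/predictions.csv")`, where `train_path` stands for wherever the included training dataset is stored.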