The challenge consists of three tracks, involving different category systems with different data properties and focusing on different learning and mining problems. The challenge is based on two large datasets: one created from the ODP web directory (DMOZ) and one from Wikipedia. The datasets are multi-class, multi-label and hierarchical. The number of categories ranges roughly from 13,000 to 325,000, and the number of documents from 380,000 to 2,400,000.
Track 1: Large Scale Hierarchical Classification.
This track is the standard large-scale hierarchical classification task, based on Wikipedia data, and comprises two different subtasks:
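Although the challenge does not prescribe any particular method, a common baseline for large-scale hierarchical classification is the top-down approach: a classifier is attached to each internal node of the hierarchy, and each document is routed greedily from the root to a leaf category. The following is a minimal sketch of this idea; the hierarchy, the keyword "scorers", and all names are illustrative stand-ins for trained models.

```python
# Hedged sketch of a top-down hierarchical classifier: a scorer is
# attached to each category and a document is routed greedily from
# the root to a leaf. All names here are illustrative; the challenge
# does not prescribe any particular method.

def route(doc, node, children, scorers):
    """Greedily descend the hierarchy, picking the best-scoring child."""
    while node in children:                      # stop at a leaf category
        kids = children[node]
        node = max(kids, key=lambda c: scorers[c](doc))
    return node

# Toy hierarchy: root -> {science, sports}, science -> {physics, biology}
children = {"root": ["science", "sports"], "science": ["physics", "biology"]}

# Trivial keyword "classifiers" standing in for trained models.
scorers = {
    "science": lambda d: d.count("atom") + d.count("cell"),
    "sports":  lambda d: d.count("goal"),
    "physics": lambda d: d.count("atom"),
    "biology": lambda d: d.count("cell"),
}

doc = ["atom", "atom", "cell"]
print(route(doc, "root", children, scorers))   # -> physics
```

The appeal of the top-down scheme at this scale is that each document only touches the classifiers along one root-to-leaf path, rather than all 13,000–325,000 categories.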
Track 2: Multi-task Learning.
This track introduces a multi-task learning problem between the DMOZ and medium-sized Wikipedia datasets. Multi-task learning aims at leveraging classification in one category system using the classification results obtained in a different, yet related, category system: the information shared between the two category systems is exploited to improve classification performance on each of the individual tasks. For the challenge, participants will be provided with the DMOZ and medium-sized Wikipedia datasets under a common feature space. The participating methods will be assessed on test sets from both datasets.
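One simple way to exploit a common feature space, shown here purely as an illustration, is to decompose each task's weight vector into a shared component plus a task-specific component, so that mistakes on either task also update the shared part. The perceptron-style sketch below is an assumption of ours, not the challenge's prescribed formulation; the two toy "tasks" stand in for the DMOZ and Wikipedia category systems.

```python
# Hedged sketch of one common multi-task formulation over a shared
# feature space: each task's weights are w_shared + v_task, so an
# update on either task also moves the shared component. This is an
# illustrative perceptron variant, not the challenge's method.

def train(tasks, dim, epochs=10, shared_lr=0.5):
    w_shared = [0.0] * dim
    v = {t: [0.0] * dim for t in tasks}          # task-specific parts
    for _ in range(epochs):
        for t, examples in tasks.items():
            for x, y in examples:                # labels y in {-1, +1}
                score = sum((w_shared[i] + v[t][i]) * x[i] for i in range(dim))
                if y * score <= 0:               # mistake-driven update
                    for i in range(dim):
                        v[t][i] += y * x[i]
                        w_shared[i] += shared_lr * y * x[i]
    return w_shared, v

def predict(w_shared, v_t, x):
    s = sum((w_shared[i] + v_t[i]) * x[i] for i in range(len(x)))
    return 1 if s > 0 else -1

# Two related toy tasks sharing a 3-feature space.
tasks = {
    "dmoz": [([1, 0, 1], 1), ([0, 1, 0], -1)],
    "wiki": [([1, 1, 1], 1), ([0, 1, 1], -1)],
}
w_shared, v = train(tasks, dim=3)
print(predict(w_shared, v["dmoz"], [1, 0, 0]))
```

The design point this sketch tries to convey is that examples from one category system indirectly regularize the other through `w_shared`, which is the kind of sharing the track is meant to assess.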
Track 3: Refinement Learning.
By refinement we refer to the process of creating new categories in the hierarchy by splitting existing ones. In the context of hierarchy development, the creation of new categories, and thus the expansion of the hierarchy, corresponds to a scenario in which users interact with the taxonomy and modify it to best match their needs. After several new categories have been created, the system must reassign documents to them. Developing automated solutions that support this procedure efficiently has a direct, practical impact. This track addresses that scenario and comprises two subtasks: a semi-supervised one and an unsupervised one.
The hierarchy for this track is a tree.
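In the unsupervised subtask, splitting a category and reassigning its documents amounts to a clustering problem. As a minimal illustration (our assumption, not the track's specified procedure), a parent category could be refined into two child categories with 2-means clustering over its documents' feature vectors:

```python
# Hedged sketch of unsupervised refinement: a parent category is split
# into two new child categories by 2-means clustering of its documents,
# which are then reassigned to the nearer centroid. Illustrative only.

def split_category(docs, iters=10):
    """docs: feature vectors of the documents in one parent category."""
    centroids = [docs[0][:], docs[-1][:]]        # crude initialisation
    assign = [0] * len(docs)
    for _ in range(iters):
        # Assignment step: each document goes to its nearest centroid.
        for j, d in enumerate(docs):
            dists = [sum((a - b) ** 2 for a, b in zip(d, c)) for c in centroids]
            assign[j] = dists.index(min(dists))
        # Update step: recompute each centroid as the mean of its members.
        for k in range(2):
            members = [d for d, a in zip(docs, assign) if a == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assign

# Toy parent category containing two latent topics.
docs = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 4.9]]
print(split_category(docs))   # -> [0, 0, 1, 1]
```

In the semi-supervised subtask, a few labelled examples per new category would replace the blind initialisation above, seeding each child cluster instead.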
The participants will be able to upload their results to an online evaluation system. For more information, please refer to the evaluation section.
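The official measures are defined in the evaluation section; purely for orientation, a standard measure for multi-label tasks of this kind is micro-averaged F1, which pools true positives, false positives, and false negatives over all documents before computing precision and recall. A minimal sketch:

```python
# Hedged sketch of micro-averaged F1 for multi-label predictions.
# The challenge's official measures are given in its evaluation
# section; this is a standard measure shown for illustration only.

def micro_f1(gold, pred):
    """gold, pred: dicts mapping document id -> set of category labels."""
    tp = sum(len(gold[d] & pred.get(d, set())) for d in gold)
    fp = sum(len(pred[d] - gold.get(d, set())) for d in pred)
    fn = sum(len(gold[d] - pred.get(d, set())) for d in gold)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

gold = {1: {"a", "b"}, 2: {"c"}}
pred = {1: {"a"}, 2: {"c", "d"}}
print(micro_f1(gold, pred))   # tp=2, fp=1, fn=1 -> 2/3
```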
At the closing date of the testing phase, participants will be asked to submit the following:
* A short paper describing their method, including an algorithmic description, results of dry-run tests, computational complexity estimates, the hardware set-up used for training the classifiers, and training times. This paper will be uploaded to the site and will be publicly available.