LSHTC4 starts

We are pleased to announce the fourth edition of the LSHTC challenge. This year’s challenge comprises three tracks and is based on two large datasets created from the ODP web directory (DMOZ) and Wikipedia. The datasets are multi-class, multi-label and hierarchical. The number of categories ranges between 13,000 and 325,000 and the number of documents between 380,000 and 2,400,000.
The tracks of the challenge are organized as follows:
1. Very Large Scale Supervised Learning on a large collection from Wikipedia
2. Multi-task learning, based on both DMOZ and Wikipedia category systems
3. Refinement-learning on a subset of the DMOZ category system
In order to register for the challenge and gain access to the datasets, you must have an account at the challenge web site. Please consult the web site for more information on this challenge.
Important dates:
- July 17, start of the challenge
- July 31, opening of the evaluation
- June 29, closing of the evaluation