WSDM 2014 Workshop: "Web-Scale Classification: Classifying Big Data from the Web"

7th ACM WSDM Conference

Crowne Plaza Times Square, New York City

February 28, 2014



The huge amount of data available on the Web in various forms (text, images, videos, etc.) poses challenging problems for the extraction and assessment of useful information. Because of the scale of Web data, intelligent systems for knowledge extraction are nowadays of utmost importance. A key module of any such system is the ability to correctly identify and classify data items into a predefined set of classes. For example, in e-commerce the goal is to classify product items into a set of categories (those contained in the inventory). To ease the classification and organization of the data, many real-world systems use taxonomies over the set of categories, typically organized in a hierarchical structure with parent-child relations. Typical examples of such taxonomies are DMOZ, the International Patent Classification, and Wikipedia.

In this setting, a number of interesting problems arise for Web classification systems as both the size of the hierarchies and the volume of data grow. In particular, this is one of the rare situations where data sparsity remains an issue despite the vastness of the available data. The reasons are the simultaneous increase in the number of classes and their hierarchical organization; the latter leads to a very high imbalance between classes at different levels of the hierarchy. Additionally, the statistical dependence between classes poses both challenges and opportunities for learning approaches.

The goal of this workshop is to discuss and assess recent research on classification and mining in Web-scale category systems. In particular, we want to attract researchers developing new ways to exploit such systems, e.g. by exploring how different category systems can be combined (for example, through multi-task or transfer learning) to improve classification accuracy, or how hierarchies can be refined or simplified for classification and mining purposes. We also want to attract studies that reveal new properties of large-scale category systems, such as the type of data distributions they exhibit. Topics of interest include, but are not limited to, the following (WSC stands for Web-scale classification):

  • Semi-supervised learning for WSC
  • Transfer learning for WSC
  • Multi-task learning for WSC
  • Deep learning approaches to WSC
  • Clustering and hierarchy refinement
  • Mining large-scale hierarchical category systems
  • Large-scale classification for e-commerce
  • Budget learning for large-scale classification and clustering
  • Parallel implementations of large-scale classification and clustering systems

Workshop format

The workshop is a one-day event with two presentation formats: oral presentations (30 minutes each, including questions) and poster presentations (in a dedicated session).

Important dates

  • Paper submission - December 23
  • Notification - January 10
  • Camera ready paper - January 20
  • Workshop - February 28

Organizers

Massih-Reza Amini, LIG, Grenoble, France
Ion Androutsopoulos, AUEB, Athens, Greece
Thierry Artières, LIP6, Paris, France
Patrick Gallinari, LIP6, Paris, France
Eric Gaussier, LIG, Grenoble, France
George Paliouras, NCSR "Demokritos", Athens, Greece
Ioannis Partalas, LIG, Grenoble, France