From Web Spam Challenge

Web Spam Challenge 2008: Corpus

The dataset (contents, links, and labels) can be downloaded from:

It is based on a crawl of the .uk domain performed in May 2007.

Two thirds of the labels have been released for training; the remaining third is being held out for testing.

Page last modified on March 12, 2008, at 05:40 AM