Web Spam Challenge 2008: Corpus
The dataset (contents, links, and labels) can be downloaded from:
It is based on a crawl of the .uk domain performed in May 2007.
Two thirds of the labels have been released for training; the remaining third is being held out for testing.