Web Spam Challenge 2008: Feature vectors

As in previous editions, we provide a set of pre-computed features extracted from the contents and links of the collection. Participating teams may use these features in addition to any automatic technique of their choice.

The feature vectors are available as comma-separated text files:
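
The column layout of the CSV files is not described on this page. The Python sketch below therefore assumes one header row, the host identifier in the first column, and numeric feature values in the remaining columns. The function name `load_feature_csv` and the toy input are hypothetical; check the layout against the actual files before relying on it. It uses only the standard `csv` module.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def load_feature_csv(stream: TextIO) -> Tuple[List[str], Dict[str, List[float]]]:
    """Read a feature CSV into (feature_names, {host_id: feature_vector}).

    Hypothetical layout (an assumption, not documented here): one header
    row naming the columns, the host id in the first column, and numeric
    feature values in the remaining columns.
    """
    reader = csv.reader(stream)
    header = next(reader)   # first row holds the column names
    names = header[1:]      # drop the host-id column name
    vectors: Dict[str, List[float]] = {}
    for row in reader:
        if not row:
            continue        # skip blank lines
        host_id, values = row[0], row[1:]
        vectors[host_id] = [float(v) for v in values]
    return names, vectors


# Toy input standing in for a real feature file (hypothetical values).
sample = io.StringIO("#hostid,f1,f2\n0,0.5,12\n1,0.1,3\n")
names, vectors = load_feature_csv(sample)
print(names)         # ['f1', 'f2']
print(vectors["1"])  # [0.1, 3.0]
```

Keying the vectors by host id keeps each row traceable to its host, which matters when features from several files have to be joined.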

Features in MATLAB format

You can also download the features in MATLAB format. Each feature set is split into two files, a small one with the labeled hosts for training and a large one with the unlabeled hosts for computing the predictions:
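
Whatever the file format, the intended workflow is to fit a model on the small labeled file and apply it to the large unlabeled one. The Python sketch below illustrates that split with a nearest-centroid rule, chosen only as a placeholder: the challenge does not prescribe any classifier, and the function names, the label strings (`spam`, `nonspam`) and the toy vectors are hypothetical. Reading the MATLAB files themselves is left out; in Python, `scipy.io.loadmat` is one option, though the variable names stored inside the files are not documented here.

```python
from math import dist
from typing import Dict, List, Sequence


def fit_centroids(X: Sequence[Sequence[float]], y: Sequence[str]) -> Dict[str, List[float]]:
    """Average the training vectors of each label (the small, labeled file)."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids: Dict[str, List[float]], X: Sequence[Sequence[float]]) -> List[str]:
    """Assign each unlabeled vector (the large file) to its nearest centroid."""
    return [min(centroids, key=lambda label: dist(centroids[label], x)) for x in X]


# Hypothetical toy data; real vectors would come from the feature files.
train_X = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
train_y = ["spam", "spam", "nonspam"]
centroids = fit_centroids(train_X, train_y)
print(predict(centroids, [[0.7, 0.3], [0.2, 0.8]]))  # ['spam', 'nonspam']
```

Any real submission would substitute its own model; the point is only that training touches the labeled file and prediction touches the unlabeled one.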

Features in ARFF format

You can also download the features in ARFF format for Weka. Each feature set is split into two files, a small one with the labeled hosts for training and a large one with the unlabeled hosts for computing the predictions:
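
ARFF is a plain-text format: a header with `@RELATION` and `@ATTRIBUTE` declarations, followed by comma-separated rows after `@DATA`, with `%` starting a comment line. Weka reads it directly. For use outside Weka, the minimal Python reader below is a sketch under stated simplifications: it handles only dense rows and no quoted values containing commas. Its name (`parse_arff`) and the toy attributes and class labels are hypothetical, not taken from the challenge files.

```python
import io
from typing import List, TextIO, Tuple


def parse_arff(stream: TextIO) -> Tuple[str, List[Tuple[str, str]], List[List[str]]]:
    """Minimal dense-ARFF reader returning (relation, attributes, rows).

    Simplifications (assumptions, not full ARFF): no sparse {...} rows,
    no quoted values containing commas, no quoted attribute names.
    """
    relation = ""
    attributes: List[Tuple[str, str]] = []
    rows: List[List[str]] = []
    in_data = False
    for raw in stream:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # blank lines and '%' comments are ignored
        if in_data:
            rows.append([v.strip() for v in line.split(",")])
            continue
        keyword = line.split(None, 1)[0].lower()  # ARFF keywords are case-insensitive
        if keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            _, name, att_type = line.split(None, 2)  # type keeps any {a, b} spaces
            attributes.append((name, att_type))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


# Hypothetical toy file, not real challenge data.
sample = io.StringIO("""% toy example
@RELATION webspam
@ATTRIBUTE hostid NUMERIC
@ATTRIBUTE f1 NUMERIC
@ATTRIBUTE class {spam,nonspam}
@DATA
0,0.5,spam
1,0.1,nonspam
""")
relation, attrs, rows = parse_arff(sample)
print(relation)  # webspam
print(attrs[2])  # ('class', '{spam,nonspam}')
print(rows[1])   # ['1', '0.1', 'nonspam']
```

Values are returned as strings so that nominal and missing (`?`) entries survive; converting `NUMERIC` columns to floats is left to the caller.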

If you have any comments or suggestions about these files, or find any problems with them, please contact ludovic [dot] denoyer [at] lip6 [dot] fr