Web Spam Challenge 2008: Feature vectors
As in previous editions, we provide a set of pre-computed features extracted from the contents and links in the collection. Participating teams may use these features in addition to any automatic techniques of their own.
The feature vectors are available as comma-separated text files in:
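The comma-separated files can be read with Python's standard `csv` module. The sketch below is only an illustration and assumes a column layout that this page does not specify: a header row, a host identifier in the first column, and numeric values in every remaining column. The function name `read_feature_csv` and the column names in the example are hypothetical. If the actual files contain extra non-numeric columns (for example, a host name or a label), those columns would need to be split off before the `float` conversion.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def read_feature_csv(stream: TextIO) -> Tuple[List[str], Dict[str, List[float]]]:
    """Read a comma-separated feature file into {host_id: [feature values]}.

    Hypothetical layout (an assumption, not taken from the challenge files):
    a header row, the host identifier in the first column, and numeric
    values in all remaining columns.
    """
    reader = csv.reader(stream)
    header = next(reader)  # first row holds the column names
    features: Dict[str, List[float]] = {}
    for row in reader:
        if not row:
            continue  # csv.reader yields [] for blank lines; skip them
        host_id, *values = row
        features[host_id] = [float(v) for v in values]
    # Return the feature names (without the id column) and the vectors.
    return header[1:], features


if __name__ == "__main__":
    sample = io.StringIO("hostid,feat_a,feat_b\n0,1.5,2\n1,0.25,3\n")
    names, vectors = read_feature_csv(sample)
    print(names)    # ['feat_a', 'feat_b']
    print(vectors)  # {'0': [1.5, 2.0], '1': [0.25, 3.0]}
```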
Features in MATLAB format
You can also download the features in MATLAB format. Each feature set is split into two files: a small one with the labeled hosts, for training, and a large one with the unlabeled hosts, for computing the predictions:
- MATLAB training files (~5 MB)
- MATLAB testing files (~130 MB)
Features in ARFF format
You can also download the features in ARFF format for Weka. Each feature set is split into two files: a small one with the labeled hosts, for training, and a large one with the unlabeled hosts, for computing the predictions:
- ARFF Training files (~5 MB)
- ARFF Testing files (~130 MB)
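Besides Weka itself, library readers such as SciPy's `scipy.io.arff.loadarff` can load ARFF files. For readers who want to avoid dependencies, here is a minimal pure-Python sketch that handles only a subset of the format: `@attribute` declarations, `%` comments, `?` missing values, numeric attributes (converted to `float`), and nominal or string attributes (kept as `str`). It does not handle sparse rows (`{index value, ...}`), dates, or attribute names that contain spaces. The function name `read_arff` and the toy attributes in the example are hypothetical, and the example does not describe the actual attributes in the challenge files.

```python
import csv
import io
from typing import List, TextIO, Tuple, Union

Value = Union[float, str, None]


def read_arff(stream: TextIO) -> Tuple[List[Tuple[str, str]], List[List[Value]]]:
    """Minimal reader for dense ARFF files (a subset of Weka's format)."""
    attributes: List[Tuple[str, str]] = []
    rows: List[List[Value]] = []
    in_data = False
    for raw in stream:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # blank line or ARFF comment
        if not in_data:
            keyword = line.split(None, 1)[0].lower()
            if keyword == "@attribute":
                # "@attribute <name> <type>"; the type may be a {nominal,list}
                _, name, kind = line.split(None, 2)
                attributes.append((name, kind.strip()))
            elif keyword == "@data":
                in_data = True
            continue  # @relation and other header lines are ignored
        # Data section: one comma-separated instance per line; ARFF quotes
        # values with single quotes, so use "'" as the CSV quote character.
        cells = next(csv.reader([line], quotechar="'", skipinitialspace=True))
        row: List[Value] = []
        for (_, kind), cell in zip(attributes, cells):
            if cell == "?":
                row.append(None)  # ARFF marker for a missing value
            elif kind.lower() in ("numeric", "real", "integer"):
                row.append(float(cell))
            else:
                row.append(cell)  # nominal or string value kept as text
        rows.append(row)
    return attributes, rows


if __name__ == "__main__":
    sample = io.StringIO(
        "% toy example\n"
        "@relation hosts\n"
        "@attribute feat_a numeric\n"
        "@attribute class {spam,nonspam}\n"
        "@data\n"
        "0.5,spam\n"
        "?,nonspam\n"
    )
    attrs, data = read_arff(sample)
    print(attrs)  # [('feat_a', 'numeric'), ('class', '{spam,nonspam}')]
    print(data)   # [[0.5, 'spam'], [None, 'nonspam']]
```

For the full format, including sparse instances, Weka's own loaders remain the reference.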
If you have any comments or suggestions about these files, or find any problems with them, please contact ludovic [dot] denoyer [at] lip6 [dot] fr