Submitting predictions

The predictions you submit for the challenge can use all of the labels in SET 1, which we provide as training data. Predictions must be produced by fully automatic systems: participants may not manually modify the predictions obtained by their systems.

Data format

To submit your predictions, create a comma-separated plain-text file with one line per host. Each line contains the hostname, your predicted label ("nonspam" or "spam"), and the probability, as estimated by your model, that the host is spam. If your model does not produce probabilities, use 0.0 for nonspam and 1.0 for spam. For example:


#hostname,prediction,probability_spam
alex.crystalmark.co.uk,spam,0.9000
alexchamberlin.mysite.wanadoo-members.co.uk,nonspam,0.1750
alexwy.20six.co.uk,nonspam,0.0520
alfieplastering.mysite.wanadoo-members.co.uk,spam,0.7890
...

Note that you do not need to provide predictions for the labeled elements of the dataset. Participating entries will be evaluated against labels for (a subset of) the elements that are unlabeled in the data we provide.
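
As an illustration of the format described above, here is a minimal Python sketch that writes such a file. The function name write_predictions and the 0.5 decision threshold are assumptions made for this example, not part of the challenge rules, and the commented header line simply mirrors the sample above.

```python
import csv
import io

def write_predictions(rows, out, threshold=0.5):
    """Write (hostname, probability_spam) pairs in the challenge format.

    The threshold used to derive the label is an assumption of this sketch;
    use whatever decision rule your system actually applies.
    """
    out.write("#hostname,prediction,probability_spam\n")
    writer = csv.writer(out, lineterminator="\n")
    for hostname, prob in rows:
        label = "spam" if prob >= threshold else "nonspam"
        writer.writerow([hostname, label, f"{prob:.4f}"])

# Example with two hosts taken from the sample above.
buf = io.StringIO()
write_predictions(
    [("alex.crystalmark.co.uk", 0.9), ("alexwy.20six.co.uk", 0.052)], buf
)
print(buf.getvalue(), end="")
```

To write to disk instead, pass a file opened with open("predictions.txt", "w", newline="") in place of the in-memory buffer; the file name is only an example.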

Submitting your predictions

We encourage you to include standard validation results in the abstract accompanying your submission, for example the accuracy obtained by holding out part of the data as a test set and/or by cross-validation. Remember, however, that the predictions you submit can (and perhaps should) be obtained using all of the labels you have as the training set.
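
The distinction between validating and producing the final predictions could be sketched as follows in Python. Everything here is hypothetical: the host names are made up, and train_majority is a placeholder that always predicts the most common label, standing in for whatever system you actually use.

```python
import random
from collections import Counter

def holdout_split(labeled, test_fraction=0.2, seed=0):
    """Shuffle (hostname, label) pairs and split off a held-out test part."""
    items = list(labeled)
    random.Random(seed).shuffle(items)
    n_test = max(1, int(len(items) * test_fraction))
    return items[n_test:], items[:n_test]

def accuracy(predict, test):
    """Fraction of held-out hosts whose predicted label matches the true one."""
    return sum(predict(host) == label for host, label in test) / len(test)

def train_majority(train):
    """Placeholder system: always predicts the most common training label."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda hostname: majority

# Made-up labeled data: every fourth host is spam.
labeled = [
    (f"host{i}.example.co.uk", "spam" if i % 4 == 0 else "nonspam")
    for i in range(20)
]

# Validation figure to report in the abstract: train on one part, test on the rest.
train, test = holdout_split(labeled)
print("held-out accuracy:", accuracy(train_majority(train), test))

# Predictions to submit: the model may be retrained on all available labels.
final_model = train_majority(labeled)
```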

The same holds for any TrustRank, BadRank, or neighborhood-based feature you want to use. For validation, some of the labels can be held out while generating such a feature, but for the predictions you submit to the challenge, you can use all of the labels you have to generate the feature.
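
To make this concrete, here is a small Python sketch using a deliberately simple neighborhood feature: the fraction of a host's labeled out-neighbors that are spam. This is a simplified stand-in, not TrustRank or BadRank, and the tiny link graph, the host names, and the choice of 0.0 for hosts with no labeled neighbors are all assumptions of the example.

```python
def spam_neighbor_fraction(host, out_links, known_labels):
    """Fraction of a host's labeled out-neighbors that are labeled spam."""
    labeled = [n for n in out_links.get(host, ()) if n in known_labels]
    if not labeled:
        return 0.0  # assumption: fall back to 0.0 when no neighbor is labeled
    return sum(known_labels[n] == "spam" for n in labeled) / len(labeled)

# Hypothetical link graph and labels.
out_links = {"a": ["b", "c", "d"], "b": ["c"], "c": [], "d": ["a"]}
labels = {"b": "spam", "c": "nonspam", "d": "spam"}
held_out = {"d"}

# Validation: hide the held-out labels while generating the feature.
visible = {h: y for h, y in labels.items() if h not in held_out}
validation_feature = spam_neighbor_fraction("a", out_links, visible)

# Final predictions: the feature may be generated from all labels.
final_feature = spam_neighbor_fraction("a", out_links, labels)

print(validation_feature, final_feature)
```

With the held-out label hidden, host "a" sees one spam neighbor out of two labeled ones; with all labels, two out of three.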

A maximum of two sets of predictions per participant team will be allowed. If you are submitting two sets of predictions, please submit each of them as a separate entry.

Submit your predictions using EasyChair. Each submission to the Web Spam Challenge must be accompanied by a 1-page PDF abstract (template) containing a high-level description of the system used for generating the predictions and the data sources used by your system. Attach to your submission a .txt file containing the predictions in the format described above.
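
Before attaching the .txt file, it may be worth checking it against the format described above. The following Python sketch is one possible check, not an official validator: the function name check_predictions is hypothetical, and the requirement that probabilities lie between 0.0 and 1.0 is an assumption inferred from the description.

```python
import csv

def check_predictions(lines):
    """Return a list of problems found in the prediction lines (empty if none)."""
    problems = []
    seen = set()
    for lineno, row in enumerate(csv.reader(lines), start=1):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and the commented header
        if len(row) != 3:
            problems.append(f"line {lineno}: expected 3 fields, got {len(row)}")
            continue
        host, label, prob = row
        if label not in ("spam", "nonspam"):
            problems.append(f"line {lineno}: unknown label {label!r}")
        try:
            ok = 0.0 <= float(prob) <= 1.0
        except ValueError:
            ok = False
        if not ok:
            problems.append(f"line {lineno}: bad probability {prob!r}")
        if host in seen:  # the format asks for one line per host
            problems.append(f"line {lineno}: duplicate host {host}")
        seen.add(host)
    return problems

sample = [
    "#hostname,prediction,probability_spam",
    "alex.crystalmark.co.uk,spam,0.9000",
    "alexwy.20six.co.uk,maybe,1.5",
]
print(check_predictions(sample))
```

For a file on disk, pass the open file object directly, since csv.reader accepts any iterable of lines.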

Thank you for participating!