Predictions are expected to be produced by fully automatic systems. We do not allow participants to manually modify the predictions obtained by the systems. The use of external data sources is allowed, as long as you list such sources in the abstract accompanying your submission.
The predictions you submit for the challenge can use all of the labeled data we are providing as training data. The testing data will be obtained during the assessment phase.
To submit your predictions, create a comma-separated plain-text file containing one line per host. Each line gives the hostname, your predicted label ("nonspam" or "spam"), and the probability your model assigns to the host being spam. If your model does not generate such probabilities, use 0.0 for nonspam and 1.0 for spam.
#hostname,prediction,probability_spam
alex.crystalmark.co.uk,spam,0.9000
alexchamberlin.mysite.wanadoo-members.co.uk,nonspam,0.1750
alexwy.20six.co.uk,nonspam,0.0520
alfieplastering.mysite.wanadoo-members.co.uk,spam,0.7890
...
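A minimal Python sketch of producing a file in this format is shown here. The function name `format_predictions` and the input shape (tuples of hostname, label, and an optional probability) are assumptions made for illustration. The four-decimal formatting and the header line simply mirror the example; neither is stated as a requirement.

```python
import csv
import io

# Header copied from the example file; whether it is mandatory is an assumption.
HEADER = "#hostname,prediction,probability_spam"


def format_predictions(predictions):
    """Return the submission text for an iterable of
    (hostname, label, probability_or_None) tuples (hypothetical input shape)."""
    buf = io.StringIO()
    buf.write(HEADER + "\n")
    writer = csv.writer(buf, lineterminator="\n")
    for hostname, label, prob in predictions:
        if label not in ("spam", "nonspam"):
            raise ValueError(f"unknown label for {hostname}: {label!r}")
        if prob is None:
            # Fallback described in the instructions: 1.0 for spam, 0.0 for nonspam.
            prob = 1.0 if label == "spam" else 0.0
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"probability out of range for {hostname}: {prob}")
        writer.writerow([hostname, label, f"{prob:.4f}"])
    return buf.getvalue()
```

The returned string can be written to a file with a .txt extension, for instance with `open("predictions.txt", "w").write(...)`, where the file name is again only an example.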
Note that you do not need to provide predictions for the labeled elements of the dataset. The evaluation of the participating entries will be based on the labels provided for (a subset of) the elements that were not labeled.
Submitting your predictions
Submissions of predictions to Track I of the Web Spam Challenge must be accompanied by a one-page PDF abstract (template) containing a high-level description of the system used to generate the predictions and the data sources used by your system.
We encourage you to include standard validation results in the abstract accompanying your submission (e.g., the accuracy obtained by holding out part of the data as a test set and/or by cross-validation). Remember, however, that the predictions you submit can (and perhaps should) be obtained using all of the labels you have as the training set.
The same holds for any TrustRank-, BadRank-, or neighborhood-based feature you want to use. For validation, some of the labels can be held out while generating such a feature, but for the predictions you submit to the challenge, you can use all of the labels you have to generate the feature.
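The following Python sketch illustrates the distinction between validation and final prediction for a label-dependent feature. It uses a deliberately simple neighborhood feature (the fraction of a host's labeled neighbors that are spam) as a stand-in, not TrustRank or BadRank themselves. The function names, the dictionary-based graph and label structures, and the thresholding "model" are all assumptions made for illustration.

```python
import random


def spam_neighbor_fraction(host, graph, visible_labels):
    """Fraction of the host's labeled neighbors that are spam.
    Only labels present in visible_labels are used."""
    seen = [visible_labels[n] for n in graph.get(host, ()) if n in visible_labels]
    if not seen:
        return 0.0  # assumption: no labeled neighbors counts as no spam evidence
    return sum(label == "spam" for label in seen) / len(seen)


def k_fold_accuracy(labels, graph, k=5, seed=0, threshold=0.5):
    """Cross-validated accuracy of a toy rule that thresholds the feature.
    For each fold, the held-out labels are hidden while computing the feature."""
    hosts = sorted(labels)
    random.Random(seed).shuffle(hosts)
    folds = [hosts[i::k] for i in range(k)]
    correct = 0
    for held_out in folds:
        hidden = set(held_out)
        visible = {h: y for h, y in labels.items() if h not in hidden}
        for host in held_out:
            score = spam_neighbor_fraction(host, graph, visible)
            predicted = "spam" if score >= threshold else "nonspam"
            correct += predicted == labels[host]
    return correct / len(hosts)
```

For the submitted predictions, the same feature would instead be computed with the full label dictionary, for example `spam_neighbor_fraction(host, graph, labels)` for each unlabeled host.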
To upload your predictions, use EasyChair. Add the file containing the predictions as an attachment with a .txt extension. A maximum of two sets of predictions per participating team will be allowed. If you are submitting two sets of predictions, please upload each of them as a separate entry. Thank you.