Main.PhaseIIISubmission History


March 06, 2008, at 02:34 PM by 84.88.76.49 -
Changed lines 24-27 from:

Submissions of predictions to the Track I of the Web Spam Challenge must be accompanied by a 1-page PDF abstract (template) containing a high-level description of the system used for generating such predictions and the data sources used by your system.

It would be very relevant to include standard validation results in the abstract accompanying your submission (e.g., the accuracy obtained by holding out part of the data as a test set and/or by cross-validation), but remember that the predictions you submit can (and perhaps should) be obtained using all of the labels you have as the training set.

to:

It is good to include standard validation results in the abstract accompanying your submission (e.g., the accuracy obtained by holding out part of the data as a test set and/or by cross-validation), but remember that the predictions you submit can (and perhaps should) be obtained using all of the labels you have as the training set.

Changed lines 30-31 from:

Submit your predictions using EasyChair. Attach the file with the predictions to your 1-page description. Use the .txt extension for compatibility with EasyChair.

to:

Submissions of predictions to the Web Spam Challenge must be accompanied by a 1-page PDF abstract (template) containing a high-level description of the system used for generating such predictions and the data sources used by your system.

Submit your predictions using EasyChair. Submit a 1-page PDF abstract (template) containing a high-level description of the system used for generating the predictions and the data sources used by your system. Attach to your submission a .txt file containing the predictions in the format described above.

March 06, 2008, at 02:32 PM by 84.88.76.49 -
Changed lines 32-36 from:

Submit your predictions using EasyChair

Thank you.

to:

Submit your predictions using EasyChair. Attach the file with the predictions to your 1-page description. Use the .txt extension for compatibility with EasyChair.

Thank you for participating!

March 06, 2008, at 02:31 PM by 84.88.76.49 -
Changed lines 30-36 from:

A maximum of two sets of predictions per participant team will be allowed. If you are submitting two sets of predictions, please submit each of them as a separate entry. Thank you.

to:

A maximum of two sets of predictions per participant team will be allowed. If you are submitting two sets of predictions, please submit each of them as a separate entry.

Submit your predictions using EasyChair

Thank you.

February 25, 2008, at 12:40 PM by 216.145.54.158 -
Changed lines 3-6 from:

Predictions are expected to be produced by fully automatic systems. We do not allow participants to manually modify the predictions obtained by the systems. The use of external data sources is allowed, as long as you list such sources in the abstract accompanying your submission.

The predictions you submit for the challenge can use all of the labels in SET 1 that we are providing as training data.

to:

The predictions you submit for the challenge can use all of the labels in SET 1 that we are providing as training data. Predictions are expected to be produced by fully automatic systems. We do not allow participants to manually modify the predictions obtained by their systems.

February 25, 2008, at 12:32 PM by 216.145.54.158 -
Added lines 1-32:

Submitting predictions

Predictions are expected to be produced by fully automatic systems. We do not allow participants to manually modify the predictions obtained by the systems. The use of external data sources is allowed, as long as you list such sources in the abstract accompanying your submission.

The predictions you submit for the challenge can use all of the labels in SET 1 that we are providing as training data.

Data format

To submit the predictions, create a comma-separated plain text file containing one line per host. Each line gives the hostname, your predicted label ("nonspam" or "spam"), and the probability your model assigns to the host being spam. If your model does not generate such probabilities, use 0.0 for nonspam and 1.0 for spam.


#hostname,prediction,probability_spam
alex.crystalmark.co.uk,spam,0.9000
alexchamberlin.mysite.wanadoo-members.co.uk,nonspam,0.1750
alexwy.20six.co.uk,nonspam,0.0520
alfieplastering.mysite.wanadoo-members.co.uk,spam,0.7890
...

Note that you do not need to provide predictions for the labeled elements of the dataset. The evaluation of the participating entries will be based on the labels provided for (a subset of) the elements that were not labeled.
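
As an illustration of the format above, a minimal Python sketch for producing such a file might look like the following. The function name `format_predictions` is hypothetical, and two details are assumptions taken only from the sample lines: that the `#hostname,...` header line is included, and that probabilities are written with four decimal places.

```python
VALID_LABELS = {"nonspam", "spam"}
HEADER = "#hostname,prediction,probability_spam"  # assumed from the sample above


def format_predictions(rows):
    """Return the submission text for (hostname, label, probability) rows.

    probability may be None when the model produces no probabilities;
    it then falls back to 1.0 for spam and 0.0 for nonspam.
    """
    lines = [HEADER]
    for hostname, label, prob in rows:
        if label not in VALID_LABELS:
            raise ValueError(f"unknown label for {hostname}: {label!r}")
        if "," in hostname:
            raise ValueError(f"hostname contains a comma: {hostname!r}")
        if prob is None:
            prob = 1.0 if label == "spam" else 0.0
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"probability out of range for {hostname}: {prob}")
        # Four decimals mirror the sample lines; this precision is an assumption.
        lines.append(f"{hostname},{label},{prob:.4f}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = format_predictions([
        ("alex.crystalmark.co.uk", "spam", 0.9),
        ("alexwy.20six.co.uk", "nonspam", None),
    ])
    # Write with a .txt extension, e.g. predictions.txt
    with open("predictions.txt", "w", encoding="ascii") as handle:
        handle.write(text)
```

Only hosts without labels need to appear in the rows passed to the function.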

Submitting your predictions

Submissions of predictions to the Track I of the Web Spam Challenge must be accompanied by a 1-page PDF abstract (template) containing a high-level description of the system used for generating such predictions and the data sources used by your system.

It would be very relevant to include standard validation results in the abstract accompanying your submission (e.g., the accuracy obtained by holding out part of the data as a test set and/or by cross-validation), but remember that the predictions you submit can (and perhaps should) be obtained using all of the labels you have as the training set.

Something similar holds for any TrustRank/BadRank/Neighborhood-based feature you want to use. For validation, some of the labels can be held out while generating such a feature, but for the predictions you submit for the challenge, you can use all of the labels you have for generating the feature.
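
The two-stage workflow described above (hold out some labels to report validation results, then use all labels for the submitted predictions) could be sketched roughly as follows. All names here are hypothetical, and `fit_majority` is only a stand-in baseline that always predicts the most common training label; it is not a method suggested by the challenge.

```python
import random
from collections import Counter


def split_labels(labels, holdout_fraction=0.2, seed=0):
    """Split a {hostname: label} dict into (train, held_out) dicts."""
    hosts = sorted(labels)
    random.Random(seed).shuffle(hosts)
    n_holdout = int(len(hosts) * holdout_fraction)
    held_out = {h: labels[h] for h in hosts[:n_holdout]}
    train = {h: labels[h] for h in hosts[n_holdout:]}
    return train, held_out


def fit_majority(train):
    """Placeholder model: always predict the most common training label."""
    majority = Counter(train.values()).most_common(1)[0][0]
    return lambda hostname: majority


def accuracy(predict, held_out):
    """Fraction of held-out hosts whose label is predicted correctly."""
    if not held_out:
        raise ValueError("held-out set is empty")
    correct = sum(1 for host, label in held_out.items() if predict(host) == label)
    return correct / len(held_out)


def validate_then_fit_all(labels, fit=fit_majority):
    # Stage 1: estimate accuracy using only part of the labels for training.
    train, held_out = split_labels(labels)
    validation_accuracy = accuracy(fit(train), held_out)
    # Stage 2: the model used for the submission is trained on all labels.
    final_predict = fit(labels)
    return validation_accuracy, final_predict
```

The same split would apply to label-derived features: compute them from `train` only when validating, and from all labels when generating the submitted predictions.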

A maximum of two sets of predictions per participant team will be allowed. If you are submitting two sets of predictions, please submit each of them as a separate entry. Thank you.