Brian
2007-11-08 22:58:42 UTC
Several collaborators and I are preparing to expand on previous work to
automatically ascertain the quality of Wikipedia articles on the English
Wikipedia (presented at Wikimania '07 [0]). PageRank is Google's hallmark
quality metric, and the foundation actually has access to these numbers
through the Google Webmaster Tools website. If a foundation representative
were to create a Google account and verify that they were a "webmaster,"
they could download the PageRank for every article on the English Wikipedia
in a convenient tabular format. This data would likely serve as a fantastic
predictor. I would also like to compare the Google-computed PageRank to the
PageRank computed via Wikipedia's internal link structure. I don't see any
privacy implications in releasing this data. It also doesn't seem to help
spammers much, as they already know the pages that have a very high
PageRank, and we include rel="nofollow" on outbound links. Nonetheless, I
would of course be willing to keep the data private.
This would only take a few minutes if it were approved. Is there anyone out
there who has the power to make it happen?
Cheers :)
Brian
[0] http://upload.wikimedia.org/wikipedia/wikimania2007/d/d3/RassbachPincockMingus07.pdf
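
For reference, the PageRank over Wikipedia's internal link structure
mentioned above could be computed roughly as follows. This is only a minimal
sketch of standard power-iteration PageRank, not the code from [0]: the
function name, the damping factor of 0.85, and the four-page toy graph are
all assumptions made for illustration, and a real run would need to load the
actual article link table instead.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank.

    `links` maps each page title to the list of pages it links to.
    The damping factor 0.85 is a conventional choice, assumed here.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Rank held by pages without outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}

        # Each page passes an equal share of its rank to every page it links to.
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share

        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical toy link graph standing in for the internal article links.
toy_links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

ranks = pagerank(toy_links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

The resulting scores sum to 1, so they could be rank-correlated against the
Google-computed values on a per-article basis once both tables are available.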