Discussion:
Wikipedia colored according to trust
Luca de Alfaro
2007-12-19 21:36:37 UTC
Permalink
Dear All,

we have a demo at http://wiki-trust.cse.ucsc.edu/ that features the whole
English Wikipedia, as of its February 6, 2007 snapshot, colored according to
text trust.
This is the first time that even we can look at how the "trust coloring"
looks on the whole of the Wikipedia!
We would be very interested in feedback (the wikiquality-l mailing list is the best place).

If you find bugs, you can report them at
http://groups.google.com/group/wiki-trust

Happy Holidays!

Luca

PS: yes, we know, some images look off. It is currently fairly difficult
for a site outside of the Wikipedia to fetch Wikipedia images correctly.

PPS: there are going to be a few planned power outages on our campus in the
next few days, so if the demo is down, try again later.
Gregory Maxwell
2007-12-19 23:05:45 UTC
Permalink
Post by Luca de Alfaro
Dear All,
we have a demo at http://wiki-trust.cse.ucsc.edu/ that features the whole
English Wikipedia, as of its February 6, 2007 snapshot, colored according to
text trust.
This is the first time that even we can look at how the "trust coloring"
looks on the whole of the Wikipedia!
We would be very interested in feedback (the wikiquality-l mailing list is the best place).
Sadly it doesn't appear to contain the complete article histories.
There are a number of old, manually detected cases of long-undetected
vandalism that I had recorded and wanted to use to gauge its
performance.
Luca de Alfaro
2007-12-19 23:10:20 UTC
Permalink
That's true. We had to truncate histories to make everything fit into a
server.
We are gaining experience in how to deal with Wikipedia information
(terabytes of it),
and we may be able to give a better demo in some time, with full histories,
but... we need to buy some storage first! :-)

Luca
d***@public.gmane.org
2007-12-20 00:00:18 UTC
Permalink
That's true. We had to truncate histories to make everything fit into a server.
We are gaining experience in how to deal with Wikipedia information (terabytes of it),
and we may be able to give a better demo in some time, with full histories, but.... we
need to buy some storage first! :-)
Could you possibly take a random sample of 2% of articles and examine
the full histories of those? 40,000 articles is more than enough for a
demo, and we can rig the sample to include some articles of interest if
needed.

Akash
Luca de Alfaro
2007-12-20 00:05:40 UTC
Permalink
Oh yes!
In fact, if you tell me which article titles you are interested in, I can
run those through and load them into a secondary demo we have.
I may get around to posting the results only in early January though, as the
break is fast approaching.
Luca
Brian
2007-12-20 00:07:22 UTC
Permalink
I can provide a list of the top 40,000 articles rated by quality according
to the Wikipedia editorial team. A random sample is unlikely to be
interesting, as more than 70% of articles are stubs.

http://en.wikipedia.org/wiki/Wikipedia:Version_1.0_Editorial_Team/Index
Luca de Alfaro
2007-12-20 00:10:12 UTC
Permalink
Great, but I also welcome suggestions of articles where you know interesting
things have happened (I can then include both).

Luca

PS: When doing a random sample, I can select articles with > 200 revisions,
and that gets rid of the stub problem.
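
[A minimal sketch of this sampling recipe, in Python. The data layout and
function name are invented for illustration and this is not the project's
actual tooling; assume (title, revision_count) pairs extracted from a dump
index.]

import random

def pick_demo_articles(articles, interesting_titles, sample_frac=0.02,
                       min_revisions=200, seed=42):
    """Return all hand-picked titles plus a random sample of articles
    with more than min_revisions revisions (to skip stubs)."""
    rng = random.Random(seed)
    eligible = [title for title, revs in articles if revs > min_revisions]
    k = min(int(len(eligible) * sample_frac), len(eligible))
    return sorted(set(rng.sample(eligible, k)) | set(interesting_titles))

# Example with made-up numbers:
articles = [("Moon", 5321), ("Some stub", 3), ("Physics", 4210)]
print(pick_demo_articles(articles, interesting_titles={"Moon"}))
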
d***@public.gmane.org
2007-12-20 00:11:53 UTC
Permalink
Post by Brian
I can provide a list of the top 40,000 articles rated by quality according
to the wikipedia editorial team. A random sample is unlikely to be
interesting, as greater than 70% of articles are stubs.
Well, we don't really want only the top articles; a broad range, to see how
the system behaves with different levels of quality, is important. But
we could certainly take the top 10,000 and put in another 10,000 at
random. At the end of the day, this is just a demo, and even 100
articles will do -- nobody on this list is going to read through the
whole 40K. If anyone

@Gregory: Could you post some of those cases to the list so that they
can be imported manually whenever Luca is available?
Brian
2007-12-20 00:18:36 UTC
Permalink
Well, here are the ids<TAB>titles for the top 2000 articles. I'll let you
deal with the random sample :)

http://pastebin.ca/824492
Daniel Arnold
2007-12-19 23:32:10 UTC
Permalink
Post by Luca de Alfaro
we have a demo at http://wiki-trust.cse.ucsc.edu/ that features the whole
English Wikipedia, as of its February 6, 2007 snapshot, colored according
to text trust.
I looked at the demo at http://wiki-trust.cse.ucsc.edu:80/index.php/Moon. Most
remarkable in this example is a whole section with a private original-research
theory on "Binary planet systems". So sadly (or luckily? ;-) the
latest version in your snapshot contains a bad edit; compare it also to the
relevant edit in Wikipedia:
http://en.wikipedia.org/w/index.php?title=Moon&diff=prev&oldid=107099473

So your algorithm highlighted the wrong content. The problematic part is a bad
summary of an older origin-of-the-Moon theory, which is described here in an
overly simplified way with some ad-hoc-isms and is thus made even more wrong by
the author of these lines (OT: I probably know the author of these lines from
de.wikipedia: he tried to post this and other private theories in several
astronomy articles).

Ok. How did Wikipedia do in that case? It took a little more than an
hour to revert this. So Wikipedia was able to resolve this problem with the
current tools rather quickly. :-)

This doesn't mean we don't need your stuff. Quite the contrary. I can see some
very promising and interesting (and maybe non-obvious) use cases:

1) The (German) Wikipedia DVD.
The basis of the Wikipedia DVD is a database dump. The first Wikipedia CD and
DVD contained an "as is" snapshot transformed to the Digibib reader format of
Directmedia Publishing GmbH
(http://en.wikipedia.org/wiki/Directmedia_Publishing). However, these
snapshots had the above problem with short-lived nonsense content that
happened to be in the snapshot. For the DVDs up to now, different trust
metrics were used in order to find the "latest acceptable article version"
out of a given snapshot. One metric was the "latest version of a trusted
user". The current DVD from November 2007 uses a "user karma system" in order
to find the latest acceptable version (see
http://en.wikipedia.org/wiki/Directmedia_Publishing if you can read German,
though the karma system isn't described there). So I think
that "offline Wikipedias" such as the Wikipedia DVD and Wikipedia read-only
mirrors would benefit a lot from your effort, as it would tell them which
recent version of a given article they should provide to their
readers.

2) A combination with the reviewed article versions.
Several people have pointed out that they fear reviewed article versions will
need a lot of checking, depending on whether the latest flagged or the current
version is shown by default. Furthermore, there are different opinions on
which of the two modes is best.
How about this third "middle ground" mode: if the karma of a given article
version (according to your algorithm) falls below a certain karma threshold,
the latest version above this threshold is shown by default to anonymous
readers, unless there is a newer version flagged as reviewed.
That way anonymous readers usually see the most recent article version, and we
can always overrule the algorithm, which is a good thing (TM), as you should
never blindly trust algorithms (you know, otherwise people will try to trick
the algorithm; see Google PageRank).
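
[A rough sketch of that selection rule, in Python; the revision fields,
threshold, and function are illustrative only and not part of any existing
MediaWiki extension.]

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Revision:
    rev_id: int
    trust: float      # article-version trust/karma from the coloring algorithm
    reviewed: bool    # manually flagged as reviewed ("sighted"/"reviewed")

def default_version(history: List[Revision], threshold: float) -> Optional[Revision]:
    """Version shown to anonymous readers: the newest revision whose trust is
    at or above the threshold, unless a newer revision has been manually
    flagged as reviewed, in which case that one wins. Falls back to the
    current revision if nothing qualifies."""
    newest_trusted = next((r for r in reversed(history) if r.trust >= threshold), None)
    newest_reviewed = next((r for r in reversed(history) if r.reviewed), None)
    if newest_reviewed and (newest_trusted is None
                            or newest_reviewed.rev_id > newest_trusted.rev_id):
        return newest_reviewed
    return newest_trusted or (history[-1] if history else None)

# Example: history ordered oldest to newest, threshold chosen arbitrarily.
history = [Revision(1, 0.9, True), Revision(2, 0.8, False), Revision(3, 0.2, False)]
print(default_version(history, threshold=0.5))   # -> revision 2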

The drop below a certain karma threshold could be highlighted via a simple,
automatically added "veto" flag, which can be undone by people who can set
quality flags.

That way we would have three flags (in my favourite system): "veto", "sighted"
and "reviewed". The veto flag makes little sense for manual use, because a
human can and should (!) do a revert instead, but it would be very useful for
automatic things (automatic reverts are evil).

Cheers,
Arnomane
Daniel Arnold
2007-12-19 23:42:38 UTC
Permalink
Short correction of myself...
Post by Daniel Arnold
So your algorithm highlighted the wrong content.
Of course I meant: "So your algorithm correctly highlighted dubious content."
Post by Daniel Arnold
The current DVD from November 2007 uses a "user karma system" in
order to find the latest acceptable version (see
http://en.wikipedia.org/wiki/Directmedia_Publishing if you can read German,
however the karma system doesn't get described there).
That was a copy & paste mistake with the same URL. The right URL is:
http://blog.zeno.org/?p=87

Arnomane
Luca de Alfaro
2007-12-19 23:54:18 UTC
Permalink
Dear Daniel,

I believe you are saying that:

1. The trust coloring rightly colored orange (low-trust) some
unreliable content,
2. and the Wikipedia people were quick in reverting it.

Right?

About 1, I am delighted our methods worked in this case. Note that we also
highlight as low trust text that is by anonymous contributors. The text
will then gain trust as it is revised. Also, we color the whole article
history, so if you want to see how things evolve, you can look at that.
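
[A toy illustration of that general idea only, not the actual WikiTrust
algorithm from the papers; all constants and names below are invented.]

MAX_REPUTATION = 10.0

def initial_trust(author_reputation: float) -> float:
    # Anonymous contributors have reputation ~0, so their text starts low.
    return author_reputation / MAX_REPUTATION          # in [0, 1]

def revised_trust(current_trust: float, revisor_reputation: float,
                  gain: float = 0.3) -> float:
    # Text that survives a revision moves part of the way toward the
    # (scaled) reputation of the revisor, and is never lowered just because
    # a low-reputation editor passed by.
    target = revisor_reputation / MAX_REPUTATION
    return max(current_trust, current_trust + gain * (target - current_trust))

# Example: anonymous text (trust 0.0) revised twice by reputable editors.
t = initial_trust(0.0)
for rep in (8.0, 9.0):
    t = revised_trust(t, rep)   # trust rises to ~0.24, then ~0.44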

About 2, I am very glad that bad edits are quickly reverted; this is the
whole reason Wikipedia has worked up to now.
Still, it might be easier for editors to find content to check via the
coloring, rather than by staring at diffs.
Other uses, as you point out, are:

- Burning the content on DVDs / Flash memory (wikisticks?)
- Making feeds of high-quality revisions for elementary schools, etc
- Generally giving readers (who unlike editors do not do diffs) that
warm fuzzy feeling that "the text has been around awhile" (can this help
answer those critics who mumble that the wikipedia is "unreliable"?)
- Finding when flagged revisions are out of date (there may be a new
high-trust version later)

BTW, as the method is language-independent, we look forward to doing the
same for wikipedias in other languages.

Luca
Daniel Arnold
2007-12-20 02:05:20 UTC
Permalink
Hello Luca,
Post by Luca de Alfaro
1. The trust coloring rightly colored orange (low-trust) some
unreliable content,
Yes I was lost in translation. ;-)
Post by Luca de Alfaro
2. and the Wikipedia people were quick in reverting it.
Yes.
Post by Luca de Alfaro
Note that we also highlight as low trust text that is by anonymous
contributors. The text will then gain trust as it is revised.
One possible weakness came to my mind after I also read your paper. Your
algorithm is perhaps a bit vulnerable to "sock puppets". Imagine person A
with one account and person B with two accounts. Both have a medium
reputation value for their accounts. User A edits an article with his account
4 times. All 4 subsequent edits are taken together and the article gets the
maximum trust value allowed by the user's reputation. User B also makes
4 edits to an article but switches between his accounts and thus "reviews"
his own edits. If I understand your algorithm correctly, the sock-puppeted
article is trusted more than the other one.

Quite some time ago I thought about how to avoid incentives for sock puppets in
karma systems without even knowing which accounts are sock puppets:
http://meta.wikimedia.org/wiki/Meritokratischer_Review (sadly in German ;-).
The system described there differs from your approach, but the idea of how to
avoid incentives for sock puppets without even knowing who a sock puppet is
could perhaps be adapted to your system.

The basic idea for a sock-puppet-proof metric is that a person has only a
limited amount of time for editing (I don't consider bots because they are
easily detectable by humans). A single person needs the same time for e.g. 4
edits (in the following I assume each edit has the same length in bytes)
regardless of how many accounts are used, but two different people with 2
edits each only need half of the (imaginary) time (you don't need to measure
any time units at all).

So the maximum possible reliability person B can give to the article with his
two accounts (let us say each account has 2 edits = 4 total edits) has to be
the same as the one that is possible with person A's single account (4
edits). So in general, two accounts with X edits each should never be able to
add more trust to an article than one person with 2*X edits (note: the edit
count is only for illustration; you can take another appropriate
contribution unit).
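
[One hypothetical way to realize this invariant, sketched in Python: make the
article's trust gain a function of the pooled reviewing work it has received
(here measured in bytes), rather than a sum of per-account bonuses, so
splitting the same work across sock accounts buys nothing. The formula and
constants are invented for illustration.]

import math

def trust_from_work(total_bytes: int, reputation_cap: float) -> float:
    """Trust an article gains from the reviewing work done on it, capped by
    the reviewers' (here: assumed medium) reputation. The gain depends on
    the pooled work, not on how many accounts performed it."""
    return reputation_cap * (1.0 - math.exp(-total_bytes / 2000.0))

# Person A: one account, four 500-byte reviewing edits.
a = trust_from_work(4 * 500, reputation_cap=5.0)
# Person B: two sock accounts with two 500-byte edits each -- same pooled
# work, so exactly the same trust gain; splitting buys nothing.
b = trust_from_work(2 * 500 + 2 * 500, reputation_cap=5.0)
assert a == b
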
Post by Luca de Alfaro
About 2, I am very glad that bad edits are quickly reverted; this is the
whole reason Wikipedia has worked up to now.
Still, it might be easier for editors to find content to check via the
coloring, rather than by staring at diffs.
That's certainly true for articles not on your watchlist (or bad edits that
were forgotten and are still the latest version).
Post by Luca de Alfaro
- Finding when flagged revisions are out of date (there may be a new
high-trust version later)
Well, as I said, I'd love to see flagged revisions and your system combined (in
the way described in my previous mail). An automated system probably always has
some weaknesses that some clever people can abuse, but it is very fast, while a
hand-crafted system depends on the speed of individual persons but is much
harder to fool.
Post by Luca de Alfaro
BTW, as the method is language-independent, we look forward to doing the
same for wikipedias in other languages.
Good to know. :-)

Arnomane
Brian
2007-12-20 03:07:18 UTC
Permalink
Sockpuppets? Surely this can't be more than .00000000000001% of the user
base?
d***@public.gmane.org
2007-12-20 03:13:52 UTC
Permalink
Post by Brian
Sockpuppets? Surely this can't be more than .00000000000001% of the user
base?
Are you suggesting we have 0.000000006 sockpuppets on the English Wikipedia? ;)

That's a bit optimistic. I'd aim for more like 0.001%, which gives us
e.g. just under 100 to deal with on enwiki. The problem with socks is
that their impact on quality is potentially many times that of a
typical user, and hence they deserve our attention.
Luca de Alfaro
2007-12-20 03:44:14 UTC
Permalink
Daniel is making some very good points.

Our current algorithm is vulnerable to two kinds of attacks:

- Sock puppets
- People who split an edit into many smaller ones, done with sock
puppets or not, in order to raise the trust of text.

We think we know how to fix or at least mitigate both problems. This is why
I say that a "real-time" system that colors revisions as they are made is a
couple of months (I hope) away. The challenge is not so much to reorganize
the code to work on real-time edits rather than Wikipedia dumps. The challenge
for us is to analyze, implement, and quantify the performance of versions of
the algorithms that are resistant to attack. For those of you who have checked
our papers, you will have seen that we not only propose algorithms, but we
do extensive performance studies of how good the algorithms are. We will
want to do the same for the algorithms for fighting sock puppets.

About the proposal by Daniel: time alone does not cover our full set of
concerns.
Every day I could use identity A to erase some good text, and identity B to
put it back in. Then, the reputation of B would grow a bit every day, even
though B did not put in much effort.
We are thinking of some other solutions... but please forgive us for keeping
this to ourselves a little bit longer... we would like to have a chance to
do a full study before shooting our mouths off...

Luca
Aaron Schulz
2007-12-20 09:03:34 UTC
Permalink
Having looked at several pages there, socks don't seem to be much of a problem. Either it would take several very old "high-trusted" socks to edit, or many, many new socks, to get an article to slight-orange/white levels. It takes a good 5-7 edits by mainly average users to get out of deep orange into near white.

What does bother me just a little is that all kinds of minor grammar/spelling-fixing, tagging, and categorizing edits seem to get made to articles. Actually, I noticed this a while ago when I wrote a JS regexp-based history stats tool. Some random IP/new user adds a chunk of text or starts an article, then some users and bots make 5-10 really minor edits (style/category/tagging stuff), and then occasionally some actual content edits are made. The Article Trust code keeps bumping up the trust for pages. Looking at sample pages there, it doesn't appear to be enough to get vandalism to near-white, which is good. My only worry is that the trollish user who adds POV vandalism and subtle vandalism (switching dates) will have that text's trust get too high because a bunch of users make maintenance edits afterwards. Much of our main user base just does maintenance, so they tend to have high "reputation" trust and make many such edits.

I think what would help, beyond excluding bots (which is really a no-brainer in my opinion), is to add some heuristics that devalue the trust increase when someone just makes very small edits to the upper or lower extremities of a page. This would catch a lot of tag/category/interwiki-link maintenance and stop it from bumping the trust too much in the page's middle.
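
[A rough sketch of that heuristic in Python; the parameter values and the diff
representation are made up.]

def maintenance_discount(page_len: int, changed_spans, small_edit_bytes: int = 60,
                         edge_frac: float = 0.10) -> float:
    """Return a multiplier in (0, 1] for the trust increase of this edit.
    changed_spans is a list of (start, end) character offsets the edit touched."""
    changed_bytes = sum(end - start for start, end in changed_spans)
    top, bottom = page_len * edge_frac, page_len * (1 - edge_frac)
    only_edges = all(end <= top or start >= bottom for start, end in changed_spans)
    if changed_bytes <= small_edit_bytes and only_edges:
        return 0.1   # category/interwiki/tag maintenance: barely bump trust
    return 1.0       # normal content edit: full trust increase

# Example: a 40-byte edit at the very bottom of a 20,000-character article.
print(maintenance_discount(20000, [(19950, 19990)]))   # -> 0.1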

Still, it is pretty good at picking up on the garbage, so it looks promising. I'd be interested in knowing what would happen if the newest revision with all 7+ trust were marked as a stable version for each page. Would that be good enough to be pretty much vandal-free?

-Aaron Schulz

John Erling Blad
2007-12-21 13:57:24 UTC
Permalink
A very efficient way to stop the trust bumping is to only bump the trust
metrics when actual content is added. When I experimented with similar
code I found that Shannon entropy could be used as a measure, but then
the entropy of weird-looking interwiki links would bump the metrics. To adjust
for that I had to introduce some logic, and then I got some asymmetry
and was once more susceptible to sock puppetry. This should be avoidable.

One possible solution is to only use Shannon entropy for words that
exist in a vocabulary, and give every other word a flat rating.
This seems to work well.
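
[A small sketch of that idea in Python; the toy vocabulary, frequencies, and
flat out-of-vocabulary score are illustrative, not John's actual code.]

import math, re

def content_score(added_text: str, word_freq: dict, total_count: int,
                  flat_oov_score: float = 2.0) -> float:
    """Information content of the added words: -log2 p(word) for words in the
    vocabulary, a flat score for everything else (so markup or weird-looking
    interwiki tokens cannot inflate the measure)."""
    score = 0.0
    for word in re.findall(r"[^\W\d_]+", added_text.lower()):
        if word in word_freq:
            p = word_freq[word] / total_count
            score += -math.log2(p)      # rarer in-vocabulary words carry more information
        else:
            score += flat_oov_score     # flat rating for out-of-vocabulary tokens
    return score

# Example with a toy vocabulary:
freq = {"the": 700, "moon": 30, "orbits": 5, "earth": 40}
print(content_score("The Moon orbits the Earth [[de:Mond]]", freq, total_count=1000))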

John E
Daniel Arnold
2007-12-20 12:45:21 UTC
Permalink
Post by Luca de Alfaro
For those of you who have checked our papers, you would have seen that not
only we propose algorithms, but we do extensive performance studies on how
good the algorithms are. We will want to do the same for the algorithms for
fighting sock puppets.
I liked it very much that you spent a lot of thought on the robustness of
your algorithm (I think this is one of its key advantages over so many other
naive karma systems out there) and that at this stage it is already
resistant to many attacks. So I am confident that the sock puppet and
minor-edit attacks (Aaron's maintenance-edit analysis is part of that) can be
solved by you as well. :-)
Post by Luca de Alfaro
About the proposal by Daniel: time alone does not cover our full set of
concerns.
I can every day use identity A to erase some good text, and identity B to
put it back in. Then, the reputation of B would grow a bit every day, even
though B did not do much effort.
That's true. The reason is that it takes less effort (= personal work time) to
remove X bytes than to add them. Perhaps a weighting factor on different
kinds of edits can avoid this.

Arnomane
John Erling Blad
2007-12-21 13:44:36 UTC
Permalink
This kind of problem will arise in all systems where an asymmetry is
introduced. One party can then fight the other party and win out, due to
the ordering of the fight. The problem is rather difficult to solve, as
the system must be symmetrical not only between two consecutive edits
but also for edits split over several postings, merged, or intermixed with
other edits, etc.

John E
David Gerard
2007-12-20 13:06:04 UTC
Permalink
Post by Luca de Alfaro
we have a demo at http://wiki-trust.cse.ucsc.edu/ that features the whole
English Wikipedia, as of its February 6, 2007 snapshot, colored according to
text trust.
This is the first time that even we can look at how the "trust coloring"
looks on the whole of the Wikipedia!
We would be very interested in feedback (the wikiquality-l mailing list is
the best place).
If you find bugs, you can email us at
http://groups.google.com/group/wiki-trust
Is this something suitable to announce to wikien-l as well?

(Is it something that could survive a Slashdotting?)


- d.
d***@public.gmane.org
2007-12-20 13:10:41 UTC
Permalink
Post by David Gerard
Is this something suitable to announce to wikien-l as well?
(Is it something that could survive a Slashdotting?)
Not sure of Luca's timezone, but I would certainly stay away from
announcing to wikien-l at the moment -- doubt it can handle a
slashdotting, and something like this would be dugg and reddited as
well. Before we announce it to wikien-l we can probably arrange for a
dedicated server and some extra caching measures.
Luca de Alfaro
2007-12-20 15:47:56 UTC
Permalink
Thanks for your concerns!... we were slashdotted in August, and our servers
went down then (it was a 1-CPU machine).
Now the server uses memcached and squid, and it is an 8-CPU machine.
Also, the campus is making us go down via power cuts (they are replacing
some feeds somewhere), so in a sense, why not also go down due to a
Slashdotting, which is more honor and more fun? :-)

So perhaps I should announce on wikien-l ?

Luca
d***@public.gmane.org
2007-12-20 15:54:35 UTC
Permalink
Post by Luca de Alfaro
Thanks for your concerns!... we were slashdotted in August, and our servers
went down then (it was a 1 CPU machine).
Now the server uses memcached and squid, and it is an 8 CPU machine.
Also, the campus is making us go down via power cuts (they are replacing
some feeds somewhere), so in a sense, why also not go down due to
Slashdotting, which is more honor and more fun? :-)
So perhaps I should announce on wikien-l ?
Luca
If you've got 8 CPUs and you're running memcached, nothing we procure
(besides an entire server farm) could be more fun to observe crashing.
Go for it, and keep some traffic logs :)

For added fun, contact some of the top Digg members directly and ask
them to go all out in getting it popular on Digg.com. Instant masses
of traffic :)

---
Akash
David Gerard
2007-12-20 16:09:56 UTC
Permalink
Post by d***@public.gmane.org
Post by Luca de Alfaro
some feeds somewhere), so in a sense, why also not go down due to
Slashdotting, which is more honor and more fun? :-)
So perhaps I should announce on wikien-l ?
If you've got 8 CPUs and you're running memcached, nothing we procure
(besides an entire server farm) could be more fun to observe crashing.
Go for it, and keep some traffic logs :)
For added fun, contact some of the top Digg members directly and ask
them to go all out in getting it popular on Digg.com. Instant masses
of traffic :)
It's in the Slashdot firehose:

http://slashdot.org/firehose.pl?op=view&id=433426

So shall we vote it up or down? ;-)


- d.
d***@public.gmane.org
2007-12-20 16:20:08 UTC
Permalink
It's in the Slashdot firehose <snip>
So shall we vote it up or down? ;-)
Is that a rhetorical question? :)

---
Akash
Daniel Arnold
2007-12-20 16:40:41 UTC
Permalink
Post by d***@public.gmane.org
It's in the Slashdot firehose <snip>
So shall we vote it up or down? ;-)
Is that a rhetorical question? :)
I personally don't like press coverage of things that are not ready yet and
months away from any practical impact on daily Wikipedia life. We have had
*too* many news articles about "soon-to-come" stable versions/Single
Login/whatever...

Arnomane
d***@public.gmane.org
2007-12-20 16:46:25 UTC
Permalink
Post by Daniel Arnold
I personally don't like press coverage on things that are not ready yet and
months away from any practical impact to daily Wikipedia life. We have had
*too* many news articles on "soon-to-come" stable versions/Single
Login/whatever...
If the press wish to misrepresent and misreport on the activities of
the WMF projects, there is little we can do about it. This, however,
is simply an attempt to get the word out to the community -- not the
press -- that these systems are on their way.

In the past we have not had effective demonstrations of our stable
versioning / SSO plans; this demonstration does not fall short in this
manner. At any rate, news of the trust system should go some way to
restoring general public faith in Wikipedia after the thoroughly
exaggerated controversy over secret mailing lists, conspiracies and
Durova.
Daniel Arnold
2007-12-20 17:29:11 UTC
Permalink
This, however, is simply an attempt to get the word out to the community --
not the press -- that these systems are on their way.
There are mechanisms inside the community for informing each other (like the
aforementioned wikien-l). I don't want to reach the community via external news,
but I admit it is very difficult to strike the right balance if you
want a project to keep up its development pace.
In the past we have not had effective demonstrations of our stable
versioning / SSO plans; this demonstration does not fall short in this
manner.
By the way, have a look at http://test.wikipedia.org/wiki/Meow. Stable versions
have at least reached the official test wiki (found via
http://de.wikipedia.org/wiki/Wikipedia:Projektneuheiten#20._Dezember).
At any rate, news of the trust system should go some way to
restoring general public faith in Wikipedia after the thoroughly
exaggerated controversy over secret mailing lists, conspiracies and
Durova.
These specific topics are mainly a problem of en.wikipedia, not *.wikipedia.
de.wikipedia has other troubles, and one of them is that German-language
news agencies often say "they promised $technical-novelties
a long time ago, but have failed to keep that promise up to now". And this is at
least partly our fault, because we quite often told curious reporters what we
are dreaming about and not so much about the novelties that have actually happened
(for example, the Gadget extension from Duesentrieb has had quite some impact on
editors, if not on readers, but it is probably too techie to be of interest to
news people ;-).

Arnomane
David Gerard
2007-12-20 17:34:06 UTC
Permalink
Post by Daniel Arnold
These specific topics are mainly a problem of en.wikipedia, not *.wikipedia.
de.wikipedia has other troubles and one of it is the problem that often
german language news agencies are saying "they promised $technical-novelties
long time ago, but failed to keep that promise up to now". And this is at
least partly our fault cause we quite often said curious reporters what we
are dreaming about and not so much about novelties that currently happened
(for example the Gadget extension from Duesentrieb has quite some impact on
editors, not on readers, but it is probably too techie to be of interest for
news people ;-).
With the delays, a lot of the problem is the Foundation's two
technical employees, Brion and Tim, being waaaay busy with the fundraiser
and the move (St Petersburg to San Francisco). So when people ask, I've
been saying "two technical employees, fundraiser duties, sorry for the
delay, give us money and it'll be faster ;-p"


- d.
Aaron Schulz
2007-12-21 02:15:48 UTC
Permalink
I've got the software running on testwikipedia now. It also got some rapid input from Tim, as well as some fixes.

-Aaron Schulz
John Erling Blad
2007-12-21 13:35:24 UTC
Permalink
I did some checks on a few articles where highly reputable authors have
reviewed the content, and it seems the system isn't able to properly value
the very rare edits from such editors as it should. In fact it has
marked the _correct_ versions as dubious, while later versions with
slight corrections from admins who are completely unfamiliar with the
subject are marked as high quality.

Not very trustworthy (oh shit, is that me??)
http://wiki-trust.cse.ucsc.edu/index.php?title=Stave_church&oldid=25557785
Lots of copyedits
http://wiki-trust.cse.ucsc.edu/index.php?title=Stave_church&diff=36649796&oldid=25953416
A trustworthy version (!)
http://wiki-trust.cse.ucsc.edu/index.php/Stave_church

This is a central problem with this kind of trust metric: people are
rated according to what they do at some point in time, without taking
into account whom they relate to and what their previous history is in
other contexts. In the history of the Stave church article there is an
archaeologist; is anyone able to locate that person? I think it should be
possible to identify a person as an expert within a limited field of
expertise, but it isn't easy to figure out how this should be done.

There is also the problem of a person's history. If someone edits and
thereby increases their own rating, how should that be handled? I think this
should have implications for previous edits.

John E
d***@public.gmane.org
2007-12-21 13:49:30 UTC
Permalink
Post by John Erling Blad
This is a very central problem to this kind of trust metrics, people are
rated according to what they do at some point in time, without taking
into account who they relates to and what is they previous history in
other contexts. In the history of the Stave Church article, there is an
archaeologist anyone able to locate the person? I think it should be
possible to identify a person as an expert within a limited field of
expertise, but it isn't easy to figure out how this should be done.
This is what we started the Citizendium project for. Wikipedia maintains a
degree of anonymity that forms the basis for one of the many cultures of the
WMF projects. If we start verifying qualifications, we throw this all out.

The middle ground is a peer-based "reputation" system, as found on many
forums, where for various actions peers can allocate reputation to a user.
This type of ad-hoc, informal peer-review is the only way to achieve
verification of authority without taking the Citizendium approach.

---
Akash
John Erling Blad
2007-12-21 14:20:27 UTC
Permalink
I am familiar with the Citizendium project, and if you think I am arguing
for something similar, then you are way off. The interesting
thing is that the history of that article contains people with
absolutely no clue what they are writing about as well as people
who know what they write about. Inspecting the trust coloring, it is
apparent that the people without any clue at all have much higher trust
metrics than the rest, bumping the trust upwards. This does not
seem to be what we want. We want a system that is much better at
distinguishing the actual experts from those just fussing around fixing
spelling errors. An article with perfect spelling can be completely
wrong on the main topic, and if the system does not address that problem, it
isn't a fix at all.

John E
Jonathan Leybovich
2007-12-21 18:01:30 UTC
Permalink
One thing that stood out for me in the small sample of articles I
examined was the flagging of innocuous changes by casual users to
correct spelling, grammar, etc. Thus a "nice-to-have" would be a
"smoothing" algorithm that ignores inconsequential changes such as
spelling corrections or the reordering of semantically contained
units of text (for example, reordering the line items in a list
without changing the content of any particular item, or reordering
paragraphs and perhaps even sentences). I think this
would cover 90% or more of the changes that are immaterial to an article's
credibility.
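
A minimal sketch, in Python, of the kind of smoothing filter described
above; the threshold and function names are my own illustrative
assumptions, not part of the demo's algorithm:

import difflib

def is_inconsequential(old: str, new: str, max_change: float = 0.05) -> bool:
    """True if an edit looks like a pure reordering or a tiny tweak (e.g. a typo fix)."""
    old_lines, new_lines = old.splitlines(), new.splitlines()
    # Reordering semantically contained units (list items, paragraphs)
    # leaves the multiset of lines unchanged.
    if sorted(old_lines) == sorted(new_lines):
        return True
    # Otherwise require the two texts to be nearly identical overall.
    similarity = difflib.SequenceMatcher(None, old, new).ratio()
    return (1.0 - similarity) <= max_change

Whether such a filter is desirable at all is exactly the trade-off Luca
raises in the next message.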
Luca de Alfaro
2007-12-21 18:34:47 UTC
Permalink
If you want to pick out the malicious changes, you also need to flag small
changes.

"Sen. Hillary Clinton did *not* vote in favor of war in Iraq"

"John Doe, born in *1947*"

The ** indicates changes.

I can very well make a system that is insensitive to small changes, but then
the system would also be insensitive to many kinds of malicious tampering,
and one of my goals was to make it hard for anyone to change anything without
leaving at least a minimal trace.

So it's a matter of goals, really.

Luca
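
To make the point concrete, here is a rough Python illustration (not the
demo's actual diff code) of how even a one-word change surfaces in a
word-level diff:

import difflib

def changed_words(old: str, new: str) -> list:
    """Return the word runs that were inserted or replaced in the new text."""
    old_words, new_words = old.split(), new.split()
    sm = difflib.SequenceMatcher(None, old_words, new_words)
    return [" ".join(new_words[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes()
            if tag in ("insert", "replace")]

print(changed_words("Sen. Clinton voted in favor of war in Iraq",
                    "Sen. Clinton did not vote in favor of war in Iraq"))
# prints ['did not vote']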
Aaron Schulz
2007-12-21 19:12:22 UTC
Permalink
Right. Also, we need to be clear about what we want this to do. It will never be great at determining fact-checked material. What it is good at is spotting the more dubious stuff, like possible vandalism. This makes possible the "most trusted" stable version discussed earlier. Small changes not only can be big in meaning; they also still attest to the trust.

If I read a sentence to change some minor thing, I still read it. If a page wrongly says "he identifies himself as bisexual" or "born in 1885" rather than 1985 when I edit it, I am going to revert that if I catch it, even if I am only there to make some grammar/syntax cleanup. So each time people look at the text, they still attest to the page a little bit, from a vandalism perspective.

The algorithms can be made stricter to catch general dubious info better, but they are not that bad at it already, and the stricter they get, the more under-inclusive they become as to what is considered unlikely to be vandalized.

-Aaron Schulz

John Erling Blad
2007-12-21 19:41:06 UTC
Permalink
It is worth noting that such systems make it possible to deduce earlier,
on average, whether someone is a vandal or not, but they can't replace a
good reader who responds to an error. This creates the rather annoying
situation where a response from a casual reader should be weighted more
than one from a non-beginner, but that makes the system susceptible to
users who want to skew its metrics against specific users.

John E
Luca de Alfaro
2007-12-21 19:46:13 UTC
Permalink
I am not sure I am replying to the correct point, but the system weighs an
author's feedback as a function of that author's reputation.
Reputation is "linear" in the sense that new feedback is simply added to the
reputation.
A user of reputation r gives weight log(1 + r) to his or her feedback.
We use this logarithmic scaling to prevent long-time editors from forming a
clique that is essentially impervious to feedback from the rest of the
community (will this kind of comment get me skinned? :-)

Luca
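
For readers following along, a minimal Python sketch of the scheme Luca
describes; the function and variable names are illustrative assumptions,
not the actual WikiTrust implementation:

import math

reputation = {}  # user -> accumulated reputation, >= 0

def feedback_weight(judge: str) -> float:
    """A user of reputation r gives weight log(1 + r) to his or her feedback."""
    return math.log(1.0 + reputation.get(judge, 0.0))

def apply_feedback(author: str, judge: str, quality: float) -> None:
    """quality in [-1, +1]; reputation is 'linear': weighted feedback is simply added."""
    reputation[author] = max(0.0, reputation.get(author, 0.0)
                             + quality * feedback_weight(judge))

The logarithm keeps very high reputations from translating into
proportionally overwhelming weight, which is the anti-clique property
Luca mentions.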
John Erling Blad
2007-12-21 20:49:30 UTC
Permalink
I think that kind of weighting is correct as long as it is symmetrical
(i.e. do-undo pairs are weighted approximately the same). Clique-building
becomes interesting once you are able to analyze who interacts with whom.
Then you can punish people when their friends make bad edits. It's not as
bad as it sounds; the reasoning is that someone often writes an article
under the guidance of someone else. In such situations you don't want to
punish only one user, but both.
John E
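
A small Python sketch of the do-undo symmetry John asks for, under the
assumption that each edit's credit is recorded so that a revert can take
back exactly that amount (the bookkeeping names are hypothetical):

credit = {}       # user -> total credit
edit_credit = {}  # edit id -> (author, credit granted)

def record_edit(edit_id: str, author: str, granted: float) -> None:
    credit[author] = credit.get(author, 0.0) + granted
    edit_credit[edit_id] = (author, granted)

def record_revert(edit_id: str) -> None:
    # Symmetric: undoing an edit removes the same magnitude it granted,
    # so a do-undo pair nets to roughly zero.
    author, granted = edit_credit[edit_id]
    credit[author] = credit.get(author, 0.0) - granted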
Aaron Schulz
2007-12-21 22:14:18 UTC
Permalink
How do you tell whether users interact? Cross-posts on user talk pages? If you go by edits, or by edits to talk pages, what is the difference, from the program's standpoint, between people in a clique and people who disagree all the time?

For identifying non-vandalized versions, cliques don't seem to pose much of a threat. The only way for that to be a problem is for people to edit-stack, similar to the issue of users not using preview and making a ton of edits in a row. This could game the system sometimes, but it is also very obvious behavior when the same 2-3 new people (otherwise they would probably have been blocked by now) pile on 8 edits after each other. Not only will it stand out, but all it would do is influence the "most trusted"/"likely unvandalized" stable version or the trust for the current revision in their favor...until someone just reverts it anyway. It's too much work for too little, plus they would likely get caught in the process.

The system will not be bulletproof. Admins have gone rogue before. Even for FlaggedRevs, "editor"/"surveyor" status may be abused by a very small number of people. By default it checks for an email address, a userpage, 150 well-spread-out edits, and account age. Still, some people could get through and then troll. The thing is, though, that it is just not worth it, as they would have the rights removed and possibly be blocked immediately. And for what? Sighting some vandalism, which would just get reverted and fixed in a few minutes. Doing it again would require another IP and another account, back at square one. That's just not worth it for a troll. We get vandals from people testing and joking around, as well as people that just whip new accounts out of their ass every minute because it is so easy. FlaggedRevs makes it not worth it. Likewise, trying to game Article Trust doesn't seem very "worth it" either.

Think of how long login went without captchas on such a large site. That's because nobody cared to sit there guessing passwords or writing bots back then. If things are too hard to game, people won't care. The "rewards" of gaming this system are even lower than those of hacking an account. It's a bit better with FlaggedRevs because you commit to having done reviews, so reviewing vandalism counts straight against you. But still, tweaking junk edits 15 times, or having the same two editors cluster 8 tiny edits after each other, accomplishes little, is noticeable, and IMO is not really worth it. Also, we don't have to use AT or FR exclusively, and some combination of both (quality > sighted > autotrust > current) could avoid the multiple-edit gaming issue altogether.

-Aaron Schulz
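
A sketch of the fallback chain Aaron mentions (quality > sighted >
autotrust > current), written against a made-up revision structure rather
than FlaggedRevs' real data model:

PRIORITY = ("quality", "sighted", "autotrust")

def version_to_show(revisions):
    """revisions: newest-first list of dicts like {'id': 123, 'flags': {'sighted'}}."""
    for flag in PRIORITY:
        for rev in revisions:
            if flag in rev["flags"]:
                return rev
    return revisions[0]  # nothing flagged yet: fall back to the current revision

# version_to_show([{'id': 3, 'flags': set()}, {'id': 2, 'flags': {'sighted'}}])
# would return revision 2, the newest sighted one.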
John Erling Blad
2007-12-21 23:42:27 UTC
Permalink
Note that I don't say clique detection should be implemented; I say it
is a very interesting option to get running if possible. There are
basically two possible solutions, or rather I'm not aware of any others
that could be implemented. One is to use identified cooperation
between two or more persons as a requirement for editing an otherwise
locked article; the other is to identify cooperation by checking how two
or more users interact. If two persons start to interact in a positive
way on any one article, it is highly unlikely they will go to war on
another article. War between users can be readily detected by simple
statistical means (i.e. automatic classification of statements).

Now the second problem. In trust-accumulating systems you don't have to
pile up edits in a single article to game the system; you use two bots.
One bad-bot introduces some identifiable error in many articles, and
then another good-bot fixes those errors. The good-bot will quickly gain
high reputation, more so if the bad-bot changes IP address often enough
to go undetected. This has to be solved somehow, and to say that it
involves too much work to mount such an attack is plainly the wrong
answer to the problem. I think it is solvable, and at least one lead
is the fact that all such attacks will produce highly unusual usage
metrics. It is then mostly a matter of detecting the unusual metrics and
sounding an alarm bell. If it is possible to incorporate "physical means"
that make it impossible to game the system, even better. For
example, a radio communication system can be jammed, but using frequency-
hopping radios makes them resistant to jamming. If they are hopping
fast enough they stay ahead of the jammer; if they are hopping faster
than the radio waves can reach the jammer, then smart jammers become
physically impossible.

Don't create a system that can be easily fooled by more or less smart
bots. Make it physically impossible to fool the system with a bot.

1) decrease credits when the time between edits goes down
2) never give more credits for reverting an edit than were given for
making the original edit
3) weight edits according to the amount of contribution, yet different
styles of contribution should give the same net result in credits
4) decrease credits when cooperation is detected and the reason behind
the cooperation can't be determined
5) increase credits when there is positive cooperation

A slight modification of 2 is to relax it once sufficient time has passed
(a rough sketch of rules 1-3 follows below).

John E Blad
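
A rough Python sketch of rules 1-3 above; the constants and the
five-minute damping window are illustrative assumptions, not a real
implementation:

import math
from typing import Optional

def edit_credit(chars_contributed: int,
                seconds_since_last_edit: float,
                reverted_edit_credit: Optional[float] = None) -> float:
    size_factor = math.log(1 + max(chars_contributed, 0))    # rule 3: scale with contribution
    rate_factor = min(1.0, seconds_since_last_edit / 300.0)  # rule 1: damp rapid-fire edits
    credit = size_factor * rate_factor
    if reverted_edit_credit is not None:                      # rule 2: a revert never earns
        credit = min(credit, reverted_edit_credit)            # more than the edit it undoes
    return credit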
Aaron Schulz
2007-12-22 00:55:04 UTC
Permalink
That would still be quite a feat. The bot would have to IP-hop over several A-classes. It would have to make random vandalism that actually looks like random vandalism (rather than the same set of garbage or random characters/blanking). The IP would have to change enough for it not to look like one bot on RC. The good bot would have to pause before reverting, to let others beat it to the punch. Also, if all it did was revert the bad bot, then as I said, the bad bot's IP would really have to be all over the place, unless the bot made random accounts too. It would still take a while to build up trust this way...and even at the maximum, it would still take several edits to get bad content shown as white. This would have to be good enough to set the "most trusted" version. And even if this is pulled off on some pages, it will get reverted and the user blocked, and they would have to start over with another "good bot".

It certainly is always better to be less easily spoofed, and if there are good practical measures that can stop this without making the software too under-inclusive, I'm all for it. I said earlier that users in the formal bot group should not count as adding credibility. In the same vein, we could do the opposite and have groups that increase the max trust (credits) a user can have. That way the default max trust can be lowered to deal further with things like good/bad bot networks. This would work well if integrated with FlaggedRevs, as the 'editor' group could have a higher max trust limit.

-Aaron Schulz
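
A minimal sketch of the per-group cap Aaron suggests; the group names and
numbers are assumptions for illustration, not FlaggedRevs configuration:

TRUST_CAP = {"default": 5.0, "editor": 20.0, "surveyor": 30.0}
NO_CREDIT_GROUPS = {"bot"}  # edits by formal bots should not add credibility

def capped_trust(raw_trust: float, groups: set) -> float:
    if groups & NO_CREDIT_GROUPS:
        return 0.0
    # Everyone gets at least the default cap; trusted groups raise it.
    cap = max([TRUST_CAP["default"]] + [TRUST_CAP[g] for g in groups if g in TRUST_CAP])
    return min(raw_trust, cap)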
Aaron Schulz
2007-12-22 01:12:17 UTC
Permalink
Actually, account creation has a captcha, so it wouldn't be using accounts likely.

-Aaron Schulz

From: jschulz_4587-***@public.gmane.org
To: wikiquality-l-RusutVdil2icGmH+5r0DM0B+***@public.gmane.org
Date: Fri, 21 Dec 2007 19:55:04 -0500
Subject: Re: [Wikiquality-l] Wikipedia colored according to trust








That would still be quite a feat. The bot would have to IP hop over several A-classes. Make some random vandalism that looks like random vandalism (rather than the same set of garbage or random characters/blanking). The IP would have to change enough for it to not look like one bot on RC. The good bot would have to give a time pause before reverting to let other beat it to the punch. Also, if all it did was revert bad bot, then as I said, the bad bot's IP would really have to be all over the place, unless the bot made random accounts too. It would still take a while to build up trust this way...and even with the max, it will still take several edits to get bad content white. This would have to be good enough to set the "most trusted" version. And even if this is done on some pages, it will get reverted and the user blocked, and they have to use another "good bot".

It certainly is always better to be less easily spoofed, and if there are good practical things that can stop this without making the software too underinclusive, I'm all for it. I said earlier that users in the formal bot group should not count as adding credibility. In this vein, we could do the opposite and have groups that increase the max trust (credits) a user can have. That way the default max trust can be lowered to deal further with stuff like good/bad bot networks. This would work if integrated with FlaggedRevs, as the 'editor' group could have a higher max trust limit.

-Aaron Schulz



_________________________________________________________________
Get the power of Windows + Web with the new Windows Live.
http://www.windowslive.com?ocid=TXT_TAGHM_Wave2_powerofwindows_122007
John Erling Blad
2007-12-27 22:06:13 UTC
Permalink
There are public algorithms with methods to defeat captchas. I haven't
looked into the version used by Wikipedia, but compared to some other
versions that are cracked it seems likely it isn't safe. (Actually the
basic technique used by the version in Wikipedia has some serious design
flaws that makes it possible to crack it by brute force, not that I find
it very likely that anyone will try this.)

John E
Post by Aaron Schulz
Actually, account creation has a captcha, so it wouldn't be using accounts likely.
-Aaron Schulz
------------------------------------------------------------------------
Date: Fri, 21 Dec 2007 19:55:04 -0500
Subject: Re: [Wikiquality-l] Wikipedia colored according to trust
That would still be quite a feat. The bot would have to IP hop
over several A-classes. Make some random vandalism that looks like
random vandalism (rather than the same set of garbage or random
characters/blanking). The IP would have to change enough for it to
not look like one bot on RC. The good bot would have to give a
time pause before reverting to let other beat it to the punch.
Also, if all it did was revert bad bot, then as I said, the bad
bot's IP would really have to be all over the place, unless the
bot made random accounts too. It would still take a while to build
up trust this way...and even with the max, it will still take
several edits to get bad content white. This would have to be good
enough to set the "most trusted" version. And even if this is done
on some pages, it will get reverted and the user blocked, and they
have to use another "good bot".
It certainly is always better to be less easily spoofed, and if
there are good practical things that can stop this without making
the software too underinclusive, I'm all for it. I said earlier
that users in the formal bot group should not count as adding
credibility. In this vein, we could do the opposite and have
groups that increase the max trust (credits) a user can have. That
way the default max trust can be lowered to deal further with
stuff like good/bad bot networks. This would work if integrated
with FlaggedRevs, as the 'editor' group could have a higher max
trust limit.
-Aaron Schulz
------------------------------------------------------------------------
Get the power of Windows + Web with the new Windows Live. Get it now!
<http://www.windowslive.com?ocid=TXT_TAGHM_Wave2_powerofwindows_122007>
------------------------------------------------------------------------
_______________________________________________
Wikiquality-l mailing list
http://lists.wikimedia.org/mailman/listinfo/wikiquality-l
P. Birken
2008-01-30 21:27:46 UTC
Permalink
Sorry for bringing this age old thread up again, but reading through
it again, I often thought: things would be easier, if Metadata were
not in the articles themselves, but separate. Then, the version
history would be trimmed dramatically and nobody could build up trust
(meaning both the real and the algorithmic sense) by category-edits
only. This would make life dramatically easier for the bots and easier
for users.

Is getting the metadata (category, interwikis, Personendaten on de)
out of the articles remotely feasible? Is the positive impact as good
as I think?

Best

Philipp
Luca de Alfaro
2008-01-30 21:34:20 UTC
Permalink
I am sure that, if it were of interest, one coudl remove interwiki links and
categories while analizing the pages... or am I missing something?
I however would hesitate to remove the information from the revision
history: how to revert damage to the meta-information?
(Yes, better ways to compress the revision history are very much needed, for
the working version, but this is another issue)...
I hope my comments are on topic; I am not sure I understand the point.

Luca
Post by P. Birken
Sorry for bringing this age old thread up again, but reading through
it again, I often thought: things would be easier, if Metadata were
not in the articles themselves, but separate. Then, the version
history would be trimmed dramatically and nobody could build up trust
(meaning both the real and the algorithmic sense) by category-edits
only. This would make life dramatically easier for the bots and easier
for users.
Is getting the metadata (category, interwikis, Personendaten on de)
out of the articles remotely feasible? Is the positive impact as good
as I think?
Best
Philipp
_______________________________________________
Wikiquality-l mailing list
http://lists.wikimedia.org/mailman/listinfo/wikiquality-l
P. Birken
2008-01-30 21:41:46 UTC
Permalink
Currently, metadata is simply in the article body. However, in theory,
metadata could be stored and edited in a separate place with its own
revision history, like the discussion is separate from the article. I
hope the idea is now more clear?

Philipp
Post by Luca de Alfaro
I am sure that, if it were of interest, one coudl remove interwiki links and
categories while analizing the pages... or am I missing something?
I however would hesitate to remove the information from the revision
history: how to revert damage to the meta-information?
(Yes, better ways to compress the revision history are very much needed,
for the working version, but this is another issue)...
I hope my comments are on topic; I am not sure I understand the point.
Luca
Post by P. Birken
Sorry for bringing this age old thread up again, but reading through
it again, I often thought: things would be easier, if Metadata were
not in the articles themselves, but separate. Then, the version
history would be trimmed dramatically and nobody could build up trust
(meaning both the real and the algorithmic sense) by category-edits
only. This would make life dramatically easier for the bots and easier
for users.
Is getting the metadata (category, interwikis, Personendaten on de)
out of the articles remotely feasible? Is the positive impact as good
as I think?
Best
Philipp
_______________________________________________
Wikiquality-l mailing list
http://lists.wikimedia.org/mailman/listinfo/wikiquality-l
_______________________________________________
Wikiquality-l mailing list
http://lists.wikimedia.org/mailman/listinfo/wikiquality-l
Luca de Alfaro
2008-01-30 21:46:29 UTC
Permalink
I think separating the metadata makes sense because it cleans up the markup
language, and makes the metadata itself more easily extensible.
But, I do not think that the fact that the metadata is embedded in the pages
makes our trust/reputation computation more complex -- it would be fairly
easy to separate out the metadata at analysis stage, if we wanted to do
that.
On the negative side, separating the metadata requires new DB tables, etc
etc, making also the format of the dumps more complex (my guess).
In balance, though, it seems a good idea, but I am speaking here without
knowing the facts.

Luca
Post by P. Birken
Currently, metadata is simply in the article body. However, in theory,
metadata could be stored and edited in a separate place with its own
revision history, like the discussion is separate from the article. I
hope the idea is now more clear?
Philipp
Post by Luca de Alfaro
I am sure that, if it were of interest, one coudl remove interwiki links
and
Post by Luca de Alfaro
categories while analizing the pages... or am I missing something?
I however would hesitate to remove the information from the revision
history: how to revert damage to the meta-information?
(Yes, better ways to compress the revision history are very much
needed,
Post by Luca de Alfaro
for the working version, but this is another issue)...
I hope my comments are on topic; I am not sure I understand the point.
Luca
Post by P. Birken
Sorry for bringing this age old thread up again, but reading through
it again, I often thought: things would be easier, if Metadata were
not in the articles themselves, but separate. Then, the version
history would be trimmed dramatically and nobody could build up trust
(meaning both the real and the algorithmic sense) by category-edits
only. This would make life dramatically easier for the bots and easier
for users.
Is getting the metadata (category, interwikis, Personendaten on de)
out of the articles remotely feasible? Is the positive impact as good
as I think?
Best
Philipp
_______________________________________________
Wikiquality-l mailing list
http://lists.wikimedia.org/mailman/listinfo/wikiquality-l
_______________________________________________
Wikiquality-l mailing list
http://lists.wikimedia.org/mailman/listinfo/wikiquality-l
_______________________________________________
Wikiquality-l mailing list
http://lists.wikimedia.org/mailman/listinfo/wikiquality-l
John Erling Blad
2007-12-27 22:00:09 UTC
Permalink
"The bot would have to IP hop over several A-classes", this is whats
called a tor-network.

One very problematic type of vandalism that usually goes undetected are
introduction of nonexisting iw-links. Another are wrong categories. The
list of possible options are fairly long.

Note that I don't say this isn't a good tool, I say that it should be
carefully inspected for ways to game the system, and if possible, adding
features that makes it impossible to use those flaws.

John E Blad
Post by Aaron Schulz
That would still be quite a feat. The bot would have to IP hop over
several A-classes. Make some random vandalism that looks like random
vandalism (rather than the same set of garbage or random
characters/blanking). The IP would have to change enough for it to not
look like one bot on RC. The good bot would have to give a time pause
before reverting to let other beat it to the punch. Also, if all it
did was revert bad bot, then as I said, the bad bot's IP would really
have to be all over the place, unless the bot made random accounts
too. It would still take a while to build up trust this way...and even
with the max, it will still take several edits to get bad content
white. This would have to be good enough to set the "most trusted"
version. And even if this is done on some pages, it will get reverted
and the user blocked, and they have to use another "good bot".
It certainly is always better to be less easily spoofed, and if there
are good practical things that can stop this without making the
software too underinclusive, I'm all for it. I said earlier that users
in the formal bot group should not count as adding credibility. In
this vein, we could do the opposite and have groups that increase the
max trust (credits) a user can have. That way the default max trust
can be lowered to deal further with stuff like good/bad bot networks.
This would work if integrated with FlaggedRevs, as the 'editor' group
could have a higher max trust limit.
-Aaron Schulz
Date: Sat, 22 Dec 2007 00:42:27 +0100
Subject: Re: [Wikiquality-l] Wikipedia colored according to trust
Note that I don't say clique detection should be implemented, I say this
is a very interesting option to get running if possible. There are
basically two possible solution, or rather I'm not aware of any other
that is possible to implement. One is to use identified cooperation
between two or more persons to be able to edit an otherwise locked
article, the other is to identify cooperation by checking how two or
more users interact. If two persons starting to interact in a positive
way at any one article it is highly unlikely the goes to war on another
article. War between users can be readily detected by simple statistical
means (aka automatic classification of statements).
Now the second problem. In trust-accumulating systems you don't have to
pile up edits in a single article to game the system. You use two bots.
One bad-bot introduces some identifiable error in many articles, and
then another good-bot fix those errors. The good-bot will quickly gain
high reputation. Even so if the bad-bot changes ip-address often so it
goes undetected. This has to be solved somehow, and to say it involves
to much work to do something like that is plainly a wrong solution to
the problem. I think it is solvable, and I think at least one solution
is the fact that all such systems will produce highly unusual usage
metrics. It is then mostly a matter of detecting the unusual metrics and
sound an alarm bell. If it is possible to incorporate "physical means"
that makes it impossible to game the system it is even better. For
example, a radio communication system can be jammed, but using frequency
hopping radios makes them susceptible to jamming. If they are hopping
fast enough they stays ahead of the jammer. If they are hopping faster
than the radio waves can reach the jammer then it is physically
impossible to use smart jammers.
Don't create a system that can be easilly fooled by more or less smart
bots. Make it physically impossible to fool the system with a bot.
1) decrease credits when time between edits goes down
2) never give more credits to revert an edit then given when making the
original edit
3) weight the edits according to the amount of contribution, yet
different styles of contributions should give the same net result in
credits
4) decrease credits when cooperation are detected and the reason behind
the cooperation can't be detected
5) increase credits when there are a positive cooperation
A slight modification of 2 is to release it if there goes sufficient
time.
John E Blad
Post by Aaron Schulz
How do you tell if users interact? Cross user talk page edits? If you
go by edits or edits to talk, what is the difference between people in
a clique and people that disagree all the time from the program's
standpoint.
For identifying non-vandalized versions, cliques don't seem to pose
much of a threat. The only way for that to be a problem is for people
to edit stack, similar to the issue of users not using preview and
making a ton of edits in a row. This could game the system sometimes,
but it is also very obvious behavior when the same 2-3 new (otherwise
thy would have probably been blocked by now) people pile on 8 edits
after each other. Not only will it stand out, but all it would do is
influence the "most trusted"/"likely unvandalized" stable version or
the trust for the current revision in their favor...until someone just
reverts it anyway. It's too much work for too little plus likely
getting caught in the process.
The system will not be bullet proof. Admins have gone rouge before.
Even for FlaggedRevs, "editor"/"surveyor" status may be abused by a
very small number of people. By default it checks for email, a
userpage, 150 edits spread out well, account age. Still, some people
could get through and then troll. The thing is though, that it is just
not worth, as they would have the rights removed and possibly be
blocked immediately. And after what? Sighting some vandalism, which
would just get reverted and fixed in a few minutes. To do it again
would require another IP and another account back at square one again.
That's just not worth it for a troll. We get vandals from people
testing and joking around as well as people that just whip new
accounts out their ass every minutes because it is so easy.
FlaggedRevs makes it not worth it. Likewise, trying to game Article
Trust does seem to very "worth it" much either.
Think of how long login didn't have captchas for such a large site.
That's because nobody cared to sit there guessing around or writing
bots then. If things are too hard to game, people won't care. The
"rewards" of gaming this system is even way less than hacking an
account. It's a bit better with FlaggedRevs because you commit to
having done reviews, so reviewing vandalism goes straight against you.
But still, tweaking junk edits 15 times or having the same two editors
cluster 8 tiny edits after each other accomplishes little, is
noticeable, and IMO, not really worth it. Also, we don't have to use
AT or FR exclusively, and some combination of both (quality > sighted
Post by Aaron Schulz
autotrust > current) could avoid the multiple edit gaming issue
altogether.
-Aaron Schulz
Post by Aaron Schulz
Date: Fri, 21 Dec 2007 21:49:30 +0100
Subject: Re: [Wikiquality-l] Wikipedia colored according to trust
I think that kind of weighting is correct as long as it is
symmetrical
Post by Aaron Schulz
Post by Aaron Schulz
(ie. do-undo -pairs weights approx the same). Clique-building is
interesting although when you are able to analyze which ones
interacts.
Post by Aaron Schulz
Post by Aaron Schulz
Then you can punish people when their friends do bad edits. Its
not as
Post by Aaron Schulz
Post by Aaron Schulz
bad as it sounds, the reasoning is that often someone writes the
article
Post by Aaron Schulz
Post by Aaron Schulz
under guidance of some other. You don't want to punish only one
user in
Post by Aaron Schulz
Post by Aaron Schulz
such situations but both.
John E
Post by Luca de Alfaro
I am not sure I am replying to the correct point, but, the system
weighs an author feedback as a function of the reputation of the
author.
Post by Aaron Schulz
Post by Luca de Alfaro
Reputation is "linear" in the sense that new feedback is
simply added
Post by Aaron Schulz
Post by Aaron Schulz
Post by Luca de Alfaro
to the reputation.
A user of reputation r gives weight log(1 + r) to hiers feedback.
We use this logarithmic scaling to prevent long-time editors from
forming a clique that is essentially impervious to feedback
from the
Post by Aaron Schulz
Post by Aaron Schulz
Post by Luca de Alfaro
rest of the community (will this kind of comments get me
skinned? :-)
Post by Aaron Schulz
Post by Aaron Schulz
Post by Luca de Alfaro
Luca
On Dec 21, 2007 11:41 AM, John Erling Blad
It is wise to make a note about the fact that such systems make it
possible to deduce earlier in the mean that someone is a vandal or not,
but it can't replace a good reader that responds to an error. This
creates the rather annoying situation where a response from a
casual
Post by Aaron Schulz
Post by Aaron Schulz
Post by Luca de Alfaro
reader should be weighted more than non-beginners, but this
makes the
Post by Aaron Schulz
Post by Aaron Schulz
Post by Luca de Alfaro
system suceptible to users wanting to skew its metrics on specific users.
John E
Post by Aaron Schulz
Right. Also, we need to be clear what we want this to do. It
will
Post by Aaron Schulz
Post by Aaron Schulz
Post by Luca de Alfaro
Post by Aaron Schulz
never be great at determining fact-checked material. What it
is good
Post by Aaron Schulz
Post by Aaron Schulz
Post by Luca de Alfaro
Post by Aaron Schulz
at is spotting the more dubious stuff, like possible vandalism.
This
Post by Aaron Schulz
makes the possibility of having "most trusted" stable version as
discussed earlier. Small changes not only can be big in
meaning, but
Post by Aaron Schulz
Post by Aaron Schulz
Post by Luca de Alfaro
Post by Aaron Schulz
they still attest to the trust.
If I read a sentence to change some minor thing, I still read
it. If a
Post by Aaron Schulz
wrongly says "he identifies himself as bisexual" or "born in
1885"
Post by Aaron Schulz
Post by Aaron Schulz
Post by Luca de Alfaro
Post by Aaron Schulz
rather than 1985 in a page when I edit, I am going to revert
if I
Post by Aaron Schulz
Post by Aaron Schulz
Post by Luca de Alfaro
Post by Aaron Schulz
catch it. Even if just making some grammar/syntax cleanup.
So each
Post by Aaron Schulz
Post by Aaron Schulz
Post by Luca de Alfaro
Post by Aaron Schulz
time people look at stuff if still attest to the page a
little bit,
Post by Aaron Schulz
Post by Aaron Schulz
Post by Luca de Alfaro
Post by Aaron Schulz
from a vandalism perspective.
The algorithms can be made more strict to catch more general
dubious
Post by Aaron Schulz
Post by Aaron Schulz
Post by Luca de Alfaro
Post by Aaron Schulz
info better, but it is not that bad at that already, and the
stricter
Post by Aaron Schulz
it gets, the more it gets under inclusive as to what is
considered
Post by Aaron Schulz
Post by Aaron Schulz
Post by Luca de Alfaro
Post by Aaron Schulz
unlikely to be vandalized.
-Aaron Schulz
------------------------------------------------------------------------
Post by Aaron Schulz
Post by Aaron Schulz
Post by Luca de Alfaro
Post by Aaron Schulz
Date: Fri, 21 Dec 2007 10:34:47 -0800
Subject: Re: [Wikiquality-l] Wikipedia colored according to
trust
Post by Aaron Schulz
If you want to pick out the malicious changes, you need to flag
also small changes.
"Sen. Hillary Clinton did *not* vote in favor of war in Iraq"
"John Doe, born in *1947*"
The ** indicates changes.
I can very well make a system that is insensitive to small
changes, but then the system would also be insensitive to many
kinds of malicious tampering, and one of my goals was to make it
hard for anyone to change without leaving at laest a minimal
trace.
Post by Aaron Schulz
So it's a matter of goals, really.
Luca
On Dec 21, 2007 10:01 AM, Jonathan Leybovich
One thing that stood out for me in the small sample of
articles I
Post by Aaron Schulz
examined was the flagging of innocuous changes by casual
users to
Post by Aaron Schulz
correct spelling, grammar, etc. Thus a "nice-to-have"
would be a
Post by Aaron Schulz
"smoothing" algorithm that ignores inconsequential changes such as
spelling corrections, etc. or the reordering of
semantically-contained
units of text (for example, reordering the line items in a list w/o
changing the content of any particular line item, etc.,
or the
Post by Aaron Schulz
reordering of paragraphs and perhaps even sentences.) I
think
Post by Aaron Schulz
this
would cover 90% or more of changes that are immaterial
to an
Post by Aaron Schulz
article's
credibility.
_______________________________________________
Wikiquality-l mailing list
http://lists.wikimedia.org/mailman/listinfo/wikiquality-l
------------------------------------------------------------------------
Post by Aaron Schulz
Post by Aaron Schulz
Post by Luca de Alfaro
Post by Aaron Schulz
Get the power of Windows + Web with the new Windows Live. Get it
now!
<http://www.windowslive.com?ocid=TXT_TAGHM_Wave2_powerofwindows_122007>
------------------------------------------------------------------------
Post by Aaron Schulz
Post by Aaron Schulz
Post by Luca de Alfaro
Post by Aaron Schulz
_______________________________________________
Wikiquality-l mailing list
http://lists.wikimedia.org/mailman/listinfo/wikiquality-l
_______________________________________________
Wikiquality-l mailing list
http://lists.wikimedia.org/mailman/listinfo/wikiquality-l
<http://lists.wikimedia.org/mailman/listinfo/wikiquality-l>
------------------------------------------------------------------------
Post by Aaron Schulz
Post by Aaron Schulz
Post by Luca de Alfaro
_______________________________________________
Wikiquality-l mailing list
http://lists.wikimedia.org/mailman/listinfo/wikiquality-l
------------------------------------------------------------------------
Post by Aaron Schulz
i’m is proud to present Cause Effect, a series about real people
making a difference. Learn more
<http://im.live.com/Messenger/IM/MTV/?source=text_Cause_Effect>
------------------------------------------------------------------------
Post by Aaron Schulz
_______________________________________________
Wikiquality-l mailing list
http://lists.wikimedia.org/mailman/listinfo/wikiquality-l
------------------------------------------------------------------------
Share life as it happens with the new Windows Live. Share now!
<http://www.windowslive.com/share.html?ocid=TXT_TAGHM_Wave2_sharelife_122007>
------------------------------------------------------------------------
_______________________________________________
Wikiquality-l mailing list
http://lists.wikimedia.org/mailman/listinfo/wikiquality-l
John Erling Blad
2007-12-21 19:31:18 UTC
Permalink
I think you misunderstand, its not about not flagging small changes. Its
about to high learning rate on small changes. When it is to high you can
simply run a spell checking bot for a few days to become an "expert".
You don't want this. You want someone who contribute a lot of text
without being reverted to become an expert. You probably also want to
analyse the form of the changes so a lot of small spell checkings is
weighted even less than correcting a year. Ie., changing from something
that is possibly an misspelling into something right is weighted less
than changing something correctly spelled into something else that is
also correctly spelled.
John E Blad
Post by Luca de Alfaro
If you want to pick out the malicious changes, you need to flag also
small changes.
"Sen. Hillary Clinton did *not* vote in favor of war in Iraq"
"John Doe, born in *1947*"
The ** indicates changes.
I can very well make a system that is insensitive to small changes,
but then the system would also be insensitive to many kinds of
malicious tampering, and one of my goals was to make it hard for
anyone to change without leaving at laest a minimal trace.
So it's a matter of goals, really.
Luca
One thing that stood out for me in the small sample of articles I
examined was the flagging of innocuous changes by casual users to
correct spelling, grammar, etc. Thus a "nice-to-have" would be a
"smoothing" algorithm that ignores inconsequential changes such as
spelling corrections, etc. or the reordering of semantically-contained
units of text (for example, reordering the line items in a list w/o
changing the content of any particular line item, etc., or the
reordering of paragraphs and perhaps even sentences.) I think this
would cover 90% or more of changes that are immaterial to an article's
credibility.
_______________________________________________
Wikiquality-l mailing list
http://lists.wikimedia.org/mailman/listinfo/wikiquality-l
------------------------------------------------------------------------
_______________________________________________
Wikiquality-l mailing list
http://lists.wikimedia.org/mailman/listinfo/wikiquality-l
Daniel Arnold
2007-12-21 23:38:35 UTC
Permalink
Post by John Erling Blad
I think you misunderstand, its not about not flagging small changes. Its
about to high learning rate on small changes. When it is to high you can
simply run a spell checking bot for a few days to become an "expert".
You don't want this.
This has already been noted by Luca himself: The current algorithms has two
weaknesses (better say optimisation potential as it is already quite good at
this unfinished stage):
* Sock puppets pushing "trust"
* Splitting of contributions into small edits

See
http://lists.wikimedia.org/pipermail/wikiquality-l/2007-December/000405.html
for details (also about possible fixes).

Furthermore as noted by Aaron and by me a combination of automated (trust
color code) and hand crafted (stable versions) system maybe will be able to
overcome the weaknesses of both concepts, see:
http://lists.wikimedia.org/pipermail/wikiquality-l/2007-December/000393.html
http://lists.wikimedia.org/pipermail/wikiquality-l/2007-December/000394.html

Arnomane
Loading...