On 12/21/06, Fastfission <fastfission(a)gmail.com> wrote:
On 11/24/06, Earle Martin <wikipedia(a)downlode.org> wrote:
Whether the copyvio is an inward or outward bound one in each case is
sadly beyond the scope of my programming skills, so I leave that to
you.
I don't think this is a programming problem -- it's a conceptual problem.
A good copyvio bot -- one which doesn't waste one's time with false
positives or outward copyvios -- would be one which monitors NEW
additions and does not try to parse previously existing material. If
someone says, "This is new, original text" but it gets Google hits, it
is almost certainly copy-and-pasted (whether that makes it officially
a copyvio still needs to be decided, but it is a vastly simpler
problem than the previous one).
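A rough sketch of the "only check NEW additions" idea. It diffs the old and new revision at the sentence level, so previously existing text (and therefore outward copyvios sitting on mirrors) is never re-checked, and only sufficiently long added sentences are sent to a search backend. The function names, the sentence splitter and the `min_words` threshold are hypothetical choices for illustration. `has_external_match` stands in for whatever web search a real bot would call and is not a real API.

```python
import difflib
import re
from typing import Callable, List


def _sentences(text: str) -> List[str]:
    # Naive sentence splitter (assumption): break after . ! or ? plus whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def added_sentences(old_text: str, new_text: str) -> List[str]:
    """Return sentences that appear in new_text but not in old_text."""
    old, new = _sentences(old_text), _sentences(new_text)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    added: List[str] = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # Only inserted or rewritten sentences count as new material.
        if tag in ("insert", "replace"):
            added.extend(new[j1:j2])
    return added


def flag_edit(old_text: str, new_text: str,
              has_external_match: Callable[[str], bool],
              min_words: int = 8) -> List[str]:
    """Flag newly added sentences that the (hypothetical) search backend finds elsewhere."""
    return [s for s in added_sentences(old_text, new_text)
            if len(s.split()) >= min_words and has_external_match(s)]
```

For a quick local test, `has_external_match` can be a lookup in a set of known phrases. A real bot would still need a human to decide whether a flagged match is actually a copyvio.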
This is already being done
Trying to go through the entire database by finding random pages and
taking random lines seems extremely hit-and-miss to me, and if you
have to worry about mirrors and false positives then I can't see how
that would possibly be productive. The odds of finding a copyvio are
going to be quite low, and the amount of time needed to sort through
them is going to be quite high.
Daniel Brandt managed it.
--
geni