[Wikitech-l] many-wiki spam solution

Tels nospam-abuse at bloodgate.com
Sat Mar 19 10:02:49 UTC 2005


-----BEGIN PGP SIGNED MESSAGE-----

Hi,

On Friday 18 March 2005 21:20, Evan Prodromou wrote:
> On Thu, 2005-17-03 at 20:12 -0700, Brian wrote:
> > Small wikis no doubt get the most spam. I run several smallish wikis
> > and removing spam from them is nearly an everyday occurrence. Take,
> > for instance, the Bomis wiki [1]. Scroll down to the bottom and take a
> > look at how hard it is getting hit (that's UseModWiki).
> >
> > So the idea is to build an interwiki watchlist, hosted by the
> > foundation, into the MediaWiki software. This requires little
> > modification, as we can use the RSS feed of Recent Changes (RC) that
> > the software already generates. Wikis would opt in during
> > installation, and a message would then be sent home so that the feed
> > is retrieved every so often.
> >
> > What would be appropriate to do after that, I'm not sure. But there
> > are a lot of ways it could go.
> >
> > Just wanted to throw this out there and see if it has merit. I think
> > it is a simple way of identifying lots of spam around the web.
>
> It's probably worth noting that there's already an excellent shared
> regexp list here:
>
> http://www.emacswiki.org/cw/BannedContent

Technical note:

	...
	foo\.com
	foos?\.com
	foos.com
	...

This list contains many redundant regexps: foos?\.com will match exactly
(and only) foo.com and foos.com, so there is no need to list those again.
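One rough way to find such duplicates automatically is to take every entry that is just an escaped domain name and check whether some other entry already matches inside that text. The Python sketch below assumes each regexp is searched for anywhere in the submitted text and that entries use no anchors or lookarounds. `literal_of` and `redundant_entries` are made-up helper names, and entries such as foos.com, where the unescaped dot is a wildcard, are simply skipped rather than analysed.

```python
import re

# Characters that make an entry more than a plain, escaped domain name.
META = set("^$*+?()[]{}|.\\")


def literal_of(pattern):
    r"""Return the text a plain entry such as 'foo\.com' stands for, or None
    if the entry uses any regex feature (including an unescaped '.')."""
    text = pattern.replace(r"\.", "\x00")  # temporarily hide the escaped dots
    if any(ch in META for ch in text):
        return None
    return text.replace("\x00", ".")


def redundant_entries(patterns):
    """List plain entries that some other entry in the list already catches.

    Assumption: patterns are searched for anywhere in the page and contain
    no anchors or lookarounds, so if another pattern matches inside the
    literal text, it also matches every page that contains that text.
    """
    unique = list(dict.fromkeys(patterns))  # drop exact duplicates first
    compiled = [re.compile(p) for p in unique]
    redundant = []
    for i, p in enumerate(unique):
        text = literal_of(p)
        if text is None:
            continue  # not a plain domain; leave it for a human to judge
        if any(j != i and c.search(text) for j, c in enumerate(compiled)):
            redundant.append(p)
    return redundant


banned = [r"foo\.com", r"foos?\.com", r"foos.com", r"bar\.com"]
print(redundant_entries(banned))  # ['foo\\.com']
```

Deciding in general whether one regexp covers another is much harder; this only catches the easy case where a plain entry is matched by another entry.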
Likewise, it would probably be a good idea to combine many of the regexps,
turning entries like:

	...
	foo\.com
	bar\.com
	...

into (foo|bar)\.com. This would reduce the number of separate matches to
be done, and so speed up the matching process. Similarly, use:

	foo\.(com|net|org)

instead of

	foo\.com
	foo\.net
	foo\.org
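Merging plain domain names this way is mechanical enough to script. Here is a minimal Python sketch, assuming the input is a list of plain domain names (each containing at least one dot) rather than existing regexps. The function names are hypothetical, and the sketch uses non-capturing groups (?:...), which match the same text as the plain (...) groups above but do not record the match.

```python
import re
from collections import defaultdict


def _alternation(parts):
    # Escape and join the parts with '|'; group them only if there are several.
    alt = "|".join(re.escape(p) for p in sorted(parts))
    return alt if len(parts) == 1 else f"(?:{alt})"


def combine_by_tld(domains):
    # Group by the last label: "foo.com", "bar.com" -> (?:bar|foo)\.com
    groups = defaultdict(set)
    for domain in domains:
        name, _, tld = domain.rpartition(".")
        groups[tld].add(name)
    return [_alternation(names) + r"\." + re.escape(tld)
            for tld, names in sorted(groups.items())]


def combine_by_name(domains):
    # Group by everything before the last label:
    # "foo.com", "foo.net", "foo.org" -> foo\.(?:com|net|org)
    groups = defaultdict(set)
    for domain in domains:
        name, _, tld = domain.rpartition(".")
        groups[name].add(tld)
    return [re.escape(name) + r"\." + _alternation(tlds)
            for name, tlds in sorted(groups.items())]


print(combine_by_tld(["foo.com", "bar.com", "foo.net"]))
print(combine_by_name(["foo.com", "foo.net", "foo.org"]))
```

Keeping the plain domain list as the editable source and generating the combined regexps from it would also soften the editing problem, since nobody has to edit the long alternations by hand.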

Of course, combined regexps are harder to edit, but matching 10000 short
regexps one by one takes much longer than matching 100 regexps with 100
alternations each.
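To check the speed claim against a particular engine, one can time both layouts on the same text. The Python sketch below uses the standard re module as a stand-in for whatever regexp engine a wiki actually runs, with 10000 made-up domains (spam0.com, spam1.com, ...) packed either one per regexp or 100 per alternation. The printed timings depend on the engine and the input, so treat them as a measurement to make, not a known result.

```python
import re
import time


def make_domains(n):
    # Hypothetical banned entries spam0.com, spam1.com, ... for illustration only.
    return [f"spam{i}.com" for i in range(n)]


def one_per_entry(domains):
    # One small compiled regexp per banned domain.
    return [re.compile(re.escape(d)) for d in domains]


def grouped(domains, per_group):
    # Pack the domains into alternations of at most per_group entries each.
    return [re.compile("|".join(re.escape(d) for d in domains[i:i + per_group]))
            for i in range(0, len(domains), per_group)]


def is_spam(text, patterns):
    # True if any banned pattern occurs anywhere in the text.
    return any(p.search(text) for p in patterns)


def time_check(text, patterns, repeats=5):
    # Average wall-clock time of one full check of the text.
    start = time.perf_counter()
    for _ in range(repeats):
        is_spam(text, patterns)
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    domains = make_domains(10_000)
    text = "An ordinary edit without any banned links. " * 50
    small = one_per_entry(domains)
    big = grouped(domains, 100)
    print(f"{len(small)} separate regexps: {time_check(text, small):.4f} s")
    print(f"{len(big)} combined regexps: {time_check(text, big):.4f} s")
```

A clean edit is the worst case for both layouts, because every pattern has to be tried before the text can be accepted.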

Best wishes,

Tels


- --
 Signed on Sat Mar 19 10:51:37 2005 with key 0x93B84C15.
 Visit my photo gallery at http://bloodgate.com/photos/
 PGP key on http://bloodgate.com/tels.asc or per email.

 "My other computer is your Windows box." -- Dr. Brad (19034) on
 2004-08-13 at /.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iQEVAwUBQjv4yXcLPEOTuEwVAQFxPwf8ConMNlhe8+i7KQjDE0Jr9XWtBRG+6e4J
WbV2405pdZihl+cVyuGI5vVIYDKu/M4Pa2F06hrquNioZJ30AHx0ZIp7Cmvbcv8Z
L8BA7rfSe5saoKU780F/8UbyzmwyjXWhNfMp6+vMWDpHneBpePTqADWnuY55PcYl
wIviFAyRcCiCnQiuimUrrNOxl8IHPl3Ak/HkF/+g7ayOcJOuTmP+DQctR75NfXcD
elUXdUgBVKbQXxyEhVJ9CLn7zOGQfai2pdGVGf0HjVsubg8R5MTR2ONmG5v9YtWb
HNXB/TzjQpgB+emzPozOk7vOjZAnTXdDi3Wlf7sNhBqcSb4oHCGd2g==
=6dTE
-----END PGP SIGNATURE-----



More information about the Wikitech-l mailing list