[SciPy-dev] Server spam problems spam spam: spam

Michael Abshoff michael.abshoff@googlemail....
Tue Feb 24 01:31:49 CST 2009


Pauli Virtanen wrote:
> Sun, 22 Feb 2009 13:40:20 -0800, Michael Abshoff wrote:

Hi,

> [clip]
>> two tips of fighting spammers from the Sage project's wiki:
>>
>>   * add a list of common Chinese words to LocalBadContent, i.e.
>>
>> http://wiki.sagemath.org/LocalBadContent
>>
>> Also make sure to clean out all the spammer attempts on the hard disk.
>> I.e I deleted 6,000 directories in "pages" of the Cython wiki since Spam
>> attempts are preserved and not actually deleted from disk. If you have a
>> couple ten thousand of those in one directory this might make every wiki
>> access painfully slow and impact the whole server.
> 
> Continuing Gael's work, I tried to expand the LocalBadContent list:
> 
> 	http://scipy.org/LocalBadContent
> 
> I wonder how useful this turns out to be in the end, this smells like an 
> arms race... I doubt the additions cause problems to real pages, but if 
> they do, some of them need to be reverted.

We added those six or seven words to our wiki setup for various wikis 
and they just work. Chinese spam attempts went from dozens a day to none 
that were successful. I just got tired of despamming the wiki since it 
made the RecentChanges useless to me, so I spent a lot of time cleaning 
out spammer accounts (a couple thousand in the end). Another thing I 
regularly do for some of the wikis is to delete auto-generated spammer 
accounts, e.g. zkjefgkjq1 to zkjefgkjq102 at some Chinese ISP were 
somehow not connected to the Sage project ;). Since I manage four 
different wikis hosted at the same IP with widely different audiences 
(sage, MPIR, l-functions and cython), simultaneous registration at two or 
more of them by someone I have never heard of leads to automatic 
deletion. This policy is possible because l-functions requires account 
holders to use names along the lines of first letter of first name + last 
name, and that is enforced. Doing that at the scipy wiki is probably not 
possible.
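
A minimal sketch of how those two checks could be scripted, assuming the 
account names of each wiki are available as plain Python lists. The 
function names, the min_size threshold and the dict layout are made up 
for illustration and are not part of Moin. The output is only a list of 
candidates; whether the person is known still has to be judged by hand.

```python
import re
from collections import defaultdict

# A letter stem followed by a numeric suffix, e.g. "zkjefgkjq17".
NUMBERED = re.compile(r"^([A-Za-z]+)(\d+)$")

def numbered_batches(usernames, min_size=5):
    """Return {stem: sorted suffixes} for stems shared by at least min_size accounts."""
    groups = defaultdict(set)
    for name in usernames:
        m = NUMBERED.match(name)
        if m:
            groups[m.group(1)].add(int(m.group(2)))
    return {stem: sorted(nums) for stem, nums in groups.items() if len(nums) >= min_size}

def registered_on_several(accounts_by_wiki, min_wikis=2):
    """Return {name: sorted wiki names} for names registered on at least min_wikis wikis."""
    seen = defaultdict(set)
    for wiki, names in accounts_by_wiki.items():
        for name in names:
            seen[name].add(wiki)
    return {name: sorted(wikis) for name, wikis in seen.items() if len(wikis) >= min_wikis}
```

numbered_batches() would flag a run like zkjefgkjq1 to zkjefgkjq102 under 
a single stem, and registered_on_several() lists names that show up on 
two or more of the wikis.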

> [Btw, shouldn't LocalBadContent editing be restricted to those in 
> EditorGroup? And could my account PauliVirtanen be added in the group?]

No spammer has ever edited LocalBadContent in our wikis. I would still 
restrict it, since deleting it would obviously open the gates for spam.

> Another thing is that there are apparently ca. 11600 pages in the 
> Scipy.org wiki. I'd make a wild guess that at most ~500 of these are 
> valid content; the rest is spam. I'm not sure if getting rid of the spam 
> pages improves Moin's performance. 
> 
> Do we have any valid pages with CJK characters? Much of the spam seems 
> Chinese, so mass-deleting at least this portion of it shouldn't be 
> impossible to do, given Moin's database format.


Well, 11600 directories in one directory does not exactly improve the 
directory lookup time (assuming you are using sqlite). I just deleted

  rm -rf \(e[0-9]*

but a visual inspection might be appropriate first.
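
For that inspection, a small dry run that only lists what would be 
deleted may help. This is a sketch under the assumption that the spam 
page directories are the ones whose names start with "(e" followed by a 
digit, which is what the glob above is meant to match. The function name 
is hypothetical, and the path to the "pages" directory has to be filled 
in for the actual installation.

```python
import re
from pathlib import Path

# Assumption: names beginning with "(e" plus a digit are the deletion candidates.
CANDIDATE = re.compile(r"^\(e[0-9]")

def deletion_candidates(pages_dir):
    """Return the sorted names of subdirectories of pages_dir that match CANDIDATE."""
    return sorted(
        entry.name
        for entry in Path(pages_dir).iterdir()
        if entry.is_dir() and CANDIDATE.match(entry.name)
    )
```

Printing the result of deletion_candidates() for the pages directory and 
skimming it for legitimate page names deletes nothing, so it is safe to 
run before the rm.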

Cheers,

Michael

