[SciPy-dev] Server spam problems spam spam: spam

Robert Kern robert.kern@gmail....
Mon Feb 23 19:58:27 CST 2009


On Mon, Feb 23, 2009 at 19:46, Pauli Virtanen <pav@iki.fi> wrote:
> Sun, 22 Feb 2009 13:40:20 -0800, Michael Abshoff wrote:
> [clip]
>> Two tips for fighting spammers, from the Sage project's wiki:
>>
>>   * add a list of common Chinese words to LocalBadContent, i.e.
>>
>> http://wiki.sagemath.org/LocalBadContent
>>
>> Also make sure to clean out all the spammer attempts on the hard disk.
>> I.e., I deleted 6,000 directories in "pages" of the Cython wiki, since spam
>> attempts are preserved and not actually deleted from disk. If you have a
>> few tens of thousands of those in one directory, this might make every wiki
>> access painfully slow and impact the whole server.
>
> Continuing Gael's work, I tried to expand the LocalBadContent list:
>
>        http://scipy.org/LocalBadContent
>
> I wonder how useful this will turn out to be in the end; this smells like an
> arms race... I doubt the additions cause problems for real pages, but if
> they do, some of them need to be reverted.
>
> [Btw, shouldn't LocalBadContent editing be restricted to those in
> EditorGroup? And could my account PauliVirtanen be added in the group?]

Done and done.
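For readers unfamiliar with the mechanism: as far as I understand it, Moin's antispam check treats each non-comment line of BadContent/LocalBadContent as a regular expression and rejects a save whose text matches any of them. The sketch below only imitates that idea in plain Python. The function names, the sample patterns, and the parsing rules (skip blank lines and lines starting with "#", match case-insensitively) are assumptions for illustration, not Moin's actual code.

```python
import re


def load_patterns(page_text):
    """Compile one regex per non-blank, non-comment line (assumed format)."""
    patterns = []
    for line in page_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and '#' comments are ignored
        patterns.append(re.compile(line, re.IGNORECASE))
    return patterns


def is_spam(text, patterns):
    """Return True if any bad-content pattern matches anywhere in text."""
    return any(p.search(text) for p in patterns)


# Hypothetical LocalBadContent excerpt, not the real scipy.org list.
bad_content = r"""
# comment lines are skipped
cheap-pills\.example
\bviagra\b
"""

patterns = load_patterns(bad_content)
print(is_spam("Buy VIAGRA now", patterns))             # True
print(is_spam("scipy.optimize tutorial", patterns))    # False
```

This also shows why the approach is an arms race: every new spam phrasing needs another pattern, and each pattern carries some risk of matching a legitimate page.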

> Another thing is that there are apparently ca. 11600 pages in the
> Scipy.org wiki. I'd make a wild guess that at most ~500 of these are
> valid content; the rest is spam. I'm not sure whether getting rid of the
> spam pages would improve Moin's performance.

Probably. Are you volunteering? Peter can give you a shell account. If
you are willing to take on the other upgrades Michael recommended, such
as adding the Captcha, that would be welcome, too.
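As a starting point for the on-disk cleanup Michael describes, here is a read-only sketch that lists page directories without a usable current revision. It assumes the MoinMoin 1.x layout as I understand it: each page is a directory under "pages" with a "current" file holding a revision number (e.g. "00000002"), and the text lives in "revisions/<that number>"; a deleted page keeps its directory but its current revision file is missing. That layout, the function name, and the demo page names are assumptions, so check against the real data directory (and back it up) before deleting anything.

```python
import os
import tempfile


def find_dead_page_dirs(pages_dir):
    """List page directories with no readable current revision (assumed layout)."""
    dead = []
    for name in sorted(os.listdir(pages_dir)):
        page = os.path.join(pages_dir, name)
        if not os.path.isdir(page):
            continue
        current = os.path.join(page, "current")
        if not os.path.isfile(current):
            dead.append(name)  # no 'current' file at all
            continue
        with open(current) as f:
            rev = f.read().strip()
        if not os.path.isfile(os.path.join(page, "revisions", rev)):
            dead.append(name)  # current revision missing: treated as deleted
    return dead


def _make_page(root, name, rev, write_revision):
    """Build a fake page directory for the demo (hypothetical names)."""
    os.makedirs(os.path.join(root, name, "revisions"))
    with open(os.path.join(root, name, "current"), "w") as f:
        f.write(rev + "\n")
    if write_revision:
        with open(os.path.join(root, name, "revisions", rev), "w") as f:
            f.write("page text\n")


with tempfile.TemporaryDirectory() as root:
    _make_page(root, "FrontPage", "00000003", True)   # live page
    _make_page(root, "SpamPage", "00000002", False)   # deleted page
    os.makedirs(os.path.join(root, "RejectedEdit"))   # leftover, no 'current'
    result = find_dead_page_dirs(root)
    print(result)  # ['RejectedEdit', 'SpamPage']
```

The function only reports; the actual removal (e.g. with shutil.rmtree) is deliberately left as a separate, manual step.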

> Do we have any valid pages with CJK characters? Much of the spam seems
> Chinese, so mass-deleting at least this portion of it shouldn't be
> impossible to do, given Moin's database format.

The Chinese localized Moin help pages are valid, but that should be it.
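A possible heuristic for picking out the Chinese spam is sketched below: flag a page when a large fraction of its characters fall in common CJK Unicode ranges. The chosen ranges (kana, CJK Unified Ideographs plus Extension A, Hangul syllables) are not exhaustive, and the 0.3 threshold is an arbitrary assumption. A real run would also need to skip the Chinese-localized Moin help pages, e.g. via an explicit whitelist of page names (not shown).

```python
import re

# Assumed CJK ranges: Hiragana/Katakana, CJK Unified Ideographs Ext. A,
# CJK Unified Ideographs, Hangul syllables. Not an exhaustive list.
CJK_RE = re.compile("[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]")


def cjk_ratio(text):
    """Fraction of non-whitespace characters that fall in the CJK ranges."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(1 for c in chars if CJK_RE.match(c)) / len(chars)


def looks_like_cjk_spam(text, threshold=0.3):
    """Heuristic flag; the default threshold is an arbitrary assumption."""
    return cjk_ratio(text) >= threshold


print(looks_like_cjk_spam("便宜的药 便宜的药 buy now"))          # True
print(looks_like_cjk_spam("Using scipy.linalg.solve for dense systems"))  # False
```

Combined with a directory scan of the page store, this would give a candidate list for review rather than an automatic delete.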

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco

