[SciPy-dev] cleaning out wiki spam

Fernando Perez fperez.net@gmail....
Wed Feb 25 13:29:17 CST 2009

On Wed, Feb 25, 2009 at 4:20 AM, Peter Wang <pwang@enthought.com> wrote:
> On Feb 25, 2009, at 1:20 AM, Fernando Perez wrote:
>> Inspired by this, I just went and nuked ~1900 out of the ipython one,
>> leaving only the 128 that are probably for real.  I hope this helps
>> also reduce the load a bit more.
> Great, thank you!  One thing that occurs to me is that once you have a
> fairly high ratio of ham to spam, it might be worth saving the
> directory listing into a base "goodpages.txt" that can then be used as
> a whitelist filter in the future when blowing away spam via regexes.
> (Hopefully we won't have to do that on this scale again, but if
> history is any indicator, spammers always find a way...)

Good idea, I just did it (in fact it's only 97 long, I cleaned up a
few more after sending my email, so those are really 'pure ham' now,
since I checked every one of them).
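
For reference, a minimal sketch of how such a whitelist could be saved.
It assumes each wiki page is one entry in a pages directory (the
directory argument is a placeholder, and save_whitelist is a
hypothetical helper name, not something from the moin tools):

```python
import os

def save_whitelist(pages_dir, whitelist_path="goodpages.txt"):
    """Write the sorted entry names found in pages_dir, one per line.

    pages_dir is assumed to hold one entry per wiki page; the default
    file name follows the "goodpages.txt" suggestion quoted above.
    """
    names = sorted(os.listdir(pages_dir))
    with open(whitelist_path, "w") as fh:
        for name in names:
            fh.write(name + "\n")
    return names
```

This should only be run once the listing has been checked by hand,
since anything in the file is later treated as known-good.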

BTW, I'm sure you have your tools by now for the cleanup, but in case
this is useful, here's the little script I used.  I found it easier to
check interactively in small batches by pattern rather than doing one
giant regexp run:


It still takes time, since you have to look for false positives.
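
The script itself is not preserved in this copy of the message.  As a
rough illustration of the approach described (small batches selected
by pattern, a whitelist of known-good pages, and a manual check before
anything is removed), a sketch might look like the following.  All
function names and the goodpages.txt format (one page name per line)
are assumptions, not the original code:

```python
import re

def load_whitelist(path="goodpages.txt"):
    """Read known-good page names, one per line, into a set."""
    with open(path) as fh:
        return {line.strip() for line in fh if line.strip()}

def spam_candidates(page_names, pattern, whitelist):
    """Return non-whitelisted page names matching pattern, sorted."""
    rx = re.compile(pattern, re.IGNORECASE)
    return sorted(n for n in page_names
                  if n not in whitelist and rx.search(n))

def review_batch(candidates, confirm=input):
    """Print one batch and return it only if the reviewer answers 'y'.

    Nothing is deleted here; the caller decides what to do with the
    confirmed names, so false positives can still be caught by eye.
    """
    for name in candidates:
        print(name)
    answer = confirm("Delete these %d pages? [y/N] " % len(candidates))
    return candidates if answer.strip().lower() == "y" else []
```

Running spam_candidates once per pattern and passing each result
through review_batch keeps the batches small enough to inspect, and
whitelisted pages are never proposed for deletion.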

In any case, many thanks for all your work; the moin wikis already
feel a LOT more responsive.  I don't know how many times in the
last few weeks I got timeout errors on the scipy cookbook, and now
it's fairly snappy.  This was a real problem, and it's much better.


