[SciPy-dev] cleaning out wiki spam
Wed Feb 25 13:29:17 CST 2009
On Wed, Feb 25, 2009 at 4:20 AM, Peter Wang <firstname.lastname@example.org> wrote:
> On Feb 25, 2009, at 1:20 AM, Fernando Perez wrote:
>> Inspired by this, I just went and nuked ~1900 out of the ipython one,
>> leaving only the 128 that are probably for real. I hope this helps
>> also reduce the load a bit more.
> Great, thank you! One thing that occurs to me is that once you have a
> fairly high ratio of ham to spam, it might be worth saving the
> directory listing into a base "goodpages.txt" that can then be used as
> a whitelist filter in the future when blowing away spam via regexes.
> (Hopefully we won't have to do that on this scale again, but if
> history is any indicator, spammers always find a way...)
Good idea; I just did it. (In fact it's only 97 entries long; I cleaned up a
few more after sending my email, so those are really 'pure ham' now,
since I checked every one of them.)
BTW, I'm sure you have your own tools for the cleanup by now, but in case
it's useful, here's the little script I used. I found it easier to
check interactively in small batches by pattern than to do one
giant regexp run:
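The script itself was not preserved in this archive. A minimal sketch of the workflow it describes (delete pages matching a pattern, a small batch at a time, with a human check for false positives, skipping whitelisted pages) might look like the following; `pages_dir`, `spam_candidates`, and `clean` are hypothetical names, and the MoinMoin one-directory-per-page layout is assumed:

```python
import os
import re
import shutil

def spam_candidates(pages, whitelist, pattern):
    """Return page names matching `pattern` that are not whitelisted ham."""
    rx = re.compile(pattern)
    return sorted(p for p in pages if rx.search(p) and p not in whitelist)

def clean(pages_dir, whitelist_file, pattern, batch=20):
    """Show matching pages `batch` at a time; delete each batch on confirmation."""
    with open(whitelist_file) as f:
        whitelist = set(f.read().split())
    candidates = spam_candidates(os.listdir(pages_dir), whitelist, pattern)
    for i in range(0, len(candidates), batch):
        chunk = candidates[i:i + batch]
        print('\n'.join(chunk))
        # Interactive step: a human scans each small batch for false positives.
        if input('Delete these %d pages? [y/N] ' % len(chunk)).lower() == 'y':
            for page in chunk:
                shutil.rmtree(os.path.join(pages_dir, page))
```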
It still takes time, since you have to look for false positives.
In any case, many thanks for all your work; the moin wikis already feel
a LOT more responsive. I don't know how many times in the
last few weeks I got timeout errors on the scipy cookbook, and now
it's fairly snappy. This was a real problem, and it's much better.