philwilson.org

Wikispam

14 December, 2004

I’ve always known that wikispam exists, just as comment spam does, but until now I’ve never run into much of either (all praise the Blogger walled garden!). In the last few days, however, I’ve been learning a little about Groovy, most of whose documentation lives in a MoinMoin wiki (there is some duplication in their Confluence wiki, but the MoinMoin one seems to have more content overall), and it suffers badly from wikispam. Looking at the recent changes, it’s easy to see that a good number of people do their best to clean it up (for my part, I do what I can), but there’s only so fast a human can act when working against what looks like a bot.

screenshot of wiki changes by a spambot

From 7.30am to 9.30am someone, or some bot, is just following every single link on the wiki and replacing the content of each page it finds with links to all sorts of junk, presumably to try to improve their PageRank. Practically, this means that while the wiki is defaced no-one can learn anything about Groovy (assuming they’d want to). It’s not an isolated incident either: it happens to all kinds of wikis across the internet, denying access to their content (if the content is that important then you could ask “what’s it doing on a wiki anyway?” but that’s beside the point). It’s a complete mess of a situation, where you want to grant free access to legitimate users but also need to deny access to these spambots (it’s also an old subject which I’m regretting getting into, but that’s life :).

Step up to the plate, WikiMinion (via Abject Learning), a bot by RichardP which cleans up wiki spam. Very good it looks, too.

The code uses two approaches for identifying edits made by spammers – examining external links and examining source IP addresses. In both approaches if a clean version can be identified the page is reverted back to it. Likewise, if all versions of a page appear to be created by a spammer, the page is marked for deletion.
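
For the curious, here is a rough Python sketch of how that description might translate into code. This is my own guess at the logic, not WikiMinion’s actual implementation: the `Revision` class, the function names and the two blocklists (`bad_domains`, `bad_ips`) are all made up for illustration, and a real bot would need a far smarter idea of what counts as spam.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Revision:
    """One saved version of a wiki page (hypothetical structure)."""
    text: str
    author_ip: str


# Capture the host part of any http/https link in the page text.
LINK_RE = re.compile(r"https?://([A-Za-z0-9.-]+)", re.IGNORECASE)


def external_domains(text: str) -> Set[str]:
    """Return the lower-cased hosts of all external links in the text."""
    return {host.lower().rstrip(".") for host in LINK_RE.findall(text)}


def looks_like_spam(rev: Revision, bad_domains: Set[str], bad_ips: Set[str]) -> bool:
    """Apply both checks: the source IP address and the external links."""
    if rev.author_ip in bad_ips:
        return True
    return any(host in bad_domains for host in external_domains(rev.text))


def decide(history: List[Revision], bad_domains: Set[str],
           bad_ips: Set[str]) -> Tuple[str, Optional[int]]:
    """Decide what to do with a page, given its history (oldest first).

    Returns ("keep", None) if the latest version looks clean,
    ("revert", i) if revision i is the newest clean version, or
    ("delete", None) if every version looks like spam.
    """
    if not history or not looks_like_spam(history[-1], bad_domains, bad_ips):
        return ("keep", None)
    # Walk backwards from the second-newest revision looking for a clean one.
    for i in range(len(history) - 2, -1, -1):
        if not looks_like_spam(history[i], bad_domains, bad_ips):
            return ("revert", i)
    return ("delete", None)
```

Given a page whose history is a clean revision followed by a spammy one, `decide` would return `("revert", 0)`; a page that has only ever contained spam would come back as `("delete", None)`.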

To my simple mind, WikiMinion seems like an excellent first-step tool for maintaining existing wikis. Maybe the next step should be to incorporate this kind of thing into wiki software itself so that spambots can’t make their changes in the first place. That would remove the need to set up authentication for a wiki (which would kind of defeat the point) whilst inconveniencing the fewest people (I guess there would be IP clashes once the spambot learns it has to spoof the IP of someone who has already made a successful change). The Meatball Wiki also has a list of other possible solutions, and in fact it turns out there’s plenty of discussion on the MoinMoin site itself about how to combat spam. If only I’d read that before starting to write this, eh? 🙂
