On our sister site CenterNetworks, we use the open source content management system (CMS) Drupal. I’ve used nearly every CMS package, both the free ones and the systems costing large sums of money, and Drupal is my favorite.
On CN we get hammered with spam every day. So much spam that I had to turn off comments after three months because every post (some 2000+) gets hit.
One thing I’ve noticed is that within moments of posting new content, spam begins to come in. Not minutes, hours or days, but moments. After doing some research into our system and into the way Drupal handles content, I’ve realized why this happens.
Drupal basically offers you two URL structures: clean or not clean. The not clean structure looks like this: http://www.centernetworks.com/node/x where x is the post number. The clean structure looks like this: http://www.centernetworks.com/lending-club-social-lending. We use the clean structure on CN.
Update: a commenter points out that clean URLs together with the pathauto module are what create the structure we use, not clean URLs alone.
When you select the clean structure, the not clean structure still exists as well. As an aside, this creates the potential for a huge amount of duplicate content. WordPress has similar issues, but not to this extent. While it would be great to somehow block access to /node, the problem with doing so is that /node/x/edit is the way to edit a story.
Here is what the spammers do: they send spam to /node/x links that have not been created yet. Let’s assume that /node/4100 is the latest post. They know this and send spam to /node/4101, /node/4102, etc., and then when those posts are created, the spam appears. Pretty freaking smart.
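To make the trick concrete, here is a toy Python sketch of the behavior as I understand it, plus one possible guard. None of this is Drupal code: the `CommentStore` class and its method names are made up for illustration, and the assumption that comments get stored against a node ID before that node exists comes from my own observations above, not from the Drupal source.

```python
from dataclasses import dataclass, field


@dataclass
class CommentStore:
    """Toy in-memory model of posts and comments (hypothetical, not Drupal's API)."""
    existing_nodes: set = field(default_factory=set)
    comments: dict = field(default_factory=dict)

    def create_node(self, node_id):
        # Publishing a post simply marks its numeric ID as existing.
        self.existing_nodes.add(node_id)

    def add_comment_unchecked(self, node_id, text):
        # Stores the comment whether or not the node exists yet.
        # This is the assumed behavior the spammers take advantage of.
        self.comments.setdefault(node_id, []).append(text)

    def add_comment_checked(self, node_id, text):
        # Possible guard: refuse comments aimed at an ID that is not published.
        if node_id not in self.existing_nodes:
            return False
        self.comments.setdefault(node_id, []).append(text)
        return True

    def comments_for(self, node_id):
        # Return a copy so callers cannot mutate the stored list.
        return list(self.comments.get(node_id, []))


if __name__ == "__main__":
    store = CommentStore()
    store.create_node(4100)                      # latest real post
    store.add_comment_unchecked(4101, "spam")    # spammer targets the next ID
    store.create_node(4101)                      # the post is published later
    print(store.comments_for(4101))              # the spam is already attached
```

With `add_comment_checked` in place of the unchecked version, the same submission to /node/4101 would be rejected until the post actually exists. That would only stop the pre-posting trick, not spam sent to existing posts.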
If anyone has ideas on how to beat the spammers using this method, please let me know. I refuse to install a CAPTCHA: I tested with and without one, and comment levels drop with it.