Thursday, February 15, 2007

The solution to comment spam

I think I have the solution for comment spam. You know, those annoying automated "comments" every moderately popular blog receives in bulk which advertise, well, all the usual things spam advertises.

I think most solutions today are severely lacking. Counting the number of links in a comment produces both false positives and false negatives. Asking users to solve CAPTCHAs is annoying, problematic for blind people, and sometimes machine readable anyway (and, when it is not, it tends not to be human readable either).

The solution I'm proposing isn't new, as such. It's just that I have never seen anyone apply it to comment spam, and I think it might work. It is, in a very abstract way, based on Merkle's puzzles. Here it is.

Introduce into every form that is meant for sending a comment to the blog a hidden field with a "seriousness proof number". This number will be different for each post in the blog, and will change once every few days (with several hours of grace time during which the old number is still accepted). If a comment is posted with an incorrect number, dump it without asking the moderator.
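To make the first step concrete, here is a minimal sketch of one way the server could derive and check such a number. Everything specific in it is my assumption for illustration: the names `SECRET`, `PERIOD`, `GRACE`, `token_for`, `current_token` and `is_valid`, the three-day rotation, the six-hour grace time, and the choice of HMAC-SHA256 to derive the number.

```python
import hashlib
import hmac

SECRET = b"server-side secret, never sent to clients"  # hypothetical value
PERIOD = 3 * 24 * 3600  # number changes every 3 days (assumed)
GRACE = 6 * 3600        # old number accepted for 6 more hours (assumed)


def token_for(post_id: str, period_index: int) -> str:
    """Derive the number for one post and one rotation period."""
    msg = f"{post_id}:{period_index}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()[:16]


def current_token(post_id: str, now: float) -> str:
    """Number to place in the hidden form field at time `now` (Unix seconds)."""
    return token_for(post_id, int(now // PERIOD))


def is_valid(post_id: str, submitted: str, now: float) -> bool:
    """Accept the current number, or the previous one during the grace time."""
    index = int(now // PERIOD)
    candidates = [token_for(post_id, index)]
    if now - index * PERIOD < GRACE:
        candidates.append(token_for(post_id, index - 1))
    # Compare as bytes, in constant time, so odd input cannot raise or leak timing.
    return any(
        hmac.compare_digest(submitted.encode(), c.encode()) for c in candidates
    )
```

Deriving the number from the post ID and the period index means the server does not have to store anything per post; it only has to keep `SECRET` private.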

So far, I have not said anything really new. Spammers are already fairly adept at parsing the incoming HTML, identifying the authentication number, and making sure it appears in the spam they send. As it stands, then, this method is not very effective.

So, the next step in making this system more effective is to encrypt the authentication number. We'll send the authentication number AES encrypted, along with a small JavaScript program that decrypts the number and places it in the form.

Any and all of you who know anything about cryptography will twitch in pain at me calling this "encryption". After all, the thing that defines encryption is the (un)availability of the key, much more than the actual algorithm used. In order for the legitimate user to be able to post a comment, the JavaScript must provide the key for decryption. The spammers, after a couple of days, will simply teach their parsers to extract the key from the JavaScript, and use it to send their spam. We see that this twist can be effective for a single blogger protecting his own blog (security by obscurity), but not when implemented in a standard platform, such as Blogger or WordPress.

So we introduce a third modification to our plan. We now supply the authentication token encrypted, but we do not provide the key! Instead, we provide a JavaScript program that brute-forces the key. Of course, the encrypted text needs to contain a piece of known plain text, so that the program can tell when it has found the right key. Also, the key must not be longer than the authentication token, or it will be easier to brute force the token directly rather than the key. Still, this method will surely keep the spammers out.

Wait, don't call me insane just yet. I'm not really serious. While it will keep the spammers out (who can afford to brute force a 128-bit key just to send spam?), it will also keep the legitimate commenters out (who can afford to brute force a 128-bit key just to send a comment?). However, we can now turn it into a competition over "who has more available resources in order to post a comment".

What I suggest is that we give each commenter all the key bits required to decrypt the authentication token BUT, say, 16. Brute forcing 2^16 possible keys isn't beyond the ability of any modern computer, and should not take too long either. However, for a spammer, this ups the cost of each comment sent, and thus reduces the number (and, therefore, the economic interest) of spams sent.
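Here is a rough sketch of that puzzle, under loudly stated assumptions. Python's standard library has no AES, so the sketch swaps in a toy cipher (the token XORed with a SHA-256 hash of the key); a real implementation would use AES as described above and would run the solving half in JavaScript in the browser. The names `MARKER`, `make_puzzle` and `solve_puzzle`, and the choice to hide the low bits of the key, are mine.

```python
import hashlib
import secrets

MARKER = b"TOKEN:"  # known plain text, so the solver can recognise success
KEY_BITS = 128
HIDDEN_BITS = 16    # the tunable "16" from the text


def _xor_with_keystream(data: bytes, key: int) -> bytes:
    # Stand-in for AES (an assumption): one SHA-256 block used as a keystream.
    stream = hashlib.sha256(key.to_bytes(KEY_BITS // 8, "big")).digest()
    return bytes(a ^ b for a, b in zip(data, stream))


def make_puzzle(token: bytes, hidden_bits: int = HIDDEN_BITS):
    """Server side: encrypt marker + token, reveal the key minus its low bits."""
    plaintext = MARKER + token
    if len(plaintext) > 32:
        raise ValueError("toy cipher only covers 32 bytes")
    key = secrets.randbits(KEY_BITS)
    ciphertext = _xor_with_keystream(plaintext, key)
    partial_key = (key >> hidden_bits) << hidden_bits  # low bits zeroed out
    return ciphertext, partial_key


def solve_puzzle(ciphertext: bytes, partial_key: int,
                 hidden_bits: int = HIDDEN_BITS) -> bytes:
    """Client side: try every value of the hidden bits until the marker shows."""
    for guess in range(1 << hidden_bits):
        plaintext = _xor_with_keystream(ciphertext, partial_key | guess)
        if plaintext.startswith(MARKER):
            return plaintext[len(MARKER):]
    raise ValueError("no candidate key produced the marker")
```

The server sends `ciphertext` and `partial_key` with the page, and accepts a comment only if the token recovered by `solve_puzzle` comes back in the hidden field. Checking a submission costs the server one comparison, while producing it costs the sender up to 2^16 trial decryptions.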

Of course, the number "16" can be tweaked as necessary. However, I do believe that a number exists such that legitimate users will not find it onerous to post comments, while spammers will.
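To get a feel for the tweaking, here is a back-of-the-envelope sketch. The missing bits are random, so on average about half of the candidates are tried before the right one turns up. The function names and any trials-per-second figure you feed in are assumptions; the real rate depends on the cipher, the browser and the machine, and would have to be measured.

```python
def expected_seconds(hidden_bits: int, trials_per_second: float) -> float:
    """Average solve time: about half of the 2**hidden_bits keys are tried."""
    return (2 ** hidden_bits / 2) / trials_per_second


def comments_per_day(hidden_bits: int, trials_per_second: float) -> float:
    """Rough ceiling on comments one machine can post in 24 hours."""
    return 86400 / expected_seconds(hidden_bits, trials_per_second)
```

For example, at an assumed 100,000 trials per second, 16 hidden bits cost about a third of a second per comment, while 24 hidden bits cost well over a minute. Each extra bit doubles the cost for everyone, legitimate commenters and spammers alike.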

A few points to keep in mind:
  • This solution does not require user registration in order to comment.
  • There is no need for the actual user to do anything. Everything is done, automatically, by the computer.
  • In particular, this solution poses no problem for blind users' terminals or for users with other disabilities.
On the other hand, this solution does require JavaScript to be enabled, and does require a significant investment of CPU time. It's probably a good idea to write the JavaScript so that it solves the puzzle only when the user actually wants to post a comment, and not every time they view the blog.

Now, all that is missing is for someone who has the time to implement it to do so...

Shachar