Robots In Disguise
In 1950, computer scientist Alan Turing wrote that within 50 years it would become difficult to distinguish between humans and computers. In 2004, Bill Gates claimed that it would take two years to solve the problem of spam e-mail.
Turing turned out to be right, if a couple of years off on his projections. Gates seems to have been just plain wrong.
While free Web services produced just 6% of the Internet's pill-hawking and scam-pitching e-mail a year ago, automated spam programs exploiting Web e-mail services now produce more than 12% of digital junk mail, with Microsoft's Webmail services alone accounting for around 4%.
Because Webmail spam comes from a reputable company's servers, it's more likely to defeat filters and end up in users' inboxes. (And the volume of Web-based spam, according to MessageLabs, was affected far less than other spam by the disconnection of the notorious McColo hosting company from the Internet earlier in November, an event that wiped out about two-thirds of total spam volumes.)
The weak link that has allowed robots to turn Webmail into a spam hose? The CAPTCHA--that much-loathed image of distorted words and numbers found on some Web pages that users must type in order to create or access an account or post a message. A CAPTCHA--or "completely automated public Turing test to tell computers and humans apart"--helps Web sites keep out robot spammers (see "Meta Data: An Invisible CAPTCHA").
The problem, according to many security researchers, is that computers are now capable of far too many human abilities. In February, Web security firm Websense announced that CAPTCHAs at Google, Microsoft and Yahoo!'s Web e-mail services had all been broken. Security researcher Jeremiah Grossman posted screen shots documenting a PC hijacked by cybercriminal software as it successfully read the CAPTCHA and set up one Gmail account after another for spamming purposes.
Some security researchers even believe that the CAPTCHA-breaking software was using Tesseract--an open source optical character recognition engine sponsored by Google itself and used in its book-scanning projects--to recognize the distorted characters.
Google, Microsoft and Yahoo! have all since made their CAPTCHAs more difficult to break. But in July, reports surfaced that the services' upgraded CAPTCHAs had also been defeated. After another round of patching, the makers of xRumer, a $450 spamming program, claimed in October that they had evaded Gmail's and Microsoft's CAPTCHAs yet again.
"It's a cat-and-mouse that has no end," says David Dagon, a professor of cyber security at Georgia Tech. "The cyber security challenge of the next decade will be distinguishing humans and computers."
Some computer scientists are trying more creative approaches. Microsoft last year introduced a CAPTCHA that asked users to look at a grid of nine images of dogs and cats and challenged them to identify the cats. Another experiment showed users pictures of women or men pulled from the rating site "Hotornot.com" and asked the user to choose which three were "hot."
The trouble with those clever tests of humanity, according to hackers analyzing the CAPTCHA on the sla.ckers.org message board, is that both image sources are public. It's too simple, writes hacker Robert Hansen, also known as rSnake, to find the source of the images, index them with the same tools the CAPTCHA-builders used and exploit that index to break the puzzle.
More generally, CAPTCHAs have a bigger problem: teams of people in India and China who solve the puzzles for pennies. Some spammers even import CAPTCHAs into their own sites, tempting users to solve them in exchange for a pornographic image, then export the answers back to a spambot.
But even setting that human-based cheating aside, the challenge of building a CAPTCHA that can't be understood by automated software has become far more complex. In August, the National Federation of the Blind won a $6 million settlement from retail giant Target for using a CAPTCHA on its site that couldn't be passed by the sight-impaired. Because blind Web surfers use automatic reading software to interpret pages, CAPTCHAs represented insurmountable walls on Target's site.
Now, thanks to the Target ruling, CAPTCHAs must also offer an audio option--a distorted recording of letters and numbers read aloud that's often far easier for a computer to interpret than a distorted image.
In a presentation at August's DefCon hacker conference, security researcher Michael Brooks showed how those audio CAPTCHAs can be disassembled by computers. Brooks demonstrated a program he'd written that guesses the audio file's contents, overlaying those guesses on the pitch level chart of the noisy audio file. By analyzing the distance between points on the CAPTCHA's distorted pitch chart and the pitch charts of possible answers to the puzzle, his program was able to remove the noise and "listen" to the file just as accurately as a human.
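Brooks's actual tool isn't public, but the core idea he described--compare the noisy file's pitch contour against the contours of each candidate answer and pick the nearest match--can be sketched in a few lines. Everything below (the per-frame pitch values, the candidate digits) is invented for illustration; a real attack would first extract contours from audio with signal-processing code.

```python
def contour_distance(a, b):
    """Mean squared distance between two pitch contours (Hz per frame)."""
    n = min(len(a), len(b))
    return sum((x - y) ** 2 for x, y in zip(a[:n], b[:n])) / n

def best_guess(noisy_contour, candidates):
    """Return the candidate label whose pitch contour lies nearest
    to the noisy recording's contour."""
    return min(candidates, key=lambda w: contour_distance(noisy_contour, candidates[w]))

# Hypothetical pitch templates for three spoken digits.
templates = {
    "three": [220, 218, 210, 190, 170],
    "seven": [180, 200, 230, 240, 235],
    "nine":  [150, 155, 160, 150, 140],
}
# A noisy recording of "seven": the template plus a little jitter.
noisy = [185, 196, 233, 244, 230]

print(best_guess(noisy, templates))  # prints "seven"
```

Because the noise shifts each pitch value only slightly, the distorted contour still sits far closer to its own template than to any other--which is why added static defeats humans less reliably than it defeats this kind of matching.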
Luis von Ahn, one of the original inventors of the CAPTCHA and a professor at Carnegie Mellon University, says there's still hope for the battle to weed out spambots. Last year he founded reCaptcha, a new take on the CAPTCHA that he says has yet to be broken and is already being used on thousands of sites, including Facebook, Twitter and Craigslist.
ReCaptcha takes a new approach: The project scans and digitizes thousands of pages of real-world text, including old books and the archives of The New York Times. When its image-recognition software can't interpret a passage of aged or faded text, it further distorts the image and serves it up as a CAPTCHA. That means the project not only produces an enormous output of digitized records for preservation purposes--it also filters text to find far more difficult words for computers to recognize.
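The article doesn't spell out how reCaptcha grades an answer when, by definition, it doesn't know the right reading of the scanned word. Von Ahn's published design pairs each unreadable scan with a control word whose answer is known; a minimal sketch of that grading step, with all words and answers invented here, might look like this:

```python
def grade(control_word, control_answer, scan_answer, scan_votes):
    """Pass the user only if the control word is typed correctly; if so,
    record their reading of the scanned word as a digitization vote."""
    if control_answer.strip().lower() != control_word:
        return False  # likely a bot (or a typo): no vote recorded
    scan_votes.append(scan_answer.strip())
    return True

votes = []
passed = grade("market", "Market", "pianoforte", votes)
print(passed, votes)  # True ['pianoforte']
```

Agreement among several users' votes is what eventually promotes a scanned word's reading to "known"--the mechanism behind the project's digitized output.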
"Humans still get our tests right about 96% of the time," von Ahn says. "Amazingly they're still much better than computers at some things."
But reCaptcha still has to deal with the same problem as every other test of human abilities: access for the blind. And von Ahn admits his audio CAPTCHA is far harder for humans to solve than its visual counterpart: While computers can't crack it, humans also get it wrong 30% of the time.
In a test of my own superiority to robots, I decided to submit myself to reCaptcha's audio test. The first time I played the file, the stream of sounds came out as a garble of indecipherable voices, words played in reverse and static. I guessed at the answer to the test--and failed.
But when I listened to the audio CAPTCHA again just a few seconds later, numbers began to emerge from the noisy mess, spoken by a mix of high and low voices. By the fourth time through, I could pass the test consistently. In a few minutes I had been labeled human by reCaptcha's audio test seven out of 10 times, just as von Ahn had predicted. My brain had done something that fixed software algorithms don't--it had adapted.
Perhaps there's hope for humanity yet.