
Breaking CAPTCHA protection

Web pages protect themselves from spammers with a system called CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart). It is a test that decides whether a page is being viewed by a human or by a bot. Today it is notoriously well known: recognizing symbols in an image, adding numbers, or deciding which picture doesn’t match the others. A captcha shouldn’t require any knowledge from the user, because that would only measure how educated the user is, not whether the user is human. That is why captcha works with abstraction.

It can be more difficult for disabled users, or in the worst case for someone on a computer with a 9″ monochrome monitor and no sound card. Let’s think about how to help all the people classified as non-human. Let the captcha break.

The first captchas were simple text-based tasks. How much is 4 + 7? What doesn’t belong in the row “apple, car, carrot, sky, chair, space-shuttle”? And so on. Some tricks were used, like writing a number either as digits or as a word (three + 12 is ?). A text captcha is limited by its character set and by the length of the captcha string. That reduces variability. Nobody will read pages of text to prove their humanity. That is why a text captcha isn’t a safe test for detecting a human. A little bit of regexp, a vocabulary, a light algorithm, and the bot turns into a human.
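To make the "regexp, vocabulary, light algorithm" idea concrete, here is a minimal sketch in Python. The vocabulary, the set of operators and the function names are assumptions made up for this illustration; a real text captcha would need its own word list and question patterns.

```python
import re

# Hypothetical vocabulary: number words a text captcha might use.
WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12,
}
# Hypothetical operators, written as symbols or as words.
OPS = {
    "+": lambda a, b: a + b, "plus": lambda a, b: a + b,
    "-": lambda a, b: a - b, "minus": lambda a, b: a - b,
    "*": lambda a, b: a * b, "times": lambda a, b: a * b,
}

TOKEN = r"(\d+|" + "|".join(WORDS) + r")"
OPERATOR = r"(\+|-|\*|plus|minus|times)"
QUESTION = re.compile(TOKEN + r"\s*" + OPERATOR + r"\s*" + TOKEN, re.IGNORECASE)


def to_number(token):
    """Turn '12' or 'three' into an integer."""
    token = token.lower()
    return int(token) if token.isdigit() else WORDS[token]


def solve_text_captcha(question):
    """Return the answer to a simple arithmetic question, or None if none is found."""
    match = QUESTION.search(question)
    if match is None:
        return None
    left, op, right = match.groups()
    return OPS[op.lower()](to_number(left), to_number(right))
```

Calling `solve_text_captcha("three + 12 is ?")` handles the mixed digit/word trick mentioned above; anything the pattern does not cover simply returns `None`.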

Next comes the popular visual captcha. A big advantage of a visual captcha over an audio captcha is that you can look at it quietly. The first visual captchas were plain, unmodified printed text in a picture. No big deal for a bot with OCR (Optical Character Recognition). Converting printed text into digital text is very common these days (see gocr). http://books.google.com and warez e-books use this method a lot. A human, thanks to the brain and mainly the part of it that handles vision and imagination, is capable of discriminating shapes, faces, etc. That allows the captcha to be made harder for a bot to solve. But not for long.

For visually impaired users there is the sound captcha. It’s quite rare. Sound is an easy medium to process, so a bot can handle it quite easily.
Enough of background. Let’s get some foreground.

Breaking a visual captcha can sometimes be easier than breaking a text captcha. It depends on the intelligence of the captcha programmer. You should know how the captcha is made. Nothing is random (even a random function isn’t random). Everything has its origin. Take in particular a captcha that is an image with simple printed symbols. These symbols are the result of some algorithm. If we can determine the algorithm, we can forecast the symbols in the captcha and solve it without seeing it. The next important thing is how the correct solution is bound to each user. The user should not get any clues for solving the captcha. All important data should stay on the server, and the user should only get an identifier.
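The remark that even a random function isn’t random can be illustrated with a seeded pseudo-random generator. The sketch below is a made-up generator, not the code of any particular captcha; `make_captcha` and `candidates` are hypothetical names. It only shows that the same seed always yields the same symbols, so an attacker who can narrow down the seed can list the possible solutions in advance.

```python
import random
import string


def make_captcha(seed, length=5):
    """Hypothetical generator: the captcha text depends only on the seed."""
    rng = random.Random(seed)
    return "".join(rng.choice(string.ascii_uppercase) for _ in range(length))


def candidates(seed_guesses, length=5):
    """All captcha texts an attacker gets by running the same algorithm over guessed seeds."""
    return {make_captcha(seed, length) for seed in seed_guesses}
```

If the seed were, say, a coarse timestamp, `candidates(range(t - 10, t + 10))` would contain the server’s text without the image ever being looked at.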

Some particular examples.

http://vybrali.sme.sk/register

You get a captcha to solve after submitting the form. There is nothing useful in the cookies. The important information for us is the name of the captcha image, ts_image.php?ts_random= A little experimenting shows that the captcha is generated from the ts_random parameter, whose value we already have. Search Google for “ts_image.php” and, yes, you will get the source code of this captcha. The source code contains a variable called $site_key, which should hold a value unknown to us, precisely to prevent what happens next. Everything is safe as long as we don’t know the value of that variable. Unfortunately for the sme.sk server, this variable is empty (discovered by trial). To prove it:

http://vybrali.sme.sk/ts_image.php?ts_random=01020304
http://tst.airdump.net/sme.php?ts_random=01020304
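Why an empty $site_key is fatal can be sketched in a few lines. The real derivation inside ts_image.php is not reproduced here; the MD5-based scheme below is a hypothetical stand-in, and the function names are invented for the example. The point is only that when the text is a pure function of a public parameter and a secret, a guessable secret lets anyone recompute the text.

```python
import hashlib


def captcha_text(ts_random, site_key="", length=5):
    """Hypothetical derivation: hash the secret key together with the public parameter."""
    digest = hashlib.md5((site_key + ts_random).encode("utf-8")).hexdigest()
    return digest[:length].upper()


def find_site_key(ts_random, observed_text, guesses):
    """Try candidate keys until the computed text matches what the image shows."""
    for key in guesses:
        if captcha_text(ts_random, key, len(observed_text)) == observed_text:
            return key
    return None
```

Once one guessed key (here the empty string) reproduces a single observed captcha, every later captcha can be computed from its ts_random value alone.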

A captcha that works with sessions can be bypassed in several ways.

– Flawed session management. This flaw arises when the session isn’t destroyed after the captcha has been solved successfully. We can then reuse the “successful” session to pass further captchas for as long as the session exists.

– A harder and less probable, but still possible, case appears when we have access to the storage where sessions are kept, for instance an account on the same machine as the captcha we are trying to break, with the web server running under the same user. In that case we can read all the session data on the server, including the captcha solution we are looking for.
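The first flaw can be modelled with a toy in-memory session store. Both classes below are hypothetical and stand in for whatever the server really does; they only contrast a check that keeps the “solved” flag alive with one that destroys the session state on every submission.

```python
class VulnerableCaptcha:
    """Marks the session as solved and never resets the flag."""

    def __init__(self):
        self.sessions = {}  # session id -> {"answer": str, "solved": bool}

    def issue(self, session_id, answer):
        """Store a new challenge for the session, keeping any existing state."""
        state = self.sessions.setdefault(session_id, {"solved": False})
        state["answer"] = answer

    def submit_form(self, session_id, typed=None):
        state = self.sessions.get(session_id)
        if state is None:
            return False
        if typed is not None and typed == state["answer"]:
            state["solved"] = True
        return state["solved"]  # flaw: stays True for every later request


class FixedCaptcha(VulnerableCaptcha):
    """Consumes the challenge, so each solution is good for one request only."""

    def submit_form(self, session_id, typed=None):
        state = self.sessions.pop(session_id, None)  # destroy the session state
        return state is not None and typed is not None and typed == state["answer"]
```

With `VulnerableCaptcha`, one human-solved captcha is enough: the bot keeps sending the same session id and every later form passes. With `FixedCaptcha` the second submission fails because the state was consumed.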

http://registrace.seznam.cz/register.py/stageZeroScreen?service=email

This is another case. We get a captcha with an identifier that is really only used to identify us. No clues, no flaw in the rules. Now it is time to use OCR (see gocr). The image has some protection against being OCRed: a textured background and crooked symbols. First we need to separate the symbols from the background and from each other. That can be done with “convert” from ImageMagick. Once the symbols are extracted from the image, we can recognize them. The standard gocr database isn’t enough because the symbols are morphed, so we will teach gocr to read them. It takes a while. Do as follows:

// download the image

GET 'http://registrace.seznam.cz/captchaImage?hash=LSBBQLGKCP' > captcha.gif

// extract the symbols

convert captcha.gif -gamma -10 -paint 2 -monochrome captcha.jpg

// teach the gocr how to read them

gocr -d 2 -p ./seznam/ -m 256 -m 130 captcha.jpg

Gocr parameters:

-d ignores noise (small dust clusters) in the image

-p path to the database

-m mode

|-- 256 turns off the built-in recognition engine, so only our database is used

|-- 130 extends our database with new symbols

`-- 2 uses the symbols from the given database

With these settings gocr will ask you what each symbol means (when answering, always store the symbol in the database: option 2 after the symbol is recognized). Once the database is filled, a new captcha can be read without any prompts:

GET 'http://registrace.seznam.cz/captchaImage?hash=XXXXXXXXXX' > captcha.gif


convert captcha.gif -gamma -10 -paint 2 -monochrome captcha.jpg


gocr -p ./seznam/ -m 256 -m 2 captcha.jpg

HBXPV
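The idea of separating the symbols from the background and from each other can be pictured in plain Python. This is a simplified sketch, not what ImageMagick does internally: the image is assumed to be a list of rows of grayscale values (0 to 255), and the function names and the threshold of 128 are made up for the example.

```python
def to_monochrome(pixels, threshold=128):
    """Map grayscale values to 1 for ink (dark) and 0 for background (light)."""
    return [[1 if value < threshold else 0 for value in row] for row in pixels]


def split_symbols(mono):
    """Return (start, end) column ranges of symbols separated by ink-free columns."""
    width = len(mono[0]) if mono else 0
    has_ink = [any(row[x] for row in mono) for x in range(width)]
    ranges, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                   # a symbol begins at this column
        elif not ink and start is not None:
            ranges.append((start, x))   # the symbol ended at the previous column
            start = None
    if start is not None:
        ranges.append((start, width))   # the last symbol touches the right edge
    return ranges
```

Crooked symbols that touch or overlap will not split cleanly on empty columns, which is one reason a trained database is still needed for the recognition step.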

Now only one question remains: Freeze! Who is there? Yes or no?
