Breaking the PayPal.com CAPTCHA
captcha, presentations, projects, publications May 12th, 2008
The PayPal.com CAPTCHA suffers from several weaknesses: a fixed font face, a fixed font size, no distortions, trivial background noise, and characters that are easy to segment. In this experiment, a three-step algorithm was developed to break the PayPal CAPTCHA. The image is first preprocessed to remove noise using thresholding and a simple cleaning technique, then segmented using vertical projections and candidate split positions, and finally classified. Four classification methods were implemented: pixel counting, vertical projections, horizontal projections, and template correlations. The system was trained on a sample of 20 PayPal CAPTCHAs to create 36 training templates (one for each character: 0-9 and A-Z). A separate sample of 100 PayPal CAPTCHAs was used for testing. The classifiers achieved the following success rates: pixel counting 8%, vertical projections 97%, horizontal projections 100%, and template correlations 100%. Three of the trained classifiers outperform the 88% success rate of Pwntcha.
Example
Preprocess
- Original:

- Grey Scale:

- Thresholding:

- Further Cleaning:
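The preprocessing stages listed above can be sketched roughly as follows. The original implementation is in MATLAB; this is a pure-Python illustration, not the paper's code. The luminance weights, the fixed `cutoff` of 128, and the "drop isolated pixels" cleaning rule are all assumptions made for the sketch, since the post does not spell out the exact threshold or cleaning technique.

```python
def to_grayscale(rgb):
    """Convert rows of (r, g, b) tuples into rows of 0-255 luminance values."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]


def threshold(gray, cutoff=128):
    """Binarize: 1 marks a dark (ink) pixel, 0 marks background.

    The cutoff value is an assumption, not taken from the paper.
    """
    return [[1 if v < cutoff else 0 for v in row] for row in gray]


def clean(binary, min_neighbors=1):
    """Remove ink pixels with fewer than min_neighbors ink pixels around them.

    This isolated-pixel rule is a hypothetical stand-in for the
    "simple cleaning technique" mentioned in the post.
    """
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(h):
        for x in range(w):
            if not binary[y][x]:
                continue
            # Count ink in the 3x3 window, then subtract the pixel itself.
            n = sum(binary[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))) - 1
            if n < min_neighbors:
                out[y][x] = 0
    return out


W, K = (255, 255, 255), (0, 0, 0)
image = [
    [W, K, K, W],
    [W, K, K, W],
    [W, W, W, W],
    [K, W, W, W],  # a lone speck of noise in the bottom-left corner
]
binary = threshold(to_grayscale(image))
cleaned = clean(binary)
```

In this toy image the 2x2 block of ink survives cleaning, while the lone corner pixel is removed because it has no ink neighbors.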

Segment
- Segmented:

- Padded:
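A minimal sketch of segmentation by vertical projection, again in Python rather than the original MATLAB. It assumes a binary image where 1 is ink, and it splits only at columns that contain no ink at all; the "candidate split positions" logic from the paper is not reproduced here. The helper names (`segment`, `crop_and_pad`) and the choice to pad on the right are hypothetical.

```python
def vertical_projection(binary):
    """Count the ink pixels in each column."""
    return [sum(col) for col in zip(*binary)]


def segment(binary):
    """Return (start, end) column ranges of glyphs, with end exclusive.

    A glyph is taken to be a maximal run of columns that contain ink.
    """
    proj = vertical_projection(binary)
    spans, start = [], None
    for x, count in enumerate(proj):
        if count and start is None:
            start = x                      # first inked column of a glyph
        elif not count and start is not None:
            spans.append((start, x))       # empty column ends the glyph
            start = None
    if start is not None:
        spans.append((start, len(proj)))   # glyph touching the right edge
    return spans


def crop_and_pad(binary, span, width):
    """Cut one glyph out and right-pad it with background to a fixed width."""
    a, b = span
    return [row[a:b] + [0] * (width - (b - a)) for row in binary]


binary = [
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 0],
]
spans = segment(binary)
glyphs = [crop_and_pad(binary, s, 3) for s in spans]
```

Padding every glyph to the same width is what makes it possible to compare a segmented character against fixed-size templates in the classification step.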

Classify
- Pixel Counting: 8% Break Rate
- Vertical Projections: 97% Break Rate
- Horizontal Projections: 100% Break Rate
- Template Correlations: 100% Break Rate
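The four classifiers can be illustrated with a small Python sketch (the original code is MATLAB). Each one reduces a glyph to a feature and picks the closest template. The squared-distance matching rule, the use of Pearson correlation, and the tiny 3x3 `templates` dictionary are assumptions made for illustration; the real system used 36 templates built from the training CAPTCHAs.

```python
from math import sqrt


def pixel_count(glyph):
    """Feature: total number of ink pixels."""
    return [sum(map(sum, glyph))]


def vertical_projection(glyph):
    """Feature: ink pixels per column."""
    return [sum(col) for col in zip(*glyph)]


def horizontal_projection(glyph):
    """Feature: ink pixels per row."""
    return [sum(row) for row in glyph]


def flatten(glyph):
    """All pixels of the glyph as one vector, row by row."""
    return [p for row in glyph for p in row]


def distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def correlation(a, b):
    """Pearson correlation of two equal-length vectors (0.0 if one is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0


def classify(glyph, templates, feature):
    """Pick the template whose feature vector is nearest to the glyph's."""
    target = feature(glyph)
    return min(templates, key=lambda ch: distance(feature(templates[ch]), target))


def classify_by_correlation(glyph, templates):
    """Pick the template whose pixels correlate best with the glyph's."""
    target = flatten(glyph)
    return max(templates, key=lambda ch: correlation(flatten(templates[ch]), target))


# Hypothetical 3x3 templates, purely for illustration.
templates = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
    "-": [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
}
noisy_l = [[1, 0, 0], [1, 0, 0], [1, 1, 0]]  # an "L" missing one pixel
dash = templates["-"]
```

In this toy set, "I" and "-" have the same pixel count, so the pixel-counting feature cannot tell them apart, while the projections and the correlation can. That may be one reason pixel counting fares so much worse than the other three methods, though the paper itself should be consulted for the actual analysis.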
Paper
The final paper including MATLAB source code, sample runs, and results can be downloaded here or from the RIT Digital Media Library.
Presentation
A copy of the slides used for a presentation of this experiment can be downloaded here.
Data
The 20 training and 100 testing PayPal CAPTCHA images are available to download here.
Source Code
Complete MATLAB code (281 lines, well commented) for preprocessing, segmenting, and classifying the images is available here.
YouTube Video
Note that this video wasn’t created by me. Skip forward to approximately the 1 minute mark.
August 13th, 2008 at 4:19 pm
Dude, you need to make your CAPTCHAs harder. They’re too easy to solve. Seriously.
August 15th, 2008 at 12:50 am
@slate: The CAPTCHA on the comment system was not developed by me. It’s a system called reCAPTCHA, which helps digitize unreadable words from old books. You can read more at recaptcha.net.
November 18th, 2008 at 5:12 am
Can't be cracked.
January 26th, 2009 at 12:50 am
Nice work.
How would you crack the captcha described here?
http://ambigrams.flipscript.com/ambigram-captcha/
February 22nd, 2009 at 11:30 pm
@Mark: The Ambigram CAPTCHA is a neat idea, but unfortunately I think it fails to meet the criteria of a valid CAPTCHA. The four desirable properties of a CAPTCHA are (see page 16 of http://www.kloover.com/thesis/thesis.pdf for a complete description):
1. Automated
2. Open
3. Usable
4. Secure
The Ambigram CAPTCHA is neither automated (how would a computer automatically generate the challenges?) nor usable (in my opinion; I could only read 1 of the words in the example provided). Since it doesn’t meet 2 of the required properties, it doesn’t even matter whether it’s secure/unbreakable.
Thanks for passing the link along though. It’s always interesting to see stuff like that.
September 24th, 2009 at 9:06 pm
@slate
Maybe a bit late, but reCAPTCHA has proven its usefulness. It's far from easy to break.
January 13th, 2010 at 12:58 am
There is a moving 3-D captcha called Emergence Technology coming up!!
February 9th, 2013 at 2:34 pm
[...] Breaking the Paypal.com CAPTCHA [...]