Breaking the PayPal.com CAPTCHA
captcha, presentations, projects, publications May 12th, 2008
The PayPal.com CAPTCHA suffers from several weaknesses: a fixed font face, a fixed font size, no distortions, trivial background noise, and characters that are easy to segment. In this experiment, a three-step algorithm was developed to break the PayPal CAPTCHA. The image is first preprocessed to remove noise using thresholding and a simple cleaning technique, then segmented using vertical projections and candidate split positions, and finally each segmented character is classified. Four classification methods were implemented: pixel counting, vertical projections, horizontal projections, and template correlations. The system was trained on a sample of twenty PayPal CAPTCHAs to create thirty-six training templates (one for each character: 0-9 and A-Z). A separate sample of 100 PayPal CAPTCHAs was used for testing. The following success rates were achieved with the different classifiers: pixel counting 8%, vertical projections 97%, horizontal projections 100%, and template correlations 100%. Three of the trained classifiers outperform the 88% success rate of Pwntcha.
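To make the three steps concrete, here is a minimal Python sketch of the pipeline. It is not the MATLAB code from the paper, and several details are assumptions made for illustration: the threshold cutoff of 128, splitting only at completely empty columns (the paper also uses candidate split positions, which are not modeled here), and a sum-of-absolute-differences distance for the vertical projection classifier. The function names and the tiny two-letter "alphabet" are hypothetical.

```python
def threshold(gray, cutoff=128):
    """Binarize a grayscale image: 1 = ink (dark pixel), 0 = background.
    The cutoff value is an assumption, not taken from the paper."""
    return [[1 if px < cutoff else 0 for px in row] for row in gray]

def vertical_projection(binary):
    """Count the ink pixels in each column."""
    return [sum(col) for col in zip(*binary)]

def segment(binary):
    """Cut the image into glyphs wherever a run of empty columns occurs."""
    proj = vertical_projection(binary)
    spans, start = [], None
    for x, count in enumerate(proj):
        if count and start is None:
            start = x                      # first inked column of a glyph
        elif not count and start is not None:
            spans.append((start, x))       # glyph ended at an empty column
            start = None
    if start is not None:                  # glyph touching the right edge
        spans.append((start, len(proj)))
    return [[row[a:b] for row in binary] for a, b in spans]

def projection_distance(p, q):
    """Sum of absolute differences, zero-padding the shorter projection."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return sum(abs(a - b) for a, b in zip(p, q))

def classify(glyph, templates):
    """Return the template label whose projection is closest to the glyph's."""
    proj = vertical_projection(glyph)
    return min(templates, key=lambda ch: projection_distance(proj, templates[ch]))

def solve(gray, templates):
    """Preprocess, segment, and classify every glyph in the image."""
    binary = threshold(gray)
    return "".join(classify(g, templates) for g in segment(binary))

# Hypothetical 3x6 image containing "L" and "I"; 200 is light background noise.
demo = [
    [255, 0, 255, 200, 0, 255],
    [255, 0, 255, 255, 0, 255],
    [255, 0,   0, 255, 0, 255],
]
# Hypothetical templates: the vertical projection of each known character.
templates = {"L": [3, 1], "I": [3]}
result = solve(demo, templates)
```

A real template set would hold one projection per character (0-9 and A-Z), built from labeled training images in the same way `vertical_projection` is applied to each segmented glyph here.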
Paper
The final paper, including MATLAB source code, sample runs, and results, can be downloaded here.
Presentation
A copy of the slides used for a presentation of this experiment can be downloaded here.
Data
The 20 training and 100 testing PayPal CAPTCHA images are available to download here.
I recently graduated from the Rochester Institute of Technology where I received both a BS and MS in Computer Science. My research interests are CAPTCHAs, web security, pattern recognition, and machine learning. My MS
August 13th, 2008 at 4:19 pm
Dude, you need to make your CAPTCHAS harder. They’re too easy to solve. Seriously
August 15th, 2008 at 12:50 am
@slate: The CAPTCHA which I have on the comment system was not developed by me. It’s a system called reCAPTCHA which helps digitize un-readable words from old books. You can read more @ recaptcha.net.