The PayPal.com CAPTCHA suffers from several weaknesses: a fixed font face, a fixed font size, no distortions, trivial background noise, and characters that are easy to segment. In this experiment, a three-step algorithm was developed to break the PayPal CAPTCHA. Each image is first preprocessed to remove noise using thresholding and a simple cleaning technique, then segmented into characters using vertical projections and candidate split positions, and finally each character is classified. Four classification methods have been implemented: pixel counting, vertical projections, horizontal projections and template correlations. The system was trained on a sample of twenty PayPal CAPTCHAs to create thirty-six training templates (one for each character: 0-9 and A-Z). A separate sample of 100 PayPal CAPTCHAs was used for testing. The classifiers achieved the following success rates: pixel counting 8%, vertical projections 97%, horizontal projections 100% and template correlations 100%. Three of the trained classifiers outperform the 88% success rate of Pwntcha.
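The preprocessing and segmentation steps can be sketched roughly as follows. This is a minimal Python sketch, not a port of the author's MATLAB code, and several details are assumptions: the threshold level of 128, "simple cleaning" taken to mean deleting ink pixels with no ink neighbours, and every empty column in the vertical projection treated as a candidate split position. All function names are hypothetical, and touching characters (which would need a smarter choice among candidate splits) are not handled.

```python
from typing import List, Optional

Image = List[List[int]]  # rows of pixel values


def threshold(grey: Image, level: int = 128) -> Image:
    """Binarise a greyscale image: 1 marks dark (ink) pixels, 0 marks background."""
    return [[1 if px < level else 0 for px in row] for row in grey]


def clean(binary: Image) -> Image:
    """Remove isolated ink pixels that have no ink among their 8 neighbours."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(h):
        for x in range(w):
            if not binary[y][x]:
                continue
            neighbours = sum(
                binary[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbours == 0:
                out[y][x] = 0
    return out


def vertical_projection(binary: Image) -> List[int]:
    """Number of ink pixels in each column."""
    return [sum(col) for col in zip(*binary)]


def segment(binary: Image) -> List[Image]:
    """Cut at empty columns; each run of non-empty columns becomes one character."""
    projection = vertical_projection(binary)
    characters: List[Image] = []
    start: Optional[int] = None
    for x, count in enumerate(projection + [0]):  # trailing 0 closes a final run
        if count and start is None:
            start = x  # first column of a new character
        elif not count and start is not None:
            characters.append([row[start:x] for row in binary])
            start = None
    return characters
```

Chained together, `segment(clean(threshold(grey)))` returns one binary sub-image per character, ready to be passed to a classifier.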
- Preprocessing stages: Grey Scale, Further Cleaning
- Pixel Counting: 8% Break Rate
- Vertical Projections: 97% Break Rate
- Horizontal Projections: 100% Break Rate
- Template Correlations: 100% Break Rate
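The four classifiers listed above can be sketched as nearest-template matching over different features. The write-up does not state the matching rule, so this Python sketch assumes that all segmented glyphs are normalised to the same size, that each of the 36 templates is the mean feature vector of that character's training glyphs, that the three feature classifiers pick the template with the smallest sum of absolute differences, and that template correlation picks the highest Pearson correlation coefficient. All names are hypothetical. One plausible reason pixel counting scores so poorly is visible here: it reduces a glyph to a single number, so different characters with the same amount of ink become indistinguishable.

```python
from math import sqrt
from typing import Callable, Dict, List

Glyph = List[List[int]]  # binary character image (1 = ink); all glyphs assumed the same size
Feature = List[float]


def pixel_count(g: Glyph) -> Feature:
    """A single number: total ink pixels in the glyph."""
    return [float(sum(map(sum, g)))]


def vertical_projection(g: Glyph) -> Feature:
    """Ink pixels per column."""
    return [float(sum(col)) for col in zip(*g)]


def horizontal_projection(g: Glyph) -> Feature:
    """Ink pixels per row."""
    return [float(sum(row)) for row in g]


def flatten(g: Glyph) -> Feature:
    """Raw pixels in row-major order, used for template correlation."""
    return [float(px) for row in g for px in row]


def train(samples: Dict[str, List[Glyph]],
          feature: Callable[[Glyph], Feature]) -> Dict[str, Feature]:
    """One template per character: the mean feature vector of its training glyphs."""
    templates: Dict[str, Feature] = {}
    for label, glyphs in samples.items():
        vectors = [feature(g) for g in glyphs]
        templates[label] = [sum(vals) / len(vectors) for vals in zip(*vectors)]
    return templates


def distance(a: Feature, b: Feature) -> float:
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def correlation(a: Feature, b: Feature) -> float:
    """Pearson correlation coefficient; 0.0 if either vector is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sqrt(sum((x - ma) ** 2 for x in a))
    sb = sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def classify_by_distance(g: Glyph, templates: Dict[str, Feature],
                         feature: Callable[[Glyph], Feature]) -> str:
    """Pixel counting and projection classifiers: the nearest template wins."""
    f = feature(g)
    return min(templates, key=lambda label: distance(f, templates[label]))


def classify_by_correlation(g: Glyph, templates: Dict[str, Feature]) -> str:
    """Template correlation: the most strongly correlated template wins."""
    f = flatten(g)
    return max(templates, key=lambda label: correlation(f, templates[label]))
```

For template correlation, the templates would be built with `train(samples, flatten)`; for the other three, the same feature function is passed to both `train` and `classify_by_distance`.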
A copy of the slides used for a presentation of this experiment can be downloaded here.
The 20 training and 100 testing PayPal CAPTCHA images are available to download here.
Complete MATLAB code (281 lines, well commented) for preprocessing, segmenting, and classifying the images is available here.
Note that this video wasn’t created by me. Skip forward to approximately the 1-minute mark.