You are invited to try a new video-based CAPTCHA developed within the Computer Science Department at RIT. A CAPTCHA is a challenge designed to distinguish humans from computer programs (’bots’) on the internet; it is typically implemented as a string of distorted characters that must be transcribed.
Many people find the text-based CAPTCHAs frustrating, so we have developed a video-based alternative. In our Video CAPTCHAs, a user must quickly label a video with three tags (words) describing its content.
We would appreciate it if you could help us evaluate the usability of this new approach by completing 20 Video CAPTCHAs. The experiment will only take about 15 minutes of your time. The task may be found at:
The PayPal.com CAPTCHA suffers from several weaknesses: a fixed font face, a fixed font size, no distortions, trivial background noise, and characters that are easy to segment. In this experiment, a three-step algorithm was developed to break the PayPal CAPTCHA. The image is first preprocessed to remove noise using thresholding and a simple cleaning technique, then segmented using vertical projections and candidate split positions, and finally classified. Four classification methods have been implemented: pixel counting, vertical projections, horizontal projections, and template correlations. The system was trained on a sample of twenty PayPal CAPTCHAs to create thirty-six training templates (one for each character: 0-9 and A-Z). A separate sample of 100 PayPal CAPTCHAs was used for testing. The following success rates were achieved with the different classifiers: pixel counting 8%, vertical projections 97%, horizontal projections 100%, and template correlations 100%. Three of the trained classifiers outperform the 88% success rate of Pwntcha.
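The preprocessing and segmentation steps described above can be illustrated with a minimal Python sketch. This is not the original MATLAB code: the function names, the cutoff of 128, and the assumption of dark characters on a light background are all hypothetical, and the sketch splits only at completely empty columns rather than using candidate split positions.

```python
def threshold(gray, cutoff=128):
    """Binarize a grayscale image (rows of 0-255 ints): 1 = ink, 0 = background.

    Assumes dark characters on a light background; cutoff is an arbitrary choice.
    """
    return [[1 if px < cutoff else 0 for px in row] for row in gray]


def vertical_projection(binary):
    """Count the ink pixels in each column of a binary image."""
    return [sum(col) for col in zip(*binary)]


def segment(binary):
    """Return (start, end) column ranges of the characters, end exclusive.

    A character is taken to be a maximal run of columns that contain ink.
    """
    spans, start = [], None
    for x, count in enumerate(vertical_projection(binary)):
        if count > 0 and start is None:
            start = x                  # first inked column of a new character
        elif count == 0 and start is not None:
            spans.append((start, x))   # an empty column ends the character
            start = None
    if start is not None:              # character touching the right edge
        spans.append((start, len(binary[0])))
    return spans


# Tiny 2x7 example: two dark blobs separated by light columns.
gray = [
    [255, 0, 0, 255, 255, 10, 255],
    [255, 0, 255, 255, 255, 10, 255],
]
print(segment(threshold(gray)))  # prints [(1, 3), (5, 6)]
```

Splitting at empty columns is enough here only because the characters do not touch each other, which is one of the weaknesses listed above.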
Example
Preprocess
Original:
Grey Scale:
Thresholding:
Further Cleaning:
Segment
Segmented:
Padded:
Classify
Pixel Counting: 8% Break Rate
Vertical Projections: 97% Break Rate
Horizontal Projections: 100% Break Rate
Template Correlations: 100% Break Rate
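As an illustration of the template correlation classifier, here is a minimal Python sketch under stated assumptions. It is not the original MATLAB code. It assumes each segmented character has already been padded to the same size as the templates, and it uses the ordinary 2-D correlation coefficient as the similarity measure; the names correlation, classify, and templates are hypothetical.

```python
import math


def correlation(a, b):
    """Correlation coefficient between two equal-sized images (lists of rows)."""
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0   # flat images carry no shape information


def classify(glyph, templates):
    """Return the label of the template that correlates best with the glyph."""
    return max(templates, key=lambda label: correlation(glyph, templates[label]))


# Two toy 3x3 templates and a slightly noisy "I".
templates = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}
noisy_i = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]
print(classify(noisy_i, templates))  # prints I
```

With a fixed font face and size, every occurrence of a character looks nearly identical to its template, which is consistent with the high break rate reported for this classifier.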
Paper
The final paper including MATLAB source code, sample runs, and results can be downloaded here.
Presentation
A copy of the slides used for a presentation of this experiment can be downloaded here.
Data
The 20 training and 100 testing PayPal CAPTCHA images are available to download here.
Source Code
Complete MATLAB code (281 lines, well commented) for preprocessing, segmenting, and classifying the images is available here.
As part of my Artificial Intelligence course, we developed a rule-based expert system that can autonomously govern a building’s environment to optimize user comfort and energy consumption, whilst providing safety and monitoring functions. The expert system has been developed using the Java programming language and the Java Expert System Shell (JESS). Rules are stored as an external resource and can be modified in real time without requiring a rebuild of the entire project. Write-up 1 includes the problem description, design considerations, and implementation details. Write-up 2 includes testing results and a comparison to another system.
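The idea of keeping rules outside the program can be illustrated with a small Python sketch. This is not JESS and not the project's code: the rule syntax, the sensor names, and the single-pass evaluation (no chaining between rules) are all assumptions made purely for illustration.

```python
import operator

# Comparison operators allowed in the hypothetical rule syntax.
OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq}


def parse_rules(text):
    """Parse lines such as 'IF temperature > 26 THEN cooling = on'.

    Blank lines and lines starting with '#' are ignored.
    """
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cond, action = line[len("IF "):].split(" THEN ")
        name, op, value = cond.split()
        target, _, setting = action.split()
        rules.append((name, OPS[op], float(value), target, setting))
    return rules


def evaluate(rules, readings):
    """Fire every rule whose condition holds; later rules override earlier ones."""
    actions = {}
    for name, op, value, target, setting in rules:
        if name in readings and op(readings[name], value):
            actions[target] = setting
    return actions


# In a real system this text would be read from an external file on each cycle.
RULES = """
# comfort and safety rules (illustrative only)
IF temperature > 26 THEN cooling = on
IF smoke > 0 THEN alarm = on
"""
print(evaluate(parse_rules(RULES), {"temperature": 28.0, "smoke": 0.0}))
# prints {'cooling': 'on'}
```

Because the rules are plain text parsed at run time, editing the rule source changes the behavior without touching or rebuilding the program itself.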
For my Computer Vision course project, I implemented the seam carving technique by Shai Avidan of Mitsubishi Electric Research Labs (MERL) and Ariel Shamir of The Interdisciplinary Center and MERL. My final paper, presentation, and code for my seam carving project are now available.
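For readers unfamiliar with seam carving, the core of the technique can be sketched in a few lines of Python. This is a simplified illustration rather than the code from my project: it works on a grayscale image stored as a list of rows, uses a basic gradient-magnitude energy, and removes a single vertical seam; all function names are hypothetical.

```python
def energy(gray):
    """Gradient-magnitude energy |dx| + |dy|, with neighbors clamped at borders."""
    h, w = len(gray), len(gray[0])
    return [[abs(gray[y][min(x + 1, w - 1)] - gray[y][max(x - 1, 0)])
             + abs(gray[min(y + 1, h - 1)][x] - gray[max(y - 1, 0)][x])
             for x in range(w)] for y in range(h)]


def find_vertical_seam(e):
    """Return seam[y] = column of the minimum-energy connected vertical seam."""
    h, w = len(e), len(e[0])
    cost = [row[:] for row in e]
    # Dynamic programming: cheapest path to each pixel from the top row.
    for y in range(1, h):
        for x in range(w):
            lo, hi = max(x - 1, 0), min(x + 1, w - 1)
            cost[y][x] += min(cost[y - 1][lo:hi + 1])
    # Backtrack from the cheapest pixel in the bottom row.
    x = min(range(w), key=lambda c: cost[h - 1][c])
    seam = [x]
    for y in range(h - 1, 0, -1):
        lo, hi = max(x - 1, 0), min(x + 1, w - 1)
        x = min(range(lo, hi + 1), key=lambda c: cost[y - 1][c])
        seam.append(x)
    seam.reverse()
    return seam


def remove_vertical_seam(gray, seam):
    """Delete one pixel per row, narrowing the image by one column."""
    return [row[:x] + row[x + 1:] for row, x in zip(gray, seam)]


# Usage: shrink a small image by one column.
image = [[0, 0, 9], [0, 0, 9]]
seam = find_vertical_seam(energy(image))
narrower = remove_vertical_seam(image, seam)
```

Repeating the three steps removes one low-energy seam at a time, so the image narrows while high-contrast content is preserved as far as possible.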
I developed a mIRC script to download, parse, and display news headlines from major news websites. It was essentially a rudimentary RSS news reader. It has been downloaded approximately 3400 times to date, and it can be downloaded here.
I recently graduated from the Rochester Institute of Technology where I received both a BS and MS in Computer Science. My research interests are CAPTCHAs, web security, pattern recognition, and machine learning. My MS thesis is on Video CAPTCHAs, where instead of transcribing a distorted string of text, users are asked to tag a video. I am currently a Software Engineer in Test working at Google in Manhattan.