Google Translate in Beta (for a reason)

Posted by Kurt on August 24th, 2008

The Google Translate service is quite useful. However, I just ran into this little bug when playing around with it. If you submit a chunk of English text, ask it to detect the language, and then translate it to English, it brings up a warning saying that they are “not yet able to translate from English to English”. Whoops :) I guess it’s in Beta for a reason.


Video CAPTCHA Demo

Posted by Kurt on August 22nd, 2008

If you haven’t already, check out the Video CAPTCHA demo at:

http://sudbury.cs.rit.edu/

There’s no entrance or exit survey, and you can quit at any time (don’t feel obligated to finish all 22 videos). Enjoy!

Evaluating the Usability and Security of a Video CAPTCHA

Posted by Kurt on August 18th, 2008

I just scheduled the time and location for my thesis defense. Everyone is welcome to come, watch, and try to stump me with questions. Hope to see you there! -Kurt

Thesis Statement

One can increase usability while maintaining security in a video CAPTCHA by intelligently extending the set of user-supplied and ground truth tags.

Abstract

A CAPTCHA is a variation of the Turing test, in which a challenge is used to distinguish humans from computers (“bots”) on the internet. CAPTCHAs are commonly used to prevent the abuse of online services. They discriminate using hard artificial intelligence problems: the most common type requires a user to transcribe distorted characters displayed within a noisy image. Unfortunately, many users find them frustrating, and break rates as high as 60% have been reported (for Microsoft’s Hotmail).

We present a new CAPTCHA in which users provide three words (“tags”) that describe a video. A challenge is passed if a user’s tag belongs to a set of automatically generated ground-truth tags. In an experiment, we were able to increase human pass rates for our video CAPTCHAs from 69.7% to 90.2% (184 participants over 20 videos). Under the same conditions, the pass rate for an attack submitting the three most frequent tags (estimated over 86,368 videos) remained nearly constant (5% over the 20 videos, roughly 12.9% over a separate sample of 5146 videos). Challenge videos were taken from YouTube.com. For each video, 90 tags were added from related videos to the ground-truth set; security was maintained by pruning all tags with a frequency ≥ 0.6%. Tag stemming and approximate matching were also used to increase human pass rates. Only 20.1% of participants preferred text-based CAPTCHAs, while 58.2% preferred our video-based alternative.

Finally, we demonstrate how our technique for extending the ground truth tags allows for different usability/security trade-offs, and discuss how it can be applied to other types of CAPTCHAs.
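The pass rule described in the abstract can be sketched roughly as follows. This is a minimal illustration in Python, not the thesis implementation: the function names are hypothetical, stemming and approximate matching are left out, pruning is assumed to apply to the whole extended set, and tags with no known frequency are assumed to be rare.

```python
def prune_frequent(tags, tag_frequency, threshold=0.006):
    """Drop tags whose estimated frequency is at or above the threshold (0.6%).

    Assumption: a tag missing from tag_frequency is treated as rare and kept.
    """
    return {t for t in tags if tag_frequency.get(t, 0.0) < threshold}


def build_ground_truth(video_tags, related_tags, tag_frequency, threshold=0.006):
    """Extend a video's own tags with tags from related videos, then prune."""
    extended = {t.lower() for t in video_tags} | {t.lower() for t in related_tags}
    return prune_frequent(extended, tag_frequency, threshold)


def passes_challenge(user_tags, ground_truth):
    """A challenge is passed if any submitted tag is in the ground-truth set."""
    return any(t.strip().lower() in ground_truth for t in user_tags)
```

Raising or lowering `threshold` is one way to picture the usability/security trade-off: a lower value removes more common tags, which makes a frequent-tag attack less likely to succeed but also gives humans fewer tags to match.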

Thesis Committee

Thesis Defense

Time: Thursday, August 28, 2008 at 10:00 a.m.
Location: Building 70, Room 3000

Downloads

Live Demo

http://sudbury.cs.rit.edu/

Bibtex Entry

@mastersthesis{KlueverMastersThesis,
	Title = {Evaluating the Usability and Security of a Video CAPTCHA},
	Author = {Kurt Alfred Kluever},
	School = {Rochester Institute of Technology},
	Address = {Rochester, NY, USA},
	Month = {August},
	Year = {2008}
}

My Hobby: Tossing whiteboard markers

Posted by Kurt on August 10th, 2008

In the spirit of the many xkcd comics, here is one of my (new) hobbies: tossing whiteboard markers. In the massive amounts of time I spend alone in my lab, I’ve developed a new game similar to an egg toss to entertain me while my code is executing. I’ll caution you in advance that it’s both extremely addictive and surprisingly loud. Oh yeah, and you will get some funny looks from anyone who witnesses the game.

The Game: Toss whiteboard markers at the whiteboard and try to get them to land in the ledge/trough.

My current distance record is 18 feet, mostly because that’s the width of the lab.  I might have to open my door and start tossing them from the hallway…

Video CAPTCHA Experiment

Posted by Kurt on August 7th, 2008

You are invited to try a new video-based CAPTCHA developed within the Computer Science Department at RIT. A CAPTCHA is a challenge designed to distinguish humans from computer programs (‘bots’) on the internet; they are typically implemented as a string of distorted characters which must be transcribed.

Many people find the text-based CAPTCHAs frustrating, so we have developed a video-based alternative. In our Video CAPTCHAs, a user must quickly label a video with three tags (words) describing its content.

We would appreciate it if you could help us evaluate the usability of this new approach by completing 20 Video CAPTCHAs. The experiment will only take about 15 minutes of your time. The task may be found at:

http://sudbury.cs.rit.edu/

Thank you very much for your time.

Regards,
Kurt Alfred Kluever (MS Student)
Richard Zanibbi (Supervisor)

Document and Pattern Recognition Lab
Department of Computer Science
Rochester Institute of Technology

Top Ten Most Frequent YouTube Tags

Posted by Kurt on August 5th, 2008

Over the past few months, I’ve been writing tons of code to perform analysis on YouTube tags. Luckily, they have a great API available in Java, .NET, Python, and PHP. I needed to obtain estimated tag frequencies on videos, but I couldn’t find this data available anywhere else online.  I ended up having to calculate them myself, so I thought I’d share the results:

  1. music, 1.78%
  2. video, 1.75%
  3. funny, 1.63%
  4. rock, 1.28%
  5. de, 1.11%
  6. dance, 0.95%
  7. film, 0.88%
  8. 2008, 0.82%
  9. live, 0.78%
  10. 2007, 0.77%

The list of tags to query was obtained by merging a standard Linux dictionary and the set of tags discovered during a random walk of the YouTube graph. The API was then queried to determine the estimated number of videos which contain the given tag. The above list is only a partial list, but the full list is available upon request.
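For anyone who wants to reproduce a ranking like this from their own counts, here is a minimal Python sketch. It assumes you already have an estimated video count per tag and an estimate of the total number of videos; the function name and both inputs are hypothetical, and no API calls are shown.

```python
def top_tags(video_counts, total_videos, k=10):
    """Return the k most frequent tags as (tag, percent_of_videos) pairs.

    video_counts maps each tag to the estimated number of videos that
    contain it. Ties are broken alphabetically so the output is stable.
    """
    if total_videos <= 0:
        raise ValueError("total_videos must be positive")
    ranked = sorted(video_counts.items(), key=lambda item: (-item[1], item[0]))
    return [(tag, 100.0 * count / total_videos) for tag, count in ranked[:k]]
```

For example, with toy counts `{"cat": 30, "dog": 50, "bird": 20}` and a total of 1000 videos, `top_tags(counts, 1000, k=2)` ranks "dog" first at 5.0% and "cat" second at 3.0%.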

Switched to WordPress Platform

Posted by Kurt on July 15th, 2008

Well, I finally switched to a real blogging platform, namely WordPress. Since it will be a lot easier to post content, that means I should be posting more often (in theory at least). Also, I stole my theme choice from my friend’s blog @ RespectCapitalism.com. There are very few good-looking WP themes in my opinion…sorry Ben :)

Video Tagging Experiment

Posted by Kurt on July 1st, 2008

Background:

As many of you know, I am in the process of completing my MS thesis in Computer Science at RIT. My area of research is online human verification (i.e., proving that a human is behind an online request, and not an automated computer program). When completing an online form, users are often presented with a distorted string of text which they are forced to transcribe. These are known as CAPTCHAs, and exist to prevent automated programs from abusing online services (humans can read the distorted text but most computer programs cannot). My thesis idea is to create a Video CAPTCHA, where instead of transcribing a string of distorted text, users must supply an appropriate label for a short video (a challenge which computers cannot complete but humans should be able to).

What you can do for me:

I have set up an online data collection website, which will allow me to analyze how people label (tag) online videos. You will be asked to quickly tag 20 short online videos. I would greatly appreciate your help in completing the short (10-15 minutes) experiment at the following link:

http://sudbury.cs.rit.edu/

Feel free to forward this request to anyone else you know who may be interested in participating. If you have any questions, please let me know. The experiment will remain open until July 14th, 2008.

Breaking the PayPal.com CAPTCHA

Posted by Kurt on May 12th, 2008

The PayPal.com CAPTCHA suffers from several weaknesses: a fixed font face, a fixed font size, no distortions, trivial background noise, and characters that are easy to segment. In this experiment, a three-step algorithm has been developed to break the PayPal CAPTCHA. The image is preprocessed to remove noise using thresholding and a simple cleaning technique, and then segmented using vertical projections and candidate split positions. Four classification methods have been implemented: pixel counting, vertical projections, horizontal projections, and template correlations. The system was trained on a sample of twenty PayPal CAPTCHAs to create thirty-six training templates (one for each character: 0-9 and A-Z). A separate sample of 100 PayPal CAPTCHAs was used for testing. The following success rates have been achieved using the different classifiers: pixel counting 8%, vertical projections 97%, horizontal projections 100%, and template correlations 100%. Three of the trained classifiers outperform the 88% success rate of Pwntcha.
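To give a feel for the thresholding and vertical-projection steps, here is a rough Python sketch. It is not the MATLAB code from the paper: the function names and the cutoff of 128 are assumptions, the extra cleaning pass is omitted, and the segmentation simply splits at empty columns rather than evaluating candidate split positions. Images are plain lists of rows of grey values (0 = black, 255 = white).

```python
def threshold(gray, cutoff=128):
    """Binarize a greyscale image: 1 = ink (dark pixel), 0 = background."""
    return [[1 if px < cutoff else 0 for px in row] for row in gray]


def vertical_projection(binary):
    """Count the ink pixels in each column of a binary image."""
    return [sum(col) for col in zip(*binary)]


def segment_columns(binary):
    """Split the image at empty columns.

    Returns a list of (start, end) column ranges, end exclusive, one per
    run of consecutive columns that contain at least one ink pixel.
    """
    spans, start = [], None
    for x, count in enumerate(vertical_projection(binary)):
        if count > 0 and start is None:
            start = x                      # a character begins here
        elif count == 0 and start is not None:
            spans.append((start, x))       # the character ended at x - 1
            start = None
    if start is not None:                  # character touching the right edge
        spans.append((start, len(binary[0])))
    return spans
```

Splitting at empty columns only works when characters never touch, which is part of why a CAPTCHA with no distortion and trivial noise is so easy to segment.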

Example

Preprocess

  1. Original:
  2. Grey Scale:
  3. Thresholding:
  4. Further Cleaning:

Segment

  1. Segmented:
  2. Padded:

Classify

  • Pixel Counting: 8% Break Rate
  • Vertical Projections: 97% Break Rate
  • Horizontal Projections: 100% Break Rate
  • Template Correlations: 100% Break Rate
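As a rough idea of how a projection-based classifier can work, here is a short Python sketch of matching by horizontal projections. The names are hypothetical and the squared-distance comparison is an assumption; the post does not say how the paper compares profiles. Glyphs are binary images (1 = ink) that have already been padded to the same height as the templates.

```python
def horizontal_projection(glyph):
    """Count the ink pixels in each row of a binary glyph."""
    return [sum(row) for row in glyph]


def classify(glyph, templates):
    """Return the label whose stored profile is closest to the glyph's.

    templates maps a label (e.g. "A") to the horizontal projection of a
    training glyph. Closeness is the sum of squared row differences,
    which is an assumption made for this sketch.
    """
    profile = horizontal_projection(glyph)

    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(profile, templates[label]))

    return min(templates, key=distance)
```

With a fixed font face and size, every instance of a character produces nearly the same profile, so even a nearest-profile lookup like this has very little to get wrong.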

Paper

The final paper including MATLAB source code, sample runs, and results can be downloaded here or from the RIT Digital Media Library.

Presentation

A copy of the slides used for a presentation of this experiment can be downloaded here.

Data

The 20 training and 100 testing PayPal CAPTCHA images are available to download here.

Source Code

Complete MATLAB code (281 lines, well commented) for preprocessing, segmenting, and classifying the images is available here.

YouTube Video

Note that this video wasn’t created by me. Skip forward to approximately the 1 minute mark.

GCCIS Welcomes 1st Graders from Canandaigua

Posted by Kurt on May 5th, 2008

RIT hosted the Golisano College Kids of 2023 for an activity inspired by CS Unplugged. We had a ton of fun with the 26 first graders from Canandaigua Primary School and even taught them how to convert to and from binary!  There’s a short blurb about it in the GCCIS Women in Computing 2007/2008 Year in Review.  The class was celebrating their internationally award-winning video that promotes women in technology.  You can watch the video below:


Copyright © 2008 kloover.com. All rights reserved.
**This is my personal blog. The views expressed on these pages are mine alone and not those of my employer.**