Google Translate in Beta (for a reason)

Posted by Kurt on August 24th, 2008

The Google Translate service is quite useful. However, I just ran into this little bug while playing around with it. If you submit a chunk of English text, ask it to detect the language, and then translate it to English, it brings up a warning saying that they are “not yet able to translate from English to English”. Whoops :) I guess it’s in Beta for a reason.

Video CAPTCHA Demo

Posted by Kurt on August 22nd, 2008

If you haven’t already, check out the Video CAPTCHA demo at:

http://sudbury.cs.rit.edu/

There’s no entrance or exit survey, and you can quit at any time (don’t feel obligated to finish all 22 videos). Enjoy!

Evaluating the Usability and Security of a Video CAPTCHA

Posted by Kurt on August 18th, 2008

I just scheduled the time and location for my thesis defense. Everyone is welcome to come, watch, and try to stump me with questions. Hope to see you there! -Kurt

Thesis Statement

One can increase usability while maintaining security in a video CAPTCHA by intelligently extending the set of user-supplied and ground truth tags.

Abstract

A CAPTCHA is a variation of the Turing test, in which a challenge is used to distinguish humans from computers (“bots”) on the internet. CAPTCHAs are commonly used to prevent the abuse of online services. They discriminate using hard artificial intelligence problems: the most common type requires a user to transcribe distorted characters displayed within a noisy image. Unfortunately, many users find them frustrating, and break rates as high as 60% have been reported (for Microsoft’s Hotmail).

We present a new CAPTCHA in which users provide three words (“tags”) that describe a video. A challenge is passed if a user’s tag belongs to a set of automatically generated ground-truth tags. In an experiment, we were able to increase human pass rates for our video CAPTCHAs from 69.7% to 90.2% (184 participants over 20 videos). Under the same conditions, the pass rate for an attack submitting the three most frequent tags (estimated over 86,368 videos) remained nearly constant (5% over the 20 videos, roughly 12.9% over a separate sample of 5146 videos). Challenge videos were taken from YouTube.com. For each video, 90 tags were added from related videos to the ground-truth set; security was maintained by pruning all tags with a frequency ≥ 0.6%. Tag stemming and approximate matching were also used to increase human pass rates. Only 20.1% of participants preferred text-based CAPTCHAs, while 58.2% preferred our video-based alternative.

Finally, we demonstrate how our technique for extending the ground truth tags allows for different usability/security trade-offs, and discuss how it can be applied to other types of CAPTCHAs.
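The matching scheme described in the abstract can be sketched roughly as follows. This is only an illustration, not the thesis implementation: the function names, the crude suffix-stripping stand-in for stemming, the use of difflib for approximate matching, and the 0.8 similarity threshold are all assumptions. Only the 90 related tags and the 0.6% pruning cutoff come from the abstract.

```python
from difflib import SequenceMatcher


def normalize(tag):
    """Lowercase a tag and strip a common suffix (a crude stand-in for stemming)."""
    tag = tag.strip().lower()
    for suffix in ("ing", "es", "s"):
        if tag.endswith(suffix) and len(tag) - len(suffix) >= 3:
            return tag[: -len(suffix)]
    return tag


def build_ground_truth(video_tags, related_tags, tag_frequency,
                       max_related=90, prune_at=0.006):
    """Extend a video's own tags with related-video tags, then prune frequent ones.

    tag_frequency maps a lowercase tag to its estimated frequency (0.006 = 0.6%).
    Tags missing from the map are treated as rare and kept (an assumption).
    """
    candidates = list(video_tags) + list(related_tags)[:max_related]
    return {
        normalize(tag)
        for tag in candidates
        if tag_frequency.get(tag.strip().lower(), 0.0) < prune_at
    }


def is_close(a, b, threshold=0.8):
    """Approximate match: similarity ratio at or above a hypothetical threshold."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def passes(user_tags, ground_truth, threshold=0.8):
    """A challenge passes if any of the (up to three) user tags matches a ground-truth tag."""
    for tag in user_tags[:3]:
        stem = normalize(tag)
        if any(is_close(stem, truth, threshold) for truth in ground_truth):
            return True
    return False


# Made-up example data; the frequencies for the popular tags echo the blog's top-ten list.
tag_frequency = {"music": 0.0178, "video": 0.0175, "funny": 0.0163, "skateboard": 0.0001}
ground_truth = build_ground_truth(
    ["skateboard", "music"], ["ollie", "funny", "tricks"], tag_frequency
)
human_result = passes(["skatebord", "park", "boy"], ground_truth)   # typo still matches
attack_result = passes(["music", "video", "funny"], ground_truth)   # frequent tags were pruned
```

Raising `max_related` or loosening `threshold` makes the challenge easier for humans, while lowering `prune_at` removes more of the tags a frequency-based attack would guess; that is the usability/security trade-off the abstract refers to.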

Thesis Committee

Thesis Defense

Time: Thursday, August 28, 2008 at 10:00 a.m.
Location: Building 70, Room 3000

Downloads

Live Demo

http://sudbury.cs.rit.edu/

Bibtex Entry

@mastersthesis{KlueverMastersThesis,
	Title = {Evaluating the Usability and Security of a Video CAPTCHA},
	Author = {Kurt Alfred Kluever},
	School = {Rochester Institute of Technology},
	Address = {Rochester, NY, USA},
	Month = {August},
	Year = {2008}
}

My Hobby: Tossing whiteboard markers

Posted by Kurt on August 10th, 2008

In the spirit of the many xkcd comics, here is one of my (new) hobbies: tossing whiteboard markers. In the massive amounts of time I spend alone in my lab, I’ve developed a new game, similar to an egg toss, to entertain me while my code is executing. I’ll caution you in advance that it’s both extremely addictive and surprisingly loud. Oh yeah, you will get some funny looks from anyone who witnesses the game.

The Game: Toss whiteboard markers at the whiteboard and try to get them to land in the ledge/trough.

My current distance record is 18 feet, mostly because that’s the width of the lab.  I might have to open my door and start tossing them from the hallway…

Video CAPTCHA Experiment

Posted by Kurt on August 7th, 2008

You are invited to try a new video-based CAPTCHA developed within the Computer Science Department at RIT. A CAPTCHA is a challenge designed to distinguish humans from computer programs (‘bots’) on the internet; they are typically implemented as a string of distorted characters which must be transcribed.

Many people find the text-based CAPTCHAs frustrating, so we have developed a video-based alternative. In our Video CAPTCHAs, a user must quickly label a video with three tags (words) describing its content.

We would appreciate it if you could help us evaluate the usability of this new approach by completing 20 Video CAPTCHAs. The experiment will only take about 15 minutes of your time. The task may be found at:

http://sudbury.cs.rit.edu/

Thank you very much for your time.

Regards,
Kurt Alfred Kluever (MS Student)
Richard Zanibbi (Supervisor)

Document and Pattern Recognition Lab
Department of Computer Science
Rochester Institute of Technology

Top Ten Most Frequent YouTube Tags

Posted by Kurt on August 5th, 2008

Over the past few months, I’ve been writing tons of code to perform analysis on YouTube tags. Luckily, they have a great API available in Java, .NET, Python, and PHP. I needed to obtain estimated tag frequencies on videos, but I couldn’t find this data available anywhere else online.  I ended up having to calculate them myself, so I thought I’d share the results:

  1. music, 1.78%
  2. video, 1.75%
  3. funny, 1.63%
  4. rock, 1.28%
  5. de, 1.11%
  6. dance, 0.95%
  7. film, 0.88%
  8. 2008, 0.82%
  9. live, 0.78%
  10. 2007, 0.77%

The list of tags to query was obtained by merging a standard Linux dictionary and the set of tags discovered during a random walk of the YouTube graph. The API was then queried to determine the estimated number of videos which contain the given tag. The list above is only partial; the full list is available upon request.
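For anyone who wants to do something similar, the ranking step might look roughly like the sketch below. It is not the code used for the numbers above: `estimate_count` is a hypothetical stand-in for whatever API query returns the estimated number of videos for a tag, and dividing by a `total_videos` figure is an assumption about how the percentages are normalized.

```python
def top_tags(dictionary_words, crawled_tags, estimate_count, total_videos, k=10):
    """Merge two tag sources, estimate each tag's frequency, and return the top k.

    estimate_count: caller-supplied function (hypothetical) mapping a tag to an
                    estimated number of videos carrying that tag.
    total_videos:   assumed normalizer used to turn counts into frequencies.
    """
    # Merge the dictionary words and the tags found during the crawl, ignoring case.
    candidates = {w.strip().lower() for w in dictionary_words}
    candidates |= {t.strip().lower() for t in crawled_tags}
    candidates.discard("")

    # Convert each estimated count into a fraction of all videos.
    frequencies = {tag: estimate_count(tag) / total_videos for tag in candidates}

    # Sort by descending frequency, breaking ties alphabetically.
    ranked = sorted(frequencies.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Made-up counts standing in for real API responses.
fake_counts = {"music": 178, "video": 175, "funny": 163}
ranking = top_tags(
    ["Music", "zebra"],
    ["funny", "video", "music"],
    lambda tag: fake_counts.get(tag, 0),
    total_videos=10000,
    k=3,
)
for tag, freq in ranking:
    print(f"{tag}, {freq:.2%}")
```

Passing the count lookup in as a function keeps the ranking logic separate from the API client, so the same code works with cached counts or a test stub.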


Copyright © 2008 kloover.com. All rights reserved.
**This is my personal blog. The views expressed on these pages are mine alone and not those of my employer.**