Over the past few months, I’ve been writing a lot of code to analyze YouTube tags. Luckily, YouTube has a great API available in Java, .NET, Python, and PHP. I needed estimated tag frequencies for videos, but I couldn’t find this data anywhere else online. I ended up calculating them myself, so I thought I’d share the results:

  1. music, 1.78%
  2. video, 1.75%
  3. funny, 1.63%
  4. rock, 1.28%
  5. de, 1.11%
  6. dance, 0.95%
  7. film, 0.88%
  8. 2008, 0.82%
  9. live, 0.78%
  10. 2007, 0.77%

The list of tags to query was obtained by merging a standard Linux dictionary with the set of tags discovered during a random walk of the YouTube graph. For each tag, the API was then queried for the estimated number of videos containing that tag. The list above shows only the top ten entries; the full list is available upon request.
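
For anyone who wants to try something similar, here is a rough Python sketch of the merge-and-rank steps. It is not the code I actually ran. The names `build_candidate_tags`, `rank_tags`, `count_videos`, and `denominator` are all made up for illustration. In particular, `count_videos` stands in for whatever API call returns the estimated number of videos for a tag, and `denominator` is whatever total you choose to normalize by; treat both as assumptions to fill in yourself.

```python
from typing import Callable, Iterable, List, Set, Tuple


def build_candidate_tags(dictionary_words: Iterable[str],
                         crawled_tags: Iterable[str]) -> Set[str]:
    """Merge both sources into one lowercase, de-duplicated set of tags."""
    merged: Set[str] = set()
    for word in list(dictionary_words) + list(crawled_tags):
        cleaned = word.strip().lower()
        if cleaned:  # skip blank lines from the dictionary file
            merged.add(cleaned)
    return merged


def rank_tags(tags: Iterable[str],
              count_videos: Callable[[str], int],
              denominator: int) -> List[Tuple[str, float]]:
    """Return (tag, percentage) pairs, most frequent first.

    count_videos is a hypothetical callable wrapping the API lookup;
    denominator is an assumed normalizing total supplied by the caller.
    """
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    percentages = {tag: 100.0 * count_videos(tag) / denominator for tag in tags}
    # Sort by descending percentage, breaking ties alphabetically.
    return sorted(percentages.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Toy placeholder counts, NOT real YouTube data.
    toy_counts = {"alpha": 50, "beta": 30, "gamma": 20}
    tags = build_candidate_tags(["Alpha", "beta"], ["gamma", " alpha "])
    for tag, pct in rank_tags(tags, lambda t: toy_counts.get(t, 0), 1000):
        print(f"{tag}, {pct:.2f}%")
```

Passing the lookup in as a callable keeps the sketch independent of any particular API client, so you can swap in a real query function or a cached table of counts without touching the ranking logic.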