Google adds automatic captions to YouTube

This week, Google announced the addition of automatic captioning to YouTube. Captions can now be machine-generated by Google’s built-in automatic speech recognition (ASR) technology. This is a huge step toward making video content more accessible and more discoverable.

When I first came to MPOW, our group was piloting search technology that paired a video in RealPlayer side-by-side with a text transcript and then highlighted the words as the video played. It worked, but there was a significant bottleneck at the transcription stage.

To prepare the transcripts, we had to ship our videos on DVD to the vendor (who shall go unnamed), and the transcription then took weeks or even months. The long-awaited end result was an XML document with each word and paragraph timecode-tagged. They told us they were using sophisticated automatic voice recognition technology and that the results then had to be proofread by humans. My suspicion is that most of the transcribing was people-powered. In any case, it was expensive and time-consuming, to the point where the video content might be dated by the time it was finally available to the user. It just didn’t seem scalable, and I ultimately decided it wasn’t worth it.

In the past year, I’ve seen a couple of video-search platforms that do similar things, probably much better. They’re still expensive, though.

I’m eager to see how well Google’s voice-to-text extraction works. I don’t hold lofty expectations, since much of our content is full of medical terminology. If it does work, the next step would be to embed the YouTube viewer into DSpace’s XMLUI and combine the searches somehow.
