I actually recorded an audio blog post about a month ago on the topic of indexing and searching audio files, but due to an equipment malfunction, all was lost, and I didn’t feel like trying to record it again. So I’ll write up what I had. (Oh, it was somewhat in response to Maciej Ceglowski’s Audioblogging Manifesto.)
Let’s go forward, my friends. Forward, into the future! Imagine Google in 2019. I know, Google will be long gone by then, but so as not to totally blow your mind, we’ll use Google as an example. Google has different operators you can use when searching, which let you look in the URL, or the title, or whatever. Now imagine something like ‘inaudio’ or ‘invideo’, where you could find words contained within audio or video files. But how would Google (or its replacement) index words in audio files? Well, with some massive cluster of Linux computers of course, and some really smart software by really smart people. (Insert bit about advanced speech recognition here.)
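To make the ‘inaudio’ idea a little more concrete, here is a rough sketch in Python of what the indexing side might look like once the speech recognition part is solved. Everything in it is an assumption for illustration: the file names, the timestamped words, and the `inaudio:` query syntax are made up, and a real search engine would do far more than this.

```python
from collections import defaultdict

# Hypothetical transcripts: (start time in seconds, word) pairs that a speech
# recognizer might emit for each audio file. File names and words are made up.
TRANSCRIPTS = {
    "post-001.mp3": [(0.0, "indexing"), (0.8, "audio"), (1.4, "files")],
    "post-002.mp3": [(0.0, "searching"), (0.9, "video"), (1.5, "files")],
}


def build_index(transcripts):
    """Map each lowercased word to the (file, timestamp) spots where it is spoken."""
    index = defaultdict(list)
    for filename, words in transcripts.items():
        for start, word in words:
            index[word.lower()].append((filename, start))
    return index


def search(index, query):
    """Handle a query like 'inaudio:files' (the operator name is an assumption)."""
    operator, _, term = query.partition(":")
    if operator != "inaudio" or not term:
        raise ValueError("only the hypothetical 'inaudio:<word>' form is supported")
    # .get() avoids creating empty entries for words that were never spoken.
    return index.get(term.lower(), [])


index = build_index(TRANSCRIPTS)
print(search(index, "inaudio:files"))
# [('post-001.mp3', 1.4), ('post-002.mp3', 1.5)]
```

Keeping the timestamp next to each word means a result could jump straight to the moment the word is spoken, instead of just pointing at the file.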
While Maciej probably wrote up his manifesto and then recorded it, most people will go the other way: first recording something, and then not typing it up. Yes, I said not typing it up. You shouldn’t have to, right? So that gives us these audio blobs (blobs, not blogs; ever notice how similar those words are?), containers that we can’t quite see into. Give it time. I’m saying 15 years because it’s a total guess, but if I look back at the world of computing and the internet of 15 years ago, I think we’ve come a long way.
Oh, here’s another idea: what if the software or device you used to record yourself was able to do the speech recognition and embed the resulting text in the file? Or recognize the background music or sounds and make note of them? Or add links whenever you mention a web site or URL?
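The link part of that idea is the easiest to picture today. Below is a minimal sketch, assuming the recognizer has already turned speech into text and writes addresses in their usual written form (a big assumption). The function names and the record layout are hypothetical, and the record is just a plain dictionary rather than anything actually embedded in the audio file.

```python
import json
import re

# Matches bare domains (example.com) or http(s) URLs in recognized text.
URL_PATTERN = re.compile(
    r"\b(?:https?://)?(?:[a-z0-9-]+\.)+[a-z]{2,}(?:/[^\s,;]*)?",
    re.IGNORECASE,
)


def find_links(transcript):
    """Return the unique addresses mentioned in the transcript, in order."""
    links = []
    for match in URL_PATTERN.finditer(transcript):
        url = match.group(0).rstrip(".")  # drop a sentence-ending period
        if not url.lower().startswith(("http://", "https://")):
            url = "http://" + url
        if url not in links:
            links.append(url)
    return links


def build_record(audio_filename, transcript):
    """Bundle the transcript and its links into one record (hypothetical layout)."""
    return {
        "file": audio_filename,
        "transcript": transcript,
        "links": find_links(transcript),
    }


spoken = (
    "Today I am talking about example.com and also "
    "http://example.org/audio, which you should visit."
)
record = build_record("post-001.mp3", spoken)
print(json.dumps(record["links"]))
# ["http://example.com", "http://example.org/audio"]
```

A recording tool could write a record like this next to the audio, or into whatever metadata container a future format provides.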
Today’s MP3s hold metadata about the audio, but tomorrow’s format may hold the audio, video, and hypertext versions of something, all combined into one blob, fully indexable, and readable even on an old 2010 PowerMac G9.
So while rich media search is happening today (see Singingfish), indexing of the actual content, not just the metadata, is going to happen. I’m sure of it. It’s all just a matter of time.