Categories
Uncategorized

Indexing the Audioblog (Manifesto?)

I actually recorded an audio blog post about a month ago on the topic of indexing and searching audio files, but due to an equipment malfunction, all was lost, and I didn’t feel like trying to record it again. So I’ll write up what I had. (Oh, it was somewhat in response to Maciej Ceglowski’s Audioblogging Manifesto.)

Let’s go forward, my friends. Forward, into the future! Imagine Google in 2019. I know, Google will be long gone by then, but so as not to totally blow your mind, we’ll use Google as an example. Google has different operators you can use when searching, which let you look in the URL, or the title, or whatever. Now imagine something like ‘inaudio’ or ‘invideo’, where you could find words contained within audio or video files. But how would Google (or its replacement) index words in audio files? Well, with some massive cluster of Linux computers of course, and some really smart software by really smart people. (Insert bit about advanced speech recognition here.)
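For the curious, here is a rough sketch of what the indexing half might look like once the speech recognition has already happened. It assumes each audio file comes with a plain-text transcript, and the function names (`build_index`, `search_inaudio`) and the `inaudio:` prefix handling are made up for illustration, not anything Google actually offers.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9']+")

def build_index(transcripts):
    """Map each lowercase word to the set of file names whose transcript contains it."""
    index = defaultdict(set)
    for filename, text in transcripts.items():
        for word in WORD.findall(text.lower()):
            index[word].add(filename)
    return index

def search_inaudio(index, query):
    """Return the files whose transcripts contain every word of an 'inaudio:' query."""
    # Strip the hypothetical operator prefix, then split the rest into words.
    words = WORD.findall(query.lower().replace("inaudio:", " "))
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Transcripts here are hand-written stand-ins for speech recognition output.
transcripts = {
    "post-one.mp3": "Welcome to the audio blog about indexing",
    "post-two.mp3": "I needed more cowbell",
}
index = build_index(transcripts)
matches = search_inaudio(index, "inaudio:cowbell")
```

The hard part, turning speech into those transcripts, is exactly the bit this sketch skips.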

While Maciej probably wrote up his manifesto, and then recorded it, most people will go the other way, first recording something, and then not typing it up. Yes, I said not typing it up. You shouldn’t have to, right? So that gives us these audio blobs (blobs, not blogs – ever notice how similar those words are?), containers we can’t quite see into. Give it time. I’m saying 15 years because it’s a total guess, and if I look back at the world of computing and the internet of 15 years ago, I think we’ve come a long way.

Oh, here’s another idea: what if the software or device you used to record yourself was able to do the speech recognition, and embed the transcript in the file? Or able to recognize the background music or sounds and make note of them? Or add links based on the web sites and URLs you mention?

Today’s MP3s hold metadata about the audio, but tomorrow’s format may hold the audio, video, and hypertext versions of something, all combined into one blob, fully indexable, and readable even on an old 2010 PowerMac G9.

So while rich media search is happening today (see Singingfish), indexing the actual content, not just the metadata, is going to happen. I’m sure of it. It’s all just a matter of time.

Podcasting’s Past

I know Adam has said he’s been trying to tie the bootstrap of podcasting for the last 5 years, and I’m pleased to see the knot being formed, but I’d like to take you back a bit, to the year 1997…

Streaming media was there, and so were we! Well, we were on dialup, and there were no really affordable “portable audio players” at the time that could connect to your computer and transfer files. It was all foreign in concept. What we did have was a handheld recorder, the same one we’d been using since the 1980s, mind you. So, we would play some streaming file and capture it via the tape recorder, and listen to it later. We had timeshifted audio that was portable. Yeehaw…

I know, I could have fed the output into a stereo and done it right, but we’re hackers of the quick-n-dirty variety sometimes… We didn’t do this a lot, but we did do it.

Fast forward into 1998. (I think 1998, my memory is fuzzy…) My commute changed from being 15 minutes per day to close to 2 hours per day. I needed some audio. I needed… more cowbell! At the time you could still read Slashdot and find value in the comments. I ended up writing code to rewrite text files with more phonetically correct spellings (like changing “Mac OS X” to “Mac Oh Ess Ten”) and then had my Mac read those text files aloud while my recorder captured it all to audio tape. I did this almost daily, as long as I remembered to start the script before I went to sleep. Oh, I didn’t just listen to Slashdot, but that’s one of the memorable ones.
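The original code is long gone, but the phonetic cleanup step can be sketched in a few lines. This is a guess at the general shape, not the script I actually ran: the `SPOKEN_FORMS` table and the `to_speakable` name are invented here, and only the “Mac OS X” entry comes from the example above.

```python
import re

# Hypothetical substitution table. Only the "Mac OS X" entry is from the post;
# the others are illustrative guesses at the kind of thing that trips up a
# text-to-speech voice.
SPOKEN_FORMS = {
    "Mac OS X": "Mac Oh Ess Ten",
    "SQL": "sequel",
    "GUI": "gooey",
}

def to_speakable(text, table=SPOKEN_FORMS):
    """Replace whole-word occurrences of each table key with its spoken form."""
    # Longest keys first, so a long phrase wins over any shorter key inside it.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda match: table[match.group(0)], text)

spoken = to_speakable("Mac OS X ships with a new GUI")
```

The word-boundary check keeps “MySQL” from turning into “Mysequel”, which is about as much linguistic sophistication as a bedtime script needs.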

Around the end of 1999 Geeks In Space appeared, and good gosh, doesn’t it look like what people are doing today, minus the automated delivery via RSS?

OK, let’s move ahead to 2003, when I finally decide I can get an MP3 player for cheap, as I’m sick of making audio tapes of Your Mac Life and other internet radio-type shows. By now I’m getting sick of hitting web pages and downloading files, so I whip up some Perl to grab the new files each week. Usually the shows did a weekly broadcast, so it was just a cron job and date calculation to get the file. It worked, and for a geek, it was alright.
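The date calculation is the only mildly clever part, so here is a sketch of it in Python rather than the Perl I used at the time. Everything specific is an assumption: the `URL_TEMPLATE` pattern, the Wednesday broadcast day, and the function names are placeholders, since each show named its files differently.

```python
from datetime import date, timedelta

# Hypothetical file naming scheme; real shows each had their own.
URL_TEMPLATE = "http://example.com/shows/show-{stamp}.mp3"

def latest_broadcast(today, weekday):
    """Return the most recent date on or before `today` that falls on `weekday` (Monday is 0)."""
    days_back = (today.weekday() - weekday) % 7
    return today - timedelta(days=days_back)

def episode_url(today, weekday=2):
    """Build the download URL for the latest weekly episode (assumed to air on Wednesdays)."""
    stamp = latest_broadcast(today, weekday).strftime("%Y%m%d")
    return URL_TEMPLATE.format(stamp=stamp)

# Friday 20 June 2003: the latest Wednesday episode is from 18 June.
url = episode_url(date(2003, 6, 20))
```

A cron entry then only has to run the script once a week and hand the URL to whatever does the downloading.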

Meanwhile the RSS 2.0 enclosure thing happens, and we start to see things change. Things get easier. Things are brewing. Sure, I have to email Doug Kaye and ask him to fix his feed, and ask the LugRadio guys to create a feed (which either broke, or they stopped updating) but it was working.
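To show why enclosures made things easier, here is a minimal sketch of pulling the audio URLs out of a feed with Python’s standard library. The `SAMPLE_FEED` content and the `enclosure_urls` name are invented for illustration; the `enclosure` element with its `url`, `length`, and `type` attributes is the RSS 2.0 part.

```python
import xml.etree.ElementTree as ET

# A made-up feed with one audio item and one text-only item.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Show</title>
    <item>
      <title>Episode 12</title>
      <enclosure url="http://example.com/audio/ep12.mp3" length="12345678" type="audio/mpeg"/>
    </item>
    <item>
      <title>Text-only post</title>
    </item>
  </channel>
</rss>"""

def enclosure_urls(feed_xml):
    """Return (item title, enclosure URL) pairs for every item that carries an enclosure."""
    root = ET.fromstring(feed_xml)
    found = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is not None and enclosure.get("url"):
            found.append((item.findtext("title", default=""), enclosure.get("url")))
    return found

episodes = enclosure_urls(SAMPLE_FEED)
```

No more guessing file names from dates: the feed says where the file is, and the script just follows along.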

Along comes Adam with his iPodder thingy and manages to get others involved until we have this new thing called “podcasting” to deal with…

And it’s good…

Oh, just one more thing, Doc has been noticing the number of results for “podcasting” that Google gives, and how it’s growing. That’s a good thing, but I think what will really be cool is the day when Google stops putting: Did you mean: broadcasting on the page. ;)