Using Mozilla Data

I want more out of Mozilla. It’s got my bookmarks, and it keeps a history of URLs I’ve visited. I want that data! Here’s what I’ve hacked together thus far. bminer is an application that parses my Mozilla bookmarks.html file, grabbing the URL, title, and date added for each entry. It then shoves it all into MySQL. (If a URL already exists, we don’t bother to insert it again…) cron runs it at regular intervals. So, in effect, we’ve got a solid backup of the bookmarks that we can run queries against and, with a little CGI magic, access from elsewhere. Oh, and we could (in theory) run this on every machine we have Mozilla on to get a comprehensive list of all of our bookmarks. So much for the bookmarks. Of course, the code to do this is hacky Perl that parses an HTML file with regular expressions. Could there be a better solution? Probably, but it seems to work. I should probably clean up the code and release it.
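Here’s a rough sketch of the same idea. It is not bminer itself (that’s Perl with regular expressions); it uses Python’s standard-library HTMLParser instead of regexes, and SQLite as a stand-in for MySQL so it runs without a server. The bookmarks table, its columns, and the store_bookmarks function are all hypothetical, and it assumes the usual Netscape-style bookmarks.html where each entry looks like <DT><A HREF="..." ADD_DATE="...">Title</A>. A primary key on url plus INSERT OR IGNORE gives the “don’t insert it again” behavior.

```python
import sqlite3
from html.parser import HTMLParser


class BookmarkParser(HTMLParser):
    """Collect (url, title, add_date) for every <A HREF=...> in a bookmarks.html file."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._current = None  # (url, add_date) of the <a> currently being read
        self._title = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attrs = dict(attrs)  # HTMLParser lowercases tag and attribute names
            if "href" in attrs:
                added = attrs.get("add_date")
                self._current = (attrs["href"], int(added) if added and added.isdigit() else None)
                self._title = []

    def handle_data(self, data):
        # Only text inside an open <a> counts as the bookmark title.
        if self._current is not None:
            self._title.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            url, added = self._current
            self.entries.append((url, "".join(self._title).strip(), added))
            self._current = None


def store_bookmarks(html_text, conn):
    """Insert parsed bookmarks, skipping URLs already stored. Returns the number of new rows."""
    parser = BookmarkParser()
    parser.feed(html_text)
    parser.close()
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bookmarks ("
        "url TEXT PRIMARY KEY, title TEXT, added INTEGER)"
    )
    before = conn.total_changes
    # INSERT OR IGNORE leaves the existing row alone when the URL (primary key) is already there.
    conn.executemany("INSERT OR IGNORE INTO bookmarks VALUES (?, ?, ?)", parser.entries)
    conn.commit()
    return conn.total_changes - before
```

Run from cron against an unchanged file, a second pass inserts nothing. In MySQL the rough equivalent of INSERT OR IGNORE is INSERT IGNORE with a unique key on the URL column.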

Now, for the history file, it’s a bit more work. history.dat is in some insane format known as ‘mork db’ that McCusker came up with. Previously I was unable to find a good way to parse this beast. Luckily, jwz recently solved this problem. So next on the list is some code to store the data from history.dat in MySQL as well, with the ultimate goal of tracking where we’ve been and when. This is a little trickier than the bookmarks. Honestly, my bookmarks file might only change a few times a day, and cron already parses it more often than needed. For the history, though, we need to decide how often to parse it, and we can’t just parse the real file: we first need to copy it and convert its line endings from classic Mac line endings to Unix line endings. (Sigh. Please, please, please! This is Mac OS X; banish all classic Mac line endings!)
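A minimal sketch of that copy-then-convert step, in Python. The function name snapshot_history and the destination path are hypothetical, and it doesn’t parse Mork at all; it only makes a private copy of the file and rewrites its line endings. CRLF pairs are collapsed first so they don’t turn into blank lines, then any remaining bare CR (the classic Mac ending) becomes LF.

```python
import shutil
from pathlib import Path


def snapshot_history(src, dest):
    """Copy history.dat to dest, then rewrite the copy with Unix (LF) line endings."""
    shutil.copyfile(src, dest)  # work on a copy, never on the file Mozilla is using
    data = Path(dest).read_bytes()  # bytes, so no text decoding is assumed
    # CRLF -> LF first, then bare CR (classic Mac) -> LF.
    data = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    Path(dest).write_bytes(data)
    return dest
```

The original file is left untouched, so Mozilla never sees a half-rewritten history.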

Anyway, when complete we should have what we’re after, better tracking of our browser history. Need to find all pages with ‘perl’ in the URL that you visited last week Tuesday? Need to find a bookmark you added months ago with ‘foo’ in the title? We got it…
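As a sketch of the kind of query this would enable, assume the history ends up in a hypothetical table named history with url, title, and last_visit columns, where last_visit is in Unix seconds (the loader would have to convert whatever Mozilla actually stores). SQLite stands in for MySQL again, and tuesday_of_last_week and visits_matching are illustrative names, not part of any existing tool.

```python
import sqlite3
from datetime import date, datetime, time, timedelta


def tuesday_of_last_week(today):
    """Tuesday of the week before the one containing `today` (weeks start on Monday)."""
    this_monday = today - timedelta(days=today.weekday())
    return this_monday - timedelta(days=6)


def visits_matching(conn, word, day):
    """URLs containing `word` whose last visit fell on `day`, in local time."""
    start = datetime.combine(day, time.min).timestamp()
    end = datetime.combine(day + timedelta(days=1), time.min).timestamp()
    # SQLite's LIKE is case-insensitive for ASCII, so 'perl' also matches 'Perl'.
    rows = conn.execute(
        "SELECT url FROM history WHERE url LIKE ? AND last_visit >= ? AND last_visit < ? "
        "ORDER BY last_visit",
        (f"%{word}%", start, end),
    )
    return [url for (url,) in rows]
```

Something like visits_matching(conn, "perl", tuesday_of_last_week(date.today())) answers the first question; a LIKE on title instead of url would handle the bookmark one.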

Nobody ever said parsing, cleaning, storing, and retrieving data was easy…

(Sidenote: Keep an eye on MozWho from Surfmind’s MozWho Lab, which looks very interesting…)


Nearest Book, Page 23, 5th Sentence

Ok, it’s being done by many… the whole Nearest Book, Page 23, 5th Sentence thing, so here are the instructions, in case you missed them, followed by mine:

  1. Grab the nearest book.
  2. Open the book to page 23.
  3. Find the fifth sentence.
  4. Post the text of the sentence on your blog along with these instructions.

The sentence in question, from The Perl Cookbook:

It removes the leading whitespace from the text of the here document.

Now this is tricky, because the nearest “book” would have probably been a PDF file on my Mac, as the keyboard and mouse are the closest things to me. Heck, even the hard drive is closer than a paper-based book. Of course by that notion, I suppose any electronic book could have been considered as close. Perhaps Lessig’s Free Culture should have been the book of choice, as it’s just a click away…

The whole exercise was also tricky because the book I chose contained a code sample, so I had to determine exactly what a “sentence” consisted of. I settled on a string of words followed by a period that made sense when read. This is somewhat fitting considering the book I chose. Of course, it would have been more fitting if I had chosen the book under it, Mastering Regular Expressions, which may actually have been nearer, but I didn’t measure, so who knows.

(Hmmm, I just realized, the nearest book is actually an old QuarkXPress manual that is under my monitor, which isn’t really very accessible. Since we’re all about accessibility, we won’t even think about using that book!)

Sheesh, do I make things complicated or what?


Hey y'all!

Ya know, people from the south, I mean the South, actually type things like "y'all" – that's right, they don't just say "y'all", they actually type "y'all" in written communications.

I can't decide if this is quaint, cute, charming, or just plain goofy…


Diver Mark does Hot RSS

I tell ya, that Diver Mark cracks me up, what with his Hot RSS and what not:

I would like to applaud CNET for their courageous invention of a completely new and incompatible version of RSS. They call it dlhottitles, but I think it deserves to be named something sexy, like "Hot RSS".

(I don’t want to mention what I first read when I saw the element dlhottitles…)


Better than Word

When we last mentioned Word, we spoke of alternatives to Word. What we came to realize then is that almost any of the alternatives listed are better than using Microsoft Word under Classic on Mac OS X. Sure, I’ve still got Microsoft Office 98, which worked fine in Mac OS 9, and it still sometimes works… But more and more I’m finding that it tells me it can’t open certain Word files. When that happens, I use one of the alternatives, most often Antiword: first to take a quick peek in the terminal and then, if I have to, to create a PostScript file that ps2pdf turns into a PDF.

Damn you incompatible (with yourself) binary formats that just aren’t very open!