
Using Mozilla Data

I want more out of Mozilla. It’s got my bookmarks, and it keeps a history of URLs I’ve visited. I want that data! Here’s what I’ve hacked together thus far.

bminer is an application that parses my Mozilla bookmarks.html file, grabbing the URL, title, and date added for each entry, and then shoves it all into MySQL. (If a URL already exists, we don’t bother to insert it again…) cron runs it at regular intervals. So in effect we’ve got a solid backup of the bookmarks that we can run queries against and, with a little CGI magic, access from elsewhere. Oh, and we could (in theory) run this on every machine we have Mozilla on to get a comprehensive list of all of our bookmarks.

So much for the bookmarks. Of course, the code that does this is hacky Perl that parses an HTML file with regular expressions. Could there be a better solution? Probably, but it seems to work. I should probably clean up the code and release it.
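For the curious, here is roughly what the bookmarks step looks like. This is not bminer itself (that’s Perl talking to MySQL); it’s a minimal Python sketch of the same idea, with the standard-library sqlite3 module standing in for MySQL so it runs anywhere. The table name, column names, and function names are all made up for illustration, and the regular expressions assume the usual `<A HREF="…" ADD_DATE="…">Title</A>` layout of bookmarks.html.

```python
import re
import sqlite3

# Assumed shape of an entry: <DT><A HREF="..." ADD_DATE="1032300000" ...>Title</A>
LINK_RE = re.compile(r'<A\s+([^>]*)>(.*?)</A>', re.IGNORECASE | re.DOTALL)
HREF_RE = re.compile(r'HREF="([^"]*)"', re.IGNORECASE)
ADD_DATE_RE = re.compile(r'ADD_DATE="(\d+)"', re.IGNORECASE)


def parse_bookmarks(html):
    """Yield (url, title, added) for every link that has an HREF."""
    for attrs, title in LINK_RE.findall(html):
        href = HREF_RE.search(attrs)
        if not href:
            continue  # not a bookmark link
        added = ADD_DATE_RE.search(attrs)
        yield (
            href.group(1),
            title.strip(),
            int(added.group(1)) if added else None,
        )


def store_bookmarks(conn, entries):
    """Insert entries, skipping URLs already stored; return rows added."""
    # Hypothetical schema: the URL is the primary key, so duplicates collide.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bookmarks ("
        "url TEXT PRIMARY KEY, title TEXT, added INTEGER)"
    )
    before = conn.execute("SELECT COUNT(*) FROM bookmarks").fetchone()[0]
    # INSERT OR IGNORE is SQLite syntax; MySQL spells it INSERT IGNORE.
    conn.executemany(
        "INSERT OR IGNORE INTO bookmarks (url, title, added) VALUES (?, ?, ?)",
        entries,
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM bookmarks").fetchone()[0]
    return after - before
```

Making the URL the primary key is what lets a cron job run this as often as it likes: a second pass over an unchanged file adds nothing. It also means the first title seen for a URL wins, which may or may not be what you want.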

Now, the history file is a bit more work. history.dat is in some insane format known as ‘mork db’ that McCusker came up with. Previously I was unable to find a good way to parse this beast. Luckily, jwz recently solved this problem with mork.pl. So next on the list is some code to store the data from history.dat in MySQL as well, with the ultimate goal of tracking where we’ve been and when.

This is a little trickier than the bookmarks. Honestly, my bookmarks file might only change a few times a day, and cron already parses it more often than needed. For the history, we still need to determine how often to parse it. We also can’t just parse the real file: we need to copy it first, then convert the copy’s classic Mac line endings to unix line endings. (Sigh, please, please, please! This is Mac OS X, banish all classic Mac line endings!)
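The copy-then-convert step is simple enough to sketch. Again this is Python rather than anything I’m actually running, and the function name and the temp-directory choice are assumptions for illustration. The idea is just to leave the live history.dat untouched and hand a cleaned-up copy to whatever does the Mork parsing (mork.pl, in my case).

```python
import shutil
import tempfile
from pathlib import Path


def snapshot_history(src, dest_dir=None):
    """Copy history.dat aside and rewrite CR and CRLF line endings as LF.

    Returns the path of the converted copy; the original is never modified.
    """
    src = Path(src)
    # Hypothetical default: a fresh temp directory for each snapshot.
    dest_dir = Path(dest_dir) if dest_dir else Path(tempfile.mkdtemp(prefix="history-"))
    copy = dest_dir / src.name
    shutil.copy2(src, copy)  # work on the copy, not the file Mozilla has open

    data = copy.read_bytes()
    # Convert CRLF first so a Windows-style ending doesn't become two newlines,
    # then turn any remaining bare CR (classic Mac) into LF.
    data = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    copy.write_bytes(data)
    return copy
```

Reading the whole file into memory is fine for a history file of ordinary size; something that streams in chunks would be the thing to reach for if it ever got huge.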

Anyway, when complete we should have what we’re after: better tracking of our browser history. Need to find all pages with ‘perl’ in the URL that you visited on Tuesday of last week? Need to find a bookmark you added months ago with ‘foo’ in the title? We got it…
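To make those two questions concrete, here is a sketch of the queries, once more in Python with sqlite3 standing in for MySQL. The schema is hypothetical: a history table with url and visited (epoch seconds) columns, and a bookmarks table with url, title, and added columns. Days are treated as UTC to keep the example simple, and “last week” assumes weeks start on Monday.

```python
import sqlite3
from datetime import date, datetime, timedelta, timezone


def last_week_tuesday(today: date) -> date:
    """Tuesday of the week before the one containing `today`."""
    monday_this_week = today - timedelta(days=today.weekday())
    return monday_this_week - timedelta(days=6)


def day_bounds(day):
    """Return (start, end) epoch seconds for a calendar day, taken as UTC."""
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    return int(start.timestamp()), int((start + timedelta(days=1)).timestamp())


def visits_matching(conn, needle, day):
    """URLs containing `needle` that were visited on `day`, oldest first."""
    start, end = day_bounds(day)
    rows = conn.execute(
        "SELECT url FROM history "
        "WHERE url LIKE ? AND visited >= ? AND visited < ? "
        "ORDER BY visited",
        (f"%{needle}%", start, end),
    )
    return [row[0] for row in rows]


def bookmarks_titled(conn, needle):
    """(url, title) pairs for bookmarks whose title contains `needle`."""
    rows = conn.execute(
        "SELECT url, title FROM bookmarks WHERE title LIKE ? ORDER BY added",
        (f"%{needle}%",),
    )
    return rows.fetchall()
```

Note that `%` and `_` in the search string act as LIKE wildcards here; that’s fine for poking around in your own data, but worth escaping before this goes anywhere near a CGI form.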

Nobody ever said parsing, cleaning, storing, and retrieving data was easy…

(Sidenote: Keep an eye on MozWho from Surfmind’s MozWho Lab which looks very interesting…)