R$$ and Privacy

Tim Bray had this idea, and I must admit I’d had the same one: an RSS feed of my financial transactions. I know, it’s most likely a long way off… or is it? While driving home last night I heard a commercial promoting email alerts from a bank. They seemed to be saying you could have account information sent to you via email. Now, I don’t know what kind of information they’re sending, and I hope it’s encrypted with PGP/GPG or something, but here’s where it gets interesting: if my bank sends me email with useful data, I can easily parse that data and build it into some sort of RSS feed for my own use. I know, it’s a lot more complex than that, but it’s the start of an idea anyway…
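
Just to sketch it out: assuming a hypothetical parse_alert() that pulls the date, payee, and amount out of each saved alert email (I have no idea yet what the bank actually sends), XML::RSS from CPAN would handle the feed end of things:

  use strict;
  use XML::RSS;

  # Hypothetical parser: assumes alert lines that look like
  # "01/02/2003 GROCERY STORE $42.50".
  sub parse_alert {
      my ( $text ) = @_;
      my ( $date, $payee, $amount ) =
          $text =~ m{^(\S+)\s+(.+?)\s+(\$[\d.,]+)\s*$}m;
      return { date => $date, payee => $payee, amount => $amount };
  }

  # Hypothetical input: the saved text of each alert email.
  my @saved_emails = ();    # however the mail ends up on disk
  my @transactions = map { parse_alert( $_ ) } @saved_emails;

  my $rss = XML::RSS->new( version => '1.0' );
  $rss->channel(
      title       => 'My Transactions',
      link        => 'http://example.com/transactions/',
      description => 'A private feed of bank account activity',
  );

  for my $txn ( @transactions ) {
      $rss->add_item(
          title       => "$txn->{date}: $txn->{payee} $txn->{amount}",
          link        => 'http://example.com/transactions/',
          description => "$txn->{payee} for $txn->{amount}",
      );
  }

  $rss->save( 'transactions.rdf' );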

Which brings up another interesting issue: privacy of RSS feed subscription information. Many people share their subscription file on their sites, which is a good idea and does neat things, but when I did this I first had to delete a feed. Not because of a privacy concern, but because it was a resource the rest of the world couldn’t reach: an internal project server. So I’d propose the following to the aggregator makers: add a way for a feed to be marked as private, so that when I export my subscription file, it only contains the feeds I’m willing to make public. It would also be useful for people wishing to avoid the embarrassing “You subscribe to what feed?” question…
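
For example, an aggregator could honor a made-up attribute along these lines when writing the export (isPrivate is purely hypothetical, not part of the OPML spec, and the feeds here are placeholders):

  <opml version="1.0">
    <head>
      <title>My Subscriptions</title>
    </head>
    <body>
      <outline title="Some Public Weblog"
               xmlUrl="http://example.com/index.rss"/>
      <!-- isPrivate is a hypothetical flag; a private-aware
           aggregator would leave this one out of the export -->
      <outline title="Internal Project Server"
               xmlUrl="http://intranet.example.com/project.rss"
               isPrivate="true"/>
    </body>
  </opml>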

Site Outline!

I also found some code to build a site outline, as mentioned yesterday. I’m using WWW::SimpleRobot, and it just took a few small tweaks to the example code to get what I needed. What I’m really after is a spider I can point at a site that will show me all the URLs it can find, so I can compare that list against the site’s files (on my local filesystem) and see what doesn’t get spidered. It’s essentially a search engine robot simulator.
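
My tweaked version amounts to something like this (adapted from the module’s synopsis; the URL and follow pattern are placeholders for the real site):

  use strict;
  use WWW::SimpleRobot;

  my %seen;
  my $robot = WWW::SimpleRobot->new(
      URLS           => [ 'http://example.com/' ],
      FOLLOW_REGEX   => '^http://example\.com/',
      DEPTH          => 10,
      TRAVERSAL      => 'breadth',
      VISIT_CALLBACK => sub {
          my ( $url, $depth, $html, $links ) = @_;
          $seen{$url}++;
      },
  );
  $robot->traverse;

  # One URL per line, ready to diff against a file listing
  # from the local copy of the site.
  print "$_\n" for sort keys %seen;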

(Note: WWW::SimpleRobot does not respect the robots.txt file, so use it with care.)

md5checker

I wrote a simple Perl wrapper for my md5sum differ idea, and it works well, but it’s slow, mainly because it has to read large files across the network to checksum them. Not much I can do about that right now, but it’s a start…
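
The guts of the wrapper are roughly this (a sketch; Digest::MD5 ships with Perl, and the two paths stand in for the loaded copy on the network share and the file waiting to load):

  use strict;
  use Digest::MD5;

  sub md5_of {
      my ( $path ) = @_;
      open my $fh, '<', $path or die "can't open $path: $!";
      binmode $fh;    # images are binary
      my $digest = Digest::MD5->new->addfile( $fh )->hexdigest;
      close $fh;
      return $digest;
  }

  my ( $loaded, $waiting ) = @ARGV;
  print md5_of( $loaded ) eq md5_of( $waiting )
      ? "same image, skip the load\n"
      : "changed, load it\n";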

Site Outline?

Long ago I had a pretty simple Perl script that you could point at a URL, and it would spider the site and give you an outline. The output was something like this:

  • http://example.com/
  • http://example.com/about/
  • http://example.com/about/foo.html
  • http://example.com/contact/
  • http://example.com/help/
  • http://example.com/help/fee.html

I can’t find that code anywhere. Does anyone have something quick-n-dirty that might work? Let me know!
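
For reference, here’s roughly the sort of thing I mean, using LWP::UserAgent and HTML::LinkExtor (both on CPAN; the starting URL is a placeholder):

  use strict;
  use LWP::UserAgent;
  use HTML::LinkExtor;
  use URI;

  my $start = 'http://example.com/';
  my $ua    = LWP::UserAgent->new;
  my %seen;
  my @queue = ( $start );

  while ( my $url = shift @queue ) {
      next if $seen{$url}++;
      my $res = $ua->get( $url );
      next unless $res->is_success
              and $res->content_type eq 'text/html';

      # Collect <a href> links, resolved against the page URL.
      my $extor = HTML::LinkExtor->new( undef, $url );
      $extor->parse( $res->content );
      $extor->eof;
      for my $link ( $extor->links ) {
          my ( $tag, %attr ) = @$link;
          next unless $tag eq 'a' and $attr{href};
          my $abs = URI->new( $attr{href} )->canonical->as_string;
          $abs =~ s/#.*$//;              # drop fragments
          push @queue, $abs if $abs =~ /^\Q$start\E/;
      }
  }

  # The sorted list reads like the outline above.
  print "$_\n" for sort keys %seen;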

Idea: md5sum differ

Here’s my problem: I’ve got an application that deals with loading images, and no one thought to save any useful metadata when loading an image; they just save the name of the image, so on occasion an image gets loaded again after it’s already been loaded. That might be fine, because the image might have been edited in some way, but if it didn’t change we waste time loading it again. What to do? I suppose the right way would be to actually save proper metadata for each image, but my short-term solution might be this: compare the md5sum of the already-loaded image with that of the image waiting to be loaded; if they’re the same, don’t load it, just discard it. Do you see any problems with this idea? (In theory the md5sum gives a fingerprint of the file that should be unique, so if it changes, the image changed.) I won’t be able to do it on the fly, since these are large files and md5sum isn’t fast enough for that, but I can preprocess the list of files waiting to be loaded…
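
That preprocessing pass might look something like this (a sketch: the two glob patterns are hypothetical stand-ins for however the application tracks loaded versus waiting images):

  use strict;
  use Digest::MD5;

  sub md5_of {
      my ( $path ) = @_;
      open my $fh, '<', $path or die "can't open $path: $!";
      binmode $fh;    # images are binary
      return Digest::MD5->new->addfile( $fh )->hexdigest;
  }

  # Hypothetical locations: wherever the app keeps what it has
  # already loaded, and what is waiting to be loaded.
  my %loaded  = map { md5_of( $_ ) => 1 } glob '/loaded/images/*';
  my @waiting = glob '/incoming/images/*';

  # Keep only files whose content we have not seen before.
  my @to_load = grep { !$loaded{ md5_of( $_ ) } } @waiting;

  print "loading: $_\n" for @to_load;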

Just off the top of my head, I think the metadata I would save is:

  • name
  • width
  • height
  • date created
  • date modified
  • file size
  • md5sum

I’m sure there are other bits as well, but that’s my quick list.
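
Gathering most of those is straightforward in Perl; a sketch (Image::Size is on CPAN, and note that Unix stat() has no real creation time, so ctime stands in):

  use strict;
  use Digest::MD5;
  use Image::Size;
  use File::Basename;

  sub image_metadata {
      my ( $path ) = @_;
      my ( $width, $height ) = imgsize( $path );
      my ( $size, $mtime, $ctime ) = ( stat $path )[ 7, 9, 10 ];

      open my $fh, '<', $path or die "can't open $path: $!";
      binmode $fh;
      my $md5 = Digest::MD5->new->addfile( $fh )->hexdigest;

      return {
          name          => basename( $path ),
          width         => $width,
          height        => $height,
          date_created  => $ctime,   # really inode change time on Unix
          date_modified => $mtime,
          file_size     => $size,
          md5sum        => $md5,
      };
  }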