I also found some code to build a site outline, as mentioned yesterday. I’m using WWW::SimpleRobot; it took only a few small tweaks to its example to get what I needed. What I’m really after is a spider I can point at a site that will show me every URL it can find, so I can compare that list against the site’s files on my local filesystem and see what doesn’t get spidered. In effect, it’s a search engine robot simulator.
(Note: WWW::SimpleRobot does not respect the robots.txt file, so use it with care.)
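For reference, a minimal sketch of that kind of spider, based on WWW::SimpleRobot’s documented interface — the start URL and depth here are placeholders, and the callback signature follows the module’s synopsis, so check your installed version’s docs before relying on it:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::SimpleRobot;

# Placeholder start URL -- substitute your own site.
my $start = 'http://www.example.com/';

my $robot = WWW::SimpleRobot->new(
    URLS           => [ $start ],
    FOLLOW_REGEX   => "^$start",    # stay within the site
    DEPTH          => 2,
    TRAVERSAL      => 'depth',
    VISIT_CALLBACK => sub {
        my ( $url, $depth, $html, $links ) = @_;
        print "$url\n";             # report each URL as it is visited
    },
);

$robot->traverse;

# After traversal, $robot->urls holds every URL visited,
# ready to compare against a local file listing.
print scalar( @{ $robot->urls } ), " URLs found\n";
```

Piping the output through `sort` and diffing it against a sorted listing of the site’s local files is one simple way to spot the pages that never get spidered.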