Feed Checking

Dave is getting closer… He’s released a new service for people who want to be sure their feeds are in good shape.

I say getting closer because it’s a step in the right direction. It’s aimed toward people who want to make sure the feeds they have online are valid. That’s a good thing.

I went a step further. The software used to publish this site attempts to validate the feeds produced at publish time. Meaning, I create an entry, the feeds are produced, and then they are checked for validity. If I screw up and write some invalid markup, I know it right away, before the feeds ever get published.

Here’s an example: Let’s say I accidentally paste in some HTML and end up with an extraneous </a> in my text. When I render out my pages, I’d see an error like this: line 37, column 149: XML Parsing error: <unknown>:37:149: mismatched tag.
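
To make that concrete, here is a minimal sketch of a well-formedness check, assuming Python and its standard xml.sax module. The function name check_well_formed and the sample strings are made up for illustration; this is not the actual code behind this site.

```python
import xml.sax


def check_well_formed(xml_text):
    """Return None if xml_text is well-formed XML, else a short error string.

    Hypothetical helper for illustration only.
    """
    try:
        # A bare ContentHandler ignores all content; we only care whether
        # the parser gets through the document without raising.
        xml.sax.parseString(xml_text.encode("utf-8"), xml.sax.ContentHandler())
    except xml.sax.SAXParseException as err:
        return "line %d, column %d: XML Parsing error: %s" % (
            err.getLineNumber(),
            err.getColumnNumber(),
            err.getMessage(),
        )
    return None


if __name__ == "__main__":
    # Sample entry with a stray </a> pasted in, as in the example above.
    broken = "<entry><p>Some pasted text</a> and more.</p></entry>"
    print(check_well_formed(broken))
```

The check only looks at well-formedness. Validating against the rules of a particular feed format is a separate job, and one better left to a real feed validator.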

Now, here’s the interesting part. The feedvalidator doesn’t see any problems with the RSS 2.0 feed and reports ‘No errors or warnings’ for it; the Atom feed is the one that returns the error. Since I screwed up and created invalid markup, I’d expect an error from both. You can make your own call here as to whether having an unmatched </a> is screwed up or not. (Add stuff about XML, strictness, etc. if you wish.)

Oh, we also run the HTML output through Tidy to check for well-formedness, and lo and behold that stray </a> is reported by Tidy as: line 117 column 148 - Error: unexpected </a> in <p>.

Now, none of this means that errors won’t slip through the cracks every now and then, but it makes it that much harder for them to get through. Since we’re calling existing applications and libraries, the whole process of adding these checks took very little time. The majority of the time was spent installing the software and figuring out how it worked. There are also fewer than a dozen lines of code to actually do the checks and report the errors. I know that this can’t be built into every publishing system quite so easily, but it is getting easier to do these sorts of things every day.
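
For a sense of how small the check-before-publish step can be, here is a rough sketch, again assuming Python and only its standard library. The names xml_errors and publish, the write callback, and the file names are all hypothetical; a real setup would plug in a feed validator and Tidy where this one only tests for well-formed XML.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def xml_errors(text):
    """Return a list of error strings for text (empty if it is well-formed)."""
    try:
        minidom.parseString(text)
    except ExpatError as err:
        return [str(err)]
    return []


def publish(rendered, checks, write):
    """Run every check first; write the files only if nothing complained.

    rendered: dict mapping file name -> rendered text
    checks:   dict mapping file name -> function(text) returning error strings
    write:    function(name, text) that actually publishes one file
    """
    problems = []
    for name, text in rendered.items():
        check = checks.get(name)
        if check is None:
            continue  # no check registered for this file
        for message in check(text):
            problems.append("%s: %s" % (name, message))
    if problems:
        return problems  # nothing gets written
    for name, text in rendered.items():
        write(name, text)
    return []


if __name__ == "__main__":
    published = {}
    errors = publish(
        {"atom.xml": "<feed><p>stray close tag</a></p></feed>"},
        {"atom.xml": xml_errors},
        published.__setitem__,
    )
    print(errors)
    print(published)
```

The point of the design is the ordering: every check runs before anything is written, so one bad entry stops the whole publish instead of leaving a broken feed online.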