Weblogging Tools of the Future

Over at the BloggerCon site is an interesting post about visions for the next generation of blogging tools. Read through the comments for some neat ideas. Some of them are very doable today, while some of them just seem crazy or don’t make sense. That’s ok, they’re all just ideas, and they’re from users. Hopefully all the weblog tool makers get a look at this list, or are open to feature requests from their users.

As for me, I tend to build my own tools, and I use my own tools. So the feature requests have a much shorter path from user to developer. ;)


A Tale of Two Formats

I recently did some work on a site that runs on a Windows server (gasp!) using IIS (yikes!) and ASP (eek!). If you know me, you know these are not my favored technologies. Nonetheless, we had a job to do.

The job was to add syndication feeds. We first added an RSS feed. Why did we choose to add an RSS feed first instead of an Atom feed? Well, not being familiar enough with ASP, we couldn’t easily encode everything properly. In some cases data coming from SQL needed to be encoded, and our code didn’t always do the right thing. So we were faced with creating an RSS file that was occasionally invalid. We did not like this, but didn’t have another solution. We also knew that most of the RSS aggregators out there don’t mind invalid XML when it’s in the form of RSS. So, while we were not pleased with the situation, we compromised.

At this point there was still not an Atom feed. We just couldn’t allow ourselves to bring an occasionally-invalid Atom feed to life, not until we discovered Server.HTMLEncode, which took care of the nasty characters that needed encoding. Once we had this final piece in place, we felt that it was OK to implement an Atom feed, and figured it would be valid (fingers were crossed of course.)
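For anyone not stuck in ASP, here is roughly what that escaping step looks like as a minimal Python sketch. It is not the ASP code we used; `feed_text` is a made-up helper name, and the only real library call is `xml.sax.saxutils.escape`, which handles &, < and >.

```python
from xml.sax.saxutils import escape


def feed_text(value):
    """Escape &, < and > so the value is safe inside an XML element."""
    return escape(value)


# Hypothetical title, standing in for a value coming out of the database.
title = feed_text("Fish & Chips <today only>")
item = "<title>" + title + "</title>"
print(item)
```

The point is the same as with Server.HTMLEncode: escape every value on its way into the template, not just the ones you remember to worry about.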

But wait, this wasn’t the final piece. There was one more, the dates and times… See, as an old Perl hacker, I’m used to using strftime to make any date/time I need. Since I couldn’t find a strftime function in ASP, I looked to the web and found A Customizable Date Formatting Routine, which is close to strftime (well, close enough for my needs) but had some bugs. Anyway, we grabbed it, and it worked. More or less. (Ken Schaefer has pointed me to an update, though I still have to drop it in place.)

Where were we? Ah yes, the dates and times. Well, at first we just fudged the GMT which created valid feeds, but was lame. A little change in SQL eventually fixed that. Of course Atom requires a modified time, and it looks like it requires a created time, but if the created time is missing it should be considered the same as the modified time, and then there’s the time an item was issued as well, which is not required. I think.
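To make the date handling concrete, here is a small Python sketch (again, not the ASP we actually wrote). It assumes the timestamps are timezone-aware, formats one as a UTC timestamp of the kind an Atom feed wants and one in the RFC 822 style RSS uses, and applies the "missing created time means same as modified" rule as I described it above. The function names and the sample date are made up for illustration.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def atom_date(dt):
    """Format an aware datetime as a UTC timestamp, e.g. 2004-04-10T14:30:00Z."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def rss_date(dt):
    """Format an aware datetime in the RFC 822 style used by RSS, in GMT."""
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)


# Hypothetical entry: we know when it was modified, but not when it was created.
modified = datetime(2004, 4, 10, 14, 30, 0, tzinfo=timezone.utc)
created = None

# A missing created time is treated as the same as the modified time.
if created is None:
    created = modified

print(atom_date(modified))
print(atom_date(created))
print(rss_date(modified))
```

Converting to UTC in one place, right before formatting, is what replaces the "fudge the GMT" approach: the stored time keeps its real offset and only the output is normalized.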

Anyway, I’m pretty sure we’ve got it figured out now. The feeds both validate. Hopefully that will remain true. I think the dates and times even make sense now.


ASP Entities (and RSS and Atom!)

I’m starting to really like ASP… Ha! Just kidding!

On the bright side, I finally found Server.HTMLEncode, which makes creating a (more-often-than-not) valid RSS feed using ASP a little easier for me.

Who knows? Maybe I’ll attempt to use my (limited) ASP kung-foo skillz to create an Atom feed. ;)


Again with the Atom, XML, RSS, etc…

What to do with that bad XML? Aaron suggests I beat it with a stick. A stick made out of either HTML::Parser, or XML::Simple, or perhaps Tidy. At least I think that’s what he said…

As usual, Aaron is about 25 feet above me on this stuff, so it’ll take me some time to investigate his suggestions… I suppose I could just use the Universal Feed Parser, but of course it’s written in Python. I might have to make some exceptions here…

Aaron also does some crazy transformations of XHTML 1.1 to Atom, and there’s some hackery titled atom03-to-rss as well.

Speaking of atom2rss, the Feed Normalizer (as it’s actually called) is an Atom translator (to RSS) that Phillip Pearson put together.

Of course it’s written in Python… No matter. I’m starting to think this is the year I actually Dive into Python… I’ll let you know next month.


More (dirty) XML Secrets

As we all know, XML can be easily parsed with an XML parser. Right? Right… So what happens when XML is not really XML? Well, as we all know, when XML is not XML you resort to text and regular expressions. It’s one of the dirty secrets of XML. And hey, I’m not the only one who uses regex to parse XML. There’s also the speed/memory issue, but right now I’m just concerned with the not-really-XML part of it.

The Universal Feed Parser tries to use XML, and if that fails, does the regex dance.

If XML parsing fails due to well-formedness errors in the feed… …it will automatically fall back to the 2.x-style parser based on regular expressions.
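The general shape of that "XML first, regex second" approach can be sketched in a few lines of Python. This is only an illustration of the idea, not how the Universal Feed Parser actually does it; `item_titles` and the deliberately naive patterns are my own assumptions, and the fallback does not decode entities.

```python
import re
import xml.etree.ElementTree as ET

# Naive patterns for the fallback path; they assume tags without attributes.
ITEM_RE = re.compile(r"<item>(.*?)</item>", re.DOTALL)
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.DOTALL)


def item_titles(feed_text):
    """Return (titles, used_fallback) for the <item> elements of an RSS feed."""
    try:
        # First choice: treat the feed as the XML it claims to be.
        root = ET.fromstring(feed_text)
        titles = [item.findtext("title", default="") for item in root.iter("item")]
        return titles, False
    except ET.ParseError:
        # Not well-formed, so do the regex dance on the raw text instead.
        titles = []
        for item in ITEM_RE.findall(feed_text):
            match = TITLE_RE.search(item)
            titles.append(match.group(1).strip() if match else "")
        return titles, True


good = "<rss><channel><item><title>A &amp; B</title></item></channel></rss>"
bad = "<rss><channel><item><title>A & B</title></item></channel></rss>"
print(item_titles(good))
print(item_titles(bad))
```

Returning the `used_fallback` flag alongside the data lets the caller log which feeds are broken, which is handy if you ever want to nag their authors.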

If you’ve processed a form of XML commonly known as RSS, you might have run into these issues before, because there are feeds that are not well-formed, and therefore invalid, and if you want to be picky, they aren’t really XML… Perl needs a module that does the “try it as XML, and fall back on regex if it ain’t” thing. Why? Because once again I figured I could just use something like XML::DOM to deal with an RSS file, which is supposed to be XML, but when you’ve got an & instead of an &amp; in there, it all blows up. (Hmmm, perhaps we should go the other way around, create a pre-filter that takes in XML, fixes all the errors making it valid XML, and then passes it on to the XML parser! Could this be done?)
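For the single most common failure, the bare ampersand, a pre-filter along those lines might look like the Python sketch below. It is an assumption-heavy illustration, not a general XML fixer: `fix_ampersands` is a made-up name, and the pattern simply escapes any & that does not already start something shaped like an entity or character reference.

```python
import re
import xml.etree.ElementTree as ET

# Match an & that is NOT followed by "name;", "#123;" or "#x1F;".
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def fix_ampersands(text):
    """Escape bare ampersands so an XML parser will accept the text."""
    return BARE_AMP.sub("&amp;", text)


# Hypothetical broken fragment: two bare ampersands, two valid references.
broken = "<title>Fish & Chips &amp; more &#38; &copy</title>"
fixed = fix_ampersands(broken)
print(fixed)

# The repaired text now parses without blowing up.
element = ET.fromstring(fixed)
print(element.text)
```

This only covers one kind of error. It will also rewrite ampersands inside CDATA sections, and it leaves named entities such as &nbsp; alone even though a plain XML parser will still reject them, so a real pre-filter would need a good deal more than this.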

I guess I’ll blame the developers creating the software that creates the invalid XML/RSS. Want more secrets? I’m probably one of them. Most of the code that creates my RSS feeds and my Atom feed is a bunch of Perl with home-brewed templates and regular expressions… Why? Why don’t I use the proper tools? Laziness, lack of… whatever, it doesn’t matter. People are going to do it this way, and even though you would think RSS is simple and you could create valid markup, we don’t always do that. Sure, I’ve implemented feed checking into my system, as I don’t want to be a wonk that outputs garbage, but I still have to deal with the garbage out there, and damn is it frustrating.

To rephrase “Be liberal in what you accept, and conservative in what you send” I’d say: “Garbage in” is bad but “garbage out” is worse…

Is there hope? Well, there’s always hope, right? Will Atom save the day, doing what RSS can’t always do? It would be nice, but I’m just not sure… Should we rely on software that requires well-formed XML, and can fall back on plain old regular expressions if needed? I don’t know… I tend to think that’s a hack we shouldn’t need, but only time will tell…