philwilson.org

Can someone explain microformats to me?

11 May, 2005

I’ve spent a lot of time thinking about microformats, wondering what they are, creating docs and writing code to deal with them, and my eventual conclusion is: I don’t get it.

I quite genuinely don’t understand the point of microformats. Not only do I not understand why, but I don’t understand who they’re for, or how they’re supposed to be used.

I understand attention.xml slightly better. It’s a nice idea that aggregators should be able to standardize on a format for recording things like the last time a feed or post was accessed, how long was spent on it, recent access times, and user-set (dis)approval of posts. But I just don’t understand the decision to use HTML.
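To make that list of data concrete, here is a rough sketch of the kind of record an aggregator might keep per feed or post. It is only an illustration: the class name, field names and the `record_read` helper are all made up for this example and are not taken from the attention.xml format itself.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class AttentionRecord:
    """Hypothetical per-feed or per-post record; NOT the attention.xml schema."""

    url: str
    last_read: Optional[datetime] = None                       # last time it was accessed
    duration: timedelta = timedelta(0)                         # total time spent on it
    recent_reads: List[datetime] = field(default_factory=list)  # recent access times
    vote: Optional[int] = None                                 # +1 approve, -1 disapprove, None if unset

    def record_read(self, when: datetime, spent: timedelta, keep: int = 5) -> None:
        """Update the record after one visit, keeping only the `keep` latest access times."""
        self.last_read = when
        self.duration += spent
        self.recent_reads = (self.recent_reads + [when])[-keep:]


# Usage with made-up values.
record = AttentionRecord(url="http://example.org/feed")
record.record_read(datetime(2005, 5, 11, 9, 0), timedelta(minutes=3))
record.record_read(datetime(2005, 5, 11, 18, 30), timedelta(minutes=2))
record.vote = 1
```

Nothing about this structure depends on how it is serialised, which is why the choice of HTML as the carrier is the open question.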

I think this is possibly also my problem with microformats in general. If they were in an arbitrary XML language, people would be looking at things like hCard, hCalendar and hReview, decrying them as redundant and pointing at existing XML formats for these things. I don’t see why putting them in HTML makes them any more relevant. In the same way, I don’t understand why attention.xml would be expressed in HTML instead of in a strictly defined XML schema or even, Lord help us, OPML (the format, not the application).

The one benefit I can see is that they’re instantly displayable on a webpage, but quite frankly, so what? Is this the magic that I’m not getting? (Yes, I’ve read “Can your website be your API?”.) Maybe I’d understand it better if everyone were serving up completely valid XHTML and we could query webpages for inline reviews and the like using XPath, but they’re not, and this isn’t likely to ever change.
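For what it’s worth, this is roughly what the XPath idea would look like if pages really were well-formed XHTML. It is a minimal sketch using Python’s standard library, which supports only a subset of XPath. The page content is invented, and matching `class` by exact string is a simplification, since real pages often put several class names on one element.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical page: a well-formed XHTML document containing one inline review.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>Some unrelated paragraph.</p>
    <div class="hreview">
      <span class="summary">Great little venue</span>
      <span class="rating">4</span>
    </div>
  </body>
</html>"""


def find_reviews(xhtml_text):
    """Return (summary, rating) pairs for every div whose class is exactly 'hreview'."""
    root = ET.fromstring(xhtml_text)  # raises ET.ParseError if the page is not well-formed
    reviews = []
    for div in root.findall(f".//{{{XHTML_NS}}}div[@class='hreview']"):
        # findtext returns None when the child element is missing.
        summary = div.findtext(f"{{{XHTML_NS}}}span[@class='summary']")
        rating = div.findtext(f"{{{XHTML_NS}}}span[@class='rating']")
        reviews.append((summary, rating))
    return reviews


print(find_reviews(PAGE))
```

The catch is the first line of `find_reviews`: a single unclosed tag anywhere on the page makes the parse fail outright, which is exactly why this only works in a world of valid XHTML.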

I understand structuredblogging.org slightly better, and I can understand both the why and the who, even though I don’t have a copy of WordPress running to actually use the plugin they provide. It also makes more sense to me. This is how I’d serve up my data: as XML alongside, or embedded within, my webpage.


Comments

l.m.orchard
11 May, 2005 at 16:11

Well, from my understanding, this is all about laziness and worse-is-better.

Going to write about an upcoming concert in your blog? You’re probably going to mention the name of the band, so why not mark it up with a [span class=”summary”/] while you’re at it? And there’s probably going to be a mention of time, so why not throw in an [abbr title=”20050511…” /] along the way? Linking to the venue website? Slap a class=”url” on that link. Oh, and surround the bit that talks about the event with a span or a div with a class=”vevent”.

There. You’ve written what you were going to write about anyway, but now you’ve thrown in a few not-so-painful cues for a machine to understand it. That’s a microformat: an agreed-upon way to mark up, in HTML, what you were going to write about anyway.

Ideally, you *should* be publishing in valid XHTML. But, if you’re not, there are tag soup parsers out there that can get some use out of old-school HTML crud.
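As a rough illustration of both points, here is a sketch of pulling the event back out of a post that uses those class names, even when the surrounding HTML is sloppy. It uses Python’s lenient standard-library HTML parser. The post text, band name, date value and URL are all invented, `class="dtstart"` on the abbr is the hCalendar name for the start time, and the parser class itself is a hypothetical helper, not anyone’s official microformat library.

```python
from html.parser import HTMLParser


class VEventParser(HTMLParser):
    """Collect summary, start date and URL from inside the first class="vevent" element."""

    def __init__(self):
        super().__init__()
        self.event = {}
        self._container = None  # tag name of the vevent element, None while outside it
        self._depth = 0         # nesting count of same-named tags inside the container
        self._capture = None    # property whose text is currently being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self._container is None:
            if "vevent" not in classes:
                return  # ignore everything outside the event
            self._container, self._depth = tag, 1
        elif tag == self._container:
            self._depth += 1
        if "summary" in classes:
            self._capture = "summary"
        elif tag == "abbr" and "dtstart" in classes:
            self.event["dtstart"] = attrs.get("title")
        elif tag == "a" and "url" in classes:
            self.event["url"] = attrs.get("href")

    def handle_endtag(self, tag):
        self._capture = None  # simplification: any closing tag ends text capture
        if self._container is not None and tag == self._container:
            self._depth -= 1
            if self._depth == 0:
                self._container = None

    def handle_data(self, data):
        if self._capture:
            self.event[self._capture] = self.event.get(self._capture, "") + data


def extract_event(html_text):
    parser = VEventParser()
    parser.feed(html_text)
    parser.close()
    return parser.event


# Deliberately messy input: uppercase tag, unclosed paragraphs, unquoted attribute.
POST = """
<P>Unrelated intro paragraph
<div class=vevent>
  <p>I'm off to see <span class="summary">The Example Band</span>
  on <abbr class="dtstart" title="20050520T2000">20 May at 8pm</abbr>
  at <a class="url" href="http://venue.example/">the venue</a>.
</div>
<p>Trailing text with a <span class="summary">stray summary</span> outside the event.
"""

print(extract_event(POST))
```

Tracking only the container’s own tag name, rather than every open tag, is what lets the unclosed paragraphs slide by without confusing the parser.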

See, writing pristine XML isn’t lazy. And if it’s not lazy, it doesn’t happen. And, yeah, I should be using a “calendar event” construction tool or some sort of forms-based UI. But there isn’t one on my blog, where I’m going to write about this thing.

So, without a microformat, things remain opaque to machines. But, with a microformat that doesn’t demand too much of me, I just might choose to mark up my content in a way that is at least *easier* for a machine to parse, rather than being all but impossible.

It’s all about compromises to make things more available to machines in ways that people might actually use.

Pip
11 May, 2005 at 20:41

That’s a really interesting way of looking at it, which I’d have noticed were I still subscribed to your blog 😉

But to me it smells like a “boil the oceans” approach. Instead of producing RSS alongside websites, why didn’t we try and make everyone just use the same IDs and classes on their HTML elements? Oh, and surround the bit where you talk about your cat photos with a span or a div with a class=”catphoto”.

You see, to my mind, this is actually *more* work than publishing your data in another format entirely. Also, it assumes you’re handwriting your HTML, which, I’m sorry to say, just not that many people *actually* do (comparatively of course).

Also I’d actually disagree with the “you should be publishing in valid XHTML”, because I basically agree with Anne van Kesteren, and I think XHTML is a bad idea (NB: that’s a pretty short summary :), but that’s a side-issue.

You write on your blog about why not put the data in the content, but equally, why not put it in the feed? Tantek says “Does it make sense to ask everyone to write envelopes differently just because you’ve figured out a new way to write letters or new things to put in your letters?” Well, in that case, shoot all versions of RSS after Netscape’s 0.90 dead on the spot. This argument doesn’t hold water for me.

It seems to me that microformats are generating pseudo-standards (remember kids, you can extend these any time you like! That’s why things that begin with X are so cool!) in order to artificially generate million-dollar markup.

Cori Schlegel
12 May, 2005 at 10:29

I haven’t made up my mind about microformats yet, and my understanding of them is also not very deep (yet), but WRT Pip’s comment, I can’t determine if he means that the microformats are extensible or not. If that’s the message then it’s incorrect, because most of them reference the GMPG Principles statement (see the item on Interoperability) and are copyright Technorati.

Pip
12 May, 2005 at 11:18

Blimey. That’s interesting.

In which case, maybe I was just imagining all of Tantek & co’s references to a benefit of choosing XHTML being its extensibility? I don’t think so though. Very interesting.