Finding related items in your RSS datastore

Jon Udell’s latest screencast “The on-demand blogosphere” shows how he mines the data stored by his aggregator to find out who on his blogroll has made a post related to what he’s reading at a given moment, based on anything from the currently selected word in a block of text, to the URLs the text links to.

It’s all very clever, and probably quite useful if you only skim read your aggregator, or have a veritable mountain of sources to look at, but it’s clearly version 0.1 as far as this kind of thing goes (as Jon himself suggests).

For me, the next step would be to automate the searching; using either Greasemonkey or an extension to display either a sidebar or a section in-page containing the related links.

Taking Jon’s step of then searching out through the blogrolls of the people you’re subscribed to is harder by an order of difficulty, primarily because there’s no easy way of discovering exactly who they subscribe to (if they even make it public). There’s always the chance that they might have a FOAF file, but then the problem becomes one of identifying which link in their FOAF file is to a blogroll file (probably in OCS or OPML) before you’ve actually downloaded it.

I imagine there’s some way you could centralise it all by using Bloglines and associating the blogs you read with user IDs, and then harvesting the public blogrolls, but that’s just a wild guess.

Tantalisingly though, all this stuff seems just out of reach. I can’t think of any truly practical way of going about it (especially in a decentralised way, which would be the most useful). I’m sure this is the moment where I’m supposed to say that the semantic web will save us, but I really can’t think of any way it helps here, sorry 🙂

Published by

2 thoughts on “Finding related items in your RSS datastore”

  1. Exactly.

    It’s a nightmare. Where’s the RSD file for blogrolls? WSIL seems a little too obscure to me. I’ve seen some people put links to OPML files in the <head> section of their webpage, but with no common identifier (and, urgh, OPML), it’s pretty useless.

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.