Understanding reviews in FOAF

A few months ago I signed up for FilmTrust. It’s an interesting project in which you can rate and review films, add friends, and record how much you trust their film ratings. The natural progression would be a personalised film recommendation system, but this doesn’t appear to exist yet.

FilmTrust outputs a FOAF file for each user. For example, here’s my FOAF file as FilmTrust sees me.

In it, you can see that I’ve added Danny Ayers and Dan Brickley as contacts, and you can also see that I’ve rated several films. All these films are listed in my <foaf:Person> element as <foaf:made rdf:resource='#tt0066921'/>, but my reviews are then described like this: <rdf:Description rdf:about="http://imdb.com/title/tt0066921/">. So am I missing something, or do these not actually refer to one another?

Oops, just to clarify: the review itself is actually identified in the generated FOAF file by <review:Review rdf:ID="tt0066921-pip">, but this doesn’t match the foaf:made resource ID either.
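To see why these don't line up, resolve each reference the way an RDF parser would. rdf:resource="#tt0066921" becomes the document's base URI plus #tt0066921, rdf:ID="tt0066921-pip" becomes the base URI plus #tt0066921-pip, and rdf:about is already an absolute IMDb URI. Here's a quick Python check. The base URI is made up: the real one is wherever FilmTrust serves the file from.

```python
from urllib.parse import urljoin

# Hypothetical base URI for the generated FOAF document (assumption).
BASE = "http://filmtrust.example/foaf/pip.rdf"

def resolve_ref(ref: str, base: str = BASE) -> str:
    """Resolve an rdf:resource or rdf:about value against the document base."""
    return urljoin(base, ref)

def resolve_id(rdf_id: str, base: str = BASE) -> str:
    """rdf:ID="x" names the resource <base>#x."""
    return urljoin(base, "#" + rdf_id)

made = resolve_ref("#tt0066921")                        # <foaf:made rdf:resource=...>
film = resolve_ref("http://imdb.com/title/tt0066921/")  # <rdf:Description rdf:about=...>
review = resolve_id("tt0066921-pip")                    # <review:Review rdf:ID=...>

for label, uri in [("foaf:made", made), ("film", film), ("review", review)]:
    print(f"{label:10} {uri}")
```

All three come out as different URIs, so nothing in the graph connects the foaf:made entries to either the film or the review.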

Hurrah! That’s now been fixed thanks to a quick email exchange with Jennifer Golbeck, and the graph now looks lovely. Thanks, Jennifer!

Dear Gmail

Any mail I receive that only contains Russian characters is spam. Ditto for Asian languages.

Either Gmail’s spam filter has been loosened, or the spammers are getting better, but I’m getting a lot more spam in my Gmail inbox these days. Oh well.

Google Jabber client and server

Google Talk is finally out, and it uses XMPP for text chat. This means that any normal Jabber client can also connect to Google’s servers and use it. At the moment you can’t use Google Talk to connect to the wider Jabber network, and I can’t see that changing any time soon. In theory you should be able to point the Google Talk client at a different server and access your normal Jabber contact list that way, although I have no idea what it would do with XMPP stanzas it doesn’t yet support.
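An XMPP "stanza" is just a small XML element sent over a long-lived XML stream, which is why any standards-compliant Jabber client can talk to Google's server. This is an illustration only, in plain Python with no Jabber library and no connection or authentication handling. It builds the kind of <message/> stanza a client sends for a one-to-one chat. The address is made up.

```python
import xml.etree.ElementTree as ET

def chat_message(to: str, body: str) -> str:
    """Serialise a one-to-one XMPP chat message stanza."""
    # On the wire this sits inside the client's <stream:stream> element,
    # whose default namespace (jabber:client) it inherits, so none is set here.
    msg = ET.Element("message", {"to": to, "type": "chat"})
    ET.SubElement(msg, "body").text = body  # ElementTree XML-escapes the text
    return ET.tostring(msg, encoding="unicode")

print(chat_message("friend@gmail.com", "Hello from an ordinary Jabber client"))
```

A server that doesn't understand a particular child element or stanza type is expected to ignore it or return an error, which is what makes the "unsupported stanzas" question above interesting.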

The built-in VoIP chat currently uses a closed protocol, which Google will be publishing soon. As soon as they do, I can’t imagine other Jabber clients not releasing plugins to interoperate.

The client itself is clean and usable (a far, far cry from the monstrosity of MSN Messenger, which plagues your client with ads for FREE LOANS!), and ties directly into your Gmail contacts list. It’s also very beta. 🙂

It looks like Google are planning to use Talk to drive users to Gmail, and if they’re serious about it, this could become one of the first massively scalable corporate XMPP server installations. That is a very attractive proposition and, since you can connect to it using a normal cross-protocol client, a nice way for Jabber evangelists to finally point to a Jabber server that doesn’t keep falling over. Obviously the Google XMPP server doesn’t itself run any services such as MSN transports, so you’re still reliant on client plugins or third-party servers for that.

Feed on Feeds and dc:subject (again)

Well, after making my last brave post, I realised that FeedOnFeeds natively supports the dc:subject element and stores it in its database already. Hurray.

Or not, really. Because what it actually supports is a single dc:subject per feed item, which it just drops straight into a text field. If you have multiple subjects per item, say:

<dc:subject>Semantic Web</dc:subject>
<dc:subject>Programming</dc:subject>

Then what you end up with in your database is this:

Semantic WebProgramming

Which isn’t terribly useful at all. So, I can change the existing code to iterate over the subjects and insert separators, but what do I do with the 5,500 items already in my database that have tags associated with them? (Crikey, that’ll teach me to subscribe to my del.icio.us inbox.) Clearly, I could write a script that looks for capital letters in the middle of words and inserts a space before each one, but that breaks on tags containing acronyms like FOAF (which I still have trouble pronouncing in conversation, by the way), and on plain ol’ typos.
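Here's a quick Python sketch of the capital-letter heuristic, mostly to show where it breaks. It rescues "Semantic WebProgramming", but it misses tags glued on after an acronym and wrongly splits single CamelCase tags. The function is made up for illustration and isn't FeedOnFeeds code (which is PHP).

```python
import re

# Guess a tag boundary wherever a lowercase letter is immediately followed
# by an uppercase one ("WebProgramming" -> "Web" | "Programming").
GLUE = re.compile(r"(?<=[a-z])(?=[A-Z])")

def split_glued_tags(text: str) -> list[str]:
    """Best-effort split of dc:subject values that were concatenated together."""
    return [part for part in GLUE.split(text) if part]

print(split_glued_tags("Semantic WebProgramming"))  # ['Semantic Web', 'Programming']
print(split_glued_tags("FOAFProgramming"))          # ['FOAFProgramming'] (acronym: missed)
print(split_glued_tags("JavaScript"))               # ['Java', 'Script'] (one tag: wrongly split)
```

Splitting on a zero-width match like this needs Python 3.7 or later.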

“Bah” I say.

I (heart) the RDFers

Every time anyone dares to question RDF the RDFites assume they don’t know how it works.

That’s a really bad state of affairs, but also one that I’ve fortunately never encountered. All the RDFers I’ve spoken to (although I confess it’s not many) have been polite and willing to explain things in ever-decreasing complexity until I finally understand what they’re talking about, including sending me code samples and long, drawn-out explanations which they’ve clearly written many times before.

On the other hand, I’ve read an awful lot about RDF zealots, the rudeness of RDFers and how RDF might be fine, but it’s the attitude of RDF proponents which really puts people off.

If I were new to RDF, I’d be reading this stuff and going “oh, well, even if RDF’s great, I’ll never be able to get any support from the community, they sound like real nutcases. I’d better use OPML instead.”

I think that’s sad, and from what I’ve experienced, undeserved, so hopefully this will give Google a slightly more balanced opinion. 🙂

Auto-tagging items in FeedOnFeeds

I’ve just added the facility to tag feeds in my version of FeedOnFeeds so that I can group blogs together and read through them more quickly, and in a more focussed manner.

I’m planning on adding support for storing automated per-item tags as well, but for a quite separate reason.

I want FeedOnFeeds to automatically detect any rel="tag" links in feed items, as well as any dc:subject elements associated with items, and store them in its database.
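Here's a sketch of the rel="tag" half in Python, using only the standard library's HTMLParser. It assumes the item's HTML has already been pulled out of the feed, meaning the XML parser has decoded the escaped description. It follows the rel-tag convention that the tag is the last path segment of the link's URL, with + or %20 standing for spaces. rel_tags is a hypothetical helper, not part of FeedOnFeeds.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, unquote_plus

class RelTagParser(HTMLParser):
    """Collect tags from <a rel="tag" href="..."> links in an item's HTML."""

    def __init__(self):
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()  # rel can hold several values
        href = attrs.get("href")
        if "tag" in rels and href:
            # rel-tag convention (assumption): the tag is the last URL path
            # segment, ignoring a trailing slash; '+' or %20 encode spaces.
            segment = urlparse(href).path.rstrip("/").rsplit("/", 1)[-1]
            if segment:
                self.tags.append(unquote_plus(segment))

def rel_tags(html: str) -> list[str]:
    parser = RelTagParser()
    parser.feed(html)
    parser.close()
    return parser.tags
```

Note that the link text is ignored: under this convention <a href=".../tag/tech" rel="tag">fish</a> tags the item "tech", not "fish".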

To start with, I looked at two feeds which I knew stored tags alongside the main content: Danny Ayers’ RSS 1.0 feed and my del.icio.us RSS 1.0 feed. Both use <dc:subject> to define tags for each <item>. Sadly for me, they do it in different ways. Danny uses multiple <dc:subject> elements per item, one for each tag, like this:

<dc:subject>Semantic Web</dc:subject>
<dc:subject>Programming</dc:subject>

Del.icio.us uses just one <dc:subject> element, using spaces to separate the tags, like this:

<dc:subject>syndication tools web</dc:subject>

According to Seth Ladd, dc:subject takes a Literal value, so Danny seems to have it right. But even if he didn’t, I’d still have to treat space-separated words as individual keywords for feeds that put multiple tags in the same element. Postel’s Law and all that.
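One way to reconcile the two styles is sketched below in Python with ElementTree against RSS 1.0. If an item has several dc:subject elements, each one counts as a tag. If it has only one, that element gets split on whitespace, del.icio.us style. That rule is my own assumption, not anything the spec says, and its cost is that a lone multi-word tag such as "Semantic Web" would be split in two. The feed below is made up.

```python
import xml.etree.ElementTree as ET

NS = {
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def item_tags(item: ET.Element) -> list[str]:
    subjects = [s.text.strip() for s in item.findall("dc:subject", NS)
                if s.text and s.text.strip()]
    if len(subjects) == 1:
        # Heuristic (assumption): a single dc:subject is treated as
        # del.icio.us-style space-separated tags.
        return subjects[0].split()
    # Several dc:subject elements: each is a tag in its own right,
    # so multi-word tags like "Semantic Web" survive intact.
    return subjects

FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://purl.org/rss/1.0/" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item rdf:about="http://example.org/a">
    <dc:subject>Semantic Web</dc:subject>
    <dc:subject>Programming</dc:subject>
  </item>
  <item rdf:about="http://example.org/b">
    <dc:subject>syndication tools web</dc:subject>
  </item>
</rdf:RDF>"""

root = ET.fromstring(FEED)
for item in root.findall("rss:item", NS):
    print(item_tags(item))
```

The first item keeps its two tags as written, and the second comes out as three single-word tags.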

This should all lead on to something far more interesting, which I’ll probably write about next week.

Dear Magpie: Don’t unescape my content, thanks

Bug 1212662: Don’t unescape html entities in ‘description’ or ‘content’

Currently, when MagpieRSS (an otherwise marvellous general-purpose feed parser) encounters escaped content, it unescapes it, thus turning hard-fought-for “&lt;”s into “<”s.

Obviously this makes it really hard to read other geek bloggers who post code samples that use pointy brackets. In fact, it makes it a complete pain. I’ve looked at the source, but I’m pretty much a PHP-lamer and can’t see where this unescaping is going on. The file is available through ViewCVS, so if you take a look, please tell me what to change to fix this.
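If you're wondering why the extra unescaping hurts, here is the effect in a few lines of Python. This isn't Magpie's PHP. The XML parser already unescapes an escaped RSS description once, and that gives you the HTML the author actually published, with the code sample intact. Unescaping it a second time turns the displayed code into live markup. The feed snippet is made up.

```python
import html
import xml.etree.ElementTree as ET

# An RSS <description> carrying escaped HTML. The HTML itself contains a
# code sample whose angle brackets the author escaped as &lt; and &gt;.
FEED_ITEM = (
    "<item><description>"
    "&lt;p&gt;Use &lt;code&gt;&amp;lt;br /&amp;gt;&lt;/code&gt;&lt;/p&gt;"
    "</description></item>"
)

# The XML parser does one level of unescaping itself: this is the HTML
# the author published.
published_html = ET.fromstring(FEED_ITEM).findtext("description")
print(published_html)    # <p>Use <code>&lt;br /&gt;</code></p>

# Unescaping again turns the displayed code into real markup,
# so the sample disappears into a line break when rendered.
double_unescaped = html.unescape(published_html)
print(double_unescaped)  # <p>Use <code><br /></code></p>
```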

I look forward to seeing what it makes of this post 🙂