FOAF and social networking

The most interesting thing I’ve noticed about Flickr in the past week is that within one or two degrees of my Flickr profile I’ve found so many people whose weblogs I subscribe to. Evhead, Kottke, Anil Dash, BluishOrange – those are just the ones that spring to mind; there are plenty of others.

So what I want are FOAF profiles for the people I’m subscribed to which not only give their usernames on systems like Flickr but also show me how closely related to them I am, based on what other relationships I have with them and any they may have with me (maybe via TouchGraph if I wanted it in a map style, but HTML will do me just fine – Orkut actually did a good job of this).

So. We have part of this with Seth Ladd’s recent work on generating FOAF from Flickr. What else do we need? Well ideally, unless we want to spider the entire networks behind these systems we need these people to generate FOAF which links to FOAF files of the accounts they have on various networks (so in my FOAF file I should link to a FOAF file of my Flickr account, my del.icio.us account, IM account, etc).
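For what it’s worth, here’s a sketch of what that linking might look like. foaf:holdsAccount, foaf:OnlineAccount and rdfs:seeAlso are real FOAF/RDF vocabulary terms, but the account names and seeAlso URLs below are entirely made up for illustration:

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:nick>phil</foaf:nick>
    <!-- one holdsAccount entry per service, each pointing (via
         rdfs:seeAlso) at a further FOAF file describing that account -->
    <foaf:holdsAccount>
      <foaf:OnlineAccount>
        <foaf:accountServiceHomepage rdf:resource="http://www.flickr.com/"/>
        <foaf:accountName>phil</foaf:accountName>
        <rdfs:seeAlso rdf:resource="http://example.org/foaf/flickr.rdf"/>
      </foaf:OnlineAccount>
    </foaf:holdsAccount>
    <foaf:holdsAccount>
      <foaf:OnlineAccount>
        <foaf:accountServiceHomepage rdf:resource="http://del.icio.us/"/>
        <foaf:accountName>phil</foaf:accountName>
        <rdfs:seeAlso rdf:resource="http://example.org/foaf/delicious.rdf"/>
      </foaf:OnlineAccount>
    </foaf:holdsAccount>
  </foaf:Person>
</rdf:RDF>
```

An aggregator could then follow the seeAlso links from one master FOAF file rather than having to spider each network separately.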

Of course, this is pretty unlikely to happen. So what would happen if someone started doing it for them? Is centralising someone’s distributed persona a problem? Presumably they’ve put that much detail about themselves online for a reason, but as we saw recently when Plink was turned off because of FUD-fuelled complaints, some people don’t like it.

On the other hand, mapping Flickr relationships through FOAFnaut would be a great way to visualise the centres and communities in Flickr: find out who’s a ‘contact whore’, adding people without being added back themselves (oh yes, Flickr supports the notion of reciprocal contact acknowledgement very well), who belongs to the most communities and so on.

Sounds great to me! Seth should set his FOAFverse app to harvest his own FOAFlickr data and display it for the world to see via the SVG magic that is FOAFnaut 🙂

Kinja gets it right

Kinja is the web aggregator everyone forgot.

When I go somewhere without my computer and I want to check up on my feeds, I don’t log into my Bloglines page, I load my Kinja page.

Not only is it incredibly simple (in fact so simple that at first I didn’t get what use it was, and like everyone else, it seems, stopped using it once I’d first signed up back in April) but it’s fast because it gives you the first X words of every post too, so you can skim through just as fast as you like, not worrying about ‘read items per feed’ and all that rubbish. If you find a post you want to read later, you don’t mark it as unread, or flag it or whatever, you bookmark or del.icio.us the sucker!

If you’re scared stiff of missing even one post, and you’re subscribed to 100+ sites, then Kinja isn’t for you. But if you’re subscribed to that number or fewer and you’re not worried about missing one post because you know if it was important it’ll come up again, then you’ll be just fine (and hey, that’s what subscribable Feedster and PubSub searches are for, right?).

It’s the perfect light tool for non-obsessives (although I know there are plenty out there, I was one myself for a while).

The bad side: sadly, Kinja doesn’t always seem to get it right. Or rather, it probably gets the basics right, but isn’t quite clever enough – as of writing, two posts that Leigh Dodds made back in September are at the top of my reading list as having been posted “2 hours ago”. I’d like to think that something has just gone wrong with his RSS publishing and Kinja’s reflecting that, but it doesn’t seem to be the case: an inspection of his RSS 1.0 file shows that the time/date stamps are as they should be. This happens every now and again (read: frequently enough to be obviously noticeable) and is a pain in the arse, but the occasional duplicated item is still worth the simplicity and ease you get the rest of the time.

Tracking comments

Tracking conversations on other weblogs is a perennial problem. Some people swear by Feedster, others by Technorati, maybe you could track incoming links to your own site where new people have clicked through from a comment you left on another site (to give some idea of activity). Some weblogs (notably WordPress) provide a per-post RSS feed, but that can be annoying if your aggregator displays your feed list and you watch it spiral out of control and have to spend time managing it. Other weblogs like Movable Type (and, I guess, Typepad) let you subscribe to comments on posts via email. And there’s trackback, which is even usable, to an extent.

Needless to say, all of these methods are rubbish.

Until Firefox 1.0 I was using the Bookmark schedule/notify feature to keep track of posts I’d commented on – I’d make a comment, bookmark the post and set a daily schedule for checking if it had changed, but it was backed out for 1.0 and is currently on the “todo” list for post 1.0. But obviously I now need a new method.

There are quite a few other ideas about how comment tracking could work, and out of these there are a few practical ideas and suggestions.

First of all Blogger, Typekey and LJ (and other gated communities) should let you track comments you’ve made on those systems in the same way that Flickr lets you track comments you’ve made on other people’s photos. If you haven’t used Flickr yet, the feature looks like this:

A screenshot of how 'my comments' work in Flickr

Of course that still leaves every other system under the sun. At first I thought the solution would be a personal web proxy which stores the contents of textareas when you POST a FORM, but that a) seems pretty heavy implementation-wise and b) isn’t portable – as soon as I move to another PC I’d have to install my proxy again.

Bill Kearney’s long post musing on the subject gave me a better idea: never post directly to the site. Instead, use a bookmarklet which pulls out the FORMs on a page and passes them through to a hosted proxy where you can then fill them out (or use a bookmarklet to rewrite the form so that it posts to your proxy, with the old POST URL as a hidden INPUT, so you can stay on the same page). Then your proxy server can poll the original page for updates and inform you when it changes by whatever means you like: RSS, email, IM, etc.
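The polling half of that proxy is the easy bit; here’s a rough Python sketch of the change-detection logic. The fetch and notify hooks in the comments are placeholders, and a real version would want polite polling intervals, conditional GETs and so on:

```python
import hashlib


def page_fingerprint(html: str) -> str:
    """Hash a page body so successive fetches can be compared cheaply."""
    return hashlib.sha1(html.encode("utf-8")).hexdigest()


def has_changed(fingerprints: dict, url: str, html: str) -> bool:
    """Record the latest fingerprint for a watched URL and report whether
    it differs from the one seen on the previous poll. The first poll of
    a URL just establishes a baseline and reports no change."""
    new = page_fingerprint(html)
    old = fingerprints.get(url)
    fingerprints[url] = new
    return old is not None and old != new


# The proxy's polling loop would then be something like:
#   for url in watched_pages:
#       html = fetch(url)                  # e.g. urllib, with a decent User-Agent
#       if has_changed(fingerprints, url, html):
#           notify(url)                    # RSS item, email, IM...
```

Hashing the whole page means any template change triggers a notification; a smarter version might fingerprint just the comments section.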

I’m aware that there are any number of desktop applications or hosted services which will monitor webpages for changes for you, but the desktop apps are out for reasons of portability and all of the hosted ones I know of are commercial.

I don’t think this would be too hard to get going, at least in a rough fashion. Now someone just needs to write it for me 🙂 How about you, lazyweb?

Technorati’s Attention.xml

I’m sure they mean well. But really.

When you say something like “attention.xml will be simple, minimalist, and easy to use and implement” and your (fairly minimal) example looks like this:

<li><a href="permalink or guid" rel="votelink">title</a>
        <dl><dt>lastread</dt><dd>ISO-8601 date</dd>
            <dt>duration</dt><dd>seconds value</dd>
            <dt>followedlinks</dt>
            <dd><ul><li><a href="http://example.com/1">link1</a></li>
                    <li><a href="http://example.com/2">link2</a></li></ul></dd>
        </dl>
</li>

Then excuse me if I don’t laugh in your face too hard.

Things I want for Christmas from Blogger

I’ve been using Blogger to power this blog for more than two years now, and whilst the frills have become nicer, the core functionality provided hasn’t changed much in that time (except for comments! yay comments!), which indicates (to me at least) that it’s pretty good. In which case, we need better frills. Herewith some frills that I’d like:

  • Per-post comment feeds. Currently I generate these for my blog (for my own consumption) by checking file modifications and scraping the HTML files. Yuck.
  • Recent comments feed/page, like the ones Flickr provide
  • Comments I recently left feed/page, like the ones Flickr provide (this is invaluable for keeping track of conversations on Flickr – obviously it would only work for Blogger sites, but it’s a start – there’s no reason Typekey couldn’t also do this)
  • Better post previewing using my actual blog template please. If I use an inline style I want to check that it doesn’t conflict with anything else and my images float where I expect them to float, thanks.
  • The return of per-user stats. They don’t even have to be on-demand dynamic. Just generated once a week would do. The fact that a user’s Blogger profile page represents their profile as of about two months ago is genuinely worse than no profile page at all.
  • Don’t scrap the archive index page! Dear god please! Just because there’s a template tag to include links to all my archives on my front page doesn’t mean I want to! I have something like 28 months worth of archives – what if I wanted to display these weekly? A 112-item long list on each and every page of my weblog? Yeah, because that would be worth it. What it really means is that I now have to maintain my archive index page by hand 🙁

These are just the things which spring to mind. Does anyone reading this use Blogger and have something they’d like adding/changing/fixing? Official support for some of the Firefox Blogger extensions like the Google Toolbar maybe?

Flickr-tastic

If you’ve not used Flickr yet, do. It’s brilliant.

When I first signed up I didn’t really like the emphasis on the social aspect of it – I would have preferred it to be slightly more like del.icio.us, where the social aspect is emergent rather than integral – just putting my photos up for my own reference and for ease of insertion into my blog; but as soon as you start getting comments from people on your photos and start joining groups, Flickr really takes on a life of its own.

Overall an excellent, excellent piece of software that goes as far as you need it to and beyond (not to mention the open APIs which some enterprising chap has used to make a Flickr screensaver). Great stuff.

Browser and Aggregator Usage Statistics

So, a quick twelve-monthly roundup for usage of philwilson.org is in order I think.

For these stats I only looked at the fifty most frequently visiting user-agents.

Overall, hits were pretty much 50-50 between browsers and aggregators. I don’t have last year’s logs, but if I did I bet I’d have seen a massive surge in the number of aggregators being used.

Browsers

Browser usage of philwilson.org
Browser             %
Mozilla-based       58.13
Internet Explorer   36.57
Safari               2.58
Opera                1.59
Netscape 4.x         1.13

A pretty massive win for Mozilla-based browsers there, I’d say.

Aggregators

Aggregator usage of philwilson.org
Aggregator              %
SharpReader             29.6
Bloglines               28.3
FeedThing                9.2
FeedDemon                6.7
FeedOnFeeds              6.5
Thunderbird              4.9
NewsGatorOnline          4.6
PubSub.com RSS Reader    3.3
NetNewsWire              2.9
RssReader                2.8
Pluck                    1.3

I’m not entirely sure what to read into the dominance of SharpReader and Bloglines in this list. From looking at my logs, SharpReader is clearly in use by a much larger number of IPs than any of the other aggregators (although I know the issue isn’t quite that simple due to people whose IP changes when they connect etc.), but does Bloglines’ high showing just mean that it’s hitting my feed too hard? Or does it reflect growing usage? Or what?

The list of lower-scoring aggregators is actually much more interesting, revealing diverse, closely-fought aggregator usage (hardly any macs, and no SauceReader). It’ll be interesting to see what happens to this market in the next twelve months and whether some die off to be replaced by others or we just end up with two or three dominant aggregators which do everything.

Finally, in news just in from the Syndication Wars, my Atom feed (available since May sometime) finally got more subscribers than my RSS feed at the beginning of December with a lot of people subscribing after reading my post about making a bigger back button in Firefox.

David Blunkett resigns

Within the last thirty minutes, David Blunkett (wikipedia, BBC), the Home Secretary, has resigned.

Blunkett is an MP for Sheffield, which is where I live, and is the main driving force behind introducing ID cards to Britain (wikipedia, BBC: iCan, BBC: ID cards at-a-glance).

I’m absolutely no fan of ID cards at all, and I can’t deny that when I heard he’d resigned (publishing an autobiography slating the rest of the Cabinet can’t have helped) I jumped out of my seat and punched the air. The Government will still try to push them through, especially now they have the backing of the Conservatives, but I think they’ll be lucky to get it cleared by 2012 unless Charles Clarke (the guy who it’s currently mooted will take over) decides to really push it as hard as he can.

Also check out the relevant article on David Blunkett is an arse and watch out for the reaction from Big Blunkett.

From the more technical perspective, Wikipedia had updated their page on Blunkett within minutes of the announcement. I look forward to reading the updated version of the Encyclopaedia Britannica later 😉

Wikispam

I’ve always known that wikispam exists, just like comment spam exists, but up until now I’ve never run into much of either (all praise the Blogger walled-garden!). In the last few days however, I’ve been learning a little bit about Groovy, where most of their documentation resides in a MoinMoin wiki (there is also some duplication in their Confluence wiki, but the MoinMoin one seems to have more content overall), and it suffers badly from wikispam. By looking at recent changes it’s easy to see that there are a good number of people who do their best to clean it up (for my part, I do what I can), but there’s only so fast a human can act when they’re working against (what looks like) a bot.

screenshot of wiki changes by a spambot

From 7.30am to 9.30am someone, or some bot, is just following every single link on the wiki and replacing the content of each page it finds with links to all sorts of junk, presumably to try and improve their PageRank. What this means in practice is that while the wiki is defaced, no-one can learn anything about Groovy (assuming they’d want to), and of course it’s not an isolated incident – it happens to all kinds of wikis across the internet, denying access to their content (of course, if the content is that important then you could ask “what’s it doing on a wiki anyway?”, but that’s beside the point). It’s a complete mess of a situation where you want to grant free access to legitimate users but also need to deny access to these spambots (it’s also an old subject which I’m regretting getting into, but that’s life :).

Step up to the plate, WikiMinion (via Abject Learning), a bot by RichardP which cleans wiki spam. Very good it looks, too.

The code uses two approaches for identifying edits made by spammers – examining external links and examining source IP addresses. In both approaches, if a clean version can be identified the page is reverted back to it. Likewise, if all versions of a page appear to be created by a spammer, the page is marked for deletion.

To my simple mind, that seems like an excellent first-step tool for maintaining existing wikis. Maybe the next step should be to incorporate this kind of thing into the wiki software itself so that the changes by spambots can’t be made in the first place. It would avoid having to set up authentication for a wiki (which would kind of defeat the point) whilst inconveniencing the fewest number of people (I guess there would be IP clashes when the spambot learns it has to spoof the IP of someone who has already made a successful change). The Meatball Wiki also has a list of other possible solutions, and in fact it turns out there’s plenty of discussion on the MoinMoin site itself about how to combat spam. If only I’d read that before starting to write this, eh? 🙂
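A crude sketch of the external-link half of that check, as wiki software might apply it at edit time. The blacklist and threshold here are made up for illustration, not taken from WikiMinion:

```python
import re

# Matches the domain part of http/https links in page text.
LINK_RE = re.compile(r"https?://([^/\s\"'>]+)", re.IGNORECASE)

BLACKLISTED_DOMAINS = {"cheap-pills.example", "casino.example"}  # illustrative
MAX_NEW_EXTERNAL_LINKS = 10  # illustrative threshold


def external_domains(text: str) -> list:
    """All external domains linked from a wiki page revision."""
    return [m.group(1).lower() for m in LINK_RE.finditer(text)]


def looks_like_spam(old_text: str, new_text: str) -> bool:
    """Flag an edit if it links to a known-bad domain, or adds an
    implausible number of new external links in a single change."""
    old_domains = set(external_domains(old_text))
    new_domains = external_domains(new_text)
    if any(d in BLACKLISTED_DOMAINS for d in new_domains):
        return True
    added = [d for d in new_domains if d not in old_domains]
    return len(added) > MAX_NEW_EXTERNAL_LINKS
```

A flagged edit could then be rejected outright, or queued for a human to review, rather than silently published and cleaned up afterwards.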