Storing Atom

04 September, 2006

So then, the Atom Syndication Format is a great standard for storing data but once you start getting a large number of entries (even if it’s only in the hundreds or thousands), does storing your data as an XML file start to become unwieldly? A small part of my brain starts to cry at the thought, so I’m going to guess “yes” but there must surely be loads of companies and individuals doing this, so how do they manage?

Although I’ve previously harrumphed over native XML support in databases, is this a reasonable use-case? Storing each atom:entry in a single field in a database row would seem to make some sort of sense because you could use the unchanging entry:id and entry:updated as a key for each version of the entry. I imagine the problem then might be one of containing the data in the XML and making sure it didn’t start spilling out into its own fields and tables (things like tags and categories for example).

I know that on previous occassions, Joe Gregorio has dreamed about this and I guess now that Google, WordPress and Movable Type are effectively Atom Stores all of their very own but aren’t designed as such and just provides APIs to their own data stores.

I notice that the MySQL 5.1 beta has its own XML functions so maybe it really is time to start taking a look at this approach, although most of the Java apps I work on use Hibernate at some level, and I’m not really sure how well it supports XML data fields (if at all).

See other posts tagged with general and all posts made in September 2006.

Comments

Anthony

05 September, 2006 at 08:29

“like a wheel within a wheel…”

Elias Torres

06 September, 2006 at 14:33

Check this APP implementation out for WordPress.

http://torrez.us/archives/2006/08/21/486/

Aristotle Pagaltzis

06 September, 2006 at 22:39

Why wouldn’t you have the data spill out? How else would you properly query it?

All of my own design approaches have focussed on exactly that approach: storing the XML fragment for an entry verbatim in a column, with additional columns storing the various other bits of metadata. The other columns are always derived from the XML fragment, of course. Consider them an intentional denormalisation for the purpose of easier querying – a common practice anyway.

Works for me.

Pip

07 September, 2006 at 09:37

OK, maybe I meant controlled spillage. I think it would be easy to start with a single field storing the data and have subsequent developers extract more and more of the info until it was all stored in unique fields.

As for querying, I’d use the XPath features of the database I was using.

Even as it is, I’d be unconvinced by code that stores the XML then writes to fields – I’d prefer to do that using triggers (despite my normal utter hatred of all things trigger-related).

Aristotle Pagaltzis

09 September, 2006 at 01:37

OK, you can query it using XPath functions… but does the engine do anything to optimise for XPath queries? If it just parses the fragments on every query, there’s no gain over sticking everything in a single XML document and querying it with XPath directly – no relational DB involved. Would probably be faster, even.

Using triggers in conjunction with XPath functions to autopopulate the columns sounds like a very good approach, though. (I don’t hate triggers, I just find that they’re easily overused as lipstick on a schema pig.)

Pip

09 September, 2006 at 16:36

At work, for example, we use Oracle, which seems to have very good support for native XML. My knowledge is lacking in the particulars, but http://www.oracle.com/technology/tech/xml/xmldb/Current/TWP_Mastering_XMLDB_Queries.pdf seems like a nice introduction to the features available, although there’s no detail on how it manages things like caching and optimisation (I can’t imagine that Oracle wouldn’t have thought of this though).

I think the really nice feature is that it can take an XML Schema and create a DB with native fields. The user can then query it in whichever way they like.

Obviously MySQL’s support is currently beta, and I have absolutely no idea about support in other RDBMSs.