Dumbledore Gets It, Why Doesn’t Data?

I’m a big fan of Google Docs. One of my favorite features is "Revisions": each time a file is saved, an immutable state, or version, is stored and can be recalled. With a simple drop-down box, I can restore a previous version.

Later, I’m talking with my brother-in-law about upcoming features of Mac OS X 10.5, and we spend a good deal of time postulating how Time Machine will change the world. (If you haven’t, check out the UI twist on a fairly standard storage backup infrastructure.)

The other night, I finished Harry Potter and the Deathly Hallows. (Loved it, by the way.) In this final chapter of Harry’s tragic teenage angst, Harry brings us with him into the Pensieve, just as he did in previous books, to learn critical plot details by ‘experiencing’ other people’s memories firsthand. A proxied flashback, if you will.

And then it hits me.

Why isn’t all data modeled to be versioned? This universe we live in has one constant: we are on a linear timeline. (Yeah, yeah. Relativistic effects. But as far as I know, science has yet to coerce time to reverse, much less stand still.) Versions, memories, immutable state, what have you. We experience life as a series of observations. Conjuring a ‘view’ of those observations is a ‘memory’. This is so entwined with our very nature that it seems like a major oversight that all data is not versioned.


View Comments to “Dumbledore Gets It, Why Doesn’t Data?”  

  1. Patrick Mueller

    Jazz, http://jazz.net, has been doing a pretty good job of playing in this space. There are plenty of issues and challenges, but lots of benefit.

  2. brandon

    Pat,

    True. IBM Rational Jazz does version data in its Repository implementation. But I believe that, like most systems that version their data, it solves this at the application layer rather than the persistence layer. That is generally not the case when the data is called ‘content’.

    To most, data is synonymous with SQL. As such, is there a relational database that versions data transparently to the application, yet provides hooks for the application to retrieve versions?

    So, perhaps it is relevant to ask: is all data so-called ‘content’?

  3. Bill Higgins

    Jazz currently uses an RDB as its persistence mechanism, though we *claim* this is an implementation detail. On the server side, Jazz clients interact with a logical repository via a set of Java APIs. The superclass of all ‘things’ in a Jazz repository is called “Item” (which is like Object but with persistence and more reflective capabilities). Under Item there are several subclasses, one of which is called “Auditable”. Any Jazz component may define 0-n new Item types (e.g. “Requirement”, “Iteration Plan”, etc.). If a Jazz component defines a new Item type that subclasses Auditable, it gets versioning for free. To be precise, each successful save of an item descending from Auditable results in a new immutable persistent version of the item in the Repository. Likewise, clients can use other APIs to request either the current version or one of the earlier versions of an Auditable item.

    From an RDB perspective, saving an Auditable results in the creation of a new row in some table in the database, rather than updating a pre-existing row (I think JoeG said this was a pretty common pattern, but I’m not a persistence wonk so I’m not sure).

    In theory, since clients interact with high-level repository APIs rather than RDB-centric APIs (like JDBC), if a persistence mechanism came along that provided native versioning support, and still provided adequate concurrency and query capabilities, we could switch to that native mechanism without clients noticing. But I don’t expect this to happen anytime soon. Also, it’s a stretch to claim that “the RDB is an implementation detail” when it’s the only implementation; I’m sure the law of leaky abstractions would rear its ugly head if/when we tried to plug in the second implementation.

    I don’t follow what you’re saying about “Content”. To me “Content” means “A dataset whose structure is opaque to the Repository”. I.e. from the server’s perspective it’s just a sequence of bytes (though it might have some structure which the server simply doesn’t understand). We have such a notion in Jazz, and it’s straightforward to use the same Auditable versioning mechanism. Simply wrap a Content object in an Auditable. E.g. define a new Auditable type called “File” which has a single attribute “content” referencing a content object (obviously you’d want more attributes for a file). I don’t know if what I’ve just described has anything to do with what you were talking about but I’d be interested to hear if what I said raised or settled any issues.
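
The row-per-save pattern Bill describes in the comments above is easy to sketch. What follows is only a minimal illustration in Python with SQLite, not Jazz code: the `item_versions` table, the `VersionedStore` class, and its `save`, `load`, and `history` methods are all hypothetical names invented for this example. The content column is an opaque blob, which roughly matches his notion of “Content” wrapped in a versioned item.

```python
import sqlite3

# Hypothetical schema: every save appends a row; rows are never updated.
SCHEMA = """
CREATE TABLE IF NOT EXISTS item_versions (
    item_id TEXT    NOT NULL,
    version INTEGER NOT NULL,
    content BLOB    NOT NULL,
    PRIMARY KEY (item_id, version)
)
"""


class VersionedStore:
    """Append-only store: each save creates a new immutable version."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(SCHEMA)

    def save(self, item_id, content):
        """Insert a new version of the item and return its version number."""
        with self.conn:  # commits on success, rolls back on error
            row = self.conn.execute(
                "SELECT COALESCE(MAX(version), 0) FROM item_versions "
                "WHERE item_id = ?",
                (item_id,),
            ).fetchone()
            version = row[0] + 1
            self.conn.execute(
                "INSERT INTO item_versions (item_id, version, content) "
                "VALUES (?, ?, ?)",
                (item_id, version, content),
            )
        return version

    def load(self, item_id, version=None):
        """Return the content of one version, or of the latest if None."""
        if version is None:
            row = self.conn.execute(
                "SELECT content FROM item_versions WHERE item_id = ? "
                "ORDER BY version DESC LIMIT 1",
                (item_id,),
            ).fetchone()
        else:
            row = self.conn.execute(
                "SELECT content FROM item_versions "
                "WHERE item_id = ? AND version = ?",
                (item_id, version),
            ).fetchone()
        if row is None:
            raise KeyError((item_id, version))
        return row[0]

    def history(self, item_id):
        """Return all version numbers of the item, oldest first."""
        rows = self.conn.execute(
            "SELECT version FROM item_versions WHERE item_id = ? "
            "ORDER BY version",
            (item_id,),
        )
        return [r[0] for r in rows]


# Usage: two saves produce two rows; the first draft stays retrievable.
store = VersionedStore(sqlite3.connect(":memory:"))
store.save("notes.txt", b"first draft")
store.save("notes.txt", b"second draft")
earlier = store.load("notes.txt", version=1)
```

Note that the versioning here still lives in application code, which is exactly the limitation raised in the thread: the database itself knows nothing about versions, it just never sees an UPDATE.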
