From the Amazon SimpleDB homepage:

GET, PUT or DELETE items in your domain

Sounds RESTful? It isn’t. In fact, when Amazon’s SimpleDB went public on Friday, December 12, 2007, the opening paragraphs on its homepage claimed "RESTfulness". Now, after community backlash, the homepage is mysteriously lacking such a claim, and you must dig deeper into the documentation to uncover such fallacies:

Amazon SimpleDB REST calls are made using HTTP GET requests. The Action query parameter provides the method called and the URI specifies the target of the call. Additional call parameters are specified as HTTP query parameters. The response is an XML document that conforms to a schema.

Some time ago I reviewed what I believe to be the major misunderstandings about RESTful principles. I categorized three different "interpretations" of REST:

  1. (Unicorn) REST describes those pure implementations that the fanboys aspire to (think Atom Publishing Protocol). These are only slightly more common than unicorns but more and more “true” REST implementations are appearing.
  2. (Anything HTTP that is not SOAP) REST is an umbrella label for implementations that ride the REST hype but whose authors have never read Fielding’s dissertation. These are more appropriately called HTTP-POX.
  3. (Somewhere in between) REST

Amazon clearly interprets REST as category #2.
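
To make the difference concrete, here is a minimal sketch in Java that builds the same "delete an item" request in both styles. The host name, the path layout, the class name, and the parameter names are hypothetical and chosen for illustration; only the shape of the first request (an HTTP GET with the operation in an Action query parameter) comes from the documentation quoted above.

```java
import java.net.URI;

class RestStyles {
    /** A request reduced to the two parts that matter here: HTTP method and URI. */
    record Request(String method, URI uri) {}

    // HTTP-POX style: every call is a GET, and the real verb hides in a query parameter.
    static Request rpcStyleDelete(String domain, String item) {
        String query = "Action=DeleteAttributes&DomainName=" + domain + "&ItemName=" + item;
        return new Request("GET", URI.create("https://db.example.com/?" + query));
    }

    // Resource style: the URI names the item, and the HTTP method carries the verb.
    static Request resourceStyleDelete(String domain, String item) {
        return new Request("DELETE", URI.create("https://db.example.com/" + domain + "/" + item));
    }

    public static void main(String[] args) {
        System.out.println(rpcStyleDelete("people", "brian"));
        System.out.println(resourceStyleDelete("people", "brian"));
    }
}
```

The first form asks a GET, which HTTP defines as a safe method, to change state on the server. Caches, crawlers, and prefetchers all assume they can repeat a GET freely, which is exactly why tunnelling a delete through one is a problem.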


Five Things

I was tagged by Joe Gregorio. A very long time ago. At the time my motive was to stop the viral madness. But, in sweeping out my drafts folder, I figured why not post the five things without ‘tagging’ anyone else:

  1. I lived in Durban, South Africa from 1997 to 1999 serving a mission for my church. I’m saddened that many of the life-long friendships I made there have already ended in the 8 years since due to the AIDS crisis in South Africa. It really is a tragedy.
  2. I have the most amazing wife, who is the mother of my two children, ages 3 years and 9 months.
  3. I lack the confidence that many exhibit in this industry. This explains why many of my thoughts never get to this blog. I just need to write confidently.
  4. I like being the center of attention on my own terms. For example, I can be annoyingly outgoing in one setting only to appear extremely shy in another.
  5. I played bass guitar in Before Braille, a now defunct rock band. Toured our little hearts out but couldn’t make a dent in the music industry. We opened for Jimmy Eat World, the Used, and the Format, among others. It was a lot of fun and, if anything, touring months on end taught me that I wanted the married life.

Entities are not resources

I’ve been thinking about data access a lot lately from two fronts:

  1. Distributed–in particular HTTP, siding primarily with standardizing on the Atom Publishing Protocol as the default implementation of REST
  2. In-process–in particular Java

As a result of my pondering, I believe there are two larger classifications of applications in the real world:

  • Entity-oriented applications define a clear set of entities for the application and use them for all operations.
  • Data-oriented applications split code into a logic layer and a data access layer. However, the data access layer manipulates the tables directly, not always passing through an entire entity representation.

In my opinion, the data-oriented method is the way people actually code their applications. Recently, I’ve spent time on a mailing list trying to teach the principles of REST, a highly arguable topic because it means different things to different people. To some, it means anything HTTP that isn’t SOAP. To others, it is an architectural style, HTTP the way it was designed.

I side with the latter.

So, in thinking about distributed data access in the context of the discussions on this mailing list, I’ve come to realize that people are cramming their traditional, in-process data access mindset into HTTP. It’s not their fault; they’ve been conditioned that way by API creators.

As an example, consider the typical scenario below:

You have customer and order tables and need to display in a grid each customer and the total number of orders.

In entity-oriented applications, there are two primitives: entities (or resources in REST) and lists of entities. Here, the developer simply loads all the customers. Then, in each customer entity, a link points to a list of order entities.

In data-oriented applications, most of the time, you will have a query in your access layer such as "SELECT c.name, count(o) FROM customer c INNER JOIN order o ON c.id = o.customer_id GROUP BY c.name". The result of this query cannot be mapped to an entity. If you want to use the result, you will need to do one of the following:

  • Bind the ResultSet to a manageable grid of some sort
  • Load the ResultSet into some disconnected structure for later usage
    • DataSet
    • XML
    • other structure

Both solutions can lead to clean, well-defined applications. It all depends on your needs.
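
The two approaches can be sketched side by side in Java. This is only an illustration under stated assumptions: the record types, the column name "order_count", and the sample data are all hypothetical, and a list of maps stands in for a ResultSet that has already been copied into a disconnected structure, so the sketch runs without a database.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CustomerGrid {
    record Order(int id, int customerId) {}

    /** Entity-oriented: a customer carries its own list of order entities. */
    record Customer(int id, String name, List<Order> orders) {}

    // Entity-oriented: load whole entities, then derive the count from each one.
    static Map<String, Integer> viaEntities(List<Customer> customers) {
        Map<String, Integer> grid = new LinkedHashMap<>();
        for (Customer c : customers) {
            grid.put(c.name(), c.orders().size());
        }
        return grid;
    }

    // Data-oriented: consume the rows of the aggregate query as they are.
    // Each row is a disconnected name/value structure, not an entity.
    static Map<String, Integer> viaRows(List<Map<String, Object>> rows) {
        Map<String, Integer> grid = new LinkedHashMap<>();
        for (Map<String, Object> row : rows) {
            grid.put((String) row.get("name"), ((Number) row.get("order_count")).intValue());
        }
        return grid;
    }

    public static void main(String[] args) {
        List<Customer> customers = List.of(
            new Customer(1, "Acme", List.of(new Order(10, 1), new Order(11, 1))),
            new Customer(2, "Globex", List.of(new Order(12, 2))));

        // Stand-in for the rows the GROUP BY query would return.
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(Map.of("name", "Acme", "order_count", 2L));
        rows.add(Map.of("name", "Globex", "order_count", 1L));

        System.out.println(viaEntities(customers));
        System.out.println(viaRows(rows));
    }
}
```

Both methods end up with the same grid data. The difference is what had to be loaded to get there: every order entity in the first case, one small row per customer in the second.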

In the data-oriented application, it is often hard to know which data structures are passed across tiers because they may vary by implementation: an Array for a single-column select, XML for complex results, a DataSet, or custom classes (Hibernate, JPA).

However, in entity-oriented applications, other problems may appear: performance, partial entity loading, complex database structure, etc.

User-driven, Pre-emptive APIs

Aneel makes an argument for "tools vs. methods". To summarize: should tools enforce the methodology, or the methodology the tool?

Tools need to move when we do. And they need to be made to be moved by us. But, not in a vacuum. The idea of user-driven innovation should be built into professional tools. In organizations where policy and methodology comes down from on high, when a method-enforcing tool is modified by an end-user (hi!), the bigwigs should go: “huh, Bob is doing this differently.. wonder if he’s onto something?”. Then someone should go find out if he is.

Source: tools vs. methods

IBM Rational has the concept of patterns built into the tool. Internally, IBM has contests (accompanied with decent rewards) for developing patterns for Rational.

The same argument can be made for APIs in general. Can APIs be designed in such a flexible way that when a use case changes, so can the result when the API is invoked? Can we ship with n patterns and best practices but leave the API wide open for anybody to plug in their own functionality?

You bet we can. And then, falling back on Aneel’s argument, once the bigwigs see that users are creating patterns, perhaps fold those patterns into the API as convenience methods.

Pluggable Data API

For instance, I’m working on a data access API that wraps JDBC access and ships with IBM’s pureQuery. The main goals of the API are, first, to enable clearer persistence code by using code to describe what is being done instead of how it is being done; second, to drastically reduce the boilerplate code associated with JDBC; third, to use collections and beans rather than ResultSet; and fourth, to not be a full-featured object-relational mapper.

Let’s look at some of the methods available. The variable "data" in these examples is an instance of the "Data" interface, which defines these methods, and is assumed to have been constructor-injected with a DataSource.

First, you can get a Map out of a table row:

Map<String,Object> brian = data.queryFirst("SELECT * FROM person WHERE person.name=?", "Brian");

If you want to get at many rows:

List<Map<String,Object>> people = data.queryList("SELECT * FROM person WHERE person.name LIKE ?", "Br%");

Second, you can get a POJO bean out of a table row:

Person brian = data.queryFirst("SELECT * FROM person WHERE person.name=?", Person.class, "Brian");

Likewise, beans from many rows:

List<Person> people = data.queryList("SELECT * FROM person", Person.class);

But what if you want to get the result of a single column, say from an aggregate function? For instance, you could retrieve a String by:

String name = data.queryFirst("SELECT name FROM person WHERE person.name=?", new ScalarRowFactory<String>(String.class),"Brandon");

What’s with the "ScalarRowFactory", you ask? Well, I’m glad you did. This is the pluggable part, where user-driven extensibility makes all the difference. It would be naive to think that an API could be created that would meet all needs for all people. In fact, I often tell my colleagues, "If you try to create for all, you provide value for none". That’s worth repeating…

If you try to create for all, you provide value for none.

This is the problem with JEE. It tries to boil the ocean and be the be-all and end-all framework. Then we get lighter-weight contenders like Spring and Co. Granted, with 1.4, we see reactive APIs for those uncomfortable with the heavyweight requirement imposed by JEE. Others see an easier method and create APIs to meet the need. Now JEE is adapting.

So this is all about creating pre-emptive APIs. What is a pre-emptive API? It is an API that leaves open doors for developers to customize the programming experience. Yes, and that means allowing them to shoot themselves in the foot, too. Sure, we can provide convenience methods on top in the API itself, but it should not play a Microsoft and have special privileges that the user-driven portion does not.

Would JEE be different had it been built with pre-emptive APIs?

So, in the above example, the Map and bean examples simply drive through the same methods that the String example did. They are simply helper methods.
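
Here is a minimal sketch of that layering in Java. Everything in it is an assumption made for illustration, not the actual API: the RowFactory interface, its create method, and the body of ScalarRowFactory are hypothetical (only the ScalarRowFactory name and its constructor shape appear in the example above), and a Map stands in for a JDBC row so the sketch runs without a database.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PluggableQueries {
    /** Hypothetical extension point: turns one raw row into whatever the caller wants. */
    interface RowFactory<T> {
        T create(Map<String, Object> row);
    }

    /** Hypothetical implementation: returns the row's first column, cast to the given type. */
    static final class ScalarRowFactory<T> implements RowFactory<T> {
        private final Class<T> type;

        ScalarRowFactory(Class<T> type) {
            this.type = type;
        }

        @Override
        public T create(Map<String, Object> row) {
            return type.cast(row.values().iterator().next());
        }
    }

    // The one general method. The rows parameter stands in for executing SQL.
    static <T> List<T> queryList(List<Map<String, Object>> rows, RowFactory<T> factory) {
        List<T> results = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            results.add(factory.create(row));
        }
        return results;
    }

    // Convenience helper with no special privileges: it only supplies a stock factory.
    static List<Map<String, Object>> queryList(List<Map<String, Object>> rows) {
        RowFactory<Map<String, Object>> copyRow = row -> new LinkedHashMap<>(row);
        return queryList(rows, copyRow);
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = List.of(Map.of("name", "Brandon"), Map.of("name", "Brian"));

        // Shipped factory.
        List<String> names = queryList(rows, new ScalarRowFactory<String>(String.class));
        // User-supplied factory, plugged in through the same door.
        List<Integer> lengths = queryList(rows, row -> ((String) row.get("name")).length());

        System.out.println(names + " " + lengths);
    }
}
```

The point of the design is that the Map helper calls the same public method a user’s own factory would, so anything the built-in helpers can do, user code can do too.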

The API, however, is wide open. Methods can be of type queryFirst, queryList, queryIterator, or just plain query. queryFirst deals with one result. queryList deals with a collection of results and processes all of them up front. queryIterator is like queryList but retrieves results lazily, on demand in user code, and is more efficient in some situations.
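
The difference between the eager and lazy flavors can be sketched as follows. The method names are borrowed from the description above, but the signatures and bodies are my own assumptions for illustration: an in-memory Iterator stands in for the database cursor, and a plain Function stands in for whatever converts a row.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

class LazyVersusEager {
    // Eager, in the spirit of queryList: every row is converted before the caller sees any.
    static <R, T> List<T> queryList(Iterator<R> rows, Function<R, T> factory) {
        List<T> results = new ArrayList<>();
        while (rows.hasNext()) {
            results.add(factory.apply(rows.next()));
        }
        return results;
    }

    // Lazy, in the spirit of queryIterator: a row is converted only when next() is called.
    static <R, T> Iterator<T> queryIterator(Iterator<R> rows, Function<R, T> factory) {
        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return rows.hasNext();
            }

            @Override
            public T next() {
                return factory.apply(rows.next());
            }
        };
    }

    public static void main(String[] args) {
        List<String> source = List.of("Brian", "Brandon", "Bruce");
        int[] conversions = {0};
        Function<String, String> counting = name -> {
            conversions[0]++;
            return name.toUpperCase();
        };

        Iterator<String> lazy = queryIterator(source.iterator(), counting);
        lazy.next(); // only the first row has been converted at this point
        System.out.println("lazy conversions: " + conversions[0]);

        conversions[0] = 0;
        queryList(source.iterator(), counting); // converts every row up front
        System.out.println("eager conversions: " + conversions[0]);
    }
}
```

If user code stops after the first few rows, the lazy version never pays for the rest. The trade-off, in a JDBC-backed version, would presumably be that the underlying cursor has to stay open until the caller is done iterating.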

Dumbledore Gets It, Why Doesn’t Data?

I’m a big fan of Google Docs. One of my favorite features is "Revisions": when a file is saved, an immutable state, or version, is saved and can be recalled. With a simple drop-down box, I can restore a previous version.

Later, I’m talking with my brother-in-law about upcoming features of Mac OS X 10.5, and we spend a good deal of time postulating how Time Machine will change the world. (If you haven’t, check out the UI twist on a fairly standard storage backup infrastructure.)

The other night, I finished Harry Potter and the Deathly Hallows. (Loved it, by the way.) In this final chapter of Harry’s tragic teenage angst, Harry brings us with him into the Pensieve, just as he did in previous books, to learn of critical plot details by ‘experiencing’ other people’s memories firsthand. A proxied flashback, if you will.

And then it hits me.

Why isn’t all data modeled to be versioned? This universe we live in has one constant: we are on a linear timeline. (Yeah, yeah. Relativistic effects. But as far as I know, science has yet to coerce time to reverse, much less stand still.) Versions, memories, immutable state, what have you. We experience life as a series of observations. Conjuring a ‘view’ of those observations is a ‘memory’. It is so entwined with our very nature that it seems like a major oversight that all data is not versioned.
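
As a rough sketch of what "versioned by default" could look like at the smallest scale, here is a hypothetical Java holder whose history is append-only. The class and method names are mine, invented for illustration, and the sketch assumes the stored values are themselves immutable (as Strings are).

```java
import java.util.ArrayList;
import java.util.List;

class Versioned<T> {
    // Append-only history: saving never overwrites, it only adds a revision.
    private final List<T> revisions = new ArrayList<>();

    /** Stores a new revision and returns its 1-based revision number. */
    int save(T value) {
        revisions.add(value);
        return revisions.size();
    }

    /** Returns the most recent revision. */
    T current() {
        if (revisions.isEmpty()) {
            throw new IllegalStateException("nothing saved yet");
        }
        return revisions.get(revisions.size() - 1);
    }

    /** Recalls the value as of a given revision; the history itself is untouched. */
    T at(int revision) {
        return revisions.get(revision - 1);
    }

    /** Restoring is just another save, so the timeline only ever moves forward. */
    int restore(int revision) {
        return save(at(revision));
    }

    public static void main(String[] args) {
        Versioned<String> doc = new Versioned<>();
        doc.save("first draft");
        doc.save("second draft");
        doc.restore(1);
        System.out.println(doc.current() + " (revisions kept: " + doc.revisions.size() + ")");
    }
}
```

Note that restoring an old version does not rewind anything. It records a new observation that happens to match an old one, which is the linear timeline again.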