27 September 2007

I've been thinking about data access a lot lately from two fronts:

  1. Distributed--in particular HTTP, where I lean toward standardizing on the Atom Publishing Protocol as the default implementation of REST
  2. In-process--in particular Java

As a result of my pondering, I believe there are two broad classes of applications in the real world:

  • Entity-oriented applications define a clear set of entities for the application and use them for all operations.
  • Data-oriented applications split their code into a logic layer and a data access layer. However, the data access layer manipulates the tables directly and does not always pass through a full entity representation.

In my opinion, the data-oriented method is the way people actually code their applications. Recently, I've spent time on a mailing list trying to teach the principles of REST, a highly contested topic because it means different things to different people. To some, it means anything over HTTP that isn't SOAP. To others, it is an architectural style: HTTP the way it was designed to be used.

I side with the latter.

So, in thinking about distributed data access in the context of the discussions on this mailing list, I've come to realize that people are cramming their traditional, in-process data access mindset into HTTP. It's not their fault; API creators have conditioned them to think that way.

As an example, consider the typical scenario below:

You have customer and order tables and need to display, in a grid, each customer along with that customer's total number of orders.

In entity-oriented applications, there are two primitives: the entity (or resource, in REST terms) and the list of entities. Here, the developer simply loads all the customers. Then, in each customer entity, a link points to a list of order entities.
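
To make that concrete, here is a minimal Java sketch of the entity-oriented version of the grid. Everything in it is an assumption made up for illustration: the Customer, Order, EntityStore and EntityGrid classes, the "/customers/{id}/orders" link format, and the in-memory store that stands in for whatever database or remote service would really serve the entities.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical entity: one order, owned by one customer.
final class Order {
    final long id;
    final long customerId;

    Order(long id, long customerId) {
        this.id = id;
        this.customerId = customerId;
    }
}

// Hypothetical entity: a customer that carries a link to its list of orders.
final class Customer {
    final long id;
    final String name;
    final String ordersLink; // assumed link format, e.g. "/customers/1/orders"

    Customer(long id, String name) {
        this.id = id;
        this.name = name;
        this.ordersLink = "/customers/" + id + "/orders";
    }
}

// Hypothetical in-memory stand-in for the place entities are loaded from.
final class EntityStore {
    private final List<Customer> customers;
    private final Map<String, List<Order>> ordersByLink = new LinkedHashMap<>();
    int linksFollowed = 0; // how many order lists were fetched

    EntityStore(List<Customer> customers, List<Order> orders) {
        this.customers = customers;
        for (Order o : orders) {
            String link = "/customers/" + o.customerId + "/orders";
            ordersByLink.computeIfAbsent(link, k -> new ArrayList<>()).add(o);
        }
    }

    // Primitive 1: a list of entities.
    List<Customer> loadCustomers() {
        return customers;
    }

    // Primitive 2: follow a link from one entity to a list of other entities.
    List<Order> follow(String link) {
        linksFollowed++;
        return ordersByLink.getOrDefault(link, new ArrayList<>());
    }
}

final class EntityGrid {
    // Builds the grid rows (customer name -> order count) using only the two primitives.
    static Map<String, Integer> orderCounts(EntityStore store) {
        Map<String, Integer> grid = new LinkedHashMap<>();
        for (Customer c : store.loadCustomers()) {
            grid.put(c.name, store.follow(c.ordersLink).size());
        }
        return grid;
    }
}
```

Note that the sketch follows one link per customer, so the number of lookups grows with the number of rows in the grid.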

In data-oriented applications, most of the time you will have a query in your access layer such as "SELECT c.name, count(o) FROM customer c INNER JOIN order o ON c.id = o.customer_id GROUP BY c.name". The result of this query cannot be mapped to an entity. If you want to use the result, you will need to do one of the following:

  • Bind the ResultSet to a manageable grid of some sort
  • Load the ResultSet into some disconnected structure for later usage
    • DataSet
    • XML
    • other structure
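
As a rough illustration of the second option, the sketch below copies a JDBC ResultSet into a plain list of maps that stays usable after the connection is closed. The DisconnectedRows class and its copyOf method are hypothetical names of my own, not part of any library; only the java.sql interfaces are real. Wrapping SQLException in an unchecked exception is a choice made for the sketch, not a requirement.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only.
final class DisconnectedRows {
    private DisconnectedRows() {}

    // Copies every remaining row of rs into a map keyed by column label,
    // so the data can be used after the ResultSet and connection are closed.
    static List<Map<String, Object>> copyOf(ResultSet rs) {
        try {
            ResultSetMetaData meta = rs.getMetaData();
            int columnCount = meta.getColumnCount();
            List<Map<String, Object>> rows = new ArrayList<>();
            while (rs.next()) {
                Map<String, Object> row = new LinkedHashMap<>();
                for (int i = 1; i <= columnCount; i++) { // JDBC columns are 1-based
                    row.put(meta.getColumnLabel(i), rs.getObject(i));
                }
                rows.add(row);
            }
            return rows;
        } catch (SQLException e) {
            // Assumption for this sketch: surface failures as unchecked exceptions.
            throw new IllegalStateException("could not copy result set", e);
        }
    }
}
```

In real code you would call copyOf on the ResultSet returned by Statement.executeQuery inside a try-with-resources block and hand the resulting list to the logic layer. The caller still has to know which column labels to expect, since nothing in the type says so.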

Both solutions can lead to clean, well-defined applications. It all depends on your needs.

In a data-oriented application, it is often hard to know which data structures are passed across tiers, because they may vary by implementation: for instance, an Array for a single-column select, XML for complex results, a DataSet, or custom classes (Hibernate, JPA).

However, in entity-oriented applications, other problems may appear: performance, partial entity loading, complex database structure, etc.