Data Access Object (DAO) is a commonly used pattern to persist domain objects into a database. The most common form of a DAO pattern is a class that contains CRUD methods for a particular domain entity type.
Assume that I have a domain entity class “Account”:
package com.thinkinginobjects.domainobject;

public class Account {

    private String userName;
    private String firstName;
    private String lastName;
    private String email;
    private int age;

    public boolean hasUseName(String desiredUserName) {
        return this.userName.equals(desiredUserName);
    }

    public boolean ageBetween(int minAge, int maxAge) {
        return age >= minAge && age <= maxAge;
    }
}
Following the common DAO approach, I create a DAO interface:
package com.thinkinginobjects.dao;

import com.thinkinginobjects.domainobject.Account;

public interface AccountDAO {

    Account get(String userName);
    void create(Account account);
    void update(Account account);
    void delete(String userName);
}
The AccountDAO interface may have multiple implementations, which use some kind of O/R mapper or execute plain SQL queries.
The pattern has these advantages:
- It separates the domain logic that uses it from any particular persistence mechanism or API.
- The interface’s method signatures are independent of the content of the Account class. When you add a telephone number field to Account, you don’t need to change the AccountDAO interface or its callers.
However, the pattern leaves many questions unanswered. What if I need to query a list of accounts with a specific last name? Am I allowed to add a method that updates only the email field of an account? What if I switch to a long id instead of userName? What exactly is a DAO responsible for?
The problem with the DAO pattern is that its responsibility is not well defined. Many people think of it as a gateway to the database and add methods to it whenever they find a new way they’d like to talk to the database. Hence it is not uncommon to see a DAO become bloated, like the one below.
package com.thinkinginobjects.dao;

import java.util.List;

import com.thinkinginobjects.domainobject.Account;

public interface BloatAccountDAO {

    Account get(String userName);
    void create(Account account);
    void update(Account account);
    void delete(String userName);

    List<Account> getAccountByLastName(String lastName);
    List<Account> getAccountByAgeRange(int minAge, int maxAge);

    void updateEmailAddress(String userName, String newEmailAddress);
    void updateFullName(String userName, String firstName, String lastName);
}
In the BloatAccountDAO, I added two query methods to look up Accounts with different parameters. If I had more fields and more use cases that query the account differently, I might end up writing even more query methods. The consequences are:
- Mocking the DAO interface becomes harder in unit tests. I need to implement more methods in the mock even if my particular test scenario only uses one of them.
- The DAO interface becomes more coupled to the fields of the Account object. I have to change the interface and all its implementations if I change the type of the fields stored in Account.
To make things even worse, I added two additional update methods to the DAO as well. They are the direct result of two new use cases that update different subsets of the fields of an account. They seem like harmless optimisations and fit into the AccountDAO interface if I naively treat the interface as a gateway to the persistence store. Again, the DAO pattern and its class name “AccountDAO” are too loosely defined to stop me from doing this.
I end up with a fat DAO interface, and I am sure it will only encourage my colleagues to add even more methods to it in the future. One year later I will have a DAO class with 20+ methods, and I can only blame myself for having chosen this weakly defined pattern.
Repository Pattern:
A better pattern is Repository. Eric Evans gave it a precise description in his book [DDD], “A Repository represents all objects of a certain type as a conceptual set. It acts like a collection, except with more elaborate querying capability.”
I go back and design an AccountRepository that follows this pattern.
package com.thinkinginobjects.repository;

import java.util.List;

import com.thinkinginobjects.domainobject.Account;

public interface AccountRepository {

    void addAccount(Account account);
    void removeAccount(Account account);
    void updateAccount(Account account); // Think of it as a replace, like set on a collection

    List<Account> query(AccountSpecification specification);
}
The “add” and “update” methods look identical to the create and update methods of my original AccountDAO. The “remove” method differs from the DAO’s delete method by taking an Account object rather than the userName (the Account’s identifier). If you think of the Repository as a Collection, this change makes a lot of sense. You avoid exposing the type of the Account’s identity in the Repository interface. It makes my life easier if I later want to use long values to identify the accounts.
If you ever wonder about the contracts of the add/remove/update methods, just think about the Collection metaphor. If you ever consider adding another update method to the Repository, ask whether it would make sense to add an extra update method to a Collection.
The “query” method is special however. I wouldn’t expect to see a query method in a Collection class. What does it do?
The Repository differs from a Collection when we consider its querying ability. With an in-memory collection, it is simple to iterate through it and find the element I am interested in. A repository deals with a large set of objects that are typically not in memory when the query is performed. It is not feasible to load all the instances of Account from the database if all I want is the Account with a particular user name. Instead, I pass a criterion to the Repository and let it find the object or objects that satisfy my criterion in its own way. The Repository may generate SQL against the database if it is backed by a database table, or it may simply iterate through its collection if it is backed by a collection in memory.
One common implementation of a criterion is the Specification pattern. A specification is a simple predicate that takes a domain object and returns a boolean.
package com.thinkinginobjects.repository;

import com.thinkinginobjects.domainobject.Account;

public interface AccountSpecification {

    boolean specified(Account account);
}
Therefore, I can create one implementation for each different way I’d like to query AccountRepository.
The standard Specification works well with an in-memory Repository, but it is too inefficient for a database-backed repository, which would have to load every row into memory before testing it against the predicate.
To work with a SQL-backed AccountRepository implementation, my specifications need to implement the SqlSpecification interface as well.
package com.thinkinginobjects.repository;

public interface SqlSpecification {

    String toSqlClauses();
}
A plain SQL-backed repository can take advantage of this interface and use the partial SQL clauses it produces to query the database. If I use a Hibernate-backed repository, I may use the HibernateSpecification interface instead, which generates a Hibernate criterion when invoked.
The SQL- and Hibernate-backed repositories do not use the “specified” method; however, I have found it very beneficial to implement it in all cases. That way I can use the same implementation classes with a stub AccountRepository for testing, and also with a caching implementation of the repository that sits in front of the real one.
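To illustrate how the “specified” method serves a stub repository, here is a minimal sketch of an in-memory implementation. It is not code from the original design: the class name InMemoryAccountRepository, the Account constructor, and the choice to reject updates for unknown accounts are all assumptions made for illustration, and Account is cut down to the two fields the example needs.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the article's Account; the constructor is an assumption.
class Account {
    private final String userName;
    private final int age;

    Account(String userName, int age) {
        this.userName = userName;
        this.age = age;
    }

    boolean hasUseName(String desiredUserName) {
        return userName.equals(desiredUserName);
    }

    boolean ageBetween(int minAge, int maxAge) {
        return age >= minAge && age <= maxAge;
    }
}

interface AccountSpecification {
    boolean specified(Account account);
}

interface AccountRepository {
    void addAccount(Account account);
    void removeAccount(Account account);
    void updateAccount(Account account);
    List<Account> query(AccountSpecification specification);
}

// Hypothetical in-memory implementation, e.g. for use as a stub in unit tests.
class InMemoryAccountRepository implements AccountRepository {
    private final List<Account> accounts = new ArrayList<>();

    @Override
    public void addAccount(Account account) {
        accounts.add(account);
    }

    @Override
    public void removeAccount(Account account) {
        accounts.remove(account);
    }

    @Override
    public void updateAccount(Account account) {
        // Accounts are held by reference, so there is nothing to write back.
        // Assumption: updating an account that was never added is a caller error.
        if (!accounts.contains(account)) {
            throw new IllegalArgumentException("Unknown account");
        }
    }

    @Override
    public List<Account> query(AccountSpecification specification) {
        // Test every stored account against the predicate.
        List<Account> matches = new ArrayList<>();
        for (Account account : accounts) {
            if (specification.specified(account)) {
                matches.add(account);
            }
        }
        return matches;
    }
}

class InMemoryRepositoryDemo {
    public static void main(String[] args) {
        AccountRepository repository = new InMemoryAccountRepository();
        repository.addAccount(new Account("alice", 30));
        repository.addAccount(new Account("bob", 17));

        // A lambda stands in for a specification class such as AccountSpecificationByAgeRange.
        List<Account> adults = repository.query(account -> account.ageBetween(18, 120));
        System.out.println(adults.size() + " adult account(s) found");
    }
}
```

Because query only calls “specified”, any specification class written for the SQL- or Hibernate-backed repository can be reused against this stub unchanged.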
We can even go a step further and compose Specifications with ConjunctionSpecification and DisjunctionSpecification to perform more complicated queries. However, I feel that is outside the scope of this article. You can find more detail and examples in Evans’s book [DDD] if you are interested.
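For readers who want a rough picture of that composition, the following sketch covers only the in-memory side. The class names ConjunctionSpecification and DisjunctionSpecification come from the text above, but their constructors and fields are assumptions, and Account is again reduced to a stand-in with an assumed constructor.

```java
// Simplified stand-in for the article's Account; the constructor is an assumption.
class Account {
    private final String userName;
    private final int age;

    Account(String userName, int age) {
        this.userName = userName;
        this.age = age;
    }

    boolean hasUseName(String desiredUserName) {
        return userName.equals(desiredUserName);
    }

    boolean ageBetween(int minAge, int maxAge) {
        return age >= minAge && age <= maxAge;
    }
}

interface AccountSpecification {
    boolean specified(Account account);
}

// Satisfied only when both parts are satisfied (logical AND).
class ConjunctionSpecification implements AccountSpecification {
    private final AccountSpecification left;
    private final AccountSpecification right;

    ConjunctionSpecification(AccountSpecification left, AccountSpecification right) {
        this.left = left;
        this.right = right;
    }

    @Override
    public boolean specified(Account account) {
        return left.specified(account) && right.specified(account);
    }
}

// Satisfied when at least one part is satisfied (logical OR).
class DisjunctionSpecification implements AccountSpecification {
    private final AccountSpecification left;
    private final AccountSpecification right;

    DisjunctionSpecification(AccountSpecification left, AccountSpecification right) {
        this.left = left;
        this.right = right;
    }

    @Override
    public boolean specified(Account account) {
        return left.specified(account) || right.specified(account);
    }
}

class CompositeSpecificationDemo {
    public static void main(String[] args) {
        // Lambdas stand in for concrete specification classes.
        AccountSpecification namedAlice = account -> account.hasUseName("alice");
        AccountSpecification adult = account -> account.ageBetween(18, 120);

        AccountSpecification adultAlice = new ConjunctionSpecification(namedAlice, adult);
        System.out.println(adultAlice.specified(new Account("alice", 30)));
    }
}
```

A SQL-backed variant would also have to combine the clauses of its parts, for example by joining them with AND or OR inside parentheses; that part is left out here.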
package com.thinkinginobjects.specification;

import org.hibernate.criterion.Criterion;
import org.hibernate.criterion.Restrictions;

import com.thinkinginobjects.domainobject.Account;
import com.thinkinginobjects.repository.AccountSpecification;
import com.thinkinginobjects.repository.HibernateSpecification;

public class AccountSpecificationByUserName implements AccountSpecification, HibernateSpecification {

    private String desiredUserName;

    public AccountSpecificationByUserName(String desiredUserName) {
        super();
        this.desiredUserName = desiredUserName;
    }

    @Override
    public boolean specified(Account account) {
        return account.hasUseName(desiredUserName);
    }

    @Override
    public Criterion toCriteria() {
        return Restrictions.eq("userName", desiredUserName);
    }
}
package com.thinkinginobjects.specification;

import com.thinkinginobjects.domainobject.Account;
import com.thinkinginobjects.repository.AccountSpecification;
import com.thinkinginobjects.repository.SqlSpecification;

public class AccountSpecificationByAgeRange implements AccountSpecification, SqlSpecification {

    private int minAge;
    private int maxAge;

    public AccountSpecificationByAgeRange(int minAge, int maxAge) {
        super();
        this.minAge = minAge;
        this.maxAge = maxAge;
    }

    @Override
    public boolean specified(Account account) {
        return account.ageBetween(minAge, maxAge);
    }

    @Override
    public String toSqlClauses() {
        // Both values are ints, so formatting them into the clause is safe here.
        return String.format("age between %d and %d", minAge, maxAge);
    }
}
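The String.format call above writes the values directly into the SQL text. That is harmless for int fields, but for string values it invites SQL injection. One possible variant, sketched below, returns a clause with placeholders plus the values to bind. The ParameterizedSqlSpecification interface, the ParameterizedAgeRangeSpecification class, and the SqlParameterBinder helper are hypothetical names introduced for this sketch and are not part of the original design; only the standard JDBC call PreparedStatement.setObject is used.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Arrays;
import java.util.List;

// Hypothetical variant of SqlSpecification that separates SQL text from values.
interface ParameterizedSqlSpecification {
    String toSqlClauses();      // clause containing '?' placeholders
    List<Object> parameters();  // values in placeholder order
}

class ParameterizedAgeRangeSpecification implements ParameterizedSqlSpecification {
    private final int minAge;
    private final int maxAge;

    ParameterizedAgeRangeSpecification(int minAge, int maxAge) {
        this.minAge = minAge;
        this.maxAge = maxAge;
    }

    @Override
    public String toSqlClauses() {
        return "age between ? and ?";
    }

    @Override
    public List<Object> parameters() {
        return Arrays.<Object>asList(minAge, maxAge);
    }
}

// Hypothetical helper a JDBC-backed repository could call before executing the statement.
final class SqlParameterBinder {
    private SqlParameterBinder() {
    }

    static void bind(PreparedStatement statement, ParameterizedSqlSpecification specification)
            throws SQLException {
        List<Object> parameters = specification.parameters();
        for (int i = 0; i < parameters.size(); i++) {
            // JDBC parameter indexes start at 1.
            statement.setObject(i + 1, parameters.get(i));
        }
    }
}

class ParameterizedSpecificationDemo {
    public static void main(String[] args) {
        ParameterizedSqlSpecification spec = new ParameterizedAgeRangeSpecification(18, 30);
        System.out.println(spec.toSqlClauses() + " with " + spec.parameters());
    }
}
```

The repository would append the clause to its own SELECT statement, prepare it, and pass the prepared statement to the binder, so that the driver handles quoting and escaping.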
Conclusion:
The DAO pattern offers only a loosely defined contract. It is prone to misuse and to bloated implementations. The Repository pattern uses the metaphor of a Collection. This metaphor gives the pattern a tight contract and makes it easier for your colleagues to understand.
References:
[DDD] – Domain-Driven Design: Tackling Complexity in the Heart of Software, by Eric Evans.
Hi, interesting article – but at the heart of it at some point this needs to be translated into sql queries, doesn’t it? It appears you have re-organized your domain logic and the operational criteria into more manageable/composable granular segments (unless I’ve missed the point).
How do you go about tackling the generation of the final SQL that gets executed?
It is up to the repository implementations to generate the SQL. A popular choice is to use Hibernate or some other data mapping framework. First you define mappings between the domain objects and database tables in XML files. Then you can pass the domain objects directly to a Hibernate session to persist them. Hibernate will generate and execute the SQL behind the scenes. If you use Hibernate, the implementation of the repository simply delegates add/update/remove calls to the Hibernate session.
Alternatively, you may handcraft your own repository implementations, which use getter methods to extract data from the domain objects and then generate the SQL manually.
Unless you are not using a SQL store to persist data. Then you are going to have to carry your SQL idioms over to an interface that doesn’t work in the same manner.
@nwang0: So if we are using Hibernate, what’s the point of the repository pattern if all it does is delegate? Aren’t you saying that Hibernate internally implements the Repository pattern? What’s the point of adding an extra repository layer?
Hi Jeff, I agree with you that Hibernate’s Session interface already follows the Repository pattern. However, by introducing your own repository interfaces, you decouple your domain objects from the Hibernate implementation, which makes the domain model package easy to test and reuse.
How about performance?
For example, how can the Repository pattern handle this SQL query: UPDATE accounts SET deposit = deposit + 1 WHERE active = 1;
In the Repository pattern we first need to query the collection (WHERE active = 1), read ALL fields of each Account, and map them to Account objects. Then we update only ONE field in a loop in our programming language. Then Hibernate saves the updated entities at the end of the session (many update queries). In terms of performance this is VERY inefficient.
Thinking about the DB as a collection may be good for unit testing and decoupling, but not for performant SQL queries if our data still lives in a relational DB.
So maybe the Repository pattern is viable, but only for very simple use cases, such as simple CRUD, where we don’t need complex SQL updates (in my practice I need them very often).
I think this would be a case for adding one more method to the AccountRepository class:
List query(AccountSpecification specification);
List modify(AccountModificationSpecification specification);
where AccountModificationSpecification is similar to AccountSpecification but its toSqlClauses method produces a full query rather than just the “WHERE” segment.
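A rough sketch of that suggestion follows. Everything in it is hypothetical: the shape of AccountModificationSpecification (including the apply method used for the in-memory case), the IncrementDepositOfActiveAccounts class, and the deposit and active fields on Account, which are borrowed from the UPDATE statement in the question rather than from the article’s Account class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Account carrying the two columns used by the UPDATE example.
class Account {
    private final boolean active;
    private long deposit;

    Account(boolean active, long deposit) {
        this.active = active;
        this.deposit = deposit;
    }

    boolean isActive() {
        return active;
    }

    long getDeposit() {
        return deposit;
    }

    void addToDeposit(long amount) {
        deposit += amount;
    }
}

// Assumed shape: a predicate, an in-memory change, and a full SQL statement.
interface AccountModificationSpecification {
    boolean specified(Account account);
    void apply(Account account);
    String toSqlClauses();
}

class IncrementDepositOfActiveAccounts implements AccountModificationSpecification {
    @Override
    public boolean specified(Account account) {
        return account.isActive();
    }

    @Override
    public void apply(Account account) {
        account.addToDeposit(1);
    }

    @Override
    public String toSqlClauses() {
        // A SQL-backed repository could execute this as one set-based statement.
        return "UPDATE accounts SET deposit = deposit + 1 WHERE active = 1";
    }
}

// In-memory counterpart that applies the same modification object by object.
class InMemoryAccountRepository {
    private final List<Account> accounts = new ArrayList<>();

    void addAccount(Account account) {
        accounts.add(account);
    }

    List<Account> modify(AccountModificationSpecification specification) {
        List<Account> modified = new ArrayList<>();
        for (Account account : accounts) {
            if (specification.specified(account)) {
                specification.apply(account);
                modified.add(account);
            }
        }
        return modified;
    }
}

class ModifyDemo {
    public static void main(String[] args) {
        InMemoryAccountRepository repository = new InMemoryAccountRepository();
        repository.addAccount(new Account(true, 10));
        repository.addAccount(new Account(false, 10));
        System.out.println(repository.modify(new IncrementDepositOfActiveAccounts()).size());
    }
}
```

With this shape, a SQL-backed repository never loads the accounts; it only executes the statement returned by toSqlClauses, which addresses the performance concern raised above.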
Let’s do some over engineering then just paste the parameter values into strings (instead of using obsolete binds I assume). Nice.
This pattern may seem like overkill if all you are using is a relational database for the backing store, but the story changes when you are integrating systems that use Redis, Cassandra, and several APIs from other systems. Instead of a surfeit of different interfaces and access methods, they can all be abstracted behind a Repository interface, which greatly simplifies the implementations in your domain.
“Am I allow to add a method to update only the email field of an account?” How is this problem solved using the Repository pattern?
Sure, write a Specification class for the modify function that handles that particular behavior. I would add an enum to the Specification that allows you to define the call, and have the Specification implement a map, so you can add your parameters as key/value pairs, pass that in as an argument to modify(spec), and Bob’s your uncle… The intent of this pattern is not to write the absolute minimum code; it is there to make your data access interface consistent, instead of a sea of get, fetch, update, change, etc., prepended to methods usually named with AndBy(…) structures. Instead, your interface becomes Repo.modify(spec), and it is consistent everywhere.
Your code would end up looking like this:
Spec spec = new Spec();
spec.type = Spec.EMAIL_UPDATE;
spec.add("id", 42);
spec.add("email", "somename@someaddress.com");
Repo.modify(spec);
or you could use a SpecBuilder pattern…
Spec spec = Spec.Type(Spec.EMAIL_UPDATE).add("id", 42).add("email", "somename@someaddress.com");
Repo.modify(spec);
or
Repo.modify(Spec.Type(Spec.EMAIL_UPDATE).add("id", 42).add("email", "somename@someaddress.com"));
As a note, I would also add argument assertions to the modify call, so if the spec is malformed, you will get immediate error feedback. Additionally, the enum communicates to the user of the interface what actions are available to them, and can be easily extended in a localized, consistent manner.
But as a user of the system, I don’t have to know what your data implementation was to get access, or modify that information, that is handled in the abstraction, and I have clear access to the data and cleaner logic.
So when you have hundreds of data calls, they are consistent, short, and well-organized in a tree hierarchy that makes them much easier to find and use in an IDE, or any other mode of editing.
Agree 100%. Definitely use repository pattern. To go even further, I would auto-generate as much as I can my Data Access layer (repositories, DbContext, interfaces) based on my domain model using something like http://www.sswdataonion.com. That way your data layer implementation can become least of your worries and you can focus on your business and UI layers
Any suggestion on how you would implement the addAccount-method regarding transactions?
No. Use a DAO when you need a DAO, whereas you don’t need a repository. A DAO can basically be used as a messaging system, between the application and the data base. So if you need to generate a report, which is often a rendering of read only data, or update any logging tables, then a DAO should suffice. No need for managing such transactions in session, just quick data dumps and updates.
Yes. For managing actual domain entities, and not “value objects”, use a repository.
About the specification pattern: from what I saw, in your business layer you would have to call something like:
accountRepository.query(new AccountSpecificationByUserName("testUser"));
But AccountSpecificationByUserName is bound to a HibernateSpecification (or SqlSpecification, or XmlSpecification, etc.). Aren’t you coupling your business logic to your persistence method? If I want to change the persistence of the accounts to Entity Framework, I have to either:
– Alter the AccountSpecificationByUserName implementation.
– Add the EntityFrameworkSpecification to AccountSpecificationByUserName, having both interfaces/methods (and possibly N persistence-specific interfaces/methods).
I see no transparent way to allow the injection of the persistence implementation. Did I miss something?
Like for the blog. I’m also confused about which methods belong to the repository. But there is one more definition from the book (http://www.amazon.com/Domain-Driven-Design-Tackling-Complexity-Software/dp/0321125215)
….
REPOSITORIES can implement a variety of queries that select objects based on whatever criteria the
client requires. They can also return summary information, such as a count of how many instances
meet some criteria. They can even return summary calculations, such as the total across all
matching objects of some numerical attribute.
….
Does that mean that a repository can still have granular methods (GetById, GetBySearchWord, GetByAge, …)?
Could you please add an example for Repository Implementation?
I think that DAOs and Repositories are both suited patterns for solving different problems. A DAO can become more bloated if it is used incorrectly, while a repository is also hinting at data being accessed from a bulk of data (a repo, there is more there).
A problem with the Repository pattern is that it may become too narrow. But if you use it together with DAOs, you can strike a balance. I think Spring.Data demonstrates this well. So to moderate your opinion, I would recommend using both DAOs and Repositories, not one or the other.
Good day! What prevents you from implementing the queries based on specs in the DAO format? To me, it’s simply a matter of naming the object. Let me throw an answer here: DAO = Sun; Repository = Microsoft;
Microsoft wouldn’t come up with a pattern name that is owned by its competitor.
The DAO pattern is well defined: http://www.oracle.com/technetwork/java/dataaccessobject-138824.html
How people make those classes evolve, that’s a matter of discipline 😉
In any case, whether it is named UserDAO or UserRepository, if you need to retrieve a user by its social insurance number, people will most likely add a getUserBySIN() and not create a new Spec.
@Fernando: I understand the coupling will be between the implementations and not between the interfaces. So, it would not break the abstraction.
Great article, thank you! Regarding “delete”: that is difficult to implement within a REST architecture, because I only get the ID of the resource to delete. Do I therefore have to load the domain object from the repository before I can delete it?