27 June 2010

Why enums in the database are a good idea

I've recently had a bit of a discussion about database enums on StackOverflow. My opinion is that enums are useful, while Richard thinks they are quite evil.

There are times when your schema is fairly simple and you only work with the database through your application and have a nice GUI, so you don't mind that your tables are filled with endless columns of opaque numbers.

However, I had a job where I spent heaps of time in psql running queries to track down issues or find patterns in data or produce one-off reports. I was using Postgres 8.3 at the time, and I found myself wishing for enum fields as they would give me

  • simpler, shorter, more readable queries 
  • easier to read query results in psql
  • fewer joins to type out and make mistakes in
  • fewer table names to remember.

Database enums produce similar benefits in your application code, regardless of whether you are using an ORM library, so I was pleased that they were introduced in Postgres 8.4.

I'd like to address Richard's concerns about enums:

1. Enums are case sensitive, spaces are significant

This is no different than a lot of programming languages, so developers are well equipped to handle this issue.

2. Enums are sometimes not comparable

True, and that's why you choose the right tool for the right job - don't use an enum when you need to compare values, or write your own comparison function. I would say that there are heaps of cases where you have 2-3 possible values of an attribute and you don't need to compare them - enums are great for that.

3. Enums do not allow translation into other languages

Absolutely, so you should treat enum values same as you would treat integers - they are internal values, and you look up the translation for the value as needed. Enums are there for the developer's benefit (i.e. better readability and simpler queries), and their values probably shouldn't go straight to the UI.

4. Enums are not ANSI standard and are therefore implemented differently (if at all) in different database engines

Sure, and this is where a layer of abstraction like an ORM library can hide those details.

5. When using a text string to compare within code it is possible to misspell it and the language will not warn you

People say similar things about dynamic languages but it turns out that better unit testing and the increase in productivity more than offset this issue.

6. Enums are ordered in sorts in the order in which they were created. 

If you've got integer constants, your sorts are only going to work until requirements change and you need to insert a new value right in the middle of your carefully arranged range to get the right sort order. In practice, you'll probably end up calculating an extra sort field in queries regardless of whether you have an enum or an int column, unless you are happy to update all the values. Otherwise it's the same as point 2 - don't use an enum if it's not going to do the job.

7. Enums are not easily maintained, and lock the database into a version of the world when the database was designed

It is possible to alter tables, and database migrations (e.g. in Ruby on Rails) are not a difficult task to handle. The previous point shows that integers have their share of problems anyway. As a side note, everyone is using automated migrations, right? I have run into issues with altering the schema when working with large tables (tens and hundreds of millions of rows) as it was simply taking unacceptable amounts of time. However, this is hardly an enum specific issue, and I maintain that at that scale you need to architect your database and applications with these issues in mind (e.g. by including a switchover node to handle schema changes).


To summarise, I am not advocating mixing identity and description with enums, what I am advocating is ease of use and enums provide that in a lot of cases. However, like any other tool they have to be used appropriately.

22 June 2010

How to use specs efficiently: exploration and time estimates

I think we had a successful approach to specifications in my previous job as we delivered the project on time and with only a couple of issues found in production.

The purpose of writing specs was twofold:
  • we used them to make time estimates
  • they were an exploratory tool to help us understand the scope of the project.
We wrote specs upfront for the whole product but without going into too much detail, and with an emphasis on the UI. This was a means for us to get a feel for what had to be done and for the scope of the project. After that we went through the specs and listed all the tasks that had to be carried out at the level of "create the options dialog layout", "implement getting and setting of options" and so on. This unearthed a lot of things that would otherwise have turned into surprises and potential delays during implementation.

The next step was to assign time to each task. There was two of us performing estimates, which is important because when we disagreed it often highlighted things we'd forgotten about and allowed us to improve estimates, or otherwise we just took the average of our numbers for a better guess. We used a granularity of 0.5 days for our estimates, and kept the "multiply your estimate by 2" rule firmly in mind.

We found that this works out to be fairly accurate. Although the averaged times for particular tasks can still end up being wrong, you finish some of them later and some earlier because of 0.5 day granularity.

When we started implementing things, we had to work everything out in a lot more detail and inevitably had to do some things differently from the spec. To that end, we got together and worked out the best approach to implementing each feature, sometimes with prototypes. We didn't update the original spec but we did make notes after the meetings as it's very easy to forget the details afterwards.

Depending on the project, it may also be a good idea to keep the original spec up to date as the application evolves, but we didn't need to do it this time.

In spite of the Agile bent of the project, we found that writing a complete spec in the beginning was a great investment of our time, even though we fully expected it to get outdated quickly.