Friday, February 09, 2007

Composite Primary Keys - Friend or Foe?

Effective data modelling can be a daunting task. Ensuring robustness, efficiency and normalisation are crucial and may determine the success or failure of the system(s) it relies upon. Fortunately the relational DBMS makes this task a lot easier by providing the infrastructure with which to model the data and the tools with which to do it.

One of the key practices when modelling data within the a RDBMS is the use of a unique identifier for each relation. A primary key. The primary key not only serves as an identifier unique identifier, it is also an index (speeding up searches using the key exponentially) and cna be used in relationships with other relations.

As such a primary key is often a single column of an integer data type with many database management systems providing a facility to have its value generated automatically upon insertion of a record.

It is also possible however to create a composite primary key containing two or more columns. Its their use that resulted in a heated discussion between myself and a fellow programmer recently. The arguments against their use are quite logical:

  • Uniqueness is spread across multiple columns. This means that any queries require at least as many columns included as the composite key to guarantee a unique result.

  • The primary key index is slower as it contains more than one column, this introduces a slight delay when inserting new records as the index may need to be rebuilt.

Suffice to say, I think this fear of composite keys stems back to their often inappropriate use in Access, where the inexperienced database designer is tempted to use several (maybe more) fields in multiple relations to ensure a user does not by accident, duplicate data:


Such use of a composite key is definitely not a good move, while it ensures in some part that the same customer is not added twice - it is grossly inefficient and adds unnecessary complexity. Many however hold the opinion that a composite primary key should never be used. This is where I disagree. Consider the following, often common scenario that models an order processing system.

Here, the order item relation uses a primary key of its own and the two foreign keys which relate to the order and product relations.

Each order may have one or more products but the same product cannot be added to the order more than once (the purpose of the quantity column is to specify the number of products). Using the above model, you must check using either a trigger or in the application using the database that a duplicate is not going to occur before adding a new product to the order.

Using a trigger, you could silently update the quantity should a duplicate be found. But without the use of a trigger or stored procedure there is no other way of ensuring a duplicate item is not added to an order. Enter, the revised model:

Here, a composite key in the order_item relation is used containing the order and product Id's. The uniqueness is now spread across two columns and their is no ID column.

In my opinion the use of a composite key provides several advantages:

  • The uniqueness of each order item is ensured at the data model level without the requirement for a trigger. A side effect of this is two more advantages:

    • Where a database is accessed by multiple applications, relying on these applications to check uniqueness for you is dangerous and cannot be relied upon, especially if you are not the one developing the applications.

    • Using a trigger or stored procedure will solve the problem of uniqueness, but there is more overhead in processing a trigger than simply checking a composite key.

  • The order_id and product_id fields will benefit from indexes regardless of whether a composite primary key is used. Using the composite primary key reduces the number of indexes on the relation.

  • Although queries will require that you search using both parts of the key, it is rare that you would need to run a query which returns just a single line within an order.
It is my opinion that in some situations the use of a composite key is preferred and can be advantageous. Casting aside those diabolical access databases that use composite "mash up" keys, they can help achieve a level of security and maybe even efficiency within your database system.

Yes I did use Access for the diagrams - one thing it Excels at (pun intended) is rapid creation of ERD's.

4 comments:

Anonymous said...

nice

abhijit said...

there are other things access excels at! like app portability for instance where budget constraints prevent you from owning a real database.

Anonymous said...

I agree whole-heartedly. Composite keys are completely appropriate when a table represents a many-to-many relationship, just as you've shown. That said, I can't think of any other situation where they would be appropriate. I'll wager neither can you, because the advantages you've listed would only exist in this situation.

abhijit, no-one's saying that Access doesn't have a place. It's just that many of the people who use Access rather than a full RDBMS don't really understand the implications of a lot of what they do and create inefficient designs. The same can be done in "proper" databases too but those who design them have historically been more savvy developers who can create better designs. With the advent of SQL Server Express and the like though, that's changing too. :)

jmcilhinney

Adam said...

There is only one other situation where I have seen composite keys. This is where a primary key is a mixture of types. E.g: TD13543.

There are a few advantages gained. The second part of the string could be a number used as an identifier else where. Using the mixed type composite key will enable you to search using just that field, meaning no need to use inbuilt functions and less overhead.

On the subject of access - we a section on our website that relies on data from and access database. Granted, the data is extracted and put into XML format for the site but it does have its uses and the quick and easy form and report feature make it very attractive make it a good selling point for office bodies who insist on using Excel to store all their data.