Friday, February 09, 2007

Composite Primary Keys - Friend or Foe?

Effective data modelling can be a daunting task. Ensuring robustness, efficiency and normalisation are crucial and may determine the success or failure of the system(s) it relies upon. Fortunately the relational DBMS makes this task a lot easier by providing the infrastructure with which to model the data and the tools with which to do it.

One of the key practices when modelling data within the a RDBMS is the use of a unique identifier for each relation. A primary key. The primary key not only serves as an identifier unique identifier, it is also an index (speeding up searches using the key exponentially) and cna be used in relationships with other relations.

As such a primary key is often a single column of an integer data type with many database management systems providing a facility to have its value generated automatically upon insertion of a record.

It is also possible however to create a composite primary key containing two or more columns. Its their use that resulted in a heated discussion between myself and a fellow programmer recently. The arguments against their use are quite logical:

  • Uniqueness is spread across multiple columns. This means that any queries require at least as many columns included as the composite key to guarantee a unique result.

  • The primary key index is slower as it contains more than one column, this introduces a slight delay when inserting new records as the index may need to be rebuilt.

Suffice to say, I think this fear of composite keys stems back to their often inappropriate use in Access, where the inexperienced database designer is tempted to use several (maybe more) fields in multiple relations to ensure a user does not by accident, duplicate data:


Such use of a composite key is definitely not a good move, while it ensures in some part that the same customer is not added twice - it is grossly inefficient and adds unnecessary complexity. Many however hold the opinion that a composite primary key should never be used. This is where I disagree. Consider the following, often common scenario that models an order processing system.

Here, the order item relation uses a primary key of its own and the two foreign keys which relate to the order and product relations.

Each order may have one or more products but the same product cannot be added to the order more than once (the purpose of the quantity column is to specify the number of products). Using the above model, you must check using either a trigger or in the application using the database that a duplicate is not going to occur before adding a new product to the order.

Using a trigger, you could silently update the quantity should a duplicate be found. But without the use of a trigger or stored procedure there is no other way of ensuring a duplicate item is not added to an order. Enter, the revised model:

Here, a composite key in the order_item relation is used containing the order and product Id's. The uniqueness is now spread across two columns and their is no ID column.

In my opinion the use of a composite key provides several advantages:

  • The uniqueness of each order item is ensured at the data model level without the requirement for a trigger. A side effect of this is two more advantages:

    • Where a database is accessed by multiple applications, relying on these applications to check uniqueness for you is dangerous and cannot be relied upon, especially if you are not the one developing the applications.

    • Using a trigger or stored procedure will solve the problem of uniqueness, but there is more overhead in processing a trigger than simply checking a composite key.

  • The order_id and product_id fields will benefit from indexes regardless of whether a composite primary key is used. Using the composite primary key reduces the number of indexes on the relation.

  • Although queries will require that you search using both parts of the key, it is rare that you would need to run a query which returns just a single line within an order.
It is my opinion that in some situations the use of a composite key is preferred and can be advantageous. Casting aside those diabolical access databases that use composite "mash up" keys, they can help achieve a level of security and maybe even efficiency within your database system.

Yes I did use Access for the diagrams - one thing it Excels at (pun intended) is rapid creation of ERD's.