Wednesday, June 8, 2011

Surrogate keys

Surrogate key is a unique identifier used to uniquely identify records in a table. Surrogate keys are generally used in data warehousing.

Then what is the difference between surrogate key and a primary key?

Well, it depends on the type of database model being used. For a current database, where only current data are kept, there will be one to one correspondence between surrogate key and primary key.

But in a temporal database, where there are lots of data with same primary key value, as historical data are also saved.

In a temporal database, it is necessary to distinguish between the surrogate key and the primary key. Typically, every row would have both a primary key and a surrogate key. The primary key identifies the unique row in the database, the surrogate key identifies the unique entity in the modeled world; these two keys are not the same. For example, record containing a customer location details when he lived in US and now when he lives in UK. Both will have same customer_id which is the primary key. But in a temporal database where historical database are stored, surrogate key will be required to uniquely identify record between these two records. For such kind of situations, surrogate keys are required.

Example:

A customer table containing customer details.

Customer_id

Customer_name

Customer_address

Customer_city

Phone_number

Email_id

1002

John Smith

10, Wall Street

New York

1123301111

john.smith@abc.com

1002

John Smith

12, Grand Street

London

1122331111

john.smith@abc.com

As you see here, we cannot identify difference between the two records. As the customer_id are same. But as first record was input 10 years before and now the customer has moved to a new location. So, new record also contain the record. Such situation can be dealt using surrogate keys.

Customer_key

Customer_id

Customer_name

Customer_address

Customer_city

Phone_number

Email_id

1

1002

John Smith

10, Wall Street

New York

1123301111

john.smith@abc.com

2

1002

John Smith

12, Grand Street

London

1122331111

john.smith@abc.com

Now you can see that surrogate keys are unique even if there is temporal data in a database.

A surrogate key are generally generated using database sequence objects.

Advantages of a surrogate key:

Immutability

Surrogate keys do not change while the row exists. This has the following advantages.

Requirement changes:

Attributes that uniquely identify an entity might change, which might invalidate the suitability of the natural, compound keys.

Performance

Surrogate keys tend to be a compact data type such as a four-byte integer. This allows the database to query the single key column faster than it could multiple columns.

No comments:

Post a Comment

Oracle analytics - Different pricing schems

OBIEE on premise Licensing: Component based licensing: The pricing is based on the user base or volume(COGS/revenue). It is a buy as ...