Why are natural numbers still used for primary keys?

Most database products have a facility for creating unique numbers. These numbers are commonly used as the primary key of a record. You can often see this exposed as a small integer in the URL of a web page. This posting is not about the risks of exposing primary keys but rather about why natural numbers are still used for primary keys.

The observation I want to make is that 1234 can easily be mistaken for 1324. Suppose I have equal access to both the records associated with 1324 and 1234, what mechanisms prevent or at least hinder me or my agents using of the wrong one?

UPCs numbers are numeric but not natural numbers. Credit card numbers are numeric but not natural numbers. ISBNs numbers are numeric but not natural numbers. Each of these are examples of identifiers exposed as "numbers." These identifiers have useful characteristics. They are self validating so that some transcription errors invalidate the identifier. Some identifiers are self correcting. And in the vast universe of value for, for example, 13 digit numbers only a small fraction of the values are valid ISBNs.

As ever more systems record URLs as references to data we need to be ever more careful about our identifiers. Public identifiers should not be natural numbers. Perhaps any system of identification that does not require ordering by identifier should not use natural numbers. It is time that database products had a facility for creating and using unique identifiers that is as easy to use a sequence table or an automatically incrementing integer. Until then, add a check digit.

ISBNs are not stricly numeric as the check digit might be 10 and this is represented as "X".

No comments: