One of the core constructs within Data Modelling is identifying the Business Key (BK) for a table – or, generally, an entity.
There appears to be considerable confusion and misunderstanding amongst the data modelling community as to what exactly is a BK?
Does it have to contain a commonly recognised ‘business’ term, such as ‘Employee Name’ instead of just being the number, ‘Employee-id’?
In Data Modelling, it is agreed that an entity’s Business Key must be:
1. Unique
2. Mandatory
3. Immutable
4. Usable as a Foreign-Key (FK) within the organisation’s domain of discourse.
Let’s look at a typical data modelling scenario where we have:
· A number of source tables from one or more interacting applications
· These applications are being migrated or ETL (Extract-Transform-Load) into a Data Warehouse (DW)/ Data Vault (DV)/ Data Lake. Let’s consider the target to be a DW for our purposes.
· It is required to model the keys and data relationships amongst all the tables in scope
· It is required to ingest both current and historical data from these source tables.
Now, when modelling the Employee table, the key that fits the above four criteria is the Employee-id; not the Employee-Name.
Why?
Primarily because an employee’s name:
· Can change over time and is thus not immutable
· Can have duplicates and therefore non-unique, especially in large organisations
· Can really not be used as a Foreign Key (FK) – or very clumsily – in relationships between tables.
However, an employee name is commonly used in business discourse. For example, if Mary Smith (Employee-id: 23) is a Project Manager (PM), a report of current PM’s will show “Mary Smith” along with her other attributes.
Her Employee-id of 23 might also be mentioned in the report but only as a secondary, supporting field.
Everybody in the organisation knows her as Mary Smith; whereas very few people would know her Employee-id.
This is all fine and is the rationale behind Semantic Modelling where we look at data models from the user’s point of view.
However, beneath the covers, we have to use the Employee-id as the BK in the relational normalised (3NF) data model for the HR application.
Many project teams spend a lot of time and effort trying to discover a ‘business-centric’ BK and would discard the Employee-id since it is just a number – like 23 in our example.
I believe this leads to wasted effort and time. Let me explain why.
The source OLTP (Online Transaction Processing) application where the employee data is being mastered is probably the HR/Payroll application.
When Mary Smith was set up as a new employee, the HR application assigned her an Employee-id of 23 – probably using a simple algorithm, such as the next number in sequence.
This Employee-id would uniquely identify her for the tenure of her employment with the organisation.
It would be used in:
· Payroll
· Training courses that Mary undertakes
· Any official correspondence with Mary
· Passed as an FK to associated applications in the organisation’s ecosystem.
Yes – the Employee Name would be used in correspondence as well. But Mary Smith would be identified within HR and other OLTP applications with her Employee-id.
For example, her compliance training records would probably be stored in a training application with details of attended training courses. This training application would also use her Employee-id as the BK for Mary Smith.
Why does not the HR system use Mary’s Tax File Number (TFN) or Passport Number – which are also unique and appear to have some wider ‘business meaning’?
There are several reasons, chiefly:
1. Not everyone has a Passport Number or even a TFN.
2. It is not mandatory for a new employee to expose their TFN under privacy laws.
Primary and Surrogate Keys
A Business Key satisfying the four conditions above can serve as the Primary Key (PK) for the entity/table. ‘
It is just because of this that the BK can act in Foreign Key (FK) relationships with all other tables, since, by definition, the FK must be the PK of another table.
A Surrogate Key is a system generated hash key or UUID. Hashing algorithms, such as SHA1, are used to generate such a (Hex-32) key which looks something like this:
123e4567-e89b-12d3-a456-426614174000
An SK is generally not used as an FK in relationships within the source OLTP applications – although, theoretically, it can be.
This is primarily because it is not user-friendly nor easily understandable by project personnel.
It is often used in Data Warehouses/Vaults as a unique key for an instance of an entity - whether current or historical, especially in Dimensional Data Marts, that is, Kimball methodology.
In any case, a hashed-SK cannot function as a Business Key because it:
· Has even less meaning to the non-technical user than the Employee-Id
· Is never used in business discourse / correspondence.
In our illustrative example, we have considered only a system-generated sequence number for the Employee-id BK.
Applications can use any algorithm of choice and structure to generate their BK.
For instance, a Home Insurance Policy BK could be structured as follows:
· H: Home
· P: Policy
· A: Account
· System generated sequence number
For example, HPA0011111111111
So, the next Home Insurance Policy generated by the application would be HPA0011111111112 and function as the BK for the Policy entity.
In summary, in data modelling, a Business or Natural Key does not have to contain apparent ‘business context’ like ‘employee name’.
An application-generated key, such as Employee-id, is a perfectly valid BK and is often generated using an automated algorithm conforming to a defined structure.
These keys fulfil all the criteria of a valid Business Key in data models and can be utilised within the organisation’s wider eco-system.