Entity instance indentifiers and references.
Keys are attributes (or combinations of attributes) in an entity that identify instances of that or another entity. The declaration of a key on an attribute of an entity implicitly places a constraint on the values that instances of that entity may have for the given attribute. For example, declaring an attribute as a primary key or unique key of an entity also declares that no two instances of that entity may have the same value for that attribute. Similarly, declaring an attribute of an entity as a foreign key, referencing a primary key or unique key of another entity, also declares that no instance of the first entity may have a value for that attribute that doesn’t match the specified primary or unique key attribute value in an instance of the second entity.
A primary key is an attribute (or combination of attributes) designated to uniquely identify (and distinguish between) instances of an entity. Strong entities are required to have primary keys; in practice, weak entities usually have primary keys as well.
Primary keys must follow some rules quite strictly:
Only one primary key may be declared for an entity.
A NULL
value can’t be assigned to an attribute designated as a primary key.
All of the instances of an entity must have distinct values for the primary key attribute. Thus, the primary key represents a constraint, as noted above.
In addition, there are additional rules and guidelines that are either enforced by most database management systems, or followed widely in practice:
Primary key values are not reused. In other words, if an instance is removed from the set of instances for an entity (i.e. a row is removed from the table embodying the entity in the data store), its primary key value is not assigned to any instances added to the set in the future.
Primary key values are usually generated automatically—e.g., using auto-incremented integer sequences.
As a matter of practice, an attribute used as a primary key should be dedicated to that purpose, and should not have any other intrinsic meaning.
For example, while it may seem natural to use the social security number (SSN) as a primary key in an employee
entity (assuming the set of employee
instances is intended to contain only US citizens and non-citizens who may legally work in the US), doing so is not advisable: It would couple (to an unacceptable level) the data model implementation with external factors that are not within our control. In this case, our data model implementation would require significant changes (not just to the employee
entity, but to many others as well) if the length or makeup of SSNs were changed by the Social Security Administration.
In the example below, repeated from the previous page, student_id
is the attribute designated as the primary key. As described previously, it is shown in bold, with the PK
indicator.
A unique key is an attribute (or combination of attributes), other than that used for a primary key, that uniquely identifies an entity instance. While unique keys and primary keys are similar in some ways, the rules and practical guidelines for unique keys are less strict than for primary keys.
Unique keys may permit NULL
values (though this isn’t always a good idea).
An entity may have more than one unique key declared.
All the instances of an entity must have distinct values for the unique key attribute(s), at least for non-NULL
values. Thus, like the primary key, a unique key represents a constraint on the instances in the entity set. However, while some database management systems permit a single instance of an entity to have a NULL
value for a unique key attribute, some others suspend the unique key constraint altogether for instances with a NULL
value in a unique key attribute.
As a rule, database management systems do not include the same support for automatic generation of unique key attribute values as they do for primary key attribute values.
In practice, while primary key attributes should be dedicated to that purpose, unique keys are often declared for attributes that have some intrinsic meaning. Expanding on the SSN scenario described above, if an employee
entity includes a social_security_number
attribute, declaring that attribute as a unique key would help avoid the accidental addition of duplicate or redundant employee
instances, since the unique key constraint would prevent multiple instances with the same SSN.
In an ERD, unique keys should be denoted by a UK
, UQ
, or UX
indicator. Since an entity may have multiple unique keys, these indicators should be numbered. (In the DDC style guide, these are declared as strict rules.)
In the classroom management example, it might be useful to declare the name
attribute of the course
entity as a unique key, to avoid potential confusion caused by multiple courses having the same name; we might represent this as follows. (Notice the use of the UQ1
indicator on the name
attribute. Also, note that the attributes of course
are still rather minimal; we’ll be adding to them in later examples.)
A foreign key is an attribute (or combination of attributes) of an entity that references another entity (or, in some cases, the same entity). That is, every instance of the first entity has a value for the foreign key attribute that matches a unique identifier (usually the primary key value) of an instance of the second entity. Putting it another way: foreign keys—and more specifically, the constraints implied by them—effectively define relatonships between entities.1
Of course, there are strict rules and some practical guidelines for foreign keys:
The value of a foreign key attribute must match a primary key (or less commonly, a unique key) value of an instance in the referenced entity. This is the constraint declared implicity when the foreign key is declared.
An entity may include multiple foreign keys.
In general, foreign key attributes permit NULL
values. These are very much like the value null
in Java object references: a foreign key value of NULL
refers to no instances of the parent entity.
Though permitted, allowing NULL
values in a foreign key isn’t always a good idea. In particular, it should never be allowed in a weak/child entity’s foreign key referring to the strong/parent entity: this can result in orphan instances—child entity instances that are no longer associated with parent entity instances, and which serve no purpose in the system functionality, but which can impact performance and produce inconsistent or otherwise undesirable results in aggregate queries.
In a many-to-one or one-to-many relationship embodied in a foreign key, the entity on the many side of the relationship always has the foreign key referring to the other entity, while the entity on the one side never has the foreign key referring to the entity on the many side—at least not for the given relationship. (It’s not unheard of to have more than one relationship—not necessarily of a single type—between a pair of entities.)
In a one-to-one relationship embodied in a foreign key, either entity (but only one of the two) must have a foreign key attribute that refers to the entity on the other side of the relationship.
In a many-to-many relationship, neither entity has a foreign key to the other. Instead, as mentioned previously, a many-to-many relationship involves an associative entity (which may not be displayed as an entity in the ERD); that entity has foreign key attributes referring to the entities on both sides of the relationship. (When an associative entity is included in place of a many-to-many relationship between two other entities in the ERD, the relationships between the associative entity and the other two aren’t many-to-many, but many-to-one.)
In an ERD, a foreign key attribute is denoted by the FK
indicator. Since an entity may have multiple foreign keys, the FK
indicator is numbered.
Let’s look again at the student-advisor relationship mentioned previously. This is a many-to-one relationship: multiple students may be assigned to any single advisor, and exactly one advisor is assigned to any given student. Thus, the foreign key would be in the many side of the relationship—that is, in the student
entity.
Note that in this ERD fragment, the relationship is shown as an optional-many-to-mandatory-one. This implies that any given student will always have exactly one advisor, but an advisor might spend some period of time without any students assigned. Also, notice that while it may feel natural to do so, it is not necessary to draw the relationship line from the foreign key to the referenced primary (or unique) key; it is only ncessary to draw it between the related entities.
As noted above, a primary key, unique key, or foreign key may be declared for a combination of attributes, rather than a single attribute; such a key is a composite key (in addition to being a primary key, unique key, or foreign key).
It’s fairly common for a composite key to be used as a unique key. However, composite keys are virtually never used as primary keys of strong entities; they are somewhat more commonly used as primary keys of weak entities, and much more commonly as primary keys of associative entities.
It’s possible to have a relationship between entities that is not supported or enforced by a foreign key. For example, this is sometimes the case when accessing a legacy database—especially one implemented in a lightweight RDBMS (e.g., outdated versions of SQLite or MySQL). However, when implementing a new data model, virtually all relationships should be enforced by foreign keys. ↩