Basic concepts and diagramming conventions.
In this module, we introduce the entity, attribute, relationship, primary key, foreign key, and related concepts, all critical in understanding and creating relational data models—that is, models of data elements that are to be written to and read from a relational database.1 We also explore the entity-relationship diagram (ERD), a widely used visual representation for relational data models, along with a specific set of notational conventions, crow’s foot notation, used in building ERDs.
An entity is a set of similar objects, analogous to a set containing all instances of a given Java class. And just as we use a class to describe the structure of a type of object, an entity type serves that role for an entity. Like the fields of a class, an entity type may (and usually does) include attributes.
In an ERD, an entity is represented by a rectangle (sometimes with rounded corners) or table, with the entity name at the top. By convention, the name is singular (e.g. entity
) rather than plural (entities
). Following our style guide for SQL (which applies to entity-relationship diagrams as well as SQL code), we write this name in lower_snake_case
.2
Many ERDs include explanatory text with each entity, further describing that entity’s role in the data model.
This is an entity which is not defined in terms of other entities. Every instance of such an entity has an attribute that uniquely identifies it, distinguishing it from every other instance of the same entity type; the entity type includes this identifier (a _primary key, _described further in “Primary keys”) as an attribute.
Consider the example of a classroom management system used by a college. In the data model for this system, student
and course
are strong entities: a student may register prior to (and after) enrolling in any courses, and courses may be scheduled without any students being enrolled. Further, each student has a student ID number, uniquely identifying them; every course has a course ID, similarly identifying it.
A weak entity is one whose instances depend on—and are related to—instances of one or more strong entities, often called parent entities. An instance of the weak (child) entity cannot exist except in the context of the strong (parent) entity instance; thus, a child entity instance may be uniquely identified by its reference to the parent, combined with additional information (as needed) to distinguish between other child entity instances that are related to the same parent entity instance. These references and other properties are included as attributes in the weak entity type definition.
Note that this parent-child/strong-weak relationship may be hierarchical, with a weak entity having other (even weaker) entities dependent upon it. Also, while formally a weak entity does not require its own primary key, it usually includes one in practice.
Continuing the example given above, we might have a grade
entity, which can be seen as a weak/child entity in relation to student
, since a grade can only be created in the context of the student receiving the grade. The attributes of the grade
entity type would necessarily include a key that identities the student
instance to which a grade
instance is related. (These reference attributes are discussed in “Foreign keys”.)
On the other hand, assume we have an advisor
entity, and consider the relationship between advisor
and student
: Each instance of advisor
might be associated with multiple instances of student
, and each instance of student
could be associated with no more than one advisor
instance; however, student
is not really a child/weak entity with respect to advisor
, since a student should be able to register without having to do so in the context of the student-advisor relationship.
An associative entity is a set made up of pairwise relationships between members of two entity sets, and whose primary purpose is to represent those relationships. This is most useful when any single member of the first entity set might be associated with multiple members of the second set, and any single member of the second might be associated with multiple members of the first. (Contrast this with the relationship between grade
and student
, and the relationship between student
and advisor
, both described above.)
In the classroom management system, a good example of an associative entity might be enrollment
: Every member of that set relates a student
to a course
, indicating that the specified student is enrolled in the given course. There could be multiple enrollment
instances associated with a single student instance
, each associated with a different instance of course
(i.e. a student can be enrolled in multiple courses); there could also be multiple enrollment
instances associated with a single course
instance, each associated with a different student
instance (multiple students can be enrolled in a course).
Very basic associative entities—e.g. those with no additional attributes of their own, other than the references to the pair of related entities—are sometimes omitted from the ERD altogether, and are represented instead as a many-to-many relationship. (This is especially common in conceptual ERDs.) An example of a many-to-many relationship is shown below.
An attribute is a property of an entity. Attributes include keys that can be used to distinguish between (i.e. uniquely identity) instances of that entity, keys that identify related entity instances, and other properties representing important details of an entity instance.
In our SQL style guide, attribute names—like entity names—are written in lower_snake_case
in the ERD.2 Depending on the level of detail the ERD is intended to show, each attribute name may be followed by (either in another column or separated from the entity by the colon character) the type of that attribute.
One of the attributes of all strong entities (and in practice many weak entities) is a primary key. This is the attribute (or combination of attributes) that uniquely identifies an instance. In an ERD, the primary key attribute is often shown in bold type, accompanied by an asterisk or PK
indicator.
Note that some or all attributes may be omitted from a conceptual ERD, which focuses primarily on the entities and relationships.
As described above, entities may be related to other entities. In an ERD, these relationships are indicated by relationship lines between pairs of entities. Relationship lines include markers at each end, indicating the cardinality and modality of the relationship for each of the two entities. These may be indicated in different ways, depending on the notational conventions followed; we will be using crow’s foot notation.
Relationship lines are often accompanied by text annotations that further describe the nature of the relationships. When not using crow’s foot notation, these annotations serve the same purpose as the symbols described below, and consist of simple verb phrases near each end of the relationship line. Even when using crow’s foot notation, including explanatory text on relationship lines is encouraged in many style guides (including DDC’s).
In crow’s foot notation, the marker at each end of a relationship line includes a cardinality indicator, which indicates the maximum number of instances of that entity that may be related to one instance of the entity at the opposite end of the relationship line. This is not a quantitative indicator, but a qualitative one: the maximum is indicated as either one or many. (Specific values greater than 1 are treated as details handled by the business logic of a system, not by its data model.)
A cardinality of one is represented by a single short line segment, drawn perpendicular to the relationship line.
A cardinality of many is represented by two short line segments, drawn from the relationship line and connecting to the entity. The combination of three line segments touching the border of the entity makes up the “crow’s foot” that gives this notational convention its name.
The marker at each end of a relationship line also includes an indicator of modality, drawn slightly further than the cardinality indicator from the entity. This represents the minimum number of instances of that entity that may be related to an instance of the entity at the opposite end of the relationship line. Once again, this indicator is qualitative, having only zero or one as possibilities. (Specific values greater than 1 are again treated as details handled by the business logic of a system, not by its data model.)
Modality is sometimes called optionality, where a modality of zero means optional and a modality of one means mandatory. This optional vs. mandatory distinction also gives us an intuitive way to refer to the combinations of cardinality and modality.
A modality of zero (i.e. optional) is indicated by a small white circle on the relationship line.
Like the cardinality indicator for one, a modality of one (i.e. mandatory) is indicated by a single short line segment, drawn perpendicular to the relationship line.
In most ERDs (and in all ERDs following the DDC style guide), markers on relationship lines include both the cardinality indicator and the modality indicator, with the cardinality indicator located closer to the entity. Since each indicator has two possible forms, there are $2 \cdot 2 = 4$ possible combinations for each marker:
A cardinality of one with a modality of zero gives a combination of zero or one, or optional one.
A cardinality of one with a modality of one gives a combination of one and only one, or mandatory one.
A cardinality of many with a modality of zero gives a combination of zero or many, or optional many.
A cardinality of many with a modality of one gives a combination of one or many, or mandatory many.
Since each of the two markers on a relationship line can have one of the four combinations listed above, there are a total of $4 \cdot 4 = 16$ different types of relationships between pairs of entities. Some of these don’t occur very often in practice, or might be better included in a data model in other ways. For instance, a mandatory-one-to-mandatory-one relationship may be better expressed by combining the two entities in question into a single entity.
Notwithstanding the above, there are still several distinct and useful types of relationships between entities. Shown below are just a few examples. (Note that attribute names are not shown, since the focus is on the entities and relationships.)
Continuing with the classroom management example, assume that every course may have one teaching assistant assigned, but some courses have no teaching assistants. Further, assume that any teaching assistant is assigned to exactly one course. The relationship between the course
and assistant
data model entities is thus mandatory-one-to-optional-one: One (and only one) course is assigned to each assistant, and zero or one assistant is assigned to each course.
In the same example system, assume that the data model includes an advisor
entity. Each advisor typically has an advisory load of multiple students, but it’s possible that an advisor will have no assigned students for some period of time. However, every student is assigned to one advisor immediately on registration. Therefore, the relationship between advisor
and student
is mandatory-one-to-optional-many: One (and only one) advisor is assigned to each student, and zero or more students are assigned to each advisor.
As mentioned above, enrollment
may be modeled as an associative entity that has a relationship with both class
and student
. However, we might (particularly in a conceptual ERD) see enrollment modeled as an optional-many-to-optional-many relationship between class
and student
: Any given student may be enrolled in zero or more courses, and any given course may have zero or more students enrolled.
Similar to the way that attributes are sometimes omitted when the data modeler wants to focus our attention just on the entities and relationships, the modality portions are sometimes omitted for brevity in an informal description of the relationship between two entities; thus, we might refer to the above example relationships as one-to-one, one-to-many, and many-to-many, respectively.