Implementation Considerations

Relational Data Modeling

Data types, indices, and other aspects of a physical data model.

Page contents

Overview

In implementation, entities of the data model map to tables (aka relations) in a relational database, while attributes map to columns of those tables.

Assuming that one of our aims in data modeling is to implement the models we create, we must eventually turn our attention to implementation considerations—usually while the modeling process is still underway. Among others, these considerations include variations between different relational database managament systems (RDBMS) in compliance to SQL standards; supported data types, and limits on those data types; support for and treatment of NULL values; and explicit and implicit generation and use of indices. A conceptual ERD often omits many of these details (some employ an entirely different set of notational conventions from those described here), but as we move closer to implementation we fill them in.

Platforms

Structured Query Language (SQL) is the language most closely associated with the relational model first proposed by Edgar Codd (of IBM) in 1969. While there are a number of fundamental differences between the relational model and SQL, the two terms have become almost synonymous in common usage, and no currently available alternative to SQL has even close to the same level of platform support.

From the above, it might seem that deciding to use SQL should answer most of the implementation questions for a relational data model; unfortunately, this isn’t really the case. Since the first ISO standardization of SQL in 1987, the standard has been revised nine times, most recently in 2019. Most RDBMS implementations lag several years behind the standard, and even when any given RDBMS announces support for a given SQL standard, there are virtually always some features left unsupported.

Making matters worse, the SQL standard itself leaves significant areas of implementation behavior officially undefined, leaving these behaviors up to implementors to define as they see fit.

One practical consequence of the above is that it is nearly impossible to implement a relational data model in an entirely platform-independent manner. We’ll try to minimize the impact of this on our work by focusing on a small number of RDBMS platforms, summarized below.

Primary platforms

Our SQL coding exercises and shared project work will use one of these platforms. Neither of these is strictly compliant with the SQL standard, but both provide a good compromise of SQL feature support and a small footprint.

Secondary platforms

These platforms are addressed in some curriculum elements (primarily in syntax and feature comparisons), and may be used (subject to instructor approval) in student capstone projects.

Data types

When including data types in ERDs, we have a few natural answers to the question, “Which data types should we use?”

In the DDC Java Enterprise & Android Mobile bootcamp, our current practice is to use either Java types or platform-specific types. The table below shows the SQL standard, SQLite, and H2 datatypes corresponding to core Java scalar types.

Java Standard SQLite H2
boolean, Boolean BOOLEAN INTEGER BOOLEAN
byte, Byte SMALLINT INTEGER TINYINT
short, Short SMALLINT INTEGER SMALLINT
char, Character NCHAR(1) INTEGER, TEXT NCHAR(1)
int, Integer INTEGER INTEGER INTEGER
long, Long BIGINT INTEGER BIGINT
float, Float REAL REAL FLOAT
double, Double DOUBLE PRECISION REAL DOUBLE PRECISION
String NCHAR VARYING, NVARCHAR TEXT NCHAR VARYING, NVARCHAR
Date TIMESTAMP INTEGER, REAL, TEXT1 TIMESTAMP WITH TIME ZONE
LocalTime TIME INTEGER, REAL, TEXT1 TIME
LocalDate DATE INTEGER, REAL, TEXT1 DATE

Indices

Indices (indexes is an acceptable and widely used alternative form) are data structures permitting high-performance non-sequential access to a specified subset of the columns stored in the rows of a database table. In a relational database, indices are used to support performant enforcement of the constraints implicitly declared in primary keys, unique keys, and foreign keys, and to speed up data filtering, sorting, and retrieval.2

Formally, indices are not required by the relational model or by SQL, and indices can be created or dropped with no effect on the underlying data. However, for all but the smallest databases, indices are a practical necessity in implementation. Users don’t see indices—they just experience their effects.

Depending on the RDBMS, some indices are created automatically as a consequence of definining primary, unique, and foreign keys on a SQL table. Others are created explicitly in the SQL Data Definition Language (DDL) statements of the data model implementation. The table below summarizes the current automatic index creation behavior of SQLite and H2 for primary, unique, and foreign keys: A check mark (✔) indicates that an index is created automatically for the specified key, while a cross mark (❌) indicates that such an index must be created explicitly.

Key SQLite H2
Primary
Unique
Foreign

Like a key, an index is declared on an attribute/column (or a collection of attributes). In an ERD, indices are denoted by the IX indicator. Multiple indices may be created for a single entity/table, so IX is followed by a number. It’s fairly common to omit the IX indicators in an ERD for those indices created (automatically or otherwise) to enforce primary, unique, and foreign key constraints.

  1. SQLite has no native data type support for dates and times. Application code or the date and time functions of SQLite may be used to convert between these types and the INTEGER REAL, and TEXT types of SQLite. These are the most common representation approaches:

    • INTEGER for datetime values represented as milliseconds elapsed before or after 1 January, 1970, 00:00:00 UTC.
    • REAL for Julian day numbers—i.e., the number of days since the start (in GMT) of November 24, 4714 BCE (according to the proleptic Gregorian calendar), with the fractional part representing the time of day.
    • TEXT for ISO 8601 datetime strings, e.g. "2022-01-24T08:00:00-06:00".

     2 3

  2. Indices may speed up SELECT statements (more correctly, they may speed up JOIN and WHERE clauses—primarily on SELECT statements, but also on UPDATE, DELETE, and INSERT statements in implementations that support JOIN clauses on those statements), but they increase the time required for the persistence operations of INSERT and DELETE statements, as well as some UPDATE statements: when the data is modified, the associated indices must be updated as well.