Implementation Considerations

Relational Data Modeling

Data types, indices, and other aspects of a physical data model.

Page contents

Overview

In implementation, entities of the data model map to tables (aka relations) in a relational database, while attributes map to columns of those tables.

Assuming that one of our aims in data modeling is to implement the models we create, we must eventually turn our attention to implementation considerations—usually while the modeling process is still underway. Among others, these considerations include variations between different relational database managament systems (RDBMS) in compliance to SQL standards; supported data types, and limits on those data types; support for and treatment of NULL values; and explicit and implicit generation and use of indices. A conceptual ERD often omits many of these details (some employ an entirely different set of notational conventions from those described here), but as we move closer to implementation we fill them in.

Platforms

Structured Query Language (SQL) is the language most closely associated with the relational model first proposed by Edgar Codd (of IBM) in 1969. While there are a number of fundamental differences between the relational model and SQL, the two terms have become almost synonymous in common usage, and no currently available alternative to SQL has even close to the same level of platform support.

From the above, it might seem that deciding to use SQL should answer most of the implementation questions for a relational data model; unfortunately, this isn’t really the case. Since the first ISO standardization of SQL in 1987, the standard has been revised nine times, with the most recent edition finalized in 2023. Most RDBMS implementations lag several years behind the standard, and even when any given RDBMS announces support for a given SQL standard, there are virtually always some features left unsupported.

Making matters worse, the SQL standard itself leaves significant areas of implementation behavior officially undefined, leaving these behaviors up to implementors to define as they see fit.

One practical consequence of the above is that it is nearly impossible to implement a relational data model in an entirely platform-independent manner. We’ll try to minimize the impact of this on our work by focusing on a small number of RDBMS platforms, summarized below.

Primary platforms

At Deep Dive, our SQL coding exercises and projects typically use one of these platforms, listed from least SQL-standards-compliant to most.

SQLite

This is an embedded database, implemented as a library and accessed directly from client code. It’s included as a standard component of several operating systems, including Android, iOS, macOS, and others. It’s also accessible from within Chromium-based and Safari web browsers (though the Web SQL standardization effort, which initially incorporated SQLite, was abandoned in 2010).

There’s a JDBC driver for SQLite available; however, Android applications will generally access SQLite via the Room ORM (or directly via a low-level, SQLite-specific library), which does not rely on JDBC. (In any event, JDBC is not included in the portion of the JCL ported to Android.)

MySQL

This is the most widely used RDBMS in existing web applications, particularly on small- to medium-sized sites. It doesn’t run in embedded mode; thus, it cannot be packaged directly into a Spring Boot or Jakarta application, but such applications can access a MySQL server via JDBC. While MySQL—much like SQLite—began as a lightweight RDBMS, with minimal support for SQL standards, it has evolved to include more complete standards adherence and better scalability—particularly in the non-open-source Enterprise Server edition.

H2

H2 is an open-source database developed in Java, typically accessed via JDBC. It provides very good standards conformance, as well as compatibility modes that let it emulate (to some extent) MySQL, PostgreSQL, or SQL Server; this makes H2 attractive for use as a lightweight development database with fairly straightforward migration to any of those supported by compatibility modes.

H2 can also be used in server or embedded mode, making it well-suited for use in Spring Boot-based projects.

PostgreSQL

Postgres began as a re-implementation of the Ingres RDBMS, and was intended by its developers to be both an alternative and a successor to Ingres. Both platforms originally used non-SQL data definition/manipulation languages (QUEL and POSTQUEL)—but by the mid-‘90s, both had adopted SQL.

In contrast with MySQL, PostgreSQL has long had nearly complete support for SQL standards, as well as a number of powerful extensions in datatypes, SQL syntax, and scalability. In the past 15 years, deployment and configuration has been simplified, making the threshold for adoption roughly equivalent to MySQL; this is partly responsible for PostgreSQL now being used more widely than MySQL for new projects.

Data types

When including data types in ERDs, we have a few natural answers to the question, “Which data types should we use?”

The table below shows the SQL standard, SQLite, H2, MySQL, and PostgreSQL datatypes corresponding to several core Java scalar types. (In the Java Enterprise & Android Mobile bootcamp, our practice is to use either Java types or platform-specific types.)

Java SQL standard SQLite MySQL H2 PostgreSQL
boolean, Boolean BOOLEAN INTEGER TINYINT(1), BOOL BOOLEAN, TINYINT BOOLEAN, BOOL
byte, Byte SMALLINT INTEGER TINYINT TINYINT SMALLINT
short, Short SMALLINT INTEGER SMALLINT SMALLINT SMALLINT
char, Character NCHAR(1) INTEGER, TEXT NCHAR(1) CHARACTER, CHAR, NATIONAL CHARACTER, NATIONAL CHAR, NCHAR CHARACTER(1), CHAR(1)
int, Integer INTEGER INTEGER INTEGER, INT INT INTEGER
long, Long. BIGINT INTEGER BIGINT BIGINT BIGINT
float, Float REAL REAL FLOAT REAL, FLOAT(24) REAL
double, Double DOUBLE PRECISION REAL DOUBLE DOUBLE PRECISION, FLOAT(53) DOUBLE PRECISION
BigInteger NUMERIC(p), DECIMAL(p), NUMERIC(p, 0), DECIMAL(p, 0) TEXT, BLOB1 NUMERIC(p), DECIMAL(p), NUMERIC(p, 0), DECIMAL(p, 0) NUMERIC(p), DECIMAL(p), NUMERIC(p, 0), DECIMAL(p, 0) NUMERIC(p), DECIMAL(p), NUMERIC(p, 0), DECIMAL(p, 0)
BigDecimal NUMERIC(p, s), DECIMAL(p, s) TEXT, BLOB1 NUMERIC(p, s), DECIMAL(p, s) NUMERIC(p, s), DECIMAL(p, s) NUMERIC(p, s), DECIMAL(p, s)
String NATIONAL CHARACTER VARYING, NVARCHAR TEXT VARCHAR, TEXT CHARACTER VARYING, CHAR VARYING, VARCHAR, NATIONAL CHARACTER VARYING, NATIONAL CHAR VARYING, NCHAR CHARACTER VARYING, VARCHAR
Date, Instant TIMESTAMP WITH TIME ZONE INTEGER, REAL, TEXT2 TIMESTAMP TIMESTAMP WITH TIME ZONE TIMESTAMP WITH TIME ZONE
LocalTime TIME INTEGER, REAL, TEXT2 TIME TIME TIME, TIME WITHOUT TIME ZONE
LocalDate DATE INTEGER, REAL, TEXT2 DATE DATE DATE

Indices

Indices (indexes is an acceptable and widely used alternative form) are data structures permitting high-performance non-sequential access to a specified subset of the columns stored in the rows of a database table. In a relational database, indices are used to support performant enforcement of the constraints implicitly declared in primary keys, unique keys, and foreign keys, and to speed up data filtering, sorting, and retrieval.3

Formally, indices are not required by the relational model or by SQL, and indices can be created or dropped with no effect on the underlying data. However, for all but the smallest databases, indices are a practical necessity in implementation. Users don’t see indices—they just experience their effects.

Depending on the RDBMS, some indices are created automatically as a consequence of definining primary, unique, and foreign keys on a SQL table. Others are created explicitly in the SQL Data Definition Language (DDL) statements of the data model implementation. The table below summarizes the current automatic index creation behavior of SQLite and H2 for primary, unique, and foreign keys: A check mark (✔) indicates that an index is created automatically for the specified key, while a cross mark (❌) indicates that such an index must be created explicitly.

Key SQLite H2
Primary
Unique
Foreign

Like a key, an index is declared on an attribute/column (or a collection of attributes). In an ERD, indices are denoted by the IX indicator. Multiple indices may be created for a single entity/table, so IX is followed by a number. It’s fairly common to omit the IX indicators in an ERD for those indices created (automatically or otherwise) to enforce primary, unique, and foreign key constraints.

  1. SQLite has no native data type support for arbitrary-size integers, or for integers of more than 64 bits. These are the most common representation approaches:

    • TEXT for holding the textual representation of the integer value.
    • BLOB to hold the integer value as a sequence of bytes.

     2

  2. SQLite has no native data type support for dates and times. Application code or the date and time functions of SQLite may be used to convert between these types and the INTEGER REAL, and TEXT types of SQLite. These are the most common representation approaches:

    • INTEGER for datetime values represented as milliseconds elapsed before or after 1 January, 1970, 00:00:00 UTC.
    • REAL for Julian day numbers—i.e., the number of days since the start (in GMT) of November 24, 4714 BCE (according to the proleptic Gregorian calendar), with the fractional part representing the time of day.
    • TEXT for ISO 8601 datetime strings, e.g., "2022-01-24T08:00:00-06:00".

     2 3

  3. Indices may speed up SELECT statements (more correctly, they may speed up JOIN and WHERE clauses—primarily on SELECT statements, but also on UPDATE, DELETE, and INSERT statements in implementations that support JOIN clauses on those statements), but they increase the time required for the persistence operations of INSERT and DELETE statements, as well as some UPDATE statements: when the data is modified, the associated indices must be updated as well.