Data types, indices, and other aspects of a physical data model.
In implementation, entities of the data model map to tables (aka relations) in a relational database, while attributes map to columns of those tables.
Assuming that one of our aims in data modeling is to implement the models we create, we must eventually turn our attention to implementation considerations—usually while the modeling process is still underway. Among others, these considerations include variations between different relational database managament systems (RDBMS) in compliance to SQL standards; supported data types, and limits on those data types; support for and treatment of NULL
values; and explicit and implicit generation and use of indices. A conceptual ERD often omits many of these details (some employ an entirely different set of notational conventions from those described here), but as we move closer to implementation we fill them in.
Structured Query Language (SQL) is the language most closely associated with the relational model first proposed by Edgar Codd (of IBM) in 1969. While there are a number of fundamental differences between the relational model and SQL, the two terms have become almost synonymous in common usage, and no currently available alternative to SQL has even close to the same level of platform support.
From the above, it might seem that deciding to use SQL should answer most of the implementation questions for a relational data model; unfortunately, this isn’t really the case. Since the first ISO standardization of SQL in 1987, the standard has been revised nine times, most recently in 2019. Most RDBMS implementations lag several years behind the standard, and even when any given RDBMS announces support for a given SQL standard, there are virtually always some features left unsupported.
Making matters worse, the SQL standard itself leaves significant areas of implementation behavior officially undefined, leaving these behaviors up to implementors to define as they see fit.
One practical consequence of the above is that it is nearly impossible to implement a relational data model in an entirely platform-independent manner. We’ll try to minimize the impact of this on our work by focusing on a small number of RDBMS platforms, summarized below.
Our SQL coding exercises and shared project work will use one of these platforms. Neither of these is strictly compliant with the SQL standard, but both provide a good compromise of SQL feature support and a small footprint.
SQLite
This is an embedded database, implemented as a library and accessed directly from client code. It’s included as a standard component of several operating systems, including Android, iOS, OS X, and others. It’s also accessible from within Chromium-based and Safari web browsers (though the Web SQL standardization effort, which initially incorporated SQLite, was abandoned in 2010).
There’s a JDBC driver for SQLite available; however, we use SQLite in Android apps via the Room ORM, which does not access SQLite via JDBC. (In any event, JDBC is not included in the portion of the JCL ported to Android.)
H2
H2 is an open-source database developed in Java, typically accessed via JDBC. It can be used in embedded or server mode, making it well suited to use in Spring Boot-based projects.
These platforms are addressed in some curriculum elements (primarily in syntax and feature comparisons), and may be used (subject to instructor approval) in student capstone projects.
Apache Derby
Like H2, Derby can be used in embedded or server mode, and integrates well with the Spring Boot ecosystem. As “Java DB”, Derby was a standard component of the Oracle distributions of JDK 6, 7, and 8, and is preconfigured for use in the Glassfish reference implementation of Eclipse Jakarta EE (formerly Oracle Java EE).
MySQL
This is the most widely used RDBMS in web applications, particularly in small- to medium-sized sites. It doesn’t run in embedded mode; thus, it cannot be packaged directly into a Spring Boot or Jakarta application, but such applications can access a MySQL server via JDBC. While MySQL–much like SQLite–began as a lightweight RDBMS, with minimal support for SQL standards, it has evolved to include more complete standards adherence and better scalability–particularly in the non-open-source Enterprise Server edition.
PostgreSQL
Postgres began as a re-implementation of the Ingres RDBMS, and was intended by its developers to be both an alternative and a successor to Ingres. Both platforms originally used non-SQL data definition/manipulation languages (QUEL and POSTQUEL)—but by the mid-‘90s, both had adopted SQL.
In contrast with MySQL, PostgreSQL has long had relatively complete support for SQL standards, as well as a number of powerful extensions in datatypes, SQL syntax, and scalability. In the past 15 years, deployment and configuration has been simplified, making the threshold for adoption roughly equivalent to MySQL.
When including data types in ERDs, we have a few natural answers to the question, “Which data types should we use?”
Java data types (or those of another general-purpose programming language)
When using an ORM, part of the responsibility of the ORM is to map the native types of the programming language to corresponding types in the underlying database. Thus, we might decide to use that language’s types in our ERDs. In many real-world situations, however, custom adapter code is required for mapping between types in the programming language and those in the RDBMS.
Standard SQL types
Data types included in some selected version of the SQL standard might be used. This is an especially viable option early in the timeline, when we might not have a definite implementation platform selected.
Platform-specific types
As noted above, it’s uncommon for an RDBMS to implement the latest SQL standard fully. This applies not just to functional features, but to data type support as well. So if we have an implementation platform selected, we might choose to use that platform’s data types rather than standard SQL types.
In the DDC Java Enterprise & Android Mobile bootcamp, our current practice is to use either Java types or platform-specific types. The table below shows the SQL standard, SQLite, and H2 datatypes corresponding to core Java scalar types.
Java | Standard | SQLite | H2 |
---|---|---|---|
boolean , Boolean |
BOOLEAN |
INTEGER |
BOOLEAN |
byte , Byte |
SMALLINT |
INTEGER |
TINYINT |
short , Short |
SMALLINT |
INTEGER |
SMALLINT |
char , Character |
NCHAR(1) |
INTEGER , TEXT |
NCHAR(1) |
int , Integer |
INTEGER |
INTEGER |
INTEGER |
long , Long |
BIGINT |
INTEGER |
BIGINT |
float , Float |
REAL |
REAL |
FLOAT |
double , Double |
DOUBLE PRECISION |
REAL |
DOUBLE PRECISION |
String |
NCHAR VARYING , NVARCHAR |
TEXT |
NCHAR VARYING , NVARCHAR |
Date |
TIMESTAMP |
INTEGER , REAL , TEXT 1 |
TIMESTAMP WITH TIME ZONE |
LocalTime |
TIME |
INTEGER , REAL , TEXT 1 |
TIME |
LocalDate |
DATE |
INTEGER , REAL , TEXT 1 |
DATE |
Indices (indexes is an acceptable and widely used alternative form) are data structures permitting high-performance non-sequential access to a specified subset of the columns stored in the rows of a database table. In a relational database, indices are used to support performant enforcement of the constraints implicitly declared in primary keys, unique keys, and foreign keys, and to speed up data filtering, sorting, and retrieval.2
Formally, indices are not required by the relational model or by SQL, and indices can be created or dropped with no effect on the underlying data. However, for all but the smallest databases, indices are a practical necessity in implementation. Users don’t see indices—they just experience their effects.
Depending on the RDBMS, some indices are created automatically as a consequence of definining primary, unique, and foreign keys on a SQL table. Others are created explicitly in the SQL Data Definition Language (DDL) statements of the data model implementation. The table below summarizes the current automatic index creation behavior of SQLite and H2 for primary, unique, and foreign keys: A check mark (✔) indicates that an index is created automatically for the specified key, while a cross mark (❌) indicates that such an index must be created explicitly.
Key | SQLite | H2 |
---|---|---|
Primary | ✔ | ✔ |
Unique | ✔ | ✔ |
Foreign | ❌ | ✔ |
Like a key, an index is declared on an attribute/column (or a collection of attributes). In an ERD, indices are denoted by the IX
indicator. Multiple indices may be created for a single entity/table, so IX
is followed by a number. It’s fairly common to omit the IX
indicators in an ERD for those indices created (automatically or otherwise) to enforce primary, unique, and foreign key constraints.
SQLite has no native data type support for dates and times. Application code or the date and time functions of SQLite may be used to convert between these types and the INTEGER
REAL
, and TEXT
types of SQLite. These are the most common representation approaches:
INTEGER
for datetime values represented as milliseconds elapsed before or after 1 January, 1970, 00:00:00 UTC.REAL
for Julian day numbers—i.e., the number of days since the start (in GMT) of November 24, 4714 BCE (according to the proleptic Gregorian calendar), with the fractional part representing the time of day.TEXT
for ISO 8601 datetime strings, e.g. "2022-01-24T08:00:00-06:00"
.Indices may speed up SELECT
statements (more correctly, they may speed up JOIN
and WHERE
clauses—primarily on SELECT
statements, but also on UPDATE
, DELETE
, and INSERT
statements in implementations that support JOIN
clauses on those statements), but they increase the time required for the persistence operations of INSERT
and DELETE
statements, as well as some UPDATE
statements: when the data is modified, the associated indices must be updated as well. ↩