Math Concepts: Structures

Sequences and sets.

Overview

Moving beyond primitive values, wrapper classes, and extended numeric classes—the scalar types—we encounter the basic computational structured types: arrays, lists, and sets. The first 2 correspond to the mathematical concepts of sequence and array (though the 2-dimensional array and 1-dimensional vector structures in mathematics are not specifically addressed here), while the 3rd corresponds (naturally) to a mathematical set.

Sequences

Definition

A sequence is an ordered, enumerable collection of values. While it may not be obviously the case, such a collection need not be finite in size—neither from a theoretical standpoint, nor a practical standpoint. Conversely, a sequence may also be empty—that is, its length can be 0.

The values in a sequence are generally homogeneous (all of the same type), and are not necessarily unique—that is, a given value may appear more than once in a single sequence. Each value in a sequence can be referenced by its index, or position. Typically (but not always), index values begin at either 0 (zero-based) or 1 (one-based). So, depending on the context, the finite sequence $A$ of length $n$ might be defined as

\[\begin{aligned} A &= \left( a_0, a_1, \ldots, a_{n - 1} \right), \end{aligned}\]

or

\[\begin{aligned} A &= \left( a_1, a_2, \ldots, a_n \right). \end{aligned}\]

In Java, finite sequences are most often implemented as arrays or lists. A Java array is of finite and fixed length, while a list (i.e. some object of a class that implements the List interface) is of finite, but not necessarily fixed, length. Arrays are structures supported directly by the Java language itself, while the List interface (as well as several implementations, including ArrayList and LinkedList) is defined in the Java standard library; thus, the syntax for using an array is a bit more direct than that for a list. Both, however, use zero-based indices.

int[] dataArray = {10, 5, 3, 2, 5};           // Declare & initialize.
System.out.println(dataArray[2]);             // 3
dataArray[3] = 20;                            // {10, 5, 3, 20, 5}.
System.out.println(dataArray[3]);             // 20

List<Integer> dataList = new LinkedList<>();  // Create empty List.
Collections.addAll(dataList, 10, 5, 3, 2, 5); // Populate the list.
dataList.set(3, 20);                          // {10, 5, 3, 20, 5}.
System.out.println(dataList.get(3));          // 20

Another key difference between arrays and lists is that an array can be declared to have elements of any type, whether primitive or object, while a list can only contain instances of object types. (Fortunately, as implied in Wrapper classes, we can add a primitive value to a list that is declared to contain the corresponding wrapper type; the compiler simply auto-boxes the primitive value into an object of the wrapper type.)

While the Java language doesn’t have explicit, built-in support for it, we can implement an infinite sequence in Java—for example, by writing our own implementations of the Iterable and Iterator interfaces. An example of such an implementation is beyond the scope of this document, but we’ll learn how to implement these interfaces in the bootcamp.

Length

The length of a sequence is simply the number of terms in that sequence. Of course, this is not necessarily the same as the number of distinct terms in the sequence, since a term may appear multiple times in the same sequence.

Implementations of the Java standard library interface Collection (which is extended by List) include the size method, which returns the number of items in that collection. Since that method is declared with a return type of int, and since int is used for element indices in List, this implies that all instances of Collection are limited to a maximum of $\left( 2^{31} - 1 \right)$, or Integer.MAX_VALUE elements.

If a sequence is implemented in Java code as an array, then the length property (not a method) of the array contains the number of elements in the sequence. As with the Collection.size() method, the length property is an int; thus, arrays (and sequences implemented with arrays) are limited to a maximum of $\left( 2^{31} - 1 \right)$, or Integer.MAX_VALUE elements.

Recurrence relations

Some sequences are defined (at least in part) according to a recurrence relation between the terms of the sequence. In this type of definition, the value of the $n^\text{th}$ term of a sequence is defined as a function of the preceding terms. Typically, this type of definition is used for infinite sequences.

Given the sequence $A$,

\[\begin{aligned} A &= \left( a_0, a_1, \ldots \right), \end{aligned}\]

a recurrence relation may be defined (in very general terms) as

\[\begin{aligned} a_n &= f\left( a_0, a_1, \ldots, a_{n-1} \right). \end{aligned}\]

In most cases, the function $f$ isn’t a function of all the preceding terms, but of a small number of terms immediately preceding $a_n$. Also, note that the recurrence relation usually doesn’t define $A$ completely; the definition generally includes one or more initial values, as well. For example, the well-known Fibonacci sequence may be defined as

\[\begin{aligned} F_0 &= 0 \\ F_1 &= 1 \\ F_n &= F_{n-1} + F_{n-2} \text{ , where } n > 1 \end{aligned}\]

Computation of terms in a sequence characterized by a recurrence relation are sometimes implemented (in Java and many other languages) via recursion. Recursive approaches are beyond the scope of this module; however, more information—and related coding exercises—can be found in the “Recursion” module.

Sum

We use the summation symbol, $\sum$, to denote the sum of the terms of a sequence, e.g.

\[\begin{aligned} \sum_{i = 1}^{n} a_i &= a_1 + a_2 + \ldots + a_n \end{aligned}\]

In Java, if a given finite sequence is implemented with an array, or with a List instance—or with any instance of a class that implements the Iterable interface—then we can sum the terms of the sequence in several ways; one is via an enhanced for loop:

int[] fib10 = {0, 1, 1, 2, 3, 5, 8, 13, 21, 34}; // 1st 10 Fibonacci terms.
int sum = 0;
for (int f : fib10) {
  sum += f;
}
System.out.println(sum);                         // 88

Another approach, available in Java 8 and later versions, is to leverage the stream framework, e.g.

int[] fib10 = {0, 1, 1, 2, 3, 5, 8, 13, 21, 34};
int sum = IntStream
    .of(fib10)
    .sum();
System.out.println(sum);                         // 88

Product

General case

The product of terms in a sequence is denoted by $\prod$:

\[\begin{aligned} \prod_{i = 1}^{n} a_i &= a_1 \cdot a_2 \cdot \ldots \cdot a_n \end{aligned}\]

Just as we can compute the sum of an array or Iterable instance with an enhanced for loop in Java, we can use the same mechanism to compute the product. (Depending on the type of values in the sequence, and the length of the sequence, we should watch out for possible overflow.)

int[] lookSay6 = {1, 11, 21, 1211, 111221, 312211}; // 1st 6 terms of look & say.
long product = 1;
for (int a : lookSay6) {
  product *= a;
}
System.out.println(product);                        // 9713843871995571

Again, one alternative approach (out of several) uses the streams framework:

int[] lookSay6 = {1, 11, 21, 1211, 111221, 312211};
long product = IntStream
    .of(lookSay6)
    .asLongStream()
    .reduce(1, (a, b) -> a * b);
System.out.println(product);                        // 9713843871995571

Factorial

The factorial of a non-negative integer $n$, denoted by $n!$, is a special case of a sequence product, where the sequence is the positive integers less than or equal to $n$:

\[\begin{aligned} n! &= \prod_{i = 1}^{n} i \end{aligned}\]

Following the empty product convention, the product of zero terms is defined to be equal to 1; therefore, $0! = 1$.

Of course, we can use iteration to compute a factorial in Java, just as we can for the general case of a sequence product; however, rather than an enhanced for, we would typically use a basic or traditional for. (There are also other techniques for computing $n!$, such as Stirling’s approximation, but a discussion of these is beyond the scope of this document.) For example, we might compute and display $20!$ (the largest factorial that fits in the range of a long) with the following code.

long product = 1;
for (int i = 1; i <= 20; i++) {
  product *= i;
}
System.out.println(product); // 2432902008176640000

The above is compact enough that we wouldn’t gain much economy of expression by using the streams framework to compute a factorial. Nonetheless, we can use streams if we want to. (Note that the example below uses mapToObj to convert an IntStream to a Stream<BigInteger>; using the BigInteger type lets us compute $21!$ or higher—much higher, in fact—without overflow.)

BigInteger product = IntStream
    .rangeClosed(1, 21)      // (1, ..., 21)
    .mapToObj(BigInteger::valueOf)
    .reduce(BigInteger.ONE, BigInteger::multiply);
System.out.println(product); // 51090942171709440000

Sets

Definition

In contrast to a sequence, a set is an unordered collection of objects—i.e. an element of a set cannot generally be referenced by index or position, and when iterating over the contents of a set, the order of iteration may not be known in advance. Another important difference is that while a given object may appear multiple times in some sequence, this is not the case, in any meaningful way, for a set: an object is simply either an element of the set, or it is not. Adding an object to the same set multiple times has no effect after the first addition.

As with a sequence, the elements of a set are generally homogeneous. A set may also be finite (i.e. it contains a finite number of elements) or infinite.

The Java standard library includes the Set interface (extending the Collection interface), as well as a number of classes implementing that interface (e.g. HashSet, EnumSet, TreeSet), to provide set functionality. None of these classes implement infinite sets, but infinite sets can be implemented (with some constraints) by writing our own implementation of Set.

While most of set theory is far beyond the scope of this document, some important terms and notation conventions are defined below.

Cardinality

For a finite site, the cardinality (size) of the set is simply the number of elements in the set. (For an infinite set, things get a bit more interesting.)

The cardinality of a Java set that implements the Set interface can be obtained from the size() method. However, somewhat confusingly, in a Java BitSet (a class for use in maintaining a set of non-negative int values), the correct method to use is cardinality().

The empty set

The empty set, denoted by $\emptyset$, $\varnothing$, or ${}$, is the set containing no elements. Formally, there is a single empty set; in a mathematical context, we usually say “the empty set”, rather than “an empty set”.

In Java, the Collections.EMPTY_SET constant refers to an immutable empty set, and we can test any instance of a Set implementation with the isEmpty method, to determine whether that set is empty. We can also use the size method (declared by the Collection interface) to get the number of elements in a set (the cardinality of the set); the cardinality of the empty set is 0.

Set set1 = Collections.EMPTY_SET;      // Immutable empty set.
System.out.println(set1.size());       // 0
System.out.println(set1.isEmpty());    // true
Set<Integer> set2 = new HashSet<>();   // HashSet implements set.
System.out.println(set2.size());       // 0
System.out.println(set2.isEmpty());    // true
System.out.println(set1.equals(set2)); // true

Membership

The central relationship constituting a set is that between the set and its members. An object is either an element of (in) a set or it is not, and in general, a set can be completely characterized by its elements. We can assert that a object is an element of a set with the $\in$ symbol, or assert that an object is not an element of a set with $\notin$:

\[\begin{aligned} 1 & \in \mathbb{N} \\ -1 & \notin \mathbb{N} \end{aligned}\]

The contains method is used to test an object for membership in a Java set:

Set<Integer> set = Set.of(1, 2, 3);  // Immutable set {1, 2, 3}.
System.out.println(set.contains(2)); // true
System.out.println(set.contains(4)); // false

Equality

Because sets are unordered, the only thing that matters, when comparing sets for equality, is that both sets contain the same elements. In other words, 2 sets are equal if and only if every element of the first is also an element of the second, and vice versa. We might write this as

\[\begin{aligned} S = T & \iff (v \in S, \forall v \in T) \land (v \in T, \forall v \in S) \end{aligned}\]

Here, $\forall$ means “for all”, and $\iff$ means “if and only if”. Thus, we have “$S$ equals $T$ if and only if $v$ is an element of $S$, for all $v$ in $T$, and $v$ is an element of $T$, for all $v$ in $S$.

Java makes comparing sets for equality quite easy: All implementations of Set in the standard library implement the equals method appropriately for set comparison:

Set<Integer> set1 = Set.of(1, 2, 3);
Set<Integer> set2 = Set.of(3, 1, 2);
System.out.println(set1.equals(set2)); // true
Set<Integer> set3 = Set.of(1, 2, 3, 4);
System.out.println(set1.equals(set3)); // false

Subsets & supersets

One set is a subset of another if every element of the first is also an element of the second; this relationship is denoted by $\subseteq$. If there is at least one element of the second that is not an element of the first (i.e. the 2 sets aren’t equal), then the first is a proper subset (sometimes called a strict subset), indicated by the $\subset$ symbol. Mathematically, we could define these relationships as

\[\begin{aligned} S \subseteq T & \iff v \in T, \forall v \in S \\ S \subset T & \iff \left( v \in T, \forall v \in S \right) \land \left( S \neq T \right) \end{aligned}\]

Note that by this definition, 2 sets that are equal are also subsets and supersets of each other, but not proper subsets or supersets of each other. Also, note that since the empty set has no elements, it trivially satisfies the conditions for being a subset of all sets, and for being a proper subset of all sets other than itself.

We can easily define superset (and proper superset) in terms of the subset relationship: one set is a superset (or proper supserset) of another if and only if the second is a subset (or proper subset) of the first. The symbols used here are those used for the subset relationship, with the direction reversed:

\[\begin{aligned} S \supseteq T & \iff T \subseteq S \\ S \supset T & \iff T \subset S \end{aligned}\]

The Collection interface of the Java standard library declares the containsAll method, which—when invoked on a set—tests for subset/superset relationships; combining that with the equals method and the logical negation operator ! lets us test for proper subset/superset relationships:

Set<Integer> set1 = Set.of(1, 2, 3);
Set<Integer> set2 = Set.of(3, 1, 2);
System.out.println(set1.containsAll(set2)); // true
System.out.println(set1.containsAll(set2) 
    && !set1.equals(set2));                 // false (not a proper superset)
Set<Integer> set3 = Set.of(1, 2, 3, 4);
System.out.println(set3.containsAll(set1)); // true
System.out.println(set3.containsAll(set1) 
    && !set3.equals(set1));                 // true (a proper superset)

Set-builder notation

Set-builder notation is a set of conventions commonly used to describe a set. Since this notation is quite convenient when defining many set theory concepts and terms, we’ll illustrate some aspects of it before going further.

Java doesn’t support set-builder notation directly; however, that’s not the only way to express the underlying concepts, and there are features in the Java standard library for defining a set (or adding elements to a set) by enumeration, and for filtering sets via predicates (or as shown below, the logical inverse of a predicate—a condition under which elements are removed from a set). Thus, it’s usually fairly straightforward to work from a definition in set-builder notation to a Java implementation.

For example, we can use set-builder notation to describe the set of odd natural numbers under 20:

\[\left\{ x \in \mathbb{N} \mid \left( x < 20 \right) \land \left( x \bmod 2 \neq 0 \right) \right\}\]

Then, we can create this set using iteration and the removeIf method in Java code:

Set<Integer> set = new HashSet<>();
for (int i = 1; i < 20; i++) { // Add 1 through 19 to the set.
  set.add(i);                                        
}
set.removeIf(x -> x % 2 == 0); // Remove even values.
System.out.println(set);       // [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]

Union

The union of 2 sets, denoted by $\cup$, contains only those objects that are elements of either set (or both sets). We can express this using set-builder notation (with the symbol $\lor$, meaning “or”) as

\[\begin{aligned} S \cup T &= \left\{ v \mid \left( v \in S \right) \lor \left( v \in T \right) \right\} \end{aligned}\]

In Java, we construct a set as the union of 2 other sets by initializing the union set with the elements of the first of other 2, and then adding to it the elements of the second via the addAll method; since an object cannot be included multiple times in the same set, duplicated values are ignored.

Set<Integer> set1 = Set.of(1, 2, 3);
Set<Integer> set2 = Set.of(1, 3, 4, 5);
Set<Integer> union = new HashSet<>(set1); // Initialize from set1.
union.addAll(set2);                       // Add set2 elements.
System.out.println(union);                // [1, 2, 3, 4, 5]

Intersection

The intersection of 2 sets, denoted by $\cap$, contains only those objects that are elements of both sets. We can express this as

\[\begin{aligned} S \cap T &= \left\{ v \mid \left( v \in S \right) \land \left( v \in T \right) \right\} \end{aligned}\]

In Java, we can construct a set as the intersection of 2 other sets by initializing the intersection set with the elements of the first of the other 2, then using the retainAll method to keep only those elements that are also elements of the second.

Set<Integer> set1 = Set.of(1, 2, 3);
Set<Integer> set2 = Set.of(1, 3, 4, 5);
Set<Integer> intersection = new HashSet<>(set1); // Initialize from set1.
intersection.retainAll(set2);                    // Retain those also in set2.
System.out.println(intersection);                // [1, 3]

Difference (relative complement)

The difference between 2 sets is the set containing only those elements of the first that are not in the second. This is also called the relative complement of the second set in the first (i.e. the intersection of the first set and the complement of the second set), and is denoted by $\setminus$.

\[\begin{aligned} S \setminus T &= \left\{ v \in S \mid v \notin T \right\} \end{aligned}\]

In Java, we can construct a set as the difference of 2 other sets by initializing the intersection set with the elements of the first of the other 2, then using removeAll to remove those elements that are also elements of the second.

Set<Integer> set1 = Set.of(1, 2, 3);
Set<Integer> set2 = Set.of(1, 3, 4, 5);
Set<Integer> diff = new HashSet<>(set1); // Initialize from set1.
diff.removeAll(set2);                    // Remove set2 elements.
System.out.println(diff);                // [2]

Symmetric difference

The symmetric difference between 2 sets is the set containing only those objects that are elements of either—but not both—of the sets. It is denoted by the symbol $\triangle$, and can be expressed in terms of union of the differences of the 2 sets:

\[\begin{aligned} S \triangle T &= \left( S \setminus T \right) \cup \left( T \setminus S \right) \end{aligned}\]

It can also be expressed as the difference between the union and intersection of the 2 sets:

\[\begin{aligned} S \triangle T &= \left( S \cup T \right) \setminus \left( S \cap T \right) \end{aligned}\]

We can use either of the above expressions as a guide to a Java implementation. Here, we implement the second:

Set<Integer> set1 = Set.of(1, 2, 3);
Set<Integer> set2 = Set.of(1, 3, 4, 5);
Set<Integer> union = new HashSet<>(set1);        // Initialize from set1.
union.addAll(set2);                              // Add set2 elements.
Set<Integer> intersection = new HashSet<>(set1); // Initialize from set1.
intersection.retainAll(set2);                    // Retain those also in set2.
Set<Integer> diff = new HashSet<>(union);        // Initialize from union.
diff.removeAll(intersection);                    // Remove intersection elements.
System.out.println(diff);                        // [2, 4, 5]