How are Java arrays allocated, and the elements given initial values?
Like any other Java object, space for an array is allocated on the heap, and an initialization process sets up the initial state of the array. However, the constructor for an array is not invoked in the same way as it is for other objects, and the initialization process can be a bit confusing.
Once again, Java array initialization will look familiar to those who have programmed in C, but there are some important differences.
In understanding memory allocation for an array, it helps to remember that an array is, above all, an object. The memory for all objects is allocated on the heap, usually via the new keyword. For an array, we follow new with the element type, then square brackets; in the square brackets, we specify the length of the array (or the length of the corresponding dimension in a multidimensional array).
(We can omit the array lengths if we specify initial values, as described in “Initial values”.)
For example, to allocate space for an array of 10 int elements, we would write
new int[10]
This is an expression that returns a reference to the allocated array on the heap. However, the memory allocation isn’t very helpful by itself: we need to consume the reference returned, usually by assigning it to a variable. So if we declared durations as
int[] durations;
we could then allocate space and assign the reference to durations with
durations = new int[10];
Of course, as with other Java fields and variables, we can combine the two in a declaration with assignment:
int[] durations = new int[10];
Because a multidimensional array is actually an array of arrays, we can allocate space for the outer array alone (e.g. allocate space for an array of row references in a two-dimensional array), or allocate space for the outer array and one or more inner array dimensions as well. For example, to allocate the space for the two-dimensional data array shown in “Introduction: Multidimensional arrays”, and assuming the elements are of the int type, we would write
int[][] data = new int[3][4];
This declares data as a two-dimensional array of int (i.e. an array of int[] arrays); allocates an array of length 3, where each element is a reference to an int[] of length 4; and assigns the reference returned by new to data.
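As a rough illustration of what that single expression sets up, the sketch below checks the shape of the result. The class and method names are made up for this example.

```java
class RectangularAllocation {
    // Allocates the 3-by-4 array described above in a single expression.
    static int[][] allocate() {
        return new int[3][4];
    }

    public static void main(String[] args) {
        int[][] data = allocate();
        System.out.println(data.length);        // 3: elements in the outer array
        System.out.println(data[0].length);     // 4: elements in the first inner array
        System.out.println(data[0] != data[1]); // true: each row is a separate array object
        data[0][1] = 42;                        // writing to one row...
        System.out.println(data[1][1]);         // 0: ...leaves the other rows untouched
    }
}
```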
However, if at the time of allocation of the outer array, we don’t know what the length of the inner arrays will be—or if the inner arrays will be of different lengths—we would simply allocate the outer array first, without allocating space for the inner arrays. For example, we could write
int[][] data = new int[3][];
Note that we must still match dimensions: If (for example) we’re assigning a reference to a two-dimensional array-valued field or variable, then there must be 2 sets of brackets in the allocation expression on the right-hand side. Also, the length for at least one dimension must be specified, and any unspecified dimension lengths (empty brackets) must follow all of the specified dimension lengths.
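The bracket rules can be sketched with a three-dimensional array. The names below are hypothetical, and the rejected forms are left as comments because they would stop the class from compiling.

```java
class DimensionRules {
    // Only the leading dimension is sized; both inner levels are left unallocated.
    static int[][][] outerOnly() {
        return new int[2][][];
    }

    // The first two dimensions are sized; the innermost arrays are left unallocated.
    static int[][][] outerAndMiddle() {
        return new int[2][3][];
    }

    public static void main(String[] args) {
        System.out.println(outerOnly()[0]);             // null: no middle array yet
        System.out.println(outerAndMiddle()[0].length); // 3
        System.out.println(outerAndMiddle()[0][0]);     // null: no innermost array yet

        // These forms are rejected by the compiler:
        // int[][] a = new int[][];   // no dimension length given
        // int[][] b = new int[][4];  // empty brackets precede a specified length
        // int[][] c = new int[3];    // bracket count does not match the declared type
    }
}
```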
After allocating the outer array, we can allocate the inner arrays, assigning each value returned by new to the corresponding element of the outer array, e.g.:
data[0] = new int[4];
When allocating space for the inner arrays, we can (if appropriate to the application) allocate arrays of different lengths. For example, consider this code:
int[][] data = new int[3][];
data[0] = new int[3];
data[1] = new int[2];
data[2] = new int[4];
We now have a jagged structure: an outer array of three rows, whose inner arrays have lengths 3, 2, and 4.
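One practical consequence is that code walking a jagged array should take each loop bound from the row itself. The following sketch (class and method names are invented for illustration) rebuilds the array above and reports each row's length.

```java
class JaggedRows {
    // Builds the jagged array from the text: rows of length 3, 2, and 4.
    static int[][] build() {
        int[][] data = new int[3][];
        data[0] = new int[3];
        data[1] = new int[2];
        data[2] = new int[4];
        return data;
    }

    public static void main(String[] args) {
        int[][] data = build();
        for (int row = 0; row < data.length; row++) {
            // Rows differ in length, so ask each row rather than assuming a fixed width.
            System.out.println("row " + row + " has " + data[row].length + " elements");
        }
    }
}
```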
In Java, it is allowed (and sometimes appropriate) to allocate an array of length zero (0). This is a significant difference from C, where an array is simply a contiguous block of memory, and a zero-length array isn’t of much use.[1] In Java, a zero-length array is still an object, and can be quite useful.
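One common use, sketched here with a hypothetical helper, is returning a zero-length array instead of null when there are no results, so that callers can loop over the result without a null check.

```java
class EmptyArrayResult {
    // Hypothetical helper: returns the positive values found in the input.
    // When nothing matches, it returns a zero-length array rather than null.
    static int[] positives(int[] values) {
        int count = 0;
        for (int v : values) {
            if (v > 0) count++;
        }
        int[] result = new int[count]; // count may be 0, which is a valid length
        int next = 0;
        for (int v : values) {
            if (v > 0) result[next++] = v;
        }
        return result;
    }

    public static void main(String[] args) {
        int[] none = positives(new int[] {-1, -2});
        System.out.println(none.length); // 0
        for (int v : none) {             // the body never runs, and nothing fails
            System.out.println(v);
        }
    }
}
```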
The length field
Every array has a final field called length. The value of this field is set (when the array is allocated) to the number of elements in the array. This is one of the contexts in which the fact that a multidimensional array is actually an array of arrays is relevant: the length of such an array is the number of elements in the outer array, not the total number of elements in all of the inner arrays. In the two examples above, data.length has a value of 3.
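To get the total element count of a two-dimensional array, the inner lengths have to be summed explicitly. A minimal sketch, assuming every row has been allocated (the helper name is made up):

```java
class LengthVersusTotal {
    // Sums the lengths of the inner arrays; data.length alone counts only the rows.
    // Assumes no row is null.
    static int totalElements(int[][] data) {
        int total = 0;
        for (int[] row : data) {
            total += row.length;
        }
        return total;
    }

    public static void main(String[] args) {
        int[][] data = new int[3][4];
        System.out.println(data.length);         // 3: elements of the outer array
        System.out.println(totalElements(data)); // 12: 3 rows of 4 elements each
    }
}
```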
If we attempt to allocate an array (simple, outer, or inner) with a negative length, java.lang.NegativeArraySizeException is thrown.
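The exception is raised at run time, when the allocation expression is evaluated. The sketch below uses a hypothetical helper that takes the length as a parameter and reports whether the allocation succeeded.

```java
class NegativeLength {
    // Hypothetical helper: tries to allocate an int[] of the requested length
    // and reports whether the allocation succeeded.
    static boolean canAllocate(int length) {
        try {
            int[] unused = new int[length];
            return true;
        } catch (NegativeArraySizeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canAllocate(0));  // true: zero is a valid length
        System.out.println(canAllocate(-1)); // false: NegativeArraySizeException was thrown
    }
}
```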
When space for an array is allocated as described above, all of the elements are automatically filled with a default value for the declared element type:
- Numeric primitives (including char) will be filled with the value 0.
- boolean primitives will be filled with the value false.
- Object references (including String) will be filled with the value null.
This is true even of arrays declared and allocated as local variables in methods. (This is another important difference from the behavior of C/C++.)
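The defaults can be checked directly on freshly allocated local arrays. This is only a small sketch; the class and method names are invented for the example.

```java
class DefaultValues {
    // Returns true if freshly allocated local arrays hold the default values.
    static boolean holdsDefaults() {
        int[] counts = new int[2];
        double[] ratios = new double[2];
        char[] letters = new char[2];
        boolean[] flags = new boolean[2];
        String[] names = new String[2];
        return counts[0] == 0
                && ratios[1] == 0.0
                && letters[0] == 0   // the NUL character, numeric value 0
                && !flags[1]
                && names[0] == null;
    }

    public static void main(String[] args) {
        System.out.println(holdsDefaults()); // true
        String[] names = new String[2];
        System.out.println(names[0]);        // null: no String objects were created
    }
}
```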
This default behavior is often precisely what is needed; however, there are also many cases where we want to assign other values to array elements immediately upon or after allocation.
Of course, we can assign values to individual elements of an array (as seen in “Introduction: Accessing elements”). However, it’s often much more useful to assign values to all the elements of an array as part of the allocation statement. This functionality is supported through array initializer expressions.
An array initializer expression is simply a brace-enclosed list of array values. In an array allocation expression, it follows immediately after the square brackets which would otherwise contain the array length(s). However, when an array initializer is used, no lengths are specified in the brackets; instead, the compiler gets the array length from the array initializer. For example, assume we declare weights with this statement:
int[] weights;
The following statement will allocate space for 5 int elements, assign the values specified to those elements, and assign the resulting reference to weights:
weights = new int[]{7, 3, 2, 5, 8};
With this syntax, specifying an array length in the square brackets will cause a compilation failure.
If we use declaration-with-assignment, we can choose to omit the allocation part of the statement; if we do, the allocation will be inferred by the compiler. Thus, the following two examples are equivalent:
int[] weights = new int[]{7, 3, 2, 5, 8};
int[] weights = {7, 3, 2, 5, 8};
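The shorthand form is accepted only in a declaration. Anywhere else, such as a later assignment or a method argument, the full allocation expression is required. A brief sketch (the sum helper is hypothetical):

```java
class InitializerForms {
    // Hypothetical helper used to show an array being created inline as an argument.
    static int sum(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] weights = {7, 3, 2, 5, 8};                    // shorthand, allowed in a declaration
        System.out.println(weights.length);                 // 5

        weights = new int[] {1, 2, 3};                      // reassignment needs the full form
        System.out.println(sum(weights));                   // 6

        System.out.println(sum(new int[] {7, 3, 2, 5, 8})); // 25

        // Neither of these compiles:
        // weights = {1, 2, 3};
        // sum({7, 3, 2, 5, 8});
    }
}
```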
Array initializers can also be used to assign values to the elements of multidimensional arrays—including jagged arrays. Once again, it’s a good idea to remember that this is actually an assignment to an array of arrays.
Take this declaration-with-assignment statement, for example:
int[][] data = {
{10, 3, 7},
{12, 6},
{2, 0, 5, 6}
};
After the above statement executes, data refers to an outer array of three inner arrays: {10, 3, 7}, {12, 6}, and {2, 0, 5, 6}.
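The resulting structure can be inspected with java.util.Arrays.deepToString, which prints nested arrays. The class name here is made up, and the initializer is written in its full form so it can be used in a return statement.

```java
import java.util.Arrays;

class JaggedInitializer {
    // Builds the same jagged array as the declaration above.
    static int[][] build() {
        return new int[][] {
            {10, 3, 7},
            {12, 6},
            {2, 0, 5, 6}
        };
    }

    public static void main(String[] args) {
        int[][] data = build();
        System.out.println(Arrays.deepToString(data)); // [[10, 3, 7], [12, 6], [2, 0, 5, 6]]
        System.out.println(data[1].length);            // 2
        System.out.println(data[2][3]);                // 6
    }
}
```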
Simply assigning an existing array to another array-valued variable doesn’t create a copy of the first. Instead, since all Java objects are accessed by reference, it simply assigns one array reference to a second one; there are now two variables referring to the same array, and changes to the element contents referenced by the first will also be reflected in the second.
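The aliasing effect is easy to demonstrate. In this sketch (names are illustrative only), a write made through the second variable is visible through the first.

```java
class AliasingDemo {
    // Writes through an alias and returns the value seen through the original variable.
    static int writeThroughAlias() {
        int[] original = {1, 2, 3};
        int[] alias = original; // copies the reference, not the array
        alias[0] = 99;          // modifies the one shared array
        return original[0];
    }

    public static void main(String[] args) {
        System.out.println(writeThroughAlias()); // 99
    }
}
```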
Fortunately, there are other ways to create one array as a true copy of another:
- Arrays implement the Cloneable interface and make the clone method public. Thus, invoking clone on one array creates a copy of that array.
- The java.util.Arrays utility class defines the copyOf method, overloaded for all primitive types, and for a generic object type. These overloaded methods also take a newLength parameter, allowing them to be used to create a resized copy of an existing array.
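Both approaches are sketched below for an int[]. The grow helper is hypothetical and simply wraps Arrays.copyOf to produce a longer copy.

```java
import java.util.Arrays;

class CopyDemo {
    // Hypothetical helper: returns a longer copy, leaving the source array untouched.
    static int[] grow(int[] source, int extra) {
        return Arrays.copyOf(source, source.length + extra);
    }

    public static void main(String[] args) {
        int[] weights = {7, 3, 2, 5, 8};

        int[] cloned = weights.clone();               // same length, same values
        int[] shorter = Arrays.copyOf(weights, 3);    // truncated to the first 3 values
        int[] longer = grow(weights, 2);              // extra elements get the default value 0

        cloned[0] = 100;                              // does not affect weights
        System.out.println(weights[0]);               // 7
        System.out.println(Arrays.toString(shorter)); // [7, 3, 2]
        System.out.println(Arrays.toString(longer));  // [7, 3, 2, 5, 8, 0, 0]
    }
}
```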
Note: Both of the above approaches come with a significant caveat: Both perform shallow copies. In other words, a new array is being created in both cases—but if the element type of the original is an object type (including an array), then the element values copied are the references to the objects, rather than the objects themselves.
The implication of this shallow copying is that for arrays of mutable objects, as well as for multidimensional arrays (even if the element type of the innermost arrays is primitive), additional code is needed to make a full (deep) copy that also duplicates the objects referenced by the elements.
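For a two-dimensional int array, one way to write that additional code is to clone each inner array in turn. This is a minimal sketch that assumes no row is null; deepCopy is a name chosen for the example, not a library method.

```java
class DeepCopy2D {
    // Copies the outer array and every inner array, so no rows are shared.
    // Assumes no row is null.
    static int[][] deepCopy(int[][] source) {
        int[][] copy = new int[source.length][];
        for (int row = 0; row < source.length; row++) {
            copy[row] = source[row].clone(); // an int[] clone copies the values themselves
        }
        return copy;
    }

    public static void main(String[] args) {
        int[][] data = {{10, 3, 7}, {12, 6}, {2, 0, 5, 6}};

        int[][] shallow = data.clone(); // new outer array, same inner arrays
        int[][] deep = deepCopy(data);  // new outer array and new inner arrays

        shallow[0][0] = -1;             // row 0 is shared, so data sees the change
        System.out.println(data[0][0]); // -1
        deep[1][0] = -2;                // row 1 of the deep copy is independent
        System.out.println(data[1][0]); // 12
    }
}
```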
[1] A zero-length array is sometimes included as the final element of a C struct declaration; at runtime, the struct will usually be allocated with sufficient memory for the array to have as many elements as needed, and this length will be assigned to an element of the struct. In many ways, this is a low-level analogue to a Java array, where every array has a length field. However, in this case, it is not the array that is being allocated, but the struct, and with a non-zero size.