Nick Bennett and Todd Nordquist" />
Coding conventions for the Deep Dive Coding Java + Android Bootcamp
A number of tasks in this bootcamp focus on reading or writing pseudocode. Pseudocode is a set of practices and notational conventions used to express computational processes in a non-programming-language-specific form, but usually with the aim of facilitating or documentating implementation in one or more programming languages.
Pseudocode shares its primary aim with flowcharting: expression of a procedure in a clear, unambigous form, built on a relatively small vocabulary of symbols that a reader may be assumed to know. Pseudocode differs from flowcharting primarily in the emphasis on a textual, rather than graphical, representation of the process or algorithm. Another difference is that while flowchart symbols are mostly standardized, there isn’t a formal or de facto standard syntax for pseudocode; instead, pseudocode typically employs a combination of natural language and mathematical notation.
The guidelines and rules below have been formulated with the following aims in mind:
Applicability in the workplace
Like coding style guides, common practices for pseudocode differ between organizations. In fact, in our experience, most development organizations don’t have formal style guides for pseudocode. Nonetheless, our intent is that the practice in reading and writing pseudocode that participants get in this bootcamp will be valuable not only in the bootcamp, but in the participants’ eventual workplaces.
Markdown authoring
Any strict requirement or constraint declared in this page is intended to be satisfiable using Markdown, with some embedded HTML for subscripts, superscripts, etc. (Of course, a good set of Markdown extensions, such as those for tables, footnotes, definition lists, and $\LaTeX$ mathematical notation—all of which are available in the kramdown Markdown processor used by default in GitHub Pages—will usually make the task easier.)
Unambiguous communication
A central goal, when writing pseudocode, should always be to communicate the process clearly, with as little room as possible for misunderstanding. But our aim here is even higher: We want our pseudocode, in a sense, to make the algorithms that they express come to life. This can only happen when the reader isn’t distracted—by questions on undocumented assumptions, confusion caused by the use of poorly defined symbols, inconsistencies in typography and formatting, etc.
Note that there are recognized experts in the fields of computer science and software engineering who advocate the use of programming language code over pseudocode. For example, Robert Sedgewick, the author or co-author of several influentual texts on algorithms, has recommended the Java language for expressing algorithms, even when the intended implementation is in other languages. Nonetheless, we find significant value in the practice of writing pseudocode, and the inclusion of pseudocode in technical documentation.
If the computational procedure being described in the pseudocode is parameterized by one or more input values, these inputs must be declared, with the relevant types, at the start of the pseudocode—e.g. in a section labeled “Inputs” or “Given”.
Well-known mathematical constants such as $\pi$, $\phi$, and $\text{e}$ must be referenced symbolically, rather than using literal numeric values.
Constants and variables must be declared prior to or on first use, with the type included in the declaration.
Just as in programming code, “magic numbers” should be avoided in pseudocode. Do not include literal constants (other than $-1$, $0$, $1$, and other self-explanatory values) in the pseudocode, other than in the definitions of constants or variables.
Input, constant, and variable type declarations should not use Java-specific types; instead, prefer more general mathematical types (e.g. integer, real, Boolean) over Java primitive types. Similarly, prefer sequence, array, list, and set over Java implementation types.
While the order of declaration of variables, fields, and constants in Java (a type followed by a name) is familiar to C, C#, C++, and Java programmers, it may not be intuitive to non-programmers, or to programmers unfamiliar with the C-derived languages. Because of this, prefer a reversed declaration order of symbols in pseudocode: the name, followed by a colon, and then the type–e.g. “$x\text{: real}$”.
Arrays, lists, and sets must be clearly declared as such, preferably following the form “array of {type}”, or “{type} array”, rather than “{type}[]”. For example, “$V\text{: array of integer}$” can be used to declare $V$ as an array of integers.
Symbolic names of inputs, constants, and variables must be distinguished typographically from other text in the pseudocode. The mathematical convention of italicizing such symbols is preferred for this purpose. (Note that this distinction is implemented in $\LaTeX$ by default.)
While pseudocode must not be written in such a way that it can only be implemented in one programming language, there are a number of English words and phrases that are used as keywords in most programming languages; don’t hesitate to use them if they help make the logic of the pseudocode clear. For example, the following are often included in pseudocode, and are encouraged for pseudocode written in this bootcamp:
On the other hand, there are a few keywords in several programming languages that have alternatives in natural language that may be more clear to readers, and should thus be preferred to their programming equivalents when writing pseudocode. In particular, “otherwise” should be used in place of “else”.
To avoid confusion around assignment vs. equality comparison, pseudocode in this bootcamp must use a single equals sign ($=$) for equality comparison, and must use the $\gets$ (“gets”) operator for assignment.
The only exception allowed to this rule, if neither $\LaTeX$ notation nor HTML/XML entities are available, is to use “Let” (or if the writer’s chosen natural language is not English, then the local language equivalent of “Let”) with a single equals sign for assignment. For example, the following two statements are equivalent; the first is the form required in Markdown or HTML, while the second is an acceptable alternative otherwise.
Note that even when $\LaTeX$ is not available, the ←
HTML entity or ←
HTML/XML numeric entity may be used to produce the left arrow symbol (←) in Markdown or HTML content.
Do not mix multiple natural languages in the pseudocode for any single algorithm or other computational process. Of course, pseudocode can and should be written in the natural language most likely to be shared between the writer and the audience; muddying the waters by mixing languages can confuse the intended readers, and must be avoided.
When using mathematical notation, any symbols beyond those typically seen in second-year algebra, introductory analytic geometry, or introductory trigonometry must either be defined prior to use, or be annotated with a callout to a definition (e.g. in a footnote or separate page). More generally, when in doubt, favor explicit definition over assumption of shared understanding.
To refer to individual elements of an array, list, or sequence, the element indices should be written as subscripts, though brackets or parentheses are also permitted. For example, $x_i$ is preferred over x(i) or x[i]. (When $\LaTeX$ notation is not available, the HTML <sub>...</sub>
element can be used for subscripts in Markdown or HTML content.)
Exponents should be indicated with superscripts when possible; otherwise, the ^
can be used to separate the base from the exponent in an expression. For example, prefer $a^b$ to a^b. (When $\LaTeX$ is not available, the HTML <sup>...</sup>
element can be used for superscripts in Markdown or HTML.)
Prefer grammar and syntax borrowed from natural language and mathematical expressions to those of programming languages. In particular, even when using symbols, remember that you’re writing statements that must be read and understood by human readers. Those statements may employ a simplified grammar, like headlines in a newspaper, but they must still be understandable. Also, just as the appropriate syntax, capitalization, and punctuation can help a reader understand common written language, they should be used for the same purpose in pseudocode.
For example, when expressing an operation to be repeated for all values of $row$ between 1 and $rowCount$ (inclusive), prefer
- …
For each $row$ in $(1, 2, 3, \ldots, rowCount)$:
- …
to
- …
for $row = 1$ to $rowCount$
- …
Arguably, the second has a bit more room for misunderstanding. For example, is the step size 1, or some other increment? Is $rowCount$ an inclusive or exclusive bound? More subtly, the capitalization and punctuation in the first help the reader see clearly the logical separation between the iteration statement and its predecessor, and between the iteration conditions and the statements to be executed repeatedly under those conditions.
While a programming-language-style expression may be admirably unambiguous, it requires the reader to be familiar with the underlying language (or at least, a family of languages). So, an expression like the following is not acceptable in pseudocode written in this bootcamp.
- …
for ($row = 1$; $row <= rowCount$; $row$++):
- …
Do not use explicit multiline grouping constructs, such as the braces found in programming languages derived from C. Instead, use indentation and numbering or bullets to indicate structure, as described in “Formatting” (below).
Pseudocode must be formatted using an ordered (numbered) or unordered (bullet) list, with nested lists as appropriate. Use the indentation structure of nested lists to provide visual and logical structure to the pseudocode.
Use embedded $\LaTeX$ and code fragments as appropriate. However, when using $\LaTeX$ or code blocks, be sure to respect the current indentation of the enclosing list, so that the list structure isn’t broken (e.g. the numbering in an ordered list restarted).
Depending on the intended audience, additional explanatory text annotations may help the reader to understand the pseudocode. When considering the inclusion of such annotations, please follow these guidelines:
Prefer clarifying the pseudocode itself to including additional explanatory text.
An explanatory annotation must either be inline—i.e. accompanying the pseudocode statement it explains (e.g. in a parenthetical statement following the pseudocode statement)—or be included in a legend following the entire block of pseudocode.
Do not mix and match inline and legend annotations for the same block of pseudocode.
If using inline annotations, they must be distinguishable from the pseudocode itself.
If using inline annotations, the annotations must not break the structure of the pseudocode. That is, the indentation and numbering (if ordered lists are used) of the pseudocode statements must not be disrupted by the explanatory text accompanying the pseudocode statements.
If using a legend for explanatory annotations, an ordered (numbered/lettered) list must be used in the pseudocode, and the legend must reference steps by number.
The following example includes inline explanatory annotations with some of the statements. Each such annotation is written in parentheses immediately below the statement it explains, and follows the indentation structure of the pseudocode.
What follows is annotated pseudocode for a binary search over a sorted array of real numbers. Note that this—like virtually all pseudocode (or programming language code) expressions of a non-trivial procedure—is just one of many ways to express the binary search algorithm.
Binary search over a sorted array
Mathematical symbols
Beyond the basic arithmetic and logical operations, this pseudocode uses the following mathematical symbol(s):
Symbol Meaning $\lfloor x \rfloor$ The floor of a real number $x$—i.e. the largest integer that is less than or equal to $x$. Given
$D\text{: real array}$, of length $n$, to be searched: $D = (d_0, d_1, \ldots, d_{n - 1})$.
The elements of $D$ are in ascending order.
$v\text{: real}$ is the value to search for:
- If $v$ is in $D$, its position must be returned.
- If $v$ is not in $D$, $-(p + 1)$ must be returned, where insertion of $v$ at position $p$ would preserve the sort order.
To search for $v$ in $D$
$a\text{: integer} \gets 0$.
($a$ is the lower inclusive bound of the search range.)
$b\text{: integer} \gets n$.
($b$ is the upper exclusive bound of the search range.)
While $a < b$:
$m\text{: integer} \gets \left \lfloor \frac{a + b}{2} \right \rfloor$.
($m$ is the midpoint of the search range.)
If $d_m = v$,
return $m$;
(Done: $v$ is found at position $m$.)
otherwise, if $d_m > v$,
$b \gets m$;
($d_m$ is too high, so $m$ is used as the new exclusive upper bound of the search range.)
otherwise,
$a \gets m + 1$.
($d_m$ is too low, so $(m + 1)$ is used as the new inclusive lower bound of the search range.)
Return $-(a + 1)$.
($v$ is not in $D$, but inserting $v$ at position $a$ would preserve the sorted order.)
The following Markdown, with embedded $\LaTeX$, was used to produce the rendered example above:
## Binary search over a sorted array
### Mathematical symbols
Beyond the basic arithmetic and logical operations, this pseudocode uses the following mathematical symbol(s):
| Symbol | Meaning |
|:------:|:--------|
| $\lfloor x \rfloor$ | The _floor_ of a real number $x$---i.e. the largest integer that is less than or equal to $x$. |
### Given
* $D\text{: real array}$, of length $n$, to be searched: $D = (d_0, d_1, \ldots, d_{n - 1})$.
* The elements of $D$ are in ascending order.
* $v\text{: real}$ is the value to search for:
* If $v$ is in $D$, its position must be returned.
* If $v$ is _not_ in $D$, $-(p + 1)$ must be returned, where insertion of $v$ at position $p$ would preserve the ascending sort order.
### To search for $v$ in $D$
1. $a\text{: integer} \gets 0$.
($a$ is the lower _inclusive_ bound of the search range.)
2. $b\text{: integer} \gets n$.
($b$ is the upper _exclusive_ bound of the search range.)
3. While $a < b$:
1. $m\text{: integer} \gets \left \lfloor \frac{a + b}{2} \right \rfloor$.
($m$ is the midpoint of the search range.)
2. If $d_m = v$,
* return $m$;
(Done: $v$ is found at position $m$.)
otherwise, if $d_m > v$,
* $b \gets m$;
($d_m$ is too high, so $m$ is used as the new exclusive upper bound of the search range.)
otherwise,
* $a \gets m + 1$.
($d_m$ is too low, so $(m + 1)$ is used as the new inclusive lower bound of the search range.)
4. Return $-(a + 1)$.
($v$ is not in $D$, but inserting $v$ at position $a$ would preserve the sorted order.)