A Tableau Calculus for the Analysis of Experiments

Experimental analysis often involves groups containing varying numbers of elements; for example, a different number of units for each treatment assignment within each stratum. We therefore encounter objects that are like matrices, except that they are not perfect rectangular blocks; i.e., they are not always "filled."
In this note, we define a new structure, called a tableau, which can be regarded as a partially filled matrix, and we seek to formalize the operations on tableaus that are used in the analysis of experiments. We then show how tableau notation can be used to express the key equations in a variety of statistical contexts, including stratification, clustering, and the sum-of-squares decomposition. Moreover, we express these equations in both invariant and index form:
- invariant notation (coordinate-free form) – defined in terms of objects and operators, much like the matrix-vector product A⋅x, and
- index notation (coordinate form) – defined explicitly in terms of indexed arrays and summation over multiple indices, much like expressing the matrix-vector product as ∑ⱼAᵢⱼxⱼ.
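To make the two styles concrete, here is a small sketch (not the author's implementation) that computes the same matrix-vector product in invariant form (`A @ x`) and in index form (an explicit sum over j). It assumes NumPy is available; the matrix and vector values are illustrative only. `np.einsum` is included because its subscript string is essentially index notation written as code.

```python
import numpy as np

# Illustrative values only: a 2x3 matrix A and a length-3 vector x.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
x = np.array([1.0, 0.0, -1.0])

# Invariant (coordinate-free) form: one operator applied to two objects, A·x.
y_invariant = A @ x

# Index (coordinate) form: y_i = sum_j A_ij x_j, with the sum over j written out.
n_rows, n_cols = A.shape
y_index = np.array([sum(A[i, j] * x[j] for j in range(n_cols))
                    for i in range(n_rows)])

# np.einsum takes the index expression itself as a subscript string.
y_einsum = np.einsum("ij,j->i", A, x)
```

All three arrays hold the same values; only the notation differs.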
Outline
This post consists of four main sections:
- Review of classic notation, the pros and cons;
- Theoretical development of the Tableau Calculus;
- Application to Experiments (completely randomized, block-randomized, adjustment formula, cluster-randomized, block-cluster, and ANOVA sum of squares decomposition);
- Python implementation
Classic Notation: Pros and Cons
In experimental analysis, there are three main styles of notation that are commonly used:
- classic notation – treatment assignment is explicitly enumerated: unit (ijk) describes the kth unit in the jth stratum of the ith treatment group (see [1], [2], and [5]);
- assignment notation – the assignment mechanism is treated as an independent variable, and we consider sums over quantities like ZᵢYᵢ or ZᵢⱼYᵢⱼ (see [2], [3], and [4]); and
- set notation – explicit variables refer to the treatment and control sets: Yᵗ and Yᶜ, or Y⁺ⱼ for the aggregate sum of the jth cluster, and then Y⁺ₜ and Y⁺_c for the sets of aggregate sums of the treatment and control clusters, etc. (see [5]).
Classic notation allows one to express formulas in the most compact manner, since treatment assignment is indexed directly in the response array; this is useful for describing stratification, multi-level experiments, and ANOVA. However, this notation is philosophically unsettling, as the enumeration of units depends directly on the treatment assignment.
Assignment notation, on the other hand, enumerates units without regard to treatment assignment, but requires an auxiliary assignment mechanism Z and a doubling of the number of multi-summations, with one set of sums containing a factor Z and the other a factor (1-Z). This has the shortcoming that it is not amenable to multi-level design.
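As a rough illustration of the "doubling" described above, the sketch below enumerates units without reference to treatment and then forms every quantity twice: once weighted by Zᵢ and once by (1 − Zᵢ). The responses and assignments are made-up values, and the difference in means at the end is just one standard example of a statistic built from these paired sums, not a formula taken from this post.

```python
# Responses Y_i and assignments Z_i (1 = treatment, 0 = control) for six units,
# enumerated without regard to treatment assignment. Values are illustrative only.
Y = [3.0, 5.0, 4.0, 6.0, 2.0, 7.0]
Z = [1, 0, 1, 1, 0, 0]

# Every quantity needs two sums: one weighted by Z_i, one weighted by (1 - Z_i).
n_t = sum(Z)
n_c = sum(1 - z for z in Z)
total_t = sum(z * y for z, y in zip(Z, Y))
total_c = sum((1 - z) * y for z, y in zip(Z, Y))

# Example statistic built from the paired sums: the difference in group means.
diff_in_means = total_t / n_t - total_c / n_c
```

With a second level (for example strata indexed by j), each of these sums would itself become a double sum, which is where the bookkeeping starts to grow.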
Finally, set notation simplifies things a great deal, but it requires special definitions for every different set: in clustering, Y⁺ⱼ is the aggregate sum for the jth cluster and Y⁺ₜ = {Y⁺ⱼ : zⱼ = 1}, but these are not used in block designs, etc.; i.e., we have to keep defining new notation to refer to different groups, subgroups, or sums of a single underlying object.
Tableau notation seeks the best of all worlds: its basic worldview is consistent with assignment notation, but we define a structure called a tableau, together with a set of operations, that allows one to write equations in an invariant form that can be understood across contexts without defining specific sets each time. Moreover, we take the novel approach of interpreting the assignment mechanism as a mask, so that we may consider the subtableaus consisting of treatment and control assignments, and then apply our basic operations to these subtableaus to express key statistical formulas.
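A minimal sketch of the mask interpretation, assuming a tableau is stored as a Python dict from index tuples to values and the assignment Z as a 0/1 dict on the same index space. The helper `mask` and all data values are hypothetical, chosen only to show the idea.

```python
# A hypothetical 2-dimensional tableau: stratum i, unit j within stratum (illustrative values).
T = {(1, 1): 2.0, (1, 2): 4.0, (1, 3): 3.0,
     (2, 1): 5.0, (2, 2): 1.0}

# The assignment mechanism Z is itself a tableau on the same index space (1 = treatment).
Z = {(1, 1): 1, (1, 2): 0, (1, 3): 1,
     (2, 1): 0, (2, 2): 1}

def mask(tableau, z, value):
    """Hypothetical helper: keep only the components whose assignment equals `value`."""
    return {idx: t for idx, t in tableau.items() if z[idx] == value}

treated = mask(T, Z, 1)   # indices (1, 1), (1, 3), (2, 2)
control = mask(T, Z, 0)   # indices (1, 2), (2, 1)
```

Note that the treated subtableau skips the index (1, 2), so it is no longer in standard form; this is one reason the general definition of a tableau allows arbitrary bounded index sets.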
Tableau Notation
Getting Started: Tableaus, Shapes, and Index Spaces
Recall that an l-dimensional matrix is a rectangular array specified by its side lengths (n₁, …, nₗ). Its n₁n₂⋯nₗ components are indexed by the index space

$$
I = \{1, \dots, n_1\} \times \{1, \dots, n_2\} \times \cdots \times \{1, \dots, n_l\}.
$$
An l-dimensional tableau T is an l-dimensional array indexed by a bounded set I ⊂ ℤˡ₊ of l-tuples of positive integers, called the index space. So a tableau is just a partially filled matrix.
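One way to picture "a partially filled matrix" in code, under the assumption (ours, not the post's) that a tableau is stored sparsely as a dict keyed by its index tuples. The values are hypothetical; `as_partial_matrix` embeds a 2-dimensional tableau in its bounding rectangle and marks the unfilled slots with `None`.

```python
# Hypothetical 2-dimensional tableau stored sparsely: keys are the index tuples in I,
# values are the components. Rows have 3, 1, and 2 entries respectively.
T = {(1, 1): 7, (1, 2): 1, (1, 3): 4,
     (2, 1): 2,
     (3, 1): 5, (3, 2): 8}

# The index space I is just the key set; every index must be a tuple of positive integers.
I = set(T)
assert all(isinstance(k, int) and k >= 1 for idx in I for k in idx)

def as_partial_matrix(tableau):
    """Embed a 2-d tableau in its bounding rectangle, using None for unfilled slots."""
    n_rows = max(i for i, _ in tableau)
    n_cols = max(j for _, j in tableau)
    return [[tableau.get((i, j)) for j in range(1, n_cols + 1)]
            for i in range(1, n_rows + 1)]

M = as_partial_matrix(T)
# M == [[7, 1, 4], [2, None, None], [5, 8, None]]
```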
We say that a tableau T is in standard form if each index ranges from 1 to some terminal integer (without skipping) and if the range of index iₖ depends only on the values of the k−1 preceding indices. An l-dimensional tableau in standard form may be described by its shape ηˡ, which is itself an (l−1)-dimensional standard tableau, defined such that ι = (i₁, …, iₗ) ∈ I if and only if (i₁, …, iₗ₋₁) ∈ I(ηˡ) and

$$
i_l \in \{1, \dots, \eta^{l}_{i_1 \cdots i_{l-1}}\}.
$$
Alternatively, the shape is given by a sequence of tableaus ⟨η¹, …, ηˡ⟩ in which ηᵏ is a (k−1)-dimensional tableau, defined inductively as the shape of ηᵏ⁺¹ for k = l−1, …, 1. The tableau ηᵏ thus determines the range of the index iₖ. This defines the index space inductively by the relations i₁ ∈ {1, …, η¹} and

$$
i_k \in \{1, \dots, \eta^{k}_{i_1 \cdots i_{k-1}}\}
$$

for k = 2, …, l. Notice how the range of each index may depend on the values of the preceding indices.
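The inductive definition above translates directly into a loop that extends index prefixes one coordinate at a time. This is a sketch under our own storage assumption: `shapes[k - 1]` plays the role of ηᵏ as a dict from the prefix (i₁, …, iₖ₋₁) to the range of iₖ, with η¹ stored under the empty prefix. The function name and the shape sequence are hypothetical.

```python
def index_space_from_shapes(shapes):
    """Enumerate the index space of a standard-form tableau from its shape sequence.

    shapes[k - 1] stands for η^k: a dict mapping each prefix (i_1, ..., i_{k-1})
    to the number of admissible values of i_k. η^1 is stored under the empty prefix ().
    """
    space = [()]
    for eta in shapes:
        # Extend every admissible prefix by i_k = 1, ..., η^k[prefix].
        space = [prefix + (i,) for prefix in space for i in range(1, eta[prefix] + 1)]
    return space

# Hypothetical shape sequence <η^1, η^2, η^3> for a small 3-dimensional tableau.
eta1 = {(): 2}                                        # i_1 in {1, 2}
eta2 = {(1,): 3, (2,): 1}                             # row lengths of η^3
eta3 = {(1, 1): 2, (1, 2): 1, (1, 3): 2, (2, 1): 3}   # range of i_3 for each (i_1, i_2)

I = index_space_from_shapes([eta1, eta2, eta3])
```

The keys of each ηᵏ are exactly the index space of ηᵏ⁻¹, which is the sense in which ηᵏ⁻¹ is the shape of ηᵏ; the resulting list comes out in lexicographic order.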
In experimentation, we always start in standard form; i.e., the standard form is the enumeration of the experimental units within clusters, strata, etc. Our definition of a tableau is, however, more general, so as to accommodate other structures we will encounter shortly (namely, masked tableaus, which in the context of experimentation are the subsets of components assigned to treatment or control; more on this later).
Example 1. As a simple example, consider a 3-dimensional tableau T, with shape η³ as shown below.

Here, η¹ = 5, and η² = ⟨5, 2, 3, 2, 4⟩ gives the number of elements in each row of η³. For example, when i = 3 and j = 2, the index k ranges over k = 1, …, 22, and so forth.
p-Cells; Outer and Inner Index Spaces
For any p = 1, …, l, a p-cell of a tableau T is the subtableau consisting of all components that share the same first p indices; i.e., an individual p-cell is specified by ιₚ = (i₁, …, iₚ), and we denote it T[i₁, …, iₚ]. In the context of experimentation, "cells" without reference to a p are typically taken to be the penultimate cells, with p = l−1. A p-cell is a q-dimensional tableau, with q = l−p, indexed over the space

$$
I[\iota_p] = \{ (i_{p+1}, \dots, i_l) : (i_1, \dots, i_p, i_{p+1}, \dots, i_l) \in I \},
$$

which we refer to as the outer index space at ιₚ. The set of all p-cells is indexed by the inner index space

$$
I_p = \Pi_p(I) = \{ (i_1, \dots, i_p) : (i_1, \dots, i_l) \in I \},
$$

where Πₚ is the projection onto the first p coordinates. In this way, for any p, we may loosely think of the index space as the bundle structure

$$
I = \bigcup_{\iota_p \in I_p} \{\iota_p\} \times I[\iota_p],
$$

where the fibers I[ιₚ] vary with the location ιₚ in the base. (Note: I use the word loosely because the fibers, though of the same dimension, are in general of different shapes as we move through the base space, which is why they depend on ιₚ.)
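The definitions of p-cells, outer index spaces, and the inner index space can be sketched with plain tuple slicing, again assuming a dict-of-tuples storage. The tableau and every helper name here (`inner_space`, `outer_space`, `cell`, `reassemble`) are hypothetical; `reassemble` checks the bundle picture by rebuilding I from the base Iₚ and its fibers.

```python
# Hypothetical 3-dimensional tableau in standard form, stored as {index tuple: value}.
I = [(1, 1, 1), (1, 1, 2), (1, 2, 1), (2, 1, 1), (2, 1, 2), (2, 1, 3)]
T = {idx: float(n) for n, idx in enumerate(I, start=1)}   # values 1.0, ..., 6.0

def inner_space(index_space, p):
    """I_p = Π_p(I): project every index onto its first p coordinates."""
    return sorted({idx[:p] for idx in index_space})

def outer_space(index_space, prefix):
    """I[ι_p]: trailing coordinates of the indices in I that start with the prefix ι_p."""
    p = len(prefix)
    return sorted(idx[p:] for idx in index_space if idx[:p] == prefix)

def cell(tableau, prefix):
    """p-cell T[ι_p]: the q-dimensional subtableau with the first p indices held fixed."""
    p = len(prefix)
    return {idx[p:]: v for idx, v in tableau.items() if idx[:p] == prefix}

def reassemble(index_space, p):
    """Rebuild I from the base I_p and the fibers I[ι_p] (the 'bundle' view)."""
    return sorted(base + fiber
                  for base in inner_space(index_space, p)
                  for fiber in outer_space(index_space, base))
```

For this tableau, the fiber over (2, 1) has three elements while the fiber over (1, 2) has one, which is exactly the varying fiber shape mentioned in the note above.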
Continuing the example from the previous section, the cell at (3,2) consists of the (unseen) 1-dimensional array of 22 components, whereas the cell at (3,1) contains only 10 components. The inner space I₂ consists of the 16 index pairs (i, j) of the shape tableau shown in the table, one for each of its entries (5 + 2 + 3 + 2 + 4 = 16).
Similarly, the 1-cell at i = 4 is the 2-dimensional tableau indexed by the outer space at i = 4, namely I[4] = ({1} × {1, …, 5}) ∪ ({2} × {1, …, 15}). The inner space I₁ is just the enumeration of the rows: I₁ = {1, 2, 3, 4, 5}.
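The counts in this example can be reproduced from the shape data the text states. The table itself is not reproduced here, so only the four entries of η³ quoted above are used (`eta3_known` is deliberately partial); the dict layout is our own assumption.

```python
# Shape data stated in the text for Example 1. Only four entries of η³ are quoted,
# so eta3_known is deliberately partial; the other entries are not reproduced here.
eta1 = 5
eta2 = {1: 5, 2: 2, 3: 3, 4: 2, 5: 4}   # number of entries in each row of η³
eta3_known = {(3, 1): 10, (3, 2): 22, (4, 1): 5, (4, 2): 15}

# Inner space I_2: every pair (i, j) with 1 <= i <= η¹ and 1 <= j <= η²_i.
I2 = [(i, j) for i in range(1, eta1 + 1) for j in range(1, eta2[i] + 1)]

# Inner space I_1: the row indices.
I1 = sorted({i for i, _ in I2})

# Outer space at i = 4: {1} x {1, ..., 5} ∪ {2} x {1, ..., 15}.
outer_4 = [(j, k) for j in range(1, eta2[4] + 1)
           for k in range(1, eta3_known[(4, j)] + 1)]
```

This gives 16 pairs in I₂ and 5 + 15 = 20 indices in the outer space at i = 4.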
Total and Partial Sums
The total sum of a tableau T, denoted T∘, is the sum of all its components.
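The total sum of a given p-cell T[i₁, …, iₚ] is referred to as a q-partial sum of T, since it requires q = l−p summations, and is defined by

$$
T[\iota_p]^{\circ} = \sum_{(i_{p+1}, \dots, i_l) \in I[\iota_p]} T_{i_1 \cdots i_l}
= \sum_{i_{p+1}=1}^{\eta^{p+1}_{i_1 \cdots i_p}} \cdots \sum_{i_l=1}^{\eta^{l}_{i_1 \cdots i_{l-1}}} T_{i_1 \cdots i_l},
$$

where the second equality holds whenever T is in standard form.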
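The two expressions for the total sum of a p-cell (a single sum over the outer index space, and nested sums whose upper limits come from the shape tableaus in standard form) can be compared numerically. This is a sketch under our own storage assumptions: `shapes[k - 1]` holds ηᵏ keyed by index prefixes, the tableau is a dict of tuples, and both function names are hypothetical.

```python
# Hypothetical shape sequence <η¹, η², η³>; shapes[k - 1] maps (i_1, ..., i_{k-1}) to the range of i_k.
shapes = [{(): 2},
          {(1,): 2, (2,): 1},
          {(1, 1): 2, (1, 2): 1, (2, 1): 3}]

# Enumerate the standard-form index space and fill T with the values 1.0, ..., 6.0.
I = [(i, j, k)
     for i in range(1, shapes[0][()] + 1)
     for j in range(1, shapes[1][(i,)] + 1)
     for k in range(1, shapes[2][(i, j)] + 1)]
T = {idx: float(n) for n, idx in enumerate(I, start=1)}

def cell_sum_by_set(tableau, prefix):
    """First form: sum T over every index in I that begins with the prefix ι_p."""
    p = len(prefix)
    return sum(v for idx, v in tableau.items() if idx[:p] == prefix)

def cell_sum_iterated(tableau, shape_seq, prefix):
    """Second form (standard form only): nested sums whose upper limits come from the shapes."""
    if len(prefix) == len(shape_seq):          # all l indices fixed: a single component
        return tableau[prefix]
    upper = shape_seq[len(prefix)][prefix]     # range of the next index, read from η^(p+1)
    return sum(cell_sum_iterated(tableau, shape_seq, prefix + (i,))
               for i in range(1, upper + 1))
```

The iterated version only works because every index runs from 1 without skipping; for a masked (non-standard) tableau, only the set-based sum applies.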
For a given p, we may therefore construct a p-dimensional tableau consisting of all q-partial sums of T, denoted by ⊕_q T or T_(q), whose ιₚ component is defined as

$$
(\oplus_q T)_{\iota_p} = T[\iota_p]^{\circ}
$$

for all ιₚ ∈ Iₚ.
In summary, for p+q=l:
- a p-cell at ιₚ is a q-dimensional subtableau T[ιₚ], defined over the outer index space I[ιₚ], in which the first p indices are held fixed; and
- the qth partial sum ⊕_q T (or T_(q)) is a p-dimensional tableau, defined on the inner index space Iₚ, whose components are the total sums of the individual p-cells.
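The partial-sum tableau can be sketched as "sum out the last q indices". The helper `partial_sum` and the tableau values are hypothetical, and the dict-of-tuples storage is an assumption rather than the post's own Python implementation.

```python
# Hypothetical 3-dimensional tableau stored as a dict {index tuple: value}.
T = {(1, 1, 1): 1.0, (1, 1, 2): 2.0, (1, 2, 1): 3.0,
     (2, 1, 1): 4.0, (2, 1, 2): 5.0, (2, 1, 3): 6.0}

def partial_sum(tableau, q):
    """⊕_q T: sum out the last q indices, giving a p-dimensional tableau on I_p (p = l - q)."""
    l = len(next(iter(tableau)))
    p = l - q
    result = {}
    for idx, value in tableau.items():
        # The component at ι_p accumulates the total sum of the p-cell T[ι_p].
        result[idx[:p]] = result.get(idx[:p], 0.0) + value
    return result

T1 = partial_sum(T, 1)   # 2-dimensional, indexed by I_2
T2 = partial_sum(T, 2)   # 1-dimensional, indexed by I_1
T3 = partial_sum(T, 3)   # 0-dimensional: the total sum T∘ under the empty index
```

Summing all components of ⊕_q T returns the total sum T∘ for every q, since each component of T lands in exactly one p-cell; q = 0 returns T itself.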
In general, whenever we have an operator