A Tableau Calculus for the Analysis of Experiments

Author:Murphy | View: 21058 | Time: 2025-03-23 12:31:07

Experimental analysis often involves analyzing groups containing varying numbers of elements; for example, a different number of units for each treatment assignment within each stratum. We therefore encounter objects that are like matrices, except they are not perfect rectangular blocks; i.e., they are not always "filled."

In this note, we define a new structure, called a tableau, which can be regarded as a partially filled matrix, and we formalize the operations on tableaus that are used in the analysis of experiments. We then show how tableau notation can be used to express the key equations in a variety of statistical contexts, including stratification, clustering, and the sum-of-squares decomposition. Moreover, we express these equations in both an invariant and an index form:

invariant notation (coordinate-free form) – defined in terms of objects and operators, much like the matrix-vector product A⋅x, and
index notation (coordinate form) – defined explicitly in terms of indexed arrays and summation of multiple indices, much like expressing the matrix-vector product as ∑ⱼAᵢⱼ xⱼ.

Outline

This post consists of four main sections:

Review of classic notation, the pros and cons;
Theoretical development of the Tableau Calculus;
Application to Experiments (completely randomized, block-randomized, adjustment formula, cluster-randomized, block-cluster, and ANOVA sum of squares decomposition);
Python implementation.

Classic Notation: Pros and Cons

In experimental analysis, there are three main styles of notation that are commonly used:

classic notation – treatment assignment is explicitly enumerated: unit (ijk) describes the kth unit in the jth stratum of the ith treatment group (see [1], [2], and [5]);
assignment notation – the assignment mechanism is treated as an independent variable, and we consider sums over quantities like ZᵢYᵢ or Zᵢⱼ Yᵢⱼ (see [2], [3], and [4]); and
set notation – explicit variables refer to the treatment and control sets: Yᵗ and Yᶜ, or Y⁺ⱼ for the aggregate sum of the jth cluster, and then Y⁺ₜ and Y⁺c for the sets of aggregate sums over treatment and control clusters, etc. (see [5]).

Classic notation allows one to express formulas in the most compact manner, as treatment assignment is directly indexed in the response array; this is useful in describing stratification, multi-level experiments, and ANOVA. However, this notation is philosophically unsettling, as the enumeration of units depends directly on the treatment assignment.

Assignment notation, on the other hand, enumerates units without regard to treatment assignment, but requires an auxiliary assignment mechanism Z and a doubling of the number of multi-summations, with one set of sums containing a factor Z and the other a factor (1-Z). This has the shortcoming that it is not amenable to multi-level design.
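As a minimal sketch of assignment notation, the difference in means can be written with one sum carrying a factor Z and a mirrored sum carrying a factor (1-Z). The responses and assignments below are made-up values for illustration only.

```python
# Assignment notation: units are enumerated without regard to treatment,
# and a separate 0/1 vector Z records the assignment.
# All numbers here are hypothetical.
Y = [3.0, 5.0, 4.0, 8.0, 6.0]   # responses Y_i
Z = [1, 0, 1, 0, 0]             # Z_i = 1 for treatment, 0 for control

n_t = sum(Z)                     # number of treated units
n_c = sum(1 - z for z in Z)      # number of control units

# Every quantity needs two parallel sums: one weighted by Z, one by (1 - Z).
mean_t = sum(z * y for z, y in zip(Z, Y)) / n_t
mean_c = sum((1 - z) * y for z, y in zip(Z, Y)) / n_c
diff_in_means = mean_t - mean_c
```

With a single level this doubling is mild; the bookkeeping grows once the sums run over several nested indices.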

Finally, set notation simplifies things a great deal, but requires special definitions for every different set: in clustering, Y⁺ⱼ is the aggregate sum for the jth cluster, and Y⁺ₜ={Y⁺ⱼ : zⱼ=1}, but these are not used in block designs, and so on. In other words, we have to keep defining different notation to refer to different groups, subgroups, or sums of a single fundamental underlying object.

Tableau notation seeks the best of each world: its basic worldview is consistent with assignment notation, but we define a structure called a tableau and a set of operations that allow one to write equations in an invariant form, which can be understood across contexts without having to define specific sets each time. Moreover, we take the novel interpretation of the assignment mechanism as a mask, so that we may consider the subtableaus consisting of treatment and control assignments, and then apply our basic operations on these subtableaus to express key statistical formulas.

Tableau Notation

Getting Started: Tableaus, Shapes, and Index Spaces

Recall that an l-dimensional matrix is a rectangular array structure specified by its side lengths (n₁, …, nₗ). Its n₁⋯nₗ components are indexed by the index space

$$
I = \{1, \dots, n_1\} \times \cdots \times \{1, \dots, n_l\}.
$$

An l-dimensional tableau T is an l-dimensional array indexed by values in a bounded set I ⊂ ℤˡ₊ of positive integer arrays called the index space. So a tableau is just a partially filled matrix.

We say that a tableau T is in standard form if each index ranges from 1 to some terminal integer (without skipping) and if the range of index iₖ depends only on the values of the (k-1) preceding indices. An l-dimensional tableau in standard form may be described by its shape ηˡ, which is itself an (l-1)-dimensional standard tableau, defined such that ι=(i₁,…,iₗ)∈I if and only if (i₁,…,iₗ₋₁)∈I(ηˡ) and iₗ ∈ {1,…, ηˡ{i₁,…,iₗ₋₁}}, where ηˡ{i₁,…,iₗ₋₁} denotes the component of ηˡ at (i₁,…,iₗ₋₁).

Alternatively, the shape is given by a sequence of tableaus ⟨η¹,…,ηˡ⟩, such that ηᵏ is a (k-1)-dimensional tableau defined inductively as the shape of ηᵏ⁺¹, for k=(l-1),…,1. The tableau ηᵏ thus determines the range of the index iₖ, as it depends on the preceding indices. This defines the index space inductively by the relations i₁ ∈ {1, …, η¹} and

$$
i_k \in \{1, \dots, \eta^k_{i_1, \dots, i_{k-1}}\}
$$

for k=2, …, l. Notice how the range of each index may depend on the values of the preceding indices.
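One way to make these definitions concrete is to store a standard-form tableau as ragged nested Python lists. The sketch below rests on that assumption; the helper names (`shape`, `shape_sequence`, `index_space`) and the sample data are hypothetical and chosen only for illustration.

```python
def shape(T, l):
    """Shape of an l-dimensional standard-form tableau stored as nested lists.

    The result is an (l-1)-dimensional tableau holding the lengths of the
    innermost lists; for l = 1 it is a single integer.
    """
    if l == 1:
        return len(T)
    return [shape(sub, l - 1) for sub in T]


def shape_sequence(T, l):
    """Return [eta^1, ..., eta^l], where eta^k is the shape of eta^(k+1)."""
    etas = []
    current, dim = T, l
    while dim >= 1:
        current = shape(current, dim)
        etas.append(current)
        dim -= 1
    return etas[::-1]


def index_space(T, l, prefix=()):
    """Yield the 1-based index tuples (i_1, ..., i_l) of T in order."""
    for i, sub in enumerate(T, start=1):
        if l == 1:
            yield prefix + (i,)
        else:
            yield from index_space(sub, l - 1, prefix + (i,))


# Hypothetical 2-dimensional tableau with rows of length 3, 1, and 2.
T = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
etas = shape_sequence(T, 2)       # [3, [3, 1, 2]]
I = list(index_space(T, 2))       # [(1, 1), (1, 2), (1, 3), (2, 1), (3, 1), (3, 2)]
```

Here η¹ = 3 gives the range of the first index, and η² = ⟨3, 1, 2⟩ gives the range of the second index for each value of the first.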

In experimentation, we always start in standard form; i.e., the standard form is the enumeration of the experimental units within clusters, strata, etc. Our definition of a tableau is, however, more general, to accommodate other structures we will encounter in a bit (namely, masked tableaus, which, in the context of experimentation, are the subsets of components assigned to treatment or control; more on this later).
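To see why the more general definition is needed, the following sketch stores a tableau as a dictionary keyed by index tuples and applies a 0/1 assignment as a mask. The dictionary representation, the function name `is_standard_form`, and the data are assumptions made for illustration; this is not necessarily how masks are formalized later in the post.

```python
def is_standard_form(index_set):
    """Check the standard-form condition on equal-length index tuples.

    For every prefix (i_1, ..., i_{k-1}) that occurs, the values taken by
    the next index i_k must be exactly 1, 2, ..., m with no gaps.
    """
    next_values = {}
    for iota in index_set:
        for k in range(len(iota)):
            next_values.setdefault(iota[:k], set()).add(iota[k])
    return all(v == set(range(1, max(v) + 1)) for v in next_values.values())


# Hypothetical 2-dimensional tableau: two strata with 3 and 2 units.
T = {(1, 1): 3.0, (1, 2): 5.0, (1, 3): 4.0, (2, 1): 8.0, (2, 2): 6.0}
# Hypothetical assignment: 1 = treatment, 0 = control.
Z = {(1, 1): 1, (1, 2): 0, (1, 3): 1, (2, 1): 0, (2, 2): 1}

# Masking keeps the original indices, so the subtableaus may skip values.
treated = {iota: y for iota, y in T.items() if Z[iota] == 1}
control = {iota: y for iota, y in T.items() if Z[iota] == 0}

full_is_standard = is_standard_form(T)            # True
treated_is_standard = is_standard_form(treated)   # False: (1, 2) is missing
control_is_standard = is_standard_form(control)   # False: (1, 1) is missing
```

The full tableau is in standard form, while both masked subtableaus keep gaps in their index ranges, which is exactly the case the general index-space definition is meant to cover.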

Example 1. As a simple example, consider a 3-dimensional tableau T, with shape η³ as shown below.

Table 1. The shape η³ of a 3-dimensional tableau T; Image by the Author.

Here, η¹=5, and η²=⟨ 5, 2, 3, 2, 4 ⟩ gives the number of elements in each row of the table. For example, when i=3 and j=2, the index k ranges over k=1,…,22, and so forth.

p-Cells; Outer and Inner Index Spaces

For any p=1, …, l, a p-cell of a tableau T is a subtableau consisting of all units that share the first p index components; i.e., an individual p-cell is specified by ιₚ = (i₁, …, iₚ), and we denote it T[i₁, …, iₚ]. In the context of experimentation, the cells, without reference to a p, are typically assumed to be the penultimate cells, with p=l-1. A p-cell is a q-dimensional tableau, with q=l-p, indexed over the space

$$
I[\iota_p] = \{(i_{p+1}, \dots, i_l) : (i_1, \dots, i_p, i_{p+1}, \dots, i_l) \in I\},
$$

which we refer to as the outer index space at ιₚ. The set of all p-cells is indexed by the inner index space

$$
I_p = \Pi_p(I) = \{(i_1, \dots, i_p) : (i_1, \dots, i_l) \in I\},
$$

where Πₚ is the projection operator onto the first p indices. In this way, for any p, we may loosely think of the index space as the bundle structure

$$
I = \bigcup_{\iota_p \in I_p} \{\iota_p\} \times I[\iota_p],
$$

where the fibers I[ιₚ] vary based on the location ιₚ in the base. (Note: I use the word "bundle" loosely, as the fibers, though of the same dimension, are in general of different shapes as we vary location in the base space, which is why they depend on ιₚ.)

Continuing the example from the previous section, the cell at (3,2) consists of the (unseen) 1-dimensional array of 22 components, whereas the cell at (3,1) contains only 10 components. The inner space I₂ consists of the 16 index pairs (i, j) defined by the shape tableau shown in the table.

Similarly, the 1-cell at i=4 would consist of the 2-dimensional tableau indexed by the outer space at i=4 given by {1} × {1,…,5} ∪ {2} × {1,…,15}. The inner space I₁ is just the enumeration of the rows: I₁={1,2,3,4,5}.
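The cell and index-space definitions can be sketched in a few lines, again assuming a standard-form tableau stored as ragged nested lists with no empty sublists. The function names and the small 3-dimensional tableau are hypothetical; it is not the tableau of Example 1, whose full shape is not reproduced here.

```python
def index_space(T, l, prefix=()):
    """Yield the 1-based index tuples of an l-dimensional nested-list tableau."""
    for i, sub in enumerate(T, start=1):
        if l == 1:
            yield prefix + (i,)
        else:
            yield from index_space(sub, l - 1, prefix + (i,))


def cell(T, iota_p):
    """Return the p-cell T[i_1, ..., i_p] for 1-based indices iota_p."""
    sub = T
    for i in iota_p:
        sub = sub[i - 1]
    return sub


def inner_index_space(T, p):
    """Indices of all p-cells: the projection of I onto its first p indices."""
    return list(index_space(T, p))


def outer_index_space(T, l, iota_p):
    """Indices (i_{p+1}, ..., i_l) inside the p-cell at iota_p."""
    return list(index_space(cell(T, iota_p), l - len(iota_p)))


# Hypothetical 3-dimensional tableau with shape eta^3 = [[2, 1], [3]].
T = [[[1, 2], [3]], [[4, 5, 6]]]

I1 = inner_index_space(T, 1)              # [(1,), (2,)]
I2 = inner_index_space(T, 2)              # [(1, 1), (1, 2), (2, 1)]
outer_at_1 = outer_index_space(T, 3, (1,))      # [(1, 1), (1, 2), (2, 1)]
outer_at_21 = outer_index_space(T, 3, (2, 1))   # [(1,), (2,), (3,)]
```

Note that the outer spaces at different base points have the same dimension but different shapes, which is the sense in which the bundle picture is only loose.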

Total and Partial Sums

The total sum of a tableau T, denoted T∘, is the sum of all its components.

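The total sum of a given p-cell (i₁…iₚ) is referred to as a q-partial sum of T, as it requires a total of q=l-p summations, and is defined by

$$
T[\iota_p]{\circ} = \sum_{(i_{p+1}, \dots, i_l) \in I[\iota_p]} T_{i_1 \dots i_l} = \sum_{i_{p+1}=1}^{\eta^{p+1}_{i_1, \dots, i_p}} \cdots \sum_{i_l=1}^{\eta^{l}_{i_1, \dots, i_{l-1}}} T_{i_1 \dots i_l},
$$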

where the second equality holds whenever T is in standard form.

For a given p, we may therefore construct a p-dimensional tableau consisting of all q-partial sums of T, denoted by ⊕q T or T(q), such that the ιₚ component is defined as

$$
T(q)[\iota_p] = T[\iota_p]{\circ}
$$

for all ιₚ ∈ Iₚ.

In summary, for p+q=l:

a p-cell at ιₚ is a q-dimensional subtableau T[ιₚ] defined over the outer index space I[ιₚ], for which the first p indices are held fixed; and
the qth partial sum ⊕q T or T(q) is a p-dimensional tableau defined on the inner index space Iₚ, as the total sum of the individual p-cells.
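The summary above can be sketched as two short recursive functions, assuming once more a standard-form tableau stored as ragged nested lists. The names `total_sum` and `partial_sum` and the sample tableau are hypothetical.

```python
def total_sum(T, l):
    """Total sum of an l-dimensional nested-list tableau (l = 0 is a scalar)."""
    if l == 0:
        return T
    return sum(total_sum(sub, l - 1) for sub in T)


def partial_sum(T, l, q):
    """q-th partial sum of T: a p-dimensional tableau (p = l - q) whose
    component at iota_p is the total sum of the p-cell T[iota_p]."""
    p = l - q
    if p == 0:
        return total_sum(T, l)
    return [partial_sum(sub, l - 1, q) for sub in T]


# Hypothetical 3-dimensional tableau with shape eta^3 = [[2, 1], [3]].
T = [[[1, 2], [3]], [[4, 5, 6]]]

T1 = partial_sum(T, 3, 1)   # sums over the last index: [[3, 3], [15]]
T2 = partial_sum(T, 3, 2)   # sums over the last two indices: [6, 15]
T3 = partial_sum(T, 3, 3)   # total sum of T: 21
```

Each additional summation removes the innermost remaining dimension, so the first partial sum inherits the shape η³ as its index structure, the second inherits η², and the lth is the scalar total.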

In general, whenever we have an operator
