May 5, 2020
bnels.github.io/phd-talk
Collaborators who appear in this work
Funding I've received while working on these topics
Concerned with understanding and using data through topology.
Data sets as topological spaces. Data points as topological spaces.
Compute algebraic signatures of shapes.
Application of homology to data begun by Robins [R 00],
furthered by [ELZ 02], [ZC 05], [CdS 10].
Applications (necessarily incomplete list):
Data visualization (mapper) [SMC 07], neuroscience [GGB 16], molecular properties [CWD 17], materials discovery [H+ 16], regularization [GND+ 20], genetics [CGR 13]
Figure from "Potentially highly potent drugs for 2019-nCoV" Nguyen et al. [N+ 20]
Data set (discrete topological space) $X$.
$k$-simplex $(x_0,\dots,x_k)$ is the span of $k+1$ points in $X$.
Simplicial complex (topological space) $\mathcal{X}$ built from simplices.
Chain complex $C_\ast(\mathcal{X})$ (assume field coefficients).
$C_k(\mathcal{X})$ vector space with basis vector for every $k$-simplex.
Differential maps $\partial_k: C_k(\mathcal{X}) \to C_{k-1}(\mathcal{X})$, $\partial_k \circ \partial_{k+1} = 0$.
Homology $H_k(\mathcal{X}) = \ker \partial_k / \text{img} \partial_{k+1}$ quotient vector space.
$\dim H_0(\mathcal{X})$ = # connected components, $\dim H_1(\mathcal{X})$ = # of loops, ...
Maps $f:\mathcal{X}\to \mathcal{Y}$ produce induced maps $H_k(f) : H_k(\mathcal{X}) \to H_k(\mathcal{Y})$
Homotopy invariance: if $\mathcal{X}\simeq \mathcal{Y}$, $H_\ast(\mathcal{X}) \cong H_\ast(\mathcal{Y})$.
Persistent homology [ELZ 02]
Zigzag homology [CdS 10]
Barcodes track how homology is created and destroyed.
Classical topology:
$\mathcal{B}$ called base space. Study $\mathcal{X}$ through fibers $p^{-1}(b)$ parameterized by $b\in \mathcal{B}$.
Data is typically discrete. Sometimes more sensible to talk about covers $\mathcal{U}$, and pullbacks of covers $p^{-1}(\mathcal{U})$.
Why? Lots of structure to use - gives more ways to compare different data.
Can we bring computational tools (long exact and spectral sequences) to setting of data?
Simplicial complexes often contain much more information than is needed.
This can make computations much more expensive than necessary.
One idea is to use a cover $\mathcal{U}$ of $X$, and use the nerve $\mathcal{N}(\mathcal{U})$.
Nerve Theorem [B 48] (informal): For nice $\mathcal{U}$ and nice $X$, $\mathcal{N}(\mathcal{U}) \simeq X$.
Unfortunately, this isn't immediately useful for discrete data.
We'll call $\cal N(\cal U, \cal V)$ the bivariate nerve.
Something like this proposed in [CdS 10] for Witness Complexes
No computational tools for bivariate Nerve diagram
(maps are not inclusions)
We can compute zigzag homology of bivariate nerve diagrams!
Persistent and Zigzag Homology: A Matrix Factorization Viewpoint [CDN 19]
Code: BATS and BATS.py
See also Anjan's forthcoming thesis [D 20].
Existing TDA packages work on chain complexes
(Makes sense if all maps are inclusions)
We take advantage of this parallelization.
Example: Indecomposable for bar with birth at 1, death at 2:
$I[1,2] = 0 \leftrightarrow \mathbb{F} \overset{\cong}\leftrightarrow \mathbb{F} \leftrightarrow 0 \leftrightarrow \dots$
Type-A quiver representations are isomorphic to something in the form
$\bigoplus_i I[b_i, d_i]$
Acts on $V_0 \oplus V_1 \oplus V_2$
$E_i$ have at most 1 nonzero in each row and column
Can read barcode off from $\Lambda$ by tracking basis vectors
[CDN 19]: the barcode form $\Lambda$ doesn't depend on choice of basis in each vector space
Rips filtrations $\mathcal{R}(X; r)$ are expensive to use.
Difficult to compute high dimensional homology in most situations. True for other constructions as well.
To compute $PH_3(\mathcal{R}(X; r))$ when $X$ has 1000 points, need $\binom{1000}{5}$, about 10 trillion, 4-simplices.
May need even more than 1000 samples to resolve interesting structure.
[S 13] Linear-size approximations to Rips complex.
Constant factor may be large.
[GS 17, CS 18] Approximations using filtered Nerves.
Focus is on when Nerve is good approximation.
[LM 15] Mayer Vietoris spectral sequence
[Y 18] Leray spectral sequence + sheaf cohomology
We'll analyze the cover filtrations $\mathcal{X}(\mathcal{U}; r)$.
Idea: restrict simplices in a filtration $\mathcal{X}(r)$ to open sets in $\mathcal{U}$.
If we cover 1000 points with 40 overlapping sets of 50 points each, get
$\le 40 \binom{50}{5}$, or about 100 million 4-simplices (instead of 10 trillion).
Implementations available in BATS.
We can compare $PH_k(\mathcal{X}(t))$ and $PH_k(\mathcal{Y}(s))$ using an interleaving
In this case, we say $PH_k(\mathcal{X}(t))$ and $PH_k(\mathcal{Y}(s))$ are $(\alpha,\beta)$-interleaved.
Equivalent to the bottleneck distance on persistence diagrams [L 15].
In general can be difficult to write down "by hand".
Acyclic carrier theorem [M 80] can be used to
algorithmically generate maps from initial data (easier).
[N 20] extends the acyclic carrier theorem to filtrations.
Basic idea: carrier $\mathscr{C}$ assigns simplices in $\mathcal{X}$ to sub-complexes of $\mathcal{Y}$.
We can always generate maps when these subcomplexes satisfy acyclic condition.
$\ker \partial_k = \text{img}\, \partial_{k+1} \Rightarrow H_k = 0$, or contractible.
Let $\bar{\mathcal{U}}$ denote the collection of all non-empty intersections of sets in $\mathcal{U}$.
[N 20] (Informal) Suppose $PH_\ast(\mathcal{X}(U; r))$ is interleaved with something acyclic for all $U\in \bar{\mathcal{U}}$. Then $PH_\ast(\mathcal{X}(\mathcal{U}; r))$ and $PH_\ast(\mathcal{N}(\mathcal{U}))$ are also interleaved.
Specialization of results in [GS 17, CS 18]If points in $X$ are perturbed, how does $\mathcal{R}(X,\mathcal{U};r)$ change?
[N 20] (Informal) Let $X,Y$ be samples, and $\mathcal{U}$ a cover of $X\sqcup Y$. If all the sets in $\bar{\mathcal{U}}$ restricted to $X$ are close (in Hausdorff distance) to the sets restricted to $Y$, then the cover complexes have close persistent homology.
Different regimes [N 20]:
Flat torus sampled on $20 \times 10$ grid (200 points).
Cover pulled back from cover of 1st circular coordinate.
2D images: $\ell \times \ell$ pixel squares sampled from images.
3D images: sample $\ell \times \ell \times \ell$ voxel cubes.
Patches are sampled from natural (non-random) images, so there is structure in the data.
Filter for high-contrast, dense subsets of data.
k, p: take top p% ranked by distance to k-nearest neighbor.
Van Hateren data set [vHvdS 98] - data base of 2D natural images.
BRATS MRI data set [BRATS]
Penobscot seismic interpetation data [Pen]
2-dimensional PCA embedding with eigenpatches.
$7\times 7$ patches
Primary circle [dSC 04]
2-dimensional PCA embedding with eigenpatches.
Consider $d$-dimensional patches.
We'll consider idealized patches as functions on the disk $D^d$
We'd like to capture primary spheres and secondary circles in a topological model.
The primary $(d-1)$ sphere:
$\{(v_\phi^T x) \mid v_\phi\in S^{d-1}\}$
Fix $v_\phi$. A secondary circle:
$\{\cos(\theta)(v_\phi^T x)^2 + \sin(\theta)(v_\phi^T x)\mid \theta \in [0,2\pi)\}$
Note that $v_\phi$ and $-v_\phi$ share the same secondary circle.
$\mathcal{K}^d = \{ k(v_\phi, \theta; x) = \cos(\theta)(v_\phi^T x)^2 + \sin(\theta)(v_\phi^T x) \mid v_\phi\in S^{d-1}, \theta \in [0,2\pi) \}$
There is an identification $k(-v_\phi, -\theta;x) = k(v_\phi, \theta; x)$
$\mathcal{K}^2$ is a Klein bottle. $\mathcal{K}^3$ is a $3$-dimensional manifold.
Harris corner/edge detector [HS 88] analyzes eigenvalues of:
$M(x) = \sum_i \Delta(x_i) \Delta(x_i)^T$, $\Delta$ finite difference gradient.
Harris map $h: x\mapsto \text{MaxEigVec}(M(x))$
There is a sign/scale ambiguity for eigenvectors, so range is naturally $\mathbb{R}P^{d-1}$
Continuous limit
$M(f(x)) = \int_x dx\, \nabla(f(x))\nabla(f(x))^T$
For $k(v_\phi, \theta; x)\in \mathcal{K}^d$, $\nabla(k(x)) = k'(x) v_\phi$
so $M(k(x))$ is rank-1, and $h(k(v_\phi, \theta; x)) = [v_\phi]$
$\mathcal{K}^d$ is a fiber bundle over $\mathbb{R}P^{d-1}$
(non-trivial because of twist in identification)
The Harris map $h$ is a fibration for $\mathcal{K}^d$
Fibers $h^{-1}([v])$ are secondary circles.
Because of fibration structure, we can use Leray-Serre spectral sequence to compute homology.
We'll use integer ($\mathbb{Z}$) coefficients, which can be used to obtain field coefficients.
$H_\ast(\mathcal{K}^2) = (\mathbb{Z}, \mathbb{Z}\oplus \mathbb{Z}_2, 0)$
$H_\ast(\mathcal{K}^3) = (\mathbb{Z}, \mathbb{Z}_2\oplus \mathbb{Z}_2, 0, \mathbb{Z})$
We've developed a model $\cal K^d$ to explain structures we see in image patch data.
Potential applications: 3D compression, texture, neural nets [MSC 08, PC 14, GC 19]
If we have higher-dimensional patches, we could perform similar analyses.
Where would a 4-dimensional image come from?
Other applications where we might use a map:
Evasion [AC 14],
Time series,
Control systems.
Review:
Where to next?
Starters:
What are barriers to matrix factorization frameworks for other types of quiver representations?
How tight are the interleaving bounds for cover complexes?
What happens for different sized image patches?
The contents of this talk, with complete references, can be found in:
[N 20] Bradley Nelson. Parameterized Topological Data Analysis.
Ph.D. Dissertation. Stanford University. 2020.