November 5, 2020
bnels.github.io/cam-intro
Collaborators who appear in this work
Funding I received while working on these topics
Concerned with understanding and using data through topology.
Data sets as topological spaces. Data points as topological spaces.
Compute algebraic signatures of shapes.
Application of homology to data begun by Robins [R 00],
furthered by [ELZ 02], [ZC 05], [CdS 10].
Applications (necessarily incomplete list):
Data visualization (mapper) [SMC 07], neuroscience [GGB 16], molecular properties [CWD 17], materials discovery [H+ 16], regularization [GND+ 20], genetics [CGR 13]
Figure from "Potentially highly potent drugs for 2019-nCoV" Nguyen et al. [N+ 20]
Homology turns statements about topology into statements about linear algebra.
The index k refers to the dimension of topological features.
Persistent homology [ELZ 02]
Zigzag homology [CdS 10]
Barcodes track how homology is created and destroyed.
Topologically, look at maps to union.
Something like this proposed in [CdS 10]: "Topological bootstrapping"
Applications limited by slow computations
Persistent and Zigzag Homology: A Matrix Factorization Viewpoint [CDN 19]
Code: BATS and BATS.py
Existing TDA packages work on chain complexes
(Makes sense in certain situations)
We take advantage of this parallelization.
Example: Indecomposable for bar with birth at 1, death at 2:
$I[1,2] = 0 \leftrightarrow \mathbb{F} \overset{\cong}\leftrightarrow \mathbb{F} \leftrightarrow 0 \leftrightarrow \dots$
Type-A quiver representations are isomorphic to something in the form
$\bigoplus_i I[b_i, d_i]$
These are Jordan 0 blocks in the case where all arrows go the same direction. [H 17]
Block matrix with structure of adjacency matrix of underlying graph.
Acts on $V_0 \oplus V_1 \oplus V_2$
$E_i$ have at most 1 nonzero in each row and column
Can read barcode off from $\Lambda$ by tracking basis vectors
[CDN 19]: the barcode form $\Lambda$ doesn't depend on choice of basis in each vector space
The game: you can commute $L$ matrices with $E_L$ and $\hat{E}_L$ in certain ways.
You can commute $U$ matrices with $E_U$ and $\hat{E}_U$ in certain ways.
Choose factorizations accordingly.
Sparsity makes all these bounds pessimistic in practice.
Performance vs. Dionysus [M] on zigzag of samples
Left: Parallel speedup of divide-and-conquer algorithm.
Right: breakdown of computation time for full problem.
"A Topology Layer for Machine Learning" (AISTATS 2020) [GND+ 20]
PyTorch extension: TopologyLayer
Sampling of other recent work in this area:
Our contributions
Left: the digit "4". Right: digits dreamed by a GAN. What's wrong?
Top: Image. Bottom: Level sets as simplicial complexes, filtered by pixel intensity.
Lemma (trivial): Every simplex addition either creates homology or destroys homology.
Implication: We can map the birth and death times of particular homology classes back to function values at particular points.
Generally, compute a sub-gradient of loss with respect to filtration value.
We consider simple algebraic functions of barcodes. Others are possible.
$$\{(b_i, d_i)\} \mapsto \sum_i (d_i - b_i)^p (b_i + d_i)^q$$ We can choose $p, q$ to measure desired topological behavior.
Idea: penalize GANs for unnatural topology.
We want one connected component, so penalize sum of lengths of $H_0$ bars.
Setup for training GAN with topological loss.
Left: under-trained GAN.
Center: improve for 50 batch iterations with topology loss.
Right: improve for 60k batch iterations.
Clockwise from top center: Uniformly distributed points. Decrease $H_0$. Decrease $H_1$. Increase $H_1$ + decrease $H_0$. Increase $H_1$. Increase $H_0$.
Left: Image of the digit "4". Right: An adversarial attack to misclassify as "8".
Review:
Where to next?
Winter Quarter: STAT 37411 = CAAM 37411 Topological Data Analysis