Persistent Homology

A detailed look at 0-dimensional persistent homology for 1D signals and how phspectra uses it to decompose spectra into Gaussian components.

Mathematical foundation

The theory of persistent homology was introduced by Edelsbrunner et al. (2002) and developed into a complete computational framework by Zomorodian & Carlsson (2005). For a textbook treatment, see Edelsbrunner & Harer (2010). Here we restrict ourselves to the 0-dimensional case on 1D signals, which is all that phspectra needs.

Upper-level sets and connected components

Given a 1D signal $f : \{0, 1, \ldots, n-1\} \to \mathbb{R}$ , the upper-level set at threshold $t$ is

U_t = \{ i \mid f(i) \geq t \}

As $t$ decreases from $\max(f)$ toward $-\infty$ , the topology of $U_t$ changes:

Birth: when $t$ drops below a local maximum $f(i)$ , a new connected component appears. We say the component is born at value $f(i)$ .
Death: when $t$ drops enough that two components merge, the younger component (the one whose maximum is lower) dies at that threshold. The elder component absorbs it.

This is 0-dimensional persistent homology ( $H_0$ ): we track connected components (dimension 0) across a filtration parameterized by $t$ .

Birth-death pairs and persistence

Each local maximum except the global one produces a pair $(b, d)$ where:

$b = f(i)$ is the birth value (peak height)
$d$ is the death value (threshold at which it merges into an older component)
Persistence $\pi = b - d$ quantifies how prominent the peak is

The global maximum never merges into anything -- it has $d = 0$ and persistence equal to its height.

Key property: persistence provides a scale-independent ranking of peaks. High-persistence peaks correspond to real features; low-persistence peaks are noise fluctuations. No smoothing or derivative is needed.

Complexity

For a 1D signal of length $n$ , the algorithm sorts indices by decreasing value ( $O(n \log n)$ ) and processes each with a union-find structure (Tarjan 1975; amortized $O(\alpha(n))$ per operation). Total complexity is $O(n \log n)$ .

Visual intuition

The descending water level

Think of the signal as a mountain range seen from the side, with water starting above all peaks. As the water drains:

Figure 1 shows this process on a synthetic three-peak spectrum (the same kind of signal phspectra is designed to decompose). The three peaks are labelled by decreasing amplitude:

Peak	Channel	Amplitude	Role
A	60	$\approx 2.5$	Global maximum (tallest)
B	100	$\approx 1.4$	Second tallest
C	140	$\approx 0.7$	Weakest

The blue shaded region represents the water, and the dashed line marks the current threshold $t$ . Red dots appear at the moment a peak is born, annotated with its final persistence $\pi$ .

Water level stages — **Figure 1.** Descending water-level analogy for 0-dimensional persistent homology applied to a synthetic three-peak spectrum. As the threshold $t$ decreases (blue shaded region), connected components are born at local maxima (red dots) and die when they merge into a taller neighbor. Annotations show the final persistence $\pi = b - d$ for each peak. Generated by `uv run benchmarks persistence-plot`; see Reproducing results.

Walk through the four panels of Figure 1 left-to-right, top-to-bottom:

Top-left -- $t$ above all peaks. The water covers the entire signal. No part of the curve pokes above the surface, so the upper-level set $U_t$ is empty. There are zero connected components and nothing has been born yet.
Top-right -- $t$ just below peak A. The water drops past the tallest peak (channel 60, amplitude $\approx 2.5$ ). A single island emerges: connected component A is born at $b_A = f(60)$ . Because A is the global maximum it will never merge into anything else -- its persistence equals its full height. The annotation $\pi = 2.58$ reflects this: it is the most significant feature in the signal by a wide margin.
Bottom-left -- $t$ just below peak B. The water continues to descend and now drops past the second-tallest peak (channel 100, amplitude $\approx 1.4$ ). A second island appears: component B is born at $b_B = f(100)$ . At this moment two disconnected islands coexist. Peak C (channel 140) is still submerged. Note that both born peaks already carry their final persistence annotations -- persistence is determined entirely by when a peak is born and when it eventually dies, not by the current water level.
Bottom-right -- $t$ below the saddle between A and C. The water has dropped far enough that the valley between peaks A and C fills in and their islands connect. Because C has a lower maximum than A, C is the younger component: it dies at the merge threshold. Its persistence $\pi_C = b_C - d_C = 0.73$ measures the height difference between its peak and the saddle where it was absorbed. Meanwhile peak B, sitting in a separate valley, remains an independent island with $\pi_B = 1.20$ . The algorithm continues until only one component (the global maximum) survives.

The key insight is that persistence directly encodes significance: peak A ( $\pi = 2.58$ ) is clearly the dominant feature, peak B ( $\pi = 1.20$ ) is a solid secondary detection, and peak C ( $\pi = 0.73$ ) is weaker but still well above the noise floor. Noise fluctuations, by contrast, produce tiny islands that merge almost immediately, resulting in persistence values close to zero. Setting a threshold $\pi_{\min} = \beta \times \sigma_{\rm rms}$ cleanly separates real peaks from noise without any smoothing or derivative computation.

The persistence diagram

The persistence diagram provides a compact summary of every birth-death event in the filtration. Each peak is plotted as a point $(b, d)$ where $b$ is the birth value (peak height) and $d$ is the death value (merge threshold). The diagonal line $b = d$ represents zero persistence -- a peak born and immediately killed. The farther a point sits from the diagonal, the more significant the peak.

In Figure 2, the three real peaks (red points) are clearly separated from the cluster of noise points hugging the diagonal. The dashed line parallel to the diagonal marks the persistence threshold $\pi_{\min} = \beta \times \sigma_{\rm rms}$ . For this synthetic signal ( $\sigma_{\rm rms} = 0.08$ , $\beta = 3.5$ ), the threshold is $\pi_{\min} = 0.28$ . Points below this line -- farther from the diagonal -- have persistence exceeding $\pi_{\min}$ and are retained as real peaks. Points above the line are rejected as noise.

The visual gap between signal and noise in Figure 2 illustrates why $\beta$ is insensitive: any threshold line drawn through this gap selects the same three peaks. Moving $\beta$ from 3.5 to 4.5 shifts the line slightly but does not change which points it separates -- the result is stable across a wide range of thresholds.

This is the fundamental advantage over derivative-based methods: instead of relying on regularization to suppress noise (where too little creates false peaks and too much destroys real ones), persistence ranks every candidate peak by an intrinsic topological measure that is robust to the noise level.

From persistence to Gaussian candidates

The persistence filtration produces a list of PersistentPeak objects, each carrying five pieces of information: the channel index of the local maximum, its birth value (peak height), its death value (merge threshold), its persistence ( $\pi = b - d$ ), and the saddle index (the channel where the component merged into an older neighbor). This is the raw output of the topology -- but these are not yet Gaussian fits. The next step is to convert them into initial guesses that a nonlinear least-squares solver can refine.

For each peak that survives the persistence threshold $\pi_{\min} = \beta \times \sigma_{\rm rms}$ :

Amplitude is set to the signal value at the peak's channel index: $a_0 = f(\text{index})$ .
Mean (center position) is set to the peak's channel index itself. The persistence algorithm identifies exactly which sample is the local maximum, so no interpolation is needed -- the index is already the best discrete estimate of the peak location.
Standard deviation is estimated from the peak-to-saddle distance. The saddle is the channel where the component died (merged into a taller neighbor), so $d = |\text{peak\_index} - \text{saddle\_index}|$ gives a one-sided extent. For a Gaussian with peak value $b$ (birth) that drops to $d_v$ (death) at distance $d$ , the exact relationship is:

\sigma_0 = \frac{d}{\sqrt{2 \ln(b / d_v)}}

This provides a physically grounded initial width from the topology alone. For the global maximum (which never dies and has no saddle), the estimate falls back to $\sigma_0 = 1.0$ .

The peaks are ordered by persistence (most significant first), so the solver starts with the strongest features anchored in place, which improves convergence.

This initial guess is then passed to a bounded Levenberg-Marquardt solver, which fits a sum of Gaussians to the full signal:

F(x, \mathbf{a}, \mathbf{\mu}, \mathbf{\sigma}) = \sum_i A_i \exp\left(-\frac{(x - \mu_i)^2}{2\sigma_i^2}\right).

The solver adjusts all three parameters per component simultaneously (with bounds: $a \geq 0$ , $\mu \in [0, n)$ , $\sigma \in [0.3, n/2]$ ). Because the initial positions and amplitudes are already close to the truth -- persistence detected the right peaks -- the fit typically converges in few iterations, and the solver's main job is to determine the correct widths and fine-tune the positions and amplitudes. That being said, optimizing this function is the slowest step in the pipeline, which is why the refinement loop is designed to minimize unnecessary refits by only accepting changes that improve the AICc.

The full decomposition pipeline

The persistence-based peak detection described above is the first stage of a multi-step pipeline that includes curve fitting, component validation, and iterative refinement. For the complete algorithm -- noise estimation, initial Gaussian fit, validation criteria, and the AICc-driven refinement loop -- see the Algorithm Overview.

Why this works for spectra

Radio-astronomical spectra are superpositions of Gaussian emission lines sitting on a noisy baseline. Persistent homology is well-suited because:

No smoothing needed -- derivatives require choosing a kernel width; persistence works directly on the raw signal.
Multi-scale by construction -- broad and narrow features are detected simultaneously, ranked by the same persistence measure.
Single parameter -- $\beta$ controls the noise/signal boundary in physically meaningful units ( $\sigma$ ).
Robust to noise -- the signal-masked RMS estimation and persistence ranking naturally separate noise fluctuations from real components without explicit SNR heuristics at the detection stage.

Mathematical foundation​

Upper-level sets and connected components​

Birth-death pairs and persistence​

Complexity​

Visual intuition​

The descending water level​

The persistence diagram​

From persistence to Gaussian candidates​

The full decomposition pipeline​

Why this works for spectra​