DDD + Category Theory for Healthcare
Healthcare provider directories suffer from well-documented data quality issues. Studies report 40%+ inaccuracy rates in provider data — wrong addresses, stale credentials, phantom networks — leading to denied claims, patient frustration, and regulatory exposure.
A key contributing factor is that provider data is scattered across multiple bounded contexts (EHR systems, credentialing databases, contracting platforms, public directories) with limited tooling for principled merging, synchronization, or querying.
This project explores one possible approach: applying category theory as a formal foundation for reasoning about these integration challenges. It is not the only way to tackle these problems, and it comes with its own trade-offs (see Limitations), but it offers structural guarantees that ad-hoc approaches typically lack.
Five Structural Results
We show that several well-known categorical constructions map naturally to concrete infrastructure problems in this domain:
| # | Problem | Categorical Tool | Module |
|---|---|---|---|
| 1 | Entity Resolution — merging partial, overlapping records into a single golden record | Colimit in | fragment.ts |
| 2 | CRDT Merge — reconciling concurrent updates without coordination | Join in a semilattice (not a colimit in | crdt.ts, semilattice.ts |
| 3 | Schema Translation — safely moving data between different schemas | Adjoint triple | schema.ts |
| 4 | Event Sourcing — reconstructing state at any point in time | Presheaf over a time poset | temporal.ts, snapshot.ts |
| 5 | Consistency (Sheaf Condition) — guaranteeing convergence across replicas | Sheaf gluing axiom | Verified via chaos tests |
Project Components
packages/implementation— TypeScript + fp-ts library implementing all five results with full test coveragepackages/pre-print— LaTeX manuscript with formal proofs and categorical diagramspackages/docs— This documentation site
Limitations
- Learning curve — Category theory introduces unfamiliar abstractions; teams without prior exposure will need ramp-up time.
- Scope — The five results address structural integration problems (merging, translation, temporal consistency). They do not cover data entry errors at the source, organizational process failures, or incentive misalignment.
- Validation — The implementation is a proof-of-concept project, not a production-hardened system. Real-world adoption would require significant engineering beyond what is shown here.
- Alternatives exist — Master Data Management (MDM) platforms, probabilistic record linkage, and event-driven architectures address overlapping concerns with different trade-offs.
Quick Links
- Installation — Get up and running
- Project Structure — Navigate the monorepo
- DDD Primer — Domain-Driven Design in context
- Category Theory Primer — Accessible definitions
- API Reference — Module-by-module exports
- Worked Example — The Dr. Jane Doe walkthrough