
DDD + Category Theory for Healthcare

Healthcare provider directories suffer from well-documented data quality issues. Studies report 40%+ inaccuracy rates in provider data — wrong addresses, stale credentials, phantom networks — leading to denied claims, patient frustration, and regulatory exposure.

A key contributing factor is that provider data is scattered across multiple bounded contexts (EHR systems, credentialing databases, contracting platforms, public directories) with limited tooling for principled merging, synchronization, or querying.

This project explores one possible approach: applying category theory as a formal foundation for reasoning about these integration challenges. It is not the only way to tackle these problems, and it comes with its own trade-offs (see Limitations), but it offers structural guarantees that ad-hoc approaches typically lack.

Five Structural Results

We show that several well-known categorical constructions map naturally to concrete infrastructure problems in this domain:

| # | Problem | Categorical Tool | Module |
|---|---------|------------------|--------|
| 1 | Entity Resolution: merging partial, overlapping records into a single golden record | Colimit in … | fragment.ts |
| 2 | CRDT Merge: reconciling concurrent updates without coordination | Join in a semilattice (not a colimit in …) | crdt.ts, semilattice.ts |
| 3 | Schema Translation: safely moving data between different schemas | Adjoint triple | schema.ts |
| 4 | Event Sourcing: reconstructing state at any point in time | Presheaf over a time poset | temporal.ts, snapshot.ts |
| 5 | Consistency (Sheaf Condition): guaranteeing convergence across replicas | Sheaf gluing axiom | Verified via chaos tests |
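To make row 1 concrete, here is a minimal sketch that assumes, purely for illustration, that the colimit is taken in Set. This section does not say which category fragment.ts actually uses, so that may differ. In Set, the colimit of a diagram whose objects are the source record sets and whose spans identify matching records is the disjoint union of all records, quotiented by the equivalence relation the matches generate. Union-find computes that quotient directly. The `Fragment` and `Match` types, `colimitClasses`, and the NPI/license matching rule are hypothetical and are not the project's API.

```typescript
// Hypothetical partial provider record from one source system (illustrative only).
interface Fragment {
  source: string;
  localId: string;
  fields: Record<string, string>;
}

// A match asserts that two fragments (by index) describe the same provider.
type Match = [number, number];

// Union-find over fragment indices, used to build the generated equivalence relation.
class UnionFind {
  private parent: number[];

  constructor(n: number) {
    this.parent = Array.from({ length: n }, (_, i) => i);
  }

  find(i: number): number {
    while (this.parent[i] !== i) {
      this.parent[i] = this.parent[this.parent[i]]; // path halving
      i = this.parent[i];
    }
    return i;
  }

  union(a: number, b: number): void {
    const ra = this.find(a);
    const rb = this.find(b);
    if (ra !== rb) this.parent[rb] = ra;
  }
}

// Colimit in Set: the disjoint union of all fragments, quotiented by the
// smallest equivalence relation containing every match. Each class is one entity.
function colimitClasses(fragments: Fragment[], matches: Match[]): number[][] {
  const uf = new UnionFind(fragments.length);
  for (const [a, b] of matches) uf.union(a, b);

  const classes = new Map<number, number[]>();
  fragments.forEach((_, i) => {
    const root = uf.find(i);
    const members = classes.get(root) ?? [];
    members.push(i);
    classes.set(root, members);
  });
  return Array.from(classes.values());
}

// Example: matches derived from shared NPI or license values (an assumed matching rule).
const fragments: Fragment[] = [
  { source: "ehr", localId: "e1", fields: { npi: "1234567890", name: "Dr. A" } },
  { source: "credentialing", localId: "c9", fields: { npi: "1234567890", license: "MD-42" } },
  { source: "directory", localId: "d3", fields: { license: "MD-42", address: "1 Main St" } },
  { source: "directory", localId: "d4", fields: { npi: "9999999999" } },
];
const matches: Match[] = [[0, 1], [1, 2]];
console.log(colimitClasses(fragments, matches)); // [ [ 0, 1, 2 ], [ 3 ] ]
```

Each equivalence class is one golden-record candidate. Reconciling the field values inside a class is a separate step that this sketch does not cover.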
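Row 2 states that CRDT merge is a join in a semilattice. The sketch below assumes a state-based design in which each provider field is a last-writer-wins register tagged with a (timestamp, replica) pair. Joining two registers keeps the one with the larger tag, and two records are joined field by field. This join is commutative, associative, and idempotent, so replicas converge whatever order or grouping they use to exchange state. All names here (`LwwRegister`, `join`, `sameState`) are hypothetical, and crdt.ts and semilattice.ts may use different types.

```typescript
// A last-writer-wins register tagged with (timestamp, replica).
// Assumption: a replica never issues two writes with the same timestamp,
// so the (timestamp, replica) pair identifies a write uniquely.
interface LwwRegister {
  value: string;
  timestamp: number;
  replica: string;
}

// Tags are totally ordered (timestamp first, replica id breaks ties), and the
// join in a total order is the maximum.
function joinRegister(a: LwwRegister, b: LwwRegister): LwwRegister {
  if (a.timestamp !== b.timestamp) return a.timestamp > b.timestamp ? a : b;
  return a.replica >= b.replica ? a : b;
}

// Provider state: field name -> register, joined pointwise.
// A field missing on one side behaves like a bottom element.
type ProviderState = Record<string, LwwRegister>;

function join(a: ProviderState, b: ProviderState): ProviderState {
  const out: ProviderState = { ...a };
  for (const [field, reg] of Object.entries(b)) {
    const existing = out[field];
    out[field] = existing === undefined ? reg : joinRegister(existing, reg);
  }
  return out;
}

// Structural equality that ignores key order.
function sameState(a: ProviderState, b: ProviderState): boolean {
  const keys = Object.keys(a).sort();
  if (keys.join() !== Object.keys(b).sort().join()) return false;
  return keys.every(
    (k) => a[k].value === b[k].value && a[k].timestamp === b[k].timestamp && a[k].replica === b[k].replica,
  );
}

const r1: ProviderState = { address: { value: "1 Main St", timestamp: 1, replica: "ehr" } };
const r2: ProviderState = { address: { value: "2 Oak Ave", timestamp: 2, replica: "directory" } };
const r3: ProviderState = { phone: { value: "555-0100", timestamp: 1, replica: "crm" } };

// Different merge orders and groupings converge to the same state.
const left = join(join(r1, r2), r3);
const right = join(r3, join(r2, r1));
console.log(sameState(left, right)); // true
console.log(left.address.value); // 2 Oak Ave
```

The uniqueness assumption on tags matters. If two different values could carry the same (timestamp, replica) tag, `joinRegister` would no longer be commutative and the semilattice laws would fail.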
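Row 4 describes event sourcing as a presheaf over a time poset. One way to read that is sketched below, under assumptions of our own. Each time t is assigned the event history up to t. For t ≤ t', the restriction map truncates the later history to the earlier one. The snapshot at t is then a fold over the restricted history. The event types, `restrict`, and `stateAt` are illustrative only and are not taken from temporal.ts or snapshot.ts.

```typescript
// Hypothetical domain events for one provider record.
type ProviderEvent =
  | { at: number; kind: "addressChanged"; address: string }
  | { at: number; kind: "credentialAdded"; credential: string }
  | { at: number; kind: "credentialRevoked"; credential: string };

interface ProviderSnapshot {
  address: string | null;
  credentials: string[];
}

// Restriction along t <= t': keep only the events at or before t.
function restrict(log: ProviderEvent[], t: number): ProviderEvent[] {
  return log.filter((e) => e.at <= t);
}

// Pure transition function applied by the fold.
function apply(s: ProviderSnapshot, e: ProviderEvent): ProviderSnapshot {
  switch (e.kind) {
    case "addressChanged":
      return { ...s, address: e.address };
    case "credentialAdded":
      return { ...s, credentials: [...s.credentials, e.credential] };
    case "credentialRevoked": {
      const revoked = e.credential;
      return { ...s, credentials: s.credentials.filter((c) => c !== revoked) };
    }
    default: {
      const unreachable: never = e; // exhaustiveness check
      return unreachable;
    }
  }
}

const empty: ProviderSnapshot = { address: null, credentials: [] };

// State at time t: fold the restricted, time-ordered log from the empty snapshot.
function stateAt(log: ProviderEvent[], t: number): ProviderSnapshot {
  const ordered = [...log].sort((a, b) => a.at - b.at);
  return restrict(ordered, t).reduce(apply, empty);
}

const log: ProviderEvent[] = [
  { at: 1, kind: "addressChanged", address: "1 Main St" },
  { at: 2, kind: "credentialAdded", credential: "MD-42" },
  { at: 5, kind: "addressChanged", address: "2 Oak Ave" },
  { at: 7, kind: "credentialRevoked", credential: "MD-42" },
];

console.log(stateAt(log, 3)); // { address: '1 Main St', credentials: [ 'MD-42' ] }
console.log(stateAt(log, 10)); // { address: '2 Oak Ave', credentials: [] }
```

Here, functoriality means that restricting to 6 and then to 3 yields the same history as restricting straight to 3. That property is what makes point-in-time reconstruction well defined.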
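For row 5, here is a small sketch of the gluing axiom itself, not of the chaos tests. It assumes each replica holds a partial record whose field set plays the role of an open set in a cover. If the local records agree pairwise on their shared fields, exactly one record exists on the union of their fields that restricts to each of them. If any two disagree, no such record exists. `Section`, `compatible`, and `glue` are hypothetical names.

```typescript
// A local section: a partial provider record defined on the fields it contains
// (its key set plays the role of an open set in the cover).
type Section = Record<string, string>;

const hasField = (s: Section, field: string): boolean =>
  Object.prototype.hasOwnProperty.call(s, field);

// Sections are compatible when every pair agrees on the fields they share,
// i.e. their restrictions to each pairwise overlap coincide.
function compatible(sections: Section[]): boolean {
  for (let i = 0; i < sections.length; i++) {
    for (let j = i + 1; j < sections.length; j++) {
      for (const field of Object.keys(sections[i])) {
        if (hasField(sections[j], field) && sections[j][field] !== sections[i][field]) return false;
      }
    }
  }
  return true;
}

// Gluing: compatible sections extend to exactly one record on the union of their
// fields (each field's value is forced by whichever section defines it).
// Incompatible sections cannot be glued, so return null.
function glue(sections: Section[]): Section | null {
  if (!compatible(sections)) return null;
  const glued: Section = {};
  for (const s of sections) Object.assign(glued, s);
  return glued;
}

const fromEhr: Section = { npi: "1234567890", name: "Dr. A" };
const fromDirectory: Section = { npi: "1234567890", address: "1 Main St" };
const stale: Section = { npi: "1234567890", address: "9 Old Rd" };

console.log(glue([fromEhr, fromDirectory])); // { npi: '1234567890', name: 'Dr. A', address: '1 Main St' }
console.log(glue([fromEhr, fromDirectory, stale])); // null (disagreement on address)
```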

Project Components

  • packages/implementation — TypeScript + fp-ts library implementing all five results with full test coverage
  • packages/pre-print — LaTeX manuscript with formal proofs and categorical diagrams
  • packages/docs — This documentation site

Limitations

  • Learning curve — Category theory introduces unfamiliar abstractions; teams without prior exposure will need ramp-up time.
  • Scope — The five results address structural integration problems (merging, translation, temporal consistency). They do not cover data entry errors at the source, organizational process failures, or incentive misalignment.
  • Validation — The implementation is a proof-of-concept project, not a production-hardened system. Real-world adoption would require significant engineering beyond what is shown here.
  • Alternatives exist — Master Data Management (MDM) platforms, probabilistic record linkage, and event-driven architectures address overlapping concerns with different trade-offs.