Contributing¶

Development Setup¶

Clone the repository
Install dependencies with uv

git clone https://github.com/caverac/insurance-fraud-detection.git
cd insurance-fraud-detection

# Install all dependencies (Python + Node.js + pre-commit hooks)
make install

Or manually:

uv sync
uv run pre-commit install
cd packages/infra && yarn install

Code Style¶

Formatting¶

We use Black for formatting and isort for import sorting:

# Format code
make format
# Or manually:
uv run black packages/
uv run isort packages/

Linting¶

We use Pylint, Flake8, and pydocstyle (NumPy convention):

# Run all linters
make lint
# Or manually:
uv run flake8 packages/
uv run pylint packages/
uv run pydocstyle packages/

Type Hints¶

All code should include type hints:

def detect_outliers(
    df: DataFrame,
    column: str,
    threshold: float = 3.0,
) -> DataFrame:
    ...

Run type checking:

make type-check
# Or: uv run mypy packages/

Docstrings¶

Use NumPy-style docstrings:

def calculate_fraud_score(
    claims: DataFrame,
    weights: dict[str, float],
) -> DataFrame:
    """
    Calculate composite fraud score for each claim.

    Parameters
    ----------
    claims : DataFrame
        DataFrame with fraud flags from detection methods.
    weights : dict[str, float]
        Dictionary mapping flag categories to weights.

    Returns
    -------
    DataFrame
        DataFrame with fraud_score column added.

    Raises
    ------
    ValueError
        If required columns are missing.

    Examples
    --------
    >>> scores = calculate_fraud_score(claims, {"rules": 0.3})
    >>> scores.select("fraud_score").show()
    """

Testing¶

Running Tests¶

# Run all tests
make test
# Or: uv run pytest

# Run specific package tests
uv run pytest packages/fraud_detection/tests/

# Run specific test
uv run pytest packages/fraud_detection/tests/test_outliers.py::TestOutlierDetector::test_zscore_outliers_detected

Coverage Requirements¶

The project requires 100% code coverage. Tests will fail if coverage drops below this threshold.

Writing Tests¶

Follow these conventions:

import pytest
from pyspark.sql import SparkSession

class TestOutlierDetector:
    """Tests for OutlierDetector."""

    @pytest.fixture
    def detector(self, spark: SparkSession) -> OutlierDetector:
        """Create detector instance."""
        return OutlierDetector(spark, DetectionConfig())

    def test_detects_high_outliers(
        self,
        detector: OutlierDetector,
        sample_claims,
    ) -> None:
        """Test that high outliers are detected."""
        result = detector.detect_zscore_outliers(
            sample_claims, "charge_amount", "is_outlier"
        )
        outliers = result.filter(result.is_outlier).count()
        assert outliers > 0

Pull Request Process¶

Create a feature branch
Make your changes
Add tests for new functionality
Run the test suite
Update documentation if needed
Submit a pull request

Branch Naming¶

feature/description - New features
fix/description - Bug fixes
docs/description - Documentation changes
refactor/description - Code refactoring

Commit Messages¶

Use conventional commits:

feat: add Benford's Law analysis
fix: correct Z-score calculation for grouped data
docs: update configuration guide
test: add tests for duplicate detection
refactor: simplify outlier detection interface

PR Checklist¶

[ ] Tests pass (uv run pytest)
[ ] 100% code coverage maintained
[ ] Code is formatted (uv run black packages/ && uv run isort packages/)
[ ] Linting passes (make lint)
[ ] Type hints added (uv run mypy packages/)
[ ] Docstrings updated (NumPy style)
[ ] Documentation updated if needed

Adding New Detection Methods¶

1. Create the Module¶

# packages/fraud_detection/src/fraud_detection/rules/<module_name>.py
from pyspark.sql import DataFrame

class MyCustomRules:
    def __init__(self, spark, config):
        self.spark = spark
        self.config = config

    def check_my_pattern(self, claims: DataFrame) -> DataFrame:
        """Check for my custom pattern."""
        # Implementation
        return claims

2. Add Tests¶

# packages/fraud_detection/tests/test_<module_name>.py
class TestMyCustomRules:
    def test_detects_pattern(self, spark, sample_claims):
        rules = MyCustomRules(spark, DetectionConfig())
        result = rules.check_my_pattern(sample_claims)
        # Assertions

3. Integrate with Detector¶

# In FraudDetector.__init__
self.my_rules = MyCustomRules(spark, config)

# In FraudDetector._apply_rules
claims = self.my_rules.check_my_pattern(claims)

4. Update Documentation¶

Add documentation for the new method in the appropriate guide.

Release Process¶

Update version in pyproject.toml
Update CHANGELOG.md
Create release PR
After merge, tag the release
CI/CD deploys automatically

# Tag a release
git tag v0.2.0
git push origin v0.2.0