Contributing
Development Setup
- Clone the repository
- Install dependencies with uv
```bash
git clone https://github.com/caverac/insurance-fraud-detection.git
cd insurance-fraud-detection

# Install all dependencies (Python + Node.js + pre-commit hooks)
make install
```
Or manually:
Code Style
Formatting
We use Black for formatting and isort for import sorting. Apply both with `uv run black packages/ && uv run isort packages/`.
Linting
We use Pylint, Flake8, and pydocstyle (NumPy convention):
```bash
# Run all linters
make lint

# Or manually:
uv run flake8 packages/
uv run pylint packages/
uv run pydocstyle packages/
```
Type Hints
All code should include type hints.
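Here is a signature annotated in the expected style. `flag_large_claims` is a hypothetical helper used only for illustration. It is not part of the package.

```python
from collections.abc import Sequence


def flag_large_claims(amounts: Sequence[float], threshold: float) -> list[bool]:
    """Flag each claim amount that is strictly above ``threshold``."""
    # Every parameter and the return value are annotated; Sequence accepts
    # both lists and tuples of amounts.
    return [amount > threshold for amount in amounts]


print(flag_large_claims([120.0, 9800.0, 5000.0], 5000.0))  # [False, True, False]
```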
Run type checking with `uv run mypy packages/`.
Docstrings
Use NumPy-style docstrings:
```python
from pyspark.sql import DataFrame


def calculate_fraud_score(
    claims: DataFrame,
    weights: dict[str, float],
) -> DataFrame:
    """
    Calculate composite fraud score for each claim.

    Parameters
    ----------
    claims : DataFrame
        DataFrame with fraud flags from detection methods.
    weights : dict[str, float]
        Dictionary mapping flag categories to weights.

    Returns
    -------
    DataFrame
        DataFrame with fraud_score column added.

    Raises
    ------
    ValueError
        If required columns are missing.

    Examples
    --------
    >>> scores = calculate_fraud_score(claims, {"rules": 0.3})
    >>> scores.select("fraud_score").show()
    """
    ...  # implementation omitted
```
Testing
Running Tests
```bash
# Run all tests
make test
# Or: uv run pytest

# Run specific package tests
uv run pytest packages/fraud_detection/tests/

# Run specific test
uv run pytest packages/fraud_detection/tests/test_outliers.py::TestOutlierDetector::test_zscore_outliers_detected
```
Coverage Requirements
The project requires 100% code coverage. Tests will fail if coverage drops below this threshold.
Writing Tests
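Follow these conventions: group a class's tests in a `Test<ClassName>` class, create shared objects in fixtures, and give every test type hints and a docstring: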
```python
import pytest
from pyspark.sql import DataFrame, SparkSession

# OutlierDetector and DetectionConfig come from the fraud_detection package;
# the `spark` and `sample_claims` fixtures are expected to come from conftest.py.


class TestOutlierDetector:
    """Tests for OutlierDetector."""

    @pytest.fixture
    def detector(self, spark: SparkSession) -> OutlierDetector:
        """Create detector instance."""
        return OutlierDetector(spark, DetectionConfig())

    def test_detects_high_outliers(
        self,
        detector: OutlierDetector,
        sample_claims: DataFrame,
    ) -> None:
        """Test that high outliers are detected."""
        result = detector.detect_zscore_outliers(
            sample_claims, "charge_amount", "is_outlier"
        )
        outliers = result.filter(result.is_outlier).count()
        assert outliers > 0
```
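The check in `test_detects_high_outliers` can be sketched in plain Python to help when designing fixture data. The sketch makes two assumptions: a standard z-score rule (flag values more than `threshold` population standard deviations from the mean) and a threshold of 3.0. `zscore_outliers` is hypothetical, and `OutlierDetector` may use a different threshold, standard deviation, or grouping.

```python
from statistics import fmean, pstdev


def zscore_outliers(values: list[float], threshold: float = 3.0) -> list[bool]:
    """Flag values whose absolute z-score exceeds ``threshold``."""
    mean = fmean(values)
    std = pstdev(values, mu=mean)  # population standard deviation
    if std == 0:
        # All values identical: nothing can be an outlier.
        return [False] * len(values)
    return [abs(value - mean) / std > threshold for value in values]


# 19 routine charges and one extreme one, mirroring the test's expectation.
charges = [100.0] * 19 + [10_000.0]
flags = zscore_outliers(charges)
assert sum(flags) > 0  # same style of check as `outliers > 0` in the test
print(flags.index(True))  # 19
```

Under these assumptions, a single extreme value among n points reaches a z-score of at most √(n − 1). A fixture therefore needs at least 11 rows before any value can exceed 3.0; with 10 rows the extreme value lands exactly on 3.0 and is not flagged.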
Pull Request Process
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Run the test suite
- Update documentation if needed
- Submit a pull request
Branch Naming
- `feature/description` - New features
- `fix/description` - Bug fixes
- `docs/description` - Documentation changes
- `refactor/description` - Code refactoring
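Below is a hypothetical helper, not part of the repository, that checks a branch name against these prefixes (for example, from a local Git hook). The list above fixes only the prefix. The lowercase, hyphen-separated description is an assumption.

```python
import re

# Prefixes from the branch-naming list; the description format
# (lowercase words joined by hyphens) is an assumption.
BRANCH_PATTERN = re.compile(r"(feature|fix|docs|refactor)/[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_branch_name(name: str) -> bool:
    """Return True if ``name`` follows the ``<type>/<description>`` convention."""
    return BRANCH_PATTERN.fullmatch(name) is not None


print(is_valid_branch_name("feature/benford-analysis"))  # True
print(is_valid_branch_name("my-branch"))  # False
```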
Commit Messages
Use conventional commits:
```text
feat: add Benford's Law analysis
fix: correct Z-score calculation for grouped data
docs: update configuration guide
test: add tests for duplicate detection
refactor: simplify outlier detection interface
```
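A commit header can be checked against the Conventional Commits shape `<type>[(scope)][!]: <description>`. The sketch below accepts only the types used in the examples above. That restriction is an assumption, since the specification permits other types. `is_conventional` is a hypothetical helper.

```python
import re

# Types from the examples above; the Conventional Commits spec also allows
# an optional "(scope)" and a "!" marking a breaking change.
HEADER_PATTERN = re.compile(r"(feat|fix|docs|test|refactor)(\([\w-]+\))?!?: \S.*")


def is_conventional(message: str) -> bool:
    """Check the first line of a commit message against the convention."""
    header = message.splitlines()[0] if message else ""
    return HEADER_PATTERN.fullmatch(header) is not None


print(is_conventional("feat: add Benford's Law analysis"))  # True
print(is_conventional("fix(outliers): correct Z-score calculation"))  # True
print(is_conventional("Updated stuff"))  # False
```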
PR Checklist
- [ ] Tests pass (`uv run pytest`)
- [ ] 100% code coverage maintained
- [ ] Code is formatted (`uv run black packages/ && uv run isort packages/`)
- [ ] Linting passes (`make lint`)
- [ ] Type hints added (`uv run mypy packages/`)
- [ ] Docstrings updated (NumPy style)
- [ ] Documentation updated if needed
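The command-based items on this checklist can be run in one go before opening a PR. The sketch below is hypothetical and not a script shipped with the repository. It runs the checklist commands in order from the repository root and stops at the first failure. It relies on the statement above that the pytest run enforces the coverage threshold.

```python
import subprocess

# Commands copied from the checklist above, in the order they are run.
CHECKS: list[list[str]] = [
    ["uv", "run", "black", "packages/"],
    ["uv", "run", "isort", "packages/"],
    ["make", "lint"],
    ["uv", "run", "mypy", "packages/"],
    ["uv", "run", "pytest"],
]


def run_checks(dry_run: bool = False) -> list[str]:
    """Run the checklist commands in order, stopping at the first failure.

    With ``dry_run=True`` nothing is executed; the commands are only returned.
    """
    commands = [" ".join(command) for command in CHECKS]
    if not dry_run:
        for command in CHECKS:
            # check=True raises CalledProcessError on a non-zero exit status,
            # so later checks are skipped once one fails.
            subprocess.run(command, check=True)
    return commands


print("\n".join(run_checks(dry_run=True)))
```

Call `run_checks()` without arguments to actually execute the commands.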
Adding New Detection Methods
1. Create the Module
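```python
# packages/fraud_detection/src/fraud_detection/rules/<module_name>.py
from pyspark.sql import DataFrame, SparkSession

# Also import DetectionConfig from the fraud_detection package.


class MyCustomRules:
    """Rule-based checks for my custom fraud pattern."""

    def __init__(self, spark: SparkSession, config: DetectionConfig) -> None:
        self.spark = spark
        self.config = config

    def check_my_pattern(self, claims: DataFrame) -> DataFrame:
        """Check for my custom pattern."""
        # Implementation: add flag column(s) to `claims` and return the result
        return claims
```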
2. Add Tests
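```python
# packages/fraud_detection/tests/test_<module_name>.py
class TestMyCustomRules:
    """Tests for MyCustomRules."""

    def test_detects_pattern(self, spark, sample_claims) -> None:
        """Test that the custom pattern is detected."""
        rules = MyCustomRules(spark, DetectionConfig())
        result = rules.check_my_pattern(sample_claims)
        # Assertions on the flag column(s) your rule adds
```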
3. Integrate with Detector
```python
# In FraudDetector.__init__
self.my_rules = MyCustomRules(spark, config)

# In FraudDetector._apply_rules
claims = self.my_rules.check_my_pattern(claims)
```
4. Update Documentation
Add documentation for the new method in the appropriate guide.
Release Process
- Update version in `pyproject.toml`
- Update `CHANGELOG.md`
- Create release PR
- After merge, tag the release
- CI/CD deploys automatically