Pytifex — Automated Differential Testing for Python Type Checkers
Pytifex automatically discovers disagreements between Python type checkers by mining real bugs from type checker repositories, generating targeted test cases with an LLM, and establishing ground truth through multi-tiered evaluation.
How It Works
Mine bugs from GitHub issues (mypy, ty, ...) → Generate code variations via Gemini LLM → Run 4 type checkers on each example → Evaluate which checker is correct
- Mine — Fetch real bug reports (false positives, false negatives) from mypy, pyrefly, ty, and pyright GitHub repositories
- Mutate — Use the bugs as seeds for Gemini to generate new code targeting similar edge cases
- Test — Run mypy, pyrefly, zuban, and ty on each generated example; keep only disagreements
- Evaluate — Determine which checker is correct using runtime crash detection, Hypothesis testing, PEP spec matching, and AST analysis
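The "keep only disagreements" filter in the Test step can be pictured with a minimal sketch. It assumes each checker's output has already been reduced to a single pass/fail verdict per example. The names `CheckerResult`, `is_disagreement`, and `keep_disagreements` are hypothetical and chosen for illustration; they are not taken from the Pytifex codebase, which may compare checker output at a finer granularity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckerResult:
    """Hypothetical record of one checker's verdict on one example."""

    checker: str      # checker name, e.g. "mypy" or "ty"
    has_errors: bool  # True if the checker reported at least one error


def is_disagreement(results: list[CheckerResult]) -> bool:
    """Return True when the checkers do not all reach the same verdict."""
    verdicts = {result.has_errors for result in results}
    return len(verdicts) > 1


def keep_disagreements(
    examples: dict[str, list[CheckerResult]],
) -> dict[str, list[CheckerResult]]:
    """Drop every example on which all checkers agree."""
    return {
        name: results
        for name, results in examples.items()
        if is_disagreement(results)
    }


# Assumed sample data: two generated examples, four checker verdicts each.
examples = {
    "example_a": [
        CheckerResult("mypy", False),
        CheckerResult("pyrefly", False),
        CheckerResult("zuban", False),
        CheckerResult("ty", False),
    ],
    "example_b": [
        CheckerResult("mypy", True),
        CheckerResult("pyrefly", False),
        CheckerResult("zuban", True),
        CheckerResult("ty", False),
    ],
}

kept = keep_disagreements(examples)
```

In this sketch only `example_b` survives the filter, because its verdicts are split; the Evaluate step would then decide which side of the split is correct.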
Type Checkers Tested
| Checker | Version |
|---|---|
| mypy | 1.19.0 |
| pyrefly | 0.44.2 |
| zuban | 0.3.0 |
| ty | 0.0.1-alpha.32 |
Quick Start
pip install pytifex
export GEMINI_API_KEY=your_key
uv run pytifex
Note: Pytifex is a research tool developed for a senior comprehensive project. It implements a bug-seeded mutation methodology for proactively finding type checker bugs before users encounter them.
Documentation
- Getting Started — Installation, setup, and first run
- Architecture — System design, pipeline flow, and file structure
- Bug Mining & Mutation — How seeds are mined from GitHub and mutated into test cases
- Evaluation System — Multi-tiered evaluation: runtime, Hypothesis, PEP specs, static analysis
- Type Checkers — How checkers are invoked, output parsed, and how to add new ones
- Divergence Patterns — The 10 built-in edge-case patterns and how to add more
