Usage
Run from the `src/tc_disagreement` directory.
# Full pipeline: mine → generate → filter → evaluate
uv run pytifex
# Target a specific number of disagreements
uv run pytifex --num-examples 10
# Use a more capable model
uv run pytifex --model gemini-2.5-pro
# Skip GitHub seed fetching
uv run pytifex --no-github
# Verbose output
uv run pytifex -v
Commands
| Command | Description |
|---|---|
| `uv run pytifex` | Full pipeline (generate + evaluate) |
| `uv run pytifex generate` | Generate disagreements only |
| `uv run pytifex check` | Run type checkers on existing examples |
| `uv run pytifex eval` | Evaluate existing results |
Options
| Option | Default | Description |
|---|---|---|
| `--num-examples N` | 5 | Target number of disagreements to find |
| `--batch-size N` | 15 | Examples to generate per LLM batch |
| `--max-attempts N` | 5 | Maximum generation attempts |
| `--max-refinements N` | 2 | Refinement attempts per non-divergent example |
| `--model MODEL` | gemini-2.5-flash | Gemini model to use |
| `--eval-method METHOD` | comprehensive | Evaluation method |
| `--no-github` | - | Skip fetching seeds from GitHub issues |
| `-v, --verbose` | - | Show all examples, not just disagreements |
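As a rough guide to how the generation options interact, the sketch below estimates an upper bound on the number of examples requested from the model. It assumes that each attempt requests one batch of `--batch-size` examples, which is not documented behavior. The sample output shows that the parsed count can differ from the batch size, so treat the figure as approximate. The variable names are illustrative only.

```shell
# Assumption: one batch of --batch-size examples is requested per attempt.
batch_size=15     # default for --batch-size
max_attempts=5    # default for --max-attempts
budget=$((batch_size * max_attempts))
echo "approximate generation budget: $budget examples"

# Dry run: print a command that raises the budget; drop 'echo' to run it.
pytifex_args="--num-examples 10 --batch-size 20 --max-attempts 8"
echo "uv run pytifex $pytifex_args"
```

If a run ends before reaching the `--num-examples` target, raising either factor of this product is the first thing to try.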
Example Output
============================================================
PYTIFEX - Full Pipeline (with disagreement filtering)
============================================================
[STEP 1/2] Generating examples with disagreement filtering...
Target: 2 disagreement examples
Using model: gemini-2.5-flash
[STEP 0] Fetching seed examples from GitHub issues...
Found 5 code examples from python/mypy
Found 5 code examples from astral-sh/ty
Total: 10 examples from GitHub issues
[Attempt 1/5] Generating batch of 15...
Using 5 GitHub issue seeds
Parsed 19 examples, running type checkers...
✓ generic-typevar-bound: DISAGREEMENT {'mypy': 'ok', 'pyrefly': 'ok', 'zuban': 'error', 'ty': 'ok'}
✓ self-in-protocol: DISAGREEMENT {'mypy': 'error', 'pyrefly': 'error', 'zuban': 'error', 'ty': 'ok'}
Progress: 2/2 disagreements found
GENERATION COMPLETE: 2 disagreements from 19 total examples
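If the output is captured to a file or variable, the disagreement lines can be pulled out with standard tools. The following is a minimal sketch that assumes the log lines look exactly like the sample above. The format is not a documented interface and may change, and the `log`, `count`, and `names` variables are illustrative.

```shell
# Sample lines copied from the output above; in practice, read a saved log.
log=$(cat <<'EOF'
  ✓ generic-typevar-bound: DISAGREEMENT {'mypy': 'ok', 'pyrefly': 'ok', 'zuban': 'error', 'ty': 'ok'}
  ✓ self-in-protocol: DISAGREEMENT {'mypy': 'error', 'pyrefly': 'error', 'zuban': 'error', 'ty': 'ok'}
Progress: 2/2 disagreements found
EOF
)

# Count the lines that report a disagreement.
count=$(printf '%s\n' "$log" | grep -c ': DISAGREEMENT ')

# Extract the example name that precedes ': DISAGREEMENT'.
names=$(printf '%s\n' "$log" | sed -n 's/^[^a-z]*\([a-z0-9-]*\): DISAGREEMENT .*/\1/p')

echo "$count disagreement(s):"
echo "$names"
```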
Troubleshooting
No disagreements found — Increase --max-attempts or --batch-size, or try --model gemini-2.5-pro.
GitHub rate limit errors — Set GITHUB_TOKEN, or use --no-github.
Type checker not found — Use uv run (auto-installs), or manually pip install mypy pyrefly zuban ty.
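The last two checks can be sketched as a small preflight script. The helper names `check_tools` and `github_flag` are hypothetical and not part of pytifex, and the script assumes that `GITHUB_TOKEN` is read from the environment. A checker reported as missing is only absent from `PATH`; `uv run` may still provide it.

```shell
# Report which of the type checkers named above are on PATH.
check_tools() {
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found"
    else
      echo "$tool: missing"
    fi
  done
}

# Suggest --no-github when GITHUB_TOKEN is unset or empty.
github_flag() {
  if [ -n "${GITHUB_TOKEN:-}" ]; then
    echo ""
  else
    echo "--no-github"
  fi
}

check_tools mypy pyrefly zuban ty
echo "suggested command: uv run pytifex $(github_flag)"
```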