Self-Improving Agentic Dev Gym
Train smaller language models from your Claude Code execution traces. Build your own personalized coding assistant with the Ouroboros Flywheel.
10 Core Capabilities
Everything you need to capture traces, train models, and deploy your own coding assistant.
Trace Capture
Intercept Claude Code tool calls via hooks. Capture prompts, tool outputs, and reasoning traces automatically.
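As a rough illustration of what a capture hook could look like, the sketch below appends one tool-call event to a JSON Lines file. It assumes the hook command receives each event as a JSON object on standard input, and that the object carries fields such as `session_id`, `tool_name`, `tool_input`, and `tool_response`. Those field names, the `TRACE_FILE` path, and the function names are assumptions for this example, not BashGym's actual implementation.

```python
import json
import sys
import time
from pathlib import Path

# Hypothetical output location; the real storage layout is not documented here.
TRACE_FILE = Path.home() / ".bashgym_example" / "traces.jsonl"


def build_record(event: dict) -> dict:
    """Keep only the fields a trainer would need from one hook event."""
    return {
        "ts": time.time(),
        "session_id": event.get("session_id"),      # assumed field name
        "tool_name": event.get("tool_name"),        # assumed field name
        "tool_input": event.get("tool_input"),      # assumed field name
        "tool_response": event.get("tool_response"),  # assumed field name
    }


def append_trace(event: dict, path: Path = TRACE_FILE) -> dict:
    """Append one event to a JSON Lines file and return the stored record."""
    record = build_record(event)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record


if __name__ == "__main__":
    raw = sys.stdin.read()
    if raw.strip():  # ignore empty input so the script is safe to run by hand
        append_trace(json.loads(raw))
```

One JSON object per line keeps the log append-only and easy to stream into the later scoring and training stages.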
Quality Framework
Multi-judge scoring with syntax, semantic, and execution validators. Only high-quality traces become training data.
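A minimal sketch of how multi-judge scoring might be combined, assuming each judge returns a score in [0, 1] and the final score is a weighted mean compared against a threshold. The judge names, weights, and threshold below are illustrative assumptions. A semantic judge is omitted because it would typically need a model call.

```python
import ast
from typing import Callable, Dict

Judge = Callable[[str], float]


def syntax_judge(code: str) -> float:
    """1.0 if the snippet parses as Python, else 0.0."""
    try:
        ast.parse(code)
        return 1.0
    except SyntaxError:
        return 0.0


def execution_judge(code: str) -> float:
    """1.0 if the snippet runs without raising.

    Illustration only: real traces would need a sandbox, never a bare exec().
    """
    try:
        exec(compile(code, "<trace>", "exec"), {})
        return 1.0
    except Exception:
        return 0.0


def score_trace(code: str, judges: Dict[str, Judge], weights: Dict[str, float]) -> float:
    """Weighted mean of all judge scores."""
    total = sum(weights[name] for name in judges)
    return sum(weights[name] * judge(code) for name, judge in judges.items()) / total


# Hypothetical configuration.
JUDGES = {"syntax": syntax_judge, "execution": execution_judge}
WEIGHTS = {"syntax": 0.4, "execution": 0.6}
THRESHOLD = 0.8


def accept(code: str) -> bool:
    """Only traces at or above the threshold become training data."""
    return score_trace(code, JUDGES, WEIGHTS) >= THRESHOLD
```

With these weights, a snippet that parses but fails at runtime scores 0.4 and is rejected, while one that parses and runs scores 1.0 and is kept.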
Privacy by Design
PII detection, secret scrubbing, and path anonymization. Your code stays private throughout the pipeline.
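The scrubbing step could be approximated with a few regular expressions, as in the sketch below. The patterns and placeholder strings are assumptions chosen for illustration. A production scrubber would need far broader coverage (more secret formats, more kinds of PII, Windows paths).

```python
import re

# Email addresses (a deliberately simple pattern).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# NAME=value assignments whose name suggests a credential.
SECRET_RE = re.compile(
    r"\b([A-Z0-9_]*(?:API_KEY|SECRET|TOKEN|PASSWORD)[A-Z0-9_]*)\s*=\s*\S+",
    re.IGNORECASE,
)

# The user name segment of Linux and macOS home directories.
HOME_RE = re.compile(r"(/home/|/Users/)[^/\s]+")


def scrub(text: str) -> str:
    """Redact secrets first, then emails, then anonymize home paths."""
    text = SECRET_RE.sub(r"\1=<REDACTED>", text)
    text = EMAIL_RE.sub("<EMAIL>", text)
    text = HOME_RE.sub(r"\1<USER>", text)
    return text
```

For example, `scrub("export OPENAI_API_KEY=sk-abc123 in /home/alice/proj by alice@example.com")` returns `"export OPENAI_API_KEY=<REDACTED> in /home/<USER>/proj by <EMAIL>"`.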
Training Strategies
SFT, DPO, and progressive distillation pipelines. Choose the right strategy for your model and data.
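To make the difference between the strategies concrete: SFT trains on accepted traces directly, while DPO trains on preference pairs. The sketch below computes the standard DPO loss for a single pair from sequence log-probabilities. How pairs are built here is an assumption (for instance, a high-scoring and a low-scoring trace for the same prompt), and `beta=0.1` is only a commonly used default.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one preference pair: -log(sigmoid(beta * margin)).

    margin is how much more the policy prefers the chosen response over the
    rejected one, relative to the frozen reference model.
    """
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    z = -beta * margin
    # -log(sigmoid(x)) equals softplus(-x); this form avoids overflow.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))
```

When the policy and the reference agree, the margin is zero and the loss is log 2. The loss falls as the policy shifts probability toward the chosen response.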
Model Registry
Version, tag, and manage trained model artifacts. Track lineage from trace to deployed checkpoint.
Progressive Routing
Confidence-based routing between your local model and Claude. Your model handles what it knows; Claude handles the rest.
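Confidence-based routing can be sketched as a threshold check, assuming the local model returns an answer together with a confidence estimate in [0, 1]. The `Router` class, its default threshold, and the callables passed to it are hypothetical names for this example.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Router:
    # Returns (answer, confidence in [0, 1]); assumed interface.
    local_model: Callable[[str], Tuple[str, float]]
    # Called when the local model is not confident enough.
    fallback: Callable[[str], str]
    threshold: float = 0.75
    local_hits: int = 0
    fallbacks: int = 0

    def route(self, prompt: str) -> Tuple[str, str]:
        """Return (answer, source), where source is 'local' or 'claude'."""
        answer, confidence = self.local_model(prompt)
        if confidence >= self.threshold:
            self.local_hits += 1
            return answer, "local"
        self.fallbacks += 1
        return self.fallback(prompt), "claude"

    @property
    def fallback_rate(self) -> float:
        """Share of requests that were handed to the fallback model."""
        total = self.local_hits + self.fallbacks
        return self.fallbacks / total if total else 0.0
```

Tracking the fallback rate alongside the threshold gives a simple signal for when the local model is ready to take on more traffic.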
Real-Time Dashboard
Monitor trace collection, training progress, model performance, and routing decisions in a live dashboard.
Multi-Cloud
Train on Lambda Labs, RunPod, Vast.ai, or your own GPUs. Cloud-agnostic infrastructure provisioning.
Benchmarks
SWE-bench, HumanEval, and custom project-specific benchmarks. Measure real improvement on real tasks.
Safety Guardrails
Harmful content filtering, bias detection, and output validation. Safe models from safe data.
The Ouroboros Flywheel
A self-reinforcing loop: use Claude, capture traces, train your model, deploy it, repeat.
- ACT: Use Claude Code normally
- VERIFY: Judge trace quality
- SYNTHESIZE: Build training data
- TRAIN: Fine-tune your model
- DEPLOY: Route to your model
- REPEAT: Continuously improve
Live Training Monitor
Watch your model improve in real time. Track trace collection, training epochs, loss curves, and deployment status from a single dashboard.
- Trace collection stats and quality scores
- Training progress with loss and metric curves
- Model registry with version comparison
- Routing confidence and fallback rates
- Benchmark results across model versions
Three Steps to Your Own Model
Install Hooks
Install BashGym hooks into Claude Code. Traces are captured automatically as you work.
Use Claude Code Normally
Keep coding as usual. BashGym silently captures, scores, and curates high-quality training data.
Train Your Model
Launch training with one command. BashGym handles data prep, fine-tuning, evaluation, and deployment.
8-Layer Architecture
A modular system from trace capture to API serving.
Start Training Your Own Model
Capture traces today, train your model tomorrow. The flywheel starts with one command.
View on GitHub