Case Notes: Steve Trading Bot
Trading AI operates in a unique environment: milliseconds matter, mistakes cost real money, and the market actively tries to exploit your strategy. Building a trading bot that survives production requires engineering discipline that goes far beyond typical ML deployments.
This post distills lessons from building Steve, an algorithmic trading system. For the full case study, see Steve — Trading Bot.
The trading environment
Trading systems face challenges other AI systems don’t:
- Adversarial environment: Other participants are trying to profit from your patterns
- Non-stationarity: Markets constantly change — yesterday’s edge becomes today’s loss
- Execution risk: The price you see isn’t the price you get
- Regulation: Audit trails, risk controls, and compliance are mandatory
- Real money: Bugs don’t just cause errors — they cause losses
What we learned
Lesson 1: Backtests lie (creatively)
Backtesting is essential but dangerous. Common traps:
- Look-ahead bias: Using information that wasn’t available at decision time
- Survivorship bias: Only testing on assets that still exist
- Overfitting: Strategies that perfectly fit historical noise
- Unrealistic execution: Assuming you can trade any size at any price
We implemented:
- Point-in-time data: Strict separation of what was knowable when
- Out-of-sample testing: Hold out data never seen during development
- Walk-forward analysis: Retrain periodically, test on future data
- Realistic execution simulation: Slippage, partial fills, market impact
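The walk-forward idea above can be sketched as a rolling split: train on a fixed window, test strictly on the data that follows it, then roll forward. This is an illustrative sketch; the function and parameter names (`walk_forward_splits`, `train_days`, `test_days`) are not from the Steve codebase.

```python
def walk_forward_splits(n_days, train_days, test_days):
    """Yield (train, test) index ranges that never overlap and where
    the test window always lies strictly after the training window,
    which rules out look-ahead bias by construction."""
    start = 0
    while start + train_days + test_days <= n_days:
        train = range(start, start + train_days)
        test = range(start + train_days, start + train_days + test_days)
        yield train, test
        start += test_days  # roll forward by one test window

# 500 trading days, 250-day training window, 50-day test window
splits = list(walk_forward_splits(n_days=500, train_days=250, test_days=50))
```

Each retraining step only ever sees data older than its test window, which is the point-in-time discipline from the list above expressed as code.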
Lesson 2: Risk controls are the product
The model predicts opportunities. The risk system keeps you alive.
Pre-trade controls:
- Position size limits (per trade, per asset, total)
- Sector/factor concentration limits
- Volatility-adjusted sizing
- Correlation-aware portfolio constraints
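Volatility-adjusted sizing combined with a hard notional cap, two of the pre-trade controls above, can be sketched as follows. All parameter names and thresholds (`risk_per_trade`, `max_notional_frac`) are illustrative assumptions, not Steve's actual limits.

```python
def position_size(signal, price, daily_vol, equity,
                  risk_per_trade=0.002, max_notional_frac=0.05):
    """Size a position to risk a fixed fraction of equity per trade
    (scaled by volatility), then clamp to a hard per-asset notional
    limit. Returns signed share count."""
    if daily_vol <= 0:
        return 0  # refuse to size when volatility is unknown or degenerate
    # Shares such that a one-daily-vol move costs ~risk_per_trade of equity
    raw_shares = (equity * risk_per_trade) / (daily_vol * price)
    # Hard cap: never exceed max_notional_frac of equity in one asset
    max_shares = (equity * max_notional_frac) / price
    shares = min(raw_shares, max_shares)
    return int(shares) * (1 if signal > 0 else -1)
```

Note the ordering: the risk cap is applied after the volatility-based size, so the hard limit always wins over the model's enthusiasm.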
Intra-trade controls:
- Stop losses (hard and trailing)
- Time-based exits
- Drawdown circuit breakers
- Unusual execution pattern detection
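A trailing stop, one of the intra-trade controls above, is simple enough to show in full. This is a minimal sketch for a long position; the class name and the 2% default trail are illustrative assumptions.

```python
class TrailingStop:
    """Trailing stop for a long position: exit when price falls a fixed
    fraction below the highest price seen since entry."""

    def __init__(self, entry_price, trail_pct=0.02):
        self.high_water = entry_price
        self.trail_pct = trail_pct

    def should_exit(self, price):
        # Ratchet the high-water mark up, never down
        self.high_water = max(self.high_water, price)
        return price <= self.high_water * (1 - self.trail_pct)
```

The key property is that the stop only ever tightens: the high-water mark ratchets upward, so gains are locked in as the position moves favorably.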
Post-trade controls:
- P&L reconciliation
- Execution quality analysis
- Strategy performance attribution
- Anomaly detection on results
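P&amp;L reconciliation from the post-trade list reduces to comparing two views of the same fills. A minimal sketch, assuming fills keyed by order id as `(quantity, price)` pairs; the function name and tolerance are illustrative.

```python
def reconcile(internal_fills, broker_fills, price_tolerance=0.01):
    """Compare internally recorded fills against broker-reported fills
    keyed by order id; return a list of (order_id, reason) breaks."""
    breaks = []
    for oid, (qty, px) in internal_fills.items():
        if oid not in broker_fills:
            breaks.append((oid, "missing at broker"))
            continue
        b_qty, b_px = broker_fills[oid]
        if qty != b_qty or abs(px - b_px) > price_tolerance:
            breaks.append((oid, "qty/price mismatch"))
    for oid in broker_fills:
        if oid not in internal_fills:
            breaks.append((oid, "unknown fill from broker"))
    return breaks
```

An empty result is the expected steady state; any break is an anomaly that feeds the alerting mentioned above.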
Lesson 3: Reproducibility is non-negotiable
For auditing, debugging, and improvement:
- Deterministic backtests: Same inputs → same outputs, always
- Decision logging: Every trade decision with full context
- Version control for everything: Models, data, config, code
- Immutable audit trails: Append-only logs with timestamps
When something goes wrong at 3 AM, you need to reconstruct exactly what happened and why.
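An append-only audit trail can be made tamper-evident by hash-chaining records, so a retroactive edit breaks every subsequent hash. This is a sketch of the idea, not Steve's actual log format; the record layout is an assumption.

```python
import hashlib
import json


def append_decision(log, decision):
    """Append a decision record that commits to the previous record's
    hash, making the log tamper-evident end to end."""
    prev_hash = log[-1]["hash"] if log else "genesis"
    body = {"decision": decision, "prev": prev_hash}
    digest = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    log.append({**body, "hash": digest})
    return log


def verify(log):
    """Recompute every hash and check the chain links; any edited or
    reordered record fails verification."""
    prev = "genesis"
    for rec in log:
        body = {"decision": rec["decision"], "prev": rec["prev"]}
        expected = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if rec["prev"] != prev or rec["hash"] != expected:
            return False
        prev = rec["hash"]
    return True
```

In production the log would go to append-only storage with timestamps; the chain just makes silent mutation detectable during the 3 AM reconstruction.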
Lesson 4: Latency is a first-class feature
In trading, latency isn’t just UX — it’s edge:
| Latency | Impact |
|---|---|
| Under 10ms | Competitive for time-sensitive signals |
| 10-50ms | Acceptable for medium-frequency strategies |
| 50-500ms | Only viable for longer-term signals |
| >500ms | Edge likely arbed away before execution |
We optimized:
- Co-location with exchanges where viable
- Pre-computed decision trees for common scenarios
- Connection pooling and warm connections
- Minimal-copy data paths
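The "pre-computed decision trees" optimization trades model inference on the hot path for an O(1) table lookup: bucket the signal offline, decide online. A minimal sketch; the buckets, thresholds, and action encoding are all illustrative assumptions.

```python
def bucket(signal_strength):
    """Coarse signal bucketing; in production the cut points would be
    derived from the model offline, not hand-picked like these."""
    if signal_strength > 0.5:
        return "strong_buy"
    if signal_strength > 0.1:
        return "weak_buy"
    if signal_strength < -0.5:
        return "strong_sell"
    if signal_strength < -0.1:
        return "weak_sell"
    return "flat"


# Precomputed offline from the full model; the hot path never runs inference
DECISION_TABLE = {
    "strong_buy": +2, "weak_buy": +1, "flat": 0,
    "weak_sell": -1, "strong_sell": -2,
}


def decide(signal_strength):
    return DECISION_TABLE[bucket(signal_strength)]
```

The lookup costs nanoseconds regardless of how expensive the offline model was, which is exactly the property the latency table above demands.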
Lesson 5: Market regimes change everything
A strategy that works in trending markets fails in ranging markets. We built:
- Regime detection: Classifying current market state
- Strategy selection: Different models for different regimes
- Confidence gating: Reducing size when regime is uncertain
- Automatic pause: Stopping trading during regime transitions
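Regime detection can be arbitrarily sophisticated, but the core idea fits in a few lines. Here is a deliberately crude sketch that proxies trending vs. ranging with the ratio of mean return to volatility; the threshold is an uncalibrated illustration, not Steve's classifier.

```python
import statistics


def detect_regime(returns, trend_threshold=0.5):
    """Classify a window of returns as trending or ranging using the
    ratio of mean return to realized volatility (a crude t-stat-like
    proxy; real systems use far richer features)."""
    mean = statistics.fmean(returns)
    vol = statistics.pstdev(returns) or 1e-9  # guard zero volatility
    return "trending" if abs(mean) / vol > trend_threshold else "ranging"
```

Downstream, the strategy-selection and confidence-gating layers consume this label; near the threshold is precisely where size reduction or an automatic pause applies.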
This is where trading systems diverge most from other AI — the underlying distribution changes constantly and adversarially.
Metrics snapshot
Typical performance ranges for production trading systems:
| Metric | Range |
|---|---|
| Decision latency | 10–50ms |
| Backtest reproducibility | 100% deterministic |
| Risk limit breaches | 0 (hard requirement) |
| Execution slippage vs. expected | Under 10bps typical |
| Sharpe ratio (after costs) | 0.5–2.0 depending on strategy |
Technical architecture
Separation of concerns
- Signal generation: Separate service, stateless, versioned
- Portfolio construction: Combines signals with risk constraints
- Execution: Separate system with own failsafes
- Monitoring: Independent watchdog with kill switch authority
No single component can trade without others agreeing.
Failsafe hierarchy
Multiple layers of protection:
- Model-level: Confidence thresholds, position limits
- Service-level: Rate limits, circuit breakers
- System-level: Portfolio-wide risk limits
- Infrastructure-level: Network kill switches, external monitoring
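A system-level drawdown breaker from the hierarchy above can be sketched as a latching state machine: once tripped, it stays tripped until a human resets it. Class and parameter names are illustrative.

```python
class DrawdownBreaker:
    """Portfolio-level circuit breaker: trips when equity falls more
    than max_dd below its peak, then latches off until manual reset."""

    def __init__(self, max_dd=0.05):
        self.peak = None
        self.max_dd = max_dd
        self.tripped = False

    def allow_trading(self, equity):
        # Track the equity high-water mark
        self.peak = equity if self.peak is None else max(self.peak, equity)
        if equity < self.peak * (1 - self.max_dd):
            self.tripped = True  # latch: no auto-recovery
        return not self.tripped

    def reset(self):
        """Deliberate human action required to resume trading."""
        self.tripped = False
        self.peak = None
```

The latch is the important design choice: an automatic un-trip would let a whipsawing market toggle the system back on at the worst possible moment.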
Paper trading → Shadow → Live
Deployment progression:
- Paper trading: Full simulation, no real orders
- Shadow mode: Real signals, compare to live without executing
- Small live: Minimal capital, full monitoring
- Ramp up: Gradual increase with continuous validation
Each stage must pass before proceeding.
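The stage gate can be expressed as a small promotion function: advance only when the current stage's metrics clear the bar. The stage names follow the progression above; the specific thresholds are illustrative assumptions.

```python
STAGES = ["paper", "shadow", "small_live", "ramp_up"]


def next_stage(current, metrics, min_sharpe=1.0, max_slippage_bps=10):
    """Promote to the next deployment stage only if the current stage's
    metrics pass the gate; otherwise stay put."""
    passed = (metrics["sharpe"] >= min_sharpe
              and metrics["slippage_bps"] <= max_slippage_bps)
    if not passed:
        return current
    i = STAGES.index(current)
    return STAGES[min(i + 1, len(STAGES) - 1)]
```

Encoding the gate in code rather than in a runbook means promotion decisions are logged and reproducible like every other decision in the system.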
Key takeaways
- Backtests are hypotheses, not proofs: Live trading will surprise you
- Risk controls > return optimization: Staying in the game matters more than any single trade
- Reproducibility enables everything: Debugging, auditing, improvement
- Latency is alpha: Every millisecond you waste, someone else captures
- Markets are adversarial: Your edge decays as others adapt