Case Notes: Steve Trading Bot
Trading AI operates in a unique environment: milliseconds matter, mistakes cost real money, and the market actively tries to exploit your strategy. Building a trading bot that survives production requires engineering discipline that goes far beyond typical ML deployments.
This post distills lessons from building Steve, an algorithmic trading system. For the full case study, see Steve — Trading Bot.
The trading environment
Trading systems face challenges other AI systems don’t:
- Adversarial environment: Other participants are trying to profit from your patterns
- Non-stationarity: Markets constantly change — yesterday’s edge becomes today’s loss
- Execution risk: The price you see isn’t the price you get
- Regulation: Audit trails, risk controls, and compliance are mandatory
- Real money: Bugs don’t just cause errors — they cause losses
What we learned
Lesson 1: Backtests lie (creatively)
Backtesting is essential but dangerous. Common traps:
- Look-ahead bias: Using information that wasn’t available at decision time
- Survivorship bias: Only testing on assets that still exist
- Overfitting: Strategies that perfectly fit historical noise
- Unrealistic execution: Assuming you can trade any size at any price
We implemented:
- Point-in-time data: Strict separation of what was knowable when
- Out-of-sample testing: Hold out data never seen during development
- Walk-forward analysis: Retrain periodically, test on future data
- Realistic execution simulation: Slippage, partial fills, market impact
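The walk-forward idea above can be sketched as a rolling split: train on a fixed window, test strictly on the data that follows it, then roll forward. This is an illustrative sketch; the function and parameter names (`walk_forward_splits`, `train_days`, `test_days`) are not from the Steve codebase.

```python
def walk_forward_splits(n_days, train_days, test_days):
    """Yield (train, test) index ranges that never overlap and where
    the test window always lies strictly after the training window,
    which rules out look-ahead bias by construction."""
    start = 0
    while start + train_days + test_days <= n_days:
        train = range(start, start + train_days)
        test = range(start + train_days, start + train_days + test_days)
        yield train, test
        start += test_days  # roll forward by one test window

# 500 trading days, 250-day training window, 50-day test window
splits = list(walk_forward_splits(n_days=500, train_days=250, test_days=50))
```

Each retraining step only ever sees data older than its test window, which is the point-in-time discipline from the list above expressed as code.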
Lesson 2: Risk controls are the product
The model predicts opportunities. The risk system keeps you alive.
Pre-trade controls:
- Position size limits (per trade, per asset, total)
- Sector/factor concentration limits
- Volatility-adjusted sizing
- Correlation-aware portfolio constraints
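Volatility-adjusted sizing combined with a hard notional cap, two of the pre-trade controls above, can be sketched as follows. All parameter names and thresholds (`risk_per_trade`, `max_notional_frac`) are illustrative assumptions, not Steve's actual limits.

```python
def position_size(signal, price, daily_vol, equity,
                  risk_per_trade=0.002, max_notional_frac=0.05):
    """Size a position to risk a fixed fraction of equity per trade
    (scaled by volatility), then clamp to a hard per-asset notional
    limit. Returns signed share count."""
    if daily_vol <= 0:
        return 0  # refuse to size when volatility is unknown or degenerate
    # Shares such that a one-daily-vol move costs ~risk_per_trade of equity
    raw_shares = (equity * risk_per_trade) / (daily_vol * price)
    # Hard cap: never exceed max_notional_frac of equity in one asset
    max_shares = (equity * max_notional_frac) / price
    shares = min(raw_shares, max_shares)
    return int(shares) * (1 if signal > 0 else -1)
```

Note the ordering: the risk cap is applied after the volatility-based size, so the hard limit always wins over the model's enthusiasm.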
Intra-trade controls:
- Stop losses (hard and trailing)
- Time-based exits
- Drawdown circuit breakers
- Unusual execution pattern detection
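A trailing stop, one of the intra-trade controls above, is simple enough to show in full. This is a minimal sketch for a long position; the class name and the 2% default trail are illustrative assumptions.

```python
class TrailingStop:
    """Trailing stop for a long position: exit when price falls a fixed
    fraction below the highest price seen since entry."""

    def __init__(self, entry_price, trail_pct=0.02):
        self.high_water = entry_price
        self.trail_pct = trail_pct

    def should_exit(self, price):
        # Ratchet the high-water mark up, never down
        self.high_water = max(self.high_water, price)
        return price <= self.high_water * (1 - self.trail_pct)
```

The key property is that the stop only ever tightens: the high-water mark ratchets upward, so gains are locked in as the position moves favorably.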
Post-trade controls:
- P&L reconciliation
- Execution quality analysis
- Strategy performance attribution
- Anomaly detection on results
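P&amp;L reconciliation from the post-trade list reduces to comparing two views of the same fills. A minimal sketch, assuming fills keyed by order id as `(quantity, price)` pairs; the function name and tolerance are illustrative.

```python
def reconcile(internal_fills, broker_fills, price_tolerance=0.01):
    """Compare internally recorded fills against broker-reported fills
    keyed by order id; return a list of (order_id, reason) breaks."""
    breaks = []
    for oid, (qty, px) in internal_fills.items():
        if oid not in broker_fills:
            breaks.append((oid, "missing at broker"))
            continue
        b_qty, b_px = broker_fills[oid]
        if qty != b_qty or abs(px - b_px) > price_tolerance:
            breaks.append((oid, "qty/price mismatch"))
    for oid in broker_fills:
        if oid not in internal_fills:
            breaks.append((oid, "unknown fill from broker"))
    return breaks
```

An empty result is the expected steady state; any break is an anomaly that feeds the alerting mentioned above.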
Lesson 3: Reproducibility is non-negotiable
For auditing, debugging, and improvement:
- Deterministic backtests: Same inputs → same outputs, always
- Decision logging: Every trade decision with full context
- Version control for everything: Models, data, config, code
- Immutable audit trails: Append-only logs with timestamps
When something goes wrong at 3 AM, you need to reconstruct exactly what happened and why.
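An append-only audit trail can be made tamper-evident by hash-chaining records, so a retroactive edit breaks every subsequent hash. This is a sketch of the idea, not Steve's actual log format; the record layout is an assumption.

```python
import hashlib
import json


def append_decision(log, decision):
    """Append a decision record that commits to the previous record's
    hash, making the log tamper-evident end to end."""
    prev_hash = log[-1]["hash"] if log else "genesis"
    body = {"decision": decision, "prev": prev_hash}
    digest = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    log.append({**body, "hash": digest})
    return log


def verify(log):
    """Recompute every hash and check the chain links; any edited or
    reordered record fails verification."""
    prev = "genesis"
    for rec in log:
        body = {"decision": rec["decision"], "prev": rec["prev"]}
        expected = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if rec["prev"] != prev or rec["hash"] != expected:
            return False
        prev = rec["hash"]
    return True
```

In production the log would go to append-only storage with timestamps; the chain just makes silent mutation detectable during the 3 AM reconstruction.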
Lesson 4: Latency is a first-class feature
In trading, latency isn’t just UX — it’s edge:
| Latency | Impact |
|---|---|
| Under 10ms | Competitive for time-sensitive signals |
| 10-50ms | Acceptable for medium-frequency strategies |
| 50-500ms | Only viable for longer-term signals |
| >500ms | Edge likely arbed away before execution |
We optimized:
- Co-location with exchanges where viable
- Pre-computed decision trees for common scenarios
- Connection pooling and warm connections
- Minimal-copy data paths
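The "pre-computed decision trees" optimization trades model inference on the hot path for an O(1) table lookup: bucket the signal offline, decide online. A minimal sketch; the buckets, thresholds, and action encoding are all illustrative assumptions.

```python
def bucket(signal_strength):
    """Coarse signal bucketing; in production the cut points would be
    derived from the model offline, not hand-picked like these."""
    if signal_strength > 0.5:
        return "strong_buy"
    if signal_strength > 0.1:
        return "weak_buy"
    if signal_strength < -0.5:
        return "strong_sell"
    if signal_strength < -0.1:
        return "weak_sell"
    return "flat"


# Precomputed offline from the full model; the hot path never runs inference
DECISION_TABLE = {
    "strong_buy": +2, "weak_buy": +1, "flat": 0,
    "weak_sell": -1, "strong_sell": -2,
}


def decide(signal_strength):
    return DECISION_TABLE[bucket(signal_strength)]
```

The lookup costs nanoseconds regardless of how expensive the offline model was, which is exactly the property the latency table above demands.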
Lesson 5: Market regimes change everything
A strategy that works in trending markets fails in ranging markets. We built:
- Regime detection: Classifying current market state
- Strategy selection: Different models for different regimes
- Confidence gating: Reducing size when regime is uncertain
- Automatic pause: Stopping trading during regime transitions
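Regime detection can be arbitrarily sophisticated, but the core idea fits in a few lines. Here is a deliberately crude sketch that proxies trending vs. ranging with the ratio of mean return to volatility; the threshold is an uncalibrated illustration, not Steve's classifier.

```python
import statistics


def detect_regime(returns, trend_threshold=0.5):
    """Classify a window of returns as trending or ranging using the
    ratio of mean return to realized volatility (a crude t-stat-like
    proxy; real systems use far richer features)."""
    mean = statistics.fmean(returns)
    vol = statistics.pstdev(returns) or 1e-9  # guard zero volatility
    return "trending" if abs(mean) / vol > trend_threshold else "ranging"
```

Downstream, the strategy-selection and confidence-gating layers consume this label; near the threshold is precisely where size reduction or an automatic pause applies.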
This is where trading systems diverge most from other AI — the underlying distribution changes constantly and adversarially.
Metrics snapshot
Typical performance ranges for production trading systems:
| Metric | Range |
|---|---|
| Decision latency | 10–50ms |
| Backtest reproducibility | 100% deterministic |
| Risk limit breaches | 0 (hard requirement) |
| Execution slippage vs. expected | Under 10bps typical |
| Sharpe ratio (after costs) | 0.5–2.0 depending on strategy |
Technical architecture
Separation of concerns
- Signal generation: Separate service, stateless, versioned
- Portfolio construction: Combines signals with risk constraints
- Execution: Separate system with own failsafes
- Monitoring: Independent watchdog with kill switch authority
No single component can trade without others agreeing.
Failsafe hierarchy
Multiple layers of protection:
- Model-level: Confidence thresholds, position limits
- Service-level: Rate limits, circuit breakers
- System-level: Portfolio-wide risk limits
- Infrastructure-level: Network kill switches, external monitoring
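A system-level drawdown breaker from the hierarchy above can be sketched as a latching state machine: once tripped, it stays tripped until a human resets it. Class and parameter names are illustrative.

```python
class DrawdownBreaker:
    """Portfolio-level circuit breaker: trips when equity falls more
    than max_dd below its peak, then latches off until manual reset."""

    def __init__(self, max_dd=0.05):
        self.peak = None
        self.max_dd = max_dd
        self.tripped = False

    def allow_trading(self, equity):
        # Track the equity high-water mark
        self.peak = equity if self.peak is None else max(self.peak, equity)
        if equity < self.peak * (1 - self.max_dd):
            self.tripped = True  # latch: no auto-recovery
        return not self.tripped

    def reset(self):
        """Deliberate human action required to resume trading."""
        self.tripped = False
        self.peak = None
```

The latch is the important design choice: an automatic un-trip would let a whipsawing market toggle the system back on at the worst possible moment.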
Paper trading → Shadow → Live
Deployment progression:
- Paper trading: Full simulation, no real orders
- Shadow mode: Real signals, compare to live without executing
- Small live: Minimal capital, full monitoring
- Ramp up: Gradual increase with continuous validation
Each stage must pass before proceeding.
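The stage gate can be expressed as a small promotion function: advance only when the current stage's metrics clear the bar. The stage names follow the progression above; the specific thresholds are illustrative assumptions.

```python
STAGES = ["paper", "shadow", "small_live", "ramp_up"]


def next_stage(current, metrics, min_sharpe=1.0, max_slippage_bps=10):
    """Promote to the next deployment stage only if the current stage's
    metrics pass the gate; otherwise stay put."""
    passed = (metrics["sharpe"] >= min_sharpe
              and metrics["slippage_bps"] <= max_slippage_bps)
    if not passed:
        return current
    i = STAGES.index(current)
    return STAGES[min(i + 1, len(STAGES) - 1)]
```

Encoding the gate in code rather than in a runbook means promotion decisions are logged and reproducible like every other decision in the system.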
Key takeaways
- Backtests are hypotheses, not proofs: Live trading will surprise you
- Risk controls > return optimization: Staying in the game matters more than any single trade
- Reproducibility enables everything: Debugging, auditing, improvement
- Latency is alpha: Every millisecond you waste, someone else captures
- Markets are adversarial: Your edge decays as others adapt