Applied AI Is Not a Web Service
The most common mistake in AI projects? Treating AI like a web service — a stateless endpoint you call and forget. This mental model works for CRUD APIs. It fails catastrophically for AI systems.
Applied AI requires a fundamentally different approach: treating AI as a living system with its own lifecycle, dependencies, and failure modes.
The web service mental model
Traditional web services are relatively simple:
- Stateless: Each request is independent
- Deterministic: Same input produces same output
- Stable: Behavior doesn’t change without deploys
- Well-understood: Debugging follows familiar patterns
You build it, deploy it, monitor response codes and latency, and move on.
Why AI breaks this model
AI systems violate every assumption above:
1. State is everywhere
AI systems depend on:
- Training data (historical state)
- Feature stores (current state)
- Model weights (learned state)
- Context windows (session state)
A “stateless” inference endpoint actually depends on gigabytes of hidden state.
2. Non-determinism is the norm
Even with the same input:
- LLMs sample tokens stochastically, so any nonzero temperature yields different outputs for the same prompt
- Models trained on different random seeds behave differently
- Feature freshness affects predictions
- A/B test routing creates divergent paths
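To make the temperature point concrete, here is a minimal pure-Python sketch of softmax sampling (names and logits are illustrative, not any particular LLM's API): greedy decoding is deterministic, while sampling at a high temperature returns different answers for the very same input.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng):
    """Sample an index from logits after temperature scaling.

    temperature <= 0 means greedy decoding (deterministic);
    higher temperatures flatten the distribution (more variance).
    """
    if temperature <= 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r <= cum:
            return i
    return len(probs) - 1

logits = [2.0, 1.0, 0.5]
# Greedy: same input, same output, every time.
greedy = {sample_with_temperature(logits, 0, random.Random(s)) for s in range(10)}
# High temperature: same input, several distinct outputs across seeds.
sampled = {sample_with_temperature(logits, 2.0, random.Random(s)) for s in range(10)}
```

The same mechanism explains the seed point above: change the random state and the "same" model call lands on a different token.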
3. Silent degradation
Web services fail loudly — 500 errors, timeouts, exceptions. AI systems fail quietly:
- Accuracy drops 10% over 3 months
- Edge cases get worse while averages stay stable
- Confidence scores drift without outcome tracking
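Catching this kind of quiet erosion requires tracking outcomes, not HTTP status codes. A minimal sketch (class and threshold names are hypothetical) of a rolling-window accuracy monitor that fires even while the endpoint happily returns 200s:

```python
from collections import deque

class RollingAccuracyMonitor:
    """Track accuracy over a sliding window of labeled outcomes.

    The serving endpoint keeps returning HTTP 200 either way; this
    catches the quiet failure mode where correctness erodes.
    """
    def __init__(self, window=100, alert_threshold=0.85):
        self.outcomes = deque(maxlen=window)   # old results age out automatically
        self.alert_threshold = alert_threshold

    def record(self, predicted, actual):
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def degraded(self):
        acc = self.accuracy()
        return acc is not None and acc < self.alert_threshold

monitor = RollingAccuracyMonitor(window=50, alert_threshold=0.9)
for _ in range(50):
    monitor.record("cat", "cat")   # healthy period
for _ in range(10):
    monitor.record("cat", "dog")   # quiet drift begins
```

In a real system the window would be time-based and the alert would page someone, but the shape is the same: predictions joined with outcomes, continuously.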
4. Novel failure modes
AI introduces failure categories that don’t exist for traditional services:
- Data drift
- Concept drift
- Training-serving skew
- Adversarial inputs
- Hallucination (LLMs)
See Production ML Failure Modes for a comprehensive breakdown.
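Data drift, the first failure mode above, is commonly quantified with the Population Stability Index over binned feature histograms. A self-contained sketch (the thresholds are a widely used rule of thumb, not a formal standard):

```python
import math

def psi(expected_fracs, actual_fracs, eps=1e-6):
    """Population Stability Index between two binned distributions.

    Common rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift,
    > 0.25 significant drift. eps guards against empty bins.
    """
    total = 0.0
    for e, a in zip(expected_fracs, actual_fracs):
        e = max(e, eps)
        a = max(a, eps)
        total += (a - e) * math.log(a / e)
    return total

train_dist = [0.25, 0.25, 0.25, 0.25]   # feature histogram at training time
prod_same  = [0.26, 0.24, 0.25, 0.25]   # production traffic, no drift
prod_drift = [0.55, 0.25, 0.15, 0.05]   # production traffic, shifted
```

Running the same comparison per feature on a schedule is the difference between noticing drift in a week and discovering it in a quarterly accuracy review.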
The system mindset shift
To build AI that works, shift from “endpoint” to “system” thinking:
Treat data as a first-class citizen
Data is not input — it’s infrastructure:
- Version your datasets like code
- Monitor data quality with automated tests
- Track lineage from source to prediction
- Build contracts with upstream data providers
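The first two practices above can be sketched in a few lines: pin the exact dataset a training run saw with a content hash (the way a git SHA pins code), and gate ingestion with automated quality checks. Function names and thresholds here are illustrative:

```python
import csv
import hashlib
import io

def dataset_fingerprint(csv_text):
    """Content hash so a training run can record exactly which data it saw."""
    return hashlib.sha256(csv_text.encode("utf-8")).hexdigest()[:12]

def quality_checks(csv_text, required_columns, max_null_frac=0.05):
    """Minimal automated data-quality gate: schema + null-rate checks."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    problems = []
    if not rows:
        return ["dataset is empty"]
    missing = set(required_columns) - set(rows[0].keys())
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
    for col in set(required_columns) & set(rows[0].keys()):
        nulls = sum(1 for r in rows if not r[col])
        if nulls / len(rows) > max_null_frac:
            problems.append(f"column {col!r} null rate too high")
    return problems

data = "user_id,amount\n1,9.99\n2,\n3,4.50\n"
version = dataset_fingerprint(data)          # store alongside the model artifact
issues = quality_checks(data, ["user_id", "amount"], max_null_frac=0.5)
```

Dedicated tools (DVC, Great Expectations, and the like) do this at scale, but the principle fits in a screenful: data gets versions and tests, just like code.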
Design for observability from day one
You need visibility into:
- Input distributions (are production inputs similar to training?)
- Prediction distributions (is the model behaving normally?)
- Outcome tracking (are predictions actually correct?)
- Pipeline health (is data flowing as expected?)
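All four signals fall out of one habit: log every prediction with enough context to answer those questions later. A minimal sketch (the in-memory list stands in for a real log sink; names are hypothetical):

```python
import statistics
import time

class PredictionLogger:
    """Log every prediction with enough context to answer, later:
    did inputs shift? did outputs shift? were we actually right?"""
    def __init__(self, model_version):
        self.model_version = model_version
        self.records = []   # stand-in for a real log sink / warehouse table

    def log(self, request_id, features, prediction):
        self.records.append({
            "ts": time.time(),
            "model_version": self.model_version,
            "request_id": request_id,
            "features": features,       # enables input-distribution monitoring
            "prediction": prediction,   # enables prediction-distribution monitoring
            "outcome": None,            # filled in when ground truth arrives
        })

    def attach_outcome(self, request_id, outcome):
        for rec in self.records:
            if rec["request_id"] == request_id:
                rec["outcome"] = outcome

    def prediction_stats(self):
        preds = [r["prediction"] for r in self.records]
        return {"mean": statistics.mean(preds), "n": len(preds)}

logger = PredictionLogger(model_version="fraud-v3")
logger.log("req-1", {"amount": 120.0}, prediction=0.91)
logger.log("req-2", {"amount": 8.5}, prediction=0.07)
logger.attach_outcome("req-1", outcome=1)   # ground truth arrives later
```

The `model_version` field matters more than it looks: without it, you cannot tell a model regression from a data shift after the fact.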
Plan for the full lifecycle
An AI system is never “done”:
| Phase | Activities |
|---|---|
| Development | Training, evaluation, iteration |
| Deployment | Serving, scaling, integration |
| Monitoring | Drift detection, alerting |
| Maintenance | Retraining, updating, deprecating |
Build feedback loops
Production data is your best training signal:
- Log predictions alongside inputs
- Collect outcome labels when possible
- Build annotation pipelines for edge cases
- Continuously improve through iteration
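One common shape for such a loop, sketched with illustrative names: auto-accept confident predictions, route low-confidence edge cases to an annotation queue, and feed the resulting labels into the next training run.

```python
def route_for_annotation(predictions, confidence_threshold=0.7):
    """Split production traffic into auto-accepted predictions and
    low-confidence edge cases queued for human labeling.

    The labeled queue becomes training data for the next iteration.
    """
    accepted, annotation_queue = [], []
    for item in predictions:
        if item["confidence"] >= confidence_threshold:
            accepted.append(item)
        else:
            annotation_queue.append(item)
    return accepted, annotation_queue

batch = [
    {"id": "a", "label": "spam", "confidence": 0.98},
    {"id": "b", "label": "ham",  "confidence": 0.51},   # edge case
    {"id": "c", "label": "spam", "confidence": 0.88},
]
accepted, queue = route_for_annotation(batch)
```

The threshold itself becomes a tuning knob: lower it and annotators drown; raise it and edge cases slip through unlabeled.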
Domain-specific implications
The “AI is not a web service” principle manifests differently by domain:
Computer Vision
CV systems need:
- Input health monitoring (image quality, lighting, camera calibration)
- Drift detection for visual changes (seasonal, environmental)
- Edge device considerations (latency, connectivity)
Our approach is detailed in Computer Vision in Applied AI.
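Input health monitoring can start very simply: reject frames that are too dark or too flat to trust before they reach the model. A pure-Python sketch on raw grayscale values (thresholds are illustrative and would be tuned per camera):

```python
def input_health(pixels, dark_threshold=30, flat_threshold=10):
    """Flag frames too dark or too low-contrast to trust, before
    they reach the model. pixels: 2-D grid of grayscale values 0-255."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    variance = sum((p - mean) ** 2 for p in flat) / len(flat)
    std = variance ** 0.5
    issues = []
    if mean < dark_threshold:
        issues.append("underexposed")
    if std < flat_threshold:
        issues.append("low_contrast")
    return issues

good_frame = [[10, 200], [180, 40]]   # wide dynamic range
dark_frame = [[3, 5], [2, 4]]         # e.g. lens cap on, lighting failed
```

A model fed the dark frame would still emit a prediction with a confidence score; the health check is what turns that silent garbage-in into a loud, actionable alert.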
Trading Systems
Trading bots need:
- Real-time risk controls (not just predictions)
- Reproducible backtests for auditability
- Market regime detection (the market changes constantly)
- Execution quality monitoring (slippage, fill rates)
See Trading Systems & Platforms for our approach.
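Execution quality monitoring, the last item above, is concrete enough to sketch directly: measure slippage in basis points per fill and track fill rate across orders (class and field names are illustrative):

```python
def slippage_bps(expected_price, fill_price, side):
    """Execution slippage in basis points, signed so positive = cost.

    A buy that fills above the expected price costs you; a sell that
    fills below it costs you.
    """
    diff = fill_price - expected_price if side == "buy" else expected_price - fill_price
    return 10_000 * diff / expected_price

class ExecutionMonitor:
    """Track fill rate and average slippage across orders."""
    def __init__(self):
        self.fills = []
        self.orders = 0

    def record(self, expected_price, fill_price, side):
        self.orders += 1
        if fill_price is not None:   # None = order never filled
            self.fills.append(slippage_bps(expected_price, fill_price, side))

    def fill_rate(self):
        return len(self.fills) / self.orders if self.orders else None

    def avg_slippage_bps(self):
        return sum(self.fills) / len(self.fills) if self.fills else None

mon = ExecutionMonitor()
mon.record(100.00, 100.05, "buy")    # paid 5 bps over expected
mon.record(100.00, 99.95, "sell")    # received 5 bps under expected
mon.record(100.00, None, "buy")      # unfilled order
```

A strategy whose backtest ignores these numbers can be profitable on paper and a steady loser in production, which is exactly the silent-degradation pattern again.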
LLM Applications
LLM systems need:
- Retrieval quality monitoring (RAG performance)
- Cost tracking per request (token usage)
- Safety guardrails (content filtering, prompt injection protection)
- User feedback loops (thumbs up/down, corrections)
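Per-request cost tracking is the easiest of these to start with. A sketch under stated assumptions: the 4-characters-per-token heuristic is a rough English-text approximation (a real system would use the provider's tokenizer), and the prices are hypothetical placeholders.

```python
def estimate_tokens(text):
    """Rough heuristic (~4 chars per token for English text); swap in
    the provider's tokenizer for exact counts."""
    return max(1, len(text) // 4)

class CostTracker:
    """Per-request token and dollar accounting for an LLM application."""
    def __init__(self, input_price_per_1k, output_price_per_1k):
        self.input_price = input_price_per_1k
        self.output_price = output_price_per_1k
        self.requests = []

    def record(self, request_id, prompt, completion):
        cost = (estimate_tokens(prompt) / 1000 * self.input_price
                + estimate_tokens(completion) / 1000 * self.output_price)
        self.requests.append({"id": request_id, "usd": cost})
        return cost

    def total_usd(self):
        return sum(r["usd"] for r in self.requests)

# Hypothetical prices; plug in your provider's actual rates.
tracker = CostTracker(input_price_per_1k=0.0005, output_price_per_1k=0.0015)
tracker.record("req-1", "p" * 4000, "c" * 2000)   # ~1000 input, ~500 output tokens
```

Tag each record with a user or feature ID and you can answer the question that eventually arrives in every LLM project: which 5% of traffic is generating 80% of the bill?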
What changes in practice
Adopting the system mindset means:
Instead of: “We’ll build an API endpoint that returns predictions”
Think: “We’ll build a system that ingests data, trains models, serves predictions, monitors outcomes, and continuously improves”
Instead of: “The model is deployed, we’re done”
Think: “The model is deployed, now we need to monitor, maintain, and iterate”
Instead of: “Our SLA is 99.9% uptime and sub-200ms latency”
Think: “Our SLA includes prediction accuracy, data freshness, and business outcome targets”
Key takeaways
- AI is a system, not an endpoint: Dependencies, state, and lifecycle matter
- Silent failures are worse than loud failures: Monitor predictions, not just infrastructure
- Data is infrastructure: Treat it with the same rigor as code
- Plan for continuous improvement: Production data is your best training signal
- Own the full lifecycle: Deployment is the beginning, not the end