The most common mistake in AI projects? Treating AI like a web service — a stateless endpoint you call and forget. This mental model works for CRUD APIs. It fails catastrophically for AI systems.
Applied AI requires a fundamentally different approach: treating AI as a living system with its own lifecycle, dependencies, and failure modes. Teams that understand this ship systems that work; teams that don't end up shipping demos that degrade. See concrete examples in MTRobot and Steve Trading Bot.
If you want the broad operational answer to why machine learning models degrade in production, start there. This article explains why the web-service mental model causes those failures in the first place.
The web service mental model
Traditional web services are relatively simple from an operational standpoint:
- Stateless: Each request is independent
- Deterministic: Same input produces same output
- Stable: Behavior doesn’t change without deploys
- Well-understood: Debugging follows familiar patterns
You build it, deploy it, monitor response codes and response time, and move on. SLA is uptime and latency. Done.
Why AI breaks every assumption of the web service model
AI systems violate all four assumptions above — and the violations are not edge cases. They are fundamental properties of how machine learning works.
1. State is everywhere
AI systems depend on:
- Training data (historical state)
- Feature stores (current state at inference time)
- Model weights (learned state)
- Context windows (session state for LLMs)
A “stateless” inference endpoint actually depends on gigabytes of hidden state. When any of that state changes — upstream data source, feature store freshness, model version — the output changes.
Real example: At MTRobot, the trading platform maintains persistent MT5 terminal connections, per-user execution contexts, and real-time position state. None of this is stateless — and treating it as such during design leads to race conditions and stale execution.
2. Non-determinism is the norm
Even with identical input:
- LLMs produce different outputs based on temperature sampling
- Models trained on different random seeds behave differently on edge cases
- Feature freshness affects predictions — a feature value from 5 minutes ago vs. 5 seconds ago may produce different scores
- A/B test routing creates divergent behavior paths
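The first point is easy to demonstrate. A minimal sketch of temperature sampling, using illustrative logits (this is a toy stand-in for a real LLM decoder, not any particular library's API): identical input, yet repeated calls disagree, while near-zero temperature collapses to the argmax.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng):
    """Softmax-sample a token index from raw logits at a given temperature."""
    scaled = [l / temperature for l in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]  # subtract max for stability
    total = sum(exps)
    weights = [e / total for e in exps]
    return rng.choices(range(len(logits)), weights=weights)[0]

logits = [2.0, 1.0, 0.5]  # the same "input" on every call
rng = random.Random(42)

# Temperature 1.0: repeated calls on identical input disagree with each other.
warm = [sample_with_temperature(logits, 1.0, rng) for _ in range(50)]
# Temperature near zero: every call collapses to the argmax (index 0).
cold = [sample_with_temperature(logits, 0.01, rng) for _ in range(50)]
```

Even this toy shows why "same input, same output" is not a safe assumption for any component that samples.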
3. Silent degradation — the dangerous one
Web services fail loudly: 500 errors, timeouts, stack traces. AI systems fail quietly:
- Accuracy drops 10% over 3 months — no error thrown
- Edge cases get worse while aggregate metrics look stable
- Confidence scores drift upward while outcomes worsen
Real example: At Steve Trading Bot, market regime shifts can invalidate a model’s signals without producing any system error. The model continues generating predictions with the same confidence — but the underlying market structure has changed. Without explicit regime detection and monitoring, you don’t know until you’ve taken losses.
This is why model skew detection is a core MLOps concern, not an optional monitoring add-on. It is one concrete slice of the broader problem of why models fail after deployment.
4. Novel failure modes with no web service equivalent
AI introduces failure categories that simply don’t exist for traditional services:
- Data drift: Input distribution shifts away from training distribution
- Concept drift: The relationship between inputs and outputs changes
- Training-serving skew: Model sees different features in production than in training
- Adversarial inputs: Crafted inputs that exploit model blind spots
- Hallucination: LLMs confidently generating factually wrong content
See why ML models degrade in production for a comprehensive breakdown of each.
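Data drift, the first failure mode above, is detectable with a plain two-sample test. A minimal sketch using a hand-rolled Kolmogorov-Smirnov statistic — the Gaussian data and the alert threshold are illustrative, not calibrated for a real system:

```python
import bisect
import random

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

rng = random.Random(0)
training = [rng.gauss(0.0, 1.0) for _ in range(500)]    # training-time feature
live_ok = [rng.gauss(0.0, 1.0) for _ in range(500)]     # production, no drift
live_drift = [rng.gauss(2.0, 1.0) for _ in range(500)]  # production, mean shifted

# Illustrative fixed threshold; a real system would use a calibrated
# critical value or a rolling baseline.
DRIFT_THRESHOLD = 0.25
```

In practice you would run this per feature on a schedule and alert when the statistic crosses the threshold — the point is that nothing in the serving path ever throws an error when drift happens; you have to go looking for it.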
The hidden costs of treating AI as a web service
When teams apply the web service mental model to AI, they pay these costs downstream:
Technical debt compounds faster. Hardcoded preprocessing in serving code diverges from training code. Schema changes in upstream data sources aren't communicated as breaking changes. Every "quick fix" in the serving layer creates another potential source of training-serving skew.
Retraining becomes expensive and risky. Without a clear data versioning strategy, retraining requires reconstructing the exact training dataset — which may no longer be possible if data sources have changed. Teams that don’t version training data eventually can’t reproduce past model behavior.
Monitoring overhead grows without structure. Adding monitoring reactively — after each incident — produces a pile of ad-hoc alerts with no coherent model health picture. Proactive monitoring architecture costs 1x to build. Reactive incident-driven monitoring costs 5-10x over time.
Rollback is harder than a web service. Rolling back a web service means deploying the previous Docker image. Rolling back an ML model means reverting model weights AND ensuring the data pipeline produces the same feature values the old model expects. Without versioned feature stores, this is painful.
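One way to make rollback tractable is to pin every model release to the feature schema it was trained against, and refuse a rollback unless the pipeline still serves matching features. A minimal sketch — the manifest fields, version strings, and paths are all illustrative:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelRelease:
    model_version: str
    weights_uri: str             # e.g. an object-store path to the weights
    feature_schema_version: str  # schema the model was trained against

def can_roll_back(target: ModelRelease, serving_schema_version: str) -> bool:
    """Reverting weights is only safe if the feature pipeline still
    produces the schema the older model expects."""
    return target.feature_schema_version == serving_schema_version

# Hypothetical releases: v2 was trained after a feature schema migration.
v1 = ModelRelease("v1", "s3://models/v1.bin", "features-2024-01")
v2 = ModelRelease("v2", "s3://models/v2.bin", "features-2024-06")
```

With a versioned feature store, the check above is one lookup; without it, "which features did v1 expect?" becomes an archaeology project.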
The system mindset shift
To build AI that works in production, shift from “endpoint” to “system” thinking.
Treat data as infrastructure
Data is not input — it’s infrastructure:
- Version your datasets like code (DVC, Delta Lake, or even S3 + manifest files)
- Monitor data quality with automated schema and distribution tests at ingestion
- Track lineage from source to prediction
- Build explicit contracts with upstream data providers — they are dependencies
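The "automated schema tests at ingestion" point can be as simple as a gate that rejects a batch violating the declared contract. A minimal sketch — the contract shape (column name mapped to type and optional bounds) and the field names are illustrative:

```python
def validate_batch(rows, schema):
    """Reject an ingestion batch that violates the declared data contract.

    `schema` maps column name -> (expected type, (min, max) bounds or None).
    Returns a list of human-readable violations; an empty list means pass.
    """
    errors = []
    for i, row in enumerate(rows):
        for col, (col_type, bounds) in schema.items():
            if col not in row:
                errors.append(f"row {i}: missing column '{col}'")
                continue
            value = row[col]
            if not isinstance(value, col_type):
                errors.append(f"row {i}: '{col}' has type {type(value).__name__}")
            elif bounds is not None and not (bounds[0] <= value <= bounds[1]):
                errors.append(f"row {i}: '{col}'={value} outside {bounds}")
    return errors

# Hypothetical contract for a price feed.
SCHEMA = {"price": (float, (0.0, 1e6)), "symbol": (str, None)}
good = [{"price": 101.5, "symbol": "EURUSD"}]
bad = [{"price": -3.0, "symbol": "EURUSD"}, {"symbol": "GBPUSD"}]
```

The same gate is where distribution tests (like the KS check) belong: ingestion is the last point where you can refuse bad data instead of training or serving on it.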
Practical example from AgrigateVision: camera hardware is a data dependency. When the vendor pushed a firmware update that changed image preprocessing, the model’s input distribution shifted without any code change on our end. A contract with the hardware vendor and automated input monitoring would have caught it before it caused a pipeline failure.
Design for observability from day one
You need visibility into four layers:
- Input distributions: Are production inputs similar to training?
- Prediction distributions: Is the model behaving normally?
- Outcome tracking: Are predictions actually correct? (Requires ground truth labels)
- Pipeline health: Is data flowing with expected freshness and completeness?
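The four layers above can be condensed into a single periodic health snapshot. A minimal sketch, assuming illustrative thresholds and report fields (outcome labels often arrive late, so the accuracy slot may be empty):

```python
import statistics
import time

def health_snapshot(inputs, train_mean, train_stdev,
                    predictions, outcomes, last_event_ts,
                    now=None, max_lag_s=300):
    """One report covering all four monitoring layers.

    `outcomes` is a list of (prediction, ground_truth) pairs, possibly empty
    because labels lag. Thresholds and field names are illustrative.
    """
    now = time.time() if now is None else now
    input_z = abs(statistics.mean(inputs) - train_mean) / train_stdev
    report = {
        "input_drift_z": input_z,                        # layer 1: inputs
        "prediction_mean": statistics.mean(predictions), # layer 2: predictions
        "accuracy": None,                                # layer 3: outcomes
        "fresh": (now - last_event_ts) <= max_lag_s,     # layer 4: pipeline
    }
    if outcomes:
        correct = sum(1 for p, y in outcomes if p == y)
        report["accuracy"] = correct / len(outcomes)
    return report
```

Infrastructure dashboards answer "is the service up?"; a snapshot like this answers "is the model still right?" — and only the second question catches silent degradation.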
Plan for the full lifecycle
An AI system is never “done”:
| Phase | Activities |
|---|---|
| Development | Training, evaluation, iteration |
| Deployment | Serving, scaling, integration |
| Monitoring | Drift detection, alerting, segment analysis |
| Maintenance | Retraining, updating, deprecating |
The maintenance phase has no defined end. This is the most important difference from a web service — budgeting, staffing, and architecture must account for it.
Build feedback loops
Production data is your best training signal:
- Log model inputs and outputs
- Collect outcome labels when possible
- Build annotation pipelines for edge cases
- Use production data to drive retraining cadence
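The first and third bullets compose naturally: log every inference event, then mine the log for low-confidence cases worth annotating. A minimal sketch using JSON lines against an in-memory stream — the event schema and the confidence cutoff are illustrative:

```python
import io
import json
import time

def log_prediction(stream, features, prediction, confidence, ts=None):
    """Append one inference event as a JSON line (schema is illustrative)."""
    stream.write(json.dumps({
        "ts": time.time() if ts is None else ts,
        "features": features,
        "prediction": prediction,
        "confidence": confidence,
    }) + "\n")

def edge_cases(stream, max_confidence=0.6):
    """Pull low-confidence events back out as annotation candidates."""
    stream.seek(0)
    events = [json.loads(line) for line in stream]
    return [e for e in events if e["confidence"] < max_confidence]

log = io.StringIO()  # stands in for a real log sink (file, queue, table)
log_prediction(log, {"x": 1.0}, "buy", 0.92, ts=1.0)
log_prediction(log, {"x": 7.5}, "sell", 0.41, ts=2.0)
```

The same log doubles as the raw material for retraining: once labels arrive, the logged features and outcomes are exactly the rows your next training set needs.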
Domain-specific implications
The “AI is not a web service” principle manifests differently by domain:
Computer Vision
CV systems need:
- Input health monitoring (image quality metrics, exposure, noise)
- Drift detection for visual changes (seasonal, environmental, hardware)
- Edge device considerations — response time and connectivity constraints change the architecture
Our approach is detailed in Computer Vision in Applied AI.
Trading Systems
Trading bots need:
- Real-time risk controls that operate independently of ML predictions
- Reproducible backtests for auditability
- Market regime detection — the market changes and the model must know when it’s operating outside training conditions
- Execution quality monitoring (slippage, fill rates, latency)
See Trading Systems & Platforms for our approach.
LLM Applications
LLM systems need:
- Retrieval quality monitoring (RAG performance)
- Cost tracking per request (token usage)
- Safety guardrails (content filtering, prompt injection protection)
- User feedback loops
What changes in practice
Instead of: “We’ll build an API endpoint that returns predictions”
Think: “We’ll build a system that ingests data, trains models, serves predictions, monitors outcomes, and continuously improves”
Instead of: “The model is deployed, we’re done”
Think: “The model is deployed. Now we need to monitor, maintain, and iterate — indefinitely”
Instead of: “Our SLA is 99.9% uptime and sub-200ms response time”
Think: “Our SLA includes prediction accuracy, data freshness, segment-level performance, and business outcome targets”
- AI is a system, not an endpoint. State, non-determinism, and silent degradation are fundamental properties.
- Silent failures are worse than loud failures. Monitor predictions and outcomes, not just infrastructure.
- Data is infrastructure. Version it, contract it, monitor it — with the same rigor as code.
- Deployment is the beginning, not the end. Budget for monitoring, retraining, and maintenance — they never stop.
Frequently asked questions
Why do AI projects fail after deployment? The most common reasons: no monitoring for data drift or model degradation (so failures go undetected for weeks), training-serving skew (model sees different features in production than in training), no ownership clarity (nobody knows who to page when accuracy drops), and treating retraining as a one-time event. The underlying cause is applying web service assumptions to a system that violates all of them.
What is the difference between an AI system and a web service? A web service is stateless, deterministic, and stable — it only changes when you deploy new code. An AI system is stateful (depends on training data and feature stores), non-deterministic (same input can produce different outputs), and degrades silently over time as input distributions shift. These differences require a fundamentally different operational approach.
What is production ML monitoring? Production ML monitoring is the practice of tracking model health across four layers: input distributions (are inputs similar to training?), prediction distributions (is the model behaving normally?), outcome tracking (are predictions correct?), and business outcomes (are the metrics that matter improving?). Infrastructure monitoring (CPU, latency) is necessary but not sufficient.
What is training-serving skew in machine learning? Training-serving skew is when the model sees different data at inference time than it did during training — due to different preprocessing code, different feature versions, or different null handling. It’s one of the most common causes of production ML failures and is often invisible until you compare training feature distributions to live feature distributions.
How is AI delivery different from traditional software delivery? Traditional software delivery has a defined end state — you ship a version, it either works or it doesn’t. AI delivery is continuous: models degrade, data drifts, and retraining is an ongoing operational requirement. Success criteria include not just initial accuracy but sustained accuracy over time. Teams need data engineering, ML engineering, and platform capabilities working in coordination.