Applied AI Is Not a Web Service: Why AI Projects Fail After Deployment

Why AI projects fail after deployment: treating AI like a web service. What changes in architecture, monitoring, ownership, and delivery.

The most common mistake in AI projects? Treating AI like a web service — a stateless endpoint you call and forget. This mental model works for CRUD APIs. It fails catastrophically for AI systems.

Applied AI requires a fundamentally different approach: treating AI as a living system with its own lifecycle, dependencies, and failure modes. The teams that understand this ship systems that work. The teams that don’t ship demos that degrade. See concrete examples in MTRobot and Steve Trading Bot.

If you want the broad operational answer to why machine learning models degrade in production, start there. This article explains why the web-service mental model causes those failures in the first place.

[Figure: Data → Train → Serve → Monitor, a continuous feedback loop]
AI lifecycle: data, training, serving, and monitoring feed back into each other. Deployment is not the end.

The web service mental model

Traditional web services are relatively simple from an operational standpoint:

  • Stateless: each request is independent
  • Deterministic: the same input always produces the same output
  • Stable: behavior only changes when you deploy new code
  • Fails loud: 500 errors, exceptions, stack traces

You build it, deploy it, monitor response codes and response time, and move on. The SLA is uptime and latency. Done.


Why AI breaks every assumption of the web service model

AI systems violate all four assumptions above — and the violations are not edge cases. They are fundamental properties of how machine learning works.

| | Web service | AI system |
| --- | --- | --- |
| State | Stateless — each request independent | Stateful — depends on data + features |
| Determinism | Deterministic — same in, same out | Non-deterministic — same in, different out |
| Stability | Stable — only changes on deploy | Drifts — degrades without touching code |
| SLA | Uptime + latency | Accuracy + data freshness + outcomes |
| Failure mode | Fails loud: 500 errors, exceptions | Fails silent: confident wrong answers |
Every assumption that works for web services breaks for AI systems.

1. State is everywhere

AI systems depend on:

  • Model weights and versions
  • Training data and its lineage
  • Feature store state and freshness
  • Upstream data sources and their schemas

A “stateless” inference endpoint actually depends on gigabytes of hidden state. When any of that state changes — upstream data source, feature store freshness, model version — the output changes.

Real example: At MTRobot, the trading platform maintains persistent MT5 terminal connections, per-user execution contexts, and real-time position state. None of this is stateless — and treating it as such during design leads to race conditions and stale execution.

2. Non-determinism is the norm

Even with identical input:

  • GPU floating-point operations are not bit-exact across hardware, drivers, or batch sizes
  • Sampling-based models (including LLMs at nonzero temperature) draw from probability distributions
  • Serving infrastructure may route the same request to different model versions

3. Silent degradation — the dangerous one

Web services fail loudly: 500 errors, timeouts, stack traces. AI systems fail quietly:

  • The endpoint returns 200 OK with confident but wrong predictions
  • Accuracy degrades gradually as input distributions drift away from the training data
  • No exception is thrown and no alert fires; the infrastructure dashboard stays green

Real example: At Steve Trading Bot, market regime shifts can invalidate a model’s signals without producing any system error. The model continues generating predictions with the same confidence — but the underlying market structure has changed. Without explicit regime detection and monitoring, you don’t know until you’ve taken losses.

This is why model skewing detection is a core MLOps concern, not an optional monitoring add-on. It is one concrete slice of the broader problem of why models fail after deployment.
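A skew check can start with a simple statistical comparison between training-time and serving-time feature values. A minimal illustrative sketch (the function name and threshold are assumptions, not a real API) flags a feature whose serving mean sits far outside its training distribution:

```python
import math

def feature_skew(train_values, serve_values, z_threshold=3.0):
    """Flag a feature whose serving mean drifts far from its training mean.

    A crude z-score check: how many training standard deviations does the
    serving mean sit from the training mean? Real systems compare full
    distributions, but even this catches gross training-serving skew.
    """
    n = len(train_values)
    train_mean = sum(train_values) / n
    train_std = math.sqrt(sum((v - train_mean) ** 2 for v in train_values) / n)
    serve_mean = sum(serve_values) / len(serve_values)
    if train_std == 0:
        return serve_mean != train_mean
    z = abs(serve_mean - train_mean) / train_std
    return z > z_threshold

# Training saw values around 0; serving suddenly sees values around 50.
train = [0.1, -0.2, 0.05, 0.3, -0.1, 0.0, 0.2, -0.15]
skewed = [50.1, 49.8, 50.3, 50.0]
print(feature_skew(train, skewed))  # True: gross skew flagged
print(feature_skew(train, train))   # False: identical data, no flag
```

The point of running this per feature is that skew is usually introduced one feature at a time, by one preprocessing "quick fix" at a time.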

4. Novel failure modes with no web service equivalent

AI introduces failure categories that simply don't exist for traditional services:

  • Data drift: production input distributions shift away from the training data
  • Concept drift: the relationship between inputs and outcomes changes
  • Training-serving skew: the model sees different features at inference than in training
  • Feedback loops: the model's own predictions influence the data it later trains on

See why ML models degrade in production for a comprehensive breakdown of each.


The hidden costs of treating AI as a web service

When teams apply the web service mental model to AI, they pay these costs downstream:

Technical debt compounds faster. Hardcoded preprocessing in serving code diverges from training code. Schema changes in upstream data sources aren’t communicated as breaking changes. Every “quick fix” in the serving layer creates another potential training-serving skew source.

Retraining becomes expensive and risky. Without a clear data versioning strategy, retraining requires reconstructing the exact training dataset — which may no longer be possible if data sources have changed. Teams that don’t version training data eventually can’t reproduce past model behavior.
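A lightweight way to get reproducible dataset versions is content hashing: the same rows always produce the same version id, so each trained model can record exactly which data it saw. A sketch (names are illustrative, not a real tool):

```python
import hashlib
import json

def dataset_fingerprint(rows):
    """Content hash of a training dataset: same rows -> same version id.

    Storing this alongside each trained model lets you later verify
    whether the exact training data can still be reconstructed.
    """
    canonical = json.dumps(rows, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()[:12]

v1 = dataset_fingerprint([{"user": 1, "label": 0}, {"user": 2, "label": 1}])
v2 = dataset_fingerprint([{"user": 1, "label": 0}, {"user": 2, "label": 1}])
v3 = dataset_fingerprint([{"user": 1, "label": 0}, {"user": 2, "label": 0}])

print(v1 == v2)  # True: identical data yields a reproducible version id
print(v1 == v3)  # False: a single changed label is a new dataset version
```

Real systems snapshot the data itself as well (hashing only detects change, it doesn't restore it), but even a fingerprint turns "we think the data changed" into a checkable fact.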

Monitoring overhead grows without structure. Adding monitoring reactively — after each incident — produces a pile of ad-hoc alerts with no coherent model health picture. Proactive monitoring architecture costs 1x to build. Reactive incident-driven monitoring costs 5-10x over time.

Rollback is harder than a web service. Rolling back a web service means deploying the previous Docker image. Rolling back an ML model means reverting model weights AND ensuring the data pipeline produces the same feature values the old model expects. Without versioned feature stores, this is painful.
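One way to make rollback tractable is to pin every dependency in a release record, so "revert the model" is a checkable operation rather than a hope. A hypothetical sketch, not a real library:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelRelease:
    """A deployable release pins everything the model needs to reproduce
    its behavior, not just the weights."""
    model_version: str
    feature_schema_version: str  # feature store schema the model expects
    training_data_hash: str      # fingerprint of the training dataset
    preprocessing_version: str   # code that must match training-time code

def can_rollback(target: ModelRelease, available_feature_schemas: set) -> bool:
    # Reverting weights alone is not enough: the old feature schema
    # must still be servable by the feature pipeline.
    return target.feature_schema_version in available_feature_schemas

old = ModelRelease("m-41", "fs-7", "a1b2c3", "prep-3")

print(can_rollback(old, {"fs-7", "fs-8"}))  # True: old schema still served
print(can_rollback(old, {"fs-8"}))          # False: old schema is gone
```

The design choice here is that a "model version" is a bundle, not a weights file; rollback then fails fast at release time instead of silently at inference time.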


The system mindset shift

To build AI that works in production, shift from “endpoint” to “system” thinking.

Treat data as infrastructure

Data is not input — it's infrastructure:

  • Version it: training datasets need reproducible snapshots, like code releases
  • Contract it: upstream sources should treat schema changes as breaking changes
  • Monitor it: track freshness, completeness, and distribution at every ingestion point

Practical example from AgrigateVision: camera hardware is a data dependency. When the vendor pushed a firmware update that changed image preprocessing, the model’s input distribution shifted without any code change on our end. A contract with the hardware vendor and automated input monitoring would have caught it before it caused a pipeline failure.

Design for observability from day one

You need visibility into four layers:

  1. Input distributions: Are production inputs similar to training?
  2. Prediction distributions: Is the model behaving normally?
  3. Outcome tracking: Are predictions actually correct? (Requires ground truth labels)
  4. Pipeline health: Is data flowing with expected freshness and completeness?
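Layer 1 can be implemented with a standard drift statistic such as the Population Stability Index (PSI). A minimal pure-Python sketch; the bin count and the alert threshold are conventional rules of thumb, not universal constants:

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference (training) sample
    and a live (production) sample.

    Rule of thumb: < 0.1 stable, 0.1-0.25 worth watching, > 0.25 shifted.
    """
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0

    def proportions(values):
        counts = [0] * bins
        for v in values:
            i = min(int((v - lo) / width), bins - 1)
            counts[i] += 1
        # A small epsilon avoids log(0) for empty bins.
        return [(c + 1e-6) / (len(values) + 1e-6 * bins) for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

reference = [i / 100 for i in range(100)]      # training-time input sample
drifted = [i / 100 + 5 for i in range(100)]    # production inputs, shifted

print(psi(reference, reference) == 0.0)  # True: identical distributions
print(psi(reference, drifted) > 0.25)    # True: alert-worthy shift
```

Layers 2-4 reuse the same pattern on different streams: run the statistic on prediction outputs, on joined outcomes, and on pipeline arrival counts.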

Plan for the full lifecycle

An AI system is never “done”:

| Phase | Activities |
| --- | --- |
| Development | Training, evaluation, iteration |
| Deployment | Serving, scaling, integration |
| Monitoring | Drift detection, alerting, segment analysis |
| Maintenance | Retraining, updating, deprecating |

The maintenance phase has no defined end. This is the most important difference from a web service — budgeting, staffing, and architecture must account for it.
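In practice the open-ended maintenance phase reduces to a retraining policy: defined triggers instead of ad-hoc reactions. A hedged sketch with purely illustrative thresholds:

```python
def should_retrain(drift_score: float, accuracy: float, days_since_train: int,
                   drift_limit: float = 0.25, accuracy_floor: float = 0.85,
                   max_age_days: int = 90) -> bool:
    """Retrain on drift, on accuracy loss, or simply on age, whichever
    fires first. The thresholds here are illustrative, not universal."""
    return (drift_score > drift_limit
            or accuracy < accuracy_floor
            or days_since_train > max_age_days)

print(should_retrain(0.05, 0.91, 30))   # False: healthy model
print(should_retrain(0.40, 0.91, 30))   # True: input drift
print(should_retrain(0.05, 0.80, 30))   # True: accuracy below floor
print(should_retrain(0.05, 0.91, 120))  # True: stale regardless of metrics
```

The age-based trigger is the one teams most often skip, and it is what turns retraining into a budgeted routine rather than an emergency.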

Build feedback loops

Production data is your best training signal:

  • Log predictions with enough context to join them to eventual ground truth
  • Capture outcomes and corrections as labels for the next retraining run
  • Close the loop: retraining on recent production data is how the system improves instead of decays
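A minimal version of such a loop joins each logged prediction to its delayed ground-truth label; all names here are hypothetical:

```python
import datetime

predictions = {}  # prediction_id -> (features, predicted_label, logged_at)

def log_prediction(pred_id, features, predicted_label):
    predictions[pred_id] = (features, predicted_label,
                            datetime.datetime.now(datetime.timezone.utc))

def log_outcome(pred_id, true_label):
    """Join delayed ground truth back to the original prediction,
    yielding a fresh labeled example for the next retraining run."""
    features, predicted, _ = predictions[pred_id]
    return {"features": features, "label": true_label,
            "was_correct": predicted == true_label}

log_prediction("p1", {"amount": 120.0}, predicted_label=1)
example = log_outcome("p1", true_label=0)
print(example["was_correct"])  # False: a miss worth retraining on
```

The key design point is the stable `pred_id`: without it, outcomes arriving hours or weeks later can never be matched back to what the model actually saw.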


Domain-specific implications

The “AI is not a web service” principle manifests differently by domain:

Computer Vision

CV systems need:

  • Input distribution monitoring: lighting, resolution, and preprocessing shifts change model behavior
  • Hardware treated as a data dependency: camera firmware and vendor updates can silently alter inputs
  • Contracts with hardware vendors covering any change that affects captured images

Our approach is detailed in Computer Vision in Applied AI.

Trading Systems

Trading bots need:

  • Persistent connection, execution-context, and position state management: nothing about execution is stateless
  • Market regime detection: signals can be invalidated without any system error
  • Outcome-based monitoring: realized results, not just uptime

See Trading Systems & Platforms for our approach.

LLM Applications

LLM systems need:

  • Prompt and model-version tracking: provider-side model updates change outputs without any deploy on your side
  • Output quality evaluation: confident, fluent, wrong answers are the canonical silent failure
  • Per-request cost and latency budgets, since token usage varies with input


What changes in practice

Instead of: “We’ll build an API endpoint that returns predictions”

Think: “We’ll build a system that ingests data, trains models, serves predictions, monitors outcomes, and continuously improves”

Instead of: “The model is deployed, we’re done”

Think: “The model is deployed. Now we need to monitor, maintain, and iterate — indefinitely”

Instead of: “Our SLA is 99.9% uptime and sub-200ms response time”

Think: “Our SLA includes prediction accuracy, data freshness, segment-level performance, and business outcome targets”
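Such an SLA can be encoded as a health check that covers model-level metrics alongside infrastructure ones. The metric names and target values below are illustrative only:

```python
def sla_healthy(metrics: dict) -> bool:
    """An AI SLA covers model health, not just infrastructure health.
    The targets below are illustrative, not universal."""
    targets = {
        "uptime": lambda v: v >= 0.999,
        "p99_latency_ms": lambda v: v <= 200,
        "accuracy": lambda v: v >= 0.90,            # outcome quality
        "data_freshness_hours": lambda v: v <= 24,  # feature pipeline lag
    }
    return all(check(metrics[name]) for name, check in targets.items())

healthy = {"uptime": 0.9995, "p99_latency_ms": 150,
           "accuracy": 0.93, "data_freshness_hours": 6}
stale = dict(healthy, data_freshness_hours=48)  # infra fine, data stale

print(sla_healthy(healthy))  # True
print(sla_healthy(stale))    # False: uptime and latency alone miss this
```

The second case is exactly the silent-failure scenario: every web-service metric is green while the model is quietly serving predictions from stale features.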

Key takeaways
  • AI is a system, not an endpoint. State, non-determinism, and silent degradation are fundamental properties.
  • Silent failures are worse than loud failures. Monitor predictions and outcomes, not just infrastructure.
  • Data is infrastructure. Version it, contract it, monitor it — with the same rigor as code.
  • Deployment is the beginning, not the end. Budget for monitoring, retraining, and maintenance — they never stop.

Frequently asked questions

Why do AI projects fail after deployment? The most common reasons: no monitoring for data drift or model degradation (so failures go undetected for weeks), training-serving skew (model sees different features in production than in training), no ownership clarity (nobody knows who to page when accuracy drops), and treating retraining as a one-time event. The underlying cause is applying web service assumptions to a system that violates all of them.

What is the difference between an AI system and a web service? A web service is stateless, deterministic, and stable — it only changes when you deploy new code. An AI system is stateful (depends on training data and feature stores), non-deterministic (same input can produce different outputs), and degrades silently over time as input distributions shift. These differences require a fundamentally different operational approach.

What is production ML monitoring? Production ML monitoring is the practice of tracking model health across four layers: input distributions (are inputs similar to training?), prediction distributions (is the model behaving normally?), outcome tracking (are predictions correct?), and business outcomes (are the metrics that matter improving?). Infrastructure monitoring (CPU, latency) is necessary but not sufficient.

What is training-serving skew in machine learning? Training-serving skew is when the model sees different data at inference time than it did during training — due to different preprocessing code, different feature versions, or different null handling. It’s one of the most common causes of production ML failures and is often invisible until you compare training feature distributions to live feature distributions.

How is AI delivery different from traditional software delivery? Traditional software delivery has a defined end state — you ship a version, it either works or it doesn’t. AI delivery is continuous: models degrade, data drifts, and retraining is an ongoing operational requirement. Success criteria include not just initial accuracy but sustained accuracy over time. Teams need data engineering, ML engineering, and platform capabilities working in coordination.


Ready to build production AI systems?

We help teams ship AI that works in the real world. Let's discuss your project.
