Case Notes: MobilEA
Hybrid AI systems — those combining computer vision, optimization algorithms, and real-time decision-making — represent some of the most complex production deployments. They also fail in the most interesting ways.
MobilEA integrated multiple AI components into a unified mobility workflow. This post captures the lessons learned. For the full context, see the MobilEA case study.
The integration challenge
A mobility system is only as reliable as its weakest component. When you combine:
- Computer vision for vehicle/asset detection
- Optimization algorithms for route planning
- Real-time orchestration for live coordination
- User interfaces for operator interaction
…you create a system where failures cascade in unexpected ways.
What we learned
Lesson 1: Interface contracts are non-negotiable
The seams between components are where hybrid systems fail. A CV model that emits bounding boxes implicitly assumes every downstream consumer interprets those boxes the same way. When assumptions differ:
- CV outputs pixel coordinates; optimization expects GPS
- Detection confidence scales differ between models
- Timestamp formats vary between systems
- Missing data handling is inconsistent
We learned to define explicit interface contracts:
- Schemas for every data exchange
- Validation at every boundary
- Clear error handling for malformed data
- Version management for evolving interfaces
This is a recurring Applied AI pattern — systems fail at boundaries.
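A minimal sketch of a boundary contract, assuming a hypothetical `Detection` schema (the field names and ranges are illustrative, not MobilEA's actual schema). The point is that malformed data is rejected at the seam instead of propagating downstream:

```python
from dataclasses import dataclass

# Hypothetical contract for one detection crossing the CV -> optimizer boundary.
@dataclass(frozen=True)
class Detection:
    lat: float          # GPS latitude in degrees (not pixel coordinates)
    lon: float          # GPS longitude in degrees
    confidence: float   # normalized to [0, 1] regardless of source model
    ts_ms: int          # Unix epoch milliseconds, the one agreed timestamp format

def validate_detection(raw: dict) -> Detection:
    """Validate at the boundary; reject malformed data instead of passing it on."""
    required = {"lat", "lon", "confidence", "ts_ms"}
    missing = required - raw.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if not 0.0 <= raw["confidence"] <= 1.0:
        raise ValueError(f"confidence out of range: {raw['confidence']}")
    if not (-90.0 <= raw["lat"] <= 90.0 and -180.0 <= raw["lon"] <= 180.0):
        raise ValueError("coordinates look like pixels, not GPS")
    return Detection(raw["lat"], raw["lon"], raw["confidence"], raw["ts_ms"])
```

Versioning the schema itself (e.g. a `schema_version` field) is the natural next step once interfaces start evolving.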
Lesson 2: Orchestration complexity explodes
With multiple AI components, orchestration becomes its own problem:
- Dependency management: Which components need results from others?
- Timeout handling: What happens when one component is slow?
- Partial failures: How do you proceed when only some components succeed?
- State management: Where is the source of truth?
We implemented:
- Explicit dependency graphs between components
- Timeout budgets per component with fallback behaviors
- Partial result handling with degraded but functional outputs
- Event sourcing for complete state reconstruction
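The timeout-budget-with-fallback idea above can be sketched in a few lines of `asyncio`; the component names and budgets here are illustrative, not the production values:

```python
import asyncio

async def call_with_budget(coro, budget_s, fallback):
    """Run one component under its timeout budget; any error or timeout
    degrades to the fallback instead of blocking the whole pipeline."""
    try:
        return await asyncio.wait_for(coro, timeout=budget_s)
    except Exception:
        return fallback

async def slow_cv():
    await asyncio.sleep(1.0)            # simulates a CV call blowing its budget
    return {"detections": ["car"]}

async def orchestrate():
    cv = await call_with_budget(slow_cv(), budget_s=0.2,
                                fallback={"detections": []})
    # Proceed with a degraded-but-functional result rather than failing the request.
    return cv

result = asyncio.run(orchestrate())
```

In a real orchestrator each edge of the dependency graph would carry its own budget and fallback, and every decision would be emitted as an event for state reconstruction.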
Lesson 3: End-to-end latency is the constraint
Individual component latency looked good:
- CV inference: 150ms
- Optimization solver: 200ms
- API calls: 50ms each
But end-to-end paths compound:
- Sequential processing: 150 + 200 + 50 + 50 + 50 = 500ms
- Add network variability: P95 jumped to 1.2s
- Under load: P99 exceeded 3s
Sub-second UX required:
- Parallelizing independent operations
- Aggressive caching of intermediate results
- Speculative computation for likely scenarios
- Streaming partial results to UI
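The parallelization step is the cheapest win: independent stages run concurrently, so the critical path shrinks from the sum of latencies to the longest chain. A sketch using the simulated latencies from above (the stage names are illustrative):

```python
import asyncio

async def cv_inference():
    await asyncio.sleep(0.15)       # simulated 150ms CV stage
    return "boxes"

async def fetch_traffic():
    await asyncio.sleep(0.05)       # simulated 50ms API call
    return "traffic"

async def optimize_route():
    await asyncio.sleep(0.20)       # simulated 200ms solver stage
    return "route"

async def pipeline():
    # CV and the traffic lookup are independent -> run them concurrently;
    # only the optimizer truly depends on both results.
    boxes, traffic = await asyncio.gather(cv_inference(), fetch_traffic())
    route = await optimize_route()
    return boxes, traffic, route

out = asyncio.run(pipeline())
```

Here the first two stages cost max(150, 50) = 150ms instead of 200ms; the same reasoning, applied across every independent pair, is what gets a 500ms sequential path under the sub-second target.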
Lesson 4: Monitoring needs to be holistic
Component-level monitoring wasn’t enough:
- Each component showed green
- End-to-end user experience was poor
- Root cause wasn’t in any single component
We added:
- End-to-end transaction tracing (distributed tracing)
- Cross-component correlation IDs
- Business outcome monitoring (successful trips, not just API calls)
- SLOs defined at user journey level, not component level
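A toy sketch of the correlation-ID idea, assuming a dict-based event log (production systems would use a tracing library such as OpenTelemetry; the field names here are illustrative):

```python
import uuid

def new_trace():
    # One correlation ID per user journey, shared by every component it touches.
    return {"correlation_id": str(uuid.uuid4()), "events": []}

def record(trace, component, status):
    trace["events"].append({
        "correlation_id": trace["correlation_id"],
        "component": component,
        "status": status,
    })

trace = new_trace()
record(trace, "cv", "ok")
record(trace, "optimizer", "ok")
record(trace, "dispatch", "failed")

# A journey-level SLO looks at the whole trace, not any single green component.
trip_succeeded = all(e["status"] == "ok" for e in trace["events"])
```

This is exactly the "each component showed green" trap: two of three events are `ok`, yet the journey failed, and only the stitched trace shows it.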
Metrics snapshot
Typical performance ranges for production mobility orchestration:
| Metric | Range |
|---|---|
| Decision latency (critical paths) | Under 1 second |
| Orchestration service availability | 99.5–99.9% |
| End-to-end success rate | >95% of initiated operations |
| Fallback activation rate | Under 5% of requests |
Technical architecture patterns
Pattern 1: Circuit breakers everywhere
When a downstream component fails:
- Don’t retry forever (cascade failures)
- Open the circuit (fail fast)
- Provide degraded alternatives
- Close gradually as health returns
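A minimal circuit-breaker sketch, with illustrative thresholds; real implementations add a proper half-open state that admits probes gradually rather than all at once:

```python
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after_s=30.0):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.failures = 0
        self.opened_at = None               # None means the circuit is closed

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after_s:
                return fallback()           # open: fail fast, skip the downstream call
            self.opened_at = None           # budget elapsed: close and try again
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()   # too many failures: open
                self.failures = 0
            return fallback()
        self.failures = 0
        return result
```

While the circuit is open, the fallback is returned without touching the failing component at all, which is what stops retries from cascading.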
Pattern 2: Bulkhead isolation
Separate components into isolated pools:
- Slow CV processing doesn’t block optimization
- Failed optimization doesn’t prevent basic routing
- User operations have reserved capacity
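One way to sketch bulkheads is a bounded pool per component class, so exhausting one pool sheds load without touching the others (pool names and sizes are illustrative):

```python
import threading

# Each component class gets its own bounded pool; a slow CV stage cannot
# exhaust the capacity reserved for optimization or user operations.
POOLS = {
    "cv": threading.BoundedSemaphore(4),
    "optimization": threading.BoundedSemaphore(4),
    "user_ops": threading.BoundedSemaphore(2),   # reserved capacity
}

def run_in_pool(pool_name, fn):
    sem = POOLS[pool_name]
    if not sem.acquire(blocking=False):
        # Shed load immediately rather than queueing behind a slow pool.
        raise RuntimeError(f"{pool_name} pool exhausted")
    try:
        return fn()
    finally:
        sem.release()
```

The key property: a `RuntimeError` from one exhausted pool is a local event; calls routed to every other pool proceed untouched.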
Pattern 3: Eventual consistency with optimistic UI
For better UX:
- Show optimistic results immediately
- Validate and correct in background
- Handle conflicts gracefully
- Communicate corrections clearly to users
Pattern 4: Shadow mode for new components
Before production rollout:
- Run new components in shadow mode
- Compare outputs to production system
- Measure accuracy and latency in real conditions
- Gradually shift traffic as confidence grows
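The shadow-mode loop can be sketched as a request handler that runs both models on live inputs but only ever serves the production result; the model callables and log format here are illustrative:

```python
def handle_request(inputs, prod_model, shadow_model, log):
    prod_out = prod_model(inputs)
    try:
        shadow_out = shadow_model(inputs)
        log.append({"match": shadow_out == prod_out,
                    "prod": prod_out, "shadow": shadow_out})
    except Exception as e:
        # A crashing shadow component is logged, never surfaced to users.
        log.append({"match": False, "error": repr(e)})
    return prod_out          # users always get the production result

log = []
out = handle_request({"frame": 1},
                     prod_model=lambda x: "route_A",
                     shadow_model=lambda x: "route_A",
                     log=log)

# Gate the traffic shift on the observed agreement rate across real requests.
agreement = sum(e["match"] for e in log) / len(log)
```

Latency of the shadow path gets measured the same way, so both accuracy and speed are known under real conditions before any traffic shifts.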
Key takeaways
- Interface contracts prevent integration failures: Define, validate, version
- Orchestration is a first-class concern: Not just “glue code”
- Optimize for end-to-end latency: Component latency is misleading
- Monitor user journeys, not just components: The user doesn’t care which component failed
- Design for graceful degradation: Partial function beats total failure