// Practical guide

Machine Learning —
it's not magic,
it's an engineering tool

An honest explanation of how ML actually works and what a business owner needs to know to keep a project from turning into an expensive experiment with no outcome.

87%
of ML projects never reach production
Correlation is not causation
Accuracy isn't the metric — business outcome is

When rules stop working

ML is not needed where a task is formalizable. It is needed where rules are impossible to write: there are too many factors, they change over time, or some are hidden.

Rules work

Tax rate calculation

IF income > $500k
THEN rate = 37%
ELSE rate = 22%

→ Logic is complete and stable. ML is overkill here.

Rules break

Fraud transaction detector

IF amount > $10,000
THEN fraud?
...but wealthy clients spend more
IF new country
THEN fraud?
...but people go on business trips
IF night AND amount > $5k AND...
...thousands of combos, still misses

→ Rules won't scale. Fraudsters adapt faster.

Hidden factors

Customer churn prediction

IF no purchases in 30 days
THEN churn?
...but seasonal gaps exist
IF support complaints
THEN churn?
...or that's just an active user
Real churn cause
...is not in any single feature

→ You cannot describe in rules what is itself non-linear.

ML is not needed where a task is complex. It is needed where it is fundamentally non-formalizable — because factors interact non-linearly, are hidden, or change over time.

Two ways to create logic

The entire difference between classical programming and ML is where the rules come from. The two approaches, side by side:

Classical programming: INPUT (data) → HAND-WRITTEN rules → OUTPUT (result). The developer writes the rules manually.
Works well when the logic is clear and fully formalizable: tax calculations, deterministic transactions, explicit business conditions.

Machine learning: INPUT (data) + CORRECT answers → TRAINING → model with DISCOVERED rules. The rules are extracted from the data automatically.
Essential where logic is too complex to write manually: image recognition, user behavior prediction, anomaly detection.
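The contrast can be sketched in a few lines. This is an illustrative toy, not a real fraud system: the data, the $10,000 rule, and the "training" procedure (brute-force search for the best cut-off) are all invented for the example. But it shows the core idea: in the second approach, nobody writes the threshold by hand; it is found in the labeled data.

```python
# Hand-written rule: the developer encodes the logic explicitly.
def is_fraud_rule(amount):
    return amount > 10_000

# "Training": the rule (here, a single threshold) is extracted from
# labeled examples by searching for the cut-off with the fewest errors.
def learn_threshold(amounts, labels):
    best_t, best_err = None, float("inf")
    for t in sorted(set(amounts)):
        err = sum((a > t) != y for a, y in zip(amounts, labels))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

amounts = [120, 540, 980, 12_000, 15_500, 9_800, 20_300]
labels  = [0,   0,   0,   1,      1,      0,     1]   # known outcomes
print(learn_threshold(amounts, labels))  # → 9800
```

Real training replaces the brute-force search with gradient-based optimization over millions of parameters, but the division of labor is the same: humans supply data and correct answers, the algorithm supplies the rule.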

What happens inside a model

A model does not "understand" meaning or seek truth. It works with features — numerical representations of data — and looks for stable patterns.

Features are the model's "eyes"

Instead of abstract concepts, the model sees only numbers. Better features = better predictions.

👤

User

Age, purchase frequency, geography, session time, click history

🖼️

Image

Pixel values, brightness, contrast, CNN-layer vectors

📝

Text

Word frequencies, TF-IDF, embeddings, sentiment, sentence length

📦

Transaction

Amount, time, device, geolocation, payment pattern

📡

Sensor / IoT

Temperature, vibration, current, pressure — time series

🔗

Proxy features

Indirect signals — when you can't measure directly (intelligence, risk, loyalty)

ML almost always works this way: many weak and noisy signals are aggregated into one useful prediction.
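The step from a raw business record to a feature vector can be made concrete. The record fields and the derived features below are invented for illustration; real pipelines compute dozens or hundreds of such columns.

```python
from datetime import date

# A raw user record as the business stores it...
user = {
    "signup_date": date(2023, 3, 1),
    "purchases": [date(2024, 1, 5), date(2024, 2, 9)],
    "country": "DE",
    "sessions_min": [12, 4, 33],
}

# ...becomes a purely numeric vector: the only thing the model "sees".
def to_features(u, today=date(2024, 3, 1)):
    return [
        (today - u["signup_date"]).days,                  # tenure in days
        len(u["purchases"]),                              # purchase count
        (today - max(u["purchases"])).days,               # days since last purchase
        sum(u["sessions_min"]) / len(u["sessions_min"]),  # avg session minutes
        1.0 if u["country"] == "DE" else 0.0,             # one-hot geography
    ]

print(to_features(user))  # → [366, 2, 21, 16.33…, 1.0]
```

Each individual number is a weak, noisy signal; the model's job is to weight and combine them.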

Why ML almost never works with "real" features

In real business tasks, what you want to predict is almost never directly measurable. So the model is built on proxies — indirect features that correlate with it. This is one of the key insights separating surface-level ML understanding from practical expertise.

Want to predict
Credit risk
cannot measure directly
Using proxy features
payment history debt load tenure at job employment type zip code age
Proxy risk

Zip code and age are strong predictors, but unfair. The model will reproduce systemic bias if the data contains it.

Want to predict
Customer churn
cannot measure directly
Using proxy features
purchase frequency last visit support contacts NPS scores season acquisition channel
Proxy risk

Silence from a customer can mean loyalty, not churn, and the absence of support contacts can be a good sign. Feature interpretation is non-obvious.

Want to predict
Fraud
cannot measure directly
Using proxy features
transaction pattern new device time of day geolocation amount merchant category
Proxy risk

A business trip looks like fraud: new country, night, unusual amount. The model sees the pattern but not the context — see our AxisCorePay case.

Proxy features are not a weakness of ML — they are its nature. The team's job is to select features that predict the target and do not reinforce unwanted patterns from historical data.

How a model learns: the training loop

The "magic" of ML is not magic. It is mathematical optimization with a concrete mechanism.

STEP 1: Prediction
STEP 2: Error, where error = |prediction − correct answer|
STEP 3: Gradient
STEP 4: Update weights
The cycle repeats thousands of times until the error is minimized.
After training, a model is just a function. Input → computation → number. There is no "AI" in the inference process. The only difference from regular code: this function was not written by a human — it was found by an algorithm on data.
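The four-step cycle can be shown end to end. This is a deliberately minimal sketch: a one-weight linear model fit by gradient descent on four invented data points, not any production algorithm. The point is that each line maps onto a step above, and that the final result is just two numbers plugged into a function.

```python
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]  # (input, correct answer)
w, b = 0.0, 0.0   # weights: training finds these, a human never sets them
lr = 0.01         # learning rate: how large each update step is

for step in range(5000):                     # repeats thousands of times
    grad_w = grad_b = 0.0
    for x, y in data:
        pred = w * x + b                     # STEP 1: prediction
        err = pred - y                       # STEP 2: error
        grad_w += 2 * err * x / len(data)    # STEP 3: gradient of squared error
        grad_b += 2 * err / len(data)
    w -= lr * grad_w                         # STEP 4: update weights
    b -= lr * grad_b

# After training, the "model" is just: input → w * x + b → number.
print(round(w, 2), round(b, 2))  # → 1.94 0.15
```

Inference after training involves no optimization at all, which is why the text above can say there is no "AI" in the inference process.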

The output is always a probability

An ML model does not say "this is fraud." It outputs a number — a probability. But even that is not a decision: you need a decision threshold, and the business sets it, not the model. There is no universally "correct" threshold.

Below — three distinct concepts in one simulator: features you control; weights the training found; and the threshold that defines the business decision.

// simulator: fraud transaction detector

Features (supplied at inference time, different for every transaction):
Unusually high amount: 0.70
New device / location: 0.85
Unusual time of day: 0.40
Matches known fraud patterns: 0.55

Weights 🔒 (found by training on 120,000 transactions, not set by hand):
w₁ amount: 0.32
w₂ device: 0.41
w₃ time: 0.18
w₄ pattern: 0.27

z = 0.32·0.70 + 0.41·0.85 + 0.18·0.40 + 0.27·0.55 − 0.65 = 0.14
P(fraud) = σ(z) = 1 / (1 + e^(−z)) = 54%

Decision threshold: set by the business. There is no universal standard; a bank, a clinic, and a marketplace will set different thresholds, because the cost of each error type differs. Moving the threshold trades off two rates:
Fraud caught: the share of all real fraud cases flagged.
False blocks: the share of legitimate transactions rejected.

At a 50% threshold, this transaction is blocked: the 54% probability is above the threshold.
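The simulator's arithmetic fits in a few lines. The feature values, weights, and bias are taken from the panel above; the "probability above threshold → block" convention is the standard reading of such a detector, not something specified elsewhere in this guide.

```python
import math

features = [0.70, 0.85, 0.40, 0.55]   # amount, device, time, pattern
weights  = [0.32, 0.41, 0.18, 0.27]   # found by training, not set by hand
bias = -0.65

z = sum(w * x for w, x in zip(weights, features)) + bias
p = 1 / (1 + math.exp(-z))            # sigmoid squashes z into (0, 1)

threshold = 0.50                      # set by the business, not the model
decision = "block" if p >= threshold else "approve"
print(f"P(fraud) = {p:.0%} → {decision}")   # → P(fraud) = 54% → block
```

Note that the model's work ends at `p`; everything from `threshold` down is a business rule layered on top of the model's probability.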

Where ML learns the wrong thing

The most dangerous property of ML: a model can perform perfectly on training data and make completely nonsensical predictions in the real world. This is covered in depth in our Production ML failure modes guide.

Correlation ≠ causation. ML does not seek truth. It seeks stable patterns — and learns them, even when there is no real connection.

Classic example: wolf or husky?

Researchers trained a model to distinguish wolves from huskies. Accuracy was high. But when they investigated — the model had learned something completely different.

🐺
wolf
vs
🐕
husky
Model learns to distinguish:
face shape ear shape coat colour

We assumed the model was learning anatomical features of the animals.

❄️
snow
=
🐺
wolf?
Model learned:
snow in frame → wolf

In the training data, wolves were photographed on snow, huskies on grass. The model learned the background, not the animals.

This is not an algorithmic bug — it is a data consequence. The problem was in the training set. The model did exactly what it was supposed to do. This is directly related to data skew and model drift patterns we see in production systems.

Other dangerous correlations

Expensive watch → premium purchases

The model learns: "expensive watch = premium buyer." There is a correlation, but if you remove the watch — the signal disappears. It is a correlate, not a cause.

🌞

Sunny weather → sales growth

Summer data shows higher sales. The model may "learn" weather as a predictor — when the real cause is seasonality.

👔

Gender as proxy for profession

If historical data shows certain roles held predominantly by men — the model will reinforce that pattern in future predictions.

📍

Zip code as proxy for risk

Credit models may deny people from "bad" neighborhoods — even if the individual is creditworthy.

The cost of different errors varies by business

ML always makes two types of errors. Crucially, you cannot minimize both simultaneously. You must choose which error is more expensive — and that is a business decision, not a technical one.

False Positive — false alarm. The model says "threat", but there is none.
Bank / anti-fraud: a legitimate transaction is blocked. A frustrated customer calls support; sometimes switches to a competitor. Tolerable, so banks choose a strict threshold.
Marketing / lead-gen: a loyal customer is flagged as "churning". Resources are spent retaining someone who was staying anyway. Cheap: better safe than sorry.

False Negative — missed threat. The model says "all clear", but it is not.
Bank / anti-fraud: a fraudulent transaction passes. Direct financial loss, regulatory scrutiny, reputational damage. Expensive, hence the strict threshold.
Marketing / lead-gen: a churning customer is not identified. The customer leaves and the revenue is lost; acquiring a replacement costs more. Expensive, hence a soft threshold.
A bank and a marketer will configure the same model with different thresholds — because for the bank, missing fraud is critical, and for the marketer, missing a churning customer is critical. Same model, different business decisions.
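The trade-off is easy to demonstrate on toy numbers. The probabilities, labels, and both thresholds below are invented for illustration; the mechanism, not the values, is the point.

```python
# Model scores for eight transactions, plus the known truth (1 = real fraud).
probs  = [0.95, 0.80, 0.62, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1,    1,    0,    1,    0,    0,    1,    0]

def rates(threshold):
    """Share of real fraud caught, and share of legit transactions blocked."""
    caught = sum(p >= threshold and y == 1 for p, y in zip(probs, labels))
    blocked = sum(p >= threshold and y == 0 for p, y in zip(probs, labels))
    return caught / sum(labels), blocked / (len(labels) - sum(labels))

print(rates(0.50))  # lenient: → (0.75, 0.25) — misses fraud, few false blocks
print(rates(0.15))  # strict:  → (1.0, 0.75)  — catches all fraud, many false blocks
```

Same model, same scores; only the threshold moved. Which pair of rates is "better" depends entirely on which error costs the business more.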

How an ML project is structured

A real ML project is not "train a model." It is an iterative engineering process with seven stages, each of which can become a failure point. See also: why ML systems fail in production.

01

Problem framing

What exactly is being predicted? How is success measured? How does it connect to business metrics?

⚠ Most common failure point
02

Data collection and audit

Is there a signal in the data at all? How representative is it? Does any information from the future leak into the training examples?

03

Feature engineering

What does the model "see"? The right features deliver more impact than complex algorithms.

💡 More critical than algorithm choice
04

Model training

Architecture selection, hyperparameter tuning, regularization. An iterative experimental process.

05

Evaluation and validation

Metrics on the held-out set. Error analysis. Bias and fairness checks.

06

Integration into product

API, latency, fallback logic, A/B testing. 70% of effort often goes here.

⚠ Frequently underestimated at the start
07

Monitoring and maintenance

Models degrade. The world changes. Data drift, concept drift, user behavior shifts. ML is a process, not a product.

Where ML projects most often break

// % of failed projects where this factor was present
Incorrect problem framing 67%
Poor or insufficient data 58%
Missing deployment infrastructure 49%
Inflated expectations from stakeholders 41%
Causes are not mutually exclusive — one failed project usually contains several simultaneously. Hence the sum exceeds 100%.

What every stakeholder needs to understand

Seven things that separate a successful ML project from an expensive experiment. These apply equally to Applied AI and trading systems.

01

ML does not guarantee an outcome

It is not a deterministic system. It is a hypothesis tested on data. There is always a risk that the data contains no signal.

02

Data matters more than the model

Poor data yields poor results — regardless of algorithm complexity. Garbage in, garbage out.

03

Features matter more than the algorithm

In most tasks, good feature engineering delivers more impact than switching to a more complex model.

04

The output is a probability

ML says "83% chance of fraud," not "this is fraud." You need to manage thresholds and risks.

05

ML ≠ neural networks

Gradient boosting, decision trees, linear models — often faster, cheaper, and more interpretable.

06

Monitoring is mandatory

Models degrade. The world changes. ML is a process, not a one-shot solution. Quality control after deployment is required.

07

Optimize for the right thing

A model optimizes what you ask it to. Make sure the ML metric aligns with the business goal — this is often non-obvious.

Signs of a successful project

Clear business metric · quality data · iterative approach · integration into processes · post-deployment monitoring

Is ML right for your use case?

Before talking about models and algorithms — answer four questions. If even one answer is "no," an ML project is likely premature.

Question 01
Do you have historical data with the right signal?
Not "we have a customer database" — but specifically: do you have labeled examples of what you want to predict? Churn, fraud, conversions — with known outcomes.
Question 02
Does the task repeat often enough?
ML is justified when the same decision needs to be made thousands or hundreds of thousands of times. A one-off task — however valuable — is not a fit for ML.
Question 03
Is there a measurable success metric?
"Make it smarter" is not a metric. You need a concrete number: reduce churn by X%, catch Y% of fraud at Z% false-positive rate. Without a metric, you cannot train or evaluate.
Question 04
Is some model error acceptable?
ML is probabilistic — it will always be wrong some percentage of the time. If 100% accuracy is required and the outcome must be deterministic, ML is not the right tool.

Ready to evaluate your ML opportunity?

We run a short audit to determine whether your use case has the signal, data, and business conditions for a successful ML project — and what the realistic timeline and outcome look like.

Let's talk →