Commercial discount strategy: where value is created and where it’s quietly destroyed

A commercial discount strategy supported by machine learning can put pricing decisions on a causal footing.

Most commercial discounts destroy value. Reporting simply fails to show it. When sales increase after a campaign, the conclusion seems obvious: the discount worked. Revenue went up, conversion improved. The dashboard confirms the intuition. This is where a commercial discount strategy often fails — it optimizes for visible revenue instead of incremental margin. But this reasoning rests on a fragile assumption: that the customers who received the discount bought because of it, and would not have bought anyway.

But that is not the point. The point is to identify the customers who are most likely to buy only if given a discount, and those who would buy anyway, even without it. With that knowledge, it finally becomes possible to direct campaign effort where it is most effective.

And what makes this particularly uncomfortable is that the same mistake repeats itself far beyond pricing.

In human resources, a training program is declared successful because performance improved afterwards. But perhaps the most motivated employees were selected to attend. The improvement reflects prior potential, not the training itself.

In finance, a cost-cutting initiative is praised because margins have increased. But what would margins have been without it? Were external market conditions responsible? Or was demand already recovering?

You get the idea: if the “business-as-usual” metrics improve, the decision is assumed to be good. Not necessarily so!

What is missing in all these cases is a causal question: separating causation from correlation, and separating decision outcomes from decision quality.

Causal analysis begins with a simple but demanding idea: every decision creates two possible realities. One where the action is taken, and one where it is not. We only observe one of them. The other remains hidden.

In pricing, this hidden world is decisive. For any given customer, there are two potential outcomes:

  • The probability of purchase if we offer a discount.
  • The probability of purchase if we do not.

We only ever see one of these outcomes. If we give the discount, we observe whether the customer buys. We cannot simultaneously observe what that same customer would have done at full price.
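The two potential outcomes can be made concrete with a toy example (all numbers invented for illustration):

```python
# Toy illustration of potential outcomes for one customer (hypothetical numbers).
# Only one of the two probabilities is ever observable for a given customer.
customer = {
    "p_buy_with_discount": 0.62,     # P(buy | T=1): observable only if we discount
    "p_buy_without_discount": 0.55,  # P(buy | T=0): observable only if we don't
}

# The causal effect of the discount for this customer is the difference:
uplift = customer["p_buy_with_discount"] - customer["p_buy_without_discount"]
print(round(uplift, 2))  # 0.07: a 7-point increase in purchase probability
```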

Causal analysis attempts to estimate that missing scenario. It reframes the pricing question from:

Did sales increase after the discount?

to:

How much did the discount change the probability of purchase for this specific customer?

The interesting part is that solving this does not require exotic data, a new AI platform, or a massive transformation program. Most organizations already have what is needed.

The data is already there

To showcase this approach, I’m collecting data from a standard CRM system into Dataverse. Let’s look at the most interesting data tables.

Customer Purchase Profile

This is the customer master table. It contains one row per customer (cliente_id) with structural and behavioral attributes extracted from the CRM and transaction history.

  • Segment and channel: segmento, canal_preferencial describe the commercial profile (e.g. SMB, Online).
  • Recency & frequency: recencia, frequencia capture purchase recency and intensity.
  • Monetary value: valor_medio reflects average order value.
  • Pricing sensitivity: price_elasticity estimates how responsive the customer is to discounts.
  • Basket behaviour: basket_size_score approximates cross-sell depth.
  • Behavioural signals: purchase_intent, brand_loyalty, promo_fatigue quantify propensity to buy, attachment to the brand, and saturation from promotions.
  • Technical metadata: dataset_version, generated_at allow reproducibility and model traceability.

This is the raw material for causal reasoning. Most of these indicators are not imported from external systems. They are engineered directly from transactional history. Recency is simply the number of days since the last purchase. Frequency counts transactions over a defined period. Average order value is computed from invoices already stored in the CRM.
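Engineering these indicators takes only a few lines of pandas. A minimal sketch, where `invoices` stands in for the transaction history already in the CRM (column names follow the tables above; the data is invented):

```python
import pandas as pd

# Hypothetical transaction history pulled from the CRM.
invoices = pd.DataFrame({
    "cliente_id": ["C1", "C1", "C2", "C2", "C2"],
    "data_compra": pd.to_datetime(
        ["2024-01-10", "2024-03-05", "2024-02-20", "2024-02-28", "2024-03-15"]),
    "valor_compra": [120.0, 80.0, 40.0, 55.0, 60.0],
})

today = pd.Timestamp("2024-04-01")

profile = invoices.groupby("cliente_id").agg(
    recencia=("data_compra", lambda d: (today - d.max()).days),  # days since last purchase
    frequencia=("data_compra", "count"),                         # transactions in the window
    valor_medio=("valor_compra", "mean"),                        # average order value
).reset_index()

print(profile)
```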

Even price elasticity is not observed directly. It is inferred from behavior. If a customer consistently purchases when discounts are present and rarely at full price, the data reveals a pattern of price dependency. If purchase behavior remains stable regardless of promotion, the signal points to relative inelasticity. These are not theoretical constructs. They are empirical patterns embedded in historical transactions.

The same applies to behavioral indicators such as brand loyalty or promotion fatigue. Loyalty can be approximated through repeat purchase concentration within product categories or brands. Promotion fatigue emerges when response rates decline after repeated exposure to incentives.
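These behavioural proxies can be derived the same way. Below is one possible price-elasticity proxy, assuming each historical purchase opportunity carries a discount flag (the frame and its columns are hypothetical):

```python
import pandas as pd

# Hypothetical purchase-opportunity history: one row per customer per offer.
history = pd.DataFrame({
    "cliente_id": ["C1"] * 6 + ["C2"] * 6,
    "had_discount": [1, 1, 1, 0, 0, 0] * 2,
    "comprou":      [1, 1, 1, 0, 0, 0,   # C1 buys only when discounted
                     1, 1, 0, 1, 1, 0],  # C2 buys regardless of discount
})

# Purchase rate with vs. without discount, per customer.
rates = (history.groupby(["cliente_id", "had_discount"])["comprou"]
         .mean().unstack("had_discount"))

# Proxy: gap between the two rates. Large gap = price-dependent behaviour.
rates["price_elasticity"] = rates[1] - rates[0]
print(rates["price_elasticity"])
```

C1 comes out fully price-dependent (1.0) and C2 fully inelastic (0.0); a real history produces values in between.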

Campaign Impact Record

This table records customer-level exposure to commercial actions. It contains one row per customer per campaign event and defines what, in causal terms, we call the treatment.

  • Customer reference: cliente_id links each record to the customer master.
  • Treatment flag: impactado indicates whether the customer was exposed to the campaign (1) or not (0).
  • Campaign type: tipo_campanha differentiates actions such as discount or bundle.
  • Campaign date: data_campanha anchors the intervention in time.
  • Discount intensity: discount_depth captures the percentage discount applied (e.g. 0.15 = 15%).
  • Technical metadata: dataset_version, generated_at ensure traceability and reproducibility.

This table is more important than it first appears.

Most reporting systems treat campaigns as context. In causal modelling, they become structure. The treatment flag is not just descriptive information, but the pivot around which the entire analysis turns.

The campaign record makes the intervention explicit. It tells us who was exposed, when the exposure occurred, and at what intensity. That temporal anchoring is crucial. It allows us to align behavior before and after the intervention, and to separate pre-existing patterns from potential behavioral shifts.

In practical terms, this table transforms a marketing action into a measurable treatment. Without it, you observe outcomes. With it, you can begin to estimate incremental impact.

Customer Purchase Record

This table records the commercial outcome after campaign exposure. It contains one row per customer per purchase observation and defines the response variable for uplift modelling.

  • Customer reference: cliente_id links each transaction to the customer master.
  • Purchase indicator: comprou is a binary flag (1/0) indicating whether a purchase occurred.
  • Revenue: valor_compra captures transaction value.
  • Margin: margem reflects the gross contribution generated by the purchase.
  • Purchase date: data_compra anchors the outcome in time, allowing alignment with campaign exposure.
  • Technical metadata: dataset_version ensures dataset consistency and reproducibility.

On its own, this table is what most organizations use to evaluate campaign performance. It tells us who bought, how much they spent, and how much margin was generated. From here, dashboards compute conversion rates, revenue uplift and contribution per campaign.

But in causal terms, this table gains meaning only when combined with the treatment record. The purchase indicator tells us what happened; the campaign table tells us who was exposed.

Customer Purchase Analysis

This is the decision table generated by the model. It contains one row per customer per model run and stores the analytical output required for operational targeting.

  • Customer reference: Cliente_ID (lookup) links the analysis to the customer profile table. This is the binding key for execution.
  • Primary identifier: cliente_id stores the readable customer code.
  • Treatment and outcome flags: impactado, comprou allow validation against observed campaign exposure and realised purchase.
  • Behavioural context: fields such as frequencia and canal_preferencial provide interpretability for segmentation and filtering.
  • Model versioning: ModelVersion ensures reproducibility and controlled iteration across scoring runs.
  • System metadata: creation and modification fields provide auditability by design.

Up to this point, we have described customers, interventions and outcomes. Here, we introduce estimated counterfactuals. For each customer, the model stores the predicted probability of purchase with and without discount, and the difference between them. That difference is the estimated incremental effect.

Unlike the transactional tables, this table does not record what happened. It records what is expected to happen if we intervene.

Conceptually, this converts prediction into governance. It defines:

  • who is likely to generate incremental margin,
  • who is indifferent to discounting,
  • who would buy anyway,
  • and who might even react negatively.

Because it is stored directly in Dataverse, linked to the customer entity and versioned by model run, it can be used immediately in campaign logic. Target lists are no longer built on past conversion rates alone, but on estimated behavioral change.

Dataverse SDK for Python

Before training the model, the data layer must be in place. The entire pipeline reads from and writes back to Dataverse using the Dataverse SDK for Python.

The first step is configuration. You define the Dataverse environment URL, authenticate via Azure AD (service principal or delegated login), and store credentials securely in environment variables or a .env file. Once configured, the SDK allows the training script to query the input tables directly (customers, campaign exposures and outcomes) and, later, to write the scoring results back into the Customer Purchase Analysis table.
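The exact SDK surface varies by version, so here is the underlying pattern sketched directly against the Dataverse Web API, with `msal` for the service-principal token. The environment-variable names and the example entity set are my own assumptions:

```python
import os

# Environment URL; the default below is a placeholder, not a real org.
ENV_URL = os.environ.get("DATAVERSE_URL", "https://yourorg.crm.dynamics.com")

def build_query(entity_set: str, select: list[str]) -> str:
    """Compose an OData query URL for the Dataverse Web API."""
    return f"{ENV_URL}/api/data/v9.2/{entity_set}?$select={','.join(select)}"

def get_token() -> str:
    """Client-credentials flow against Azure AD (service principal)."""
    import msal  # pip install msal
    app = msal.ConfidentialClientApplication(
        client_id=os.environ["AZURE_CLIENT_ID"],
        client_credential=os.environ["AZURE_CLIENT_SECRET"],
        authority="https://login.microsoftonline.com/"
                  + os.environ["AZURE_TENANT_ID"],
    )
    return app.acquire_token_for_client(
        scopes=[f"{ENV_URL}/.default"])["access_token"]

def fetch(entity_set: str, select: list[str]) -> list[dict]:
    """Read rows from a Dataverse table via the Web API."""
    import requests
    resp = requests.get(
        build_query(entity_set, select),
        headers={"Authorization": f"Bearer {get_token()}"},
    )
    resp.raise_for_status()
    return resp.json()["value"]

# With real credentials this would pull the input tables, e.g. (entity name hypothetical):
# customers = fetch("cr123_customerprofiles", ["cliente_id", "recencia", "frequencia"])
```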

This integration is not cosmetic. It ensures that:

  • Training data is pulled from the same operational system used by the business.
  • No manual CSV exports are required.
  • Model outputs are written back into governed tables with versioning and timestamps.

In practical terms, the SDK acts as the bridge between the analytical layer and the operational CRM layer. It allows the model to behave as a native component of the enterprise system rather than a disconnected experiment.

Note that for the purposes of this demo I'm training directly off Dataverse; at scale, the training set would more likely be staged in a lakehouse.

In architectural terms: Dataverse is well suited to the operational demo and early production. When volume grows, you stage training data in a lakehouse for cheaper storage, larger history, and heavier feature pipelines, while keeping Dataverse as the governed operational store for entities and decisions.

In production, the architecture is intentionally simple. Dataverse remains the system of record for customer profiles, campaign exposure, and purchase outcomes. A scheduled compute job (an Azure Function on a timer) reads those tables, builds the training set for a defined time window, trains and scores an uplift model, and writes the results back into a dedicated analysis table versioned by SnapshotDate and ModelVersion. Power BI then sits on top to monitor two things that revenue dashboards cannot: whether the campaign changed behaviour for the targeted customers, and whether that incremental behaviour created margin.

The machine learning training process

Once data access is configured, the training process follows a structured sequence.

First, the pipeline retrieves three observable datasets from Dataverse: customer attributes, campaign exposure (treatment indicator), and realized purchase outcome. These are merged into a single modelling dataset using the customer identifier as the key.

Crucially, only information available before the campaign decision is used as model input. Post-treatment variables such as realized revenue or margin are excluded. This prevents data leakage and preserves causal validity.

The dataset is then split into training and test partitions. The split is stratified by treatment status to preserve the balance between treated and non-treated customers.
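These assembly steps can be sketched in a few lines, assuming the three frames have already been fetched (the data below is illustrative):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Illustrative stand-ins for the three Dataverse tables.
customers = pd.DataFrame({"cliente_id": [1, 2, 3, 4, 5, 6, 7, 8],
                          "recencia": [10, 40, 5, 90, 12, 33, 70, 8],
                          "frequencia": [5, 2, 9, 1, 6, 3, 2, 7]})
campaigns = pd.DataFrame({"cliente_id": [1, 2, 3, 4, 5, 6, 7, 8],
                          "impactado": [1, 0, 1, 0, 1, 0, 1, 0]})
purchases = pd.DataFrame({"cliente_id": [1, 2, 3, 4, 5, 6, 7, 8],
                          "comprou": [1, 0, 1, 0, 1, 1, 0, 0],
                          "margem": [30, 0, 45, 0, 25, 20, 0, 0]})

# Merge on the customer identifier.
df = (customers.merge(campaigns, on="cliente_id")
               .merge(purchases, on="cliente_id"))

# Only pre-decision features enter the model; post-treatment
# fields such as `margem` are deliberately excluded (no leakage).
features = ["recencia", "frequencia"]

# Stratify by treatment status to preserve the treated/untreated balance.
train, test = train_test_split(
    df, test_size=0.25, stratify=df["impactado"], random_state=42)

print(train["impactado"].mean(), test["impactado"].mean())
```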

Next, two separate models are trained:

  • one using only customers who were not targeted by the campaign,
  • and another using only those who were exposed.

Each model learns the probability of purchase conditional on the observed features within its respective group.

After training, both models are applied to the hold-out set. For each customer, we estimate:

  • Probability of purchase without treatment
  • Probability of purchase with treatment

The difference between these two probabilities represents the estimated uplift.

Finally, the scoring results, including uplift estimates, probabilities and model metadata, are written back into Dataverse. Each record is tagged with model version and snapshot date to ensure traceability and reproducibility.
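The write-back payload can be assembled as plain records before handing it to the data layer. A sketch (field names follow the analysis table described above; the version tag and date are illustrative):

```python
from datetime import date

import pandas as pd

# Scored hold-out customers (illustrative values).
scored = pd.DataFrame({
    "cliente_id": ["C1", "C2"],
    "p_no_treatment": [0.20, 0.55],
    "p_treatment": [0.45, 0.56],
})
scored["uplift"] = scored["p_treatment"] - scored["p_no_treatment"]

# Tag every record for traceability and reproducibility.
scored["ModelVersion"] = "t-learner-v1"            # illustrative version tag
scored["SnapshotDate"] = date(2024, 4, 1).isoformat()

records = scored.to_dict(orient="records")  # ready for a row-by-row upsert
print(round(records[0]["uplift"], 2))  # 0.25
```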

At this stage, the system produces not just predictions, but structured, decision-ready outputs embedded directly in the operational data model.

The learning architecture behind the model (read: the algorithm)

The core idea is simple but powerful: we are not trying to predict who will buy.
We are trying to estimate what changes because of the intervention.

To do that, the training process follows a classical T-Learner framework.

Instead of fitting a single predictive model, we train two independent models:

  • One model estimates the probability of purchase for customers who were not exposed to the campaign.
  • A second model estimates the probability of purchase for customers who were exposed.

Each model learns its own conditional response function. In statistical terms, we are approximating:

  • P(Y = 1 | X, T = 0)
  • P(Y = 1 | X, T = 1)

Where:

  • X represents customer features available before the decision
  • T is the treatment indicator
  • Y is the purchase outcome

The uplift is then computed as the difference between these two estimates for each customer:

Uplift(X) = P̂(Y = 1 | X, T = 1) − P̂(Y = 1 | X, T = 0)

This structure allows us to simulate two parallel worlds for every individual:

  • A world where the customer receives the discount.
  • A world where the customer does not.

The difference between these two simulated probabilities is what drives the decision.
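Under those definitions, the T-Learner fits in a few lines. A self-contained sketch on synthetic data (the synthetic generating process is my assumption; the gradient boosting classifier is the one the text names):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 2000

# Synthetic customers: the discount only lifts infrequent buyers (persuadables).
X = pd.DataFrame({"recencia": rng.integers(1, 120, n),
                  "frequencia": rng.integers(1, 12, n)})
T = rng.integers(0, 2, n)                # treatment flag (impactado)
base = 0.1 + 0.05 * X["frequencia"]      # baseline purchase propensity
lift = 0.2 * (X["frequencia"] < 4)       # true uplift for infrequent buyers
Y = (rng.random(n) < np.clip(base + lift * T, 0, 1)).astype(int)

# T-Learner: one response model per treatment arm.
m0 = GradientBoostingClassifier().fit(X[T == 0], Y[T == 0])  # P(Y=1 | X, T=0)
m1 = GradientBoostingClassifier().fit(X[T == 1], Y[T == 1])  # P(Y=1 | X, T=1)

# Score every customer under both "worlds" and take the difference.
p0 = m0.predict_proba(X)[:, 1]
p1 = m1.predict_proba(X)[:, 1]
uplift = p1 - p0

# The model should recover higher uplift for the truly persuadable group.
print(uplift[X["frequencia"] < 4].mean(), uplift[X["frequencia"] >= 4].mean())
```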

Why two models?

Because treatment fundamentally changes the data-generating process. Customers who were exposed to a discount are not statistically identical to those who were not. Targeting bias, behavioral differences and commercial logic all influence who receives treatment. Training separate models allows each response surface to be learned independently, rather than forcing a single model to implicitly infer the interaction.

This makes the approach:

  • More flexible
  • More transparent
  • Easier to diagnose
  • More robust in production

Model choice

The implementation uses a gradient boosting classifier designed for structured tabular data. This type of model is well suited to heterogeneous behavioral signals (frequency, recency, price-elasticity proxies, loyalty indicators, segment membership) without requiring extensive manual feature engineering.

Boosting models handle:

  • Non-linear interactions
  • Threshold effects
  • Mixed numerical and categorical inputs
  • Moderate class imbalance

Importantly, the output is probabilistic. The model produces calibrated purchase probabilities rather than binary labels. That is essential for uplift estimation.
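When the raw scores are not well calibrated, scikit-learn's `CalibratedClassifierCV` is one standard remedy. A sketch on synthetic data (this wrapper is my suggestion, not part of the pipeline described above):

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=1000, n_features=5, random_state=0)

# Wrap the boosted model so predicted probabilities track observed frequencies.
model = CalibratedClassifierCV(
    GradientBoostingClassifier(), method="isotonic", cv=3).fit(X, y)

proba = model.predict_proba(X)[:, 1]
print(proba.min() >= 0.0, proba.max() <= 1.0)  # True True
```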

From prediction to decision

The training process does not end with model fitting.

After scoring the hold-out set, we compute:

  • Estimated probability without treatment
  • Estimated probability with treatment
  • Estimated uplift

These outputs are written back into Dataverse as structured fields, together with model version and snapshot date.

At this point, the system is no longer a machine learning experiment. It becomes a decision layer embedded in the operational CRM.

Commercial teams can now:

  • Target only customers with positive uplift
  • Exclude “Sure Things” who would buy anyway
  • Avoid “Sleeping Dogs” whose probability decreases with discount
  • Rank customers by expected incremental value

This is the moment where modelling stops being descriptive and becomes prescriptive.
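The four groups fall directly out of the two probabilities. A sketch with illustrative thresholds (the `eps` cut-off and sample values are my assumptions; each organization tunes its own):

```python
import pandas as pd

# Scored customers: predicted purchase probability without (p0) and with (p1) discount.
scored = pd.DataFrame({
    "cliente_id": ["C1", "C2", "C3", "C4"],
    "p0": [0.10, 0.80, 0.05, 0.50],
    "p1": [0.55, 0.82, 0.07, 0.35],
})
scored["uplift"] = scored["p1"] - scored["p0"]

def segment(row, eps=0.05):
    """Classify by estimated uplift; eps is an illustrative indifference band."""
    if row["uplift"] > eps:
        return "Persuadable"          # discount creates incremental demand
    if row["uplift"] < -eps:
        return "Sleeping Dog"         # discount backfires
    return "Sure Thing" if row["p0"] >= 0.5 else "Lost Cause"

scored["segment"] = scored.apply(segment, axis=1)
print(scored[["cliente_id", "segment"]])
```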

The reporting – business as usual

This is how we might begin designing the Power BI report.

The dashboards above describe the customer base with precision. We know how many customers we have, how they are distributed across segments, which channels dominate, how frequently they purchase, and what the average order value looks like.

We can see that 60% of customers received at least one discount during the year. We can observe stable monthly purchase volumes. We can compare segments and revenue contributions.

Nothing here is wrong, and these dashboards are useful. They provide structural clarity and they support operations. But that is all they do.

From this view, a campaign that increases revenue appears successful. A segment with higher order value looks more attractive. A stable monthly curve feels reassuring.

Yet nowhere on this page can we see whether the discount altered behavior. Nowhere can we measure whether margin was created or quietly sacrificed.

The model in action

The difference between correlation and causation is financial. When discounts are evaluated through aggregate sales, “success” is almost guaranteed. Sales increase. Customers buy. The campaign looks effective. But once we estimate the counterfactual (what would have happened without the discount) the picture changes. Some customers would have bought anyway. Others would not buy at any price. Only a subset truly changes behavior because of the incentive.

That subset defines incremental value. The uplift segmentation makes this visible. Persuadables generate real incremental demand. Sure Things inflate reported performance. Sleeping Dogs quietly destroy margin. Lost Causes consume budget without return.

The cumulative curve makes the trade-off explicit. Targeting expands. Revenue rises. But incremental margin peaks and then declines. Beyond that point, you are no longer buying growth. You are subsidizing behavior that would have occurred anyway.
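The curve itself is simple to compute: sort customers by estimated uplift, expand the target list one customer at a time, and track cumulative expected incremental margin (all numbers illustrative):

```python
import pandas as pd

# Scored customers with estimated uplift and expected margin per purchase.
scored = pd.DataFrame({
    "uplift":          [0.30, 0.20, 0.05, 0.00, -0.10],
    "margem_esperada": [40.0, 35.0, 30.0, 25.0, 50.0],
})

# Target best prospects first; each customer adds uplift * expected margin.
ranked = scored.sort_values("uplift", ascending=False).reset_index(drop=True)
ranked["incremental"] = ranked["uplift"] * ranked["margem_esperada"]
ranked["cum_incremental"] = ranked["incremental"].cumsum()

# The running total peaks, flattens, then declines once negative-uplift
# customers (Sleeping Dogs) enter the target list.
print(ranked["cum_incremental"].tolist())
```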

This is the uncomfortable conclusion:

Most commercial discounts do not fail because they are badly designed. They fail because they are evaluated incorrectly.

Causal analysis does not optimize campaigns. It restores discipline to decision-making. And once you see value in incremental terms, it becomes very difficult to go back to dashboards that measure activity instead of impact.

Ultimately, a commercial discount strategy only creates value when it is grounded in causal evidence.

Building a robust commercial discount strategy requires more than dashboards — it requires teams that understand causal thinking and data modeling.

That’s why we design tailored training programs in Business Intelligence and applied AI to help companies make better pricing and margin decisions.

Nuno Nogueira