From Correlation to Causation: Why AI Analytics Needs a Knowledge Graph
The Correlation Trap
Here's a finding from a hypothetical AI analytics tool: "We detected a strong correlation between your blog traffic and enterprise deal closures. Blog traffic increased 40% in Q3, and enterprise revenue increased 28%."
This sounds like a valuable insight. It might even be true. But it's also potentially dangerous, because it invites a specific conclusion: investing more in blog content will drive more enterprise revenue.
That conclusion might be correct. Or the blog traffic and enterprise revenue might both be driven by a third factor — say, a major product launch in Q3 that generated press coverage (driving blog traffic) and renewed interest from enterprise prospects (driving deal closures). In that case, investing more in blog content without another product launch would produce the traffic without the revenue.
This is the correlation trap, and it's the fundamental limitation of AI analytics tools that operate without a causal model. They can find patterns. They can surface relationships. But they cannot distinguish between a pattern that reflects a real causal mechanism and a pattern that is a statistical coincidence or a confounded relationship.
Why Correlation-Based AI Is So Appealing
The appeal of correlation-based analytics is obvious: it requires no prior knowledge. You point an algorithm at your data, and it finds relationships. No one has to specify what drives what. No domain expertise is required. The algorithm discovers everything on its own.
This is genuinely useful for hypothesis generation. "Hey, we noticed that customers who use Feature X in their first week have 3x higher retention." That's a pattern worth investigating. Maybe Feature X actually drives retention. Maybe Feature X is used by a specific segment that retains well regardless. Maybe Feature X adoption correlates with a level of onboarding engagement that is the real retention driver.
The correlation identifies the question. It doesn't answer it.
The problem arises when correlation-based insights are used to make decisions rather than generate hypotheses. And in practice, that happens constantly. A dashboard shows a correlation. A manager acts on it. No one goes back to test whether the relationship is causal.
What Causation Requires That Correlation Doesn't
Establishing causation requires three things that pure statistical correlation doesn't provide:
1. Direction
Correlation is symmetric — if A correlates with B, then B correlates with A. Causation has a direction. Marketing spend drives lead volume (not the reverse). Lead volume drives pipeline (not the reverse). Getting the direction wrong leads to exactly the wrong intervention.
A knowledge graph explicitly encodes direction. Each edge in the graph has an arrow: Metric A drives Metric B. This direction is established through domain knowledge and validated with data, not inferred from correlation alone.
2. Mechanism
A causal relationship has a mechanism — a logical chain of events through which the cause produces the effect. Marketing spend increases ad impressions, which increase website visits, which increase form submissions, which increase MQLs. Each step in this chain is a testable link.
When an AI system reports that "marketing spend correlates with revenue," it's skipping the entire mechanism. The knowledge graph maps the mechanism explicitly, which means each link can be measured, monitored, and validated independently.
3. Timing
Causes precede effects, and there's a specific lag time between them. If you increase marketing spend today, you don't see a pipeline impact today — you see it in 4-6 weeks. If you change your pricing, the revenue impact materializes over months as new deals close at the new price and existing contracts come up for renewal.
Fig's knowledge graph measures these lag times from your historical data. Each causal edge has a quantified delay: "Changes in MQL volume take approximately 3 weeks to materialize in SQL volume, and an additional 4 weeks to affect Opportunity count." This timing information is essential for accurate forecasting and for correctly attributing effects to causes.
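To make these three requirements concrete, picture them as fields on a single edge in the graph. This is a hypothetical sketch in Python; the metric names, elasticities, and lags are illustrative, not Fig's actual representation.

```python
# Hypothetical sketch of a causal edge. It carries direction
# (cause vs. effect), sits inside an explicit mechanism chain,
# and records timing as a measured lag. All numbers are invented.
from dataclasses import dataclass

@dataclass(frozen=True)
class CausalEdge:
    cause: str         # direction: the driving metric
    effect: str        # the driven metric (not interchangeable with cause)
    elasticity: float  # % change in effect per 1% change in cause
    lag_weeks: int     # timing: weeks until the effect materializes

# Mechanism: the spend-to-MQL chain, one testable link per step.
chain = [
    CausalEdge("marketing_spend", "ad_impressions", 0.9, 0),
    CausalEdge("ad_impressions", "website_visits", 0.5, 1),
    CausalEdge("website_visits", "form_submissions", 0.4, 1),
    CausalEdge("form_submissions", "mql_volume", 0.8, 1),
]

# Each link in the chain can be measured and validated independently.
for edge in chain:
    print(f"{edge.cause} -> {edge.effect} "
          f"(elasticity {edge.elasticity}, lag {edge.lag_weeks}w)")
```

Because every edge names its cause and effect explicitly, reversing an edge is a different claim, not the same claim restated.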
How a Knowledge Graph Enables Causal Analysis
A knowledge graph transforms analytics from pattern matching to causal reasoning. Here's how each capability changes:
Root Cause Analysis
Without a knowledge graph: "Revenue dropped. Here are five metrics that also changed around the same time." (But which change caused the revenue drop? Were any of them effects rather than causes? Were any coincidental?)
With a knowledge graph: "Revenue dropped because Deal Count dropped, because Opportunity Count dropped, because SQL-to-Opportunity Conversion dropped. The conversion drop started on February 3rd and took approximately 2 weeks to fully impact revenue." (Each step in the chain is a validated causal link with a measured lag time.)
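Once the graph exists, that kind of chain can be traced mechanically. Here is a toy sketch; the `parents` map, metric names, and lags are made up for illustration.

```python
# Toy upstream map: effect -> list of (cause, lag_weeks).
# Structure and numbers are illustrative, not a real graph.
parents = {
    "revenue": [("deal_count", 2)],
    "deal_count": [("opportunity_count", 3)],
    "opportunity_count": [("sql_to_opp_conversion", 2)],
}

def trace_root_cause(metric, anomalous):
    """Walk upstream edges, following only causes that are themselves anomalous."""
    chain = [metric]
    while metric in parents:
        causes = [c for c, _ in parents[metric] if c in anomalous]
        if not causes:
            break  # no anomalous upstream cause: we've reached the root
        metric = causes[0]
        chain.append(metric)
    return chain

anomalies = {"deal_count", "opportunity_count", "sql_to_opp_conversion"}
print(" <- ".join(trace_root_cause("revenue", anomalies)))
# revenue <- deal_count <- opportunity_count <- sql_to_opp_conversion
```

The walk stops at the first metric with no anomalous upstream cause, which is exactly the "root" in a root cause analysis.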
Impact Forecasting
Without a knowledge graph: "Based on historical patterns, revenue next quarter will be approximately $X." (But what if you're changing pricing? What if marketing budget is being cut? The forecast doesn't account for planned interventions.)
With a knowledge graph: "Revenue next quarter is projected at $X under current conditions. If marketing budget is reduced by 20%, the MQL impact will begin in 3 weeks, with pipeline impact in 7 weeks and revenue impact in 11 weeks, resulting in a projected revenue of $Y." (The forecast propagates the planned change through measured causal links.)
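Propagating a planned change is a walk down the graph, scaling by each edge's elasticity and accumulating each edge's lag. A minimal sketch, with invented elasticities but lags matching the 3/7/11-week example above:

```python
# Minimal propagation sketch: cause -> (effect, elasticity, lag_weeks).
# Elasticities are made up; lags mirror the 3/7/11-week example.
edges = {
    "marketing_budget": ("mql_volume", 0.6, 3),
    "mql_volume": ("pipeline", 0.7, 4),
    "pipeline": ("revenue", 0.9, 4),
}

def propagate(metric, pct_change, week=0):
    """Yield (metric, % change, week it materializes) down the causal chain."""
    yield metric, pct_change, week
    if metric in edges:
        effect, elasticity, lag = edges[metric]
        yield from propagate(effect, pct_change * elasticity, week + lag)

# A planned 20% budget cut, propagated through the chain.
for metric, pct, week in propagate("marketing_budget", -20.0):
    print(f"week {week:>2}: {metric} {pct:+.1f}%")
```

Each downstream impact arrives later and smaller than the intervention that caused it, which is exactly what a correlation-only forecast cannot express.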
Anomaly Attribution
Without a knowledge graph: "Support ticket volume spiked 45%. Other metrics that correlate with support tickets include deployment frequency, new user signups, and product updates."
With a knowledge graph: "Support ticket volume spiked 45%. The knowledge graph identifies that product update v3.2.1, released 2 days ago, is the most likely cause based on the established relationship between deployment events and ticket volume, with a measured 1-2 day lag. Concentration analysis shows 80% of the increase is in tickets tagged 'performance,' consistent with the deployment hypothesis."
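The measured 1-2 day lag is what narrows the candidate list. Here is a hypothetical sketch of that filtering step; the dates and event names are invented.

```python
# Hypothetical attribution filter: keep only events whose age at the
# spike falls inside the edge's measured lag window. Data is invented.
from datetime import date, timedelta

spike_day = date(2024, 3, 14)
events = {
    "product update v3.2.1": date(2024, 3, 12),
    "pricing page change": date(2024, 3, 1),
    "new signup campaign": date(2024, 3, 14),
}
lag_window = (timedelta(days=1), timedelta(days=2))  # measured 1-2 day lag

candidates = [name for name, day in events.items()
              if lag_window[0] <= spike_day - day <= lag_window[1]]
print(candidates)  # only the deployment falls inside the lag window
```

Events that are too old or too recent to explain the spike drop out automatically, instead of lingering as correlated noise.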
The Elasticity Difference
One of the most valuable properties encoded in a knowledge graph is elasticity — the measured sensitivity of one metric to changes in another.
Most analytics tools can tell you that marketing spend and pipeline are correlated. Very few can tell you: "A 1% increase in paid search spend historically produces a 0.6% increase in MQL volume, with a 2-week lag, and a resulting 0.4% increase in pipeline, with an additional 4-week lag."
That elasticity number — 0.6% MQL increase per 1% spend increase — is what makes quantitative planning possible. Without it, you're left with qualitative statements: "If we spend more on marketing, we'll probably get more pipeline." With it, you can model specific scenarios with specific expected outcomes.
Elasticities aren't constants. They change with scale (diminishing returns), with market conditions, and with competitive dynamics. Fig continuously recalculates elasticities from your latest data, so the knowledge graph reflects your current business reality rather than a static model built six months ago.
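For intuition on where a number like 0.6 comes from: an elasticity can be estimated as the slope of a log-log fit between lag-aligned series. A toy sketch with invented data, not Fig's actual estimator:

```python
# Toy elasticity estimate: slope of log(MQLs) vs. log(spend).
# Data is invented and already lag-aligned for simplicity.
import math

spend = [100, 120, 150, 180, 200, 240]  # paid-search spend per period
mqls = [50, 56, 64, 72, 77, 86]         # MQLs, shifted by the measured lag

xs = [math.log(s) for s in spend]
ys = [math.log(m) for m in mqls]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))

print(f"elasticity ~ {slope:.2f}")  # ~0.6% MQL change per 1% spend change
```

The log-log form is what makes the slope read directly as "percent change per percent change," which is the definition of elasticity.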
Why Lag Times Matter More Than You Think
Most business metrics have lag times between cause and effect. These lags are not just a measurement nuance — they fundamentally change how you interpret data and make decisions.
Without measured lag times, if you increase marketing spend in January and revenue grows in March, you might conclude that the marketing investment "worked." But if the lag time from marketing spend to revenue impact is 10 weeks, the March revenue growth was actually driven by something that happened in December — before the spend increase. The actual impact of the January spend increase won't show up until mid-March at the earliest.
With measured lag times, you know exactly when to look for the effect. You can correctly attribute the March revenue growth to the actual cause (December activity) and patiently wait for the January investment to show results at the right time.
This is not a theoretical distinction. It's the difference between correctly evaluating a strategy and abandoning a strategy that was actually working because you measured the results at the wrong time.
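One simple way to measure such a lag from history is to slide the effect series against the cause series and keep the offset with the strongest correlation. A toy sketch; real estimators are more careful about trends and seasonality.

```python
# Toy lag estimate: try each candidate offset and keep the one that
# maximizes Pearson correlation. The series are invented; the effect
# trails the cause by 2 periods.
cause = [1, 3, 2, 5, 4, 6, 5, 7, 6, 8]
effect = [0, 0, 1, 3, 2, 5, 4, 6, 5, 7]

def corr(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

best_lag = max(range(5), key=lambda k: corr(cause[:len(cause) - k], effect[k:]))
print(best_lag)  # the offset with the strongest alignment
```

With the lag in hand, every subsequent measurement of the edge compares the cause to the effect at the right offset, not at the same instant.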
Building vs. Assuming
The knowledge graph is not assumed. It's built from your data and validated against your business reality.
Fig constructs the knowledge graph through a combination of three inputs:
- Domain knowledge: Your team specifies the known causal relationships in your business. Revenue is driven by deal count and average deal size. Pipeline is driven by MQL volume and conversion rates. These are the structural relationships that domain experts know to be true.
- Statistical validation: Each specified relationship is tested against your historical data. Fig measures the strength (elasticity), timing (lag), and consistency of each causal link. Relationships that don't hold up in the data are flagged for review.
- Continuous refinement: As new data flows in, the knowledge graph's measurements are updated. Elasticities shift. Lag times change. New relationships emerge. The knowledge graph evolves with your business.
This approach combines the rigor of causal modeling with the adaptability of machine learning. The structure is informed by domain knowledge (which prevents the algorithm from "discovering" spurious correlations). The measurements are derived from data (which prevents the model from relying on outdated assumptions).
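The statistical-validation step can be pictured as a simple screen over the specified edges. A toy sketch with invented series and an arbitrary 0.7 threshold; a production system would use far more careful tests.

```python
# Toy validation pass: keep a domain-specified edge only if the
# lag-aligned correlation in historical data clears a threshold.
# Series, edges, and the 0.7 threshold are all illustrative.
def corr(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / ((sum((x - mx) ** 2 for x in xs) ** 0.5)
                  * (sum((y - my) ** 2 for y in ys) ** 0.5))

series = {
    "mql_volume": [10, 12, 11, 15, 14, 18, 17, 20],
    "sql_volume": [0, 0, 5, 6, 6, 8, 7, 9],   # trails MQLs by ~2 periods
    "blog_posts": [3, 1, 4, 1, 5, 9, 2, 6],   # noise
}
specified = [("mql_volume", "sql_volume", 2), ("blog_posts", "sql_volume", 2)]

for cause, effect, lag in specified:
    r = corr(series[cause][:-lag], series[effect][lag:])
    status = "validated" if abs(r) > 0.7 else "flagged for review"
    print(f"{cause} -> {effect}: r={r:.2f} ({status})")
```

Edges that survive the screen keep their measured elasticity and lag; edges that fail go back to the domain experts rather than silently shaping forecasts.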
The Decision-Making Difference
Ultimately, the distinction between correlation and causation matters because decisions require causal reasoning.
When you decide to increase marketing spend, you're making a causal claim: more spend will cause more pipeline. When you decide to change pricing, you're making a causal claim: a different price will cause different revenue behavior. When you decide to invest in customer success, you're making a causal claim: more CS investment will cause lower churn.
If your analytics system only provides correlations, you're making causal decisions based on non-causal evidence. Sometimes you'll get lucky. Sometimes you won't. And you'll have a hard time distinguishing between the two.
A knowledge graph doesn't guarantee correct decisions. But it makes the causal logic explicit, testable, and measurable. When a decision doesn't produce the expected result, you can trace the causal chain and identify which link broke — was the elasticity wrong? Was the lag time different? Was there a confounding factor?
That feedback loop — from hypothesis to measurement to refinement — is what turns analytics from pattern matching into genuine business intelligence. And it requires a causal model. It requires a knowledge graph.
Ready to see Fig in action?
Start with free credits. Connect your data warehouse. See your first causal analysis in minutes.
Start With Free Credits →