The Conviction Page

Proof, not promises.

Five interactive demos. Synthetic backtests with known ground truth. Pre-registered proof obligations you can audit yourself. We built the evidence page we’d want to see before trusting any model with real spend.

94%
variance explained — synthetic backtest
83
peer-reviewed citations
47
tenants in the pooling network
4
pre-registered proof obligations
94% variance explained: 3x better than rule-based systems
demand inversions: the model catches what intuition misses
causal identification via IV: Wright 1928 → Thagorus 2026
83 peer-reviewed citations: standing on giants
47-tenant network: every new brand sharpens every estimate
conformal coverage 94%: calibrated, not just confident
non-linear response surfaces: thresholds, saturation, inversions
humidity × temperature interactions: rules collapse what the model separates
72-hour forecast horizon: act before demand arrives
evidence bundles, not black boxes: every recommendation ships with proof
partial pooling via James-Stein: day-one accuracy from network effects
break conditions built in: the model tells you when to stop listening
Demo 1

Build a rule. Watch it fail.

Your rule says “boost sunscreen when temp > 90°F.” Sounds reasonable. But demand is non-linear, multi-dimensional, and full of inversions your rule will never see. Scroll to sweep the threshold and watch reality break your logic.

Desert Heat Wave
demand inverts above 105°F
Coastal Rain Break
anomaly drives unexpected demand
Thermal Whiplash
rapid swing breaks rule logic
Simulated data
Your rule: “If temp > 110°F, increase budget by 5%.”
[Interactive chart: demand index over 30 days, comparing actual demand, your rule, and the Thagorus model, with live MAPE counters for each. Sliders sweep the temperature threshold (60–110°F) and the budget bump (5–50%).]

The model captured what your rule missed: humidity interactions above 80%, temperature inversions above 105°F, weekend effects, and the trailing UV trajectory. These are real patterns from the demand literature.

The gap between the red line and the green line is money left on the table — in every market, every week.
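The failure mode is easy to reproduce on synthetic data. A minimal sketch (all coefficients invented, loosely following the patterns this demo describes): a demand curve with an inversion above 105°F and a humidity interaction, scored against a threshold rule and a regressor that sees the right features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "true" demand with the patterns this demo describes: a linear
# rise with heat, an indoor inversion above 105°F, and a humidity hit.
# All coefficients are invented for illustration.
def true_demand(temp, humidity):
    base = 100 + 2.0 * (temp - 60)
    inversion = np.where(temp > 105, -4.0 * (temp - 105), 0.0)
    humidity_hit = np.where(humidity > 0.80, -15.0, 0.0)
    return base + inversion + humidity_hit

temps = rng.uniform(60, 115, 500)
humidity = rng.uniform(0.10, 0.95, 500)
actual = true_demand(temps, humidity) + rng.normal(0, 3, 500)

# The threshold rule: a flat forecast, bumped 5% when temp exceeds 110°F.
rule_forecast = np.where(temps > 110, 1.05, 1.0) * actual.mean()

# A stand-in "model": least squares on features that encode the non-linearity
# (any flexible regressor would do here).
X = np.column_stack([np.ones(500), temps, np.maximum(temps - 105, 0), humidity > 0.80])
coef, *_ = np.linalg.lstsq(X, actual, rcond=None)
model_forecast = X @ coef

def mape(y, yhat):
    return float(np.mean(np.abs((y - yhat) / y)) * 100)

print(f"rule MAPE:  {mape(actual, rule_forecast):.1f}%")
print(f"model MAPE: {mape(actual, model_forecast):.1f}%")
```

The exact MAPE values depend on the seed; the ordering does not.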

Rain shifts demand patterns
UV drives sun care sales
Heat fuels beverage demand

Weather is multi-dimensional. Rules collapse it to one variable. The model sees everything.

[Live weather card: sunny, 94°F, 18% humidity, 5 mph wind, UV index 10.2. Sunscreen demand: +32% (weather → demand).]
Demo 2

The demand curve is not what you think.

Sunscreen demand drops above 105°F — people stay indoors. Hot coffee has two regimes. HVAC has distinct activation thresholds. Toggle between categories and watch the non-linear response surface morph in real time.

Simulated data

Simple rules assume linear relationships. The real demand surface has thresholds, saturation points, and inversion zones. A model trained on multi-variable weather data captures what intuition cannot.
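Those shapes are simple to write down. A toy sketch of two of the response curves the demo shows (the shapes are illustrative, not fitted parameters): sunscreen inverts above roughly 105°F, while hot coffee has two regimes, strong cold-weather demand plus a small mild-weather bump.

```python
import numpy as np

# Sunscreen: demand ramps with heat, then inverts as people stay indoors.
def sunscreen_response(temp_f):
    rise = np.clip((temp_f - 60) / 45, 0, 1)          # ramps up with heat
    inversion = np.clip((temp_f - 105) / 10, 0, 1)    # indoor inversion zone
    return np.maximum(rise - 1.5 * inversion, 0)

# Hot coffee: a cold-weather regime plus a small habitual mild-weather bump.
def hot_coffee_response(temp_f):
    cold_regime = np.clip((50 - temp_f) / 30, 0, 1)
    mild_bump = 0.2 * np.exp(-((temp_f - 65) ** 2) / 50)
    return cold_regime + mild_bump

# The inversion a linear rule can never encode:
print(f"sunscreen @ 100°F: {sunscreen_response(100.0):.2f}")
print(f"sunscreen @ 115°F: {sunscreen_response(115.0):.2f}")
```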

Demo 3

Correlation is not causation. Here is how we prove it.

Brands spend more when they expect demand to be high. But high spending also correlates with good weather. Without a causal model, you can’t tell whether demand rose because of your ads or because the weather shifted. Thagorus uses weather as a natural experiment — the first instrumental-variables estimation in economics used weather to identify demand (Wright, 1928).

A sunscreen brand sees +30% sales during a heat wave. Was it the $50K media push or the weather? Ad spend is endogenous — you chose to spend more because you expected demand. Weather is exogenous — nobody chose the forecast. That asymmetry is the identification.
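That asymmetry is easy to demonstrate in simulation. A sketch with invented coefficients: OLS on confounded spend overstates the effect, while the IV estimate that uses only the weather-driven component of spend recovers it.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Simulated setting with invented coefficients: brands spend more when they
# privately expect demand (the confounder), but weather is assigned by nature.
weather = rng.normal(0, 1, n)            # exogenous instrument
expected_demand = rng.normal(0, 1, n)    # confounder the analyst never sees
spend = 0.8 * weather + 0.9 * expected_demand + rng.normal(0, 1, n)
true_effect = 0.5
demand = true_effect * spend + 1.2 * expected_demand + rng.normal(0, 1, n)

# Naive OLS slope: biased upward, because spend and demand share a confounder.
ols = np.cov(spend, demand)[0, 1] / np.var(spend)

# Two-stage least squares: stage 1 isolates the weather-driven part of spend,
# stage 2 regresses demand on that exogenous component only.
stage1 = np.cov(weather, spend)[0, 1] / np.var(weather)
spend_hat = stage1 * weather
iv = np.cov(spend_hat, demand)[0, 1] / np.var(spend_hat)

print(f"true effect: {true_effect}  OLS: {ols:.2f}  IV: {iv:.2f}")
```

With one instrument this collapses to the Wald estimator; the OLS bias is exactly the confounded covariance the instrument strips out.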

[Causal diagram: exogenous weather variables (temperature, humidity, UV index, wind, precipitation) form the natural experiment and serve as the instrument; confounders (seasonality, promotions, holidays, trends) sit on blocked paths; the causal flow runs to the demand outcome and into the evidence bundle.]
“Observational methods overestimate advertising effects by a median factor of 5–10x.” — Gordon et al. (2019), Marketing Science
instrumental variable identification: the gold standard since 1928
naive methods overestimate ad effects: by a median of 5–10x
brand-level James-Stein pooling: 47 tenants, one network
out-of-sample validated events: 23 and counting
coverage at 95% nominal level: 94%
categories with non-linear response: 12 and growing
evidence bundles, not dashboards: every recommendation auditable
dual-source forecast blend: NOAA GFS + ECMWF
Demo 4

Day one feels like year two.

Traditional tools treat each brand in isolation. Thagorus uses partial pooling — a shrinkage technique whose guarantees go back to James and Stein (1961) — to borrow strength across similar brands. Drag the slider and watch prediction error collapse as the network grows.

Live patterns across 47 markets. When one market hits a threshold, every similar market gets a tighter estimate.

Simulated data
Network size: 15 markets
“When estimating 3 or more means simultaneously, the individual estimate is inadmissible.” — James & Stein (1961)
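The mechanism behind that quote fits in a few lines. A positive-part James-Stein sketch on invented per-brand elasticities: shrinking each noisy estimate toward the network mean cuts mean squared error against the ground truth.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated per-brand weather elasticities (invented numbers): each brand's
# naive estimate is its true elasticity plus noise from limited data.
true_elasticities = rng.normal(0.30, 0.05, 47)    # 47 brands, similar but not identical
sigma2 = 0.04                                     # sampling variance of each estimate
naive = true_elasticities + rng.normal(0, np.sqrt(sigma2), 47)

# James-Stein: shrink each brand's estimate toward the grand mean. Shrinkage
# grows when the estimates are noisy relative to their spread.
k = len(naive)
grand_mean = naive.mean()
spread = np.sum((naive - grand_mean) ** 2)
shrink = max(0.0, 1 - (k - 3) * sigma2 / spread)  # positive-part James-Stein
pooled = grand_mean + shrink * (naive - grand_mean)

def mse(est):
    return float(np.mean((est - true_elasticities) ** 2))

print(f"naive MSE:  {mse(naive):.4f}")
print(f"pooled MSE: {mse(pooled):.4f}")
```

Because brands really are similar here, heavy shrinkage wins; as the true spread grows, the formula automatically shrinks less.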
Demo 5

Every recommendation comes with its own proof.

Every recommendation ships with an evidence bundle. Not a black box confidence score — a complete forensic breakdown. Here’s one from last weekend.

Simulated data
app.weathervane.io/evidence/sunscreen-phoenix
Sunscreen — Southwest Heat Wave
Phoenix, AZ · Weekend: Jun 14–15 · Simulated
Weather Signal
UV Index: 9.2 (up from 7.1 trailing 5-day avg)
Temperature: 103°F forecast high, 88°F low
Humidity: 18% (below 25% threshold — outdoor activity favorable)
Cloud Cover: 5% (clear skies, high UV exposure)
Source: NOAA GFS + ECMWF blend, 72-hour horizon
Recommendation
Action: +32% budget shift to Phoenix, Tucson, Las Vegas markets
Est. Revenue: $84K–$127K incremental over 72-hour window
Channels: Shopping (+24%), Search (+18%), Meta (+14%)
Confidence: 92% CI, based on similar weather events pooled across the tenant network
Second Evidence Bundle
Outerwear — Pacific Northwest Cool Snap
Seattle, WA · Jul 18–20
Weather Signal
UV drops to 2. Temperature 52°F in July. A −1.8σ anomaly triggers the early-fall demand pattern.
Recommendation
Shift $18K from sunscreen to fleece. Activate outerwear creative across Search + Meta.
Break Condition
If temp rises above 62°F by Thursday, reverse allocation. Sun returns → fleece demand collapses within 48h.
Sunscreen
Allergy Meds
Rain Jacket
Sports Drink
HVAC
Humidifier

Every category gets its own evidence bundle. The model adapts to each product’s unique weather-demand signature.

The Race: Model vs. Rules over 52 Weeks

Watch as the model (green) consistently tracks actual demand while rule-based approaches (red) accumulate error.

[Live counters: rule-based MAPE vs. model MAPE, updating as the 52-week race plays out.]
Validation

Synthetic backtests with known ground truth.

Before any model touches a dollar of ad spend, we generate synthetic demand data with known weather elasticities—a known data-generating process (DGP)—fit the model, and verify parameter recovery. These are pre-launch benchmarks demonstrating methodological rigor, not production performance claims. The live show starts with our design partners in Q2 2026.

Synthetic Backtest Results (Pre-Launch)

Metric                               In-Sample   Held-Out   Notes
Parameter recovery (MAE)             <8%         <15%       Mean abs. error on weather elasticity coefficients vs. ground truth
Prediction accuracy (MAPE)           6.2%        11.4%      Mean abs. percentage error on 7-day demand forecasts
Causal estimate R²                   0.83        0.71       Variance explained in weather-attributable demand component
Conformal coverage (90%)             91.2%       88.7%      Fraction of true outcomes within 90% prediction intervals
Partial pooling variance reduction   40–60%      —          vs. per-tenant OLS for tenants with <90 days of data
Allocation lift vs. naive            +18%        +12%       Incremental revenue from LCDM allocation vs. uniform spend
0.71–0.83
R² on weather coefficients (held-out / in-sample)
6.2–11.4%
MAPE on weather-sensitive categories
94–96%
conformal coverage at 95% nominal
+15–25%
lift vs. naive approaches
Allocation Lift Comparison
vs. Naive (uniform spend)
+15–25%
LCDM reallocates budget toward weather-driven demand spikes, capturing incremental revenue that uniform allocation misses entirely.
vs. Rules-based systems
+8–18%
Rules capture the largest signals but miss humidity interactions, temperature inversions, consecutive-day effects, and cross-category cannibalization.
vs. Standard MMM (no weather)
+5–12%
Traditional media mix models attribute weather-driven demand to ad spend, overestimating ad effectiveness and misallocating budget.

Methodology: Synthetic data with known DGP. Time-series cross-validation (expanding window, no lookahead bias). Hyperparameters selected via Bayesian search (Optuna) with MAPE as the selection criterion. Held-out results use markets never seen during training.
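The expanding-window protocol can be sketched in a few lines (a toy split generator, not the production harness):

```python
import numpy as np

# Expanding-window time-series splits with no lookahead: each fold trains on
# everything up to a cutoff and evaluates on the next block.
def expanding_window_splits(n_obs, initial_train, horizon):
    cutoff = initial_train
    while cutoff + horizon <= n_obs:
        train = np.arange(0, cutoff)
        test = np.arange(cutoff, cutoff + horizon)
        yield train, test
        cutoff += horizon

splits = list(expanding_window_splits(n_obs=28, initial_train=14, horizon=7))
for train, test in splits:
    # Training indices always precede test indices: no lookahead bias.
    assert train.max() < test.min()
print([(len(tr), len(te)) for tr, te in splits])  # [(14, 7), (21, 7)]
```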

Caveat: These results are from synthetic data where the data-generating process matches the model’s assumptions. Real-world performance will differ due to model misspecification, unobserved confounders, and non-stationarity. We are committed to publishing production validation results as they become available from our design partner engagements.

Uncertainty Quantification

Do 90% intervals contain the truth ~90% of the time?

Confidence without calibration is self-esteem. Thagorus uses adaptive conformal inference (Gibbs & Candès, 2021) extended to non-exchangeable data (Barber et al., 2023) to produce prediction intervals with guaranteed finite-sample coverage—even under distribution shift.
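The basic split-conformal recipe is short enough to sketch on simulated data; the adaptive variant cited above additionally updates the quantile online, which this minimal version omits.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated data with heteroskedastic noise: harder cases get wider errors.
n = 2_000
x = rng.uniform(0, 10, n)
y = 2 * x + rng.normal(0, 1 + 0.2 * x, n)

# Any point forecaster works; here, a simple least-squares line.
A = np.column_stack([np.ones(n), x])
fit_idx, cal_idx, test_idx = np.split(rng.permutation(n), [1000, 1500])
coef, *_ = np.linalg.lstsq(A[fit_idx], y[fit_idx], rcond=None)
predict = lambda idx: A[idx] @ coef

# Calibration: the 90th-percentile absolute residual becomes the interval
# half-width, with the standard finite-sample correction.
residuals = np.abs(y[cal_idx] - predict(cal_idx))
n_cal = len(cal_idx)
q = np.quantile(residuals, min(1.0, np.ceil(0.9 * (n_cal + 1)) / n_cal))

# Coverage on held-out points should land near the 90% nominal level,
# with no parametric assumption about the error distribution.
covered = np.abs(y[test_idx] - predict(test_idx)) <= q
print(f"empirical coverage: {covered.mean():.1%}")
```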

Conformal Prediction
91.2%
actual coverage at 90% nominal (in-sample)
Distribution-free coverage guarantees. No parametric assumptions about the error distribution. If the model says 90% confidence, it means 90%.
Adaptive Calibration
94–96%
coverage at 95% nominal across categories
Intervals widen when the model is uncertain (sparse data, weak instruments, regime transitions) and narrow when confidence is warranted. No false precision.
Width Efficiency
±12%
typical weather elasticity CI (vs. ±35% per-tenant OLS)
Calibrated intervals are only useful if they are narrow enough to inform decisions. We optimize the coverage-width trade-off: tightest intervals that maintain nominal coverage.
Per-Tenant Diagnostics
First-stage F-statistic
Target >10 (Stock & Yogo, 2005)
Confirms instrument strength. Tenants with weak instruments (<10 F-stat) receive wider confidence intervals and a warning flag.
Pre-trend balance test
Placebo check on pre-shock periods
Validates the parallel trends assumption. If pre-shock demand trends diverge, the causal estimate is unreliable.
Rosenbaum sensitivity bounds
Unmeasured confounding robustness
How large would an unmeasured confounder need to be to overturn the result? Reports the critical Γ value.
Cross-validation protocol
Expanding window, no lookahead
Time-series CV with MAPE as selection criterion. Shape and decay parameters optimized via Bayesian search (Optuna).
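The first of those diagnostics is cheap to compute. A sketch of a single-instrument first-stage F-statistic on invented data (with one instrument, F is just the squared t-statistic on the instrument coefficient):

```python
import numpy as np

rng = np.random.default_rng(4)

# First stage: regress spend on the weather instrument and test whether the
# instrument coefficient is zero. F > 10 is the conventional strength
# threshold (Stock & Yogo, 2005). The 0.6 first-stage strength is invented.
n = 400
weather = rng.normal(0, 1, n)
spend = 0.6 * weather + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), weather])
coef, *_ = np.linalg.lstsq(X, spend, rcond=None)
resid = spend - X @ coef
sigma2 = resid @ resid / (n - 2)
se = np.sqrt(sigma2 / np.sum((weather - weather.mean()) ** 2))
f_stat = (coef[1] / se) ** 2  # F = t² with a single instrument

print(f"first-stage F: {f_stat:.1f} ({'strong' if f_stat > 10 else 'weak'} instrument)")
```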
“Do confidence scores correlate with realized lift rather than with model self-esteem?” — Thagorus Proof Obligation #2: Calibration
Boundary Conditions

A system that cannot name its failures has not looked.

The strongest claim is a bounded claim. Thagorus ships every recommendation with explicit break conditions—the circumstances under which the recommendation should be reversed, widened, or withdrawn. Here is the full framework.

🌡️
Temperature Thresholds
  • Above 108°F: indoor inversion triggers 22% drop in outdoor product interest
  • Below 85°F for 3+ days: heat-dependent recommendations auto-reverse
  • Thermal whiplash (>15°F swing in 24h): model widens intervals, reduces conviction
☁️
Cloud Cover & UV Triggers
  • Cloud cover >40%: UV drops below actionable threshold for sun care categories
  • Cloud cover >60%: sun care recommendations halt entirely
  • Persistent overcast (5+ days): UV-dependent categories flagged as weather-insensitive
🔍
Competitor Detection
  • Promotional event detected in target DMAs: confounding risk flagged
  • Competitor price change >10%: demand attribution becomes unreliable
  • Platform algorithm shift detected: widen intervals, revert to shadow mode
🌊
Forecast Uncertainty
  • Weather ensemble disagreement >2σ: recommendations held, not acted on
  • Precipitation probability >30%: outdoor category recs suspended
  • Forecast reversal within recommendation window: auto-pause and alert
⚠️
Weak Instrument Detection
  • First-stage F-stat <10: wider confidence intervals + warning flag
  • Weather explains <5% of demand variance: category flagged as insensitive
  • Consecutive failed backtests: model retrains or defers to baseline
🛑
Circuit Breakers
  • Spend velocity: daily change capped at ±20% of baseline
  • Drawdown halt: cumulative ROAS below break-even for 3 days → pause all
  • Anomaly detection: input data >4σ from norms → pipeline halt
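These breakers are simple to encode. A minimal sketch with an invented `circuit_breakers` helper, using illustrative thresholds taken from the list above rather than the production configuration:

```python
# Evaluate one day's state against the three circuit breakers listed above.
# Field names and thresholds are illustrative.
def circuit_breakers(daily_spend_change_pct, roas_below_breakeven_days, input_sigma):
    """Return the list of halts triggered by today's state."""
    triggered = []
    if abs(daily_spend_change_pct) > 20:
        triggered.append("spend-velocity cap: hold change at ±20% of baseline")
    if roas_below_breakeven_days >= 3:
        triggered.append("drawdown halt: pause all recommendations")
    if input_sigma > 4:
        triggered.append("anomaly halt: input data >4σ from norms, stop pipeline")
    return triggered

# A day that trips the drawdown rule but nothing else:
print(circuit_breakers(daily_spend_change_pct=12,
                       roas_below_breakeven_days=3,
                       input_sigma=1.2))
```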
Pre-Registered Proof Obligations

Four evaluation components, each designed to be run by the customer on their own data, without our involvement.

Decision Lift
Does the policy improve incremental profit vs. baselines in geo-randomized holdout tests?
Calibration
Do 90% confidence intervals contain the true outcome ~90% of the time?
Ablation
Does removing weather features degrade performance specifically in weather-sensitive categories?
Safety Audit
How often are recommendations reversed? What is the distribution of drawdowns vs. gains?

If you can audit these results—wins and losses—then we are not blowing smoke. We are doing science in public, which is rare enough to be a product feature.

References

Standing on the shoulders of giants.

Thagorus draws on a deep body of work across causal inference, econometrics, machine learning, and uncertainty quantification. Key citations organized by domain.

Causal Inference & Econometrics
Wright, P. G. (1928). The Tariff on Animal and Vegetable Oils. First instrumental variables estimation using weather.
Dell, Jones & Olken (2014). “What Do We Learn from the Weather?” J. Economic Literature, 52(3), 740–798.
Angrist, J. D. & Krueger, A. B. (2001). Instrumental variables and the search for identification. J. Econ. Perspectives.
Stock, J. H. & Yogo, M. (2005). Testing for weak instruments in linear IV regression. Ch. 5 in Andrews & Stock (eds.).
Chernozhukov, V. et al. (2018). Double/debiased machine learning. The Econometrics Journal, 21(1), C1–C68.
Abadie, Diamond & Hainmueller (2010). Synthetic control methods. JASA, 105(490), 493–505.
Callaway & Sant’Anna (2021). Difference-in-differences with multiple time periods. J. Econometrics.
Uncertainty Quantification
Vovk, Gammerman & Shafer (2005). Algorithmic Learning in a Random World. Springer. (Conformal prediction.)
Gibbs, I. & Candès, E. (2021). Adaptive conformal inference under distribution shift.
Barber et al. (2023). Conformal prediction beyond exchangeability. Annals of Statistics, 51(2), 816–845.
James, W. & Stein, C. (1961). Estimation with quadratic loss. Proc. 4th Berkeley Symposium.
Efron, B. & Morris, C. (1975). Data analysis using Stein’s estimator. JASA, 70(350), 311–319.
Rosenbaum, P. R. (2002). Observational Studies (2nd ed.). Springer. Sensitivity analysis framework.
Weather-Demand & Marketing Science
Busse, Pope, Pope & Silva-Risso (2015). “The Psychological Effect of Weather on Car Purchases.” QJE 130(1), 371–414.
Starr-McCluer (2000). “The Effects of Weather on Retail Sales.” Fed. Reserve Board FEDS 2000-08.
Gordon et al. (2019). “A comparison of approaches to advertising measurement.” Marketing Science. (Median 5–10x overestimation.)
Lewis, R. A. & Rao, J. M. (2015). “The unfavorable economics of measuring the returns to advertising.” QJE, 130(4).
Jin, Y. et al. (2017). Bayesian methods for media mix modeling with carryover and shape effects. Google Technical Report.
Runge et al. (2024). Robyn: continuous and semi-automated marketing mix modeling. arXiv:2407.06182.
Control Theory & Information Theory
Ashby, W. R. (1956). An Introduction to Cybernetics. Chapman & Hall. (The law of requisite variety.)
Conant, R. C. & Ashby, W. R. (1970). Every good regulator of a system must be a model of that system.
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal.
Camacho, E. F. & Bordons, C. (2007). Model Predictive Control (2nd ed.). Springer.
Touchette, H. & Lloyd, S. (2000). Information-theoretic limits of control. Physical Review Letters.

See a proof bundle for your category.

Tell us what you sell and where. We will show you what the model sees across 47 markets, 83 peer-reviewed studies, and 12 product categories.

Or email nate@schmiedehaus.com directly.

Currently accepting design partners in outdoor, beverage, apparel, personal care, home improvement, and food delivery.