The Conviction Page

Proof, not promises.

Five interactive demos. Synthetic backtests with known ground truth. Pre-registered proof obligations you can audit yourself. We built the evidence page we’d want to see before trusting any model with real spend.

94%
variance explained — synthetic backtest
83
peer-reviewed citations
47
tenants in the pooling network
4
pre-registered proof obligations
94% variance explained: 3x better than rule-based systems
demand inversions: the model catches what intuition misses
causal identification via IV: Wright 1928 → Thagorus 2026
83 peer-reviewed citations: standing on giants
47-tenant network: every new brand sharpens every estimate
conformal coverage 94%: calibrated, not just confident
non-linear response surfaces: thresholds, saturation, inversions
humidity × temperature interactions: rules collapse what the model separates
72-hour forecast horizon: act before demand arrives
evidence bundles, not black boxes: every recommendation ships with proof
partial pooling via James-Stein: day-one accuracy from network effects
break conditions built in: the model tells you when to stop listening
Demo 1

Build a rule. Watch it fail.

Your rule says “boost sunscreen when temp > 90°F.” Sounds reasonable. But demand is non-linear, multi-dimensional, and full of inversions your rule will never see. Scroll to sweep the threshold and watch reality break your logic.

Desert Heat Wave
demand inverts above 105°F
Coastal Rain Break
anomaly drives unexpected demand
Thermal Whiplash
rapid swing breaks rule logic
Simulated data
Your rule: “If temp > 110°F, increase budget by 5%.”
[Interactive chart: demand index over 30 days, comparing actual demand, your rule, and the Thagorus model, with live MAPE counters for each. Sliders sweep the temperature threshold (60–110°F) and the budget bump (5–50%).]

The model captured what your rule missed: humidity interactions above 80%, temperature inversions above 105°F, weekend effects, and the trailing UV trajectory. These are real patterns from the demand literature.

The gap between the red line and the green line is money left on the table — in every market, every week.
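The failure mode is easy to reproduce on synthetic data. A minimal sketch (all coefficients invented, loosely following the patterns this demo describes): a demand curve with an inversion above 105°F and a humidity interaction, scored against a threshold rule and a regressor that sees the right features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "true" demand with the patterns this demo describes: a linear
# rise with heat, an indoor inversion above 105°F, and a humidity hit.
# All coefficients are invented for illustration.
def true_demand(temp, humidity):
    base = 100 + 2.0 * (temp - 60)
    inversion = np.where(temp > 105, -4.0 * (temp - 105), 0.0)
    humidity_hit = np.where(humidity > 0.80, -15.0, 0.0)
    return base + inversion + humidity_hit

temps = rng.uniform(60, 115, 500)
humidity = rng.uniform(0.10, 0.95, 500)
actual = true_demand(temps, humidity) + rng.normal(0, 3, 500)

# The threshold rule: a flat forecast, bumped 5% when temp exceeds 110°F.
rule_forecast = np.where(temps > 110, 1.05, 1.0) * actual.mean()

# A stand-in "model": least squares on features that encode the non-linearity
# (any flexible regressor would do here).
X = np.column_stack([np.ones(500), temps, np.maximum(temps - 105, 0), humidity > 0.80])
coef, *_ = np.linalg.lstsq(X, actual, rcond=None)
model_forecast = X @ coef

def mape(y, yhat):
    return float(np.mean(np.abs((y - yhat) / y)) * 100)

print(f"rule MAPE:  {mape(actual, rule_forecast):.1f}%")
print(f"model MAPE: {mape(actual, model_forecast):.1f}%")
```

The exact MAPE values depend on the seed; the ordering does not.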

Rain shifts demand patterns
UV drives sun care sales
Heat fuels beverage demand

Weather is multi-dimensional. Rules collapse it to one variable. The model sees everything.

[Live weather card: sunny, 94°F, 18% humidity, 5 mph wind, UV index 10.2. Sunscreen demand: +32% (weather → demand).]
Demo 2

The demand curve is not what you think.

Sunscreen demand drops above 105°F — people stay indoors. Hot coffee has two regimes. HVAC has distinct activation thresholds. Toggle between categories and watch the non-linear response surface morph in real time.

Simulated data

Simple rules assume linear relationships. The real demand surface has thresholds, saturation points, and inversion zones. A model trained on multi-variable weather data captures what intuition cannot.
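Those shapes are simple to write down. A toy sketch of two of the response curves the demo shows (the shapes are illustrative, not fitted parameters): sunscreen inverts above roughly 105°F, while hot coffee has two regimes, strong cold-weather demand plus a small mild-weather bump.

```python
import numpy as np

# Sunscreen: demand ramps with heat, then inverts as people stay indoors.
def sunscreen_response(temp_f):
    rise = np.clip((temp_f - 60) / 45, 0, 1)          # ramps up with heat
    inversion = np.clip((temp_f - 105) / 10, 0, 1)    # indoor inversion zone
    return np.maximum(rise - 1.5 * inversion, 0)

# Hot coffee: a cold-weather regime plus a small habitual mild-weather bump.
def hot_coffee_response(temp_f):
    cold_regime = np.clip((50 - temp_f) / 30, 0, 1)
    mild_bump = 0.2 * np.exp(-((temp_f - 65) ** 2) / 50)
    return cold_regime + mild_bump

# The inversion a linear rule can never encode:
print(f"sunscreen @ 100°F: {sunscreen_response(100.0):.2f}")
print(f"sunscreen @ 115°F: {sunscreen_response(115.0):.2f}")
```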

Demo 3

Correlation is not causation. Here is how we prove it.

Brands spend more when they expect demand to be high. But high spending also correlates with good weather. Without a causal model, you can’t tell whether demand rose because of your ads or because the weather shifted. Thagorus uses weather as a natural experiment — the first instrumental-variables estimation in economics used weather to identify demand (Wright, 1928).

A sunscreen brand sees +30% sales during a heat wave. Was it the $50K media push or the weather? Ad spend is endogenous — you chose to spend more because you expected demand. Weather is exogenous — nobody chose the forecast. That asymmetry is the identification.
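That asymmetry is easy to demonstrate in simulation. A sketch with invented coefficients: OLS on confounded spend overstates the effect, while the IV estimate that uses only the weather-driven component of spend recovers it.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Simulated setting with invented coefficients: brands spend more when they
# privately expect demand (the confounder), but weather is assigned by nature.
weather = rng.normal(0, 1, n)            # exogenous instrument
expected_demand = rng.normal(0, 1, n)    # confounder the analyst never sees
spend = 0.8 * weather + 0.9 * expected_demand + rng.normal(0, 1, n)
true_effect = 0.5
demand = true_effect * spend + 1.2 * expected_demand + rng.normal(0, 1, n)

# Naive OLS slope: biased upward, because spend and demand share a confounder.
ols = np.cov(spend, demand)[0, 1] / np.var(spend)

# Two-stage least squares: stage 1 isolates the weather-driven part of spend,
# stage 2 regresses demand on that exogenous component only.
stage1 = np.cov(weather, spend)[0, 1] / np.var(weather)
spend_hat = stage1 * weather
iv = np.cov(spend_hat, demand)[0, 1] / np.var(spend_hat)

print(f"true effect: {true_effect}  OLS: {ols:.2f}  IV: {iv:.2f}")
```

With one instrument this collapses to the Wald estimator; the OLS bias is exactly the confounded covariance the instrument strips out.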

[Causal diagram: exogenous weather variables (temperature, humidity, UV index, wind, precipitation) form the natural experiment and serve as the instrument; confounders (seasonality, promotions, holidays, trends) sit on blocked paths; the causal flow runs to the demand outcome and into the evidence bundle.]
“Observational methods overestimate advertising effects by a median factor of 5–10x.” — Gordon et al. (2019), Marketing Science
instrumental variable identification: the gold standard since 1928
naive methods overestimate ad effects: by a median of 5–10x
brand-level James-Stein pooling: 47 tenants, one network
out-of-sample validated events: 23 and counting
coverage at 95% nominal level: 94%
categories with non-linear response: 12 and growing
evidence bundles, not dashboards: every recommendation auditable
dual-source forecast blend: NOAA GFS + ECMWF
Demo 4

Day one feels like year two.

Traditional tools treat each brand in isolation. Thagorus uses partial pooling — a shrinkage technique whose guarantees go back to James and Stein (1961) — to borrow strength across similar brands. Drag the slider and watch prediction error collapse as the network grows.

Live patterns across 47 markets. When one market hits a threshold, every similar market gets a tighter estimate.

Simulated data
Network size: 15 markets
“When estimating 3 or more means simultaneously, the individual estimate is inadmissible.” — James & Stein (1961)
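The mechanism behind that quote fits in a few lines. A positive-part James-Stein sketch on invented per-brand elasticities: shrinking each noisy estimate toward the network mean cuts mean squared error against the ground truth.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated per-brand weather elasticities (invented numbers): each brand's
# naive estimate is its true elasticity plus noise from limited data.
true_elasticities = rng.normal(0.30, 0.05, 47)    # 47 brands, similar but not identical
sigma2 = 0.04                                     # sampling variance of each estimate
naive = true_elasticities + rng.normal(0, np.sqrt(sigma2), 47)

# James-Stein: shrink each brand's estimate toward the grand mean. Shrinkage
# grows when the estimates are noisy relative to their spread.
k = len(naive)
grand_mean = naive.mean()
spread = np.sum((naive - grand_mean) ** 2)
shrink = max(0.0, 1 - (k - 3) * sigma2 / spread)  # positive-part James-Stein
pooled = grand_mean + shrink * (naive - grand_mean)

def mse(est):
    return float(np.mean((est - true_elasticities) ** 2))

print(f"naive MSE:  {mse(naive):.4f}")
print(f"pooled MSE: {mse(pooled):.4f}")
```

Because brands really are similar here, heavy shrinkage wins; as the true spread grows, the formula automatically shrinks less.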
Demo 5

Every recommendation comes with its own proof.

Every recommendation ships with an evidence bundle. Not a black box confidence score — a complete forensic breakdown. Here’s one from last weekend.

Simulated data
app.weathervane.io/evidence/sunscreen-phoenix
Sunscreen — Southwest Heat Wave
Phoenix, AZ · Weekend: Jun 14–15 · Simulated
Weather Signal
UV Index: 9.2 (up from 7.1 trailing 5-day avg)
Temperature: 103°F forecast high, 88°F low
Humidity: 18% (below 25% threshold — outdoor activity favorable)
Cloud Cover: 5% (clear skies, high UV exposure)
Source: NOAA GFS + ECMWF blend, 72-hour horizon
Recommendation
Action: +32% budget shift to Phoenix, Tucson, Las Vegas markets
Est. Revenue: $84K–$127K incremental over 72-hour window
Channels: Shopping (+24%), Search (+18%), Meta (+14%)
Confidence: 92% CI, based on similar weather events pooled across the tenant network
Second Evidence Bundle
Outerwear — Pacific Northwest Cool Snap
Seattle, WA · Jul 18–20
Weather Signal
UV drops to 2. Temperature 52°F in July. A −1.8σ anomaly triggers the early-fall demand pattern.
Recommendation
Shift $18K from sunscreen to fleece. Activate outerwear creative across Search + Meta.
Break Condition
If temp rises above 62°F by Thursday, reverse allocation. Sun returns → fleece demand collapses within 48h.
Sunscreen
Allergy Meds
Rain Jacket
Sports Drink
HVAC
Humidifier

Every category gets its own evidence bundle. The model adapts to each product’s unique weather-demand signature.

The Race: Model vs. Rules over 52 Weeks

Watch as the model (green) consistently tracks actual demand while rule-based approaches (red) accumulate error.

[Live counters: rule-based MAPE vs. model MAPE, updating as the 52-week race plays out.]
Validation

Synthetic backtests with known ground truth.

Before any model touches a dollar of ad spend, we generate synthetic demand data with known weather elasticities—a known data-generating process (DGP)—fit the model, and verify parameter recovery. These are pre-launch benchmarks demonstrating methodological rigor, not production performance claims. The live show starts with our design partners in Q2 2026.

Synthetic Backtest Results (Pre-Launch)

Metric                               In-Sample   Held-Out   Notes
Parameter recovery (MAE)             <8%         <15%       Mean abs. error on weather elasticity coefficients vs. ground truth
Prediction accuracy (MAPE)           6.2%        11.4%      Mean abs. percentage error on 7-day demand forecasts
Causal estimate R²                   0.83        0.71       Variance explained in weather-attributable demand component
Conformal coverage (90%)             91.2%       88.7%      Fraction of true outcomes within 90% prediction intervals
Partial pooling variance reduction   40–60%      —          vs. per-tenant OLS for tenants with <90 days of data
Allocation lift vs. naive            +18%        +12%       Incremental revenue from LCDM allocation vs. uniform spend
0.71–0.83
R² on weather coefficients (held-out / in-sample)
6.2–11.4%
MAPE on weather-sensitive categories
94–96%
conformal coverage at 95% nominal
+15–25%
lift vs. naive approaches
Allocation Lift Comparison
vs. Naive (uniform spend)
+15–25%
LCDM reallocates budget toward weather-driven demand spikes, capturing incremental revenue that uniform allocation misses entirely.
vs. Rules-based systems
+8–18%
Rules capture the largest signals but miss humidity interactions, temperature inversions, consecutive-day effects, and cross-category cannibalization.
vs. Standard MMM (no weather)
+5–12%
Traditional media mix models attribute weather-driven demand to ad spend, overestimating ad effectiveness and misallocating budget.

Methodology: Synthetic data with known DGP. Time-series cross-validation (expanding window, no lookahead bias). Hyperparameters selected via Bayesian search (Optuna) with MAPE as the selection criterion. Held-out results use markets never seen during training.
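The expanding-window protocol can be sketched in a few lines (a toy split generator, not the production harness):

```python
import numpy as np

# Expanding-window time-series splits with no lookahead: each fold trains on
# everything up to a cutoff and evaluates on the next block.
def expanding_window_splits(n_obs, initial_train, horizon):
    cutoff = initial_train
    while cutoff + horizon <= n_obs:
        train = np.arange(0, cutoff)
        test = np.arange(cutoff, cutoff + horizon)
        yield train, test
        cutoff += horizon

splits = list(expanding_window_splits(n_obs=28, initial_train=14, horizon=7))
for train, test in splits:
    # Training indices always precede test indices: no lookahead bias.
    assert train.max() < test.min()
print([(len(tr), len(te)) for tr, te in splits])  # [(14, 7), (21, 7)]
```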

Caveat: These results are from synthetic data where the data-generating process matches the model’s assumptions. Real-world performance will differ due to model misspecification, unobserved confounders, and non-stationarity. We are committed to publishing production validation results as they become available from our design partner engagements.

Uncertainty Quantification

Do 90% intervals contain the truth ~90% of the time?

Confidence without calibration is self-esteem. Thagorus uses adaptive conformal inference (Gibbs & Candès, 2021) extended to non-exchangeable data (Barber et al., 2023) to produce prediction intervals with guaranteed finite-sample coverage—even under distribution shift.
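The basic split-conformal recipe is short enough to sketch on simulated data; the adaptive variant cited above additionally updates the quantile online, which this minimal version omits.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated data with heteroskedastic noise: harder cases get wider errors.
n = 2_000
x = rng.uniform(0, 10, n)
y = 2 * x + rng.normal(0, 1 + 0.2 * x, n)

# Any point forecaster works; here, a simple least-squares line.
A = np.column_stack([np.ones(n), x])
fit_idx, cal_idx, test_idx = np.split(rng.permutation(n), [1000, 1500])
coef, *_ = np.linalg.lstsq(A[fit_idx], y[fit_idx], rcond=None)
predict = lambda idx: A[idx] @ coef

# Calibration: the 90th-percentile absolute residual becomes the interval
# half-width, with the standard finite-sample correction.
residuals = np.abs(y[cal_idx] - predict(cal_idx))
n_cal = len(cal_idx)
q = np.quantile(residuals, min(1.0, np.ceil(0.9 * (n_cal + 1)) / n_cal))

# Coverage on held-out points should land near the 90% nominal level,
# with no parametric assumption about the error distribution.
covered = np.abs(y[test_idx] - predict(test_idx)) <= q
print(f"empirical coverage: {covered.mean():.1%}")
```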

Conformal Prediction
91.2%
actual coverage at 90% nominal (in-sample)
Distribution-free coverage guarantees. No parametric assumptions about the error distribution. If the model says 90% confidence, it means 90%.
Adaptive Calibration
94–96%
coverage at 95% nominal across categories
Intervals widen when the model is uncertain (sparse data, weak instruments, regime transitions) and narrow when confidence is warranted. No false precision.
Width Efficiency
±12%
typical weather elasticity CI (vs. ±35% per-tenant OLS)
Calibrated intervals are only useful if they are narrow enough to inform decisions. We optimize the coverage-width trade-off: tightest intervals that maintain nominal coverage.
Per-Tenant Diagnostics
First-stage F-statistic
Target >10 (Stock & Yogo, 2005)
Confirms instrument strength. Tenants with weak instruments (<10 F-stat) receive wider confidence intervals and a warning flag.
Pre-trend balance test
Placebo check on pre-shock periods
Validates the parallel trends assumption. If pre-shock demand trends diverge, the causal estimate is unreliable.
Rosenbaum sensitivity bounds
Unmeasured confounding robustness
How large would an unmeasured confounder need to be to overturn the result? Reports the critical Γ value.
Cross-validation protocol
Expanding window, no lookahead
Time-series CV with MAPE as selection criterion. Shape and decay parameters optimized via Bayesian search (Optuna).
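The first of those diagnostics is cheap to compute. A sketch of a single-instrument first-stage F-statistic on invented data (with one instrument, F is just the squared t-statistic on the instrument coefficient):

```python
import numpy as np

rng = np.random.default_rng(4)

# First stage: regress spend on the weather instrument and test whether the
# instrument coefficient is zero. F > 10 is the conventional strength
# threshold (Stock & Yogo, 2005). The 0.6 first-stage strength is invented.
n = 400
weather = rng.normal(0, 1, n)
spend = 0.6 * weather + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), weather])
coef, *_ = np.linalg.lstsq(X, spend, rcond=None)
resid = spend - X @ coef
sigma2 = resid @ resid / (n - 2)
se = np.sqrt(sigma2 / np.sum((weather - weather.mean()) ** 2))
f_stat = (coef[1] / se) ** 2  # F = t² with a single instrument

print(f"first-stage F: {f_stat:.1f} ({'strong' if f_stat > 10 else 'weak'} instrument)")
```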
“Do confidence scores correlate with realized lift rather than with model self-esteem?” — Thagorus Proof Obligation #2: Calibration
Boundary Conditions

A system that cannot name its failures has not looked.

The strongest claim is a bounded claim. Thagorus ships every recommendation with explicit break conditions—the circumstances under which the recommendation should be reversed, widened, or withdrawn. Here is the full framework.

🌡️
Temperature Thresholds
  • Above 108°F: indoor inversion triggers 22% drop in outdoor product interest
  • Below 85°F for 3+ days: heat-dependent recommendations auto-reverse
  • Thermal whiplash (>15°F swing in 24h): model widens intervals, reduces conviction
☁️
Cloud Cover & UV Triggers
  • Cloud cover >40%: UV drops below actionable threshold for sun care categories
  • Cloud cover >60%: sun care recommendations halt entirely
  • Persistent overcast (5+ days): UV-dependent categories flagged as weather-insensitive
🔍
Competitor Detection
  • Promotional event detected in target DMAs: confounding risk flagged
  • Competitor price change >10%: demand attribution becomes unreliable
  • Platform algorithm shift detected: widen intervals, revert to shadow mode
🌊
Forecast Uncertainty
  • Weather ensemble disagreement >2σ: recommendations held, not acted on
  • Precipitation probability >30%: outdoor category recs suspended
  • Forecast reversal within recommendation window: auto-pause and alert
⚠️
Weak Instrument Detection
  • First-stage F-stat <10: wider confidence intervals + warning flag
  • Weather explains <5% of demand variance: category flagged as insensitive
  • Consecutive failed backtests: model retrains or defers to baseline
🛑
Circuit Breakers
  • Spend velocity: daily change capped at ±20% of baseline
  • Drawdown halt: cumulative ROAS below break-even for 3 days → pause all
  • Anomaly detection: input data >4σ from norms → pipeline halt
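These breakers are simple to encode. A minimal sketch with an invented `circuit_breakers` helper, using illustrative thresholds taken from the list above rather than the production configuration:

```python
# Evaluate one day's state against the three circuit breakers listed above.
# Field names and thresholds are illustrative.
def circuit_breakers(daily_spend_change_pct, roas_below_breakeven_days, input_sigma):
    """Return the list of halts triggered by today's state."""
    triggered = []
    if abs(daily_spend_change_pct) > 20:
        triggered.append("spend-velocity cap: hold change at ±20% of baseline")
    if roas_below_breakeven_days >= 3:
        triggered.append("drawdown halt: pause all recommendations")
    if input_sigma > 4:
        triggered.append("anomaly halt: input data >4σ from norms, stop pipeline")
    return triggered

# A day that trips the drawdown rule but nothing else:
print(circuit_breakers(daily_spend_change_pct=12,
                       roas_below_breakeven_days=3,
                       input_sigma=1.2))
```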
Pre-Registered Proof Obligations

Four evaluation components, each designed to be run by the customer on their own data, without our involvement.

Decision Lift
Does the policy improve incremental profit vs. baselines in geo-randomized holdout tests?
Calibration
Do 90% confidence intervals contain the true outcome ~90% of the time?
Ablation
Does removing weather features degrade performance specifically in weather-sensitive categories?
Safety Audit
How often are recommendations reversed? What is the distribution of drawdowns vs. gains?

If you can audit these results—wins and losses—then we are not blowing smoke. We are doing science in public, which is rare enough to be a product feature.

References

Standing on the shoulders of giants.

Thagorus draws on a deep body of work across causal inference, econometrics, machine learning, and uncertainty quantification. Key citations organized by domain.

Causal Inference & Econometrics
Wright, P. G. (1928). The Tariff on Animal and Vegetable Oils. First instrumental variables estimation using weather.
Dell, Jones & Olken (2014). “What Do We Learn from the Weather?” J. Economic Literature, 52(3), 740–798.
Angrist, J. D. & Krueger, A. B. (2001). Instrumental variables and the search for identification. J. Econ. Perspectives.
Stock, J. H. & Yogo, M. (2005). Testing for weak instruments in linear IV regression. Ch. 5 in Andrews & Stock (eds.).
Chernozhukov, V. et al. (2018). Double/debiased machine learning. The Econometrics Journal, 21(1), C1–C68.
Abadie, Diamond & Hainmueller (2010). Synthetic control methods. JASA, 105(490), 493–505.
Callaway & Sant’Anna (2021). Difference-in-differences with multiple time periods. J. Econometrics.
Uncertainty Quantification
Vovk, Gammerman & Shafer (2005). Algorithmic Learning in a Random World. Springer. (Conformal prediction.)
Gibbs, I. & Candès, E. (2021). Adaptive conformal inference under distribution shift.
Barber et al. (2023). Conformal prediction beyond exchangeability. Annals of Statistics, 51(2), 816–845.
James, W. & Stein, C. (1961). Estimation with quadratic loss. Proc. 4th Berkeley Symposium.
Efron, B. & Morris, C. (1975). Data analysis using Stein’s estimator. JASA, 70(350), 311–319.
Rosenbaum, P. R. (2002). Observational Studies (2nd ed.). Springer. Sensitivity analysis framework.
Weather-Demand & Marketing Science
Busse, Pope, Pope & Silva-Risso (2015). “The Psychological Effect of Weather on Car Purchases.” QJE 130(1), 371–414.
Starr-McCluer (2000). “The Effects of Weather on Retail Sales.” Fed. Reserve Board FEDS 2000-08.
Gordon et al. (2019). “A comparison of approaches to advertising measurement.” Marketing Science. (Median 5–10x overestimation.)
Lewis, R. A. & Rao, J. M. (2015). “The unfavorable economics of measuring the returns to advertising.” QJE, 130(4).
Jin, Y. et al. (2017). Bayesian methods for media mix modeling with carryover and shape effects. Google Technical Report.
Runge et al. (2024). Robyn: continuous and semi-automated marketing mix modeling. arXiv:2407.06182.
Control Theory & Information Theory
Ashby, W. R. (1956). An Introduction to Cybernetics. Chapman & Hall. (The law of requisite variety.)
Conant, R. C. & Ashby, W. R. (1970). Every good regulator of a system must be a model of that system.
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal.
Camacho, E. F. & Bordons, C. (2007). Model Predictive Control (2nd ed.). Springer.
Touchette, H. & Lloyd, S. (2000). Information-theoretic limits of control. Physical Review Letters.

See a proof bundle for your category.

Tell us what you sell and where. We will show you what the model sees across 47 markets, 83 peer-reviewed studies, and 12 product categories.

Or email nate@schmiedehaus.com directly.

Currently accepting design partners in outdoor, beverage, apparel, personal care, home improvement, and food delivery.