How to build an economic
forecasting model for everyone
The complete playbook for building, training, and scaling a causal demand intelligence platform — from first data point to network dominance.
Nathaniel Schmiedehaus
Five things to know
Before the details — five claims this document will prove.
574M observations per year from 3,143 counties × 500 categories. Two years of daily data reaches full causal identification. 3–5 design partners are enough to start.
~$95K Year 1 infrastructure — serious but 1,000× cheaper than frontier LLMs. The model architecture is an open research problem we solve via neural architecture search. The real moat is the causal graph that only grows with partners.
Grokking, double descent, catastrophic forgetting — each is a real threat with peer-reviewed countermeasures. Grokfast accelerates grokking 50×. Deliberate overparameterization tames double descent. The playbook exists; execution is the challenge.
Stein's Paradox: every new partner improves every existing partner's estimates. A pool chemical company in Phoenix sharpens a raincoat brand's model in Seattle.
The LCDM is a general-purpose causal inference engine. Weather-sensitive products are the beachhead. The platform extends to pricing, inventory, creative, and economic nowcasting.
The AlphaFold parallel
AlphaFold didn't succeed because DeepMind had better neural networks. It succeeded because the Protein Data Bank — decades of painstaking crystallography — gave the model something to learn from. The architecture was necessary but not sufficient. The data was the moat.
Causal demand modeling has the same structure. The statistical methods exist (instrumental variables, James-Stein shrinkage, hierarchical Bayes). The weather data exists (NOAA, ERA5). What didn't exist was a panel of actual sales outcomes dense enough to estimate causal effects across thousands of county-category pairs. That's what Thagorus's partner network creates — and like the Protein Data Bank, each new contribution makes everyone's estimates more precise.
| | AlphaFold | Thagorus LCDM |
|---|---|---|
| The model | Transformer architecture | Causal inference (IV + Stein shrinkage) |
| The dataset | Protein Data Bank (crystallography) | Multi-partner sales panel (3,143 counties × 500 categories) |
| The gap catalogued | Computational chemistry vs. reality | Correlation vs. causation (weather as natural experiment) |
| Why it couldn't exist before | No ground-truth protein structures at scale | No multi-partner causal demand panel at county-category resolution |
| The moat | Years of crystallography data | Years of partner sales data + daily causal learning |
Breakthrough = Model + Domain-Specific Empirical Data + Fast Feedback Loop. No single component is sufficient. The partner panel is Thagorus's Protein Data Bank — proprietary, compounding, and impossible to replicate from a standing start.
Nobody has assembled a multi-partner causal demand panel because the dataset simply doesn't pre-exist. Nielsen tracks sales. NOAA tracks weather. But nobody has systematically joined them at the county-category-day level, cleaned for confounders, and estimated causal effects. The data has to be built through partner integrations — and the only way to build it is to get live with real partners generating real signal.
This is why Thagorus's early partners are so valuable. They're not just customers paying for forecasts. They're co-creating the first dataset of its kind.
Why now
Three things converged in the last three years that make this possible for the first time.
Any one of these alone isn't enough. Together, they make a Large Causal Demand Model feasible for the first time — the same way AlphaFold required the convergence of transformers, the Protein Data Bank, and sufficient compute to train on both.
The Data Flywheel
Every new partner is simultaneously a customer and a data source. Their sales signal improves the model for everyone — including themselves. Revenue and data accumulation are the same act.
The data
What the model needs to learn
The LCDM's panel dataset: county × category × day, ~60 dimensions per observation.

Most demand forecasting is correlation-based. A model sees that ice cream sales and sunscreen sales both spike in July and concludes they're related. But correlation isn't actionable — you can't intervene on a correlation.
Thagorus's approach is structural, not statistical. Weather is an instrumental variable — it affects demand but isn't affected by pricing, promotions, or competitor actions. This lets us isolate the causal effect of weather on demand, the way a randomized trial isolates drug effects. The result isn't "these things tend to move together." It's "a 10-degree temperature anomaly in Harris County causes a 23% increase in sunscreen demand within 48 hours, holding all else equal."
That's the difference between a correlation and a prescription. One tells you what happened. The other tells you what to do.
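The instrumental-variables logic can be sketched with a toy example. Everything below is synthetic and illustrative (the coefficients, the "foot traffic" treatment, the confounder), not the LCDM's actual pipeline; it only shows why the weather-driven variation recovers the causal effect while a naive regression does not:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Toy data-generating process (all coefficients illustrative):
u = rng.normal(size=n)                        # unobserved confounder (promos, season)
z = rng.normal(size=n)                        # weather anomaly: the exogenous instrument
x = 0.8 * z + 0.6 * u + rng.normal(size=n)    # endogenous demand driver (e.g. foot traffic)
y = 2.0 * x + 1.5 * u + rng.normal(size=n)    # demand; true causal effect of x is 2.0

ols = np.cov(x, y)[0, 1] / np.var(x)          # naive regression: biased by the confounder
iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]  # Wald/2SLS estimate: weather-driven variation only

print(f"OLS ~ {ols:.2f} (biased), IV ~ {iv:.2f} (close to 2.0)")
```

The naive estimate lands near 2.45 because it absorbs the confounder; the IV estimate recovers the true effect of 2.0.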
The LCDM's identification strategy rests on a panel dataset where the cross-sectional unit is a county × product-category pair observed daily. Each observation carries weather features, sales signals, ad-spend by channel, inventory levels, and promotional calendars — roughly 60 effective dimensions per observation after interaction reduction. Weather events serve as instrumental variables across all 3,143 counties simultaneously.
3,143 U.S. counties × ~500 product categories × 365 days per year = 573,597,500 observations per year, rounded to 574M. Each observation is a unique county-category-day triple with its associated feature vector.
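The panel arithmetic is easy to verify directly:

```python
counties, categories, days = 3_143, 500, 365

obs_per_year = counties * categories * days
print(f"{obs_per_year:,}")        # 573,597,500 -- the ~574M figure

two_years = 2 * obs_per_year      # the full-identification window
print(f"{two_years / 1e9:.2f}B")  # 1.15B
```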
The U.S. Census Bureau defines 3,143 county-equivalent administrative divisions (3,007 counties + 136 county equivalents in Louisiana, Alaska, and independent cities). Counties are the finest geographic resolution at which both NOAA weather data and BLS economic data are consistently available, making them the natural unit for the LCDM's spatial panel.
The Bureau of Labor Statistics Consumer Expenditure Survey defines roughly 300 base expenditure categories. We extend this to ~500 by adding weather-sensitive subcategories: splitting "outdoor recreation" into equipment vs. apparel, "beverages" into hot vs. cold, "home improvement" into indoor vs. outdoor projects. The additional granularity is necessary because weather affects subcategories asymmetrically — a heat wave boosts iced coffee but suppresses hot coffee, and the model needs to see both.
| Window | Observations | Seasonal Cycles | IV Power |
|---|---|---|---|
| 90 days | ~143M | 0.25 | Weak — network priors only |
| 6 months | ~287M | 0.5 | Noisy; "memorization phase" |
| 1 year | ~574M | 1.0 | Baseline seasonal coverage |
| 2 years | ~1.15B | 2.0 | Full identification regime |
Causal identification via weather instruments requires observing county-category pairs across multiple extremes. Two years provide at least two independent realizations of each seasonal extreme, enabling F > 20 for weather-sensitive categories.
| Signal | Dimensions | Source |
|---|---|---|
| Weather | ~20 | NOAA GHCN-Daily + forecast APIs |
| Sales/demand | ~8 | Shopify, Amazon SP-API, POS |
| Ad spend | ~12 | Google Ads, Meta, TikTok APIs |
| Inventory | ~4 | Tenant ERP / inventory system |
| Promotions | ~6 | Tenant promo calendar |
| Macro | ~10 | FRED, BLS, Census |
Data quality dominates quantity for causal inference. Missing confounder data creates omitted variable bias that no amount of additional observations can fix. This is why design partner onboarding requires OAuth access to all ad platforms and sales channels — partial data destroys identification.
Weather shapes demand. The model learns why.
The model
How the LCDM is built
The LCDM: six stages from raw data to causal predictions.
The Large Causal Demand Model uses the same transformer architecture behind ChatGPT — but instead of predicting the next word, it predicts the next day's demand and identifies what caused it. The architecture is not fixed: we treat it as a research problem solved through neural architecture search, starting from a strong baseline and optimizing for our specific data structure.
1. Data ingestion
Every morning, the pipeline pulls the previous day's weather from NOAA and forecast APIs, sales data from Shopify and Amazon, ad spend from Google and Meta, and inventory snapshots from each partner. These feeds are aligned to a consistent county-category-day grid.
2. Feature engineering
Raw temperature tells you almost nothing. The feature engine transforms it into weather anomalies — deviations from the 30-year normal for that county and day. 95°F in Phoenix is unremarkable; 95°F in Seattle is a five-sigma event. The engine also constructs interaction terms: the humidity-temperature combo that drives "feels-like" discomfort, the UV trajectory over 5 days that predicts sunscreen demand, the wind chill delta that triggers coat purchases.
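A minimal sketch of the anomaly transform, using Python's standard library; the 30-year normals below are invented illustrative values, not NOAA data:

```python
from statistics import mean, stdev

def anomaly_z(temp_f, normals_30yr):
    """z-score of an observed temperature against that county/day's 30-year normals."""
    mu, sigma = mean(normals_30yr), stdev(normals_30yr)
    return (temp_f - mu) / sigma

# Hypothetical normals for the same June day in two counties (illustrative numbers):
phoenix = [92, 94, 95, 96, 97, 93, 95, 98, 94, 96]   # mean 95 F
seattle = [65, 75, 70, 72, 68, 76, 64, 70, 73, 67]   # mean 70 F

print(round(anomaly_z(95, phoenix), 2))  # 0.0: unremarkable in Phoenix
print(round(anomaly_z(95, seattle), 2))  # > 5: an extreme event in Seattle
```

The same 95°F reading produces a zero anomaly in one county and a multi-sigma shock in the other, which is exactly the signal the model needs.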
3. Transformer encoder
The core is a transformer adapted for multivariate time series. The starting architecture — informed by BERT-base and time-series foundation models like MOIRAI and TimesFM — uses ~12 layers with ~768 embedding dimensions. But these are hyperparameters to be optimized via NAS, not gospel. What matters: each attention head learns different temporal and cross-category patterns. One head might learn that rain in Miami suppresses outdoor dining within hours; another learns that a Midwest cold snap predicts heating equipment demand 3–5 days later.
4. Causal identification layer
This is where the LCDM diverges from every other demand model. Standard models learn correlations — "when it's hot, sunscreen sells." The causal layer asks: "how much of this increase is caused by the heat, and how much would have happened anyway due to summer promotions, school breaks, or seasonal trends?" Weather is a natural experiment — nobody controls or predicts it perfectly. When an unexpected heat wave hits Florida but not New York, and sunscreen sales spike only in Florida, the difference is causally attributable to the heat. The causal layer uses this logic at scale across all 3,143 counties.
5. Network pooling
James-Stein shrinkage pools estimates across all partners. A new partner with 90 days of data inherits the statistical power of the entire network. The shrinkage factor is learned per category and geography — this is the mathematical foundation of Thagorus's network effect.
6. Daily predictions
Every morning, the trained model produces a 10-day demand forecast for each partner, category, and geography — with confidence intervals from ensemble weather uncertainty, causal attribution breakdowns, and recommended budget adjustments.
The 12-layer / 768-dim / 12-head configuration is a well-studied starting point, not a final architecture. BERT-base, GPT-2 Small, and time-series foundation models (MOIRAI, TimesFM, Chronos) all converge on similar dimensions for medium-scale tasks. Our NAS pipeline will systematically explore width, depth, attention head count, and the causal sub-network structure. The final architecture will almost certainly differ — the point is that we start from a strong, well-understood baseline rather than guessing.
The target parameter range is ~200–400M, informed by Chinchilla scaling laws applied to our dataset size (~1.15B observations). For a BERT-base-style starting point: 768 embedding dim × 12 heads × 12 layers gives ~85M in the transformer stack. Add input embeddings (~46M), positional encodings, output heads, and the causal identification sub-network (~40M for IV estimation + Stein shrinkage) — baseline ~270M. NAS will explore the 100M–500M range to find the compute-optimal point.
At any point in this range, the model is firmly "medium" — 10–50× smaller than GPT-2 Large, small enough for single-GPU inference in <200ms, but large enough to capture the nonlinear interactions between weather, geography, category, and demand that make causal identification work.
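The ~85M figure for the transformer stack follows from standard parameter accounting (biases and LayerNorms omitted as negligible):

```python
d, layers = 768, 12

attn = 4 * d * d               # Q, K, V, and output projections
mlp = 2 * d * (4 * d)          # two linear maps with the usual 4x hidden expansion
stack = layers * (attn + mlp)  # per-layer total x depth

print(f"{stack / 1e6:.0f}M")   # 85M -- the BERT-base-style stack quoted above
```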
An instrumental variable must satisfy three requirements:
1. Relevance: The instrument must be correlated with the treatment (weather must actually affect consumer behavior). Measured by the first-stage F-statistic — F > 10 is the minimum (Staiger & Stock, 1997), F > 20 is strong (Stock & Yogo, 2005).
2. Independence: The instrument must be unrelated to confounders (weather cannot be caused by your marketing budget or competitor actions). Weather is exogenous by definition.
3. Exclusion: The instrument must only affect the outcome through the treatment. Addressed by daily × county resolution, which controls for supply-side effects and competitive responses.
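Relevance is checked mechanically: regress the treatment on the instrument and, with a single instrument, F is the squared t-statistic of the first stage. A sketch on synthetic data (the 0.3 first-stage coefficient is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
z = rng.normal(size=n)                 # weather instrument
x = 0.3 * z + rng.normal(size=n)       # treatment: weather-sensitive demand driver

# First-stage OLS of x on z (demeaned, so no explicit intercept is needed):
zc, xc = z - z.mean(), x - x.mean()
beta = (zc @ xc) / (zc @ zc)
resid = xc - beta * zc
se = ((resid @ resid) / (n - 2) / (zc @ zc)) ** 0.5
F = (beta / se) ** 2

print(f"F ~ {F:.0f}")   # well above the F > 20 strong-instrument threshold
```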
Data preprocessing: Raw feeds are cleaned, aligned to the county-category-day grid, and split into train (80%), validation (10%), and test (10%) sets with temporal splits to prevent data leakage.
Batching: Mini-batches of 2,048 county-category-day observations, stratified by geography and category.
Optimizer: AdamW with weight decay 0.01, betas (0.9, 0.999), gradient clipping at 1.0.
Learning rate schedule: Linear warmup over 1,000 steps to peak LR of 3e-4, then cosine decay to 1e-5.
Checkpointing: Save every 500 steps. Keep best 5 checkpoints by validation loss. Final model is an exponential moving average of the last 3 checkpoints.
Validation: Evaluate on held-out counties (spatial generalization) and held-out time periods (temporal generalization) separately.
Early stopping: Halt if validation loss does not improve for 10 consecutive evaluations (5,000 steps). Combined with weight decay, this prevents epoch-wise double descent.
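The warmup-plus-cosine schedule above is a few lines of code; `total_steps` here is an assumed training length, not a figure from this plan:

```python
import math

def lr_at(step, warmup=1_000, total_steps=50_000, peak=3e-4, floor=1e-5):
    """Linear warmup to the peak LR, then cosine decay to the floor."""
    if step < warmup:
        return peak * step / warmup
    progress = (step - warmup) / (total_steps - warmup)
    return floor + 0.5 * (peak - floor) * (1 + math.cos(math.pi * progress))

print(lr_at(500))      # halfway through warmup: 1.5e-4
print(lr_at(1_000))    # peak: 3e-4
print(lr_at(50_000))   # end of decay: 1e-5
```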
The science
Two surprising things that happen during training
Grokking: the model memorizes first, then suddenly learns the real pattern.
Imagine a student who memorizes every answer for a math exam — "Question 7 is 42, Question 12 is 17" — and aces it. New exam, different numbers: they fail. They keep studying. For weeks, nothing visibly changes. Then suddenly, overnight, they understand the underlying math and can solve any problem they've never seen.
What changed? The student stopped memorizing individual data points and started discovering the structure underneath. For the LCDM, that means the model transitions from memorizing "90°F in Miami on June 3 → sunscreen +15%" to understanding the mechanism: that it's not the absolute temperature that matters, but the deviation from the 30-year normal, modulated by humidity interaction, baseline seasonal demand, and how quickly the weather changed. It discovers that this same mechanism operates differently across geographies — 90°F in Phoenix barely registers while 90°F in Portland is a demand shock. It finds cross-category cascades: the same heat event that boosts pool chemical sales also suppresses hot coffee and outdoor dining, and these relationships are causal, not just correlated.
That's the "extra information" — not more data points, but the discovery of latent causal structure that was always present in the training data but invisible to a memorizing model. Neel Nanda et al. (2023) showed this transition happens because the model's internal representations reorganize from lookup tables to generalizable circuits. Grokfast (Lee et al., 2024) accelerates this 50× by amplifying slow-varying gradient components, reducing the grokking budget from thousands of GPU-hours to hundreds.
| Scenario | Extra Chip-hrs | Cost @ $0.48/hr | Cost @ $1.20/hr |
|---|---|---|---|
| With Grokfast (50×) | ~200 | $96 | $240 |
| Standard (3× train time) | ~3,600 | $1,728 | $4,320 |
| Worst case (10× train time) | ~12,000 | $5,760 | $14,400 |
Reference: Nanda, N. et al. (2023). "Progress measures for grokking via mechanistic interpretability." ICLR 2023.
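Grokfast-EMA itself is strikingly simple: a low-pass filter on the gradient sequence whose slow component gets amplified. A scalar sketch (real use applies this per parameter; `alpha` and `lamb` follow the paper's recommended ranges but are illustrative here):

```python
def grokfast_ema(grads, alpha=0.98, lamb=2.0):
    """Amplify the slow-varying gradient component (Grokfast, Lee et al., 2024)."""
    ema, out = 0.0, []
    for g in grads:
        ema = alpha * ema + (1 - alpha) * g   # exponential moving average (low-pass)
        out.append(g + lamb * ema)            # fast component + amplified slow component
    return out

# A constant gradient is pure "slow" signal: the filtered gradient converges
# to g * (1 + lamb), i.e. a 3x larger step in the generalizing direction here.
steps = grokfast_ema([1.0] * 500)
print(round(steps[-1], 2))   # 3.0
```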
Double descent: more parameters can actually reduce error, defying classical intuition.
Classical ML teaches that there's a "sweet spot" for model size: too small and it underfits, too large and it overfits. This U-shaped bias-variance trade-off is every ML textbook's chapter 1. It turns out to be incomplete.
When you keep making the model bigger past the point where it can perfectly memorize the training data (the "interpolation threshold"), something unexpected happens: test error peaks, then descends again. Larger models actually perform better, not worse. This is double descent, documented by Belkin et al. (2019) and Nakkiran et al. (2021), and it overturns fifty years of statistical intuition. For the LCDM, it means deliberate overparameterization — combined with weight decay and early stopping — is a feature, not a risk.
Model-wise double descent: Test error peaks when model capacity matches dataset size, then decreases as the model grows.
Epoch-wise double descent: Test error decreases, increases during the "critical regime," then decreases again. Eliminated by early stopping + tuned weight decay.
Sample-wise double descent: Adding more data can temporarily worsen performance. Resolves with either more data or larger models.
LCDM position: At ~270M parameters on ~1.15B observations (ratio 1:4.3), we're firmly in the overparameterized regime. The strongest defense is deliberate overparameterization + weight decay + early stopping + hierarchical shrinkage.
Catastrophic forgetting: Updating with new partner data risks degrading existing partners. Mitigated via elastic weight consolidation (EWC) and monthly full-network retrains on the complete pooled dataset.
Distribution shift: Weather-demand relationships are non-stationary. Addressed with anomaly encoding (deviations from 30-year normals, not raw temps), rolling retraining, and 4σ divergence detection with automatic checkpoint fallback.
Mode collapse: Prevented by conformal prediction wrappers, ensemble disagreement monitoring across 5 checkpoints, and heteroscedastic output heads.
Instrument strength: The first-stage F-statistic must exceed 10 and ideally 20+ for reliable causal estimates. Monte Carlo simulations show 24 months of data achieves F > 20 for 80% of county-category pairs.
Standing on the shoulders of AI weather
The LCDM doesn't predict weather — it consumes the best weather forecasts available and uses them as instruments. A new generation of AI weather models produces forecasts that rival the European Centre (ECMWF) at a fraction of the cost.
| Model | Developer | Resolution | Inference | Notes |
|---|---|---|---|---|
| GraphCast | Google DeepMind | 0.25° | <1 min | 10-day forecast, single TPU, open weights |
| GenCast | Google DeepMind | 0.25° | ~8 min | Probabilistic ensembles, diffusion model |
| FourCastNet | NVIDIA | 0.25° | <2 sec | Fourier Neural Operator, 7-day forecast |
| Pangu-Weather | Huawei | 0.25° | ~1.4 sec | 3D Earth-specific transformer |
| Aurora | Microsoft | 0.1° | <1 min | Foundation model, flexible fine-tuning |
Physics + ML hybrid
Pure ML risks learning spurious correlations. Pure physics can't capture nonlinear demand responses. The LCDM uses a hybrid approach:
- Physics-informed priors: Known relationships are encoded as Bayesian priors, not hard constraints. The model can override them with sufficient data.
- Ensemble weather inputs: We ensemble GraphCast + GenCast + NOAA GFS. Ensemble disagreement provides built-in uncertainty for downstream causal estimates.
- Physics-guided regularization: The loss function penalizes causal estimates that violate known physical constraints (e.g., negative temperature elasticity for heating products above 70°F).
- Interpretable + flexible: The physics component is fully interpretable. The ML component captures nonlinearities the physics can't. Partners see both layers.
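The physics-guided regularizer can be as simple as a signed soft penalty added to the loss. The product rule and penalty weight below are illustrative assumptions, not the LCDM's actual loss term:

```python
def physics_penalty(elasticity, temp_f, weight=10.0):
    """Soft penalty when a heating-product temperature elasticity is positive
    above 70 F: physically implausible, so the loss discourages (not forbids) it."""
    if temp_f > 70 and elasticity > 0:
        return weight * elasticity
    return 0.0

print(physics_penalty(0.5, 85))    # implausible estimate: penalized (5.0)
print(physics_penalty(-0.3, 85))   # plausible: no penalty (0.0)
print(physics_penalty(0.5, 60))    # below 70 F the constraint doesn't apply (0.0)
```

Because it is a penalty rather than a hard constraint, the model can still override the prior when the data strongly supports it, as the bullet above requires.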
GraphCast generates a 10-day global forecast in under 1 minute on a single TPU v4. At $3.22/hr for a TPU v4, that's roughly $0.05 per global forecast, or about $18/year for daily county-level weather inputs. ECMWF's operational HRES system, by contrast, costs tens of millions of dollars a year to run. We get comparable forecast quality for orders of magnitude less.
The economics
What it takes to build the economic graph
Building the economic graph is a multi-stage, capital-intensive engineering challenge.
We considered selling weather-demand analytics as a tool. But demand data is fragmented across thousands of retailers, each with different POS systems, category taxonomies, and data quality. Selling into that fragmentation means years of enterprise sales cycles and bespoke integrations.
Instead, Thagorus controls the full inference pipeline: data ingestion, causal estimation, forecast generation, and decision delivery. Partners send us data; we send them decisions. We own the statistical methodology, the cross-partner shrinkage, and the feedback loop. This full-stack approach is more capital-intensive, but it's the only way to ensure the causal estimates are actually valid — and the only way to capture the network effects that make the model improve with scale.
The capital buys time to recruit PhD-level causal inference researchers, secure enterprise data partnerships that take 6–18 months to close, build SOC 2-compliant infrastructure, and stand up real-time serving for millions of daily predictions — before competitors realize what's possible.
Seed ($2–5M): 3–4 person team (ML + eng), $95K compute/infra, 3–5 design partners, SOC 2 Type I. Proves the model works on real data.
Series A ($10–20M): 12–18 person team, $200K–$500K/yr compute, 20–50 partners, SOC 2 Type II, enterprise sales motion. Proves the business.
Series B+ ($40–80M): 40–60 person team, $1M–$3M/yr compute, 200+ partners, in-house GPU cluster, FedRAMP. Builds the platform.
| Investment Area | Seed | Series A | Series B+ |
|---|---|---|---|
| Team | 3–4 (ML + eng) | 12–18 (ML, eng, data, sales) | 40–60 (full org) |
| Compute + Infra | $25K–$50K/yr | $200K–$500K/yr | $1M–$3M/yr |
| Data partnerships | 3–5 design partners | 20–50 paid + BDRs | 200+ self-serve + enterprise |
| Compliance & security | SOC 2 Type I | SOC 2 Type II, pen testing | FedRAMP, financial regs |
| GTM + Sales | Founder-led | 2–3 AEs + marketing | Full sales org + partnerships |
Foundation model companies raise billions because compute is their moat. Thagorus's moat is the causal graph — the data network, not the compute.
| | Foundation Model | LCDM |
|---|---|---|
| Parameters | ~1.8 trillion | 270 million |
| Training data | The entire internet | 1.15B structured observations |
| Where $ goes | Compute (thousands of GPUs) | Team, data, partners, GTM |
| Moat source | Scale of compute | Scale of causal graph |
| Defensibility | Anyone with $100M+ can try | 2+ years of multi-tenant data |
This is why $150M builds a category-defining economic intelligence platform while $10B builds one more language model. The capital goes to assets that compound — data partnerships, the causal graph, regulatory moats. The LCDM's architecture is defensible but ultimately replicable. The multi-partner causal panel is neither. Lead with the data asset, not the model.
At each stage, the platform becomes something qualitatively different:
Stage 1: Weather intelligence (Seed → Series A)
5–20 partners in weather-sensitive categories. The LCDM resolves DMA-level causal effects. Partners get causal demand signals no existing tool can provide. ARR: $120K–$2M.
Stage 2: Demand intelligence (Series A → Series B)
50–200 partners across dozens of verticals. The causal graph resolves county-level effects and transcends weather. Partners start asking: "what's causing the demand shift in Phoenix this week?" The model begins seeing cross-category demand cascades. This is the inflection point. ARR: $5M–$25M.
Stage 3: Economic forecasting (Series B → IPO)
500–5,000+ partners. The causal graph becomes a nowcasting engine for the real economy. Transaction signals from thousands of businesses, combined with weather and new instruments, create something that doesn't exist today — a real-time, causally-identified model of consumer economic behavior at county-level granularity. ARR: $50M+.
How costs evolve
Costs scale sublinearly. Adding partners doesn't mean retraining from scratch. At 1,000 partners: ~$200K/year training, ~$150K/year inference, ~$100K/year infra. Total ~$450K/year on $50M+ ARR = 99%+ gross margin.
Infrastructure budget
The standard FLOPs estimate for transformer training is C = 6 × N × D, where N is parameters and D is dataset size.
N = 200–400M parameters (NAS range). D = 1.15B observations. Using the midpoint (300M): C = 6 × 300M × 1.15B = 2.07 × 10¹⁸ FLOPs per pass.
A TPU v5e peaks at ~197 TFLOPS (BF16; the oft-quoted ~394 figure is INT8). At ~40% practical throughput, one pass over the data takes ~7.3 chip-hours. 200–400 epochs → ~1,500–2,900 chip-hours, budgeted as ~3,000 base.
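The budget arithmetic, with chip throughput and utilization as explicit assumptions (taking ~197 TFLOPS as a TPU v5e's BF16 peak and 40% sustained utilization as a planning figure):

```python
N = 300e6                      # parameters (midpoint of the NAS range)
D = 1.15e9                     # observations (two-year panel)
C = 6 * N * D                  # standard transformer training-FLOPs estimate
print(f"{C:.2e} FLOPs")        # 2.07e+18

peak_tflops, util = 197, 0.40  # assumed BF16 peak and sustained utilization
sustained = peak_tflops * 1e12 * util
hours_per_epoch = C / sustained / 3600
print(f"{hours_per_epoch:.1f} chip-hours per epoch")
```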
Multipliers: 12× HPO sweeps across the architecture search space, 5× grokking buffer (reduced by Grokfast), 2× ablation & validation, 12× monthly production retrains.
Compute total: ~108,000 chip-hours at a blended ~$0.57/hr (mostly spot, within the $0.48–$0.72 range) ≈ $61,500. Add 24/7 inference serving ($6,300), weather APIs × 3 providers ($8,400), data pipeline ($4,800), cloud infra ($6,000), spot preemption overhead ($8,300) = ~$95K Year 1.
This is 3–4× higher than a naive spot-only estimate because it accounts for: mixed spot/on-demand pricing, NAS requiring broader search, inference costs for production serving, and the overhead of preemption recovery.
Google Cloud TPU (Q1 2026 rates):
| Chip | On-Demand | 1-yr CUD | 3-yr CUD | Spot |
|---|---|---|---|---|
| TPU v5e | $1.20 | $0.84 | $0.54 | ~$0.48 |
| TPU v5p | $4.20 | $2.94 | $1.89 | ~$1.68 |
| TPU v6e | $1.38 | $0.97 | $0.55 | ~$0.55 |
AWS EC2 GPU:
| Instance | GPUs | On-Demand | Per-GPU | Spot |
|---|---|---|---|---|
| p4d.24xlarge | 8× A100 | $22.03/hr | $2.75 | ~$7.20 |
| p5.48xlarge | 8× H100 | $33.10/hr | $4.14 | ~$13.20 |
Local vs. cloud
An in-house GPU workstation amortizes to competitive hourly rates. The recommended approach is hybrid — local for daily work, cloud for burst compute.
$13,000 upfront, amortizes to $0.12/GPU-hr over 3 years. Always available, no spot preemption. ~35,000 GPU-hours/year at 24/7 utilization.
$0.48/chip-hr with no commitment. Elastic scaling to 32+ chips for hyperparameter search. Year 1 cloud estimate: ~$6,900 for burst HPO + architecture search.
| Configuration | Upfront | Amortized/yr | Effective $/GPU-hr |
|---|---|---|---|
| 1× RTX 4090 workstation | ~$3,500 | $1,167 | $0.13 |
| 4× RTX 4090 server | ~$13,000 | $4,333 | $0.12 |
| Lambda Scalar (8× A6000) | ~$48,000 | $16,000 | $0.23 |
| NVIDIA DGX A100 | ~$199,000 | $66,333 | $0.95 |
The strategy
Solving the cold start
The causal demand panel doesn't exist in any usable form. Not at Nielsen. Not at IRI. Not at any retailer. The data is the company — and it only exists if we build it. Every decision should be evaluated against: does this get us to 10 partners with 12 months of data faster?
Build a fully functional demo using synthetic demand data calibrated to real weather. BLS retail sales indices + NOAA weather generate realistic panels.
3–5 DTC e-commerce brands provide OAuth access in exchange for 6–12 months free service. They get analytics they could never build in-house. We get the data to train the model.
Fine-tune MOIRAI on available weather-demand data. Delivers a working product within weeks of data access while the LCDM trains.
Pre-train on BLS Consumer Expenditure Survey, Census Retail Trade, FRED, Kilts-Nielsen panels. Transfer learning cuts per-partner data requirements dramatically.
Ideal design partners
Think Home Depot, Lowe's, Tractor Supply. Thousands of SKUs from HVAC to outdoor furniture. Strong seasonality, massive geographic spread, years of POS data.
Companies like YETI, Hydro Flask, Solo Stove, Traeger. Clear weather signal, Shopify or D2C data, $5M–$50M ad spend. Fast to onboard via API.
Companies like Liquid Death, Athletic Brewing, Olipop. Beverage sales are strongly weather-driven. Both DTC and retail channel data available.
Sunscreen, outdoor apparel, seasonal skincare. Shopify-native brands with clear weather sensitivity. Fast onboarding, immediate signal.
A company selling both sunscreen and books provides a natural within-company control. When a heat wave hits and sunscreen sales spike while book sales stay flat, we can isolate the causal effect of temperature on sunscreen — the books are the control group.
| Integration | Purpose | Effort |
|---|---|---|
| Shopify / Amazon SP-API | Sales, orders, inventory by SKU/day | OAuth, <1 hr |
| Google Ads API | Spend, impressions, clicks by campaign/day | OAuth, <1 hr |
| Meta Marketing API | Spend, reach, conversions by campaign/day | OAuth, <1 hr |
| Historical data export | Backfill 12–24 months | CSV/API, 1–3 hrs |
| Data sharing agreement | Legal, NDA, data usage terms | 1–2 weeks |
A typical Tier 2 partner with 50 categories across 200 DMAs generates 50 × 200 × 365 = 3.65M observations per year. With ~60 dimensions per observation, that's 219M data points annually. Three such partners provide 10.95M observations — enough for statistically significant causal estimates within 6 months.
Why every new partner makes every existing partner better
As the network grows, minimum data requirements drop — that's how "for everyone" works.
In 1956, Charles Stein proved something counterintuitive: estimating three or more quantities at once is more accurate, in total error, than estimating each separately. Even if they're unrelated. He refined the result with Willard James in 1961; the resulting James-Stein estimator is the mathematical foundation of Thagorus's network effect.
Today, only companies with massive data teams can do causal demand modeling. Thagorus inverts this. A small DTC brand selling $2M/year has no chance of building these models alone. But with James-Stein shrinkage, a new partner joining with even 6 months of data in a single category immediately gets causal estimates informed by every other partner in the network.
The minimum data requirement drops as the network grows. A food truck in Austin and a $5B retailer both benefit from the same causal graph. The food truck couldn't build this alone in a hundred years. But it doesn't have to — the network already did the work.
The shrinkage factor for partner i is:
λᵢ = σᵢ² / (σᵢ² + τ²)
Where σᵢ² is the individual partner's estimation variance, and τ² is the between-partner variance. As the network grows, both quantities are estimated more precisely, and every partner's estimates get pulled toward a more accurate group mean. Empirical Bayes estimates show 40–60% variance reduction vs. partner-only estimation.
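A numeric sketch of the pooling; the effect sizes and variances below are invented for illustration:

```python
def shrink(theta_hat, sigma2_i, tau2, mu):
    """Empirical-Bayes shrinkage: lambda_i = sigma_i^2 / (sigma_i^2 + tau^2).
    Noisier partners borrow more strength from the network mean mu."""
    lam = sigma2_i / (sigma2_i + tau2)
    return (1 - lam) * theta_hat + lam * mu

mu = 0.23   # network-pooled effect (e.g. demand lift per 10 F anomaly)

new_partner = shrink(0.40, sigma2_i=0.09, tau2=0.01, mu=mu)      # 90 days: noisy
mature_partner = shrink(0.25, sigma2_i=0.005, tau2=0.01, mu=mu)  # 2 years: precise

print(round(new_partner, 3))     # 0.247 -- pulled hard toward the network mean
print(round(mature_partner, 3))  # 0.243 -- mostly keeps its own estimate
```

The new partner's noisy 0.40 estimate is shrunk almost all the way to the network mean, while the mature partner's precise estimate barely moves: exactly the "inherit the network's statistical power" behavior described in stage 5.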
What this unlocks at scale
| Network size | Minimum partner data | What becomes possible |
|---|---|---|
| 5 partners | 12 months, 20+ categories | DMA-level weather effects for similar businesses |
| 50 partners | 6 months, 5+ categories | County-level effects; cross-category signals |
| 500 partners | 90 days, 1+ category | Instant causal estimates for any weather-sensitive business |
| 5,000+ partners | 30 days, any category | Real-time demand nowcasting; economic forecasting for everyone |
Weather changes daily. Sales happen daily. Thagorus's causal learning loop runs every 24 hours — faster than any consulting engagement, quarterly review, or annual planning cycle. A competitor starting today faces the same cold-start problem we faced, but we've been compounding daily signal across a growing partner network. The advantage isn't the model architecture (which is published science). The advantage is the accumulated daily learning that no one can fast-forward.
Stein's Paradox says that estimating three or more quantities simultaneously is always more accurate, in total squared error, than estimating each one alone. This isn't a business strategy; it's a mathematical theorem. The company with the most partner data will have the most accurate estimates, provably. A competitor with half the partners doesn't get half the accuracy — they do worse than that, because they can't borrow as much statistical strength. This creates a natural winner-take-most dynamic driven by mathematics, not just economics.
At sufficient network density, Thagorus becomes the default infrastructure for demand intelligence — the way Stripe became the default for payments or Twilio for communications. Not because it's cheaper, but because the network effects make it categorically better than anything you could build yourself, regardless of how much you spend.
The platform play
Three layers, each serving different customers from the same causal graph.
Layer 1: Data platform. The causal graph becomes the most comprehensive real-time map of American consumer demand.
Layer 2: Intelligence marketplace. Partners share anonymized signals; a $49/mo food truck gets Fortune 500 calibration.
Layer 3: Economic infrastructure. Hedge funds and central banks subscribe to real-time economic indicators from the same graph.
Thagorus starts as a product (causal demand signals) but becomes a platform as the network grows. The same pattern that made Stripe inevitable for payments and Twilio inevitable for communications.
Layer 1: Third-party developers, analytics firms, and financial institutions build on top of the causal graph via API.
Layer 2: Partners opt-in to share anonymized, aggregated signals. A $49/mo food truck gets demand insights calibrated by Fortune 500 grocery data. The Fortune 500 gets granularity from thousands of small businesses filling geographic gaps.
Layer 3: Hedge funds, government agencies, and central banks subscribe to GDP nowcasting, regional consumer confidence, sector rotation signals. This is a new asset class built on the same data that powers the $49/mo dashboard.
What the causal engine powers
| Decision Domain | Value |
|---|---|
| Ad spend allocation | Optimal budget across channels & geos, causally identified |
| Dynamic pricing | Price elasticity estimates in demand context |
| Inventory positioning | Pre-position ahead of demand surges before competitors react |
| Product decisions | Which SKUs to promote by demand regime |
| Financial signals | Real-time consumer spending indicators for hedge funds & macro |
| Economic nowcasting | County-level GDP estimation from transaction + environmental signals |
The causal demand graph becomes valuable to industries far beyond performance marketing:
| Customer Segment | What They Pay For | Why It’s Unique |
|---|---|---|
| Hedge funds & quant traders | Real-time alternative data on consumer spending | Causally-identified — not scraped, not correlative |
| Insurance & reinsurance | Granular weather-economic impact models | County-level loss exposure calibrated to actual outcomes |
| Supply chain & logistics | Demand forecasts for inventory pre-positioning | Causal signals 7–14 days ahead of traditional indicators |
| Commercial real estate | Location intelligence | Multi-category demand maps that reveal site potential |
| Government & central banks | Economic nowcasting | Monthly GDP with 2-month lag → daily county-level in real time |
| CPG & food companies | Category-level demand planning | Weather × geography × category interactions at scale |
| Small businesses ($49/mo) | Causal intelligence they could never build alone | Network does the work — 90 days of data is enough |
Each segment represents a distinct revenue line. A hedge fund paying $50K/year for alpha signals and a food truck paying $49/month both pull from the same causal graph — and both contribute to it. That's the platform.
Pricing
Just as ChatGPT made AI accessible, Thagorus makes causal business intelligence accessible. Marginal cost to serve is $3–$15/month, so we price for adoption velocity and network growth.
- 5 categories, your region
- Weekly demand signals
- Connect Shopify or CSV
- Causal demand dashboard

- 50 categories, all DMAs
- Daily causal demand signals
- Full channel coverage
- Slack & email alerts

- 200 categories, all DMAs
- Real-time causal signals
- Budget optimization engine
- API access + scenario planning

- Unlimited categories
- Custom model calibration
- White-label & VPC deploy
- SLA + performance fee option
Comparables & funding roadmap
| Company | Funding | Valuation | Relevance |
|---|---|---|---|
| Scale AI | $600M+ | $13.8B | Data infra for AI; network effects in labeling |
| Measured | $47M | ~$200M | Closest comp — incrementality for ad spend |
| Recast | $18M | ~$80M | Bayesian MMM as a service; Series A 2023 |
| Tomorrow.io | $190M | ~$1B | Weather intelligence platform; launched satellite |

| Round | Target | Gate | Use of Funds |
|---|---|---|---|
| Pre-seed | $500K–$1.5M | Synthetic proof + 3 design partners | Founder, compute ($13K local + $7K cloud), pipeline |
| Seed | $2M–$5M | v1 live; 5+ partners; 2+ case studies | Team (3–4), sales, dedicated compute |
| Series A | $10M–$20M | $500K+ ARR; 20+ partners; v2 deployed | GTM, enterprise, R&D, in-house GPU cluster |
| Series B | $40M–$80M | $5M+ ARR; 200+ partners; demand intelligence | Vertical expansion, API platform, new instruments |
| Series C / Growth | $100M+ | $25M+ ARR; 1,000+ partners; economic forecasting | Gov / finance verticals, international, R&D lab |
| At scale | — | 5,000+ partners; real-time economic graph | The economic forecasting platform for everyone |
The vision
The real-time economic graph for the world.
Weather-sensitive products are the wedge — sunscreen, cold medicine, patio furniture, winter gear. They have the strongest causal signal, the fastest feedback loops, and the clearest ROI. But the model architecture generalizes. Once the causal estimation pipeline works for sunscreen in Phoenix, the same instrumental variable framework applies to HVAC parts in Chicago, energy drinks in Miami, or umbrella inventory in Seattle. The wedge is narrow; the platform is broad.
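A minimal sketch of that instrumental-variable framework on synthetic data, assuming weather shocks are the exogenous instrument and promo intensity the endogenous regressor (the variable names and this particular causal setup are illustrative, not Thagorus's production specification):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Synthetic world: exogenous weather shocks move promo intensity,
# promo intensity moves sales, and an unobserved demand shifter
# (confound) corrupts the naive OLS regression of sales on promo.
weather = rng.normal(size=n)                                   # instrument z
confound = rng.normal(size=n)                                  # hidden demand shifter
promo = 0.8 * weather + 0.5 * confound + rng.normal(size=n)    # endogenous x
sales = 2.0 * promo + 1.5 * confound + rng.normal(size=n)      # true effect = 2.0

def tsls(y, x, z):
    """Two-stage least squares with a single endogenous regressor."""
    Z = np.column_stack([np.ones_like(z), z])
    # Stage 1: project the endogenous regressor onto the instrument.
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    # Stage 2: regress the outcome on the fitted (exogenous) part of x.
    Xh = np.column_stack([np.ones_like(x_hat), x_hat])
    return np.linalg.lstsq(Xh, y, rcond=None)[0][1]            # slope on x_hat

beta_ols = np.linalg.lstsq(
    np.column_stack([np.ones(n), promo]), sales, rcond=None)[0][1]
beta_iv = tsls(sales, promo, weather)
# beta_ols is biased upward by the confounder; beta_iv recovers ~2.0
```

Nothing in `tsls` knows about sunscreen or HVAC parts; swap in a different outcome series and instrument and the same two stages apply, which is the sense in which the wedge is narrow but the platform is broad.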
At 20 partners you have weather intelligence. At 200 you have demand intelligence. At 2,000 you have something that doesn't exist yet — a causally-identified, real-time model of how the economy actually works, at the resolution of individual counties and categories, updated daily. The Fed gets monthly aggregates with a two-month lag. Thagorus's partners will see it happening in real time.
That's what "economic forecasting for everyone" means. Not a dashboard. Not a prediction. A living, causal understanding of why people buy what they buy, where, and when — available to every company willing to contribute their signal to the graph. Demand planning is going to have its AlphaFold moment. We're building the Protein Data Bank.
nate@schmiedehaus.com
References
- Power, A. et al. (2022). "Grokking: Generalization beyond overfitting on small algorithmic datasets." arXiv:2201.02177.
- Lee, J. et al. (2024). "Grokfast: Accelerated Grokking by Amplifying Slow Gradients." arXiv:2405.20233.
- Nakkiran, P. et al. (2021). "Deep Double Descent." JSTAT. OpenAI.
- Belkin, M. et al. (2019). "Reconciling modern ML practice and the classical bias-variance trade-off." PNAS 116(32).
- Hoffmann, J. et al. (2022). "Training Compute-Optimal Large Language Models." DeepMind.
- Shi, J. et al. (2024). "Scaling Law for Time Series Forecasting." NeurIPS 2024.
- Das, A. et al. (2024). "A Decoder-Only Foundation Model for Time-Series Forecasting." Google. (TimesFM)
- Woo, G. et al. (2024). "Unified Training of Universal Time Series Forecasting Transformers." Salesforce. (MOIRAI)
- Ansari, A. F. et al. (2024). "Chronos: Learning the Language of Time Series." Amazon.
- Lam, R. et al. (2023). "Learning skillful medium-range global weather forecasting." Science. Google DeepMind. (GraphCast)
- Price, I. et al. (2024). "GenCast: Diffusion-based ensemble forecasting for medium-range weather." Google DeepMind.
- Pathak, J. et al. (2022). "FourCastNet: A Global Data-driven High-resolution Weather Forecasting Model." NVIDIA.
- Bi, K. et al. (2023). "Accurate medium-range global weather forecasting with 3D neural networks." Nature. Huawei. (Pangu-Weather)
- Stock, J. H. & Yogo, M. (2005). "Testing for Weak Instruments in Linear IV Regression." Cambridge UP.
- James, W. & Stein, C. (1961). "Estimation with Quadratic Loss." Fourth Berkeley Symposium.
- Chernozhukov, V. et al. (2018). "Double/Debiased ML for Treatment and Structural Parameters." Econometrics Journal.
- Hartford, J. et al. (2017). "Deep IV: A Flexible Approach for Counterfactual Prediction." ICML 2017.
- Nanda, N. et al. (2023). "Progress measures for grokking via mechanistic interpretability." ICLR 2023.
- Heckel, R. & Yilmaz, F. F. (2024). "Regularization-wise double descent." ICLR 2024.
- Bessemer (2025). "The AI pricing and monetization playbook." bvp.com/atlas.
- Jumper, J. et al. (2021). "Highly accurate protein structure prediction with AlphaFold." Nature 596, 583–589.
- Conviction (2025). "Plausible Schemes: Measured Physics." conviction.com/startups.html.