What Becomes Possible When You Can See the Whole Economy?
Open questions at the intersection of foundation models, causal inference, and large-scale economic data
We are building infrastructure to hold daily transaction data, advertising spend, pricing, inventory levels, and hundreds of exogenous signals from thousands of businesses across thousands of geographies. The goal is not a dashboard or a forecast product—it is a shared scientific instrument for studying how economies actually work at the level of individual purchasing decisions.
This dataset does not exist anywhere yet. The closest analogues—scanner panel data from Nielsen or IRI, credit card transaction aggregates from financial institutions—capture fragments of the picture. None combine the granularity of individual business daily sales with the breadth of hundreds of exogenous covariates across thousands of locations. The result is a set of questions nobody has been able to ask.
The feature space alone suggests the scale of the opportunity: nine signal groups—raw weather, derived weather features, sales and demand, advertising performance, temporal and calendar effects, inventory and supply chain, promotions and pricing, macroeconomic indicators, and cross-entity network features—produce roughly 260 features per entity-day observation. Across 3,000 counties, 500 product categories, and 365 days, the annual observation count exceeds half a billion rows.
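The row count follows directly from the panel dimensions stated above; a back-of-envelope check (using only the figures from the text):

```python
# Scale of the proposed panel, restating the dimensions from the text.
counties = 3_000
categories = 500
days = 365
features_per_row = 260  # approximate width across the nine signal groups

rows_per_year = counties * categories * days
values_per_year = rows_per_year * features_per_row

print(f"{rows_per_year:,} rows per year")     # 547,500,000 -> over half a billion
print(f"{values_per_year:,} feature values")  # roughly 142 billion per year
```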
Stylized visualization of data streams converging from multiple businesses into a unified economic observatory.
Scaling Laws for Economic Prediction
Can time-series foundation models—TimesFM, Chronos, Lag-Llama—predict individual business daily sales? Is there an equivalent of neural scaling laws (Kaplan et al., 2020) for economic prediction: a power-law relationship between data scale and forecast accuracy, appearing as a straight line on log-log axes?
The question is not whether more data helps (it does), but whether there exists a phase transition: a threshold of data density where the model acquires qualitatively new forecast capabilities, analogous to the emergent abilities observed in large language models. If such a transition exists, identifying it would reshape how we think about the economics of data aggregation.
Cross-entity scaling raises distinct questions from within-entity scaling. Adding more historical days for a single business improves temporal pattern recognition. Adding more businesses at the same temporal depth introduces cross-sectional variation that could enable transfer learning and structural generalization. How these two dimensions of scale interact is an open empirical question.
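One way to make the scaling-law question operational: if forecast error follows a power law in data volume, it is linear on log-log axes, and a break in that line is the signature of a phase transition. A minimal sketch on synthetic data—the exponent 0.25 and the noise level are illustrative assumptions, not empirical claims:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical measurements: forecast error at increasing data volumes.
n = np.array([1e4, 3e4, 1e5, 3e5, 1e6, 3e6, 1e7])
err = 2.0 * n ** -0.25 * np.exp(rng.normal(0, 0.03, n.size))  # synthetic power law

# A power law err = a * n^(-b) is linear in log-log space:
#   log(err) = log(a) - b * log(n)
slope, intercept = np.polyfit(np.log(n), np.log(err), 1)
b = -slope
print(f"estimated exponent b = {b:.3f}")  # recovers roughly 0.25

# A phase transition would appear as a break in this line; fitting separate
# slopes to the low-n and high-n halves and comparing them is a simple
# first diagnostic.
```

The same regression run separately along the entity axis and the temporal-depth axis would give a first read on how the two dimensions of scale interact.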
Scaling curve: log-log plot of data volume vs. forecast error across entity count and temporal depth. Annotated inflection point where cross-entity transfer begins to dominate.
Automated Causal Discovery at Scale
With hundreds of potential instruments and thousands of entities, can we automate the discovery of valid instrumental variables? The classical approach—economic theory motivates a specific instrument, then the researcher tests its validity—does not scale to hundreds of categories across thousands of geographies.
Weather is one natural experiment generator, but so are local events, policy changes, supply shocks, and seasonal boundaries. The question is whether we can build a systematic search engine for causal instruments: given an outcome variable and a set of candidate instruments, automatically test relevance (first-stage F-statistic), exclusion (falsification tests), and monotonicity conditions.
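The relevance step of such a search engine is the easiest to mechanize; exclusion and monotonicity are the hard parts and cannot be fully verified from data alone. A minimal sketch of relevance screening on synthetic data, using the conventional first-stage F > 10 rule of thumb for weak instruments (the variable names and data-generating parameters are illustrative):

```python
import numpy as np

def first_stage_F(x, z):
    """First-stage F-statistic for a single candidate instrument z
    on the endogenous regressor x (simple regression with intercept)."""
    Z = np.column_stack([np.ones_like(z), z])
    beta, *_ = np.linalg.lstsq(Z, x, rcond=None)
    resid = x - Z @ beta
    rss = resid @ resid
    tss = ((x - x.mean()) ** 2).sum()
    df = len(x) - 2
    return (tss - rss) / (rss / df)  # F with (1, df) degrees of freedom

rng = np.random.default_rng(1)
n = 5_000
weather = rng.normal(size=n)             # strong candidate instrument
irrelevant = rng.normal(size=n)          # candidate with no first stage
x = 0.5 * weather + rng.normal(size=n)   # endogenous regressor (e.g. price)

candidates = {"weather": weather, "noise": irrelevant}
# Screen every candidate on the F > 10 rule of thumb.
keep = {name: F for name, z in candidates.items()
        if (F := first_stage_F(x, z)) > 10}
print(sorted(keep))  # only "weather" survives the screen
```

In a production search, the surviving candidates would then face falsification tests for exclusion, which is where automated discovery remains genuinely open.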
Wright (1928) introduced instrumental variables in the context of supply and demand identification. Peters, Janzing, and Schölkopf (2017) formalized causal discovery from observational data. The gap between these traditions—econometric identification and machine-learning-based causal discovery—is where the most productive new methods are likely to emerge.
Latent Demand States and Hidden Economic Variables
Can deep generative models—variational autoencoders, diffusion models, normalizing flows—reveal latent demand states from transaction data that we do not currently measure or name?
Consider a concrete puzzle: 500 businesses in Phoenix all show the same unexplained demand dip on the same Tuesday. Nothing in the observed covariates—weather, holidays, promotions—explains it. Can a model trained on the full cross-section learn to identify the hidden cause? And if it can, does the latent representation it discovers correspond to something interpretable—a local event, a supply chain disruption, a shift in consumer sentiment?
Kingma and Welling (2014) introduced the VAE framework. The broader question connects to representation learning: can we learn economic state representations that are more predictive than hand-engineered features? And can we do so in a way that preserves interpretability—so that a latent dimension can be traced back to a real-world cause, not just a statistical artifact?
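A deliberately simple version of the Phoenix puzzle illustrates the mechanics: residualize each firm's sales on the observed covariates, then look for a low-dimensional common factor in what remains. Here the deep generative model is stood in for by plain PCA on the residual panel, and the shock day, loadings, and noise levels are all invented for the sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
n_days, n_firms = 365, 500

# Observed covariate (temperature) and a hidden city-wide shock on day 200.
temp = rng.normal(size=n_days)
hidden = np.zeros(n_days)
hidden[200] = -3.0  # the unexplained demand dip

loadings = rng.uniform(0.5, 1.5, n_firms)
sales = (temp[:, None] * rng.uniform(0.2, 0.4, n_firms)
         + hidden[:, None] * loadings
         + rng.normal(0, 1.0, (n_days, n_firms)))

# Step 1: residualize each firm's sales on the observed covariates.
X = np.column_stack([np.ones(n_days), temp])
beta, *_ = np.linalg.lstsq(X, sales, rcond=None)
resid = sales - X @ beta

# Step 2: the first principal component of the residual panel is a
# candidate latent demand state shared across firms.
centered = resid - resid.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
factor = centered @ vt[0]

shock_day = int(np.argmax(np.abs(factor)))
print(shock_day)  # recovers day 200
```

The interpretability question is what PCA cannot answer: mapping that recovered factor back to a nameable real-world cause is exactly where richer generative models would have to earn their keep.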
Latent variable diagram: observed transaction data projected into a lower-dimensional latent space, with clusters color-coded by discovered demand states. Arrows connect latent dimensions to interpretable real-world causes.
Calibrating Agent-Based Models with Real Transaction Data
Agent-based models have been theoretically promising for decades but chronically data-starved. The Santa Fe Institute tradition (Arthur, 2021; Farmer & Foley, 2009) demonstrated that simple agent rules can produce complex emergent dynamics—but calibrating these models to real economic behavior has been difficult.
With daily transaction data from thousands of firms, the calibration problem becomes tractable. Each firm is an agent with observable decision rules (pricing, advertising, inventory management) and observable outcomes (sales). Can we fit agent-based models that reproduce not just aggregate market dynamics but the cross-sectional distribution of firm-level outcomes?
The payoff: realistic ABMs could simulate counterfactual policies—what happens if a major retailer exits a market? How do supply chain disruptions cascade through local economies? These questions are unanswerable with aggregate time-series models. They require models that capture the heterogeneity of individual firm behavior and the network structure of local market interactions.
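A toy version of the exit counterfactual can be sketched with a linear input-output network: firms as nodes, sourcing shares as edges, and an iterated demand-propagation map. The network density and pass-through intensity below are illustrative assumptions, not a calibrated model:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30
# Random supply network: W[i, j] = share of firm i's inputs sourced from firm j.
W = rng.random((n, n)) * (rng.random((n, n)) < 0.1)
np.fill_diagonal(W, 0)
W /= np.maximum(W.sum(axis=1, keepdims=True), 1e-9)

def output(active, demand=1.0, rounds=200):
    """Iterate a linear demand-propagation map: each firm serves final
    demand plus a pass-through share of its customers' output."""
    out = np.full(n, demand) * active
    for _ in range(rounds):
        out = active * (demand + 0.5 * (W.T @ out))  # 0.5 = pass-through intensity
    return out

base = output(np.ones(n))
hub = int(np.argmax((W > 0).sum(axis=0)))  # the firm with the most customers
shocked = output(np.where(np.arange(n) == hub, 0.0, 1.0))
loss = 1 - shocked.sum() / base.sum()
print(f"aggregate output lost when firm {hub} exits: {loss:.1%}")
```

Calibration would replace the random W and the 0.5 pass-through with values fitted to observed transaction flows, which is precisely what daily firm-level data makes possible for the first time.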
Agent network figure: stylized local economy with firms as nodes, transaction flows as directed edges, and a simulated shock propagating through the network. Before/after comparison panels.
Revealed Preferences and Cognitive Biases at Scale
What does revealed preference data—millions of actual purchasing decisions—tell us about consumer behavior that surveys and lab experiments cannot? Traditional behavioral economics relies on small-scale experiments with undergraduates. Large-scale transaction data offers a complementary approach: observing cognitive biases in the wild, at population scale.
Can we identify anchoring, loss aversion, and availability heuristics from purchase patterns across weather conditions? When a heat wave breaks, do consumers overshoot their cooling purchases in a pattern consistent with the availability heuristic? Does the framing of weather forecasts (probability of rain vs. hours of sunshine) affect category-level demand in ways consistent with prospect theory?
The methodological challenge is distinguishing cognitive biases from rational responses to changing information. A consumer who stockpiles bottled water before a predicted hurricane may be exhibiting loss aversion—or may be making a perfectly rational calculation about supply chain disruption. Separating these explanations requires the kind of cross-sectional variation that only a large-scale, multi-geography dataset can provide. (Thaler, 2015; Kahneman, 2011; Mullainathan & Spiess, 2017.)
How weather events create purchase justifications that reveal cognitive biases at population scale.
Mapping Causal Demand Relationships Across Categories
Cross-category causal relationships are everywhere: when sunscreen demand spikes, aloe vera follows. But is the relationship causal, or are both driven by a common cause (UV exposure)? Can we build a directed graph of causal demand relationships across all product categories—and what does the topology of this graph reveal about economic structure?
The technical challenge is distinguishing genuine causal chains (sunscreen purchase → sunburn → aloe vera purchase) from confounded associations (UV → both). Weather variation again provides leverage: differential timing of weather shocks across geographies can identify which demand relationships are causal through Granger-type tests on the residuals after controlling for common weather exposure.
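The sunscreen-aloe example can be worked end to end on synthetic data: generate both series from a common UV driver plus a genuine lagged causal link, strip the common driver by regression, then check whether lagged sunscreen residuals still predict aloe residuals. The coefficients 0.8 and 0.5 are invented for the sketch:

```python
import numpy as np

rng = np.random.default_rng(4)
T = 1_000
uv = rng.normal(size=T)  # the common driver (UV exposure)

# Sunscreen responds to UV; aloe responds to UV *and* to lagged sunscreen.
sunscreen = 1.0 * uv + rng.normal(0, 1, T)
aloe = 0.8 * uv + 0.5 * np.roll(sunscreen, 1) + rng.normal(0, 1, T)
aloe[0] = 0  # discard the wrap-around term from np.roll

def residualize(y, x):
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

# Step 1: strip the common weather driver from both series.
r_sun = residualize(sunscreen, uv)
r_aloe = residualize(aloe, uv)

# Step 2: Granger-style check — does yesterday's sunscreen residual
# predict today's aloe residual, controlling for aloe's own lag?
X = np.column_stack([np.ones(T - 1), r_aloe[:-1], r_sun[:-1]])
beta, *_ = np.linalg.lstsq(X, r_aloe[1:], rcond=None)
print(f"lagged-sunscreen coefficient: {beta[2]:.2f}")  # near 0.5: the causal link survives
```

A purely confounded pair would show a coefficient near zero after the same residualization, which is what separates the two cases in this framework.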
A complete demand graph would have profound practical implications: it would enable anticipatory inventory management (stock aloe vera when sunscreen sells) and reveal the hidden structure of consumer behavior. The topology itself is scientifically interesting—are demand networks scale-free? Do they exhibit small-world properties? Are there critical nodes whose disruption cascades through the entire graph?
Demand graph diagram: directed acyclic graph of product categories with edge weights representing causal demand transfer coefficients. Color-coded by category cluster (food, personal care, hardware, seasonal). Hub-and-spoke topology visible.
Transaction Data as an Economic Nervous System
Can real-time transaction data from thousands of businesses function as an economic nervous system—detecting regime changes, recessions, and supply shocks before traditional indicators report them?
Traditional economic indicators—GDP, unemployment, consumer confidence—are published with weeks or months of lag. Daily transaction data, aggregated across businesses and geographies, could provide near-real-time visibility into economic conditions. The question is whether the signal-to-noise ratio is sufficient: can we detect a demand collapse 48 hours before aggregate statistics report it?
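A first-pass detector for such a collapse is a one-sided CUSUM on the aggregated daily index: accumulate standardized downside deviations and alarm when they cross a threshold. The baseline level, break size, and thresholds below are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
# Synthetic aggregate daily sales index: stable, then a demand collapse at day 90.
sales = np.concatenate([rng.normal(100, 3, 90), rng.normal(88, 3, 30)])

# One-sided CUSUM on standardized deviations from the known baseline.
mu, sigma = 100.0, 3.0
k, h = 0.5, 8.0  # k = slack per day, h = alarm threshold (both in sigma units)
s, alarm_day = 0.0, None
for t, x in enumerate(sales):
    s = max(0.0, s + (mu - x) / sigma - k)  # accumulate downside deviations only
    if s > h:
        alarm_day = t
        break

print(alarm_day)  # fires within a few days of the day-90 break
```

The real design problem is harder: the baseline must be estimated from a trailing window, and the panel's selection bias (discussed below) shifts both mu and sigma over time.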
Aruoba, Diebold, and Scotti (2009) pioneered high-frequency business conditions indices. The Google Trends nowcasting literature demonstrated that digital exhaust can predict economic indicators. Transaction data is a more direct signal—it captures actual spending decisions rather than search intent. The open question is whether the coverage and representativeness of a voluntary business panel is sufficient to serve as a reliable leading indicator, or whether selection bias in panel composition introduces systematic distortions.
Measuring Climate Adaptation in Real Time
Consumer behavior is already adapting to warming trends. Are Phoenix residents responding to 115°F the way they responded to 105°F a decade ago? The adaptation question is central to climate economics but difficult to study with aggregate data.
Daily transaction data across geographies with different climate trajectories offers a natural experiment: we can compare demand responses to identical weather events in regions with different historical baselines. If Phoenix consumers are habituated to heat in ways that Houston consumers are not, the demand response functions should differ—and the difference measures adaptation.
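Statistically, the adaptation comparison reduces to an interaction term: regress demand on heat, a city indicator, and their product, and the interaction coefficient is the per-degree adaptation gap. A sketch on synthetic two-city data, where the response slopes 0.5 and 1.0 are invented ground truth:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 2_000
city = rng.integers(0, 2, n)     # 0 = Phoenix (habituated), 1 = Houston
heat = rng.uniform(95, 115, n)   # daily max temperature in both cities

# Ground truth for the sketch: Phoenix responds half as strongly to heat.
slope = np.where(city == 0, 0.5, 1.0)
demand = 10 + slope * (heat - 95) + rng.normal(0, 2, n)

# Interaction regression: demand ~ heat + city + heat x city.
X = np.column_stack([np.ones(n), heat - 95, city, (heat - 95) * city])
beta, *_ = np.linalg.lstsq(X, demand, rcond=None)
print(f"adaptation gap (extra response per degree in Houston): {beta[3]:.2f}")
```

With real data the same interaction would be estimated per category and per year, so that the trajectory of the coefficient over time traces adaptation as it happens.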
Forward-looking climate models (CMIP6) could extend demand forecasts from the 10-day weather horizon to seasonal and annual horizons, enabling long-range demand planning that accounts for shifting climate normals. The convergence of weather forecasting, climate science, and demand modeling opens a research frontier that none of these fields can address independently.
How weather forecasting, climate science, and demand modeling converge over different time horizons.
Climate adaptation heatmap: demand response elasticity to extreme heat events (y-axis: temperature threshold, x-axis: year) across Phoenix, Houston, Chicago, and Miami. Visible adaptation gradient in heat-exposed cities vs. gradual sensitization in historically temperate regions.
Causal Pathways: How Weather Reshapes Economic Decisions
Weather doesn't just shift demand quantities—it reshapes how people make decisions. Heat stress reduces cognitive function (Graff Zivin & Neidell, 2014), affecting not just what people buy but how rationally they buy it. The implications for economic modeling are profound: weather is not merely a demand shifter but a modifier of the decision process itself.
Are weather-demand relationships stable across decades, or do they drift as climate normals shift? If relationships are non-stationary, models trained on historical data face a fundamental validity challenge. Identifying the rate of drift—and whether it is predictable—is critical for any long-horizon application. A model that assumes stationarity in a non-stationary world will silently degrade, producing confident forecasts that are systematically wrong.
Separating seasonal from weather effects is a deceptively difficult identification problem. Can we disentangle the effect of daylight hours on mood-driven consumption from the effect of temperature on comfort-driven consumption? These pathways have different policy implications and different response curves. A retailer who conflates them will misattribute seasonal demand to weather and make suboptimal inventory decisions during anomalous years.
Extreme events and economic memory: does a hurricane permanently change consumer behavior in a region, or do patterns revert? The persistence of demand shocks after extreme weather events reveals something about the deep structure of consumer habits. If a Category 4 hurricane causes permanent shifts in bottled water purchasing—years after the event—this tells us that consumer behavior has a longer memory than most demand models assume. The half-life of weather-induced behavioral change is an empirical question with direct implications for forecast model architecture.
Geographic arbitrage: if the same weather event produces different demand responses in different regions, what does this reveal about the underlying economic structure? Regional differences in weather sensitivity may proxy for differences in infrastructure, culture, or market structure that are otherwise invisible. A 95°F day in Portland produces a different demand response than a 95°F day in Dallas—and the difference encodes information about air conditioning penetration, outdoor culture, retail infrastructure, and dozens of other structural variables that are difficult to observe directly.
Multi-pathway causal diagram: weather conditions branching into cognitive, physiological, logistical, and social pathways, each leading to distinct demand effects. Annotated with example product categories affected by each pathway.
Applications on the Horizon
If the research questions above can be answered—even partially—a set of applications becomes possible that does not exist today. These are not product roadmap items. They are capabilities that would emerge from a sufficiently deep understanding of how weather, economics, and human behavior interact at scale.
Climate-adaptive supply chains. Automated inventory pre-positioning based on 14-day probabilistic weather forecasts crossed with learned demand response functions. Instead of reacting to demand after it materializes, supply chains could anticipate shifts before the weather event occurs.
Weather-indexed financial products. Retail weather derivatives calibrated to actual category-level demand elasticities, not just temperature thresholds. Current weather derivatives are blunt instruments—they pay out based on whether temperature crosses a line. Derivatives calibrated to real demand response curves would be far more useful for hedging actual business risk.
Environmental demand pricing. Real-time price optimization that accounts for weather-driven demand shifts. Dynamic pricing that responds to weather is already common in ride-sharing and energy markets. Extending it to retail categories with well-characterized weather-demand relationships could reduce waste and improve allocation.
Smart grid co-optimization. Linking energy demand forecasts with retail demand forecasts—both weather-driven—for coordinated infrastructure planning. A heat wave that drives air conditioning load also drives beverage demand, ice cream sales, and pool supply purchases. Modeling these jointly could improve resource allocation across the entire economy.
Public health early warning. Weather patterns that predict spikes in medication demand, emergency room visits, or mental health service utilization—weeks before they materialize. The connection between weather and health outcomes is well established; the connection between weather and health-related purchasing is a leading indicator that arrives before the health system sees the patients.
Climate migration modeling. As populations shift in response to climate change, how do regional demand patterns restructure? Migration changes the composition of local markets. Transaction data could detect these shifts in real time, long before census data captures them.
Weather-aware advertising infrastructure. Ad platforms that adjust bids, creative, and targeting based on real-time weather × demand predictions. Advertising dollars spent during weather conditions that suppress category demand are largely wasted. Reallocating them to conditions that amplify demand is a straightforward efficiency gain—once you have the demand response functions.
The deeper question: if we can model how weather causally drives economic behavior at fine granularity, what does this tell us about the economic cost of climate change that aggregate GDP models miss? Hsiang et al. (2017) and Dell, Jones, and Olken (2014) estimated climate damages using country-level and state-level data. Firm-level transaction data could reveal the micro-structure of climate damages—which categories, which regions, which business types bear the costs—in a way that aggregate approaches cannot.
Privacy-Preserving Multi-Firm Research
Can federated learning achieve the statistical benefits of multi-firm data pooling without requiring raw data sharing? The privacy-utility tradeoff is central: differential privacy guarantees come at a cost to model accuracy, and the acceptable tradeoff depends on the research question.
Secure multi-party computation offers an alternative: firms jointly compute aggregate statistics without any party observing another's data. The computational overhead is significant but declining. The question is whether current MPC protocols are efficient enough for the scale of computation required—gradient updates across models with millions of parameters, iterated over billions of observations.
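The clip-and-noise aggregation at the heart of differentially private federated averaging is simple to state, even though calibrating a formal privacy budget is not. A toy sketch of one aggregation round; the clip norm and noise multiplier are illustrative, and no epsilon accounting is attempted:

```python
import numpy as np

rng = np.random.default_rng(7)

def dp_federated_mean(updates, clip_norm=1.0, noise_mult=0.8, rng=rng):
    """One round of clip-and-noise aggregation: each firm's update is
    clipped to bound its contribution, then Gaussian noise scaled to the
    clip norm is added before averaging."""
    clipped = [u * min(1.0, clip_norm / np.linalg.norm(u)) for u in updates]
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0, noise_mult * clip_norm, size=total.shape)
    return (total + noise) / len(updates)

# 200 firms, each holding a private 10-dimensional gradient (synthetic).
firm_updates = [rng.normal(0.3, 0.1, size=10) for _ in range(200)]
agg = dp_federated_mean(firm_updates)
print(np.round(agg, 2))  # close to the true mean vector despite the noise
```

The privacy-utility tradeoff lives entirely in `noise_mult`: larger values buy stronger guarantees at the cost of noisier aggregates, and the acceptable setting depends on the research question, as noted above.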
The design goal: a shared economic observatory where participating firms contribute to collective knowledge while maintaining complete control over their proprietary data. The analogy is a telescope array—each instrument contributes signal, the combined array resolves structure that no individual instrument could see, but no observatory surrenders its hardware to the consortium.
The Missing Sensors
What data sources don't exist yet but would be transformative? Real-time foot traffic at business-level granularity, indoor climate conditions, supply chain visibility beyond tier-1 suppliers, ZIP-code-level social sentiment, and wearable biometric data correlated with purchasing decisions.
A purpose-built economic weather station would combine: satellite imagery for parking lot occupancy, point-of-sale transaction streams, social media sentiment analysis, weather station telemetry, traffic flow sensors, and air quality monitors. The question is which of these data streams provides the largest marginal improvement in demand prediction—and at what cost. The information value of each sensor depends on what else is already in the model, so the optimal instrumentation strategy requires understanding the full covariance structure of the existing feature space.
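That instrumentation question is a sensor-selection problem, and greedy forward selection by marginal R-squared gain is the standard first cut. A sketch on synthetic streams in which foot traffic is deliberately made redundant with point-of-sale data, so its marginal value collapses once POS is in the model (all signal structure is invented):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 3_000
pos = rng.normal(size=n)
foot = 0.9 * pos + 0.3 * rng.normal(size=n)  # largely redundant with POS
weather = rng.normal(size=n)
sentiment = rng.normal(size=n)
demand = 1.0 * pos + 0.6 * weather + 0.2 * sentiment + rng.normal(0, 0.5, n)

sensors = {"pos": pos, "foot_traffic": foot, "weather": weather, "sentiment": sentiment}

def r2(cols, y):
    X = np.column_stack([np.ones(n)] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

# Greedy forward selection: add the sensor with the largest *marginal* gain.
chosen, cols, order = set(), [], []
for _ in range(len(sensors)):
    name = max((s for s in sensors if s not in chosen),
               key=lambda s: r2(cols + [sensors[s]], demand))
    chosen.add(name); cols.append(sensors[name]); order.append(name)
print(order)  # foot traffic ranks last despite its strong raw correlation
```

Foot traffic correlates strongly with demand in isolation, yet it is picked last: marginal value depends on what is already in the model, which is exactly the covariance-structure point made above.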
Economic weather station illustration: cutaway view of a local economy instrumented with overlapping sensor networks—satellite, POS, social, weather, traffic, air quality—converging into a unified data pipeline.
Research Partnership Structure
This research depends on real transaction data from real businesses. We are seeking design partners who are interested in contributing data to—and benefiting from—a shared scientific effort.
Data needed: Daily sales by geography and product category, advertising spend by channel, basic product metadata. No customer PII required. The minimum useful contribution is one year of daily data across at least 10 geographies. More data enables more questions, but even a single-category, single-region dataset contributes to the cross-sectional structure that makes causal identification possible.
Partners receive: Access to research findings, causal demand estimates for their categories and geographies, potential co-authorship on published work, and early access to research tools and model outputs. Partners also gain visibility into the broader cross-sectional patterns that their data alone cannot reveal.
What we are not promising: This is not a product demo and not a guaranteed ROI calculation. It is a research collaboration aimed at answering open scientific questions with real data. Some of those questions may lead to commercially valuable insights. Others may lead to interesting null results. The commitment is to intellectual honesty, not to confirming any particular hypothesis.
Timeline: 6–12 months for initial results, with intermediate findings shared as they emerge. The first deliverable is a causal demand model for each partner's categories and geographies, benchmarked against their existing forecasting methods.
Interested in collaborating?
If you have transaction data and open questions about demand, we'd like to hear from you.
nate@schmiedehaus.com

References
- Kaplan, J., McCandlish, S., Henighan, T., Brown, T.B., et al. (2020). "Scaling Laws for Neural Language Models." arXiv:2001.08361.
- Athey, S. & Imbens, G.W. (2019). "Machine Learning Methods that Economists Should Know About." Annual Review of Economics.
- Kleinberg, J., Ludwig, J., Mullainathan, S. & Obermeyer, Z. (2015). "Prediction Policy Problems." American Economic Review.
- Mullainathan, S. & Spiess, J. (2017). "Machine Learning: An Applied Econometric Approach." Journal of Economic Perspectives.
- Arthur, W.B. (2021). "Foundations of Complexity Economics." Nature Reviews Physics.
- Farmer, J.D. & Foley, D.K. (2009). "The economy needs agent-based modelling." Nature.
- Peters, J., Janzing, D. & Schölkopf, B. (2017). Elements of Causal Inference. MIT Press.
- Mellon, J. (2021). "Rain, Rain, Go Away: 176 Potential Exclusion-Restriction Violations for Studies Using Weather as an Instrumental Variable."
- Wright, P.G. (1928). The Tariff on Animal and Vegetable Oils.
- Card, D. & Krueger, A.B. (1994). "Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania." American Economic Review.
- Google Research (2024). TimesFM: Time Series Foundation Model.
- Google DeepMind (2024). GenCast: Diffusion-based ensemble forecasting.
- Aruoba, S.B., Diebold, F.X. & Scotti, C. (2009). "Real-Time Measurement of Business Conditions." Journal of Business & Economic Statistics.
- Thaler, R.H. (2015). Misbehaving: The Making of Behavioral Economics. W.W. Norton.
- Kahneman, D. (2011). Thinking, Fast and Slow. Farrar, Straus and Giroux.
- Graff Zivin, J. & Neidell, M. (2014). "Temperature and the Allocation of Time." Journal of Labor Economics.
- Hsiang, S., et al. (2017). "Estimating economic damage from climate change in the United States." Science.
- Dell, M., Jones, B.F. & Olken, B.A. (2014). "What Do We Learn from the Weather?" Journal of Economic Literature.
- Kingma, D.P. & Welling, M. (2014). "Auto-Encoding Variational Bayes." ICLR.