TECH-EXTRA: Worlds Over Words

How World Models Are Repricing the $7.6 Trillion AI Infrastructure Thesis

May 21, 2026

Welcome to Silicon Sands News, the go-to newsletter for investors, senior executives, and founders navigating the intersection of AI, deep tech, and innovation. Join ~35,000 industry leaders across all 50 U.S. states and 117 countries, including top VCs from Sequoia Capital, Andreessen Horowitz (a16z), Accel, NEA, Bessemer Venture Partners, Khosla Ventures, and Kleiner Perkins. Our readership also includes decision-makers from Apple, Amazon, NVIDIA, and OpenAI.

In this month’s TECH-EXTRA, we examine the world model revolution. Goldman Sachs, Yann LeCun, Fei-Fei Li, and a new generation of frontier labs argue that the next decisive leap in AI will not come from making language models bigger. It will come from teaching machines to understand how the world actually works. We unpack what that means for every infrastructure forecast, enterprise AI budget, and startup thesis you currently hold.

Let’s Dive Into It…

Key Takeaways

For VCs and LPs

Infrastructure repricing is the trade: Goldman Sachs explicitly states that current AI infrastructure supply-and-demand forecasts, the ones driving data center, chip, and power investment theses, are built around transformer-based LLMs and do not account for world model compute demands. If world models scale as expected, the entire infrastructure stack needs to be repriced upward.
The funding wave has already started: In a three-week window spanning February and March 2026, world model startups raised over $3.2 billion. World Labs raised $1 billion. AMI Labs raised $1.03 billion. Wayve closed a $1.2 billion Series D. This is not a theme. It is a category formation event.
Architectural divergence creates multiple bets: The world model space is not a one-horse race. Generative world models (Wayve, NVIDIA), predictive embedding models (AMI Labs, Meta), spatial intelligence models (World Labs), and deterministic physics-constrained models (ARYA Labs) represent fundamentally different architectural bets with different risk profiles, timelines, and enterprise fit.
The enterprise adoption curve is steep yet slow: Goldman Sachs cautions that investment in world models will remain a fraction of total AI spend in the near term. The opportunity for VCs is in the picks-and-shovels layer: compute orchestration, synthetic data pipelines, evaluation infrastructure, and domain-specific deployment tooling, not just the frontier labs themselves.
Safety architecture is the new moat: In regulated industries (defense, pharma, aerospace, medical devices) the ability to provide mathematical certainty rather than probabilistic outputs is a genuine competitive differentiator. Companies that have built safety into their architecture, not as a policy layer, are positioned for markets where hallucination is not an acceptable failure mode.

For Senior Executives

Your current AI vendor is not building this: The world model paradigm requires fundamentally different training data, compute profiles, and evaluation frameworks than the LLMs your enterprise currently deploys. The vendors you rely on today are not necessarily the vendors who will lead this transition.
The ROI calculus changes: LLMs generate text. World models generate foresight. For industries where decisions have physical consequences (manufacturing, logistics, energy, healthcare, aerospace) the ability to simulate outcomes before committing resources is a qualitatively different value proposition than a chatbot.
Synthetic data becomes strategic infrastructure: World models are both consumers and producers of synthetic data. Organizations that build proprietary simulation environments and synthetic data pipelines today will have a structural training advantage over competitors who rely on public data in the world model era.
Regulatory tailwinds favor deterministic approaches: The EU AI Act and emerging US AI governance frameworks are creating compliance pressure around explainability and auditability. World models that deliver mathematical proofs rather than probabilistic outputs are better positioned for regulated deployment contexts.
The timeline is longer than the hype suggests: AMI Labs CEO Alexandre LeBrun has publicly stated that world models could take years to move from fundamental research to commercial applications. Plan for a three-to-five-year horizon before world models are broadly production-ready, while beginning infrastructure and data strategy work now.

For Founders

The picks-and-shovels opportunity is now: The frontier labs are raising billions, but the enabling infrastructure (world model evaluation benchmarks, synthetic data generation pipelines, domain-specific fine-tuning tooling, and deployment orchestration) remains largely unbuilt. This is where the near-term startup opportunity lives.
Domain specificity beats generality: The world model space rewards deep domain expertise. A world model for pharmaceutical manufacturing that understands reaction kinetics is more valuable than a general-purpose world model that knows a little about everything. Founders with domain expertise in high-stakes verticals should be building now.
The architecture you choose is a strategic commitment: Generative, predictive, and deterministic world model architectures have fundamentally different compute requirements, data needs, and enterprise fit. Choosing the wrong architecture for your target market is not a pivotable mistake. It is a rebuild.
Open-source is a go-to-market strategy: AMI Labs has committed to open-sourcing code and publishing research. NVIDIA Cosmos is already open-source. Founders building on top of open-source world model foundations can move faster but need to differentiate at the application and domain layers.
Safety is not a feature; it is a market: In defense, aerospace, medical devices, and critical infrastructure, the ability to guarantee that an AI system will not violate physical laws or produce unsafe outputs is not a nice-to-have. It is the product. Founders who treat safety as an architectural constraint from day one, rather than a compliance checkbox, are building for the largest and most defensible markets.

The Problem That $3 Trillion Cannot Solve

There is a question that has haunted artificial intelligence research for decades, and it is deceptively simple: Does the machine actually understand what it is doing?

For the past five years, the answer from the industry has been a qualified yes, or at least, a confident enough yes to justify $581.7 billion in global corporate AI investment in 2025 alone, a figure that Stanford’s 2026 AI Index reports was up 130 percent from the prior year. Large language models have passed bar exams, written production code, diagnosed diseases, and generated enough text to fill every library on earth several times over. The case for their intelligence seems, on the surface, overwhelming.

But Yann LeCun, Turing Award winner, former Meta Chief AI Scientist, and arguably the most credentialed critic of the LLM paradigm, has been making a different argument for years. In early 2026, the market started listening. His core claim is not that LLMs are useless. It is that they are fundamentally limited by what they were trained on: text. Language models learn the statistical relationships between words. They do not learn how gravity works, how a bridge fails under load, how a drug molecule interacts with a protein, or how a supply chain collapses when a port closes. They learn how people describe these things, which is a very different kind of knowledge.

“Language models predict text, not physical reality,” LeCun told MIT Technology Review in January 2026, “and that gap limits what they can do for industries that run on physics, not prose.” The distinction sounds philosophical. It is, in fact, deeply practical. An LLM asked to predict whether a manufacturing process will produce a defective batch can only draw on text descriptions of similar situations. A world model trained on the actual physics of that manufacturing process (temperature gradients, material flow rates, mechanical tolerances) can simulate the process forward in time and identify failure modes before they occur. The difference between those two capabilities is the difference between a consultant who has read about your factory and an engineer who has run the simulation.

Goldman Sachs made the institutional version of this argument in a landmark April 2026 report titled “When AI Learns How the World Works.” Written by George Lee and Dan Keyserling of the Goldman Sachs Global Institute, the report argues that the AI frontier is shifting from pattern recognition to genuine world understanding, and that this shift has not yet been priced into the consensus infrastructure forecasts that are currently driving hundreds of billions of dollars in data center, chip, and power investment. “The demands and opportunities surrounding world models,” the report states, “are not yet reflected in consensus supply-and-demand forecasts for AI infrastructure that are primarily focused on transformer-based LLMs.”

That is a remarkable sentence from Goldman Sachs. It is not saying world models are interesting research. It is saying that the entire infrastructure investment thesis, the one underpinning NVIDIA’s valuation, the hyperscaler CapEx cycle, and the data center construction boom, is undersized for what comes next.

From Craik to the $1 Trillion Moment

The concept of a world model is not a recent invention of the deep learning era. Its intellectual lineage stretches back to the foundational days of cybernetics and cognitive science, and understanding that lineage is essential for grasping why the current investment wave is not hype but a long-deferred reckoning with a fundamental limitation of AI.

In 1943, Cambridge psychologist Kenneth Craik published The Nature of Explanation, a seminal work in early cognitive science. Craik proposed that the human mind constructs “small-scale models” of reality that it uses to anticipate events, reason about alternatives, and make decisions. This was a radical departure from the behaviorist models of the time, which viewed intelligence primarily as a set of learned stimulus-response associations. Craik argued that true intelligence requires internal simulation: the ability to run a model of the world inside the mind before acting in the world outside it. A chess player does not wait to see what happens when they move a piece; they simulate the consequences in their head first. A surgeon does not learn by trial and error on patients; they build an internal model of anatomy and physiology that allows them to plan interventions in advance.

This idea lay dormant in artificial intelligence for decades, largely because the computational resources required to build such models were unimaginable. Early AI focused on symbolic logic and expert systems, attempting to hardcode the rules of the world rather than allowing a system to learn them from experience. These approaches failed because the real world is too complex, noisy, and ambiguous to be captured by rigid logical rules. The “knowledge bottleneck,” the impossibility of manually encoding all the knowledge required to navigate the real world, was the central failure mode of first-generation AI.

The modern era of world models in AI began in 1990 with Jürgen Schmidhuber’s paper, “Making the World Differentiable: On Using Self-Supervised Fully Recurrent Neural Networks for Dynamic Reinforcement Learning and Planning in Non-Stationary Environments.” Schmidhuber proposed a system consisting of two neural networks: a “controller” that takes actions, and a “world model” that predicts the consequences of those actions. By training the world model to predict the environment’s next state, the controller could learn to plan by simulating futures internally, without needing to execute every action in the real world. This was the first computational instantiation of Craik’s cognitive theory.
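
The controller-plus-world-model idea can be illustrated with a toy sketch. Everything in it is hypothetical: the number-line environment, the hand-written `world_model` function standing in for a trained network, and the brute-force search used as the controller. It is not a reconstruction of the 1990 system, only an illustration of planning by simulating futures internally.

```python
from itertools import product

ACTIONS = (-1, 0, 1)   # hypothetical action set: step left, stay, step right
GOAL = 3               # hypothetical target position on a number line


def world_model(state: int, action: int) -> int:
    """Stand-in for a learned model: predicts the next state for an action."""
    return state + action


def plan(state: int, horizon: int) -> tuple:
    """Controller: imagine every action sequence and keep the cheapest one."""
    best_seq, best_cost = None, float("inf")
    for seq in product(ACTIONS, repeat=horizon):
        imagined = state
        cost = 0
        for action in seq:
            imagined = world_model(imagined, action)  # no real-world step is taken
            cost += abs(GOAL - imagined)              # distance from goal at each step
        if cost < best_cost:
            best_seq, best_cost = seq, cost
    return best_seq


print(plan(state=0, horizon=4))
```

The controller never touches the real environment while planning. Every candidate future is evaluated inside the model, which is the property that makes the approach attractive when real-world trials are slow, costly, or dangerous.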

Despite its theoretical elegance, Schmidhuber’s approach was limited by the neural network architectures and computing power available in the 1990s. The world models that could be built were shallow, brittle, and confined to toy environments. It was not until 2018 that the concept truly exploded into the mainstream consciousness of the AI research community. David Ha and Jürgen Schmidhuber published “World Models”, a paper that demonstrated how a relatively simple generative recurrent neural network could learn a compressed spatial and temporal representation of a reinforcement learning environment. The agent could then be trained entirely inside its own “hallucinated” dream of the environment, achieving state-of-the-art performance when transferred back to the actual game. The paper has since accumulated over 2,000 citations and is considered a foundational text in the field.

Between 2018 and 2024, the field saw incremental but vital progress. Danijar Hafner and colleagues introduced the Dreamer series of algorithms (DreamerV1, V2, and V3), which showed that world models could master increasingly complex and diverse domains, from continuous control tasks to Minecraft, using a single algorithm and, remarkably, very little compute. DreamerV3, described in the 2023 paper “Mastering Diverse Domains through World Models”, famously found diamonds in Minecraft using only a single GPU, demonstrating that world model-based reinforcement learning could generalize across radically different environments without task-specific engineering.

Simultaneously, the rise of Large Language Models like GPT-3 and GPT-4 dominated the narrative and the capital flows. LLMs demonstrated astonishing capabilities in natural language processing, reasoning, and code generation. However, as they scaled, their limitations became apparent. They struggled with physical reasoning, spatial relationships, and causal inference. They were, as LeCun frequently pointed out, auto-regressive text predictors, not models of physical reality.

The turning point arrived in late 2024 and early 2025. The limitations of LLMs in physical domains (robotics, autonomous driving, industrial simulation) became a hard ceiling for enterprise adoption in those sectors. Concurrently, breakthroughs in latent diffusion models and Joint Embedding Predictive Architectures (JEPA) provided the technical tools needed to build high-fidelity world models at scale. The theoretical foundation laid by Craik, Schmidhuber, and Ha was finally ready for commercial scale.

What Is a World Model, Actually?

The Goldman Sachs report draws a crucial distinction between two types of world models that are now attracting capital. The first is the physical world model, a system that understands the laws of physics: gravity, friction, heat transfer, fluid dynamics, and electromagnetic fields. These models are the foundation for robotics, autonomous vehicles, industrial simulation, and scientific discovery. The second is the virtual or social world model, a system that understands how agents behave: incentives, norms, market dynamics, and organizational behavior. These models are the foundation for economic forecasting, strategic planning, and complex system optimization.

Both types represent a qualitative leap beyond LLMs, because they move from describing the world to simulating it. “Prediction assumes a single correct outcome,” the Goldman authors write. “World models reveal ranges, paths, and feedback loops.” An LLM can tell you that a supply chain disruption in a particular region historically leads to price increases of a certain magnitude. A world model can simulate the specific disruption you are facing, with the specific suppliers, logistics routes, and inventory positions you actually have, and show you the range of outcomes across thousands of scenarios, including the tail risks that historical averages obscure.
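
The difference between a historical average and a simulated range of outcomes can be made concrete with a small Monte Carlo sketch. All of the numbers are assumptions chosen for illustration (a five-day typical delay, a 5 percent chance of a port closure, a 20-to-40-day rerouting penalty), and the `simulate_delay` function is a hypothetical stand-in for a real world model.

```python
import random
import statistics


def simulate_delay(rng: random.Random) -> float:
    """One hypothetical scenario: total delivery delay in days."""
    delay = rng.gauss(5.0, 1.0)            # routine variation around 5 days
    if rng.random() < 0.05:                # assumed 5% chance that a port closes
        delay += rng.uniform(20.0, 40.0)   # a closure adds a long rerouting delay
    return max(delay, 0.0)


def simulate_many(n: int, seed: int = 0) -> list:
    """Run n independent scenarios and return the delays sorted ascending."""
    rng = random.Random(seed)
    return sorted(simulate_delay(rng) for _ in range(n))


delays = simulate_many(10_000)
mean = statistics.fmean(delays)
p50 = delays[len(delays) // 2]
p99 = delays[int(0.99 * len(delays))]
print(f"mean={mean:.1f}  median={p50:.1f}  99th percentile={p99:.1f}")
```

The mean sits close to the typical case and says little about the rare closure scenarios, while the 99th percentile exposes them directly. That is the sense in which a simulated distribution reveals tail risks that an average obscures.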

This distinction matters enormously for enterprise decision-making. The most consequential decisions organizations face are precisely the ones where historical averages are least useful: novel situations, tail risks, and complex system interactions. World models are built for exactly these cases.

At a technical level, a world model learns a function that maps the current state of a system, along with an action or intervention, to a predicted future state. In a latent world model, this state is not the raw pixel data of a video or the raw sensor readings of an industrial system, but a compressed, lower-dimensional representation that captures the essential semantic and geometric features of the environment. The model learns an encoder that maps raw observations to this latent state, a transition model that predicts how the latent state evolves over time, and optionally a decoder that maps the predicted latent state back to observable outputs.
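
A minimal sketch of that encoder, transition model, and decoder structure follows. It is an illustration under stated assumptions, not any lab’s implementation: the falling-object scenario, the two-number latent state, and the hand-written dynamics are all hypothetical, and in a real system each of the three functions would be a learned network.

```python
from typing import List, Tuple

Latent = Tuple[float, float]  # hypothetical latent state: (height, velocity)
DT = 0.1                      # simulation step in seconds
G = 9.81                      # gravitational acceleration in m/s^2


def encode(observation: dict) -> Latent:
    """Encoder: keep only the features that matter for the dynamics."""
    return (observation["height_m"], observation["velocity_mps"])


def transition(z: Latent, action: float) -> Latent:
    """Transition model: (latent state, action) -> next latent state.

    `action` is an upward thrust acceleration. Hand-written here, learned in practice.
    """
    height, velocity = z
    velocity = velocity + (action - G) * DT
    height = max(height + velocity * DT, 0.0)   # the object cannot go below ground
    return (height, velocity)


def decode(z: Latent) -> dict:
    """Optional decoder: map the latent state back to an observable output."""
    return {"height_m": round(z[0], 3), "velocity_mps": round(z[1], 3)}


def rollout(observation: dict, actions: List[float]) -> List[dict]:
    """Simulate forward entirely in latent space, decoding each predicted state."""
    z = encode(observation)
    predictions = []
    for action in actions:
        z = transition(z, action)
        predictions.append(decode(z))
    return predictions


frame = {"height_m": 10.0, "velocity_mps": 0.0, "camera_noise": 0.42}
for step in rollout(frame, actions=[0.0, 0.0, G]):
    print(step)
```

Note that the `camera_noise` field never reaches the transition model. The encoder discards it, which is the toy equivalent of compressing raw observations into a latent state that carries only what the dynamics need.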

The transition model is where the “physics” is learned. In many modern architectures, this is implemented using recurrent neural networks (including Long Short-Term Memory networks) or state-space models such as Mamba, which are designed to capture long-range dependencies in sequential data efficiently. The challenge is that the real world is stochastic and partially observable: the model rarely has access to the full, true state of the system. The transition model must therefore often predict a distribution over possible future states rather than a single deterministic outcome.
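
One simple way to picture a transition model that outputs a distribution is to have it carry a mean and a variance forward instead of a single number. The sketch below assumes a one-dimensional latent state, Gaussian uncertainty, and made-up drift and noise values. The `Belief` class and both functions are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Belief:
    """Gaussian belief over a one-dimensional latent state (hypothetical)."""
    mean: float
    variance: float


def transition(belief: Belief, heater_on: bool) -> Belief:
    """Predict a distribution over the next state rather than a point estimate."""
    drift = 1.5 if heater_on else -0.5     # assumed mean temperature change per step
    process_noise = 0.3 ** 2               # assumed variance of unobserved disturbances
    return Belief(belief.mean + drift, belief.variance + process_noise)


def rollout(belief: Belief, plan: list) -> Belief:
    """Apply the transition once per planned action."""
    for heater_on in plan:
        belief = transition(belief, heater_on)
    return belief


start = Belief(mean=20.0, variance=0.0)
final = rollout(start, [True] * 10)
spread = 1.645 * math.sqrt(final.variance)   # half-width of a 90% Gaussian interval
print(f"predicted: {final.mean:.1f} C, 90% interval +/- {spread:.2f} C")
```

The variance grows with every step, so the prediction becomes less certain the further ahead it looks. Real systems use far richer distributions, but the same widening of uncertainty over the horizon is what makes long-range simulation expensive.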

This mathematical shift, from discrete token prediction to continuous latent-state evolution, is what drives the massive increase in compute requirements highlighted by Goldman Sachs. Simulating these continuous dynamics, especially over long time horizons and across thousands of parallel scenarios, requires immense memory bandwidth and specialized hardware architectures that are distinct from those optimized for LLM inference.

Four Complementary Bets on How to Build Reality

Not all world models are created equal, and the architectural choices made by leading labs today reflect fundamentally different theories about which kind of world model will win. Understanding these distinctions is essential for anyone trying to evaluate the investment landscape or make enterprise technology decisions.

Generative World Models: Seeing the Future

The most visually intuitive approach to world modeling is the generative paradigm: train a model to generate realistic video or image sequences representing future states of the world. If you show the model a car driving down a road, it generates a photorealistic video of what happens next, including edge cases like a child running into the street, a tire blowout, or an unexpected obstacle.

This is the approach taken by Wayve with its GAIA series of models. The most recent, GAIA-3, released in December 2025, is a 15-billion-parameter latent diffusion-based world model designed specifically for autonomous driving evaluation. GAIA-3 generates safety-critical driving scenarios that would be dangerous, expensive, or impossible to collect in the real world, allowing Wayve to evaluate its autonomous driving AI against thousands of edge cases without putting a vehicle on the road. The model operates in a compressed latent space rather than pixel space, which makes it significantly more efficient than earlier generative approaches, but it remains computationally intensive.

The architecture of a generative world model like GAIA-3 involves a multi-stage pipeline. An encoder network takes a sequence of past video frames from the vehicle’s cameras and compresses them into a lower-dimensional latent representation, stripping away irrelevant pixel noise and capturing the essential semantic and geometric features of the scene. The model is then conditioned on various inputs: the ego-vehicle’s current speed and steering angle, the intended route, and potentially text prompts describing the desired scenario. A diffusion model, operating entirely within the latent space, predicts the latent representation of the next frame or sequence of frames. Finally, a decoder network translates the predicted future latent representation back into high-resolution pixel space, producing a photorealistic video of the simulated future.

NVIDIA’s Cosmos platform, released as an open-source suite in early 2025 and updated with Cosmos Predict 2.5 and Transfer 2.5 at GTC 2026 in March, takes a similar generative approach but targets a broader range of physical AI applications. Cosmos includes three core components: Cosmos Predict (a generative world model that simulates future states), Cosmos Transfer (a simulation-to-photoreal transfer model that converts synthetic environments into photorealistic training data), and Cosmos Reason (a reasoning layer for physical AI planning). The platform is designed to serve as a synthetic data factory for robotics, autonomous vehicles, and industrial vision systems, generating the massive, diverse, physics-aware datasets that physical AI models require for training.

The generative approach has a fundamental tension at its core, which LeCun has articulated clearly: the world is not fully predictable, and a model that tries to predict every future pixel will either hallucinate or hedge. “If you try to build a generative model that predicts every detail of the future, it will fail,” he told MIT Technology Review. The model will either produce a blurry mess (averaging all possible futures) or hallucinate a specific, incorrect configuration. In safety-critical applications, a model that typically obeys the laws of physics is not sufficient. It must always obey physics.
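
The “blurry mess” failure has a simple numerical analogue. The sketch below assumes a model that must commit to one prediction and is scored by squared error, which is an assumption made for illustration rather than a description of how any production model is trained. The futures are hypothetical: a pedestrian steps left or right with equal probability.

```python
import statistics

# Hypothetical futures: a pedestrian steps left (-1.0) or right (+1.0)
# with equal probability. No observed future has the pedestrian at 0.0.
observed_futures = [-1.0, 1.0] * 500


def mse(prediction: float, outcomes: list) -> float:
    """Mean squared error of one fixed prediction against all outcomes."""
    return statistics.fmean((prediction - y) ** 2 for y in outcomes)


candidates = [x / 10 for x in range(-10, 11)]          # -1.0, -0.9, ..., 1.0
best = min(candidates, key=lambda c: mse(c, observed_futures))

print(f"lowest-error single prediction: {best}")
print(f"error if it commits to +1.0:    {mse(1.0, observed_futures):.2f}")
print(f"error of the averaged guess:    {mse(best, observed_futures):.2f}")
```

The lowest-error answer is the midpoint, a future that never actually occurs. In pixel space, that midpoint is the blur. Committing to one side instead is the hallucination case: sharp, but wrong half the time.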

Predictive Embedding Models: Understanding Without Generating

LeCun’s alternative is the Joint Embedding Predictive Architecture, or JEPA, which he proposed in 2022 and which forms the technical foundation of AMI Labs. The key insight of JEPA is that you do not need to predict the future in pixel space. You need to predict it in the representation space. Rather than generating a photorealistic video of what happens next, a JEPA model learns to predict the abstract features of the future state: the shape of the scene, the positions of objects, the dynamics of motion, without committing to every irrelevant detail.

The architecture consists of three main components. An x-Encoder takes the current state of the world and encodes it into an abstract representation. A y-Encoder takes the future state of the world (the target) and encodes it into an abstract representation of its own. A Predictor takes the current representation and an action variable (or a latent variable representing unknown factors), and attempts to predict the future representation. The system is trained by minimizing the distance between the Predictor’s output and the actual future representation produced by the y-Encoder. The model never decodes these representations back into pixels.
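
The three components and the training objective can be sketched in a few lines. This is a toy under loud assumptions: the encoders simply drop a noisy entry instead of being learned networks, the predictor is a single scalar weight, and the observation layout is invented. It shows only the shape of the computation, namely that the loss is measured between representations and never in pixel space.

```python
from typing import List

Vector = List[float]


def x_encoder(obs: Vector) -> Vector:
    """Encode the current observation; drops the last, noisy entry (toy choice)."""
    return obs[:2]


def y_encoder(obs: Vector) -> Vector:
    """Encode the future observation (the target) the same way."""
    return obs[:2]


def predictor(s_x: Vector, action: Vector, weight: float) -> Vector:
    """Predict the future representation from the current one plus an action."""
    return [s + weight * a for s, a in zip(s_x, action)]


def jepa_loss(x: Vector, y: Vector, action: Vector, weight: float) -> float:
    """Squared distance between predicted and actual future embeddings."""
    predicted = predictor(x_encoder(x), action, weight)
    target = y_encoder(y)
    return sum((p - t) ** 2 for p, t in zip(predicted, target))


# Observation layout (hypothetical): [x position, y position, pixel noise]
now = [0.0, 0.0, 0.91]
future = [1.0, 2.0, 0.13]      # the noise entry changed unpredictably
action = [1.0, 2.0]            # commanded displacement

print(jepa_loss(now, future, action, weight=0.5))   # predictor undershoots the motion
print(jepa_loss(now, future, action, weight=1.0))   # predictor matches the true dynamics
```

Because the noise entry never enters either representation, it cannot contribute to the loss, however unpredictable it is. In practice the encoders are trained jointly with the predictor, and training needs some mechanism to stop them from collapsing to a constant output. That part is omitted here.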

By operating solely in the embedding space, the model learns to ignore unpredictable, low-level details and focus on the predictable, high-level semantics. This approach is more computationally efficient than generative models, more robust to the world’s inherent unpredictability, and more likely to learn abstract, generalizable representations that transfer across domains. Meta AI’s V-JEPA 2, published on arXiv in 2025, demonstrates that this approach produces strong physical understanding without explicit physics supervision: the model learns the constraints of physical reality simply by predicting the future in representation space across a large corpus of video.

The JEPA approach is also more aligned with how human cognition works. We do not simulate the future in photorealistic detail. We simulate it using abstract features and relationships. When a surgeon plans an operation, they are not generating a mental movie of every cell; they are reasoning about anatomical structures, tissue properties, and instrument trajectories in an abstract representation space.

AMI Labs, which raised $1.03 billion at a $3.5 billion pre-money valuation in March 2026, making it Europe’s largest seed round in history, is building JEPA-based world models for applications in healthcare, manufacturing, aerospace, and pharmaceutical development. The company has been explicit that its timeline is long: “AMI Labs is a very ambitious project, because it starts with fundamental research,” CEO Alexandre LeBrun told TechCrunch. “It’s not your typical applied AI startup that can release a product in three months.” The company plans to open-source its code and publish research as it goes, building a community around the JEPA paradigm.

The investor composition of the AMI Labs round is instructive. Lead investors include Cathay Innovation, Greycroft, Hiro Capital, HV Capital, and Bezos Expeditions. Strategic investors include NVIDIA, Samsung, Toyota Ventures, and Temasek. French industrial and family offices, Groupe Industriel Marcel Dassault and Association Familiale Mulliez, are also participating, signaling that European industrial capital views AMI Labs as a sovereign champion in the world model space. Notable angels include Tim Berners-Lee, Jim Breyer, Mark Cuban, and Eric Schmidt.

Spatial Intelligence Models: Building the 3D Internet

World Labs, founded by Stanford AI pioneer Fei-Fei Li, is pursuing a third architectural path: spatial intelligence. The company is backed by a $1 billion round that closed in February 2026, led by Autodesk’s $200 million anchor investment and joined by AMD, NVIDIA, Andreessen Horowitz, Fidelity, Emerson Collective, and Sea. While LeCun’s JEPA focuses on physical dynamics, with AMI Labs targeting scientific and industrial applications, World Labs focuses on the geometry and structure of three-dimensional space.

Li’s core thesis is that spatial intelligence, the ability to understand, generate, and reason about 3D environments, is a fundamental cognitive capability that current AI systems lack. Language models have no concept of where things are in space. Even generative image and video models, which can produce visually compelling outputs, do not maintain spatial coherence: objects appear and disappear, perspectives are inconsistent, and the geometry of the scene changes arbitrarily between frames. A world model with genuine spatial intelligence would maintain a persistent, coherent 3D representation of the environment that remains consistent across viewpoints, time, and interactions.

World Labs’ approach combines neural radiance fields and 3D Gaussian Splatting with large-scale generative priors. Given a text prompt or an image, a generative model produces a set of consistent multi-view images or a rough depth map. These outputs are used to initialize a 3D representation, which is then iteratively refined to ensure geometric consistency from all angles, physical plausibility, and high visual fidelity.

World Labs’ first product, Marble, enables users to create spatially cohesive, high-fidelity, and persistent 3D worlds from images, video, or text. The company articulates its vision with a memorable phrase: text became the universal interface for software; 3D is becoming the universal interface for space. The implication is that just as natural language processing created a new interface layer between humans and information systems, spatial intelligence will create a new interface layer between humans and physical environments, enabling everything from immersive creative tools to robotic navigation to scientific visualization.

The Autodesk anchor investment is particularly significant. Autodesk is the dominant platform for architectural design, engineering, and construction, industries that are built on 3D spatial reasoning. A world model capable of understanding and generating coherent 3D environments is not just a creative tool. It is the foundation for a new generation of design, simulation, and construction workflows. The $200 million investment signals that Autodesk sees World Labs not as a vendor but as a strategic infrastructure partner for the future of its entire product ecosystem.

Deterministic Physics-Constrained Models: Mathematical Certainty in a Probabilistic World

The fourth architectural paradigm, and the one most differentiated from the generative and predictive approaches, is the deterministic, physics-constrained world model. This is the approach pioneered by ARYA Labs, a public benefit corporation founded by Dr. Seth Dobrin (formerly IBM’s first Chief AI Officer and the first Chief AI Officer at a Fortune 50 company) and Łukasz Chmiel (a physicist with experience at CERN and the European Space Agency).

ARYA Labs’ architecture is built on five foundational principles: nano models, composability, causal reasoning, determinism, and architectural AI safety. The system contains zero neural network parameters, a striking departure from every other approach in the field, and instead implements world model capabilities through a hierarchical system of specialized nano models orchestrated by AARA (ARYA Autonomous Research Agent), an always-on cognitive daemon that executes a continuous sense-decide-act-learn loop.

The central claim of ARYA Labs’ Constrained Deterministic AI™ (CDAI) architecture is that it produces not probabilistic outputs but mathematical proofs. Where a generative world model might say “there is a 73 percent probability that this manufacturing process will produce a defective batch,” ARYA’s 4D Reality Engine produces a mathematical proof that either the process will or will not violate specified physical constraints, with no probability distribution, no confidence interval, and no hallucination. “With advances in Constrained Deterministic AI, ARYA Labs produces not just predictions but mathematical certainty,” Dobrin stated at the company’s December 2025 emergence from stealth, “providing the rigorous reliability demanded by industry, science, and defense.”

The nano model architecture is the key to understanding how ARYA achieves this. Instead of a single, monolithic model with billions of parameters, ARYA uses a hierarchy of highly specialized nano models. Each nano model is designed to capture a specific, localized physical or causal relationship, for example, the relationship between temperature and pressure in a specific valve, or the relationship between mixing speed and particle size distribution in a bioreactor. These nano models are not trained purely on data; they are mathematically constrained by the laws of physics. If a nano model attempts to predict a state that violates energy conservation, the architecture rejects it. The physics are enforced as mathematical boundaries, not learned as statistical approximations.
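
The general idea of rejecting a prediction that breaks a conservation law can be sketched as follows. This is not ARYA Labs’ code or API, which the source does not describe at this level. The `State` class, the `accept` function, and the falling-mass scenario are hypothetical, chosen only to show a hard constraint acting as a gate on candidate states.

```python
from dataclasses import dataclass

G = 9.81  # gravitational acceleration, m/s^2


@dataclass(frozen=True)
class State:
    mass_kg: float
    height_m: float
    speed_mps: float

    def energy_j(self) -> float:
        """Mechanical energy: potential plus kinetic."""
        return self.mass_kg * G * self.height_m + 0.5 * self.mass_kg * self.speed_mps ** 2


class ConstraintViolation(Exception):
    """Raised when a candidate state breaks a physical constraint."""


def accept(previous: State, candidate: State, work_in_j: float = 0.0,
           tolerance_j: float = 1e-9) -> State:
    """Reject any candidate whose energy exceeds what was available.

    Losses such as friction may lower energy, so only increases are rejected.
    """
    budget = previous.energy_j() + work_in_j
    if candidate.energy_j() > budget + tolerance_j:
        raise ConstraintViolation(
            f"candidate has {candidate.energy_j():.1f} J but only {budget:.1f} J is available")
    return candidate


start = State(mass_kg=2.0, height_m=5.0, speed_mps=0.0)
accept(start, State(2.0, 2.0, 7.0))          # falling and speeding up: allowed
try:
    accept(start, State(2.0, 5.0, 3.0))      # same height, gained speed for free
except ConstraintViolation as err:
    print("rejected:", err)
```

The check is a yes-or-no gate, not a score. A candidate either fits inside the energy budget or is discarded, which is the distinction the text draws between a boundary and a statistical approximation.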

The true power of the system lies in its composability. The AARA daemon dynamically orchestrates these nano models, linking them together to simulate complex, system-level behaviors. Because each component is deterministic and physics-constrained, the composed system remains so. The architecture supports continuous incorporation of new physical observations and near-real-time updates to the world model, generating predictions that reflect the current state of the physical environment. This is not batch training on historical data. It is continuous learning from a live physical system, a capability that is architecturally impossible for monolithic neural network models.
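
The composability argument rests on a simple property: chaining deterministic, individually checked steps yields a deterministic, checked whole. The sketch below illustrates that property with plain function composition. The `NanoModel` type alias, the two toy relations, and the `bounded` wrapper are all hypothetical and say nothing about how AARA actually orchestrates its models.

```python
from functools import reduce
from typing import Callable, Dict

State = Dict[str, float]
NanoModel = Callable[[State], State]   # hypothetical: a pure function on state


def heater(state: State) -> State:
    """Toy relation: each step raises the temperature by a fixed amount."""
    return {**state, "temp_k": state["temp_k"] + 10.0}


def sealed_vessel(state: State) -> State:
    """Toy relation: at fixed volume, pressure scales with absolute temperature."""
    ratio = state["temp_k"] / state["ref_temp_k"]
    return {**state, "pressure_kpa": state["ref_pressure_kpa"] * ratio}


def bounded(model: NanoModel) -> NanoModel:
    """Wrap a model so every output is checked against a hard limit."""
    def checked(state: State) -> State:
        out = model(state)
        if out["temp_k"] <= 0.0:
            raise ValueError("absolute temperature must stay positive")
        return out
    return checked


def compose(*models: NanoModel) -> NanoModel:
    """Chain models left to right into one system-level model."""
    return lambda state: reduce(lambda s, m: m(s), models, state)


system = compose(bounded(heater), bounded(sealed_vessel))
initial = {"temp_k": 300.0, "ref_temp_k": 300.0,
           "ref_pressure_kpa": 100.0, "pressure_kpa": 100.0}
print(system(initial))
```

Running `system(initial)` twice returns identical results, and every intermediate state has passed its check before the next model sees it. Neither property depends on how many models are chained.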

The Unfireable Safety Kernel, an architecturally immutable safety boundary that cannot be disabled or circumvented by any system component, including the system’s own self-improvement engine, is the most technically distinctive element of ARYA’s architecture. This is not a social or ethical alignment statement. It is a technical framework ensuring that human control persists as autonomy increases. In an era when AI safety is increasingly a regulatory and enterprise procurement requirement, the ability to provide a mathematical guarantee of safety, not a policy layer, not a fine-tuned alignment, but an architectural constraint, is a qualitatively different proposition.

A Category Formation Event

The velocity of capital flowing into world models in early 2026 is not a coincidence. It is a category formation event: the moment when a research paradigm transitions into an investment thesis, and the smart money moves before the consensus catches up.

The numbers are striking. In a three-week window from mid-February to early March 2026, three world model companies raised a combined $3.23 billion:

World Labs (Fei-Fei Li): $1 billion, reportedly at a $5 billion valuation, led by Autodesk’s $200 million anchor, with AMD, NVIDIA, Andreessen Horowitz, Emerson Collective, Fidelity, and Sea participating (February 18, 2026)
AMI Labs (Yann LeCun): $1.03 billion at $3.5 billion pre-money valuation, co-led by Cathay Innovation, Greycroft, Hiro Capital, HV Capital, and Bezos Expeditions, with NVIDIA, Samsung, Toyota Ventures, and Temasek participating (March 9, 2026)
Wayve: $1.2 billion Series D at $8.6 billion post-money valuation, led by Eclipse, Balderton, and SoftBank Vision Fund 2, with Microsoft, NVIDIA, Uber, Mercedes-Benz, Nissan, and Stellantis participating. Counting Uber’s milestone-based $300 million commitment for robotaxi deployment, Wayve secured $1.5 billion in total (February 24, 2026)

Add PhysicsX’s extended Series B of $155 million (with NVIDIA NVentures participation, November 2025) and the picture becomes clearer: the world model category attracted roughly $3.4 billion in disclosed funding in a six-month period. For context, the Stanford AI Index 2026 reports that total private AI investment in 2025 was $344.7 billion globally, meaning that world models captured roughly 1 percent of annual global private AI investment in a single six-month sprint. That share will grow.

The investor composition is as revealing as the dollar amounts. NVIDIA appears in multiple rounds simultaneously: AMI Labs, World Labs, Wayve, PhysicsX. Jensen Huang’s team views world models not as a competitive threat to NVIDIA’s GPU business but as the next driver of demand for it. The Goldman Sachs report supports this reading: world models require fundamentally different compute profiles from LLMs, with greater demands for simulation, physics-aware rendering, and multi-step planning. If world models scale as expected, NVIDIA’s addressable market expands, not contracts.

The presence of strategic industrials, Autodesk ($200 million in World Labs), Toyota Ventures (AMI Labs), and Mercedes-Benz (Wayve), signals that the demand side of the market is already forming. These are not financial investors making a bet on a technology theme. They are operating companies making strategic investments in infrastructure that will underpin their next-generation products. Autodesk is betting that World Labs’ spatial intelligence will transform architectural and engineering design. Toyota is betting that AMI Labs’ JEPA models will advance autonomous vehicle and robotics capabilities. Mercedes-Benz is betting that Wayve’s GAIA-3 will

The Goldman Sachs report adds a crucial macro dimension to this picture. The bank’s Global Institute projects $7.6 trillion in cumulative AI CapEx between 2026 and 2031, a forecast built primarily around transformer-based LLM infrastructure. The report’s central argument is that this forecast does not account for the additional infrastructure demands of world models, which require different compute architectures (more simulation, more physics-aware rendering, more multi-step planning), different data pipelines (synthetic data generation, physics simulation, multi-modal sensor fusion), and different evaluation frameworks (physical consistency metrics, causal reasoning benchmarks, safety verification). If world models scale as expected, the $7.6 trillion forecast is a floor, not a ceiling.

“Intelligence, artificial or otherwise, is less about answers than about foresight,” the Goldman report states. “World models give machines something they have long lacked: a sense of consequence.” That sentence, written by investment bankers, not AI researchers, is the clearest signal that the world model thesis has crossed from research into capital allocation.

How Do We Measure Reality?

As the world model category matures, the industry faces a critical challenge: how do we measure progress? For LLMs, benchmarks like MMLU (Massive Multitask Language Understanding) or HumanEval provided a standardized, if imperfect, way to compare models. Evaluating a world model is vastly more complex. You are not measuring whether the model can answer a multiple-choice question. You are measuring whether the model accurately simulates the physical and causal dynamics of reality.

In 2025 and 2026, a new generation of benchmarks emerged, specifically designed for world models. The PhysicsMind benchmark focuses on whether models can reason about physical quantities and values from images, bridging the gap between visual perception and physical understanding. A separate line of work on Disambiguating Physics found specific, systematic patterns of failure in state-of-the-art video generation models when confronted with scenarios requiring strict adherence to physical laws over long time horizons. Objects that should slow down due to friction instead accelerate; fluids that should flow downhill instead flow uphill. These failure modes are not edge cases. They are systematic properties of models that learn physics statistically rather than encoding it structurally.

The CausalARC benchmark, modeled after the famous Abstraction and Reasoning Corpus, tests a world model’s ability to perform abstract causal reasoning in low-data regimes by sampling tasks from fully specified structural causal models. The results reveal a stark divide between models that have learned statistical correlations and models that have learned causal structure: the former fail on novel causal configurations, while the latter generalize.

ARYA Labs published its own systematic evaluation across eight distinct benchmark suites in Q1 2026. The results are striking and deserve careful examination, as they illuminate exactly where the deterministic architecture holds its strongest advantage over leading probabilistic models.

On PhysReason, a 200-problem physics reasoning benchmark spanning four difficulty levels, AARA scored 78.7, compared to DeepSeek-R1 at 56.75, o3-mini-high at 53.32, and GPT-4o at 29.58. The AARA system uses zero neural network parameters on this benchmark; it extracts symbolic physics equations from problem statements and solves them deterministically via SymPy. The performance gap is not marginal. It is 22 points above the next-best model, achieved without any statistical learning.
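
For readers unfamiliar with the approach, the following sketch shows what "solving deterministically via SymPy" looks like once a problem has been reduced to an equation. The projectile problem, its numbers, and the variable names are invented for illustration, and the step that extracts the equation from natural-language text is written by hand here; nothing below is taken from AARA.

```python
import sympy as sp

# Hypothetical problem: a ball is thrown straight up at 12 m/s from a height
# of 1.5 m. When does it reach the ground?
t = sp.symbols("t", real=True)

# Quantities read off the problem statement, kept as exact rationals.
h0 = sp.Rational(3, 2)       # launch height in m
v0 = sp.Integer(12)          # launch speed in m/s
g = sp.Rational(981, 100)    # gravitational acceleration in m/s^2

# Symbolic equation of motion: height as a function of time.
height = h0 + v0 * t - g * t**2 / 2

# Solve height(t) = 0 exactly and keep the physically meaningful root.
roots = sp.solve(sp.Eq(height, 0), t)
landing_time = [r for r in roots if r > 0][0]

print(landing_time)          # exact symbolic root
print(float(landing_time))   # numeric value in seconds
```

The same input always yields the same exact root, and the answer can be audited by substituting it back into the equation, which is the sense in which this style of solver is deterministic.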

On CLadder v1, a 10,112-question causal reasoning benchmark that tests do-calculus, counterfactual inference, and causal adjustment, AARA achieved 99.89 percent accuracy, compared to GPT-5.5 at 90.1 percent and Claude Opus 4.7 at 93.3 percent. The benchmark is specifically designed to test the kind of causal reasoning that world models must perform to be useful in enterprise settings: reasoning about what would have happened if a different action had been taken, or what will happen if a specific intervention is made. The near-perfect AARA score reflects the architectural advantage of a system built explicitly around do-calculus and causal inference.
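
The difference between observing and intervening, which is what do-calculus formalizes, can be shown with a three-variable toy model. The sketch below assumes a confounder Z that influences both a treatment X and an outcome Y, and applies the standard backdoor adjustment formula. All probabilities are made up for illustration, and the code is not drawn from CLadder or from AARA.

```python
# Toy structural causal model: Z -> X, Z -> Y, X -> Y (all variables binary).
P_Z = {0: 0.5, 1: 0.5}                      # P(Z = z)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}             # P(X = 1 | Z = z)
P_Y1_GIVEN_XZ = {                           # P(Y = 1 | X = x, Z = z), keyed by (x, z)
    (0, 0): 0.1, (1, 0): 0.3,
    (0, 1): 0.5, (1, 1): 0.7,
}


def p_x_given_z(x: int, z: int) -> float:
    p1 = P_X1_GIVEN_Z[z]
    return p1 if x == 1 else 1.0 - p1


def observational(x: int) -> float:
    """P(Y = 1 | X = x): what we would see in passively collected data."""
    joint = sum(P_Y1_GIVEN_XZ[(x, z)] * p_x_given_z(x, z) * P_Z[z] for z in P_Z)
    marginal = sum(p_x_given_z(x, z) * P_Z[z] for z in P_Z)
    return joint / marginal


def interventional(x: int) -> float:
    """P(Y = 1 | do(X = x)) via backdoor adjustment over the confounder Z."""
    return sum(P_Y1_GIVEN_XZ[(x, z)] * P_Z[z] for z in P_Z)


naive_effect = observational(1) - observational(0)
causal_effect = interventional(1) - interventional(0)
print(f"naive difference:  {naive_effect:.2f}")
print(f"causal difference: {causal_effect:.2f}")
```

In this toy model the naive comparison (about 0.44) is more than double the true causal effect (about 0.20), because Z makes treated and untreated cases differ before any treatment is applied. Benchmarks of this kind reward systems that compute the second number rather than the first.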

On FrontierScience, a 160-question PhD-level science benchmark spanning physics, chemistry, and biology, AARA v0.9.10 scored 41.25 percent, compared to GPT-5.2 at 25.8 percent, Claude Sonnet 4 at 18.3 percent, and Gemini 2.5 Pro at 16.9 percent. This is a benchmark where the best probabilistic models in the world answer roughly one in four questions correctly; AARA answers two in five, a 60 percent relative improvement over the next-best system.

On BigCodeBench, a 1,140-task code generation benchmark, AARA’s Constraint Breaker achieved 80.5 percent pass@1, compared to OpenAI o3-mini at 61.4 percent, Claude Sonnet 4 at 58.4 percent, and GPT-4o at 51.1 percent. The Constraint Breaker approach, using AST-level code analysis and modification rather than generative text prediction, demonstrates that the deterministic architecture generalizes beyond physics reasoning to structured problem-solving domains.
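
For context on the metric: pass@1 is the share of tasks whose first generated solution passes all tests. The sketch below implements the widely used unbiased pass@k estimator for code benchmarks, of which pass@1 is the simplest case. The per-task results are invented toy data, not BigCodeBench outcomes, and the function names are our own.

```python
from math import comb
from typing import Sequence, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples passes,
    given that c of n generated samples passed the task's tests."""
    if n - c < k:
        return 1.0  # too few failures to fill k draws, so one draw must pass
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: Sequence[Tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks; each entry is (samples generated, samples passed)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Toy run: five tasks, one sample each, three of which pass.
toy_results = [(1, 1), (1, 0), (1, 1), (1, 1), (1, 0)]
print(benchmark_pass_at_k(toy_results, k=1))
```

With one sample per task the estimator reduces to a plain pass rate, which is how a headline figure such as 80.5 percent should be read: roughly four in five tasks solved on the first attempt.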

On WorldArena, an embodied robotics benchmark using the RoboTwin 2.0 bimanual robot platform with 50 tasks and 500 test episodes, AARA v2 achieved a 47.4 percent success rate, compared to the v1 baseline of 8.0 percent. This 5.9x improvement results from the Constraint Breaker’s k-NN episode retrieval, adaptive trajectory blending, and DTW-aligned refinement, demonstrating that the deterministic architecture can be applied to physical manipulation tasks, not just reasoning benchmarks.
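
Two of the ingredients named above, k-NN episode retrieval and DTW alignment, are standard techniques that can be sketched generically. The code below is textbook dynamic time warping over one-dimensional trajectories plus a nearest-neighbor lookup. The episode library, its names, and the trajectories are hypothetical, and this is not the Constraint Breaker itself.

```python
from typing import Dict, List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping cost between two 1-D trajectories."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i points of a with the first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # repeat a point of b
                                    cost[i][j - 1],      # repeat a point of a
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def nearest_episodes(query: Sequence[float],
                     library: Dict[str, Sequence[float]],
                     k: int) -> List[str]:
    """Return the names of the k stored episodes closest to the query under DTW."""
    ranked = sorted(library, key=lambda name: dtw_distance(query, library[name]))
    return ranked[:k]


# Hypothetical library of recorded gripper heights for three past episodes.
LIBRARY = {
    "reach": [0.0, 1.0, 2.0, 3.0],
    "lift": [0.0, 0.0, 5.0, 5.0],
    "push": [3.0, 2.0, 1.0, 0.0],
}

# The query follows the "reach" shape but lingers for one extra time step.
print(nearest_episodes([0.0, 1.0, 1.0, 2.0, 3.0], LIBRARY, k=1))
```

DTW is useful here because two executions of the same motion rarely run at the same speed; warping the time axis lets the slower query match the stored episode with zero cost, where a point-by-point comparison could not even align the two lengths.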

The benchmark landscape also reveals where the deterministic approach faces genuine challenges. On video understanding benchmarks that test visual perception, SSv2 (action recognition) and EpicKitchens (procedural activity), GPT-5.2 and Claude Opus 4.6 score near 100 percent, reflecting the strength of large-scale visual pretraining. On temporal reasoning benchmarks where even the best probabilistic models struggle, TOMATO (17.0 percent), TempCompass (24.8 percent), and TemporalBench (25.6 percent), the low baseline scores across all models signal an open research problem that no current architecture has solved. These are precisely the domains where ARYA’s causal reasoning approach has the most room to demonstrate advantage as the benchmarks mature.

The field is converging on the understanding that visual realism and physical accuracy are distinct properties that require distinct evaluation frameworks, and that the latter is more important for enterprise deployment in safety-critical domains. A world model that generates photorealistic video of a car driving down a road may score well on visual fidelity metrics while simultaneously failing on physical consistency metrics, and it is the physical consistency that determines whether the model can be trusted to make decisions with real-world consequences.

ARYA Labs’ performance profile is particularly striking in this landscape: dominant in causal reasoning and physics problem-solving, competitive in mathematics, strong in code generation, and improving rapidly in physical manipulation. The benchmarks on which ARYA does not lead, visual realism, visual generalization, and fine-grained dexterity, are precisely those where the generative and predictive approaches have architectural advantages. This is not a weakness. It is a clear signal of where the deterministic architecture is, and is not, the right tool for the job.

The Synthetic Data Flywheel

One of the most underappreciated dynamics in the world model ecosystem is the relationship between world models and synthetic data. This relationship is not linear. It is a flywheel, and understanding it is essential for anyone trying to assess the space's long-term competitive dynamics.

The basic dynamic is this: world models require massive amounts of physics-aware, multi-modal training data to learn the dynamics of physical environments. Real-world data collection is expensive, slow, and often impossible for safety-critical scenarios. You cannot collect training data for autonomous vehicle edge cases by crashing cars, or for pharmaceutical manufacturing failures by deliberately contaminating batches. Synthetic data generated by physics simulation engines is the solution, but traditional simulation tools are too slow and too expensive to produce the required volumes.

World models solve this problem by becoming synthetic data generators themselves. A world model trained on real-world physics can generate unlimited synthetic training data for downstream AI systems, including other world models. NVIDIA’s Cosmos platform is explicitly designed around this flywheel: Cosmos Predict generates synthetic future states, Cosmos Transfer converts synthetic environments into photorealistic training data, and the resulting data is used to train physical AI systems that, in turn, generate more training data. The platform is, in effect, a synthetic data factory powered by a world model.

This flywheel has profound implications for competitive dynamics. Organizations that deploy world models early accumulate synthetic data advantages that compound over time. A world model trained on a company’s proprietary manufacturing processes generates synthetic training data that is specific to that company’s equipment, materials, and operating conditions, data that no competitor can replicate. This creates a data moat qualitatively different from the data advantages that LLM-era companies have built, because it is grounded in physical reality rather than in text.

ARYA Labs’ nano model architecture is particularly well-suited to this flywheel dynamic. The architecture supports continuous incorporation of new physical observations and near-real-time updates to the world model, generating predictions that reflect the current state of the physical environment. This is not batch training on historical data. It is continuous learning from a live physical system, a capability that is architecturally impossible for monolithic neural network models.

From Chatbots to Consequence Engines

For senior executives, the world model conversation is not primarily about research or investment. It is about what changes in the enterprise AI stack, and when. The honest answer is: a great deal changes, but not immediately.

The current enterprise AI deployment model is built around LLMs as sophisticated text processors. You feed them documents, data, and instructions; they generate text, code, or structured outputs. This is genuinely useful for a wide range of tasks: summarization, drafting, classification, code generation, and customer service automation. But it has a fundamental ceiling. LLMs cannot simulate physical processes, reason about causal chains with physical consequences, or provide mathematical guarantees about their outputs. For industries where decisions have physical consequences (manufacturing, logistics, energy, healthcare, aerospace, defense), this ceiling matters enormously.

IBM’s research team has been building toward world models for physical assets for years, and the company’s Distinguished Engineer Anuradha Bhamidipati articulated the enterprise value proposition clearly in a March 2026 IBM Think article: “LeCun’s idea of world models is that systems should learn the latent structure and dynamics of reality, not just patterns in text. This view aligns with IBM’s long-standing focus on physics-aware, simulation-driven, and scientifically grounded AI.” IBM is developing asset-agnostic simulation frameworks: systems that generate thousands of trajectories to learn how physical assets transition between states, then use those learned dynamics to evaluate interventions before they happen.

IBM’s collaboration with Sund & Bælt, the Danish operator of the Øresund Bridge and the Great Belt Bridge, illustrates the enterprise use case with precision. A bridge is a complex physical system subjected to continuous stress, weather, and material degradation. Instead of relying on scheduled maintenance (which is inefficient) or reactive maintenance (which is dangerous), the operator uses a world model to simulate the bridge's degradation over decades. The model integrates sensor data (vibration, temperature, traffic load) to update its internal representation of the bridge’s structural health. It can then simulate the impact of different maintenance interventions, allowing the operator to optimize CapEx while ensuring structural integrity.

IBM’s collaboration with NASA extends the approach to scientific discovery. NASA’s challenge is not a lack of data. It has petabytes of sensor data from satellites, telescopes, and planetary probes. Its challenge is extracting actionable insights from that data in real time, in novel situations, and under physical constraints that cannot be violated. A world model trained on the physics of orbital mechanics, atmospheric dynamics, and spacecraft systems can simulate mission scenarios, identify failure modes, and recommend interventions with a level of physical fidelity that no LLM can approach.

The pharmaceutical manufacturing application is equally compelling. Drug manufacturing is governed by extraordinarily precise physical and chemical constraints. A batch that deviates from specified temperature gradients, mixing times, or pH levels is unusable, or worse, unsafe. Current AI approaches to process monitoring are largely reactive: they detect deviations after they occur. A world model trained on the physics and chemistry of the manufacturing process can predict deviations before they occur, simulate the consequences of different interventions, and recommend corrective actions with mathematical certainty. ARYA Labs’ seven active industry domain nodes include pharma manufacturing, and the company’s Constrained Deterministic AI architecture is specifically designed for exactly this kind of mission-critical, zero-hallucination application.

For executives in these industries, the strategic question is not whether to adopt world models. It is when and how. The Goldman Sachs report suggests that near-term investment in world models will remain a fraction of total AI spend as the technology matures from fundamental research to commercial deployment. But the companies that begin building the data infrastructure, simulation environments, and organizational capabilities for world model adoption today will have a significant advantage over those that wait until the technology is fully mature before engaging.

The most important near-term action for executives is not to deploy a world model. It is to audit your data strategy. World models require fundamentally different training data than LLMs: physics simulation data, multi-modal sensor fusion data, and synthetic data generated from domain-specific simulation environments. Organizations that begin building these data assets now are investing in the infrastructure that will underpin their competitive position in the world model era.

The Safety Imperative

The world model conversation in the research community is primarily about capability: what can these systems do that LLMs cannot? But in the enterprise market, the more important conversation is about safety: what guarantees can these systems provide, and what happens when they are wrong?

This distinction matters because the most valuable applications of world models are precisely the ones where being wrong has catastrophic consequences. An autonomous vehicle that makes a wrong prediction about a pedestrian’s trajectory does not produce a bad answer. It causes an accident. A pharmaceutical manufacturing system that incorrectly predicts the outcome of a process deviation does not generate a hallucinated response. It produces an unsafe drug batch. A defense system that misidentifies a target does not make an error. It causes an incident.

For these applications, the probabilistic outputs of generative and predictive world models, outputs that come with confidence intervals, probability distributions, and the implicit acknowledgment that the model might be wrong, are not acceptable. What is required is mathematical certainty: a guarantee that the system’s outputs are consistent with physical laws, that its predictions cannot violate specified constraints, and that its behavior is fully auditable and explainable.

This is the market that ARYA Labs is building for. The company’s Unfireable Safety Kernel, an architecturally immutable safety boundary that cannot be disabled or circumvented by any component of the system, including its own self-improvement engine, is not a feature. It is the product. Safety is an architectural constraint governing every operation, not a policy layer applied after the fact.

The regulatory environment is reinforcing this market. The EU AI Act, which entered enforcement in 2026, classifies AI systems used in critical infrastructure, healthcare, and defense as high-risk applications requiring demonstrable explainability, auditability, and human oversight. The United States is developing analogous frameworks through NIST’s AI Risk Management Framework and sector-specific guidance from the FDA, FAA, and DoD. In this regulatory environment, the ability to provide mathematical proofs rather than probabilistic outputs is not just a technical advantage. It is a compliance requirement.

The market size for safety-critical AI applications is substantial. Defense AI spending alone is projected to exceed $100 billion annually by 2030, and the pharmaceutical, aerospace, medical device, and critical infrastructure sectors collectively represent trillions of dollars in annual economic activity. ARYA Labs’ positioning as the world’s first deterministic world model frontier lab, targeting precisely these sectors, is a bet that the safety imperative will create a distinct, defensible market segment within the broader world model ecosystem.

The company’s public benefit corporation structure reinforces this positioning. As a PBC, ARYA Labs is legally committed to advancing global AI safety and societal good, a commitment that is not just ethical but strategic, as regulated industries increasingly require vendors to demonstrate alignment between their organizational structure and their safety claims. The Safety Kernel will be made available to the community as an open-source asset in the near future.

What Goldman Sachs Is Really Saying

The Goldman Sachs report deserves careful reading, because its implications extend well beyond the world model category itself. The report’s central claim, that current AI infrastructure forecasts do not account for world model compute demands, is one of the most significant investment signals of 2026.

To understand why, consider the current AI infrastructure investment thesis. The dominant narrative is that LLM training and inference require massive GPU clusters, enormous data centers, and unprecedented power consumption. NVIDIA’s market capitalization, the hyperscaler CapEx cycle, and the data center construction boom are all predicated on this narrative. The Stanford AI Index 2026 reports that US private AI investment reached $285.9 billion in 2025, 23 times China’s $12.4 billion, and global corporate AI investment hit $581.7 billion, up 130 percent year-over-year. These are extraordinary numbers, built around the LLM paradigm.

World models change the compute profile in ways that are not simply additive. LLMs are primarily inference workloads: you train a large model once, then run inference at scale. World models are primarily simulation workloads: you run physics simulations, generate synthetic data, and continuously execute multi-step planning loops. These workloads have different hardware requirements (more memory bandwidth, more parallel simulation capacity, different precision requirements), different data center designs (more emphasis on low-latency interconnects for simulation loops), and different energy profiles (more variable, with high peaks during simulation generation).

The Goldman report does not claim that world models will replace LLMs. It explicitly states that LLMs and world models are complements, not substitutes. Its claim is that the world model paradigm will create additional infrastructure demand that is not currently included in the consensus forecast. If Goldman is right, the $7.6 trillion cumulative AI CapEx forecast for 2026–2031 is a floor, and the infrastructure companies (NVIDIA, AMD, the hyperscalers, the data center REITs, and the power utilities) are all undervalued relative to the world model scenario.

For investors, the most actionable implication is not to buy NVIDIA (which is already priced for significant AI infrastructure growth) but to look for second-order beneficiaries: companies that provide the specific infrastructure components world models require and that are not yet priced for world-model demand. This includes simulation software companies (physics engines, synthetic data platforms), specialized memory and interconnect hardware, and the energy infrastructure required to power simulation-intensive workloads.

The near-term caution in the Goldman report is equally important. World-model investment is likely to remain a fraction of total AI spend in the near term, especially given the current state and pace of development. The technology is real, the investment is flowing, and the infrastructure implications are significant, but the commercial deployment timeline is measured in years, not months. The companies that will benefit most from the world model transition are those that can sustain the investment required to build the technology while the market matures.

The Geopolitics of Simulating Reality

The $3 trillion repricing is not just a corporate phenomenon. It is a geopolitical one. The race to build the most capable world models is increasingly viewed through the lens of national security and economic sovereignty.

The Stanford AI Index 2026 highlights the stark disparity in private AI investment: the US invested $285.9 billion in 2025, compared to China’s $12.4 billion, a ratio of roughly 23 to 1. However, the report also notes that the performance gap between US and Chinese models on top benchmarks has narrowed to just 2.7 percent. China is closing the capability gap even as the investment gap widens, which suggests that the efficiency of research investment, not just its scale, will determine the long-term competitive outcome.

World models represent a strategic chokepoint. A nation that possesses superior physical world models has a structural advantage in advanced manufacturing, aerospace design, logistics optimization, and autonomous weapons systems. The ability to simulate the physical world at high fidelity is not just a commercial capability. It is a defense, scientific, and industrial capability. Nations that fall behind in the development of world models will find themselves dependent on foreign AI infrastructure for their most critical physical systems.

This has led to the rise of Sovereign AI initiatives. Governments in Europe, the Middle East, and Asia are investing heavily in building localized compute infrastructure and funding domestic AI champions. AMI Labs’ $1.03 billion seed round, backed heavily by French industrial groups and family offices (Groupe Industriel Marcel Dassault, Association Familiale Mulliez) and Bpifrance, is a prime example of Europe attempting to build a sovereign champion in the world model space, ensuring it is not entirely dependent on US-based hyperscalers and frontier labs.

The export controls on advanced semiconductors, such as NVIDIA’s H100 and B200 GPUs, are fundamentally about limiting the proliferation of the compute required to train these massive models. As world models become the engine of physical and industrial innovation, the geopolitical competition to control the hardware, talent, and data required to build them will only intensify. The shift from words to worlds is not just about repricing data centers; it is about repricing nations' industrial and military capabilities.

ARYA Labs’ structure as a public benefit corporation with operations in Connecticut, Poland, and the EU positions it at an interesting intersection of this geopolitical dynamic. Its architecture, which requires minimal compute relative to neural network-based approaches due to its nano model design and zero neural network parameters, is inherently more accessible to organizations and nations that lack access to massive GPU clusters. In a world where access to compute is increasingly a geopolitical variable, an architecture that achieves state-of-the-art performance without massive compute requirements is not just a technical advantage; it is a strategic one.

Beyond the GPU

The Goldman Sachs report’s assertion that the AI infrastructure stack must be repriced is rooted in the specific hardware demands of world models. While GPUs have been the undisputed engine of the LLM boom, they are not necessarily the optimal architecture for all world model workloads.

GPUs are designed for massive parallel processing of relatively simple operations, which makes them perfect for the matrix multiplications that dominate neural network training and inference. However, world models, particularly those that rely heavily on continuous physics simulation and complex causal reasoning, often require different computational profiles.

Memory bandwidth is frequently the binding constraint. Simulating complex physical systems requires continuously moving massive amounts of data in and out of memory. The memory bandwidth of the hardware often becomes the bottleneck before the raw compute power. Innovations in High Bandwidth Memory and advanced packaging (such as TSMC’s CoWoS) are critical to world model performance. Interconnects are equally important: world models often require distributed training and simulation across thousands of chips, and the speed at which these chips can communicate is paramount. Technologies like NVIDIA’s NVLink and NVSwitch, as well as emerging optical interconnects, are essential for scaling world models.
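
The claim that bandwidth binds before compute can be made quantitative with the standard roofline model, in which attainable throughput is the lesser of peak compute and memory bandwidth multiplied by arithmetic intensity (operations per byte moved). The accelerator figures and the two workload intensities below are hypothetical round numbers chosen for illustration, not vendor specifications or measurements.

```python
def attainable_flops(peak_flops: float,
                     bandwidth_bytes_per_s: float,
                     intensity_flops_per_byte: float) -> float:
    """Roofline model: throughput is capped by compute or by memory traffic."""
    return min(peak_flops, bandwidth_bytes_per_s * intensity_flops_per_byte)


def is_memory_bound(peak_flops: float,
                    bandwidth_bytes_per_s: float,
                    intensity_flops_per_byte: float) -> bool:
    """True when memory traffic, not arithmetic, sets the ceiling."""
    return bandwidth_bytes_per_s * intensity_flops_per_byte < peak_flops


# Hypothetical accelerator: 1 PFLOP/s of peak compute, 3 TB/s of memory bandwidth.
PEAK_FLOPS = 1.0e15
BANDWIDTH = 3.0e12

# Hypothetical workloads: a grid-based physics update reuses each byte very
# little, while a large dense matrix multiply reuses each byte many times.
STENCIL_INTENSITY = 0.5     # FLOPs per byte
MATMUL_INTENSITY = 500.0    # FLOPs per byte

for label, intensity in [("physics stencil", STENCIL_INTENSITY),
                         ("dense matmul", MATMUL_INTENSITY)]:
    achieved = attainable_flops(PEAK_FLOPS, BANDWIDTH, intensity)
    share = achieved / PEAK_FLOPS
    bound = "memory" if is_memory_bound(PEAK_FLOPS, BANDWIDTH, intensity) else "compute"
    print(f"{label}: {share:.2%} of peak, {bound}-bound")
```

Under these assumptions the simulation-style kernel reaches well under 1 percent of the chip's peak while the matrix multiply saturates it, which is why additional bandwidth, rather than additional FLOPs, is the lever that matters for such workloads.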

For specific types of physics simulations (fluid dynamics, molecular dynamics, finite element analysis), specialized hardware accelerators can offer orders-of-magnitude better performance and energy efficiency than general-purpose GPUs. The emergence of AI-native simulation hardware, designed specifically for the physics simulation workloads that world models require, represents a significant investment opportunity that is not yet reflected in consensus infrastructure forecasts.

ARYA Labs’ nano model architecture represents a radically different hardware profile. Because the system operates through composable, physics-constrained nano models, it is memory-constrained rather than compute-constrained. A GPU can actually make the problem worse in some cases, because the workload consists of millions of small nano models rather than a few large matrix operations. As a result, the system often performs better on large, high-memory CPU clusters, with a single GPU available as needed.

The Ethical Dimensions of Simulating Reality

As world models become more capable, they raise profound ethical and societal questions that extend far beyond the concerns associated with LLMs. LLMs primarily raise issues of misinformation, bias, and copyright infringement. World models, because they simulate physical reality and predict the consequences of actions, introduce new categories of risk.

The first is the illusion of certainty. A highly realistic generative world model can produce simulations that are visually indistinguishable from reality. If a model predicts that a specific economic policy will lead to prosperity or that a specific military intervention will succeed, the visual fidelity of the simulation lends it an unwarranted aura of certainty. Decision-makers may then place too much trust in the simulation, ignoring the inherent uncertainties and assumptions built into the model.

The second is dual-use capability. World models are inherently dual-use technologies. A model that can simulate the aerodynamics of a commercial airliner can also simulate the aerodynamics of a hypersonic missile. A model that can simulate the molecular dynamics of a life-saving drug can also simulate the synthesis of a novel chemical weapon. The proliferation of highly capable world models poses significant challenges to global security and arms control that the international community has yet to address.

The third is the epistemological risk. As organizations increasingly rely on world models to understand complex systems (climate change, epidemiology, macroeconomics), they risk losing their independent understanding of those systems. If the model becomes the only tool capable of comprehending the complexity, decision-makers become entirely dependent on the model’s internal logic, which may be opaque or flawed. This is why the transparency and auditability provided by ARYA Labs’ CDAI architecture are not just regulatory requirements. They are epistemological necessities for maintaining human understanding of the systems being modeled.

These ethical challenges underscore the importance of building safety as an architectural constraint from the beginning, not as a compliance checkbox applied after the fact. The world model era will require a new generation of AI governance frameworks that go beyond the text-focused concerns of current LLM regulation and address the unique risks of systems that simulate physical reality.

The Mathematics of World Model Learning

To appreciate why world models represent a qualitative leap beyond language models, it is necessary to understand the mathematical framework that governs their learning. This section is intended for readers who want to go beyond the conceptual and understand the formal machinery: the equations, the loss functions, and the architectural choices that determine what a world model can and cannot do.

The Formal Definition

A world model can be formally defined as a learned function that approximates the environment's transition dynamics. Let $s_t$ denote the state of the environment at time $t$, $a_t$ denote the action taken by an agent at time $t$, and $s_{t+1}$ denote the resulting state at time $t+1$. The true transition dynamics are governed by an unknown function $f$:

$s_{t+1} = f(s_t, a_t, \epsilon_t)$

where $\epsilon_t$ is a noise term capturing the stochastic elements of the environment.

A world model learns an approximation $\hat{f}$ of this function from observed trajectories $(s_0, a_0, s_1, a_1, \ldots, s_T)$. The quality of the world model is determined by how accurately $\hat{f}$ predicts $s_{t+1}$ given $s_t$ and $a_t$, and by how well the predictions generalize to states and actions not seen during training.

In practice, the state $s_t$ is rarely directly observable. Instead, the agent receives an observation $o_t$ that is a function of the true state: $o_t = g(s_t)$. The world model must therefore also learn an encoder $\phi$ that maps observations to a latent state representation: $z_t = \phi(o_t)$. The transition model then operates in this latent space: $z_{t+1} = \hat{f}(z_t, a_t)$. This latent-state formulation is the foundation of all modern world-model architectures, from Dreamer to JEPA to NVIDIA Cosmos.
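The latent-state formulation can be made concrete with a minimal Python sketch. The encoder and transition functions below are hand-written stand-ins (in a real system both would be learned neural networks), and the names `encode`, `transition`, and `imagine` are hypothetical, chosen for illustration rather than taken from any of the architectures discussed.

```python
from typing import List, Sequence

Vector = List[float]


def encode(observation: Sequence[float]) -> Vector:
    """Hypothetical encoder phi: average adjacent pairs, halving the dimension."""
    return [(observation[i] + observation[i + 1]) / 2.0
            for i in range(0, len(observation) - 1, 2)]


def transition(z: Sequence[float], action: float) -> Vector:
    """Hypothetical latent dynamics f_hat: decay each coordinate, then add the action."""
    return [0.9 * zi + action for zi in z]


def imagine(first_observation: Sequence[float], actions: Sequence[float],
            phi=encode, f_hat=transition) -> List[Vector]:
    """Encode once, then roll the latent state forward without new observations."""
    z = phi(first_observation)
    trajectory = [z]
    for a in actions:
        z = f_hat(z, a)  # z_{t+1} = f_hat(z_t, a_t)
        trajectory.append(z)
    return trajectory


rollout = imagine([1.0, 3.0, 2.0, 4.0], actions=[0.5, 0.0, -0.5])
print(rollout)  # one latent vector per time step, starting with the encoded observation
```

The point of the sketch is structural: after the first encoding step, the rollout never touches an observation again, which is what allows planning and policy learning to happen inside the model.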

The Dreamer Architecture: A Detailed Look

The Dreamer family of algorithms, developed by Danijar Hafner and colleagues at Google DeepMind and the University of Toronto, is the most thoroughly studied and widely cited world model architecture in the academic literature. Understanding Dreamer’s architecture in detail provides a template for understanding the broader class of latent world models.

Dreamer consists of three components: a Recurrent State Space Model (RSSM), an actor network, and a critic network. The RSSM is the world model itself. It maintains a hidden state $h_t$ that summarizes the history of observations and actions, and a stochastic state $z_t$ that captures the uncertainty about the current environment state. The RSSM is trained to predict future observations, rewards, and episode terminations from the current state.

The RSSM’s transition model has two components: a deterministic recurrent model

$h_t = f_\phi(h_{t-1}, z_{t-1}, a_{t-1})$

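and a stochastic model, which samples from the posterior

$z_t \sim q_\phi(z_t | h_t, o_t)$

during training, when the observation $o_t$ is available, and from the prior

$z_t \sim p_\phi(z_t | h_t)$

during imagination, when it is not.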

The key insight is that the deterministic component provides a stable, information-rich context, while the stochastic component captures the environment's inherent unpredictability.
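The split between the deterministic and stochastic paths can be illustrated with a scalar toy in Python. This is a sketch under strong simplifying assumptions, not Dreamer's implementation: the function `rssm_step`, its coefficients, and the Gaussian form of the stochastic state are all made up for illustration, and real RSSMs use learned networks over vectors.

```python
import math
import random
from typing import Optional, Tuple


def rssm_step(h_prev: float, z_prev: float, a_prev: float,
              obs: Optional[float] = None,
              rng: Optional[random.Random] = None) -> Tuple[float, float]:
    """One step of a scalar toy RSSM. All coefficients are illustrative."""
    rng = rng or random.Random(0)
    # Deterministic path: h_t = f(h_{t-1}, z_{t-1}, a_{t-1}).
    h = math.tanh(0.5 * h_prev + 0.3 * z_prev + 0.2 * a_prev)
    # Prior p(z_t | h_t): depends on the recurrent state only.
    mean, std = 0.8 * h, 0.5
    if obs is not None:
        # Posterior q(z_t | h_t, o_t): the observation shifts the mean
        # and reduces the uncertainty.
        mean, std = 0.5 * mean + 0.5 * obs, 0.1
    z = rng.gauss(mean, std)
    return h, z


# Training-style step (observation available) versus imagination-style step.
h_post, z_post = rssm_step(0.0, 1.0, 1.0, obs=2.0, rng=random.Random(1))
h_prior, z_prior = rssm_step(0.0, 1.0, 1.0, obs=None, rng=random.Random(1))
```

Both calls produce the same deterministic state because `h` never depends on the observation. Only the distribution from which `z` is sampled changes.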

The training objective of the RSSM is a variational lower bound on the log-likelihood of the observed data, known as the Evidence Lower Bound (ELBO). The ELBO consists of three terms: a reconstruction term (how well the model predicts observations from the latent state), a KL divergence term (which regularizes the stochastic state to be close to the prior), and a reward prediction term (how well the model predicts rewards from the latent state). By maximizing the ELBO, the RSSM learns a compact, informative latent representation of the environment that captures both its deterministic dynamics and its stochastic variability.
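As a rough illustration of how the three terms combine, the sketch below computes a negative ELBO for a single time step. It assumes unit-variance Gaussian likelihoods (so the reconstruction and reward terms reduce to squared errors, up to constants) and a univariate Gaussian latent. The function names and the `kl_scale` parameter are hypothetical, and minimizing this quantity corresponds to maximizing the bound.

```python
import math
from typing import Sequence, Tuple

Gaussian = Tuple[float, float]  # (mean, standard deviation)


def gaussian_kl(q: Gaussian, p: Gaussian) -> float:
    """Closed-form KL(q || p) between two univariate Gaussians."""
    (mu_q, std_q), (mu_p, std_p) = q, p
    return (math.log(std_p / std_q)
            + (std_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * std_p ** 2)
            - 0.5)


def world_model_loss(obs: Sequence[float], obs_pred: Sequence[float],
                     reward: float, reward_pred: float,
                     posterior: Gaussian, prior: Gaussian,
                     kl_scale: float = 1.0) -> float:
    """Negative ELBO for one step: reconstruction + reward + KL regularizer."""
    reconstruction = 0.5 * sum((o - p) ** 2 for o, p in zip(obs, obs_pred))
    reward_term = 0.5 * (reward - reward_pred) ** 2
    kl_term = gaussian_kl(posterior, prior)
    return reconstruction + reward_term + kl_scale * kl_term


# A perfect prediction with posterior equal to prior costs nothing.
perfect = world_model_loss([1.0, 2.0], [1.0, 2.0], 1.0, 1.0, (0.0, 1.0), (0.0, 1.0))
# A unit reconstruction error plus a unit shift in the posterior mean.
imperfect = world_model_loss([1.0], [0.0], 0.0, 0.0, (1.0, 1.0), (0.0, 1.0))
```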

Once the RSSM is trained, the actor and critic networks are trained entirely within the model’s imagination. The actor learns a policy $\pi_\phi(a_t | z_t, h_t)$ that maximizes the expected future reward. The critic learns a value function $V_\phi(z_t, h_t)$ that estimates the expected future reward from the current state. Both are trained using backpropagation through the model’s imagined trajectories, a technique known as Dreaming or model-based policy optimization. The agent never needs to interact with the real environment during policy learning. It learns entirely from simulated experience.

DreamerV3, described in the 2023 paper “Mastering Diverse Domains through World Models”, extended this framework with several key innovations. First, it introduced symlog predictions, a symmetric logarithmic transformation of the target values, which allows the model to handle the enormous range of reward scales across different domains without task-specific normalization. Second, it introduced free bits, a modification of the KL divergence term that prevents the model from collapsing to a trivial solution by ensuring that each latent dimension carries at least some information. Third, it used a categorical representation for the stochastic state, rather than a Gaussian, which provides sharper gradients and better generalization.

The result was a single algorithm that could master 150 diverse tasks across 7 different benchmarks (from Atari games to continuous control to Minecraft) without any task-specific hyperparameter tuning. This generalization across domains is the hallmark of a true world model: it has learned not just how to play a specific game, but how to learn about any environment from experience.
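The symlog transform and the free-bits clipping described above are both small enough to write out directly. The sketch below uses the standard definitions, symlog(x) = sign(x) ln(|x| + 1) with inverse symexp(y) = sign(y)(exp(|y|) - 1). The function names and the default threshold of 1 nat are assumptions made for illustration.

```python
import math


def symlog(x: float) -> float:
    """Symmetric log: compresses large magnitudes, roughly linear near zero."""
    return math.copysign(math.log1p(abs(x)), x)


def symexp(y: float) -> float:
    """Inverse of symlog, used to map predictions back to the original scale."""
    return math.copysign(math.expm1(abs(y)), y)


def free_bits(kl: float, threshold: float = 1.0) -> float:
    """Clip the KL term from below so it stops being minimized once small enough."""
    return max(kl, threshold)


# Rewards of very different scales land in a comparable range after symlog.
for reward in (-1000.0, -1.0, 0.0, 0.5, 1000.0):
    print(reward, round(symlog(reward), 3))
```

Because the clipped KL is constant below the threshold, it contributes no gradient there, so the optimizer has no incentive to squeeze the last remaining information out of the latent state.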

The JEPA Architecture: Prediction in Representation Space

The Joint Embedding Predictive Architecture (JEPA), proposed by Yann LeCun in his 2022 paper “A Path Towards Autonomous Machine Intelligence”, takes a fundamentally different approach to world model learning. Rather than predicting future observations in pixel space or even in a learned latent space, JEPA predicts future representations in an abstract embedding space.

The formal setup is as follows. Let $x$ denote the current state and $y$ denote the future state. JEPA trains two encoders, $s_x = E_x(x)$ and $s_y = E_y(y)$, which map the current and future states to their respective abstract representations. A predictor network $P$ takes the current representation $s_x$ and a latent variable $z$ (which captures the uncertainty about the future), and predicts the future representation:

$\hat{s}_y = P(s_x, z)$

The system is trained to minimize the distance between the predicted future representation $\hat{s}_y$ and the actual future representation $s_y$:

$\mathcal{L} = d(\hat{s}_y, s_y)$

where $d$ is a distance function such as the mean squared error.

The critical challenge in training JEPA is preventing representational collapse, the tendency of the encoders to learn trivial, constant representations that make the prediction task easy but useless. If both encoders learn to map all inputs to the same constant vector, the prediction loss is zero, but the representations carry no information. JEPA addresses this through a combination of architectural constraints and training techniques. The two encoders are not identical: $E_x$ is updated by gradient descent, while $E_y$ is an exponential moving average of $E_x$ (similar to the target network in BYOL and DINO). This asymmetry prevents the encoders from collapsing to the same trivial solution.

The latent variable z is crucial for handling the inherent unpredictability of the future. In a deterministic JEPA, z is set to zero, and the predictor must predict the single most likely future. In a stochastic JEPA, z is sampled from a learned prior distribution, and the predictor must predict the distribution of possible futures. The stochastic formulation is more powerful but more complex to train, as it requires learning both the prior distribution over z and the predictor function.
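The loss and the asymmetric encoder update can be sketched in a few lines of Python for the deterministic case, where the latent variable is fixed and omitted. The encoders here are hypothetical elementwise linear maps standing in for deep networks, and `jepa_loss` and `ema_update` are illustrative names, not functions from any published JEPA codebase.

```python
from typing import List, Sequence


def encode(weights: Sequence[float], x: Sequence[float]) -> List[float]:
    """Toy encoder: elementwise scaling in place of a deep network."""
    return [w * xi for w, xi in zip(weights, x)]


def jepa_loss(online_w: Sequence[float], target_w: Sequence[float],
              predictor_w: Sequence[float],
              x: Sequence[float], y: Sequence[float]) -> float:
    """Mean squared error between predicted and actual future representations."""
    s_x = encode(online_w, x)            # s_x = E_x(x), trained by gradient descent
    s_y = encode(target_w, y)            # s_y = E_y(y), treated as a fixed target
    s_y_hat = encode(predictor_w, s_x)   # predictor P applied to s_x
    return sum((p - t) ** 2 for p, t in zip(s_y_hat, s_y)) / len(s_y)


def ema_update(target_w: Sequence[float], online_w: Sequence[float],
               decay: float = 0.99) -> List[float]:
    """Target encoder weights drift slowly toward the online encoder weights."""
    return [decay * t + (1.0 - decay) * o for t, o in zip(target_w, online_w)]


x, y = [1.0, 2.0], [2.0, 4.0]
good = jepa_loss([1.0, 1.0], [1.0, 1.0], [2.0, 2.0], x, y)
collapsed = jepa_loss([0.0, 0.0], [0.0, 0.0], [1.0, 1.0], x, y)
```

Both `good` and `collapsed` evaluate to zero loss, which is exactly the problem: the loss alone cannot tell an informative solution from a collapsed one. That is why the slow-moving target produced by `ema_update` matters.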

Meta AI’s V-JEPA 2, published in 2025, applied the JEPA framework to video understanding at scale. The model was trained on a large corpus of video data, using masked video prediction as the training objective: given a video with some frames masked out, predict the abstract representations of the masked frames. The model achieved strong performance on physical reasoning benchmarks without any explicit physics supervision, demonstrating that the JEPA objective, predicting future representations, is sufficient to learn the abstract structure of physical dynamics.

The key advantage of JEPA over generative world models is its computational efficiency. Predicting in representation space is far cheaper than predicting in pixel space, because the representation space is much lower-dimensional and the prediction task is much simpler. A generative world model must predict every pixel of the future frame, a task that requires enormous computational resources and that is fundamentally ill-posed (there are infinitely many possible future frames consistent with the current state). A JEPA model only needs to predict the abstract features of the future, a task that is both computationally cheaper and more tractable.

The Mathematics of Deterministic Physics-Constrained Models

ARYA Labs’ Constrained Deterministic AI (CDAI) architecture represents the most mathematically rigorous approach to world modeling currently in production. Unlike the probabilistic approaches of Dreamer and JEPA, CDAI operates in a fully deterministic framework in which physical laws are encoded as hard mathematical constraints rather than learned statistical approximations.

The formal foundation of CDAI is a hybrid dynamical system framework. The state of a physical system is represented as a vector

$s \in \mathbb{R}^n$

and the dynamics of the system are governed by a set of differential equations:

$\dot{s} = f(s, u, \theta)$

where u is the control input, θ is a set of physical parameters (material properties, geometric dimensions, environmental conditions), and f is the vector field governing the system’s evolution. The key constraint is that f must be consistent with the laws of physics: conservation of energy, conservation of momentum, the second law of thermodynamics, and any domain-specific physical constraints.

In ARYA’s nano model architecture, each nano model captures a specific, localized component of this dynamical system. A nano model for a heat exchanger, for example, captures the differential equations governing heat transfer between two fluid streams:

$\dot{T}_h = -\frac{UA}{m_h c_{p,h}} (T_h - T_c)$

and

$\dot{T}_c = \frac{UA}{m_c c_{p,c}} (T_h - T_c)$

where $T_h$ and $T_c$ are the temperatures of the hot and cold streams, $U$ is the overall heat transfer coefficient, $A$ is the heat transfer area, $m_h$ and $m_c$ are the fluid masses held on each side (masses rather than mass flow rates, so that each right-hand side has units of temperature per unit time), and $c_{p,h}$ and $c_{p,c}$ are the specific heat capacities. These equations are not learned from data. They are derived from first principles and encoded directly into the nano model.

The nano model is trained to identify the physical parameters $\theta$, in this case $U$, $A$, $m_h$, $m_c$, $c_{p,h}$, and $c_{p,c}$, from observed data. This is a parameter identification problem, not a function approximation problem. The training objective is to find parameter values that minimize the discrepancy between the model’s predictions and the observed data, subject to the constraint that the parameters be physically plausible (heat transfer coefficients must be positive, and specific heat capacities must be within physically reasonable ranges).
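To make the distinction between parameter identification and function approximation concrete, here is a minimal Python sketch. It is not ARYA Labs’ implementation. It assumes the lumped rate constants k_h = UA/(m_h c_{p,h}) and k_c = UA/(m_c c_{p,c}), because temperature traces alone can only pin down these ratios and not the six parameters individually. The data is synthetic, and clamping at zero is a crude stand-in for a proper plausibility constraint.

```python
from typing import List, Tuple


def simulate(k_h: float, k_c: float, T_h0: float, T_c0: float,
             dt: float, steps: int) -> Tuple[List[float], List[float]]:
    """Forward-Euler integration of the two heat-exchanger equations."""
    T_h, T_c = [T_h0], [T_c0]
    for _ in range(steps):
        delta = T_h[-1] - T_c[-1]
        T_h.append(T_h[-1] - k_h * delta * dt)  # dT_h/dt = -k_h (T_h - T_c)
        T_c.append(T_c[-1] + k_c * delta * dt)  # dT_c/dt = +k_c (T_h - T_c)
    return T_h, T_c


def identify(T_h: List[float], T_c: List[float], dt: float) -> Tuple[float, float]:
    """Least-squares fit of the lumped rate constants from temperature traces."""
    deltas = [h - c for h, c in zip(T_h[:-1], T_c[:-1])]
    dT_h = [(T_h[i + 1] - T_h[i]) / dt for i in range(len(deltas))]
    dT_c = [(T_c[i + 1] - T_c[i]) / dt for i in range(len(deltas))]
    sxx = sum(d * d for d in deltas)
    # Slope of a line through the origin: k = sum(x * y) / sum(x * x).
    k_h = -sum(d * y for d, y in zip(deltas, dT_h)) / sxx
    k_c = sum(d * y for d, y in zip(deltas, dT_c)) / sxx
    # Crude plausibility constraint: rate constants cannot be negative.
    return max(k_h, 0.0), max(k_c, 0.0)


# Synthetic "observations" generated from known constants, then recovered.
T_h, T_c = simulate(k_h=0.05, k_c=0.02, T_h0=90.0, T_c0=20.0, dt=0.1, steps=200)
k_h_est, k_c_est = identify(T_h, T_c, dt=0.1)
```

The fit has two unknowns rather than millions of network weights, because the form of the equations is fixed in advance and only their coefficients are estimated.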

The composability of nano models is achieved through a hierarchical causal graph. Each nano model is a node in the graph, and the edges represent the physical connections between components. The AARA daemon traverses this graph to simulate the behavior of the composed system, propagating the state forward in time by solving the differential equations of each nano model in the appropriate order. Because each nano model is deterministic and physics-constrained, the composed system is also deterministic and physics-constrained. The physical constraints of the individual components are automatically satisfied by the composed system.

The Unfireable Safety Kernel is implemented as a set of invariant constraints on the state space:

$\mathcal{C} = \{s \in \mathbb{R}^n : g_i(s) \leq 0, \; i = 1, \ldots, m\}$

These constraints define the safe operating region of the system. The AARA daemon continuously monitors the system state and verifies that it remains within $\mathcal{C}$. If a predicted future state violates any constraint, the daemon rejects the prediction and flags the situation for human review. This is not a probabilistic safety guarantee. It is a mathematical proof that the system will not enter an unsafe state, provided the physical model is accurate.
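The membership test for the safe set is simple to express in code. The following Python sketch checks a predicted state against a list of constraint functions, each required to be non-positive. The state layout (temperature, pressure), the limits, and the function names are hypothetical, and the sketch says nothing about how the actual safety kernel is implemented.

```python
from typing import Callable, Dict, List, Sequence

# Each constraint g_i maps a state to a number; the state is safe when g_i(s) <= 0.
Constraint = Callable[[Sequence[float]], float]

# Hypothetical limits for a state s = (temperature in K, pressure in bar).
CONSTRAINTS: Dict[str, Constraint] = {
    "max_temperature": lambda s: s[0] - 400.0,  # T <= 400
    "max_pressure": lambda s: s[1] - 10.0,      # P <= 10
    "min_pressure": lambda s: -s[1],            # P >= 0
}


def violated(state: Sequence[float],
             constraints: Dict[str, Constraint] = CONSTRAINTS) -> List[str]:
    """Return the names of all constraints that the state violates."""
    return [name for name, g in constraints.items() if g(state) > 0.0]


def accept_prediction(state: Sequence[float]) -> bool:
    """Accept a predicted state only if it lies inside the safe set."""
    return not violated(state)


print(violated([350.0, 5.0]))   # inside the safe region
print(violated([450.0, 12.0]))  # outside on two constraints
```

Returning the names of the violated constraints, rather than a bare boolean, is a design choice that makes a rejected prediction easier to hand to a human reviewer.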

The Role of Reinforcement Learning in World Models

Reinforcement learning (RL) and world models have a deep and mutually beneficial relationship that is often underappreciated in popular accounts of the technology. Understanding this relationship is essential for grasping why world models are so powerful for decision-making applications, and why they represent a qualitative leap beyond both supervised learning and standard RL.

The Sample Efficiency Problem

The fundamental challenge in reinforcement learning is sample efficiency: learning a good policy requires many interactions with the environment, and in many real-world applications, those interactions are expensive, slow, or dangerous. Training a robot to perform a manipulation task by trial and error in the real world requires millions of attempts, each of which takes seconds or minutes and carries the risk of damaging the robot or its environment. Training an autonomous vehicle to handle edge cases by driving in the real world requires billions of miles of driving, a task that would take decades and cost billions of dollars.

World models address the sample-efficiency problem by enabling the agent to learn from imagined rather than real experience. Once the world model has been trained on a relatively small amount of real-world data, the agent can generate unlimited synthetic experience by simulating trajectories within the model. The agent can explore dangerous situations, test rare edge cases, and learn from failure modes without any real-world risk. This is the core value proposition of model-based reinforcement learning, and it is what makes world models so powerful for physical AI applications.

The theoretical analysis of model-based RL reveals a fundamental trade-off: a more accurate world model enables more efficient policy learning, but building a more accurate world model requires more data and computation. The optimal strategy depends on the relative costs of data collection, model training, and policy learning. In applications where real-world data is expensive (robotics, autonomous vehicles, industrial control), the investment in building a high-quality world model pays off quickly. In applications where real-world data is cheap (video games, simulated environments), the investment is harder to justify.

Planning with World Models

Beyond policy learning, world models enable a form of reasoning that is qualitatively different from reactive decision-making: planning. A planning agent uses its world model to simulate the consequences of different action sequences before committing to any. This is the computational equivalent of the human ability to think before acting, to consider the consequences of different choices in the mind before executing them in the world.

The most powerful planning algorithms for world models are based on Monte Carlo Tree Search (MCTS), the algorithm that underlies AlphaGo and AlphaZero. MCTS builds a search tree of possible future states by repeatedly sampling action sequences from the world model, evaluating the resulting states using a value function, and using the evaluation results to guide the search toward more promising regions of the action space. The algorithm balances exploration (trying new action sequences) and exploitation (refining the best-known action sequences) using the Upper Confidence Bound (UCB) formula.

The combination of world models and MCTS is particularly powerful for long-horizon planning tasks, tasks where the agent must commit to a sequence of actions whose consequences will only become apparent many steps in the future. In these tasks, reactive policies (which choose actions based only on the current state) are fundamentally limited, because they cannot anticipate the long-term consequences of their choices. World model-based planning agents look ahead over long time horizons, consider the consequences of different action sequences, and choose the sequence that maximizes expected long-term reward.
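The exploration and exploitation balance in tree search comes down to one scoring rule. The Python sketch below uses the classic UCB1 form, mean value plus an exploration bonus that shrinks as a child is visited more often. This is an assumption made for illustration: AlphaGo-style systems use a related variant that also weights the bonus by a learned prior, and the dictionary layout and function names here are hypothetical.

```python
import math
from typing import Dict, List


def ucb1(total_value: float, visits: int, parent_visits: int,
         exploration: float = math.sqrt(2.0)) -> float:
    """Mean value plus an exploration bonus that decays with visit count."""
    if visits == 0:
        return math.inf  # always try an unvisited action first
    exploit = total_value / visits
    explore = exploration * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore


def select_child(children: List[Dict[str, float]]) -> int:
    """Pick the index of the child with the highest UCB1 score."""
    parent_visits = sum(int(c["visits"]) for c in children)
    scores = [ucb1(c["value"], int(c["visits"]), parent_visits) for c in children]
    return scores.index(max(scores))


# Equal visit counts: the higher mean wins. An unvisited child always wins.
equal_visits = [{"value": 9.0, "visits": 10}, {"value": 1.0, "visits": 10}]
with_unvisited = [{"value": 9.0, "visits": 10}, {"value": 0.0, "visits": 0}]
print(select_child(equal_visits), select_child(with_unvisited))
```

In a world-model planner, the values being averaged would come from rollouts simulated inside the model rather than from real interactions.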

The Causal Reasoning Advantage

One of the most important advantages of world models over purely data-driven approaches is their ability to support causal reasoning, the ability to reason about the consequences of interventions, not just correlations in historical data. This distinction is captured by Judea Pearl’s ladder of causation, which distinguishes three levels of causal reasoning: association (what is correlated with what?), intervention (what happens if I do X?), and counterfactual (what would have happened if I had done Y instead of X?).

Standard machine learning models, including LLMs, operate primarily at the first level of Pearl’s ladder. They learn statistical associations from historical data and use those associations to make predictions. But they cannot reason about interventions (what happens if I change X?) or counterfactuals (what would have happened if X had been different?), because these questions require a causal model of the world, not just a statistical model.

World models, particularly those that explicitly encode causal structure (like ARYA Labs’ CDAI architecture), operate at all three levels of Pearl’s ladder. They can answer association questions (what is the correlation between temperature and yield?), intervention questions (what happens to yield if I increase temperature by 5 degrees?), and counterfactual questions (what would the yield have been if the temperature had not spiked at hour 3?). This causal reasoning capability is what makes world models so powerful for decision support in complex physical systems, and what fundamentally distinguishes them from LLMs, which can describe causal relationships in text but cannot simulate them.
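The difference between the second and third rungs can be shown with a toy structural causal model of the temperature and yield example. Everything in this Python sketch is assumed for illustration: the linear mechanism, the coefficient 0.5, and the function names are hypothetical and are not drawn from any production system. The first rung, association, would simply be a regression on logged data and is omitted.

```python
def yield_mechanism(temperature: float, noise: float) -> float:
    """Assumed structural equation: yield = 0.5 * temperature + noise."""
    return 0.5 * temperature + noise


def intervene(temperature: float, noise: float = 0.0) -> float:
    """Rung 2: what yield results if we set the temperature ourselves?"""
    return yield_mechanism(temperature, noise)


def counterfactual(observed_temperature: float, observed_yield: float,
                   alternative_temperature: float) -> float:
    """Rung 3: what would the yield have been under a different temperature?"""
    # Abduction: recover the noise that explains what was actually observed.
    noise = observed_yield - 0.5 * observed_temperature
    # Action and prediction: replay the same batch with the temperature changed.
    return yield_mechanism(alternative_temperature, noise)


# A batch ran at 80 degrees and yielded 38. What if it had run at 70 instead?
what_if = counterfactual(observed_temperature=80.0, observed_yield=38.0,
                         alternative_temperature=70.0)
print(what_if)
```

The counterfactual differs from a plain intervention because it holds the batch-specific noise fixed. That step requires an explicit mechanism to invert, which a purely associational model does not provide.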

Deep Dive: The Data Bottleneck and the Synthetic Data Solution

The world model revolution is, at its core, a data revolution. The fundamental challenge in building world models for physical systems is not algorithmic, because the mathematical frameworks are well understood. It is empirical: obtaining the massive, diverse, physics-aware datasets required to train models that accurately capture the dynamics of complex physical environments.

Why Text Data Is Not Enough

The success of LLMs was built on the availability of essentially unlimited text data, the entire written output of human civilization, scraped from the internet and digitized from books, articles, and academic papers. This data was not perfect, but it was vast, diverse, and cheap. The LLM scaling laws showed that model performance improved predictably with the amount of training data, and the internet supplied enough of it to feed that scaling.

Physical world models face a fundamentally different data challenge. The physical world does not produce text. It produces sensor readings, video streams, simulation outputs, and experimental measurements. These data modalities are far more expensive to collect than text, far more difficult to label and annotate, and far more domain-specific. A world model for pharmaceutical manufacturing requires data from pharmaceutical manufacturing processes, not from the internet. A world model for structural engineering requires data from structural testing and simulation, not from Wikipedia.

The data challenge is compounded by the safety constraints of physical systems. You cannot collect training data for a world model of a nuclear power plant by running the plant to failure. You cannot collect training data for an autonomous vehicle world model by crashing cars. The most important failure modes, the ones that the world model most needs to understand, are precisely the ones that are most dangerous and expensive to observe in the real world.

The Physics Simulation Approach

The traditional solution to the physical data bottleneck is physics simulation: using computational models of physical systems to generate synthetic training data. Physics simulation engines like ANSYS, Abaqus, and OpenFOAM generate highly accurate synthetic data for structural mechanics, fluid dynamics, and heat transfer, but they are extremely slow. A single finite element simulation of a complex structure takes hours or days on a high-performance computing cluster. Generating the millions of training examples required for a world model using traditional simulation is computationally infeasible.

This is the problem that PhysicsX, a UK-based startup that raised a $155 million Series B extension in November 2025 with NVIDIA NVentures participation, is working to solve. PhysicsX develops AI-accelerated physics simulation tools that run 1,000 to 10,000 times faster than traditional simulation engines while maintaining high accuracy. The company’s approach combines physics-informed neural networks, neural networks that are trained to satisfy the governing equations of physics, with traditional numerical methods, achieving the accuracy of physics simulation at a fraction of the computational cost.

The PhysicsX approach is particularly relevant for the world model ecosystem because it enables the generation of large, diverse, physics-accurate synthetic datasets at a cost that makes world model training feasible. A world model for aerospace structural design, for example, requires training data covering a vast range of geometric configurations, material properties, and loading conditions. Traditional simulation would require years of computing time to generate this data; PhysicsX’s AI-accelerated simulation generates it in days.

The World Model as Synthetic Data Generator

The most powerful solution to the data bottleneck is the one that NVIDIA has built into its Cosmos platform: using world models themselves as synthetic data generators. This creates a virtuous cycle, a flywheel, in which world models generate synthetic data that is used to train better world models, which generate better synthetic data, and so on.

NVIDIA’s Cosmos Transfer model, which converts synthetic environments into photorealistic training data, is the key component of this flywheel. The model takes a synthetic scene generated by a physics simulation engine (which appears artificial and stylized) and transforms it into a photorealistic rendering indistinguishable from real camera footage. This allows the downstream AI system (the autonomous vehicle, the robot, the industrial vision system) to be trained on data that combines the physical accuracy of simulation with the visual realism of real-world footage.

The synthetic data flywheel has profound implications for competitive dynamics. Organizations that deploy world models early accumulate synthetic data advantages that compound over time. A world model trained on a company’s proprietary manufacturing processes generates synthetic training data that is specific to that company’s equipment, materials, and operating conditions, data that no competitor can replicate. This creates a data moat qualitatively different from the data advantages that LLM-era companies have built, because it is grounded in physical reality rather than in text.

Deep Dive: The Regulatory Landscape and the Demand for Determinism

The world model revolution is unfolding against a backdrop of rapidly evolving AI regulation, and the regulatory environment is one of the most important factors shaping which architectural approaches will succeed in enterprise markets. Understanding the regulatory landscape is essential for anyone building or investing in world models for high-stakes applications.

The EU AI Act: A Framework for High-Risk AI

The European Union’s AI Act, which entered into force in August 2024 with most obligations scheduled to apply from August 2026, is the world’s most comprehensive AI regulatory framework. The Act classifies AI systems into four risk categories: unacceptable risk (prohibited), high risk (subject to strict requirements), limited risk (subject to transparency obligations), and minimal risk (largely unregulated).

World models deployed in safety-critical applications (autonomous vehicles, medical devices, critical infrastructure, defense systems) fall squarely in the high-risk category. The requirements for high-risk AI systems under the EU AI Act are extensive: a conformity assessment demonstrating that the system meets specified technical requirements, a quality management system ensuring ongoing compliance, post-market monitoring and incident reporting, and documentation of the system’s design, training data, and performance characteristics.

The most technically demanding requirement for world models is the obligation to explain. High-risk AI systems must explain their outputs in a way that is understandable to the humans who use them. For a generative world model that produces a photorealistic video of a predicted future, explaining why the model predicted that specific future, rather than any of the infinitely many other possible futures, is extremely difficult. The model’s internal representations are high-dimensional, opaque, and not easily interpretable by humans.

This is where ARYA Labs’ CDAI architecture has a structural advantage. Because the system’s predictions are derived from explicit physical equations and causal models, every prediction can be traced back to the specific physical relationships and parameter values that produced it. The explanation is not a post-hoc rationalization. It is the actual computational path that produced the prediction. This level of explainability is not just a regulatory compliance feature; it is a fundamental property of the deterministic, physics-constrained architecture.

The US Regulatory Environment

The United States does not yet have a comprehensive AI regulatory framework comparable to the EU AI Act, but sector-specific regulations are creating similar demands for explainability and auditability in high-stakes domains.

The Food and Drug Administration (FDA) has published guidance on AI- and machine learning-based software as medical devices, requiring that AI systems used in medical diagnosis and treatment planning be explainable, auditable, and subject to ongoing performance monitoring. The Federal Aviation Administration (FAA) has published guidance on the use of AI in aviation, requiring that AI systems used in flight-critical applications be deterministic, verifiable, and subject to rigorous safety analysis. The Department of Defense has published ethical principles for AI that require AI systems used in military applications to be explainable, auditable, and subject to human oversight.

These sector-specific requirements, while less comprehensive than the EU AI Act, create a de facto demand for deterministic, explainable AI in the most valuable enterprise markets. A world model that cannot explain its predictions to an FDA reviewer, an FAA safety engineer, or a DoD contracting officer cannot be deployed in those markets, regardless of its technical performance.

The National Institute of Standards and Technology (NIST) AI Risk Management Framework, published in 2023 and updated in 2025, provides a voluntary but increasingly influential framework for AI risk management that emphasizes explainability, accountability, and human oversight. Many large enterprises are adopting the NIST framework as a baseline for their AI governance programs, creating additional demand for explainable and auditable AI systems.

The Compliance Opportunity

The regulatory environment is not just a constraint on world model deployment. It is a market opportunity. Organizations that demonstrate compliance with AI regulations through explainability, auditability, and mathematical safety guarantees have a competitive advantage in regulated markets. The compliance burden falls most heavily on the largest and most risk-averse organizations: pharmaceutical companies, aerospace manufacturers, defense contractors, and critical infrastructure operators, which are also the most valuable enterprise customers.

For ARYA Labs, the regulatory environment is a tailwind. The company’s CDAI architecture is designed from the ground up to satisfy the most stringent regulatory requirements: mathematical proofs of physical constraint satisfaction, full causal traceability of every prediction, and an architecturally immutable safety kernel that cannot be disabled. These properties are not features that can be added to a generative or predictive world model after the fact. They are architectural constraints that must be built in from the beginning.

The compliance opportunity is also a barrier to entry. Building a world model that satisfies the EU AI Act’s high-risk requirements is not just a matter of technical performance. It requires extensive documentation, testing, and validation processes that take years to complete. Organizations that begin this process now, building the compliance infrastructure alongside the technical capabilities, will have a significant head start over competitors that wait for the regulatory environment to stabilize before engaging.

Deep Dive: The Autonomous Vehicle Proving Ground

No domain has invested more in world models or demonstrated more clearly the value of the technology than autonomous vehicles. The AV industry has been building and deploying world models for years, and the lessons learned in this domain provide a preview of how world models will transform other industries.

Why Autonomous Driving Needs World Models

The fundamental challenge in autonomous driving is the long tail of rare but critical scenarios. A human driver might encounter a child running into the street, a tire blowout on the highway, or a vehicle driving the wrong way on a one-way street perhaps once in a lifetime, but an autonomous vehicle fleet must handle these scenarios reliably, every time. The only way to ensure reliable performance in rare scenarios is to train on data that includes those scenarios, and the only way to obtain that data at scale is through simulation.

Wayve’s GAIA-3 world model, released in December 2025, is the most advanced publicly described generative world model for autonomous driving. The model generates safety-critical driving scenarios that would be dangerous, expensive, or impossible to collect in the real world, allowing Wayve to evaluate its autonomous driving AI against thousands of edge cases without putting a vehicle on the road. GAIA-3 is a 15-billion-parameter latent diffusion model that operates in a compressed latent space, generating photorealistic video of driving scenarios conditioned on text prompts, route information, and vehicle dynamics.

The technical innovation in GAIA-3 is the conditioning mechanism. Previous generative world models for autonomous driving produced plausible-looking driving videos but lacked fine-grained control over the specific scenarios. GAIA-3 is conditioned on specific text descriptions (“a pedestrian crossing the road in heavy rain at night” or “a truck merging into the lane from the right at highway speed”) and generates a photorealistic video of exactly that scenario, with physically plausible vehicle dynamics and environmental conditions. This level of controllability is essential for systematic evaluation of autonomous driving AI: you need to be able to generate specific, targeted scenarios, not just random samples from the distribution of driving experiences.

Waymo’s World Model, announced in February 2026, takes a complementary approach. Rather than generating photorealistic video, Waymo’s model generates structured representations of driving scenarios (the positions, velocities, and predicted trajectories of all agents in the scene) that are used to evaluate the autonomous driving system’s decision-making without the computational overhead of pixel-level generation. Waymo has described the model as a frontier generative model that simulates the behavior of other drivers, pedestrians, and cyclists in response to the autonomous vehicle’s actions, enabling the evaluation of the vehicle’s decision-making in interactive scenarios.
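
To make the contrast with pixel-level generation concrete, here is a minimal sketch of what a structured scenario representation might look like. The `Agent` schema, the `rollout` and `min_gap` helpers, and the constant-velocity assumption are all illustrative inventions, not Waymo's actual data format or behavior model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    """One traffic participant in a flat 2D scene (hypothetical schema)."""
    kind: str   # e.g. "vehicle", "pedestrian", "cyclist"
    x: float    # position in metres
    y: float
    vx: float   # velocity in metres per second
    vy: float


def rollout(agent: Agent, horizon_s: float, dt: float = 0.5) -> list[tuple[float, float]]:
    """Predict future positions under a constant-velocity assumption."""
    steps = int(round(horizon_s / dt))
    return [(agent.x + agent.vx * dt * k, agent.y + agent.vy * dt * k)
            for k in range(1, steps + 1)]


def min_gap(a: Agent, b: Agent, horizon_s: float, dt: float = 0.5) -> float:
    """Smallest predicted distance between two agents over the horizon."""
    pairs = zip(rollout(a, horizon_s, dt), rollout(b, horizon_s, dt))
    return min(((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5
               for (ax, ay), (bx, by) in pairs)


# Toy scene: the ego vehicle drives along the x-axis while a pedestrian
# steps toward the same point from the side.
ego = Agent("vehicle", x=0.0, y=0.0, vx=10.0, vy=0.0)
ped = Agent("pedestrian", x=20.0, y=-3.0, vx=0.0, vy=1.5)
print(f"closest approach: {min_gap(ego, ped, horizon_s=3.0):.2f} m")
```

A planner can be scored against a representation like this with a few arithmetic operations per agent, which is the cost advantage over rendering and re-perceiving every frame.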

The Wayve Series D: A Signal for the Broader Market

Wayve’s $1.2 billion Series D at an $8.6 billion post-money valuation, with Microsoft, NVIDIA, Uber, Mercedes-Benz, Nissan, and Stellantis all participating, is the most significant single data point in the world model funding landscape. The round is not just large. It is strategically significant.

Microsoft’s participation signals that the world model technology has applications beyond autonomous vehicles, specifically in the enterprise AI platforms that Microsoft is building. A world model that simulates driving scenarios is, at a higher level of abstraction, a world model that simulates any complex, multi-agent physical environment. Microsoft’s investment in Wayve is as much about general-purpose world-model technology as about autonomous vehicles specifically.

Uber’s participation, with an additional $300 million in milestone-based capital tied to robotaxi deployment, is the clearest signal of commercial demand. Uber is not making a research bet. It is making a commercial bet that Wayve’s autonomous driving technology, powered by GAIA-3, will be deployable in Uber’s robotaxi network within a defined timeframe. The milestone structure aligns Wayve’s incentives with Uber’s commercial needs, creating a direct link between the world model technology and revenue generation.

The participation of automakers (Mercedes-Benz, Nissan, and Stellantis) signals that the traditional automotive industry is not ceding the autonomous driving market to technology companies. By investing in Wayve, these manufacturers are ensuring access to world-model technology that will underpin the next generation of driver-assistance and autonomous-driving systems in their vehicles.

Deep Dive: Scientific Discovery and the World Model Frontier

Beyond commercial applications in autonomous vehicles, manufacturing, and enterprise AI, world models are beginning to transform scientific discovery itself. This is the most profound long-term implication of the technology: the ability to simulate physical reality at high fidelity is not just a tool for engineering and operations; it is a tool for science.

Drug Discovery and Molecular World Models

The pharmaceutical industry has used computational simulations in drug discovery for decades, but these simulations have been limited by the computational cost of accurately modeling molecular dynamics. A single molecular dynamics simulation of a protein-drug interaction at atomic resolution can take weeks on a supercomputer cluster. The result is that computational drug discovery has been limited to relatively small molecules and relatively short simulation timescales, far shorter than the timescales relevant for drug binding and efficacy.

World models trained on molecular dynamics data can dramatically accelerate this process. By learning the statistical mechanics of molecular systems from large datasets of short simulations, a world model can predict the long-time behavior of molecular systems (the binding affinity of a drug candidate, the stability of a protein fold, the kinetics of a chemical reaction) at a fraction of the computational cost of direct simulation. The world model does not replace physics. It learns to predict physics outcomes at a higher level of abstraction, skipping irrelevant short-timescale fluctuations and focusing on long-timescale behaviors that matter for drug efficacy.
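
One established way to extract long-time behavior from many short simulations is a Markov state model, sketched below. It is a classical technique used here as a stand-in for illustration, not the method of any company named in this piece. The two-state "unbound/bound" labelling and the toy trajectories are invented.

```python
from collections import Counter


def transition_matrix(trajectories: list[list[int]], n_states: int) -> list[list[float]]:
    """Estimate row-stochastic transition probabilities from short trajectories."""
    counts = Counter()
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            counts[(a, b)] += 1
    matrix = []
    for a in range(n_states):
        row_total = sum(counts[(a, b)] for b in range(n_states))
        if row_total == 0:
            raise ValueError(f"state {a} has no observed outgoing transitions")
        matrix.append([counts[(a, b)] / row_total for b in range(n_states)])
    return matrix


def stationary_distribution(matrix: list[list[float]], iterations: int = 1000) -> list[float]:
    """Power iteration: repeatedly push a distribution through the chain."""
    n = len(matrix)
    dist = [1.0 / n] * n
    for _ in range(iterations):
        dist = [sum(dist[a] * matrix[a][b] for a in range(n)) for b in range(n)]
    return dist


# Toy data: state 0 = unbound, state 1 = bound (hypothetical labels).
# Each list is one short simulation, far shorter than the binding timescale.
short_runs = [
    [0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1],
    [1, 1, 0, 0, 1],
]
T = transition_matrix(short_runs, n_states=2)
pi = stationary_distribution(T)
print("long-run bound fraction:", round(pi[1], 3))
```

With these toy counts the chain spends roughly 71 percent of its time in the bound state, a long-run quantity that no single five-step run could reveal on its own.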

Both AMI Labs and ARYA Labs have identified pharmaceutical development as one of their primary target markets, and the JEPA architecture is well-suited to this application. The key challenge in molecular world modeling is not generating photorealistic images of molecules. It is predicting the abstract features of molecular dynamics that determine biological activity. JEPA’s focus on predicting in representation space, rather than in pixel space, aligns naturally with this requirement.

Climate and Earth System Modeling

Climate science presents one of the most complex and consequential challenges in world modeling: simulating the behavior of the Earth’s climate system over decades and centuries. Current climate models (general circulation models, or GCMs) are extraordinarily sophisticated, but they are also extraordinarily slow. A single century-long simulation of the global climate at high resolution takes months on the world’s most powerful supercomputers. This limits the number of simulations that can be run, the range of scenarios that can be explored, and the resolution at which regional climate impacts can be assessed.

AI-accelerated climate models are emerging as a solution. Google DeepMind’s GraphCast, published in 2023, demonstrated that a graph neural network trained on historical weather data can produce 10-day weather forecasts at a fraction of the computational cost of traditional numerical weather prediction, with comparable or superior accuracy. Related systems, including Huawei’s Pangu-Weather and NVIDIA’s FourCastNet, pursue the same goal with different architectures, and together they make ensemble forecasting (running thousands of simulations to assess uncertainty) feasible at a cost that was previously prohibitive.

The next frontier is climate projection, not just weather forecasting, but simulating the evolution of the climate system over decades in response to different emissions scenarios. This requires world models that accurately capture the long-range dependencies and feedback loops of the climate system: the interaction between ocean circulation, atmospheric dynamics, ice sheet dynamics, and the carbon cycle. The mathematical challenges are formidable, but the potential impact, enabling policymakers to assess the consequences of different climate interventions with much greater precision, is enormous.

Astrophysics and Cosmological Simulation

The universe itself is a world model problem. Understanding the formation and evolution of galaxies, the distribution of dark matter, and the large-scale structure of the universe requires simulating the gravitational dynamics of billions of particles over billions of years, a computational task that pushes the limits of even the most powerful supercomputers.

AI-accelerated cosmological simulation is an active area of research, with world models trained on the outputs of traditional N-body simulations learning to predict the large-scale structure of the universe at a fraction of the computational cost. These models can generate synthetic universes whose summary statistics closely match the outputs of traditional simulations, enabling cosmologists to run large ensembles of simulations to assess the sensitivity of their conclusions to uncertain physical parameters.

The scientific discovery application is the longest-horizon and arguably the most consequential of the world model use cases. Space mission planning is one illustration: a world model trained on the physics of orbital mechanics, atmospheric dynamics, and spacecraft systems can simulate mission scenarios, identify failure modes, and recommend interventions with a level of physical fidelity that no LLM can approach. The same logic extends to materials discovery, climate modeling, and computational biology: every domain where the bottleneck is not data acquisition but the cost of running high-fidelity simulations.

Deep Dive: The Competitive Landscape Beyond the Headline Rounds

The headline funding rounds (World Labs, AMI Labs, and Wayve) represent the frontier of the world model ecosystem, but the competitive landscape extends far beyond these three companies. Understanding the broader ecosystem is essential for anyone trying to assess the investment opportunity or the enterprise technology landscape.

The Open-Source Dimension

One of the most significant dynamics in the world model ecosystem is the emergence of powerful open-source world models. NVIDIA’s Cosmos platform is open-source, as is Meta’s V-JEPA 2. AMI Labs has committed to open-sourcing its code and publishing research as it develops. This open-source dynamic is not just a technical choice. It is a strategic one.

Open-source world models lower the barrier to entry for application developers, creating a larger ecosystem of downstream applications and use cases. They also accelerate research by enabling the broader academic community to build on frontier models, identify failure modes, propose improvements, and generate research that advances the field. The open-source strategy is particularly important for AMI Labs, which is building a research-first company in the JEPA paradigm: by open-sourcing its code and publishing its research, AMI Labs is building the community of researchers and developers who will extend and apply the JEPA framework across domains.

The open-source dimension also creates a competitive dynamic distinct from that of the closed-source LLM market. In the LLM market, the frontier models (GPT-5, Claude 4, Gemini Ultra) are proprietary, and the competitive advantage lies in the scale of the training run and the quality of the RLHF fine-tuning. In the world model market, frontier architectures are increasingly open-source, and the competitive advantage lies in the quality of domain-specific training data, the depth of physical expertise, and the robustness of the safety and compliance infrastructure.

Google DeepMind’s Genie 3

Google DeepMind’s Genie 3, announced in August 2025, represents the frontier of generative world models for interactive environments. The model generates playable, interactive 3D environments from a single image or text prompt. The output is not just a video of the environment but a fully interactive world in which an agent takes actions and observes the consequences.

Genie 3 is trained on a large corpus of video game footage, internet videos, and synthetic data, and it learns to generate environments that are both visually realistic and physically plausible. The model’s key innovation is its ability to maintain temporal and spatial consistency across long interaction sequences: as the agent moves through the generated environment, the model maintains a coherent representation of the world that is consistent with the agent’s past observations and actions.

The applications of Genie 3 extend far beyond video games. The ability to generate interactive 3D environments from a single image has immediate applications in robotics (generating training environments for robot manipulation), architecture (generating interactive walkthroughs of building designs), and education (generating interactive simulations of historical events or scientific phenomena). DeepMind has described Genie 3 as a step toward universal simulators: world models that generate any interactive environment from a description, enabling AI agents to be trained in arbitrarily diverse simulated environments.

Startups in the World Model Ecosystem

Beyond the headline companies, a rich ecosystem of startups is building the enabling infrastructure for the world model era. These companies represent the picks-and-shovels opportunity that VCs should be examining alongside the frontier labs.

Decart (Israel/San Francisco) is building real-time interactive world models that generate photorealistic, interactive environments at 20 frames per second, fast enough for real-time interaction. The company’s Oasis model, released in late 2024, generated significant attention as the first demonstration of a real-time interactive world model. Decart is targeting gaming, simulation, and training data generation as its primary markets.

Worlds.io (San Francisco) is building world models for social and economic simulation, the virtual world model category identified by Goldman Sachs. The company’s approach involves training large-scale agent-based models on social and economic data, enabling simulation of complex social dynamics, market behaviors, and organizational processes. The applications include policy simulation, economic forecasting, and organizational design.

Waabi (Toronto/San Francisco) is building world models specifically for autonomous trucking. Founded by Raquel Urtasun, former Chief Scientist at Uber ATG, Waabi’s approach combines a high-fidelity generative world model (Waabi World) with a safety-first autonomous driving system. The company has deployed autonomous trucks on commercial routes in Texas and is using its world model to accelerate the development and validation of its autonomous driving system.

Isomorphic Labs (London, a DeepMind spinout) is applying world model principles to drug discovery, building AI systems that simulate the behavior of biological molecules at atomic resolution. AlphaFold 3, which the company developed jointly with Google DeepMind and which predicts the structure of protein-ligand complexes, is a world model for molecular biology: a system that has learned the physical constraints governing molecular structure and that simulates the consequences of molecular interactions.

The Investment Thesis: A Framework for Allocating Capital

For venture capitalists and institutional investors navigating the world model landscape, the challenge is not identifying the opportunity. The Goldman Sachs report, the funding rounds, and the research breakthroughs have made the opportunity clear. The challenge is allocating capital intelligently across a complex, multidimensional landscape with a long, uncertain timeline.

The Three Investment Layers

The world model investment opportunity can be organized into three layers, each with different risk profiles, timelines, and return characteristics.

Layer 1: Frontier Labs. These are the companies building foundational world-model architectures: AMI Labs, World Labs, Wayve, ARYA Labs, and the research labs of the hyperscalers (Google DeepMind, Meta AI, NVIDIA). The frontier lab opportunity is characterized by high potential returns, high risk, and long timelines. The rounds are already done at high valuations, and the path to commercial revenue is measured in years. The investors who will generate the best returns from frontier labs are those who access the rounds early (pre-seed, seed) and who have the patience to hold through a multi-year development cycle.

Layer 2: Enabling Infrastructure. These are the companies building the tools, platforms, and infrastructure that world models require: synthetic data generation (PhysicsX, NVIDIA Cosmos), evaluation frameworks, deployment orchestration, domain-specific fine-tuning tooling, and the hardware and software stack required to run simulation-intensive workloads at scale. The enabling infrastructure opportunity is characterized by more predictable revenue trajectories, shorter timelines to commercial deployment, and lower risk than frontier labs. The picks-and-shovels play in the world model gold rush is still largely unbuilt, and the window to build it is open now.

Layer 3: Domain Applications. These are the companies applying world models to specific, high-value enterprise domains: pharmaceutical manufacturing, aerospace design, autonomous vehicles, energy optimization, and structural engineering. The domain application opportunity is characterized by the highest near-term revenue potential, because the enterprise customers in these domains have immediate needs and large budgets. The risk is execution risk: building a domain application requires both deep technical expertise in world models and deep domain expertise in the target industry. The companies that combine both will generate exceptional returns; those that lack either will struggle.

The ARYA Labs Investment Case

ARYA Labs represents a distinctive investment case within this framework. The company is simultaneously a frontier lab (developing a novel world model architecture), an enabling infrastructure provider (its CDAI architecture provides the deterministic, physics-constrained foundation that other applications build on), and a domain application company (with seven active industry domain nodes and $300,000 in ARR at stealth emergence).

The company’s public benefit corporation structure, its positioning as the world’s first deterministic world model frontier lab, and its focus on the highest-value, most regulated enterprise markets (defense, aerospace, pharma, medical devices) create a distinctive competitive position. The zero neural network parameter architecture means that ARYA’s system is deployable on standard enterprise hardware without massive GPU clusters, a significant advantage in the air-gapped, on-premises deployment environments that characterize its target markets.

The $60 million qualified sales pipeline at stealth emergence, with two of three flagship enterprise customers already engaged, signals that the market validation is already underway. Customer engagements in pharma manufacturing, defense, and remote patient monitoring (including the NUVO maternal-fetal platform) provide concrete, verifiable deployment benchmarks that enterprise prospects can evaluate.

For investors, the key question is whether the deterministic, physics-constrained approach scales across the full range of physical domains the company targets. The nano model architecture is designed for scalability, but the depth of physical expertise required to design the appropriate nano model architectures for each domain is a potential bottleneck. The company’s team, which combines deep AI expertise (Dobrin’s IBM background) with deep physics expertise (Chmiel’s CERN and ESA background), is well-positioned to address this challenge, but scaling that expertise across seven domains simultaneously is a significant organizational challenge.

Deep Dive: Enterprise Architecture for World Model Deployment

For senior executives and enterprise architects, the question of how to deploy world models in production is as important as which world model to choose. World models have fundamentally different deployment requirements than LLMs, and organizations that attempt to deploy them using the same infrastructure and processes they use for LLMs will encounter significant challenges.

The Deployment Stack

A production world model deployment requires several components not included in a standard LLM deployment stack.

Physics simulation infrastructure is required for world models that rely on physics simulation for training data generation or for real-time state estimation. This infrastructure includes physics simulation engines (ANSYS, OpenFOAM, NVIDIA Isaac Sim), high-performance computing clusters for running simulations at scale, and data pipelines for ingesting simulation outputs and converting them into training data. Organizations that do not already have physics simulation infrastructure will need to build or acquire it before deploying world models.

Multi-modal sensor fusion is required for world models that integrate data from multiple sensor modalities (e.g., cameras, LiDAR, radar, accelerometers, temperature sensors, and pressure sensors). The sensor fusion pipeline must synchronize data from different sensors, correct for calibration errors and sensor noise, and convert the raw sensor data into the format required by the world model. This is a non-trivial engineering challenge that requires specialized expertise in sensor systems and signal processing.
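
The synchronization step can be sketched as resampling every stream onto one reference clock. This is a minimal illustration using linear interpolation. The stream names, sample rates, and values are hypothetical, and calibration and noise correction are deliberately left out.

```python
from bisect import bisect_left


def interpolate_at(timestamps: list[float], values: list[float], t: float) -> float:
    """Linearly interpolate a time-sorted sensor stream at time t (clamped at the ends)."""
    if t <= timestamps[0]:
        return values[0]
    if t >= timestamps[-1]:
        return values[-1]
    i = bisect_left(timestamps, t)  # first index with timestamps[i] >= t
    t0, t1 = timestamps[i - 1], timestamps[i]
    w = (t - t0) / (t1 - t0)
    return values[i - 1] + w * (values[i] - values[i - 1])


def align(reference_times, streams):
    """Resample every stream onto the reference clock, one fused row per tick."""
    return [
        {"t": t, **{name: interpolate_at(ts, vs, t) for name, (ts, vs) in streams.items()}}
        for t in reference_times
    ]


# Hypothetical streams: a 10 Hz temperature sensor and a slower pressure sensor.
streams = {
    "temp_c": ([0.0, 0.1, 0.2, 0.3], [20.0, 20.4, 20.8, 21.2]),
    "pressure_kpa": ([0.0, 0.25], [101.0, 102.0]),
}
fused = align([0.05, 0.15, 0.25], streams)
for row in fused:
    print(row)
```

Linear interpolation is the simplest reasonable choice for slowly varying signals. Fast-moving modalities such as LiDAR or camera frames usually need motion compensation instead.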

Real-time state estimation is required for world models that are used for real-time decision support. The world model must continuously update its internal representation of the physical system as new sensor data arrives, and it must do so fast enough to support real-time decision-making. For many physical systems, this requires inference latency of less than 100 milliseconds, a requirement that is challenging for large neural network-based world models but achievable for ARYA Labs’ nano model architecture.

Simulation-to-real transfer is required for world models that are trained on synthetic data and deployed on real physical systems. The sim-to-real gap, the discrepancy between the simulated environment and the real environment, is a fundamental challenge in physical AI deployment. Techniques for closing the sim-to-real gap include domain randomization (training on a wide range of simulated environments to improve robustness), domain adaptation (fine-tuning the model on a small amount of real-world data), and physics-informed regularization (constraining the model to produce outputs that are consistent with physical laws).
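
Of the three techniques, domain randomization is the easiest to show in a few lines. The sketch below samples simulator settings from fixed ranges so that no two training environments are identical. The parameter names and ranges are hypothetical and would in practice come from measurements of the real system.

```python
import random

# Hypothetical parameter ranges for a simulated pick-and-place task.
PARAM_RANGES = {
    "friction": (0.4, 1.2),
    "payload_kg": (0.1, 2.0),
    "light_level": (0.2, 1.0),
    "sensor_noise_std": (0.0, 0.05),
}


def sample_environment(rng: random.Random) -> dict[str, float]:
    """Draw one simulated environment by sampling each parameter uniformly."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}


def randomized_batch(n: int, seed: int = 0) -> list[dict[str, float]]:
    """Generate n environments; a fixed seed keeps the batch reproducible."""
    rng = random.Random(seed)
    return [sample_environment(rng) for _ in range(n)]


for env in randomized_batch(3):
    print({k: round(v, 3) for k, v in env.items()})
```

The design bet is that a model trained across the whole range treats the real system as just one more sample, provided the ranges are wide enough to contain it.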

Safety monitoring and verification are required for world models deployed in safety-critical applications. The safety monitoring system must continuously verify that the world model’s predictions are consistent with physical constraints, that the model’s confidence estimates are well-calibrated, and that the model’s behavior is within the expected operating envelope. For ARYA Labs’ CDAI architecture, this monitoring is built into the Unfireable Safety Kernel, which provides real-time verification of physical constraint satisfaction.
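
A runtime constraint check of the kind described above can be sketched generically. This is not ARYA Labs' Unfireable Safety Kernel, whose internals are not described here. It is a minimal illustration, and the `Envelope` type and the limit values are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Operating limits for one predicted quantity (hypothetical values)."""
    lower: float
    upper: float
    max_rate: float  # largest physically plausible change per second


def check_prediction(previous: float, predicted: float, dt: float, env: Envelope) -> list[str]:
    """Return the list of violated constraints; an empty list means the prediction passes."""
    violations = []
    if not (env.lower <= predicted <= env.upper):
        violations.append("outside operating envelope")
    if abs(predicted - previous) / dt > env.max_rate:
        violations.append("rate of change exceeds physical limit")
    return violations


# Limits in degrees C and degrees C per second for an invented reactor vessel.
reactor_temp = Envelope(lower=15.0, upper=95.0, max_rate=2.0)
print(check_prediction(previous=60.0, predicted=60.5, dt=1.0, env=reactor_temp))
print(check_prediction(previous=60.0, predicted=99.0, dt=1.0, env=reactor_temp))
```

The first call passes, while the second trips both checks. A production monitor would also track confidence calibration, which needs ground-truth outcomes and is beyond a snippet this size.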

The Build vs. Buy Decision

For most enterprises, the decision to deploy world models will involve a combination of building proprietary capabilities and buying or licensing commercial solutions. The build vs. buy decision depends on several factors: the availability of commercial solutions for the specific application domain, the strategic importance of the world model capability, the availability of internal expertise, and the regulatory requirements for the application.

For applications in highly regulated industries (pharmaceutical manufacturing, aerospace, medical devices, defense), the build vs. buy decision is often constrained by regulatory requirements. Regulators typically require that the organization deploying the AI system have a deep understanding of how the system works, which precludes the use of black-box commercial solutions. In these cases, the organization either builds proprietary world models or works closely with a vendor that provides the transparency and auditability regulators require.

For applications in less regulated industries (logistics, retail, financial services), commercial world model solutions are more viable, and the build vs. buy decision is primarily driven by cost and time-to-value considerations. Commercial solutions from vendors like NVIDIA (Cosmos), World Labs (Marble), and ARYA Labs (CDAI) significantly reduce the time and cost of world model deployment, but they do not always provide the level of domain specificity required for the most valuable applications.

The most common enterprise deployment pattern will likely be a hybrid approach: using commercial world-model platforms as a foundation and building proprietary domain-specific capabilities on top of them. This approach allows organizations to benefit from the scale and expertise of commercial vendors while maintaining the domain specificity and competitive differentiation that proprietary capabilities provide.

The Organizational Readiness Challenge

Beyond the technical deployment challenges, world model adoption requires significant organizational readiness. Organizations that are accustomed to deploying LLMs, which require relatively little domain expertise to use effectively, will find that world models require a much deeper level of technical and domain expertise.

The key organizational capabilities required for world model deployment include: physics and engineering expertise (to design the physical models and validate the simulation outputs), data engineering expertise (to build the sensor fusion and synthetic data pipelines), AI/ML expertise (to train, evaluate, and maintain the world models), and domain expertise (to translate the world model outputs into actionable business decisions). Few organizations currently have all of these capabilities in-house, and building them will require significant investment in hiring, training, and organizational development.

The organizations best positioned for world model adoption are those that already have strong physics and engineering capabilities (aerospace manufacturers, pharmaceutical companies, energy companies, defense contractors) and are investing now to build the AI/ML capabilities required to complement those strengths. These organizations have the domain expertise to design meaningful world models and the regulatory context to justify the investment in safety and compliance infrastructure.

Deep Dive: The Economics of World Model Deployment

The economic case for world model deployment is compelling in theory, but the path from theory to realized value is more complex than the marketing materials suggest. Understanding the economics of world model deployment (the costs, the benefits, and the timeline to ROI) is essential for executives making investment decisions.

The Cost Structure

World model deployment has a fundamentally different cost structure than LLM deployment. LLM deployment costs are primarily driven by inference compute, the cost of running the model to generate outputs. World model deployment costs are driven by a combination of training compute, simulation infrastructure, data collection and processing, and ongoing maintenance.

Training compute for large neural network-based world models (Wayve GAIA-3, NVIDIA Cosmos, AMI Labs) is comparable to or greater than LLM training compute. A 15-billion-parameter generative world model requires training runs that cost tens of millions of dollars in GPU compute. Unlike LLMs, which are typically trained once and then deployed, world models often require continuous retraining as new data becomes available and as the physical system evolves. This continuous retraining requirement significantly increases the total cost of ownership.

Simulation infrastructure is a cost that is largely absent from LLM deployments but is central to world model deployments. Building and maintaining a physics simulation environment (the software, hardware, and expertise required to run accurate physics simulations) is a significant investment. For organizations that already have simulation infrastructure (aerospace manufacturers, automotive OEMs, pharmaceutical companies), this cost is largely incremental. For organizations that do not, it is a substantial new investment.

Data collection and processing costs are higher for world models than for LLMs, because the required data modalities (sensor data, simulation outputs, multi-modal measurements) are more expensive to collect and process than text data. Building the data pipelines required to ingest, clean, and format this data for world model training is a significant engineering investment.

For ARYA Labs’ nano model architecture, the cost structure is fundamentally different. Because each nano model requires minimal compute and has a small number of parameters compared to neural network-based approaches, the training cost is negligible. The primary costs are the engineering time to design appropriate nano model architectures for each domain and the data collection needed to identify the physical parameters of each nano model. This cost structure is particularly attractive for enterprises that need to deploy world models in multiple domains, as the incremental cost of adding a new domain is much lower than for neural network-based approaches.

The Benefit Structure

The economic benefits of world model deployment fall into three categories: cost reduction, revenue enhancement, and risk mitigation.

Cost-reduction benefits stem from the ability to simulate physical processes before execution, thereby reducing the costs of physical testing, prototyping, and experimentation. In pharmaceutical manufacturing, a world model that predicts the outcome of a process change before it is implemented can save millions of dollars by preventing failed batches and regulatory delays. In aerospace design, a world model that simulates a component's structural behavior before it is manufactured can save months of physical testing and iteration. In energy operations, a world model that predicts equipment failures before they occur reduces unplanned downtime and maintenance costs.

Revenue-enhancement benefits stem from the ability to optimize physical processes in ways previously impossible. A world model for a manufacturing process identifies operating conditions that maximize yield, minimize energy consumption, and reduce defect rates: improvements that translate directly into revenue and margin. A world model for a logistics network optimizes routing, scheduling, and inventory levels in real time, reducing costs and improving service levels.

Risk mitigation benefits come from the ability to identify and avoid failure modes before they occur. In safety-critical applications (aerospace, medical devices, nuclear power), the ability to predict and prevent failures is not just a cost reduction opportunity. It is a regulatory and reputational necessity. The economic value of avoiding a single major failure event (a drug recall, an aircraft incident, a power plant outage) can exceed the entire cost of deploying a world model.

The ROI Timeline

The timeline to positive ROI for world model deployment is longer than for LLM deployment, but the ROI magnitude is also larger. LLMs deploy in weeks and generate positive ROI within months, but the ROI is typically incremental, a 10 to 20 percent improvement in productivity for specific tasks. World models require months or years to deploy and validate, but the ROI is transformational: a 50 to 80 percent reduction in physical testing costs, a 30 to 50 percent improvement in process yield, or the prevention of a catastrophic failure event.
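
A back-of-the-envelope payback calculation shows how the long deployment period and the large savings interact. Every dollar figure and the 18-month deployment period below are hypothetical, chosen only to exercise the formula. Only the 50 to 80 percent reduction range comes from the paragraph above.

```python
from typing import Optional


def payback_months(upfront_cost: float, annual_run_cost: float,
                   annual_gross_saving: float, deploy_months: int) -> Optional[float]:
    """Months from project start until cumulative net savings cover the upfront cost.

    Savings and run costs are both assumed to start only once deployment ends.
    Returns None if net savings never become positive.
    """
    net_monthly = (annual_gross_saving - annual_run_cost) / 12
    if net_monthly <= 0:
        return None
    return deploy_months + upfront_cost / net_monthly


# All figures below are hypothetical.
testing_budget = 10_000_000  # annual physical testing spend in dollars
for reduction in (0.5, 0.8):
    months = payback_months(
        upfront_cost=8_000_000,
        annual_run_cost=1_000_000,
        annual_gross_saving=testing_budget * reduction,
        deploy_months=18,
    )
    print(f"{reduction:.0%} reduction -> payback in about {months:.0f} months")
```

Under these invented inputs, payback lands at roughly 42 and 32 months. The deployment period alone accounts for 18 of those months, which is why the validation timeline matters as much as the size of the saving.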

The Goldman Sachs report’s caution about near-term world model investment is well-founded: the technology is real, but the commercial deployment timeline is measured in years, not months. Organizations that begin investing in world model capabilities now, building the data infrastructure, the simulation environments, and the organizational expertise, will be positioned to capture the ROI when the technology matures. Organizations that wait for the technology to mature before engaging will find themselves years behind their competitors.

Deep Dive: Open Questions in World Model Research

Despite the extraordinary progress of the past few years, world model research faces several fundamental open questions that will determine the field's long-term trajectory. Understanding these open questions is important for anyone trying to assess the technical risk of world model investments.

The Scalability Question

The most fundamental open question in world model research is whether the current approaches will scale to the complexity of real-world physical systems. The most capable world models today accurately simulate relatively simple physical environments (driving scenarios, robotic manipulation tasks, molecular dynamics), but the real world is vastly more complex. A world model for a large-scale industrial facility must simultaneously model thousands of interacting physical components, each with its own dynamics and failure modes. A world model for a city must model the interactions between millions of agents (vehicles, pedestrians, buildings, infrastructure), each with their own physical and behavioral properties.

The scaling question is not just about compute. It is about whether the current architectures can represent the complexity of real-world physical systems without exponential growth in model size or training data. The empirical scaling laws that have driven LLM progress (the observation that model performance improves predictably with model size and training data) have not been shown to hold for world models in the same way. Physical systems have structure that can be exploited to achieve efficient scaling (the composability of nano models in ARYA’s architecture is one example), but whether this structure is sufficient to scale to the full complexity of real-world systems remains an open question.

The Partial Observability Problem

Real physical systems are almost never fully observable. Sensors have noise, limited resolution, and limited coverage. Important state variables, such as the internal temperature of a component, the stress distribution within a material, and the concentration of a chemical species in a reaction vessel, are not directly measurable. The world model must infer the full state of the system from partial, noisy observations, a problem known as state estimation or filtering.

State estimation is a well-studied problem in control theory and robotics, with classical solutions like the Kalman filter and particle filter. However, the Kalman filter assumes that the system dynamics are known and linear (or nearly linear, in its extended variants), and particle filters, although they can handle nonlinear dynamics, need rapidly growing numbers of particles as the state dimension increases. Both struggle with the high-dimensional, nonlinear dynamics of complex physical systems. Modern world models address this challenge through learned state estimation, training the model to infer the full state from partial observations, but this approach requires large amounts of training data and does not always generalize well to novel situations.
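For readers who have not met the classical baseline, here is a minimal sketch of a scalar Kalman filter. It assumes the simplest possible setting: one hidden quantity (a hypothetical component temperature) that drifts slowly and is read by a noisy sensor. The function name, the noise variances, and the temperature values are illustrative assumptions, not taken from any system discussed in this issue.

```python
import random

def kalman_1d(measurements, process_var, meas_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a random-walk state x_t = x_{t-1} + noise."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is assumed roughly constant, so only uncertainty grows.
        p = p + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + meas_var)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates, p

random.seed(0)
true_temp = 75.0  # hypothetical hidden component temperature
readings = [true_temp + random.gauss(0.0, 2.0) for _ in range(200)]
estimates, final_var = kalman_1d(
    readings, process_var=1e-4, meas_var=4.0, x0=70.0, p0=25.0
)
```

The filter's estimate ends up far less noisy than any single reading, but only because the dynamics here are known and linear. That assumption is exactly what breaks down in the systems world models target.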

The Long-Horizon Planning Problem

World models are most valuable for long-horizon planning, predicting the consequences of actions over long time horizons. But the accuracy of world model predictions typically degrades over time: small errors in the initial state estimate compound over multiple prediction steps, leading to increasingly inaccurate predictions at longer horizons. This error accumulation problem is a fundamental challenge for long-horizon planning applications.

Several approaches address the error accumulation problem. Model predictive control (MPC) addresses this by replanning at every time step: using the world model to plan a sequence of actions, executing only the first action, observing the resulting state, and then replanning from that state. This approach limits the prediction horizon to a manageable length, but it requires the world model to be fast enough to replan in real time. Hierarchical planning addresses it by decomposing the long-horizon planning problem into a sequence of shorter-horizon sub-problems, each of which is more tractable for the world model.
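The MPC loop described above can be sketched in a few lines. This is a toy under stated assumptions: a one-dimensional system, three discrete actions, and a hypothetical world model (`model_step`) that is unaware of a small drift in the real system (`true_step`). None of these names or numbers come from a real deployment.

```python
from itertools import product

ACTIONS = (-1.0, 0.0, 1.0)

def model_step(x, u):
    """Hypothetical learned world model: predicts the next state."""
    return x + u

def true_step(x, u):
    """Real system, with a small drift the model does not know about."""
    return x + u + 0.1

def plan(x, target, horizon):
    """Exhaustively search action sequences and return the cheapest one."""
    def cost(seq):
        total, state = 0.0, x
        for u in seq:
            state = model_step(state, u)
            total += (state - target) ** 2
        return total
    return min(product(ACTIONS, repeat=horizon), key=cost)

def run_mpc(x0, target, horizon=3, steps=15):
    x, trajectory = x0, [x0]
    for _ in range(steps):
        first_action = plan(x, target, horizon)[0]  # execute only the first action
        x = true_step(x, first_action)              # observe the real outcome
        trajectory.append(x)                        # next loop replans from here
    return trajectory

trajectory = run_mpc(x0=0.0, target=5.0)
```

Because every plan starts from the observed state rather than the predicted one, the model's drift error never has more than a few steps to accumulate. The price is that `plan` runs at every step, which is why inference speed matters so much for this approach.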

ARYA Labs’ deterministic architecture has a structural advantage for long-horizon planning. Because the system’s predictions are derived from explicit physical equations rather than learned statistical models, the error accumulation problem is significantly reduced. Physical equations are exact (within the limits of the physical model’s accuracy), so the prediction error does not compound in the same way as for statistical models. This advantage is particularly important for applications such as aerospace design and pharmaceutical manufacturing, where the planning horizon spans days, weeks, or months.

The Transfer Learning Problem

World models are trained on specific physical environments, and their ability to transfer to new environments, environments with different physical properties, different geometries, or different dynamics, is limited. A world model trained on driving scenarios in California may not generalize well to those in Norway because the weather, road conditions, and traffic patterns differ. A world model trained on one pharmaceutical manufacturing process may not generalize well to a different process, even if the underlying chemistry is similar.

The transfer learning problem is particularly challenging for world models because physical environments are highly diverse and the relevant differences between environments are subtle. Two manufacturing processes that appear similar can have very different dynamics due to small differences in equipment geometry, material properties, or operating conditions. A world model that cannot detect and adapt to these subtle differences can produce inaccurate predictions, leading to costly or dangerous decisions.

The most promising approaches to the transfer learning problem involve meta-learning, training the world model to quickly adapt to new environments with only a small amount of new data, and physics-informed transfer, using physical knowledge to identify relevant differences between environments and adapt the model accordingly. ARYA Labs’ nano model architecture is inherently well-suited to transfer learning, because adding a new physical component to the system requires only training a new nano model for that component, rather than retraining the entire system.

Deep Dive: Designing for Human-World Model Collaboration

The most technically sophisticated world model is only as valuable as the decisions it enables. Designing effective human-world model collaboration systems (interfaces, workflows, and organizational processes that enable humans to leverage world model capabilities) is as important as the model's technical performance.

The Interface Design Challenge

The output of a world model is not a text response or an image; it is a simulation of the future. Presenting this simulation to a human decision-maker in a way that is both informative and actionable is a significant design challenge. Too much information overwhelms the decision-maker; too little information leaves important uncertainties unexplored.

The most effective world model interfaces are those that present the simulation results in terms of the decision-relevant quantities, the outcomes that the decision-maker cares about, rather than the raw simulation outputs. A manufacturing engineer does not need to see the full temperature and pressure profile of a process. They need to see whether the process will produce a product that meets specifications, and what the probability of different failure modes is. A financial analyst does not need to see the full trajectory of every asset in a portfolio. They need to see the distribution of portfolio returns under different market scenarios.

The design of effective world model interfaces requires close collaboration between AI engineers, domain experts, and user experience designers. The interface must translate the high-dimensional outputs of the world model into the low-dimensional, decision-relevant summaries the human decision-maker needs, while preserving the uncertainty information essential for risk-aware decision-making.

The Calibration Challenge

A world model is only useful if the decision-maker trusts its predictions. Building and maintaining trust requires that the model’s predictions be well-calibrated, that the model’s stated confidence levels accurately reflect its actual accuracy. A model that claims 95 percent confidence in a prediction that turns out to be wrong 50 percent of the time is worse than useless. It actively misleads the decision-maker.
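One simple way to check calibration is to compare the confidence a model states with how often its intervals actually contain the outcome. The sketch below assumes the model issues prediction intervals at a nominal 90 percent level; the intervals and outcomes are hypothetical numbers invented for illustration.

```python
def interval_coverage(intervals, outcomes):
    """Fraction of observed outcomes that fall inside their predicted interval."""
    hits = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, outcomes))
    return hits / len(outcomes)

def calibration_gap(intervals, outcomes, nominal):
    """Positive gap means overconfidence: coverage is lower than claimed."""
    return nominal - interval_coverage(intervals, outcomes)

# Hypothetical data: ten predictions, each stated as a 90 percent interval.
intervals = [(87, 93)] * 10
outcomes = [88, 90, 92, 95, 91, 86, 89, 90, 94, 91]
coverage = interval_coverage(intervals, outcomes)
gap = calibration_gap(intervals, outcomes, nominal=0.90)
```

Here only seven of ten outcomes land inside the stated interval, so a model claiming 90 percent confidence is delivering roughly 70 percent. A gap of that size is the kind of signal a decision-maker should see before trusting the model.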

Calibration is a particularly challenging problem for world models, because the accuracy of the model’s predictions depends on the complexity of the physical system, the quality of the training data, and the similarity of the current situation to the training distribution. A world model may be well-calibrated for situations similar to its training data but poorly calibrated for novel situations. Detecting when the model is operating outside its calibration envelope, and communicating this uncertainty to the decision-maker, is a critical safety requirement.

ARYA Labs’ deterministic architecture has a natural advantage for calibration. Because the system’s predictions are derived from explicit physical equations, the model’s uncertainty is not a statistical artifact of the training process but a reflection of the genuine uncertainty in the physical parameters. The model provides explicit bounds on its predictions: “the yield will be between 87 and 93 percent, with the uncertainty driven by the ±2°C uncertainty in the reactor temperature,” rather than a single point estimate with an opaque confidence interval.
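To make the idea of parameter-driven bounds concrete, here is a minimal sketch that sweeps an uncertain input across its stated range and reports the resulting output range. The `yield_model` function is purely hypothetical: a linear response chosen so that ±2°C maps to the 87 to 93 percent range in the example above. It is not ARYA Labs’ method, and the 180°C set point is an assumption.

```python
def yield_model(temp_c):
    """Hypothetical linear yield response around a 180 °C set point."""
    return 90.0 + 1.5 * (temp_c - 180.0)

def propagate_bounds(model, nominal, half_width, samples=101):
    """Evaluate the model across the parameter interval and return (min, max)."""
    step = 2.0 * half_width / (samples - 1)
    values = [model(nominal - half_width + i * step) for i in range(samples)]
    return min(values), max(values)

low, high = propagate_bounds(yield_model, nominal=180.0, half_width=2.0)
```

A grid sweep like this only approximates the true bounds for a nonlinear model, and it becomes expensive with many uncertain parameters. The point is that the output range traces back to a named physical input rather than to an opaque statistical estimate.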

The Organizational Process Challenge

Deploying world models in enterprise settings requires not just technical integration but organizational process redesign. The decision-making processes that organizations have developed for LLM-augmented workflows, in which the AI generates options and the human selects among them, are not appropriate for world model-augmented workflows, in which the AI simulates the consequences of different options and the human must interpret and act on those simulations.

The most important organizational process change is the shift from reactive to proactive decision-making. LLMs are primarily reactive tools: they respond to queries and generate outputs on demand. World models are primarily proactive tools: they continuously monitor the physical system, identify emerging risks and opportunities, and proactively alert decision-makers to situations that require attention. This shift requires organizations to redesign their decision-making processes around continuous monitoring and proactive intervention, rather than periodic review and reactive response.

The organizational change management required for this shift is significant. Decision-makers must develop new skills, the ability to interpret simulation outputs, to assess the quality of physical models, and to make decisions under uncertainty, that are not required for LLM-augmented workflows. Organizations must develop new governance frameworks for world model-assisted decisions: frameworks that define when human oversight is required, how disagreements between the world model and human judgment should be resolved, and how the world model's performance should be monitored and evaluated.

Deep Dive: Sovereign World Models and the New AI Race

The world model revolution is unfolding in a geopolitical context increasingly defined by competition among the United States, China, and Europe for AI supremacy. Understanding this geopolitical dimension is essential for anyone trying to assess the long-term trajectory of the world model ecosystem.

The US-China AI Competition

The Stanford AI Index 2026 documents the stark disparity in AI investment between the US and China: $285.9 billion versus $12.4 billion in 2025, a ratio of roughly 23 to 1. However, the report also notes that the performance gap between US and Chinese models on top benchmarks has narrowed to just 2.7 percent, a remarkable achievement given the investment disparity. China is closing the capability gap even as the investment gap widens, which suggests that the efficiency of research investment, not just its scale, will determine the long-term competitive outcome.

Specifically in the world model domain, China has made significant investments in physical AI and simulation. Huawei’s Pangu-Weather model, which generates 10-day weather forecasts at a fraction of the computational cost of traditional numerical weather prediction, is one of the most capable weather world models in the world. Chinese automotive companies (BYD, NIO, Xpeng) are investing heavily in world models for autonomous driving, and Chinese industrial companies are deploying world models for manufacturing optimization and predictive maintenance.

The US export controls on advanced semiconductors, specifically the restrictions on exporting NVIDIA’s H100 and B200 GPUs to China, are intended to slow China’s progress in AI by limiting its access to the compute required to train large models. However, the effectiveness of these controls is limited by several factors: the availability of alternative compute sources (Chinese domestic chip manufacturers, cloud computing providers in non-restricted countries), the ability to train world models with less compute than LLMs (particularly for architectures like ARYA’s nano model approach), and the difficulty of enforcing export controls in a globalized semiconductor supply chain.

The European Sovereign AI Initiative

Europe’s response to the US-China AI competition has been the Sovereign AI initiative: a set of policies and investments designed to ensure Europe has access to AI capabilities independent of US or Chinese technology. The AMI Labs $1.03 billion seed round, heavily backed by French industrial investors and family offices, is the most prominent example of the Sovereign AI initiative in the world model domain.

The European approach to AI sovereignty is shaped by two concerns: economic competitiveness and regulatory sovereignty. On the economic side, European policymakers are concerned that European companies will become dependent on US-based AI platforms (particularly the hyperscalers Microsoft Azure, Google Cloud, Amazon AWS, and the frontier labs OpenAI, Anthropic, Google DeepMind) for their AI capabilities, creating a structural disadvantage relative to US competitors that have access to these capabilities at lower cost and with greater customization. On the regulatory side, European policymakers are concerned that US-based AI platforms do not always comply with European data protection and AI governance requirements, creating legal and reputational risks for European companies that use them.

AMI Labs is positioned as a European champion in the world model space, providing European enterprises with access to frontier world model capabilities developed, trained, and operated within the European regulatory framework. The company’s commitment to open-sourcing its code and publishing its research is also consistent with the European approach to AI governance, which emphasizes transparency and accountability.

The Defense Dimension

The defense applications of world models are the most strategically significant and the most sensitive. A world model that accurately simulates the behavior of physical systems in complex, adversarial environments has obvious applications in military planning, weapons development, and autonomous systems. The ability to simulate the consequences of different military actions before executing them, to war-game scenarios in a world model before committing forces, is a capability that every major military power is pursuing.

The US Department of Defense has been investing heavily in AI for defense applications, with a particular focus on autonomous systems and decision support. The DoD’s AI strategy emphasizes the importance of explainable AI, AI systems that explain their recommendations to human decision-makers, which aligns with the deterministic, physics-constrained approach of ARYA Labs. The company’s focus on defense as one of its seven active industry domain nodes and its public benefit corporation structure (which ensures that its technology is developed with societal benefit in mind) position it well for DoD procurement.

The defense dimension also highlights the dual-use risks of world model technology. A world model that simulates the aerodynamics of a commercial airliner can also simulate the aerodynamics of a hypersonic missile. A world model that simulates the molecular dynamics of a life-saving drug can also simulate the synthesis of a novel chemical weapon. The proliferation of highly capable world models poses significant challenges to global security and arms control that the international community has yet to address.

Deep Dive: The Path Forward

The world model revolution is real; capital is flowing, and enterprise demand is forming. But the path from the current state of technology to widespread commercial deployment is not linear, and several critical milestones must be achieved before world models fulfill their transformative potential.

The Technical Milestones

Improved sample efficiency is the most important near-term technical milestone. Current world models require large amounts of training data to achieve high accuracy, which limits their applicability to domains where data is scarce or expensive. Advances in meta-learning, physics-informed learning, and transfer learning are needed to enable world models to learn accurate physical dynamics from much smaller datasets.

Better uncertainty quantification is essential for enterprise deployment. Decision-makers need to know not just what the world model predicts, but how confident the model is in its predictions, and what the key sources of uncertainty are. Current world models provide uncertainty estimates that are often poorly calibrated and difficult to interpret. Advances in Bayesian deep learning, conformal prediction, and physics-informed uncertainty quantification are needed to provide the calibrated, interpretable uncertainty estimates that enterprise applications require.
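Of the techniques named above, split conformal prediction is the easiest to sketch. The idea: hold out a calibration set, record how far the model's predictions were from reality, and use a quantile of those errors as the interval half-width for new predictions. The residual values and the 90.0 point prediction below are hypothetical.

```python
import math

def conformal_quantile(residuals, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of absolute residuals."""
    n = len(residuals)
    rank = math.ceil((n + 1) * (1.0 - alpha))  # 1-based rank
    if rank > n:
        return float("inf")  # too few calibration points for this alpha
    return sorted(residuals)[rank - 1]

def conformal_interval(prediction, q):
    """Symmetric interval around a point prediction."""
    return prediction - q, prediction + q

# Hypothetical calibration set: |prediction - observation| on held-out data.
calibration_residuals = [0.2, 0.5, 0.1, 0.9, 0.4, 0.3, 0.7, 0.6, 0.8, 1.0,
                         0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
q = conformal_quantile(calibration_residuals, alpha=0.1)
interval = conformal_interval(90.0, q)
```

The coverage guarantee rests on the assumption that new situations are exchangeable with the calibration data. When a world model meets a genuinely novel regime, that assumption fails, which is the out-of-distribution problem in another guise.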

Faster inference is required for real-time decision support applications. Many of the most valuable world model applications, including process control, autonomous vehicles, and real-time risk monitoring, require inference latency of less than 100 milliseconds. Current large neural network-based world models are too slow for these applications, and advances in model compression, hardware acceleration, and inference optimization are needed to close the latency gap.

Improved sim-to-real transfer is essential for physical AI applications. The gap between simulated environments and real physical systems is a fundamental challenge for world model deployment, and closing this gap requires advances in domain randomization, domain adaptation, and physics-informed regularization.

The Infrastructure Milestones

Standardized evaluation frameworks are needed to enable systematic comparison of competing world models. The current benchmark landscape is fragmented, with different research groups using different evaluation protocols and metrics. Standardized benchmarks, analogous to MMLU for LLMs, would enable more systematic progress and would make it easier for enterprise customers to compare competing systems.

Open-source world model foundations are needed to lower the barrier to entry for application developers. NVIDIA’s Cosmos and Meta’s V-JEPA 2 are important steps in this direction, but the ecosystem of open-source world-model tools, libraries, and pre-trained models remains nascent. Building a rich open-source ecosystem, analogous to the Hugging Face ecosystem for LLMs, would dramatically accelerate the development and deployment of world model applications.

Synthetic data pipelines are needed to address the data bottleneck. Building the infrastructure to generate, validate, and manage large-scale synthetic datasets for world model training remains a significant engineering challenge that has not yet been fully addressed. Companies like PhysicsX are making progress on AI-accelerated physics simulation, but the full stack, from simulation to synthetic data generation to training data management, remains largely unbuilt.

The Regulatory Milestones

Harmonized AI governance frameworks are needed to enable global deployment of world models in regulated industries. The current patchwork of national and sector-specific AI regulations creates significant compliance complexity for companies trying to deploy world models across multiple jurisdictions. Harmonized frameworks, analogous to the Basel III framework for financial regulation, would reduce compliance costs and enable more efficient global deployment.

Technical standards for world model safety are needed to provide a common language for assessing and communicating their safety properties. The current state of the art in AI safety, including red-teaming, adversarial testing, and robustness evaluation, was developed for LLMs and is not directly applicable to world models. New technical standards that cover physical constraint satisfaction, causal correctness, and long-horizon stability are needed to provide a rigorous framework for world model safety assessment.

Procurement frameworks for world model-assisted decisions are needed to enable adoption by government and regulated industries. Current procurement frameworks for AI systems were designed for LLMs and do not address the unique properties of world models, including their ability to simulate physical consequences, their dependence on physical models, and their requirements for ongoing validation and monitoring. New procurement frameworks that address these properties are needed to enable large-scale government and regulated-industry deployments that represent the most valuable near-term market for world models.

The path forward is long, but the direction is clear. The world model revolution is not a distant future scenario; it is the next chapter of the AI story, and the opening pages are being written right now. The companies, investors, and organizations that engage with this transition thoughtfully and strategically, that understand the technical landscape, the investment dynamics, and the organizational requirements, will be the ones that capture the extraordinary value that world models will create.

Deep Dive: The Transformer’s Evolution in World Modeling

The Transformer architecture, introduced by Vaswani et al. in the landmark 2017 paper “Attention Is All You Need,” has become the dominant computational substrate for modern AI, not just for language models, but increasingly for world models as well. Understanding how the Transformer has been adapted and extended for world modeling reveals both the architecture's power and its fundamental limitations for physical simulation.

The Original Transformer and Its Limitations for Physical Modeling

The original Transformer was designed for sequence-to-sequence tasks in natural language processing: given an input sequence of tokens, produce an output sequence of tokens. The key innovation was the self-attention mechanism, which allows every token in the sequence to attend to every other token, capturing long-range dependencies that recurrent neural networks struggled to model. The Transformer’s ability to process sequences in parallel, rather than sequentially as RNNs do, also made it far more efficient to train on modern GPU hardware.

However, the original Transformer has several properties that make it poorly suited to physical-world modeling. First, it operates on discrete tokens, symbols drawn from a finite vocabulary, rather than continuous state variables. Physical systems are described by continuous quantities: temperatures, pressures, velocities, and positions. Discretizing these quantities into tokens introduces quantization errors and loses the geometric structure of the state space. Second, the Transformer’s attention mechanism is permutation-equivariant: on its own it has no notion of token order or distance, so any spatial or temporal relationships must be injected through positional encodings and then learned from data. Physical systems have strong spatial and temporal structure (nearby objects interact more strongly than distant ones, and recent events are more relevant than distant ones) that the standard Transformer does not capture efficiently.
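Permutation equivariance is easy to verify directly. The sketch below implements single-head self-attention with identity projections and no positional encoding (both simplifying assumptions), then checks that shuffling the input tokens merely shuffles the outputs in the same way.

```python
import math

def self_attention(tokens):
    """Single-head self-attention, identity projections, no positional encoding."""
    d = len(tokens[0])
    outputs = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w / total * v[j] for w, v in zip(weights, tokens))
            for j in range(d)
        ])
    return outputs

tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
order = [2, 0, 1]  # an arbitrary reordering of the sequence

out_then_permute = [self_attention(tokens)[i] for i in order]
permute_then_out = self_attention([tokens[i] for i in order])

max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(out_then_permute, permute_then_out)
    for a, b in zip(row_a, row_b)
)
```

The two results agree to floating-point precision: the mechanism cannot tell which token came first or which sits next to which. Any sense of "nearby" has to be supplied from outside.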

Third, and most fundamentally, the Transformer has no built-in inductive bias toward physical laws. It learns to approximate physical dynamics from data, but it does so by fitting statistical patterns rather than encoding the underlying equations. This means that a Transformer-based world model can generalize poorly to situations that differ statistically from its training data, even when those situations are physically similar. A model trained on Earth’s gravity does not automatically generalize to a different gravitational constant, even though the underlying physics is the same.

Adaptations for Physical World Modeling

The research community has developed several important adaptations of the Transformer architecture to address these limitations for physical world modeling.

Continuous-time Transformers replace the discrete token sequence with a continuous-time representation, allowing the model to operate on continuous state variables and to make predictions at arbitrary time points. Neural ODEs (Ordinary Differential Equations), introduced by Chen et al. in 2018, provide a principled framework for continuous-time modeling: instead of predicting the next discrete state, the model predicts the time derivative of the state, and the state is evolved forward in time by numerical integration. The combination of Transformer-based encoders with Neural ODE dynamics has produced some of the most accurate world models for physical systems.
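The core mechanic of a Neural ODE, predicting a derivative and integrating it forward, can be sketched without any neural network at all. In the example below the `derivative` function is a stand-in for a learned network: it hard-codes a harmonic oscillator so the result can be checked against the known solution. The integrator is a standard fourth-order Runge-Kutta scheme, chosen here for illustration.

```python
import math

def derivative(state, k=1.0):
    """Stand-in for a learned network: oscillator dx/dt = v, dv/dt = -k x."""
    x, v = state
    return (v, -k * x)

def rk4_step(f, state, dt):
    """One classical Runge-Kutta step of the ODE d(state)/dt = f(state)."""
    def shifted(s, ds, scale):
        return tuple(a + scale * b for a, b in zip(s, ds))
    k1 = f(state)
    k2 = f(shifted(state, k1, dt / 2))
    k3 = f(shifted(state, k2, dt / 2))
    k4 = f(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def integrate(f, state, t_end, dt=0.01):
    """Evolve the state to an arbitrary time t_end, shortening the last step."""
    t = 0.0
    while t < t_end - 1e-12:
        h = min(dt, t_end - t)
        state = rk4_step(f, state, h)
        t += h
    return state

x, v = integrate(derivative, (1.0, 0.0), t_end=math.pi)
```

Because the model defines a derivative rather than a next-step jump, `t_end` can be any time, not just a multiple of a fixed step. That is the property that makes this framing attractive for irregularly sampled sensor data.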

Graph Neural Networks (GNNs) are a related architecture family that operates on graph-structured data, enabling the model to capture spatial relationships among physical objects. Self-attention can be viewed as message passing on a fully connected graph; a GNN restricts that exchange to the graph’s edges. In a GNN-based world model, each physical object is represented as a graph node, and edges represent physical interactions (gravitational attraction, contact forces, electromagnetic fields). The GNN’s message-passing mechanism allows each object to aggregate information from its neighbors, capturing the local structure of physical interactions. Google DeepMind’s GraphCast weather model and many of the molecular dynamics models used in drug discovery are based on GNN architectures.
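A single round of message passing can be sketched as follows. This toy replaces the learned message and update functions of a real GNN with a plain average, and uses a hypothetical three-object chain in which each object carries one scalar feature.

```python
def message_passing_step(features, edges):
    """Each node averages its own feature with those of its neighbours."""
    neighbours = {node: [] for node in features}
    for a, b in edges:  # undirected interaction between objects a and b
        neighbours[a].append(b)
        neighbours[b].append(a)
    updated = {}
    for node, value in features.items():
        incoming = [features[n] for n in neighbours[node]]
        updated[node] = (value + sum(incoming)) / (1 + len(incoming))
    return updated

features = {"a": 1.0, "b": 0.0, "c": 0.0}  # hypothetical per-object scalar state
edges = [("a", "b"), ("b", "c")]           # a touches b, b touches c
after_one = message_passing_step(features, edges)
after_two = message_passing_step(after_one, edges)
```

After one step, object c is still unaffected by a; after two, a's influence has reached it through b. Information travels one edge per round, which mirrors the locality of physical interactions.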

Physics-Informed Neural Networks (PINNs) incorporate physical laws directly into the network’s training objective, whatever the underlying architecture. Instead of training the model purely to minimize prediction error on observed data, PINNs add a physics residual term to the loss function that penalizes predictions that violate the governing equations of the physical system. This provides a physics-based inductive bias that helps the model generalize to situations not seen during training, as long as they obey the same physical laws.
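The structure of a physics-informed loss can be shown with a toy governing equation. The sketch below assumes exponential decay, dx/dt = -K x, with a known rate K, and compares two candidate models that both fit the observed data perfectly. Real PINNs use automatic differentiation on a neural network; the finite-difference derivative and the hand-written candidate functions here are simplifying assumptions.

```python
import math

K = 0.5  # assumed known decay rate in the governing equation dx/dt = -K x

def physics_residual(model, t, h=1e-4):
    """Residual of dx/dt + K x = 0, with the derivative by central difference."""
    dxdt = (model(t + h) - model(t - h)) / (2 * h)
    return dxdt + K * model(t)

def pinn_loss(model, data, collocation_times, weight=1.0):
    """Data misfit plus a penalty for violating the governing equation."""
    data_term = sum((model(t) - x) ** 2 for t, x in data) / len(data)
    physics_term = sum(
        physics_residual(model, t) ** 2 for t in collocation_times
    ) / len(collocation_times)
    return data_term + weight * physics_term

def consistent(t):
    """Candidate that obeys the equation everywhere."""
    return math.exp(-K * t)

def interpolant(t):
    """Candidate that only draws a straight line through the two data points."""
    return 1.0 + (math.exp(-0.5) - 1.0) * t

data = [(0.0, 1.0), (1.0, math.exp(-0.5))]  # two hypothetical observations
collocation = [0.5 * i for i in range(9)]   # 0.0 to 4.0, mostly beyond the data

loss_consistent = pinn_loss(consistent, data, collocation)
loss_interpolant = pinn_loss(interpolant, data, collocation)
```

On data alone the two candidates are indistinguishable. The physics term separates them, and it does so at time points where no observations exist, which is where the generalization benefit comes from.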

Mamba and State Space Models (SSMs) represent a different approach to handling long-range dependencies in continuous-time sequences. Unlike the Transformer’s attention mechanism, which has quadratic complexity in the sequence length, Mamba’s selective state-space model has linear complexity, making it far more efficient for long sequences. This efficiency advantage is particularly important for world models that need to simulate physical systems over long time horizons, a task that requires processing very long sequences of state observations.
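The complexity difference is visible in a minimal linear state-space recurrence. The sketch below uses fixed scalar parameters a, b, and c, which are illustrative assumptions; Mamba's distinguishing feature is that it makes such parameters depend on the input, which this toy does not attempt.

```python
def ssm_scan(inputs, a=0.9, b=0.1, c=1.0):
    """Linear state-space recurrence: h_t = a*h_{t-1} + b*u_t, y_t = c*h_t."""
    h, outputs = 0.0, []
    for u in inputs:  # one constant-cost update per time step
        h = a * h + b * u
        outputs.append(c * h)
    return outputs

def attention_pair_count(length):
    """Number of query-key pairs full self-attention scores for one sequence."""
    return length * length

steps = 1000
outputs = ssm_scan([1.0] * steps)
ssm_updates = steps
attention_pairs = attention_pair_count(steps)
```

For a thousand-step sequence the recurrence performs a thousand updates, while full attention scores a million pairs. The hidden state h carries the compressed history, which is both the source of the efficiency and the reason such models can lose detail that attention would retain.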

The Video Transformer: From Text to Physical Reality

The most direct application of the Transformer to world modeling is the video Transformer, which treats video as a sequence of image patches and learns to predict future frames from past ones. The video Transformer is the foundation of most modern generative world models, including Wayve’s GAIA-3 and NVIDIA’s Cosmos.

The key challenge in video Transformers is the computational cost. A single high-resolution video frame contains millions of pixels, and treating each pixel as a token would make the attention computation prohibitively expensive. The solution is to operate in a compressed latent space: a convolutional encoder compresses each frame into a small set of latent tokens, a Transformer processes these tokens, and a convolutional decoder expands the predicted tokens back into pixel space. This approach, known as a latent video Transformer, can reduce the computational cost by several orders of magnitude while preserving the essential semantic and geometric features of the video.
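A back-of-the-envelope calculation shows where the savings come from. The sketch assumes a Full HD frame and an encoder that downsamples by a factor of 8 in each spatial dimension; both figures are illustrative assumptions, not the specification of any model named here.

```python
def token_count(width, height, patch):
    """Tokens per frame when each patch-by-patch block becomes one token."""
    return (width // patch) * (height // patch)

def attention_pairs(tokens):
    """Pairwise scores computed by full self-attention over the tokens."""
    return tokens * tokens

WIDTH, HEIGHT = 1920, 1080  # one Full HD frame
pixel_tokens = token_count(WIDTH, HEIGHT, patch=1)
latent_tokens = token_count(WIDTH, HEIGHT, patch=8)  # assumed 8x downsampling
savings = attention_pairs(pixel_tokens) / attention_pairs(latent_tokens)
```

Cutting the token count by 64 cuts the attention cost per frame by 64 squared, a factor of 4,096. Temporal compression across frames, which this sketch ignores, would multiply the savings further.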

The temporal attention mechanism in video Transformers is particularly important for world modeling. Standard spatial attention allows each patch in a frame to attend to all other patches in the same frame, capturing the spatial relationships within a single image. Temporal attention extends this by allowing each patch to attend to corresponding patches in past and future frames, capturing the temporal dynamics of the scene. The combination of spatial and temporal attention enables the model to learn both the environment's static structure and its dynamic evolution over time.

NVIDIA’s Cosmos platform uses a Diffusion Transformer (DiT) architecture, a combination of the latent diffusion model framework and the Transformer architecture, for its generative world model. The DiT architecture has several advantages over the convolutional U-Net architecture used in earlier diffusion models: it scales more efficiently with compute, it handles variable-length sequences more naturally, and it can be conditioned on a wider range of inputs (text, video, sensor data) through cross-attention mechanisms.

Deep Dive: The Measurement Problem in Evaluating World Model Quality

One of the most challenging and underappreciated problems in the world model field is evaluation: how do you measure whether a world model is good? This question is more complex than it might appear, and the answer has profound implications for how the field develops and how enterprise customers should assess competing systems.

The Multi-Dimensional Nature of World Model Quality

A world model is evaluated along multiple dimensions, and these dimensions are not always correlated. A model that scores well on one dimension can score poorly on another, and the relative importance of different dimensions depends on the specific application.

Physical fidelity measures how accurately the model’s predictions match the true dynamics of the physical system. This is the most fundamental dimension of world model quality, but it is also the most difficult to measure, because the “true” dynamics of complex physical systems are often unknown or computationally intractable to compute exactly. In practice, physical fidelity is often measured by comparing the model’s predictions to the outputs of high-fidelity physics simulation engines, which serve as a ground truth.

Visual realism measures how photorealistic the model’s generated outputs are. For generative world models that produce video or images, visual realism is often measured using the Fréchet Inception Distance (FID) or similar metrics that compare the statistical distributions of generated images with those of real images. Visual realism is important for applications where the generated outputs will be viewed by humans (e.g., training data for autonomous vehicles), but it is largely irrelevant for applications where the outputs are consumed by downstream AI systems (e.g., reinforcement learning agents).
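The Fréchet distance behind FID compares two Gaussians fitted to feature vectors: the squared distance between their means plus a term comparing their covariances. The sketch below makes a strong simplifying assumption, diagonal covariance, under which the covariance term reduces to a sum of squared differences of standard deviations. Real FID uses full covariance matrices over Inception network features; the two-dimensional vectors here are hypothetical.

```python
import math

def mean_and_variance(samples):
    """Per-dimension mean and (population) variance of feature vectors."""
    n, dims = len(samples), len(samples[0])
    means = [sum(s[d] for s in samples) / n for d in range(dims)]
    variances = [
        sum((s[d] - means[d]) ** 2 for s in samples) / n for d in range(dims)
    ]
    return means, variances

def frechet_distance_diagonal(real, generated):
    """Squared Fréchet distance between Gaussians with diagonal covariance."""
    mu_r, var_r = mean_and_variance(real)
    mu_g, var_g = mean_and_variance(generated)
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_g))
    cov_term = sum(
        (math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(var_r, var_g)
    )
    return mean_term + cov_term

real = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]  # hypothetical feature vectors
shifted = [[x + 1.0, y] for x, y in real]    # same spread, mean moved by 1
same = frechet_distance_diagonal(real, real)
moved = frechet_distance_diagonal(real, shifted)
```

Note what the metric does and does not see: it compares distributions of features, so a generated set can score well while individual clips violate physics. That is why visual realism and physical fidelity have to be measured separately.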

Temporal consistency measures how coherent the model’s predictions are over time. A world model that generates physically plausible individual frames but produces inconsistent sequences, where objects appear and disappear, or where the geometry of the scene changes arbitrarily between frames, is not useful for planning or simulation. Temporal consistency is particularly challenging for generative models, which tend to produce each frame independently and do not always maintain a consistent internal representation of the scene over time.

Causal correctness measures whether the model correctly captures the causal relationships between events. A world model that predicts that a ball will fall when dropped is physically correct, but a model that predicts that a ball will fall because it is dropped, and that correctly predicts what would happen if the ball were not dropped, is causally correct. Causal correctness is essential for decision support applications, where the user needs to understand the consequences of different interventions, not just the most likely outcome.

Generalization measures how well the model performs on situations not seen during training. A world model that accurately simulates only situations similar to its training data is of limited value for enterprise applications, where the most important scenarios are often novel. Generalization is particularly challenging for data-driven world models, which tend to memorize training data rather than learn the underlying physical laws.

The Benchmark Landscape in 2026

The world model research community has developed a rich landscape of benchmarks for evaluating different aspects of world model quality. Understanding this landscape is essential for anyone trying to compare competing systems or assess the state of the field.

PhysicsMind is a benchmark for physical reasoning from images, testing whether models correctly identify and reason about physical quantities (mass, velocity, force) from visual observations. The benchmark includes tasks ranging from simple object recognition to complex multi-step physical reasoning, and it has revealed significant gaps between the physical reasoning capabilities of current AI systems and human performance.

Disambiguating Physics is a benchmark specifically designed to test whether generative world models correctly simulate physical laws over long time horizons. The benchmark generates scenarios where the correct physical behavior is unambiguous, including a ball rolling down a ramp, a pendulum swinging, and a fluid flowing through a pipe, and measures whether the model’s generated video is consistent with the correct physical behavior. The results have revealed systematic failure modes in state-of-the-art generative models: they tend to produce physically plausible behavior in the short term but diverge from correct physics over longer time horizons.
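The short-horizon versus long-horizon failure pattern can be illustrated with a toy calculation. The sketch below is not the benchmark's actual metric; it assumes a hypothetical learned model whose effective acceleration for a ball on a frictionless ramp is 2% too high, and it finds the first time at which the predicted position drifts past a chosen tolerance. The function names, the 2% bias, and the 5 cm tolerance are all illustrative assumptions.

```python
import math

def reference_position(t, accel):
    """Closed-form position of a body starting at rest under constant acceleration."""
    return 0.5 * accel * t * t

def divergence_horizon(accel_true, accel_model, dt, steps, tolerance):
    """Return the first time at which |model - reference| exceeds tolerance, or None."""
    for k in range(1, steps + 1):
        t = k * dt
        error = abs(reference_position(t, accel_model) - reference_position(t, accel_true))
        if error > tolerance:
            return t
    return None

# Ball on a frictionless 30-degree ramp: a = g * sin(theta).
G = 9.81
accel_true = G * math.sin(math.radians(30.0))
# Hypothetical learned model whose effective acceleration is 2% too high (assumption).
accel_model = 1.02 * accel_true

# Error at a short horizon (0.5 s) versus the time at which it first exceeds 5 cm.
short_error = abs(reference_position(0.5, accel_model) - reference_position(0.5, accel_true))
horizon = divergence_horizon(accel_true, accel_model, dt=0.1, steps=100, tolerance=0.05)
```

Because the position error grows with the square of elapsed time, a bias that is invisible over half a second crosses the tolerance shortly after one second, which is the kind of behavior a long-horizon benchmark is designed to expose.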

CausalARC is a benchmark for abstract causal reasoning, modeled after the Abstraction and Reasoning Corpus (ARC). The benchmark generates tasks from fully specified structural causal models, testing whether models correctly identify causal relationships and reason about the consequences of interventions. CausalARC has revealed a fundamental divide between models that have learned statistical correlations and models that have learned causal structure: the former fail on novel causal configurations, while the latter generalize.
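The difference between learning a correlation and learning causal structure can be made concrete with a toy structural causal model. The sketch below is an illustration only, not a CausalARC task: it assumes a hidden variable Z that drives both X and Y, with no causal edge from X to Y. The variable names, coefficients, and sample sizes are assumptions chosen for clarity.

```python
import random

def sample(rng, do_x=None):
    """Draw one (x, y) pair from a toy SCM: Z -> X, Z -> Y, and no edge X -> Y."""
    z = rng.gauss(0.0, 1.0)                  # hidden common cause
    if do_x is None:
        x = z + rng.gauss(0.0, 0.1)          # observational regime: X follows Z
    else:
        x = do_x                             # intervention do(X = x) cuts the Z -> X edge
    y = 2.0 * z + rng.gauss(0.0, 0.1)        # Y depends on Z only
    return x, y

def mean(values):
    return sum(values) / len(values)

def slope(pairs):
    """Ordinary least-squares slope of y regressed on x."""
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

rng = random.Random(0)
observed = [sample(rng) for _ in range(5000)]
obs_slope = slope(observed)      # close to 2: X looks strongly predictive of Y

# Interventions: force X to two different values and compare the mean of Y.
y_do_low = mean([sample(rng, do_x=-1.0)[1] for _ in range(5000)])
y_do_high = mean([sample(rng, do_x=1.0)[1] for _ in range(5000)])
effect = y_do_high - y_do_low    # close to 0: X has no causal effect on Y
```

A model that only fits the observational data would predict that moving X by two units moves Y by roughly four; a model that has recovered the causal graph predicts no change, which is what the intervention shows.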

WorldArena (RoboTwin 2.0) is the most demanding physical AI benchmark in this set: a bimanual robot manipulation suite with 50 distinct task types and 500 test episodes, requiring the AI system to physically perform tasks such as stacking bowls, placing objects in cabinets, pressing staplers, and rotating QR codes. AARA v2’s 47.4% overall success rate, compared to the v1 baseline of 8.0%, reflects the Constraint Breaker’s ability to retrieve similar prior episodes, blend their trajectories adaptively, and refine execution via DTW alignment. The tasks where AARA performs best (put_object_cabinet: 100%, put_bottles_dustbin: 100%, stack_bowls: 70%) are precisely those with well-defined physical constraints and clear success criteria. The tasks where AARA struggles (turn_switch: 0%, stamp_seal: 20%) are those requiring fine-grained dexterity and precise force control, an honest reflection of the current state of the art in physical manipulation AI.
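For readers unfamiliar with DTW (dynamic time warping), the sketch below shows the textbook algorithm on one-dimensional trajectories. It is a generic illustration, not AARA's implementation, whose details are not described here; the function name and the sample trajectories are assumptions. DTW measures how similar two sequences are after allowing one to be stretched or compressed in time, which is why it is a natural fit for comparing a retrieved demonstration with a new execution that runs at a different speed.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences (absolute-difference cost)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances while b holds its sample
                cost[i][j - 1],      # b advances while a holds its sample
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]

# Hypothetical gripper-height trajectories (illustrative values only).
reference = [0.0, 0.2, 0.6, 1.0, 1.0]
slower = [0.0, 0.0, 0.2, 0.2, 0.6, 1.0, 1.0, 1.0]  # same path, executed more slowly
shifted = [0.5, 0.7, 1.1, 1.5, 1.5]                # different path, same timing

same_path_cost = dtw_distance(reference, slower)
different_path_cost = dtw_distance(reference, shifted)
```

The slower execution aligns to the reference at zero cost because only its timing differs, while the shifted trajectory retains a substantial cost because its shape differs.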

ARYA Labs’ performance profile is particularly striking in this landscape: dominant in causal reasoning and physics problem-solving, competitive in mathematics, strong in code generation, and improving rapidly in physical manipulation. The benchmarks on which ARYA does not lead (visual realism, visual generalization, fine-grained dexterity) are precisely those where the generative and predictive approaches have architectural advantages. This is not a weakness; it is a clear signal that the deterministic architecture is not the right tool for those particular jobs.

The field is converging on the understanding that visual realism and physical accuracy are distinct properties that require distinct evaluation frameworks, and that the latter is more important for enterprise deployment in safety-critical domains. A world model that generates photorealistic video of a car driving down a road can score well on visual fidelity metrics while failing on physical consistency metrics, and it is physical consistency that determines whether the model can be trusted to make decisions with real-world consequences.

Deep Dive: The Hardware Landscape and What World Models Demand from Silicon

The world model revolution is not just a software story; it is a hardware story. The computational demands of world models are fundamentally different from those of LLMs, and the hardware landscape is evolving rapidly to meet those demands. Understanding the hardware dimension is essential for anyone trying to assess the infrastructure investment required for world model deployment or the investment opportunity in the enabling hardware ecosystem.

The LLM versus World Model Compute Profile

Large language models are primarily memory-bandwidth-bound workloads at inference time. The dominant operations in token-by-token LLM inference are the matrix-vector multiplications in the attention and feed-forward layers, which require reading large weight matrices from memory and multiplying them by relatively small activation vectors. The bottleneck is not floating-point compute but memory bandwidth, the rate at which data is transferred from memory to the compute units. This is why NVIDIA’s H100 and B200 GPUs, which are optimized for high memory bandwidth, are the dominant hardware for LLM inference.
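A back-of-the-envelope roofline calculation shows why a matrix-vector product is limited by bandwidth rather than compute. The sketch below is a simplification: it counts only weight traffic, and the peak throughput and bandwidth figures are round illustrative assumptions, not the specifications of any particular GPU.

```python
def arithmetic_intensity_matvec(rows, cols, bytes_per_weight):
    """FLOPs per byte for y = W @ x, counting only the weight traffic (it dominates)."""
    flops = 2 * rows * cols                       # one multiply and one add per weight
    bytes_moved = rows * cols * bytes_per_weight  # every weight is read once
    return flops / bytes_moved

def attainable_tflops(intensity, peak_tflops, bandwidth_tb_s):
    """Roofline model: throughput is capped by compute or by bandwidth * intensity."""
    return min(peak_tflops, bandwidth_tb_s * intensity)

# Illustrative accelerator figures (assumptions, not vendor specifications).
PEAK_TFLOPS = 1000.0     # dense 16-bit throughput
BANDWIDTH_TB_S = 3.0     # memory bandwidth

intensity = arithmetic_intensity_matvec(8192, 8192, bytes_per_weight=2)
ridge_point = PEAK_TFLOPS / BANDWIDTH_TB_S   # intensity needed to become compute-bound
achieved = attainable_tflops(intensity, PEAK_TFLOPS, BANDWIDTH_TB_S)
utilization = achieved / PEAK_TFLOPS
```

With 16-bit weights the matrix-vector product performs about one floating-point operation per byte read, far below the several hundred per byte this hypothetical chip would need to keep its arithmetic units busy, so well under 1% of peak compute is usable. Batching many requests together raises the intensity roughly in proportion to the batch size, which is one reason serving systems batch aggressively.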

World models have a more complex compute profile. Generative world models, like Wayve’s GAIA-3 and NVIDIA’s Cosmos, are compute-bound during training (the diffusion process requires many forward and backward passes through the model) and memory-bandwidth-bound during inference (similar to LLMs). But physics simulation workloads, the kind required for training data generation and for validating world model predictions, are often latency-bound: they require solving systems of differential equations that must be computed sequentially, with each step depending on the previous one, making parallelization difficult.

This mixed compute profile means that world model deployments often require a heterogeneous hardware stack: GPU clusters for training the neural network components, CPU clusters for running physics simulations, and specialized accelerators for real-time inference. The integration of these different hardware components into a coherent, efficient deployment stack is a significant engineering challenge.
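The sequential dependency in physics simulation is easiest to see in a time-stepping loop. The sketch below integrates a damped spring with semi-implicit Euler steps; the system, parameter values, and function name are illustrative assumptions rather than a description of any production simulator.

```python
def simulate_damped_spring(x0, v0, k, c, m, dt, steps):
    """Semi-implicit Euler integration of m * x'' = -k * x - c * x'."""
    x, v = x0, v0
    trajectory = [x]
    for _ in range(steps):
        # Step n + 1 needs the state produced by step n, so this loop cannot be
        # parallelized across time; only the work inside a single step can be.
        a = (-k * x - c * v) / m
        v = v + a * dt      # update velocity first...
        x = x + v * dt      # ...then advance position with the new velocity
        trajectory.append(x)
    return trajectory

# 20 simulated seconds at a 10 ms time step (illustrative parameters).
trajectory = simulate_damped_spring(
    x0=1.0, v0=0.0, k=4.0, c=0.4, m=1.0, dt=0.01, steps=2000
)
```

Adding more processors does not shorten the chain of 2,000 dependent steps; it only helps when each step contains enough independent work, such as many particles or mesh cells. That is why per-step latency, rather than raw throughput, often sets the pace of these workloads.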

NVIDIA’s Cosmos and the Physical AI Hardware Stack

NVIDIA has positioned itself as the dominant hardware and platform provider for the world model era, and its Cosmos platform is the clearest expression of this strategy. Cosmos is not just a world model; it is a complete hardware-software stack for physical AI development, combining NVIDIA’s GPU hardware, the CUDA software ecosystem, the Isaac robotics simulation platform, and the Cosmos world foundation model into a single integrated offering.

The Cosmos platform is designed to run on NVIDIA’s Blackwell architecture GPUs, the B200 and GB200, which provide a significant step up in performance for the mixed compute workloads of world model development. The GB200 NVL72 system, which combines 36 Grace CPUs with 72 Blackwell GPUs in a single rack, is designed specifically for the kind of large-scale physics simulation and world model training that Cosmos requires. NVIDIA rates the system at 1.4 exaflops of low-precision (FP4) AI performance, enough in principle to cut the training time of a large world model from weeks to days.

NVIDIA’s investment in the Cosmos platform reflects a strategic bet that the world model market will be as large as the LLM market, and that NVIDIA will capture the same dominant position in world model infrastructure that it has in LLM infrastructure. The company’s partnerships with Wayve, World Labs, and other world model startups are designed to ensure that the Cosmos platform becomes the de facto standard for world model development, just as CUDA became the de facto standard for deep learning.

The Memory Architecture Challenge

One of the most significant hardware challenges for world models is the memory architecture required to maintain a persistent, high-fidelity representation of a complex physical environment. A world model for an autonomous vehicle must maintain a detailed representation of the vehicle’s surroundings, including the positions, velocities, and predicted trajectories of all nearby objects, and update this representation in real time as new sensor data arrives. This requires not just high memory bandwidth but also high memory capacity and low-latency random access.

Current GPU memory architectures, HBM3 and HBM3e, provide high bandwidth but limited capacity (141 GB per GPU for the H200, for example). For world models that need to maintain detailed representations of large, complex environments, this capacity is often insufficient. The development of larger, higher-capacity memory technologies, including CXL-attached memory and near-memory computing architectures, is an important hardware enabler for the world model era.

ARYA Labs’ nano model architecture has a significant advantage in this regard. Because each nano model captures a specific, localized physical relationship and has a small number of parameters (typically fewer than 100), the total memory footprint of a composed nano model system is orders of magnitude smaller than a neural network-based world model. A composed nano model system sufficient to model a complex industrial facility requires less than 1 MB of memory, compared to the gigabytes or terabytes required by neural network-based approaches. This memory efficiency is particularly important for edge deployment (deploying world models on the industrial equipment itself, rather than in a cloud data center), which is the deployment model preferred by many enterprise customers in regulated industries.
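The memory claim can be sanity-checked with simple arithmetic. The sketch below takes the quoted upper bound of 100 parameters per nano model and assumes 64-bit floating-point storage; the storage format and the 1-billion-parameter comparison network are assumptions introduced for illustration, not figures from ARYA Labs.

```python
def footprint_bytes(num_models, params_per_model, bytes_per_param):
    """Parameter storage only; ignores code, metadata, and runtime buffers."""
    return num_models * params_per_model * bytes_per_param

PARAMS_PER_NANO_MODEL = 100   # upper bound quoted in the text
BYTES_PER_PARAM = 8           # assumption: 64-bit floats
ONE_MB = 1_000_000

# Storage for a single nano model, and how many fit in a 1 MB budget.
per_model = footprint_bytes(1, PARAMS_PER_NANO_MODEL, BYTES_PER_PARAM)
max_models_in_1mb = ONE_MB // per_model

# For contrast, a hypothetical 1-billion-parameter network stored in 16-bit floats.
neural_bytes = footprint_bytes(1, 1_000_000_000, 2)
ratio = neural_bytes / ONE_MB
```

Under these assumptions each nano model needs 800 bytes, so a 1 MB budget accommodates 1,250 of them, while the comparison network needs 2 GB for its weights alone, about 2,000 times more.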

The Neuromorphic Computing Opportunity

Beyond conventional GPU and CPU architectures, neuromorphic computing, meaning computing architectures inspired by the structure and function of the biological brain, is emerging as a potential hardware platform for world models. Neuromorphic chips such as Intel’s Loihi 2 implement spiking neural networks, in which neurons communicate through discrete spikes rather than continuous activations, and which can be far more energy-efficient than conventional neural networks for certain types of workloads. IBM’s NorthPole, a brain-inspired inference chip, pursues similar efficiency goals by tightly interleaving memory and compute rather than by using spikes.
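A leaky integrate-and-fire neuron, the simplest common spiking model, shows what communication through discrete spikes means in practice. The sketch below is a generic textbook model with illustrative parameters; it is not the neuron model of any specific chip.

```python
def simulate_lif(input_current, dt=1.0, tau=20.0, v_rest=0.0, v_threshold=1.0, v_reset=0.0):
    """Leaky integrate-and-fire neuron; returns the time steps at which it spikes."""
    v = v_rest
    spike_times = []
    for step, current in enumerate(input_current):
        # The membrane potential leaks toward rest while integrating the input.
        v += dt * (-(v - v_rest) + current) / tau
        if v >= v_threshold:
            spike_times.append(step)   # emit a discrete spike...
            v = v_reset                # ...and reset the membrane potential
    return spike_times

quiet = simulate_lif([0.5] * 200)    # weak drive: potential settles below threshold
active = simulate_lif([2.0] * 200)   # strong drive: regular spiking
```

The weakly driven neuron never fires, and on event-driven hardware a silent neuron triggers no downstream work. That sparsity is the source of the potential energy savings, and it also explains why the benefit depends heavily on how sparse the workload's activity actually is.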

The energy efficiency advantage of neuromorphic computing is particularly relevant for world models deployed in edge environments (autonomous vehicles, industrial robots, wearable medical devices) where power consumption is a critical constraint. For workloads that map well onto spiking hardware, a world model running on a neuromorphic chip could approach the inference performance of a GPU-based system while consuming a fraction of the power, enabling deployment in environments where GPU-based systems are impractical.

The challenge is that neuromorphic computing is still in its early stages, and the software ecosystem for programming neuromorphic chips is far less mature than the CUDA ecosystem for GPUs. The translation of world model architectures, which are typically designed for conventional GPU hardware, to neuromorphic hardware is a significant research and engineering challenge. But the potential energy-efficiency gains are large enough to justify the investment, particularly for autonomous-vehicle and robotics applications where world models are most valuable.

Deep Dive: The Talent Landscape and Who Builds World Models

The world model revolution is being built by a relatively small community of researchers and engineers with a rare combination of skills: deep expertise in machine learning, physics, mathematics, and software engineering. Understanding the talent landscape, who has the skills required to build world models, where they are concentrated, and how organizations attract and retain them, is essential for anyone trying to build or invest in world model capabilities.

The Interdisciplinary Nature of World Model Expertise

Building world models requires expertise that spans multiple disciplines in ways that are unusual even in the AI field. A world model researcher needs to understand machine learning and deep learning: the mathematical foundations of neural networks, optimization algorithms, and training techniques; the practical skills of implementing and debugging large-scale ML systems; and the research literature on world model architectures, including Dreamer, JEPA, and diffusion models.

They also need to understand physics and mathematics: the governing equations of the physical domain being modeled (fluid dynamics, structural mechanics, molecular dynamics), the numerical methods used to solve those equations (finite element analysis, finite difference methods, molecular dynamics simulation), and the mathematical tools for analyzing dynamical systems (differential equations, control theory, stability analysis).

Software engineering expertise is equally critical: the systems programming skills required to implement high-performance simulation and training pipelines; the distributed computing skills required to scale training across large GPU clusters; and the software architecture skills required to design robust, maintainable world model systems.

Finally, domain expertise is essential: deep knowledge of the specific physical domain being modeled, the practical constraints, failure modes, and edge cases not captured in textbook physics but essential for building world models that work in the real world.

This combination of skills is extremely rare. Most machine learning researchers have limited physics expertise; most physicists and engineers have limited machine learning expertise; and most software engineers have limited expertise in either. The researchers who combine all four and who work effectively across disciplinary boundaries are among the most sought-after professionals in the technology industry.

The Geographic Concentration of Talent

World model talent is concentrated in a small number of geographic clusters, reflecting the field's historical development and the locations of the major research institutions and technology companies that have driven its progress.

The San Francisco Bay Area is the global center of world model research and development, home to the research labs of Google DeepMind, Meta AI, and NVIDIA, as well as numerous world model startups, including World Labs and Wayve’s US operations. The concentration of AI talent, venture capital, and technology infrastructure in the Bay Area creates a powerful ecosystem for world model development.

London is the second most important center of world model research, home to Google DeepMind’s main research lab, Wayve’s headquarters, Isomorphic Labs, and PhysicsX. The UK’s strong tradition in mathematics and physics, combined with its world-class universities (Oxford, Cambridge, Imperial College London, UCL) and its deep connections to the European research community, make it a natural hub for world model research.

Paris is emerging as a major center of world model research, driven by Meta AI’s European research lab, the INRIA research institute, and the growing ecosystem of AI startups that has developed around the French government’s AI strategy. AMI Labs’ decision to base its operations in Paris is both a reflection of and a contribution to this emerging ecosystem.

Toronto is a significant center of world model research, home to the Vector Institute (co-founded by Geoffrey Hinton), the University of Toronto (where Danijar Hafner developed the Dreamer architecture), and a growing ecosystem of AI startups, including Waabi. The city’s strong connections to the academic research community and its proximity to the US technology ecosystem make it an attractive location for world model companies.

New York is an emerging center of world model research, driven by NYU (where Yann LeCun developed the JEPA framework before founding AMI Labs), Columbia University, and a growing ecosystem of AI startups. ARYA Labs is headquartered in New York, reflecting the city’s growing importance as a center of enterprise AI development.

The Talent Acquisition Challenge

The scarcity of world model talent creates significant challenges for organizations trying to build world model capabilities. The competition for top talent is intense, with frontier labs, hyperscalers, and well-funded startups all competing for the same small pool of researchers and engineers. Compensation packages for senior world model researchers routinely exceed $1 million per year, and the most sought-after researchers command multiple competing offers.

Organizations that cannot compete on compensation must find other ways to attract world model talent. The most effective strategies include: offering research autonomy and the opportunity to work on fundamental problems (which is why frontier labs like AMI Labs and World Labs attract top researchers despite not having the compensation budgets of the hyperscalers); providing access to unique data and physical systems (which is why industrial companies with proprietary physical data attract researchers who want to work on real-world applications); and offering equity in early-stage companies with the potential for large returns (which is why well-funded startups compete with established companies for top talent).

The talent acquisition challenge also has a geographic dimension. Many of the most talented world model researchers are concentrated in a small number of cities, and organizations outside those cities face significant disadvantages in talent acquisition. Remote work has partially addressed this challenge: it is now possible to build a world model team distributed across multiple cities, but the most productive research still tends to happen in person, in environments where researchers collaborate closely and share ideas informally.

Deep Dive: The Venture Capital Perspective on Evaluating World Model Investments

For venture capitalists evaluating world model investments, the standard LLM-era evaluation frameworks (team quality, market size, product-market fit, competitive moat) are necessary but not sufficient. World model investments require additional evaluation dimensions that reflect the unique technical, regulatory, and commercial characteristics of the space.

Technical Due Diligence for World Models

Technical due diligence for world model investments must go beyond the standard assessment of model performance on benchmarks. The most important technical questions for world model due diligence are:

What is the physical fidelity of the model? A world model that generates visually realistic outputs but does not accurately capture physical dynamics is not useful for enterprise applications. Due diligence should include a rigorous assessment of the model’s physical fidelity and its ability to accurately predict the behavior of physical systems in situations not seen during training. This assessment should be conducted by domain experts who evaluate the physical plausibility of the model’s predictions, not just by ML engineers who evaluate its benchmark performance.

What is the model’s uncertainty quantification capability? Enterprise applications require not just accurate predictions but calibrated uncertainty estimates. Due diligence should assess whether the model provides well-calibrated uncertainty estimates, whether its stated confidence levels accurately reflect its actual accuracy, and whether it detects when it is operating outside its calibration envelope.

What is the model’s computational efficiency? The computational cost of world model inference is a critical factor in determining the feasibility of real-time deployment. Due diligence should assess the model’s inference latency, memory footprint, and energy consumption, and compare these to the requirements of the target application.

What is the model’s data efficiency? The amount of training data required to achieve high accuracy is a key determinant of the model’s applicability to domains where data is scarce or expensive. Due diligence should assess the model’s data efficiency, its ability to learn accurate physical dynamics from small datasets, and compare this to the data availability in the target domain.

What is the model’s safety and compliance posture? For applications in regulated industries, the model’s ability to satisfy regulatory requirements (explainability, auditability, physical constraint satisfaction) is a critical factor in determining its commercial viability. Due diligence should assess the model’s safety architecture and its compliance with relevant regulatory frameworks.
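Of the questions above, calibration is the most directly measurable. One common summary statistic is expected calibration error, which compares stated confidence with observed accuracy inside confidence bins. The sketch below is a minimal version for yes-or-no predictions, with made-up inputs; continuous physical predictions would typically be checked with interval coverage instead, and the function name and bin count are assumptions.

```python
def expected_calibration_error(confidences, correct, num_bins=10):
    """Weighted average gap between stated confidence and observed accuracy per bin."""
    bins = [[] for _ in range(num_bins)]
    for conf, hit in zip(confidences, correct):
        index = min(int(conf * num_bins), num_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, hit))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece

# Hypothetical predictions: stated confidence and whether each one turned out right.
well_calibrated = expected_calibration_error([0.75] * 4, [True, True, True, False])
overconfident = expected_calibration_error([0.95] * 4, [True, True, False, False])
```

The first model claims 75% confidence and is right three times out of four, so its error is zero. The second claims 95% confidence but is right only half the time, a gap of 45 percentage points. A diligence team can run the same check on held-out data from the target domain rather than relying on the vendor's benchmark numbers.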

Market Sizing for World Models

Sizing the market for world models is more complex than sizing the market for LLMs, because world models address a much more diverse set of applications with very different economic characteristics. The Goldman Sachs estimate of a multi-trillion-dollar addressable market is a useful starting point, but it requires significant decomposition to be actionable for investment purposes.

The most valuable near-term markets for world models are those where the physical system being modeled is complex and high-stakes (making the cost of errors high), the data required to train the model is available or can be generated through simulation, the regulatory environment creates demand for explainable, auditable AI, and the competitive dynamics reward early movers with data and deployment advantages.

By these criteria, the highest-priority near-term markets are pharmaceutical manufacturing, aerospace and defense, autonomous vehicles, and energy and utilities. These markets are characterized by high stakes, available simulation data, stringent regulatory requirements, and strong early-mover advantages. The total addressable market in these four sectors alone exceeds $500 billion annually in potential value creation from world model deployment.

The longer-term market opportunity includes the broader enterprise market for simulation-assisted decision-making, the consumer market for interactive world model applications (gaming, virtual reality, creative tools), and the scientific research market for AI-accelerated simulation. These markets are real but more distant, and investors should be cautious about business plans that depend on capturing a significant share of the broader market in the near term.

The Competitive Moat Question

The most important question for world model investors is the nature of the competitive moat: what prevents a well-funded competitor from replicating the company’s world model capabilities and competing away its margins?

For generative world model companies (World Labs, Wayve, NVIDIA Cosmos), the primary moat is the combination of proprietary training data and architectural expertise. The training data moat is real but fragile: as more synthetic data generation tools become available, the cost of generating training data declines, reducing the advantage of companies with large proprietary datasets. The architectural expertise moat is more durable: the researchers who developed the frontier architectures are rare, and their expertise is difficult to replicate quickly.

For predictive world model companies (AMI Labs, Meta V-JEPA 2), the primary moat is the combination of the JEPA architectural framework and the open-source ecosystem that AMI Labs is building around it. The open-source strategy is a double-edged sword: it accelerates adoption and builds a community, but it also makes it easier for competitors to build on the same foundation. AMI Labs’ moat will ultimately depend on its ability to stay ahead of the open-source community through continuous research and innovation.

For deterministic world model companies (ARYA Labs), the primary moat is the combination of a physics-constrained architecture, regulatory compliance infrastructure, and domain-specific physical expertise required to design nano model architectures for each application domain. This moat is the most durable of the three, because it is the most difficult to replicate: building a deterministic world model that satisfies the regulatory requirements of the pharmaceutical, aerospace, and defense industries requires years of domain expertise development and regulatory engagement that cannot be shortcut by capital.

Deep Dive: The Startup Ecosystem Map

Beyond the headline companies, the world model ecosystem includes a rich landscape of startups addressing specific components of the world model stack. This section provides a comprehensive map of the ecosystem, organized by layer.

Foundation Model Layer

World Labs (San Francisco, founded 2023 by Fei-Fei Li, Justin Johnson, Christoph Lassner, and Ben Mildenhall) is building spatial intelligence world models, AI systems that understand and generate 3D spatial environments. The company’s Marble product, launched in 2025, enables architects, designers, and game developers to generate interactive 3D environments from text descriptions and images. The company raised $1 billion in early 2026, with participation from Autodesk, AMD, NVIDIA, Andreessen Horowitz, Emerson Collective, Fidelity, and Sea. World Labs is targeting the creative tools market as its primary near-term revenue source, with longer-term ambitions in robotics and autonomous systems.

AMI Labs (Paris, founded 2025 by Yann LeCun) is building JEPA-based world models for physical and social reasoning. The company raised $1.03 billion in Europe’s largest-ever seed round in March 2026, with backing from French industrial groups and family offices. AMI Labs is committed to open-sourcing its code and publishing its research, positioning itself as the open-source alternative to the closed-source frontier labs. The company is targeting pharmaceutical development, climate modeling, and social simulation as its primary application domains.

ARYA Labs (Connecticut, founded 2025 by Dr. Seth Dobrin and Dr. Lukasz Chmiel) is building Constrained Deterministic AI (CDAI) world models for safety-critical physical systems. The company emerged from stealth in 2026 with a published arXiv paper (arXiv:2603.21340), $300,000 in ARR, a $60 million qualified sales pipeline, and seven active industry domain nodes. ARYA Labs is targeting pharmaceutical manufacturing, aerospace, defense, medical devices, energy, and industrial operations as its primary markets, with a focus on the highest-value, most regulated enterprise segments.

Autonomous Vehicle Layer

Wayve (London, founded 2017 by Alex Kendall and Amar Shah) is building end-to-end autonomous driving AI powered by the GAIA-3 generative world model. The company raised $1.2 billion in Series D funding at an $8.6 billion valuation in 2026, with participation from Microsoft, NVIDIA, Uber, and three major automakers (Mercedes-Benz, Nissan, Stellantis). An additional $300 million from Uber, contingent on robotaxi deployment milestones, would bring the total to $1.5 billion. Wayve’s GAIA-3 model generates safety-critical driving scenarios that would be dangerous or impossible to collect in the real world, enabling systematic evaluation of the autonomous driving system against thousands of edge cases.

Waabi (Toronto, founded 2021 by Raquel Urtasun) is building autonomous trucking AI powered by the Waabi World generative world model. The company has deployed autonomous trucks on commercial routes in Texas and is using its world model to accelerate the development and validation of its autonomous driving system. Waabi’s focus on long-haul trucking, a more constrained and predictable environment than urban driving, allows it to deploy commercially with a less mature world model than would be required for urban autonomous vehicles.

Waymo (Mountain View, subsidiary of Alphabet) has developed a proprietary world model for autonomous driving simulation, described in a February 2026 blog post as a “frontier generative model” that simulates the behavior of other drivers, pedestrians, and cyclists in response to the autonomous vehicle’s actions. Waymo’s world model is used internally for training and evaluation, and the company has not announced plans to commercialize it as a standalone product.

Scientific Simulation Layer

PhysicsX (London, founded 2021) is building AI-accelerated physics simulation tools for aerospace, automotive, and energy applications. The company’s technology enables engineers to run physics simulations 1,000 to 10,000 times faster than traditional methods, thereby generating large, diverse, physics-accurate synthetic datasets for world model training. PhysicsX raised a $155 million Series B extension in November 2025, with participation from NVIDIA NVentures, positioning the company as a key infrastructure provider for the world model ecosystem.

Isomorphic Labs (London, founded 2021 as a DeepMind spinout) is applying world-model principles to drug discovery, building AI systems that simulate the behavior of biological molecules at atomic resolution. AlphaFold 3, which the company co-developed with Google DeepMind and which predicts the structure of protein-ligand complexes, is in effect a world model for molecular biology. Isomorphic Labs has partnerships with major pharmaceutical companies, including Eli Lilly and Novartis.

Recursion Pharmaceuticals (Salt Lake City, founded 2013) is building a biological world model, a system that predicts the effects of drug candidates on biological systems, by combining high-throughput cellular imaging with deep learning. The company has generated one of the largest proprietary biological datasets in the world, with over 50 petabytes of cellular imaging data, and is using this data to train world models that predict drug efficacy and toxicity.

Industrial and Enterprise Layer

Sight Machine (San Francisco, founded 2012) is building world models for manufacturing operations, enabling manufacturers to simulate the effects of process changes before implementing them. The company’s platform integrates with existing manufacturing equipment and data systems, providing a real-time digital twin of the manufacturing process for optimization, predictive maintenance, and quality control.

Aspen Technology (Bedford, Massachusetts, founded 1981) is a long-established provider of process simulation software for the chemical, energy, and pharmaceutical industries. The company is incorporating AI-based world-model capabilities into its existing simulation platform, enabling faster, more accurate simulation of complex chemical processes. AspenTech’s large installed base of enterprise customers provides a significant distribution advantage for its world model capabilities.

Cognite (Oslo, founded 2016) is building industrial AI platforms that incorporate world-model capabilities for the oil and gas, energy, and manufacturing industries. The company’s Cognite Data Fusion platform integrates data from industrial equipment, sensors, and operational systems, providing a unified data foundation for world model development and deployment.

Creative and Consumer Layer

Decart (Tel Aviv and San Francisco, founded 2022) is building real-time interactive world models for gaming and simulation. The company’s Oasis model, released in late 2024, was among the first public demonstrations of a real-time interactive world model, a model that generates playable, interactive environments at 20 frames per second. Decart is targeting gaming, simulation, and training data generation as its primary markets.

Worlds.io (San Francisco, founded 2023) is building world models for social and economic simulation. The company’s platform enables simulation of complex social dynamics, market behaviors, and organizational processes, with applications in policy simulation, economic forecasting, and organizational design.

Runway (New York, founded 2018) is building generative world models for video production and creative tools. The company’s Gen-3 Alpha model generates high-quality video from text descriptions, and it is developing world-model capabilities to enable more consistent and controllable video generation. Runway has positioned itself as the creative AI company, targeting filmmakers, advertisers, and content creators as its primary customers.

Deep Dive: The Next 24 Months and What to Watch

For executives, investors, and technologists tracking the world model space, the next 24 months will be defined by key milestones and inflection points. Understanding what to watch and what the milestones will signal about the field's trajectory is essential for staying ahead of the curve.

Technical Milestones to Watch

The first commercial deployment of a generative world model in a regulated industry will be a watershed moment for the field. The first pharmaceutical company to use a world model to support an FDA submission, or the first aerospace manufacturer to use a world model to support an FAA certification, will demonstrate that the technology has crossed the threshold from research to commercial deployment in the most demanding enterprise markets. Watch for announcements from the major pharmaceutical and aerospace companies about their AI governance frameworks and their engagement with regulators on AI-assisted processes.

The first demonstration of a world model that achieves human-level performance on a comprehensive physical reasoning benchmark will signal a qualitative leap in world model capabilities. Current world models achieve human-level performance on specific, narrow physical reasoning tasks, but no model has yet demonstrated human-level performance across the full range of physical reasoning required for general-purpose physical AI. The PhysicsMind and WorldSimBench benchmarks will be the key indicators to watch.

The first open-source world model to match the performance of frontier closed-source models will dramatically accelerate the democratization of world model technology. The LLM field saw this transition with the release of Meta’s LLaMA models, which enabled a large ecosystem of open-source development and applications. A similar transition in the world model field, likely driven by AMI Labs’ open-source strategy or NVIDIA’s Cosmos platform, will make the technology accessible to a much broader community of developers and researchers.

Business Milestones to Watch

The first world model company to achieve $100 million in ARR will demonstrate that the technology generates commercial revenue at scale. Current world model companies are in the early stages of commercial deployment, with ARR measured in the millions. The first company to reach $100 million in ARR, likely in the autonomous vehicle, pharmaceutical, or aerospace domains, will validate the commercial model and attract significant follow-on investment.

The first major enterprise deployment of a world model in a Fortune Global 500 company will signal that the technology has crossed the chasm from early adopters to the mainstream enterprise market. Watch for announcements from the major pharmaceutical companies (Pfizer, Johnson & Johnson, Roche), aerospace manufacturers (Boeing, Airbus, Lockheed Martin), and energy companies (ExxonMobil, Shell, BP) about their world model deployments.

The first major acquisition in the world model space will signal that the established technology companies (Microsoft, Google, Amazon, Salesforce) are moving from investment to acquisition. The most likely acquisition targets are the domain application companies that have achieved significant commercial traction in specific verticals, rather than the frontier labs (which are too expensive and too research-focused for most acquirers).

Regulatory Milestones to Watch

The EU AI Act’s first enforcement actions against high-risk AI systems will clarify the regulatory requirements for world model deployment in Europe and will create significant demand for compliant, explainable AI systems. The first enforcement actions are expected in 2026, and the companies that are best positioned for compliance, those with deterministic, physics-constrained architectures and comprehensive documentation, will benefit from the regulatory clarity.

The FDA’s first approval of a drug developed with AI assistance will be a landmark event for the pharmaceutical world model market. The FDA has been developing guidance on the use of AI in drug development, and the first approval of a drug that used AI-assisted simulation in its development will validate the regulatory pathway and accelerate adoption.

The DoD’s first major procurement of a world model-based decision support system will signal that the defense market is ready for world model deployment. The DoD has been investing heavily in AI for defense applications, and a major procurement, likely in the logistics, maintenance, or mission planning domains, will create a significant revenue opportunity for world model companies with defense-grade compliance capabilities.

Deep Dive: The Social World Model and Simulating Human Behavior at Scale

The most provocative and least discussed category of world models is the social world model: AI systems that simulate the behavior of human beings, organizations, and societies. While physical world models simulate the dynamics of matter and energy, social world models simulate the dynamics of human decision-making, social interaction, and collective behavior. The potential applications and the potential risks are extraordinary.

The Foundation of Social Simulation

Social simulation has a long history in the social sciences, running from Thomas Schelling’s segregation model in the early 1970s to the agent-based models that became widespread in the 1990s. These early models represented individuals as simple rule-following agents and simulated their interactions to study emergent social phenomena: the formation of social norms, the spread of information, and market dynamics. The models were intellectually illuminating but practically limited: they were too simple to capture the complexity of real human behavior, and they lacked the data required to calibrate them to real-world social systems.
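To make "simple rule-following agents" concrete, here is a minimal sketch of one classic rule from that literature, a Granovetter-style threshold model, in which each agent joins a collective behavior once enough others have joined. It is an illustration chosen for this newsletter, not a model described in the Goldman Sachs report, and the threshold values are hypothetical.

```python
def run_threshold_model(thresholds):
    """Return how many agents end up participating.

    thresholds[i] is the number of participants agent i needs to see
    before joining (a Granovetter-style rule; the values are illustrative).
    """
    active = 0
    while True:
        # An agent participates once current participation meets its threshold.
        now_active = sum(1 for t in thresholds if t <= active)
        if now_active == active:  # nobody new joined, so the system has settled
            return active
        active = now_active


# Population A: thresholds 0, 1, 2, ..., 99. Each joiner tips the next agent.
uniform = list(range(100))

# Population B: identical except one agent needs 2 participants instead of 1.
perturbed = [0, 2] + list(range(2, 100))

print(run_threshold_model(uniform))    # 100: full cascade
print(run_threshold_model(perturbed))  # 1: the cascade stalls immediately
```

Two populations that look almost identical on paper produce opposite outcomes, which is the kind of emergent behavior these early models were built to study, and also why calibrating them to real populations is so hard.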

Modern social world models are qualitatively different from their predecessors. They are built on large language models that have learned rich representations of human language, behavior, and social norms from the vast corpus of human-generated text on the internet. They simulate the behavior of individual agents with a level of psychological realism that was impossible with earlier approaches. And they are calibrated to real-world social data (social media interactions, economic transactions, mobility patterns) to produce simulations that are grounded in empirical reality.

The Goldman Sachs report identifies social world models, what it calls “virtual world models,” as one of the three primary categories of world model applications, alongside physical world models and scientific world models. The report notes that social world models have applications in policy simulation (simulating the effects of policy interventions before implementation), economic forecasting (simulating market and agent behavior), and organizational design (simulating organizational behavior under different structural configurations).

The Policy Simulation Application

The most immediately valuable application of social world models is policy simulation: using AI to simulate the effects of policy interventions before implementing them. This is a computational analogue of the randomized controlled trial, the gold standard of policy evaluation, but without the ethical and practical constraints of running real-world experiments on human populations.

A social world model for policy simulation represents the relevant population of individuals as agents, each with their own demographic characteristics, economic circumstances, behavioral tendencies, and social connections. The model is calibrated to real-world data (census data, economic surveys, social network data) to ensure that the simulated population accurately reflects the real population. Policy interventions are implemented as changes to the model’s parameters (e.g., a tax change, a regulatory requirement, or a public health intervention), and the model simulates the consequences of the intervention over time, tracking outcomes such as economic welfare, health, and social cohesion.
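The loop described above (build a population, change a policy parameter, track outcomes) can be sketched in a few lines. This is a toy microsimulation under loudly stated assumptions: the `Household` class, the lognormal income distribution, the elasticity range, and the 20 percent baseline tax rate are all hypothetical, and a real system would calibrate them to census and survey data.

```python
import random
from dataclasses import dataclass


@dataclass
class Household:
    income: float      # pre-tax annual income at the baseline tax rate
    elasticity: float  # hypothetical response of earnings to the net-of-tax rate


def make_population(n, seed=0):
    """Synthetic population; a real model would be calibrated to survey data."""
    rng = random.Random(seed)
    return [Household(income=rng.lognormvariate(10.5, 0.6),
                      elasticity=rng.uniform(0.1, 0.4)) for _ in range(n)]


def simulate(population, tax_rate, baseline_rate=0.20):
    """Apply a flat tax rate and report average revenue and net income."""
    revenue = 0.0
    net_income = 0.0
    for h in population:
        # Earnings shrink when the share of income kept (1 - tax_rate) falls.
        scale = ((1.0 - tax_rate) / (1.0 - baseline_rate)) ** h.elasticity
        earned = h.income * scale
        revenue += tax_rate * earned
        net_income += (1.0 - tax_rate) * earned
    n = len(population)
    return {"revenue_per_household": revenue / n,
            "mean_net_income": net_income / n}


population = make_population(10_000, seed=42)
print(simulate(population, tax_rate=0.20))  # baseline
print(simulate(population, tax_rate=0.30))  # the policy intervention
```

Even this toy version shows why simulation matters: the revenue gain from the higher rate is smaller than a static calculation would suggest, because the simulated households change their behavior in response.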

The potential value of policy simulation is enormous. Governments spend trillions of dollars on policy interventions every year, and the evidence base for most of those interventions is weak. A policy simulation system that accurately predicted the effects of different policy options, and that identified unintended consequences before they occurred, would dramatically improve the quality of policy decisions and reduce the waste of public resources.

The challenges are equally enormous. Human behavior is far more complex and unpredictable than physical dynamics, and the data required to calibrate a social world model is far more sensitive and difficult to obtain than the data required to calibrate a physical world model. The ethical implications of simulating human behavior at scale (the potential for misuse, the risk of reinforcing biases, the question of consent) are profound and have not yet been adequately addressed.

The Economic Forecasting Application

Economic forecasting is one of the oldest and most important applications of social simulation. Traditional economic models (DSGE models, VAR models, agent-based economic models) have been used for decades to forecast economic outcomes and to simulate the effects of monetary and fiscal policy. But these models are limited by their simplifying assumptions about human behavior (rational expectations, representative agents) and by their inability to capture the complex network effects and feedback loops that characterize real economic systems.

Modern social world models offer the potential to overcome these limitations. By representing individual economic agents with realistic behavioral models (capturing bounded rationality, social influence, and heterogeneous preferences), social world models simulate the emergence of macroeconomic phenomena from the interactions of millions of agents. This bottom-up approach to economic modeling is more realistic than the top-down approach of traditional macroeconomic models, and it has the potential to produce more accurate forecasts and more reliable policy simulations.

The most ambitious application of social world models in economics is the simulation of financial markets. Financial markets are complex adaptive systems, in which the behavior of individual agents (traders, investors, market makers) creates emergent phenomena (price dynamics, market crashes, liquidity crises) that cannot be predicted from the behavior of any single agent. A social world model that accurately captures the behavior of market participants (their beliefs, risk preferences, social connections, and information processing) could predict market dynamics with a level of accuracy that is impossible with traditional econometric models.

The Organizational Design Application

Organizations are complex social systems, and the design of organizational structures (the allocation of decision-making authority, the design of incentive systems, the configuration of communication networks) has profound effects on organizational performance. Traditional approaches to organizational design rely on case studies, surveys, and the intuition of experienced managers. Social world models can simulate the effects of different organizational designs before implementation, enabling evidence-based organizational design.

A social world model for organizational design represents the organization's members as agents, each with their own skills, motivations, and social connections. The model simulates the organization's behavior under different structural configurations (reporting relationships, incentive systems, communication protocols) and tracks outcomes such as productivity, innovation, employee satisfaction, and organizational resilience. The model is used to evaluate the effects of specific organizational changes (such as a reorganization, a new incentive system, or a change in leadership) before implementing them.

The applications of organizational simulation extend beyond individual companies to entire industries and economies. A social world model for the healthcare industry, for example, simulates the effects of different healthcare delivery models (different payment systems, care coordination protocols, and staffing configurations) on patient outcomes, costs, and provider satisfaction. A social world model of the education system simulates the effects of different educational policies (e.g., different curriculum standards, teacher training programs, and school funding formulas) on student achievement and educational equity.

Deep Dive: The Energy Sector and World Models for the Energy Transition

The energy sector is one of the most important and most challenging application domains for world models. The energy transition, the shift from fossil fuels to renewable energy, is one of the most complex engineering and economic challenges in human history, and world models have the potential to play a critical role in accelerating and optimizing it.

The Grid Optimization Challenge

The modern electrical grid is one of the most complex physical systems ever built. It consists of thousands of generators, millions of transmission and distribution lines, and billions of end-use devices, all of which must be continuously balanced to maintain the precise frequency and voltage levels required for reliable operation. The integration of large amounts of variable renewable energy (solar and wind) into the grid is dramatically increasing the complexity of this balancing act, because the output of solar and wind generators depends on weather conditions that are inherently uncertain and difficult to predict.

World models for grid optimization must capture the dynamics of the entire grid system: the physical dynamics of generators, transmission lines, and loads; the economic dynamics of electricity markets; and the behavioral dynamics of market participants (generators, utilities, consumers). The model must predict the state of the grid over time horizons ranging from seconds (for real-time control) to hours (for day-ahead planning) to years (for long-term investment planning), and it must simulate the effects of different operational and investment decisions on grid reliability, cost, and environmental impact.

ARYA Labs’ CDAI architecture is particularly well-suited to grid optimization. The electrical grid is a physical system governed by well-understood physical laws (Kirchhoff’s laws, the swing equation, the power flow equations) that are encoded directly into nano models. The composability of nano models allows the system to model the grid at multiple levels of abstraction, from individual generators and transmission lines to regional grid segments to the entire national grid. The real-time update capability of the nano model architecture allows the system to continuously incorporate new sensor data from the grid, maintaining an accurate representation of the grid’s current state.
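For readers who want a feel for what "encoding a physical law" means, here is a minimal sketch of the swing equation named above, reduced to a single aggregate machine and integrated with forward Euler. It is not ARYA Labs' nano model implementation, which the source does not describe; the inertia constant, damping coefficient, and 50 Hz nominal frequency are illustrative assumptions, and governor and reserve responses are ignored.

```python
def simulate_frequency(power_imbalance_pu, inertia_h=5.0, damping=2.0,
                       f_nominal=50.0, dt=0.01, steps=6000):
    """Integrate 2H * d(dw)/dt = dP - D * dw with forward Euler.

    dw is the per-unit speed (frequency) deviation and dP the per-unit
    imbalance between mechanical input and electrical load.
    All default values are illustrative, not taken from a real grid.
    """
    dw = 0.0
    trace = []
    for _ in range(steps):
        dw += dt * (power_imbalance_pu - damping * dw) / (2.0 * inertia_h)
        trace.append(f_nominal * (1.0 + dw))
    return trace


# Sudden loss of 2 percent of generation on a 50 Hz system.
trace = simulate_frequency(power_imbalance_pu=-0.02)
print(f"frequency after 1 s:  {trace[99]:.3f} Hz")
print(f"frequency after 60 s: {trace[-1]:.3f} Hz")  # settles near 50 * (1 - 0.02 / 2.0) = 49.5 Hz
```

Because the governing equation is known, the steady state can be checked by hand (the deviation settles at dP / D), which is the property that makes physics-constrained models auditable in a way that purely learned models are not.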

The Renewable Energy Forecasting Challenge

Accurate forecasting of renewable energy output is one of the most important enablers of the energy transition. Grid operators need accurate forecasts of solar and wind output to plan their generation dispatch, to schedule maintenance, and to manage the integration of variable renewable energy into the grid. Current forecasting methods, based on numerical weather prediction models and statistical post-processing, achieve a mean absolute error of 5 to 15 percent for day-ahead forecasts, which is insufficient for the high levels of renewable energy penetration required to meet climate targets.

World models for renewable energy forecasting combine physics-based models of atmospheric dynamics, the same models used in numerical weather prediction, with machine learning models trained on historical weather and energy data. The physics-based component provides the large-scale atmospheric dynamics that drive renewable energy output; the machine learning component captures the local, site-specific factors (terrain, vegetation, turbulence) that the physics-based model cannot resolve. The combination could bring mean absolute error for day-ahead forecasts down to 2 to 5 percent, a two- to three-fold improvement over current methods that could save grid operators billions of dollars annually in balancing costs.
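The hybrid idea can be illustrated with the simplest possible learned component: a linear correction fitted to the gap between a physics forecast and observed output, scored by mean absolute error. This is a sketch on synthetic data for a hypothetical 100 MW wind farm, not a reproduction of any production forecasting system; real systems use far richer learned components than a straight line.

```python
import random


def mean_absolute_error(forecast, actual):
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(actual)


def fit_linear_correction(physics, observed):
    """Ordinary least squares fit of observed = slope * physics + intercept."""
    n = len(physics)
    mean_x = sum(physics) / n
    mean_y = sum(observed) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(physics, observed))
    var = sum((x - mean_x) ** 2 for x in physics)
    slope = cov / var
    return slope, mean_y - slope * mean_x


# Synthetic data (not real measurements): observed output deviates
# systematically from the physics forecast because of site effects.
rng = random.Random(1)
physics = [rng.uniform(0.0, 100.0) for _ in range(400)]
observed = [0.85 * p + 3.0 + rng.gauss(0.0, 2.0) for p in physics]

# Fit the correction on the first half, evaluate on the held-out second half.
train_x, test_x = physics[:200], physics[200:]
train_y, test_y = observed[:200], observed[200:]

slope, intercept = fit_linear_correction(train_x, train_y)
corrected = [slope * p + intercept for p in test_x]

raw_mae = mean_absolute_error(test_x, test_y)
hybrid_mae = mean_absolute_error(corrected, test_y)
print(f"physics-only MAE: {raw_mae:.2f} MW, hybrid MAE: {hybrid_mae:.2f} MW")
```

On a 100 MW plant, an error in megawatts reads directly as a percentage of capacity, which is how the forecast error figures in this section are usually quoted.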

The Energy Storage Optimization Challenge

Energy storage (batteries, pumped hydro, compressed air, thermal storage) is a critical enabler of the energy transition, providing the flexibility required to balance variable renewable energy with demand. But optimizing the operation of energy storage systems is a complex, multi-objective problem: the storage system must simultaneously maximize revenue (by charging when energy is cheap and discharging when energy is expensive), minimize degradation (by avoiding charging and discharging patterns that accelerate battery aging), and provide grid services (frequency regulation, voltage support, capacity reserves) that are valued by grid operators.

World models for energy storage optimization must capture the physical dynamics of the storage system (the electrochemical dynamics of batteries, the hydraulic dynamics of pumped hydro), the economic dynamics of electricity markets (spot prices, ancillary service prices, capacity prices), and the regulatory dynamics of grid services (the rules governing the provision of frequency regulation, voltage support, and capacity reserves). The model must predict the future state of the energy market and the grid over time horizons ranging from minutes to years, and optimize the storage system’s operation to maximize value over those horizons.
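The trade-off between arbitrage revenue and battery wear can be shown with a small dynamic program. This is a deliberately stripped-down sketch: prices are assumed known in advance, the battery moves in whole 1 MWh steps, and the degradation cost per MWh is a hypothetical flat charge. A production optimizer would add price uncertainty, grid service revenues, and an electrochemical aging model.

```python
def optimal_dispatch(prices, capacity=2, degradation_cost=2.0, efficiency=1.0):
    """Backward dynamic programme over integer state of charge (MWh).

    Each hour the battery may charge 1 MWh, discharge 1 MWh, or idle.
    Degradation is charged on every MWh moved in either direction.
    Returns (best profit starting empty, list of hourly actions).
    """
    horizon = len(prices)
    value = [0.0] * (capacity + 1)   # value[s]: best future profit with s MWh stored
    policy = [None] * horizon
    for t in range(horizon - 1, -1, -1):
        new_value, actions = [], []
        for soc in range(capacity + 1):
            best, action = value[soc], 0                      # idle
            if soc < capacity:                                # charge 1 MWh
                buy = -(prices[t] / efficiency + degradation_cost) + value[soc + 1]
                if buy > best:
                    best, action = buy, 1
            if soc > 0:                                       # discharge 1 MWh
                sell = prices[t] - degradation_cost + value[soc - 1]
                if sell > best:
                    best, action = sell, -1
            new_value.append(best)
            actions.append(action)
        value, policy[t] = new_value, actions
    soc, schedule = 0, []
    for t in range(horizon):                                  # replay the policy from empty
        schedule.append(policy[t][soc])
        soc += policy[t][soc]
    return value[0], schedule


prices = [20.0, 10.0, 50.0, 60.0]   # hypothetical hourly prices, $/MWh
print(optimal_dispatch(prices))                          # (72.0, [1, 1, -1, -1])
print(optimal_dispatch(prices, degradation_cost=30.0))   # (0.0, [0, 0, 0, 0])
```

With cheap cycling the battery charges in the two low-price hours and sells in the two high-price hours; once wear costs more than the price spread earns, the best decision is to sit still. That sensitivity is why the degradation model matters as much as the price forecast.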

Deep Dive: The Healthcare Application and World Models for Precision Medicine

Healthcare is one of the most promising and most challenging application domains for world models. The potential to simulate the behavior of biological systems, from individual cells to entire organisms, and to use those simulations to personalize medical treatment is one of the most exciting frontiers in medicine. But the complexity of biological systems, the sensitivity of health data, and the stringent regulatory requirements for medical devices pose significant challenges to deploying world models in healthcare.

The Personalized Medicine Vision

The vision of personalized medicine, tailoring medical treatment to each patient's individual characteristics, has been a goal of medicine for decades. But realizing this vision requires the ability to predict how a specific patient will respond to a specific treatment, given their unique genetic makeup, medical history, lifestyle, and environmental exposures. This is precisely the kind of prediction that world models are designed to make.

A world model for personalized medicine represents the patient’s biological system at multiple levels of abstraction: the molecular level (gene expression, protein interactions, metabolic pathways), the cellular level (cell signaling, cell division, cell death), the tissue level (organ function, immune response, inflammation), and the whole-body level (physiological parameters, disease progression, treatment response). The model is calibrated to the patient’s individual data (genomic, proteomic, metabolomic, and clinical) and used to simulate the patient’s response to different treatment options.

The potential value of personalized medicine world models is enormous. Cancer treatment, for example, currently relies on population-level evidence from clinical trials, which tells us what treatment works best on average for patients with a specific cancer type. But cancer is a highly heterogeneous disease, and the treatment that works best on average is not necessarily the best treatment for a specific patient. A world model that accurately predicted a specific patient’s tumor response to different treatment options, based on the tumor’s genomic profile, the patient’s immune system characteristics, and the patient’s metabolic profile, could dramatically improve treatment outcomes.

The Drug Development Application

The drug development process is one of the most expensive and time-consuming processes in any industry. Developing a new drug from initial discovery to regulatory approval takes an average of 12 to 15 years and costs $2 to $3 billion. The high cost and long timeline are driven primarily by the high failure rate: approximately 90 percent of drug candidates that enter clinical trials fail to achieve regulatory approval, most often because of insufficient efficacy or unexpected toxicity.

World models for drug development aim to reduce the failure rate by enabling more accurate prediction of drug efficacy and toxicity before clinical trials. A molecular world model that accurately simulates the interaction between a drug candidate and its biological target (binding affinity, selectivity, downstream signaling effects) can identify promising drug candidates and eliminate unpromising ones much earlier in the development process, reducing the cost and time of drug development.
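The economics of the failure rate are worth working through. The sketch below uses only the roughly 90 percent clinical failure rate quoted in this section; the improved success rate of 15 percent is a hypothetical figure chosen for illustration, and the calculation assumes the cost of taking one candidate through trials stays the same.

```python
def candidates_per_approval(success_rate):
    """Expected number of clinical-stage candidates needed for one approval."""
    return 1.0 / success_rate


def relative_cost_saving(baseline_success, improved_success):
    """Fractional drop in clinical spend per approved drug, assuming the
    cost of taking one candidate through trials stays the same."""
    return 1.0 - baseline_success / improved_success


baseline = 0.10      # from the text: roughly 90 percent of candidates fail
improved = 0.15      # hypothetical success rate after better pre-clinical screening

print(candidates_per_approval(baseline))                   # 10.0
print(round(relative_cost_saving(baseline, improved), 3))  # 0.333
```

Under these assumptions, moving the success rate from 10 to 15 percent cuts clinical spend per approved drug by about a third, which is why even modest gains in pre-clinical prediction attract so much capital.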

AlphaFold 3, developed by Google DeepMind and Isomorphic Labs, is the most prominent example of a world model for drug development. The system predicts the structure of protein-ligand complexes (the 3D arrangement of a drug molecule bound to its protein target) at atomic resolution. This structural prediction capability helps drug developers design drug candidates that bind more tightly and selectively to their targets, reducing the risk of off-target effects and improving the probability of clinical success.

ARYA Labs’ nano model architecture is also relevant to drug development, particularly for simulating pharmaceutical manufacturing processes. The synthesis of a drug candidate is a complex chemical process that must be carefully controlled to ensure the quality and purity of the final product. A world model for pharmaceutical manufacturing simulates the effects of process changes (temperature, pressure, solvent composition, and reaction time) on the quality and yield of the drug product, enabling process optimization without expensive, time-consuming physical experiments.

The Medical Device Application

Medical devices, from pacemakers to insulin pumps to surgical robots to remote patient monitoring platforms, are subject to some of the most stringent regulatory requirements of any product category. The FDA requires extensive pre-market testing and validation to ensure that medical devices are safe and effective, and the post-market surveillance requirements ensure that any safety issues that emerge after approval are identified and addressed quickly.

World models for medical devices accelerate the regulatory approval process by enabling more comprehensive pre-market testing and validation. A world model for a cardiac pacemaker, for example, simulates the behavior of the pacemaker in a wide range of patient physiologies and clinical scenarios, including rare edge cases that would be difficult or impossible to test in a clinical trial, providing a more comprehensive assessment of the device’s safety and efficacy than physical testing alone.

NUVO, the public benefit corporation behind the FDA-cleared INVU maternal-fetal remote monitoring platform, is a prime example of where world models meet the medical device frontier. Remote monitoring devices generate continuous physiological signal streams from patients outside the clinical setting, and the value of those signals depends entirely on whether the AI interpreting them can be trusted at the level the FDA requires. A probabilistic model that produces false reassurance or a missed warning in an at-home prenatal monitoring context has consequences that no statistical confidence interval can absorb. The medical device segment is where the architectural choice between probabilistic and deterministic AI is not a research debate. It is a regulatory and clinical reality.

ARYA Labs’ CDAI architecture is particularly well-suited to medical device applications because its deterministic, physics-constrained design is built to provide the kind of mathematical safety guarantees that regulators look for in safety-critical medical devices. The Unfireable Safety Kernel is designed to keep the device’s behavior within the safe operating envelope defined by physical constraints, a form of safety assurance that probabilistic AI systems struggle to match. The expansion of remote patient monitoring across maternal-fetal care, cardiology, oncology, and chronic disease management is creating a generation of medical devices for which mathematical certainty is not optional.
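The general idea of a deterministic safety envelope can be sketched simply: whatever an upstream model proposes, a small, fixed piece of logic limits the output to a pre-approved range and rate of change. The `SafeEnvelope` class below is a hypothetical illustration of that pattern, not ARYA Labs' Unfireable Safety Kernel, whose internals the source does not describe, and the pacing-rate limits are placeholders rather than clinical values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafeEnvelope:
    lower: float     # lowest permitted output
    upper: float     # highest permitted output
    max_step: float  # largest permitted change per control cycle

    def enforce(self, previous: float, proposed: float) -> float:
        """Return the closest permitted value to what the model proposed."""
        # Limit the rate of change first, then clamp to the absolute bounds.
        step = max(-self.max_step, min(self.max_step, proposed - previous))
        return max(self.lower, min(self.upper, previous + step))


# Hypothetical pacing-rate envelope in beats per minute.
envelope = SafeEnvelope(lower=50.0, upper=150.0, max_step=10.0)

print(envelope.enforce(previous=70.0, proposed=400.0))    # 80.0: rate-limited
print(envelope.enforce(previous=145.0, proposed=400.0))   # 150.0: clamped to the ceiling
print(envelope.enforce(previous=55.0, proposed=-100.0))   # 50.0: clamped to the floor
```

The point of the pattern is that the guarantee does not depend on the upstream model being right: the output stays inside the envelope for every possible proposal, and that property can be verified by inspection.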

Deep Dive: The Aerospace Application and World Models for the Next Generation of Flight

Aerospace is one of the most demanding and most valuable application domains for world models. The design, testing, and certification of aircraft, spacecraft, and their components requires extensive simulation of aerodynamics, structural mechanics, thermodynamics, propulsion, and avionics, and the accuracy of those simulations directly determines the safety and performance of the final product.

The Aircraft Design Challenge

The design of a modern commercial aircraft involves thousands of engineers working on hundreds of interacting subsystems, each of which must be designed to meet stringent performance, safety, and regulatory requirements. The interactions between subsystems (the aerodynamic effects of the wing on the fuselage, the structural effects of engine vibration on the airframe, the thermal effects of the engines on the fuel system) are complex and difficult to predict analytically. Physical testing of the complete aircraft is extremely expensive and time-consuming, and it can only be done late in the design process, when changes are most costly.

World models for aircraft design dramatically reduce the cost and time of the design process by enabling more comprehensive simulation of the aircraft’s behavior earlier in the design process. A world model that accurately captures the interactions between all subsystems (aerodynamics, structures, thermodynamics, propulsion, avionics) identifies design problems early, when they are cheapest to fix, and evaluates the effects of design changes without the need for expensive physical testing.

ARYA Labs’ nano model architecture is particularly well-suited to aircraft design, because the composability of nano models allows the system to model the aircraft at multiple levels of abstraction, from individual components (bolts, fasteners, seals) to subsystems (landing gear, hydraulics, avionics) to the complete aircraft, and to simulate the interactions between subsystems with physical fidelity. The real-time update capability of the nano model architecture allows the system to continuously incorporate new test data as it becomes available, maintaining an accurate representation of the aircraft’s behavior throughout the design process.

The Space Mission Application

Space missions present some of the most extreme and demanding world modeling challenges. Spacecraft operate in environments (the vacuum of space, extreme temperatures, high radiation) that are difficult or impossible to replicate on Earth, and the consequences of failures are catastrophic and irreversible. The design, testing, and operation of spacecraft requires extensive simulation of the spacecraft’s behavior in these extreme environments, and the accuracy of those simulations directly determines the success of the mission.

Deterministic, physics-constrained world models are particularly well-suited to spacecraft systems simulation: thermal control, power systems, attitude control, propulsion, and communications all operate under physical laws that are exact, observable, and unforgiving. A world model built on first-principles physics, rather than on statistical pattern recognition over flight data that is necessarily sparse, could stand in for high-fidelity finite-element and computational-fluid-dynamics simulation engines on many routine analysis tasks, dramatically reducing the cost and time of spacecraft design and operations.
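A small example shows what "first-principles" buys you in thermal control. The sketch below balances absorbed sunlight against radiated heat using the Stefan-Boltzmann law to get the steady-state temperature of an isothermal body in sunlight near Earth. It is a textbook simplification written for this newsletter, not a spacecraft design tool: it ignores internal heat, planetary albedo, and eclipse, and the coating properties are hypothetical.

```python
STEFAN_BOLTZMANN = 5.670374419e-8   # W m^-2 K^-4
SOLAR_FLUX_1AU = 1361.0             # W m^-2, approximate solar constant near Earth


def equilibrium_temperature(absorptivity, emissivity,
                            solar_flux=SOLAR_FLUX_1AU, area_ratio=0.25):
    """Steady-state temperature (K) where absorbed sunlight equals radiated heat.

    area_ratio is sunlit projected area divided by total radiating area
    (0.25 for a sphere). Internal heat and planetary albedo are ignored.
    """
    absorbed_per_radiating_area = absorptivity * solar_flux * area_ratio
    return (absorbed_per_radiating_area / (emissivity * STEFAN_BOLTZMANN)) ** 0.25


grey_sphere = equilibrium_temperature(absorptivity=0.5, emissivity=0.5)
coated_sphere = equilibrium_temperature(absorptivity=0.2, emissivity=0.8)  # hypothetical coating
print(f"grey sphere:   {grey_sphere:.1f} K")
print(f"coated sphere: {coated_sphere:.1f} K")
```

No flight data is needed to get a usable answer, and the answer responds correctly to a design change (a more reflective, more emissive coating runs colder). That is the advantage of physics-based modeling in a domain where training data is scarce.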

The commercial implications extend beyond government space programs. Commercial space (SpaceX, Rocket Lab, Relativity Space, Axiom, Sierra Space, Blue Origin) is the fastest-growing segment of the aerospace market, and the cost pressure on new entrants is intense. A simulation infrastructure that runs orders of magnitude faster than legacy high-fidelity engines, while preserving the physical correctness those engines provide, is a competitive enabler for an entire generation of commercial space operators.

Deep Dive: The Financial Services Application and World Models for Risk Management

Financial services is one of the most data-rich and most regulated industries in the world, and it presents a distinctive set of opportunities and challenges for world model deployment. The potential to simulate the behavior of financial markets, credit portfolios, and insurance risks with greater accuracy and transparency than current models is enormous, but the regulatory requirements for model explainability and auditability are among the most stringent of any industry.

The Market Risk Application

Market risk, the risk of losses from changes in the prices of financial assets, is one of the most important risks that financial institutions manage. Current market risk measures (Value at Risk, Expected Shortfall, stress testing) are typically computed from statistical models of asset price dynamics that have well-known limitations: many implementations assume that asset returns are normally distributed (which they are not), they understate the fat tails and extreme events that drive the most severe losses, and they do not capture the contagion effects that amplify losses during market crises.
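The normality problem is easy to demonstrate. The sketch below computes a 99 percent one-day Value at Risk two ways on the same synthetic return series: once assuming a normal distribution, and once directly from the empirical tail, along with the Expected Shortfall. The return series is generated, not market data, and the crash frequency and size are hypothetical parameters chosen to produce a fat left tail.

```python
import random
from statistics import NormalDist, fmean, pstdev


def parametric_var(returns, alpha=0.01):
    """VaR under a normal assumption: the loss exceeded with probability alpha."""
    z = NormalDist().inv_cdf(alpha)            # about -2.33 for alpha = 0.01
    return -(fmean(returns) + z * pstdev(returns))


def historical_var_es(returns, alpha=0.01):
    """Empirical VaR and Expected Shortfall from the worst alpha share of days."""
    losses = sorted((-r for r in returns), reverse=True)   # largest loss first
    k = max(1, int(alpha * len(losses)))                   # number of tail days
    tail = losses[:k]
    return tail[-1], fmean(tail)


# Synthetic daily returns (not market data): calm days plus occasional crashes.
rng = random.Random(7)
returns = [rng.gauss(-0.05, 0.02) if rng.random() < 0.03 else rng.gauss(0.0005, 0.006)
           for _ in range(5000)]

normal_var = parametric_var(returns)
hist_var, hist_es = historical_var_es(returns)
print(f"normal VaR {normal_var:.3f}, historical VaR {hist_var:.3f}, "
      f"historical ES {hist_es:.3f}")
```

On a fat-tailed series like this one, the normal approximation reports a materially smaller loss than the data actually contains. Note that even the historical numbers only describe the past sample; they say nothing about contagion or regimes that have not yet occurred, which is the gap agent-level simulation is meant to address.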

World models for market risk overcome these limitations by simulating the behavior of financial markets at the level of individual market participants (traders, investors, market makers) and capturing the emergent dynamics that arise from their interactions. A world model that accurately captures the behavior of market participants (their beliefs, their risk preferences, their social connections, their information processing) simulates the fat tails and extreme events that drive the most severe losses, and captures the contagion effects that amplify losses during market crises.

The regulatory requirements for market risk models are stringent: the Basel III framework allows banks to use internal models for market risk measurement only with supervisory approval, and those models must be validated against historical data and stress-tested against extreme scenarios. World models for market risk must satisfy these regulatory requirements, which means they must be explainable, auditable, and subject to ongoing performance monitoring.

The Credit Risk Application

Credit risk, the risk of losses from borrower defaults, is the largest source of risk for most financial institutions. Current credit risk models (logistic regression, gradient boosting, neural networks) are trained on historical default data and use borrower characteristics (income, debt, credit history) to predict the probability of default. These models are reasonably accurate for predicting defaults in normal economic conditions, but they fail during economic downturns, when the default rate spikes and the statistical relationships between borrower characteristics and default probability change dramatically.

World models for credit risk improve on current models by simulating borrowers' behavior across different economic scenarios (recessions, financial crises, sector-specific downturns) and predicting how the default rate changes under those scenarios. A world model that accurately captures the economic dynamics that drive borrower defaults (the relationship between unemployment, income, and default probability; the contagion effects that spread defaults across borrower networks; the feedback loops between defaults and economic conditions) provides more accurate and more robust credit risk predictions than current statistical models.
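Scenario-conditioned default prediction can be sketched with a reduced-form stand-in: a logistic default probability that depends on both a borrower characteristic and a macroeconomic state. This is not a full world model (it has no borrower networks or feedback loops), and every coefficient below is hypothetical rather than estimated from data. It simply shows how the same portfolio is re-scored under a baseline and a stress scenario.

```python
import math


def default_probability(debt_to_income, unemployment_rate,
                        intercept=-5.0, beta_dti=3.0, beta_unemp=20.0):
    """Logistic default probability; all coefficients are hypothetical."""
    z = intercept + beta_dti * debt_to_income + beta_unemp * unemployment_rate
    return 1.0 / (1.0 + math.exp(-z))


def portfolio_default_rate(debt_to_income_ratios, unemployment_rate):
    """Average default probability across the portfolio in one scenario."""
    probabilities = [default_probability(d, unemployment_rate)
                     for d in debt_to_income_ratios]
    return sum(probabilities) / len(probabilities)


# Hypothetical portfolio described only by debt-to-income ratios.
portfolio = [0.10, 0.20, 0.30, 0.40, 0.50]

baseline = portfolio_default_rate(portfolio, unemployment_rate=0.04)
stress = portfolio_default_rate(portfolio, unemployment_rate=0.10)
print(f"baseline default rate: {baseline:.1%}")
print(f"stress default rate:   {stress:.1%}")
```

A model fitted only on benign years would never see the unemployment term move, which is the failure mode described above; simulating scenarios explicitly forces the question of how defaults behave outside the historical sample.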

The regulatory requirements for credit risk models are also stringent: the Basel III framework allows banks to use internal models for credit risk measurement only with supervisory approval, and those models must be validated against historical data and stress-tested against extreme scenarios. The EU AI Act’s requirements for explainability and auditability add further regulatory complexity for AI-based credit risk models. World models that satisfy these requirements, through deterministic, physics-constrained architectures or through comprehensive explainability frameworks, will have a significant competitive advantage in the financial services market.

The Long View: What the World Model Era Means for Human-AI Collaboration

The world model revolution is ultimately not just a technology story; it is a story about the changing relationship between human intelligence and artificial intelligence. As world models become more capable, the nature of human-AI collaboration will shift in ways that have profound implications for how organizations are structured, how decisions are made, and what skills are most valuable.

The current paradigm of human-AI collaboration is primarily augmentative: AI systems augment human capabilities by automating routine tasks, processing large amounts of information, and generating options for human review. The human remains the decision-maker; the AI is a tool that makes the human more productive. This paradigm is well-suited to LLMs, which are powerful information processors but lack the ability to simulate consequences or provide mathematical guarantees.

World models shift the paradigm toward simulation-assisted decision-making. Instead of asking an AI to generate options for human review, the human asks the AI to simulate the consequences of different options, to run the world model forward in time under different scenarios and show the human the range of possible outcomes. The human remains the decision-maker, but the decision is now informed by a simulation of the future, not just a summary of the past.

This shift has profound implications for organizational structure. In the current paradigm, the most valuable human skills are those that are difficult to automate: creative thinking, complex judgment, interpersonal communication, and ethical reasoning. In the world model paradigm, the most valuable human skills will also include the ability to design and interpret simulations: to formulate the right questions for the world model, to evaluate the quality of the simulation, to identify the assumptions and limitations of the model, and to translate the simulation results into actionable decisions.

The organizations that will thrive in the world model era are those that invest now in building these simulation-design and simulation-interpretation capabilities: not just deploying AI tools, but developing the human expertise required to use those tools effectively. This is a different kind of AI readiness than the current focus on prompt engineering and LLM deployment; it is a deeper, more technically demanding form of AI literacy that will take years to develop.

The world model era also raises profound questions about accountability and responsibility. When a decision is made based on the output of a world model simulation, who is responsible for the consequences? The human who made the decision? The organization that deployed the world model? The company that built the world model? The researchers who developed the underlying architecture? These questions do not have easy answers, and the regulatory frameworks that will govern world model deployment are still being developed.

What is clear is that the world model era will require a new social contract between humans and AI systems, one grounded not in the probabilistic outputs of statistical models but in the mathematical certainty of physics-constrained simulation. The companies that build for this future, that treat safety as an architectural constraint, explainability as a design requirement, and human oversight as a non-negotiable principle, are the ones that will earn the trust required to deploy world models in the highest-stakes applications.

That is what worlds over words means in capital terms. And it is just getting started.

Let’s Wrap This Up

The world model moment is real, it is funded, and it is coming. But it is not coming tomorrow. That tension between the scale of the opportunity and the length of the timeline is the defining challenge for everyone trying to navigate this space, whether you are a VC allocating capital, an executive planning your AI strategy, or a founder deciding what to build.

The Goldman Sachs report is the clearest signal that the world model thesis has crossed from research into institutional capital allocation. When Goldman’s Global Institute publishes a report arguing that the entire AI infrastructure investment thesis is undersized for what comes next, that is not a research note. It is a market signal. The $3.23 billion that flowed into world model startups in a twelve-week window in early 2026 is the venture capital version of the same signal.

The architectural diversity of the space is both a strength and a complication. Generative world models (Wayve, NVIDIA Cosmos) are the most visually compelling and the closest to commercial deployment in constrained domains like autonomous driving. Predictive embedding models (AMI Labs, Meta V-JEPA 2) are the most intellectually ambitious and the most likely to generalize across domains, but they carry the longest research-to-product timeline. Spatial intelligence models (World Labs) are opening a new frontier in 3D world generation with immediate applications in design and creative tools. Deterministic physics-constrained models (ARYA Labs) are building for the markets where mathematical certainty is not a nice-to-have but a requirement: defense, aerospace, pharma, medical devices, critical infrastructure.

For VCs and LPs, the near-term opportunity is not primarily in the frontier labs. Those rounds are already done at valuations that reflect significant future optionality. The opportunity is in the enabling infrastructure: synthetic data pipelines, world model evaluation frameworks, domain-specific deployment tooling, and the hardware and software stack required to run simulation-intensive workloads at scale. The picks-and-shovels play in the world model gold rush is still largely unbuilt, and the window to build it is open right now.

For senior executives, the most important action is not to deploy a world model. It is to begin building the data infrastructure and organizational capabilities that world model adoption will require. The companies that audit their simulation data assets today, build physics-aware data pipelines this year, and develop internal expertise in world model evaluation over the next 18 months will have a structural advantage when the technology matures. The window to build that advantage is open now. It will not remain open indefinitely.

For founders, the message is both encouraging and sobering. The category is real, the capital is flowing, and the enterprise demand is forming. But the architectural choices are consequential, the timelines are long, and the competition includes some of the best-funded and most credentialed teams in the history of AI. The founders who will win are those who combine deep domain expertise with architectural clarity: who know exactly which physical domain they are modeling, which architectural approach is right for that domain, and which enterprise customers have both the need and the regulatory context to pay for mathematical certainty.

Here is the binary that should focus every conversation about world models in the next 24 months. Either the Goldman Sachs thesis is right, and the $7.6 trillion AI infrastructure forecast for 2026 to 2031 is a floor rather than a ceiling, in which case the entire AI infrastructure stack needs to be repriced upward, and the second-order beneficiaries (specialized memory, interconnects, simulation hardware, energy infrastructure, synthetic data platforms) are systematically undervalued. Or the thesis is wrong, and world models remain a research curiosity that captures a small slice of total AI spend through the decade, in which case the $3.23 billion deployed in twelve weeks is a category formation event that resolves into a niche.

There is no middle scenario. World models either reprice the AI stack or they do not. The companies, investors, and operators who position now for the first scenario, while hedging against the second, are the ones who will navigate this transition with both conviction and discipline.

The opening pages of the next chapter are being written right now. In Paris. In San Francisco. In London. In a public benefit corporation in New York that is quietly building the world’s first deterministic world model with zero neural network parameters and an architecturally immutable safety kernel. Worlds over words. The shift is on. The question is not whether the world model era is coming. The question is whether you are positioned for it.

A Technical Glossary for World Model Practitioners

For readers who want to engage more deeply with the technical literature on world models, this glossary provides concise definitions of the key terms and concepts used in the field.

Autoregressive model: A generative model that predicts the next element in a sequence based on the previous elements. Autoregressive models are used in LLMs (to predict the next token) and in some world models (to predict the next video frame). The key limitation of autoregressive models for world modeling is that errors compound over time: each predicted frame is used as input for the next prediction, so small errors in early frames lead to large errors in later frames.
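
As a rough illustration of how errors compound, the sketch below rolls out a toy scalar system in which the learned model is about 1% off per step. The growth rates and function names are hypothetical, picked only to make the effect visible; no real world model is this simple.

```python
TRUE_GROWTH = 1.02    # hypothetical true per-step dynamics
MODEL_GROWTH = 1.03   # hypothetical learned dynamics, about 1% off per step


def rollout(x0, growth, steps):
    """Autoregressive rollout: each prediction becomes the next input."""
    states = [x0]
    for _ in range(steps):
        states.append(states[-1] * growth)
    return states


def rollout_errors(x0=1.0, steps=50):
    """Absolute gap between the model's rollout and the true trajectory."""
    truth = rollout(x0, TRUE_GROWTH, steps)
    predicted = rollout(x0, MODEL_GROWTH, steps)
    return [abs(p - t) for p, t in zip(predicted, truth)]


errors = rollout_errors()
print(f"error after 1 step:   {errors[1]:.4f}")
print(f"error after 50 steps: {errors[50]:.4f}")
```

Because each step starts from the previous prediction rather than from ground truth, the gap grows faster than linearly with the rollout length.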

Causal model: A model that explicitly represents the causal relationships between variables, enabling reasoning about the consequences of interventions. Causal models are distinct from statistical models, which only represent correlations. ARYA Labs’ CDAI architecture is a causal model; most neural network-based world models are statistical models.

Constrained Deterministic AI (CDAI): ARYA Labs’ proprietary world model architecture, which combines physics-constrained nano models with a hierarchical causal graph and an Unfireable Safety Kernel to produce deterministic, physically grounded predictions with mathematical safety guarantees.

Diffusion model: A generative model that learns to generate data by reversing a noise-adding process. Diffusion models are trained to predict the noise added to a data sample, and they generate new samples by starting from pure noise and iteratively removing it. Diffusion models are the foundation of most modern image and video generation systems, including NVIDIA’s Cosmos and Wayve’s GAIA-3.

Digital twin: A virtual representation of a physical system that is continuously updated with real-time data from the physical system. Digital twins are a precursor to world models: they represent the current state of the physical system but typically cannot simulate future states or reason about counterfactuals. World models extend the digital twin concept by adding predictive and causal reasoning capabilities.

Dreamer: A family of model-based reinforcement learning algorithms developed by Danijar Hafner and colleagues, based on a Recurrent State Space Model (RSSM) that learns a compact latent representation of the environment. DreamerV3, the third generation of the algorithm, masters 150 diverse tasks across 7 benchmarks with a single fixed set of hyperparameters rather than task-specific tuning.

Evidence Lower Bound (ELBO): The training objective for variational autoencoders and related models, which provides a lower bound on the log-likelihood of the observed data. The ELBO consists of a reconstruction term (how well the model predicts observations from the latent state) and a KL divergence term (which regularizes the latent state to be close to the prior). Maximizing the ELBO encourages the model to learn a compact, informative latent representation of the data.
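
In standard variational-autoencoder notation, the bound can be written as follows. The symbols are not from this article and are assumed here: x is an observation, z the latent state, q_phi the encoder (approximate posterior), p_theta the decoder, and p(z) the prior.

```latex
\log p_\theta(x) \;\ge\;
\underbrace{\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]}_{\text{reconstruction term}}
\;-\;
\underbrace{D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p(z)\big)}_{\text{KL regularizer}}
```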

Foundation model: A large AI model trained on broad data that is adapted to a wide range of downstream tasks. In the world-model context, foundation models such as NVIDIA’s Cosmos and AMI Labs’ JEPA are designed to provide general-purpose physical reasoning capabilities that can be fine-tuned for specific applications.

Generative world model: A world model that generates explicit predictions of future observations, typically in the form of images or video. Generative world models are useful for applications where the visual appearance of the predicted future is important (e.g., training data generation for autonomous vehicles), but they are computationally expensive and do not always accurately capture physical dynamics.

Graph Neural Network (GNN): A neural network architecture designed to process graph-structured data, where nodes represent entities and edges represent relationships. GNNs are used in world models for physical simulation, where the graph structure captures interactions among objects (e.g., gravitational attraction, contact forces).

JEPA (Joint Embedding Predictive Architecture): A world model architecture proposed by Yann LeCun that predicts future representations in an abstract embedding space, rather than predicting future observations in pixel space. JEPA is more computationally efficient than generative world models and less susceptible to predicting irrelevant details.

Latent space: A compressed, abstract representation of the data learned by a neural network. World models typically operate in a latent space: an encoder compresses the input observation into a latent representation, the transition model predicts the future latent representation, and a decoder (if needed) expands the predicted latent representation back into observation space.

Model Predictive Control (MPC): A control strategy that uses a world model to plan a sequence of actions, executes only the first action, observes the resulting state, and replans from the new state. MPC limits the prediction horizon to a manageable length, reducing the impact of world model errors on control performance.
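
The plan, act, observe, replan loop can be sketched as below. This is a minimal illustration under stated assumptions, not a production controller: the 1-D position task, the three-action set, the exhaustive search over action sequences, and the `true_drift` term (a stand-in for the mismatch between model and reality) are all hypothetical.

```python
from itertools import product

ACTIONS = (-1.0, 0.0, 1.0)  # hypothetical discrete action set


def model_step(state, action):
    """Toy world model: predicts the next 1-D position (ignores drift)."""
    return state + action


def plan(state, target, horizon):
    """Search all action sequences; return the first action of the best one."""
    best_cost, best_first = float("inf"), 0.0
    for seq in product(ACTIONS, repeat=horizon):
        s, cost = state, 0.0
        for a in seq:
            s = model_step(s, a)
            cost += abs(s - target)  # accumulated distance to the target
        if cost < best_cost:
            best_cost, best_first = cost, seq[0]
    return best_first


def run_mpc(start, target, horizon=3, steps=10, true_drift=0.2):
    """Execute only the first planned action each step, then replan."""
    state, trajectory = start, [start]
    for _ in range(steps):
        action = plan(state, target, horizon)
        # The real environment drifts in a way the model does not know about.
        state = state + action + true_drift
        trajectory.append(state)
    return trajectory


trajectory = run_mpc(start=0.0, target=5.0)
print([round(x, 2) for x in trajectory])
```

Replanning from the observed state at every step is what keeps the controller near the target even though the model never learns about the drift; a single open-loop plan would accumulate that error instead.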

Nano model: ARYA Labs’ term for a small, physics-constrained model that captures a specific, localized physical relationship. Nano models are the building blocks of ARYA’s CDAI architecture: each nano model captures the dynamics of a single physical component, and the composed system of nano models captures the dynamics of the entire physical system.

Neural ODE (Ordinary Differential Equation): A neural network architecture that models the dynamics of a system as a continuous-time differential equation, rather than as a discrete-time sequence. Neural ODEs are used in world models for physical simulation, where the continuous-time formulation is more natural than the discrete-time formulation.

Physics-Informed Neural Network (PINN): A neural network trained to satisfy the governing equations of a physical system, in addition to fitting the observed data. PINNs incorporate physical laws directly into the training objective, providing a form of physics inductive bias that helps the model generalize to situations not seen during training.
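
A common way to write the PINN training objective is a data-fit term plus a physics-residual term. The notation is assumed here rather than taken from this article: u_theta is the network, (x_i, u_i) are observed data points, F[u] = 0 is the governing equation, the x-tilde_j are collocation points where the equation is enforced, and lambda is a weighting hyperparameter.

```latex
\mathcal{L}(\theta) \;=\;
\frac{1}{N}\sum_{i=1}^{N} \big\| u_\theta(x_i) - u_i \big\|^2
\;+\;
\lambda \, \frac{1}{M}\sum_{j=1}^{M} \big\| \mathcal{F}[u_\theta](\tilde{x}_j) \big\|^2
```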

Predictive world model: A world model that predicts abstract features of future states, rather than generating explicit predictions of future observations. JEPA is the most prominent example of a predictive world model. Predictive world models are more computationally efficient than generative world models and, their proponents argue, more accurate for physical reasoning tasks.

Recurrent State Space Model (RSSM): The core component of the Dreamer world model architecture, which maintains a deterministic recurrent state that summarizes the history of observations and actions, and a stochastic state that captures the uncertainty about the current environment state.

Sim-to-real gap: The discrepancy between the behavior of an AI system in a simulated environment and its behavior in the real world. Closing the sim-to-real gap is a fundamental challenge in physical AI deployment, requiring techniques such as domain randomization, domain adaptation, and physics-informed regularization.

Spatial intelligence: The ability to understand and reason about 3D spatial environments, including the positions, shapes, and relationships of objects in space. World Labs is focused on building spatial-intelligence world models with applications in architecture, design, robotics, and autonomous systems.

Unfireable Safety Kernel: ARYA Labs’ term for the architecturally immutable safety component of its CDAI system, which continuously monitors the system state and verifies that it remains within the safe operating envelope defined by the physical constraints. The Unfireable Safety Kernel cannot be disabled or overridden by any downstream component of the system.

V-JEPA 2: Meta AI’s video JEPA model, which applies the JEPA framework to video understanding at scale. V-JEPA 2 is trained on a large corpus of video data using masked video prediction as the training objective, and achieves strong performance on physical reasoning benchmarks without any explicit physics supervision.

World model: An AI system that learns an internal model of the environment’s dynamics, enabling it to predict the consequences of actions, reason about counterfactuals, and plan over long time horizons. World models are the foundational technology for physical AI: AI systems that understand and interact with the physical world.

Disclaimer: This is educational content, not financial advice. Full source links above.

The views and opinions expressed above are current as of the date of this document and are subject to change without notice. The information contained herein is provided for general informational purposes only and does not constitute investment, legal, tax, or other professional advice. Past performance is not indicative of future results. Any forward-looking statements are based on current expectations and assumptions that involve risks and uncertainties that may cause actual results to differ materially. Readers should conduct their own due diligence and consult qualified professionals before making any investment, business, or operational decisions based on the content of this document.
