For two centuries, a factory has been a place that converts raw materials into products. Cotton becomes fabric. Iron becomes steel. Oil becomes fuel.
On Monday at NVIDIA's annual GTC conference in San Jose, Jensen Huang described a new kind of factory. "Every enterprise needs a token factory," the NVIDIA CEO told 30,000 attendees at the SAP Center. "This is your factory in the future. And the reason I know that is because everybody in this room is powered by intelligence. In the future, that intelligence will be augmented by tokens."
At Alpha, every day we're talking to board directors and C-suite executives, helping them navigate the governance gap between traditional corporate governance and AI governance. And every day our message is consistent: Stop adding AI to your business, and start adding your business to AI. This year's GTC was a reminder of how quickly businesses need to move while balancing the risks and opportunities of the AI era.
"Every CEO in the world will study their business" using the economics of token production. Jensen is right. The numbers he shared are mind-boggling: $1 trillion in expected purchase orders through 2027, doubled from $500 billion a year ago. A $300 billion annual revenue opportunity per gigawatt of power consumed. Amazon Web Services deploying more than one million NVIDIA GPUs this year. An independent benchmark from SemiAnalysis crowning NVIDIA the "inference king" at 50 times the performance per watt of competitors.
That word, inference, is the key to understanding everything that follows. Inference is AI doing everyday work: answering your customer's question, scanning your contracts, flagging your supply chain risks, writing your code. If training is how AI learns, inference is how it earns. Every dollar of AI value your company captures flows through inference. And the entire GTC 2026 keynote was built around a single thesis: the world has entered an inference inflection that will reshape how companies spend money, organize work, and compete.
If training is how AI learns, inference is how it earns.
How We Got to the Inference Inflection
Huang walked through three computing shifts that, in just over two years, increased inference demand by 10,000 times.
The first, of course, was generative AI and the "ChatGPT moment." When ChatGPT launched on November 30, 2022, it introduced a fundamentally new kind of computing. Traditional software retrieves information. Generative AI creates it. The old way of computing was deterministic; the new way is probabilistic. That distinction sounds academic until you realize it changes how every computer in the world needs to be built. Retrieval is cheap. Generation is expensive. Every generated response consumes inference.
The second was reasoning. When OpenAI released o1 in 2024, AI gained the ability to reflect, plan, decompose complex problems into manageable steps, and ground its answers in evidence. Reasoning made AI trustworthy enough for serious business use. It also multiplied inference consumption dramatically: a reasoning model does not just generate one answer. It generates a host of intermediate "reasoning tokens" over minutes or hours as it thinks through a problem, evaluates alternatives, discards weak paths, and synthesizes a final response. A single reasoning query can require over 100 times the compute of a traditional AI response. Every step of that internal deliberation burns inference.
The third was agentic computing. When Anthropic's Claude Code arrived publicly on February 24, 2025, AI crossed from thinking to doing. Claude Code reads files, writes software, compiles it, tests it, evaluates the results, and iterates, all without human intervention between steps. "You don't ask it what, where, when, how," Huang said. "You ask it create, do, build." Agentic AI runs continuously, consumes inference with every action, and generates orders of magnitude more tokens than a chatbot conversation.
Huang combined these shifts with a striking claim: "Computing demand has increased by one million times in the last two years." That number, he acknowledged, is less a measurement than a feeling shared across the industry, from OpenAI to Anthropic to every startup in the room. But the directional point is not in dispute. Demand for inference is growing faster than any infrastructure build-out in history can accommodate. Spot GPU pricing is skyrocketing. "You couldn't find a GPU if you tried," Huang said. That is the context for everything NVIDIA announced.
The Vocabulary of the Token Economy
To understand how inference reshapes business, let's make sure we all understand some key terms.
Token. The smallest unit of inference work, roughly three-quarters of a word. Every AI interaction, the question you ask and the answer the machine generates, is measured in tokens. Tokens are the output of AI factories the way widgets are the output of manufacturing plants. The economics are simple: during training, tokens represent investment in intelligence (think of it as capital expenditure). During inference, tokens drive cost and revenue (operating expenditure). Huang showed a pricing menu on stage: $0 per million tokens for basic open-source models, $3 for medium-tier, $6 for frontier, $45 for premium, and $150 for ultra-tier workloads demanding maximum speed and model intelligence. That pricing menu is the rate card for the inference economy.
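To make that rate card concrete, here is a back-of-envelope sketch in Python. The tier prices are from Huang's on-stage menu; the query sizes and volumes are illustrative assumptions, not keynote figures.

```python
# Back-of-envelope inference costs using the tier prices from Huang's menu.
# Query size and volume are illustrative assumptions, not keynote figures.

price_per_million = {   # USD per million tokens, from the on-stage menu
    "open source": 0.0,
    "medium": 3.0,
    "frontier": 6.0,
    "premium": 45.0,
    "ultra": 150.0,
}

tokens_per_query = 2_000   # assumed: prompt plus response for a typical work query
queries_per_day = 50       # assumed: one knowledge worker's daily usage
work_days = 250

annual_tokens = tokens_per_query * queries_per_day * work_days  # 25M tokens/year

for tier, price in price_per_million.items():
    annual_cost = annual_tokens / 1_000_000 * price
    print(f"{tier:>11}: ${annual_cost:>8,.0f} per employee per year")
```

The punchline: at chat-style volumes, even frontier-tier inference costs an employee less than a coffee budget. The costs that matter arrive later, when always-on agents consume tokens continuously.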
AI Factory. A data center redesigned around inference. Traditional data centers store files and run software applications. AI factories produce tokens. Electricity goes in, inference happens, tokens come out, and those tokens become decisions that drive revenue. "Your data center used to be a data center for files," Huang said. "It's now a factory to generate tokens." The operating equation: Power to Tokens to Intelligence to Economic Value. The constraint is no longer chips. It is access to reliable, affordable power. The largest AI factories will consume electricity at the scale of mid-size cities, and a one-gigawatt factory costs roughly $40 billion to build and operate over 15 years.
Gigawatt and Tokens per Watt. A single LED lightbulb uses about 10 watts. A microwave oven pulls about 1,000 watts (one kilowatt). So one gigawatt (GW) is one billion watts of power - the equivalent of a million microwaves running at once, or 100 million LED bulbs lit simultaneously.
To put that in perspective: a single gigawatt is roughly the output of a large nuclear power plant, about 1.3 million horsepower, or the output of some 3.1 million solar panels - enough to power over 100 football stadiums during peak events. A typical coal or natural gas plant produces about 500-600 megawatts, so you'd need almost two of those running flat out to hit one gigawatt. The Hoover Dam's peak capacity is about 2 gigawatts. The entire U.S. electrical grid capacity is around 1,200 gigawatts. And Doc Brown's famous "1.21 gigawatts!" in Back to the Future referenced a staggering amount of power, enough to run roughly 750,000 average American homes simultaneously.
Tokens per watt is the efficiency metric that determines who wins the inference economy: the factory equivalent of "units produced per dollar of energy." NVIDIA's newest system, Vera Rubin paired with the Groq LPX rack, produces 700 million tokens per second per gigawatt, up from two million just two years ago. That is a 350x improvement in factory output. In revenue terms, Vera Rubin plus Groq generates $300 billion in annual revenue per gigawatt, 10 times the prior generation. "You better make for darn sure you put the best computer system on that thing so that you could have the best token cost," Huang said about the $40 billion infrastructure investment. The company that produces the most inference per watt sets the price of intelligence for everyone else.
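Taking the keynote's numbers at face value, a rough cross-check is possible. The sketch below is arithmetic over the figures quoted on stage; the implied blended price is derived, not stated, and the math optimistically assumes the factory sells every token it can produce, around the clock.

```python
# Rough cross-check of the keynote's factory economics. The inputs are figures
# quoted on stage; the blended price is implied arithmetic, and the sketch
# optimistically assumes 100% utilization with every token sold.

tokens_per_sec_per_gw = 700e6       # Vera Rubin + Groq LPX rack, per keynote
annual_revenue_per_gw = 300e9       # USD, per keynote
lifetime_cost = 40e9                # USD over 15 years, per keynote

seconds_per_year = 365 * 24 * 3600
annual_tokens = tokens_per_sec_per_gw * seconds_per_year          # ~2.2e16
implied_price = annual_revenue_per_gw / (annual_tokens / 1e6)     # $/M tokens
annualized_cost = lifetime_cost / 15

print(f"Output: {annual_tokens:.2e} tokens per gigawatt per year")
print(f"Implied blended price: ${implied_price:.2f} per million tokens")
print(f"Annualized cost: ${annualized_cost / 1e9:.1f}B against "
      f"${annual_revenue_per_gw / 1e9:.0f}B in claimed revenue")
```

The implied blended price, roughly $14 per million tokens, lands between the frontier and premium tiers on Huang's menu, and the annualized build cost is about one percent of the claimed revenue. Even steep discounts for real-world utilization leave an enormous margin, which is exactly why the factories are being built.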
Token Budget. The concept most likely to reshape corporate operating budgets. Huang described a near future where every employee receives an annual token budget alongside their salary. "I'm going to give them probably half of that on top of it as tokens so that they could be amplified 10x," he said about NVIDIA's own engineers. "How many tokens comes along with my job? It is now one of the recruiting tools in Silicon Valley." For CFOs, this means a new line item: inference consumption per employee. For HR leaders, it reframes productivity investment. For boards, it transforms headcount planning entirely. You are no longer funding people alone. You are funding people plus their inference consumption. Companies that understand this arithmetic will out-invest and out-produce those that do not.
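To see what that arithmetic looks like, here is a minimal sketch. The half-of-salary ratio is Huang's; the salary figure and tier price are assumptions for illustration.

```python
# What "half of salary on top, as tokens" implies in volume. The ratio is
# Huang's; the salary and tier price are assumptions for illustration.

salary = 200_000               # assumed annual salary, USD
token_budget = salary * 0.5    # Huang's ratio for NVIDIA engineers
price_per_million = 6.0        # assumed frontier-tier price from the menu

annual_tokens = token_budget / price_per_million * 1e6   # ~16.7B tokens
per_work_day = annual_tokens / 250

print(f"${token_budget:,.0f} buys {annual_tokens / 1e9:.1f}B tokens per year")
print(f"That is roughly {per_work_day / 1e6:.0f}M tokens per working day")
```

Roughly 67 million tokens per working day is far more than any human could type into a chatbot. A budget of that size only makes sense if it is feeding always-on agents working on the employee's behalf.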
Structured vs. Unstructured Data. Huang called this his favorite slide of the entire keynote. Structured data is information in databases: spreadsheets, transaction records, customer tables. He called it "the ground truth of enterprise computing." Unstructured data is everything else: emails, contracts, PDFs, videos, meeting recordings, roughly 90% of all data generated each year. "Until now, this data has been completely useless to the world," Huang said. AI changes that by using inference to read, interpret, and index unstructured information at scale, converting it into structured intelligence that can be searched, queried, and acted upon. NVIDIA built two foundational libraries for this: cuDF for structured data and cuVS for vector (unstructured) data. The business implication is direct: the 90% of your company's information sitting in documents and emails is now a productive asset, and unlocking it requires massive amounts of inference. The companies that do it first build an intelligence advantage on proprietary data competitors cannot replicate.
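On the structured side, cuDF is a drop-in, GPU-accelerated counterpart to pandas, and the sketch below shows the shape of that workflow. The file and column names are hypothetical, and the unstructured (cuVS) half of the story would add vector embedding and similarity search on top.

```python
# A sketch of GPU-accelerated structured-data work with RAPIDS cuDF, which
# mirrors the pandas API. The file and column names here are hypothetical.
import cudf

# Load transaction records directly onto the GPU.
df = cudf.read_parquet("transactions.parquet")

# The same filter/group/aggregate pattern pandas runs on CPU, on GPU instead.
summary = (
    df[df["amount"] > 0]
    .groupby("region")
    .agg({"amount": "sum", "order_id": "count"})
    .sort_values("amount", ascending=False)
)
print(summary.head())
```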
Sovereign AI. Governments building their own national inference infrastructure. Forty percent of NVIDIA's demand now comes from outside the traditional cloud hyperscalers, including sovereign AI initiatives, industrial enterprises, and AI-native startups. The $150 billion in AI venture investment last year, which Huang called "the largest in human history," is fueling an explosion of inference capacity that is fragmenting by jurisdiction. For boards with international operations, this means your AI access in the EU, China, India, or the Gulf states may run on sovereign factories with different rules, data residency requirements, and governance regimes. NVIDIA's response is building systems that work everywhere: confidential computing that prevents even the operator from seeing your data, Palantir partnerships for air-gapped deployments, and the Nemotron coalition training sovereign language models for every region. Intelligence is going local the way data privacy already has.
Every Software Company Becomes an Inference Company
If inference is the new economic engine, the business model built on top of it is Agent-as-a-Service. Huang showed one of his most consequential slides: the transformation of enterprise IT from SaaS to what he called "AGaaS." In the old model, companies paid software vendors per seat for tools employees operated manually. In the new model, AI factories produce tokens that flow to software companies, systems integrators, AI providers, and AI-native firms, all of which deploy agents that perform work humans used to do. Those agents consume inference continuously.
"Every single software company of the future will be agentic," Huang said. He cited the example of Nestlé, which uses accelerated data processing to refresh its global supply chain data mart across 185 countries. On traditional CPUs, Nestlé refreshed this data a few times per day. With NVIDIA acceleration, it runs five times faster at 83% lower cost. Now imagine an AI agent consuming that data continuously, making procurement decisions, adjusting inventory, flagging disruptions in real time. That is not a software license. That is an inference workload. For boards and CFOs, this changes every vendor relationship. The transition from seat-based pricing to token-based pricing will be as disruptive to enterprise budgets as the shift from on-premises software to cloud a decade ago.
OpenClaw: The Operating System for the Inference Era
If agents are the new consumers of inference, they need an operating system. Huang argued that OpenClaw is it. The open-source agent framework became the most popular open-source project in history, accumulating 247,000 GitHub stars in weeks, faster than Linux managed in 30 years. Huang compared it to HTML, Linux, and Kubernetes: platform shifts that every company eventually needed a strategy for.
He broke down the comparison in operating system terms. OpenClaw has I/O (it communicates across email, Slack, Telegram, and a dozen other platforms). It has scheduling (cron jobs and task decomposition). It has resource management (access to file systems and databases). It has tool access (it can use browsers, run code, call APIs). It has sub-agent coordination (it can spawn child agents for subtasks). "OpenClaw has open-sourced essentially the operating system of agentic computers," Huang said. "It is no different than how Windows made it possible for us to create personal computers. Now OpenClaw has made it possible for us to create personal agents."
The inference implications are staggering. A chatbot generates a few hundred tokens per interaction when a human asks a question. An always-on agent running cron jobs, accessing databases, executing code, communicating across platforms, and coordinating sub-agents generates millions of tokens per day. Every enterprise deploying agents at scale will see inference consumption, and inference costs, grow by orders of magnitude.
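The gap is easy to quantify. Every figure in the sketch below is an illustrative assumption; the order-of-magnitude difference is the point, not the specific values.

```python
# Why always-on agents dwarf chatbots in token consumption. Every figure is
# an illustrative assumption; the order-of-magnitude gap is the point.

chatbot_tokens_per_day = 20 * 500      # 20 questions at ~500 tokens each

actions_per_hour = 60                  # agent checks, reasons, acts each minute
tokens_per_action = 3_000              # context, reasoning, and tool output
agent_tokens_per_day = actions_per_hour * 24 * tokens_per_action

ratio = agent_tokens_per_day / chatbot_tokens_per_day
print(f"Chatbot: {chatbot_tokens_per_day:,} tokens/day")
print(f"Agent:   {agent_tokens_per_day:,} tokens/day ({ratio:,.0f}x)")
```

One agent at this modest cadence already consumes a few million tokens per day, several hundred times a chatbot user. A fleet of them turns inference from a rounding error into a major budget line.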
Huang himself named the risk. "Access sensitive information, execute code, and communicate externally," he said about what agents can do inside corporate networks. "Just say that out loud." He paused to let the audience absorb the implications, then presented NemoClaw, NVIDIA's enterprise agent toolkit built on top of OpenClaw, with policy engines, privacy routers, and network guardrails designed to keep agents from acting beyond their authority.
NemoClaw is the right instinct. NVIDIA understands that agents without guardrails do not get deployed at Fortune 500 scale. But the evidence from the field suggests the guardrails are not yet sufficient. A computer science student recently discovered his OpenClaw agent had created a dating profile and was screening matches without his explicit direction. Cisco's security team found a third-party OpenClaw skill performing data exfiltration and prompt injection without user awareness. Fourteen malicious skills appeared on ClawHub in a single weekend, masquerading as cryptowallet tools while harvesting browser data. One of OpenClaw's own maintainers warned the project is "far too dangerous" for non-technical users.
Linux succeeded precisely because enterprises spent years building governance, compliance, and security infrastructure around it. OpenClaw needs the same. The question is whether that infrastructure arrives before or after the first material corporate incident involving an autonomous agent consuming inference inside a production system with access to sensitive data.
When Inference Gets a Body
Everything described so far, tokens, factories, agents, operates in the digital world. The third leg of Huang's vision extends inference into the physical world, and it may carry the greatest long-term consequences.
"The ChatGPT moment of self-driving cars has arrived," Huang announced. He unveiled autonomous vehicle partnerships with BYD, Hyundai, Nissan, and Geely (representing 18 million cars built annually, joining existing partners Mercedes, Toyota, and GM) and a deployment partnership with Uber to connect robo-taxi-ready vehicles to its network across multiple cities. He showcased GR00T for humanoid robots, Cosmos for AI-generated world simulation, and Isaac Lab for training physical AI entirely in simulation before deploying it in the real world.
Then he brought a walking, talking Olaf robot from Disney's "Frozen" onto the stage. It was not decoration. The robot runs on NVIDIA's Jetson platform, with locomotion policies trained in Isaac Lab using a physics engine called Newton, developed jointly with Disney and Google DeepMind. Every step Olaf takes, every word it speaks, every real-time adjustment to its environment runs on inference. A physical agent performing continuous inference in an unpredictable real-world setting represents a qualitatively different challenge from a digital agent operating inside a structured database.
For business leaders, the implication is that the same inference economics, agent frameworks, and governance gaps that apply to digital AI will soon apply to machines moving through warehouses, driving vehicles on public roads, and interacting with customers face to face. When AI agents operating inside your CRM can already create unauthorized dating profiles, the question of what happens when those agents inhabit a two-ton vehicle or a factory floor robot is not theoretical. It is the next chapter. Physical AI multiplies the stakes of every inference governance question by orders of magnitude.
And we haven't even begun to discuss building AI factories in space!
Why NVIDIA's Inference Position May Be Durable
Huang spent the opening section of his keynote on a concept that explains why NVIDIA can charge what it does: the company's identity as "vertically integrated but horizontally open." Vertically integrated means NVIDIA designs everything from chips to racks to software libraries to AI models, all optimized together. Horizontally open means it integrates with every cloud, every enterprise platform, every country, and every AI model. The CUDA computing platform, now 20 years old with hundreds of millions of GPUs installed worldwide, is the flywheel. Developers build on CUDA because the installed base is massive. The installed base grows because developers build on it.
This matters for inference economics because inference performance is not just a hardware problem. When NVIDIA updated the software stack on existing inference service providers using the same hardware, Huang showed that average token speeds jumped from 700 to nearly 5,000 tokens per second, a seven-fold improvement with zero new hardware. That is the power of vertical integration: chips, systems, and software optimized as a single unit. "If you have the wrong architecture, even if it's free, it's not cheap enough," Huang said. He was talking about competitors. But the insight applies to any board evaluating AI infrastructure: the total cost of inference depends on the entire stack, not just the chip price.
Even NVIDIA's six-year-old Ampere GPUs are appreciating in cloud spot pricing, Huang noted, because the continuously updated software stack keeps improving their performance. That useful-life argument is critical for any CFO modeling AI infrastructure costs: NVIDIA systems do not depreciate on the normal curve because the software layer keeps extracting more inference from the same hardware.
Seven Questions for Your Next Board Meeting
The industrialization of intelligence is not a metaphor.
Inference is the engine. Factories are being built to produce it. Agents are being deployed to consume it. Intelligence is getting a physical body. And governments are building sovereign infrastructure that fragments the inference supply chain by jurisdiction. This is the new AI computing paradigm, the "five-layer cake" Huang described: energy, chips, infrastructure, models, and AI applications. It is transforming every business and every industry. Remember, stop adding AI to your business. Start adding your business to AI.
So what do you do with all of this? Whether you're in the boardroom or running a P&L or a team, here are seven key questions to ask:
First, do you know whether employees are already deploying autonomous agent frameworks like OpenClaw on corporate devices? If your CISO cannot answer, the answer is almost certainly yes. Shadow agentic AI is the next shadow IT, and the risk surface is orders of magnitude larger.
Second, can your management team present an AI cost model denominated in inference consumption and business value produced? Have they modeled what an annual token budget per employee looks like at scale? If not, you are approving AI investments you cannot measure.
Third, what is your governance framework for AI agents operating inside production systems? Not chatbots. Agents that access data, execute code, and communicate externally. Who authorizes access? Who monitors actions? What is the escalation path when an agent acts beyond its intended scope?
Fourth, as your software vendors transition from SaaS to Agent-as-a-Service, are your contracts structured for token-based pricing and autonomous action? Or are you negotiating seat licenses for a computing model that no longer exists?
Fifth, what is your strategy for unlocking unstructured data through inference? Ninety percent of the information your company generates each year sits in emails, contracts, and documents. That is now a competitive asset. Do you have a plan to convert it into one?
Sixth, how does your AI infrastructure strategy account for sovereign AI fragmentation? If you operate across jurisdictions, your inference supply chain may soon be as fragmented as your data privacy compliance.
Seventh, which board committee owns AI governance? Not AI strategy in the abstract. Governance: the permissions, the audit trails, the liability framework for autonomous systems performing inference inside your business. If no committee owns it, no one does.
Jensen closed his two-hour keynote with an AI-generated country song performed by animated robots around a campfire. The lyrics included: "Agents used to wait and see, now act autonomously. But if they ever try to stray, safe course, block and say no way." Even in his own closing number, the tension between inference-powered autonomy and governance wrote itself into the chorus.