The compute stack beneath agentic AI.

Where Airfy sits in the five-layer AI cake. Which Nvidia silicon we ship today. Which lands next. Why the network and the GPU belong on the same balance sheet as the customer that runs them.

Nvidia RTX PRO Blackwell 2U air-cooled server, the SMB-class on-prem inference unit

How do I learn AI and earn from it? How do I protect my kids and their future?

Two questions, one answer. Airfy AI Academy teaches families and businesses to use AI on their own infrastructure. The network is the foundation. Your kids inherit the operator side, not a black-box subscription.

The five layers

Two layers we build. Three layers we depend on.

The AI stack has five layers. Three of them are already won by somebody else, and that is a feature, not a bug. We do not build foundation models. We do not fab chips. We do not run the grid. We build the two layers between the silicon and the user, the two layers the customer actually sees and pays for.

That makes us cheaper to capitalize than a model lab, more durable than an app wrapper, and harder to disintermediate than either. We sit where distribution lives.

Layer	Name	What it covers	Built by
01	Applications	Talk, Chat, Metis, Nexus, Atlas	Airfy
02	Models	Open source. Llama, Mistral, Mixtral, Gemma.	Partner
03	Infrastructure	Linux OS, identity, network, orchestration	Airfy
04	Chips	Nvidia, AMD, Apple, Ampere, Qualcomm	Partner
05	Energy	Utilities, on-prem PV, datacenter PPAs	Partner

Layer order top to bottom. Airfy ships Applications and Infrastructure. Open-source Models, third-party Chips, and utility Energy do the rest.

Thesis / why the middle layer

Old code does not become AI by being asked nicely.

Every company that was around before 2023 has a code base, a database, and a software stack built for a pre-agentic world. None of it speaks to a frontier model. None of it carries the identity, the audit log, or the token-auth a real agent needs to act on a live system. That gap is the middle layer.

We close the gap. The customer keeps the chips. The customer keeps the open-source models. We bring the infrastructure that connects the two, the identity layer that lets the agent prove it is allowed to act, and the applications people actually talk to. Monthly subscription. Hardware in the building. Software audited at the firmware level.
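
A minimal sketch of what "the agent proves it is allowed to act" can look like, assuming a signed token that carries scopes and an expiry, plus an audit log that records every decision. This is an illustration, not the Airfy implementation. The names `issue_token`, `authorize`, `SECRET`, and the scope strings are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical signing key; a real deployment would load this from a key store.
SECRET = b"demo-signing-key"


def issue_token(agent_id: str, scopes: list[str], ttl_s: int, now: float) -> str:
    """Sign a small claims payload so the agent can later prove what it may do."""
    payload = json.dumps(
        {"agent": agent_id, "scopes": sorted(scopes), "exp": now + ttl_s},
        sort_keys=True,
    )
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return payload + "." + sig


def authorize(token: str, action: str, now: float, audit_log: list[dict]) -> bool:
    """Check signature, expiry, and scope, and log the decision either way."""
    # The hex signature contains no dots, so the last dot separates it from the payload.
    payload, _, sig = token.rpartition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    allowed = False
    agent = None
    if hmac.compare_digest(sig, expected):
        claims = json.loads(payload)
        agent = claims["agent"]
        allowed = now < claims["exp"] and action in claims["scopes"]
    audit_log.append({"ts": now, "agent": agent, "action": action, "allowed": allowed})
    return allowed
```

A token issued with the single scope `wifi.restart` passes for that action and fails for any other. It also fails after expiry or after the payload has been edited. The log keeps the denied attempts as well as the allowed ones, which is the part an auditor reads first.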

Airfy Talk is the first proof. Voice to action on a GPU the customer owns. No cloud round-trip. The same orchestration that ships voice today ships agents tomorrow. The platform is one platform.

Blackwell-class server, the chassis on which the middle layer runs

The moat

Own the stack. Own the exit.

Every modern platform solves integration by moving you into their cloud. It is the same lock-in the old manufacturers sold, with a nicer dashboard. You trade one cage for a newer one.

We solve it the other way. The control plane runs on hardware you rack. The data never leaves the building. Firmware evidence is scoped per released component. Partner-owned, self-hosted, source availability by release.

They integrate everything into their cloud. We integrate everything into yours.

Verified, May 2026

The whole platform, in four numbers.

Counted from the running code base. Updated when it changes. Any older figure, such as 177 MCP tools or 1,765 tests, is stale and should be replaced.

MCP

tool surface in the agentic-networking binary

API

HTTP routes across the cloud API

automated release checks

Ops

callable operator surface the AI can reach

Sources: [13]. Counted by grep on the running code. Re-counted on every release.
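
A rough sketch of a grep-style count, assuming each tool is registered on its own line with a recognizable marker. The marker `register_tool(`, the file names, and the file contents are hypothetical. The real code base and its registration syntax are not shown on this page.

```python
import re


def count_markers(sources: dict[str, str], pattern: str) -> int:
    """Count lines matching `pattern` across all files, like `grep -c` summed over a tree."""
    rx = re.compile(pattern)
    return sum(
        1
        for text in sources.values()
        for line in text.splitlines()
        if rx.search(line)
    )


# Hypothetical in-memory stand-ins for source files.
demo_sources = {
    "tools/wifi.src": 'register_tool("wifi.scan")\nregister_tool("wifi.restart")\n',
    "tools/dns.src": '# register_tool is documented elsewhere\nregister_tool("dns.flush")\n',
}

# Anchoring at line start keeps the commented-out mention from being counted.
tool_count = count_markers(demo_sources, r"^\s*register_tool\(")
print(tool_count)
```

Anchoring the pattern matters more than the tooling. A loose pattern counts comments and docs, and the number drifts upward without any new tool being shipped.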

Compute inventory

Silicon we ship. Silicon we skip. Silicon at the edge.

Hopper is for fleets that already paid for it. Blackwell is the floor for new builds. Blackwell Ultra is shipping now and goes into the sites that can power it. Rubin is in full production and starts shipping this fall. Jetson sits at the edge, on its own.

SKU	Architecture	Memory	FP4 dense	TDP	Airfy fit	Status
H100 SXM	Hopper	80 GB HBM3	n/a	700 W	Legacy training, fast inference	Skipped
H200 SXM	Hopper Ultra	141 GB HBM3e	n/a	700 W	Large-context inference on existing fleets	Skipped
B100 SXM	Blackwell, dual-die	192 GB HBM3e	~7 PFLOPS	700 W	Announced air-cooled drop-in, canceled before volume in favor of B200. We skipped it.	Skipped
B200 SXM	Blackwell, dual-die	180 GB HBM3e	~9 PFLOPS	1,000 W	Dense inference and training, HGX/DGX building block	Shipping
GB200 NVL72	Grace + Blackwell rack	13.4 TB pooled HBM3e	~720 PFLOPS / rack	~120 kW / rack	Rack-scale frontier training and inference	Shipping
B300 / Blackwell Ultra	Blackwell Ultra, dual-die	288 GB HBM3e	~15 PFLOPS	1,400 W	Long-context reasoning and test-time scaling	Shipping
GB300 NVL72	Grace + Blackwell Ultra rack	20+ TB pooled HBM3e	~1,080 PFLOPS / rack	~120 kW / rack	Rack-scale agentic inference and reasoning	Shipping
RTX PRO 6000 Blackwell	Blackwell (GB202)	96 GB GDDR7	~2 PFLOPS	600 W	Workstation + 2U air-cooled on-prem server	Shipping
Jetson AGX Thor T5000	Blackwell edge	128 GB LPDDR5x	~1,035 TFLOPS	40-130 W	Edge robotics, on-prem agentic inference	Edge
Jetson Thor T4000	Blackwell edge	64 GB LPDDR5x	~600 TFLOPS	40-70 W	Industrial AI, autonomous systems	Edge
Jetson AGX Orin 64 GB	Ampere	64 GB LPDDR5	n/a	15-60 W	Industrial gateway, vision pipelines, 275 INT8 TOPS	Edge
Jetson Orin Nano Super	Ampere	8 GB LPDDR5	n/a	7-25 W	Maker, starter inference, 67 INT8 TOPS at $249	Edge

Sources: [1] [2] [3] [6] [7] [8] [16]. FP4 figures are dense, non-sparse peaks. B100 through GB300 are Nvidia-published dense numbers. RTX PRO 6000 and Jetson are halved from Nvidia's published sparse figures, which is what Nvidia headlines for those parts, for example 2,070 sparse TFLOPS on Thor. Real workloads land lower.
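
The halving can be checked with a few lines of arithmetic. The sketch below assumes the 2:1 ratio between sparse and dense peaks holds for every part it is applied to. The inputs are the figures cited on this page: 2,070 sparse TFLOPS for Thor [6] and 144 sparse PFLOPS for an eight-GPU HGX B200 [16]. The function name is ours.

```python
def dense_from_sparse(sparse_peak: float) -> float:
    """Assumption: the sparse headline is twice the dense peak, so dense is half."""
    return sparse_peak / 2.0


# Jetson AGX Thor T5000, TFLOPS.
thor_dense_tflops = dense_from_sparse(2070)

# HGX B200, PFLOPS for the whole eight-GPU board, then per GPU.
hgx_b200_dense_pflops = dense_from_sparse(144)
b200_dense_pflops = hgx_b200_dense_pflops / 8

print(thor_dense_tflops)      # 1035.0
print(hgx_b200_dense_pflops)  # 72.0
print(b200_dense_pflops)      # 9.0
```

The HGX B200 result matches the 72 PFLOPS dense figure Nvidia publishes next to the sparse one, which is why the same ratio is applied to the parts where only the sparse number is headlined.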

Roadmap

From Hopper to Feynman, one architecture per year.

Nvidia stated the cadence at GTC 2024 and has held it through GTC 2026. We plan our partner refresh cycles against it. Each row is what gets installed, not what gets announced.

2022

Hopper (H100)

TSMC 4N · 80 GB HBM3

The last pre-MoE-era flagship. Still doing the heavy lifting in many partner sites.

2023

Hopper Ultra (H200)

TSMC 4N · 141 GB HBM3e

Memory bump for long context. Compute unchanged from H100.

2024

Blackwell (B200, GB200)

TSMC 4NP · 180 GB HBM3e

Dual-die GPU on a single package, 10 TB/s chip-to-chip. NVLink 5 at 1.8 TB/s. NVFP4 native. NVL72 rack. The air-cooled B100 was canceled before volume.

2025

Blackwell Ultra (B300, GB300)

TSMC 4NP · 288 GB HBM3e

15 PFLOPS dense NVFP4, 1.5x the tensor throughput of standard Blackwell, 2x attention from doubled softmax units. NVL72 rack pools 20 TB HBM. Shipping in volume since late 2025.

2026

Rubin + Vera (Vera Rubin NVL72)

TSMC 3nm class · HBM4

Vera CPU plus Rubin GPU. In full production as of mid-2026. Shipments begin in the fall on CoreWeave, Lambda, and Oracle. 50 PFLOPS dense NVFP4 per GPU, 288 GB HBM4 at 22 TB/s, NVLink 6 at 3.6 TB/s. A companion GPU, Rubin CPX, handles million-token context on cheaper GDDR7.

2027

Rubin Ultra (NVL576)

TSMC 3nm class (expected) · HBM4e

Four reticle-sized dies per GPU package, 1 TB HBM4e per package, NVLink 7. Rack scale roughly doubles. Power and cooling retrofits become the bottleneck.

2028

Feynman

TSMC 2nm class (expected) · custom HBM

Announced GTC 2025, detailed at GTC 2026: 3D-stacked dies, custom HBM, the new Rosa CPU, and NVLink 8 with co-packaged optics.

Sources: [1] [4] [5] [10] [17] [18].
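
The rack figures follow from the per-GPU figures. A small sketch, assuming an NVL72 rack holds 72 GPUs and that rack totals are simple sums of the per-GPU peaks quoted on this page. The Rubin line is a projection from the 50 PFLOPS per-GPU figure, not a published rack number taken from our sources.

```python
GPUS_PER_NVL72 = 72


def rack_total(per_gpu: float, gpus: int = GPUS_PER_NVL72) -> float:
    """Sum a per-GPU figure across the rack. Ignores interconnect and utilization losses."""
    return per_gpu * gpus


# GB300 NVL72: 288 GB HBM3e and 15 PFLOPS dense NVFP4 per GPU.
gb300_hbm_tb = rack_total(288) / 1000
gb300_fp4_pflops = rack_total(15)

# Uplift over the GB200 NVL72 figure of about 720 PFLOPS per rack.
uplift_vs_gb200 = gb300_fp4_pflops / 720

# Projection only: 50 PFLOPS per Rubin GPU across the same 72-GPU rack.
rubin_fp4_pflops = rack_total(50)

print(gb300_hbm_tb)      # 20.736
print(gb300_fp4_pflops)  # 1080
print(uplift_vs_gb200)   # 1.5
print(rubin_fp4_pflops)  # 3600
```

The 20.7 TB result is where the "20+ TB pooled" line in the inventory comes from, and the 1.5x uplift matches the tensor-throughput claim for Blackwell Ultra. These are peaks. Real workloads land lower.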

Sourcing

Boards from the West. Software from us. Chips from Nvidia.

EO 14392 and the FCC firmware waiver redrew the supply chain in 2026. We were already there, because in 2022 the AirZen entity was set up to ship from the United States and the European Union. The router is commodity. The firmware is the asset.

Boards we ship

Reference designs from chipmakers and ODMs, scoped per customer jurisdiction. The router is commodity. The firmware and the operating model are the asset.

Servers we resell

Nvidia DGX, MGX, and HGX systems through approved channel partners. RTX PRO Blackwell 2U air-cooled servers for SMB inference.

Silicon we depend on

Nvidia Hopper, Blackwell, Blackwell Ultra, Rubin (incoming). AMD MI300 class as a second source. Apple Silicon for on-device inference.

Software we hand back

Linux firmware on the router, identity and orchestration on the server, applications on top. Auditability and source availability are stated per released component.

Sources: [14] [15].

Two decades

The same direction across product generations.

From custom router firmware to managed WiFi and the AI infrastructure modules that can be verified per deployment.

Origin

Custom router firmware

Linux-based router firmware and managed network operations for demanding deployments.

Platform

Multi-vendor operating model

One operating surface across supported hardware, sites, policies, and support workflows.

Cloud

Remote management

Monitoring, configuration, firmware updates, and evidence reporting across distributed networks.

AI-assisted network diagnostics

MCP tools and cloud API routes give the assistant a bounded surface for network operations.

Roadmap

AI infrastructure modules

Identity, voice, compute, and AI workflows are introduced as deployment-ready modules.

Open questions

The things we have not figured out yet.

An honest list of what is still being benchmarked. We will update this page as the answers harden.

Sources: [5] [6] [9] [11] [12] [17] [19] [20].

Citations

References.

The numbers on this page are sourced from chipmaker datasheets, US regulatory filings, and the running Airfy code base. Every number is verifiable.

[1] Nvidia. Blackwell architecture overview, 10 TB/s chip-to-chip interconnect
[2] Nvidia. DGX B200, eight 180 GB GPUs, 14.3 kW, 10U chassis
[3] Nvidia. DGX B300 system with eight Blackwell Ultra GPUs
[4] Nvidia. GB300 NVL72 rack, 20 TB pooled HBM, agentic inference
[5] Nvidia Developer Blog. Inside Blackwell Ultra, 15 PFLOPS dense NVFP4, 2x attention
[6] Nvidia. Jetson AGX Thor T5000, 2,070 sparse FP4 TFLOPS, 128 GB LPDDR5x
[7] Nvidia. Jetson Orin family, AGX Orin 275 TOPS, Orin Nano Super at $249
[8] Nvidia. Blackwell platform launch, NVL72, 1.8 TB/s NVLink 5
[9] SemiAnalysis. Blackwell perf and TCO: the 30x headline is about 18x once precision-normalized
[10] Tom's Hardware. Vera Rubin platform in depth, NVL72, Rubin Ultra NVL576, Feynman
[11] Nvidia Blog. Open-source inference on Blackwell, 20 to 5 cents per million tokens in NVFP4
[12] Modal. B200 pricing analysis and cloud rental economics
[13] Airfy. Verified KPI snapshot, source of truth
[14] FCC. DA 26-454, router firmware waiver extension to 2029
[15] US Executive Order. EO 14392 and origin claim enforcement
[16] Nvidia. HGX platform, HGX B200 is 1.4 TB (180 GB per GPU), FP4 144 | 72 PFLOPS sparse | dense
[17] Nvidia. Vera Rubin NVL72, 288 GB HBM4, 50 PFLOPS NVFP4 per GPU, NVLink 6
[18] Nvidia Newsroom. Vera Rubin ramps into full production, shipments begin fall 2026
[19] Nvidia Newsroom. Rubin CPX, a GPU for massive-context inference on GDDR7
[20] Nvidia Developer Blog. Introducing NVFP4, about 3.5x smaller than FP16 at preserved accuracy

The infrastructure under your AI runs on hardware you can audit.

Download the app to scan your network. Talk to the team if you are mapping a partner site to Blackwell or Rubin. Read the rest of the research below.

The things we have not figured out yet.

Does the partner site need NVLink, or does Ethernet keep up?

FP4 vs FP8 vs INT8: which quantization survives in production?

How much can Jetson actually do on its own?

When does owning beat renting?

Why does the customer want to own the GPU at all?

What breaks when Rubin lands?
