Research / Compute Stack

The compute stack beneath agentic AI.

Where Airfy sits in the five-layer AI cake. Which Nvidia silicon we ship today. Which lands next. Why the network and the GPU belong on the balance sheet of the customer who runs them.

Nvidia RTX PRO Blackwell 2U air-cooled server, the SMB-class on-prem inference unit

How do I learn AI and earn from it? How do I protect my kids and their future?

Two questions, one answer. Airfy AI Academy teaches families and businesses to use AI on their own infrastructure. The network is the foundation. Your kids inherit the operator side, not a black-box subscription.


The five layers

Two layers we build. Three layers we depend on.

The AI stack has five layers. Three of them are already won by somebody else, and that is a feature, not a bug. We do not build foundation models. We do not fab chips. We do not run the grid. We build the two layers between the silicon and the user, the two layers the customer actually sees and pays for.

That makes us cheaper to capitalize than a model lab, more durable than an app wrapper, and harder to disintermediate than either. We sit where distribution lives.

01 Applications · Talk, Chat, Metis, Nexus, Atlas · Airfy
02 Models · Open source: Llama, Mistral, Mixtral, Gemma · Partner
03 Infrastructure · Linux OS, identity, network, orchestration · Airfy
04 Chips · Nvidia, AMD, Apple, Ampere Computing, Qualcomm · Partner
05 Energy · Utilities, on-prem PV, datacenter PPAs · Partner

Layers are listed top to bottom. Airfy ships Applications and Infrastructure. Open-source Models, third-party Chips, and utility Energy do the rest.
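One way to make the ownership split concrete is to model the stack as data. This is a minimal sketch in Python that assumes a simple record per layer. The `Layer` type and the `layers_owned_by` helper are hypothetical illustrations and are not part of any Airfy product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    rank: int        # 1 = top of the stack, closest to the user
    name: str
    examples: tuple  # representative products, vendors, or sources
    owner: str       # "Airfy" or "Partner"


# The five layers as listed on this page, top to bottom.
STACK = (
    Layer(1, "Applications", ("Talk", "Chat", "Metis", "Nexus", "Atlas"), "Airfy"),
    Layer(2, "Models", ("Llama", "Mistral", "Mixtral", "Gemma"), "Partner"),
    Layer(3, "Infrastructure", ("Linux OS", "identity", "network", "orchestration"), "Airfy"),
    Layer(4, "Chips", ("Nvidia", "AMD", "Apple", "Ampere Computing", "Qualcomm"), "Partner"),
    Layer(5, "Energy", ("utilities", "on-prem PV", "datacenter PPAs"), "Partner"),
)


def layers_owned_by(owner: str) -> list:
    """Return the names of the layers one owner is responsible for, top to bottom."""
    return [layer.name for layer in sorted(STACK, key=lambda l: l.rank) if layer.owner == owner]


print(layers_owned_by("Airfy"))    # ['Applications', 'Infrastructure']
print(layers_owned_by("Partner"))  # ['Models', 'Chips', 'Energy']
```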

Thesis / why the middle layer

Old code does not become AI by being asked nicely.

Every company that was around before 2023 has a code base, a database, and a software stack built for a pre-agentic world. None of it speaks to a frontier model. None of it carries the identity, the audit log, or the token-auth a real agent needs to act on a live system. That gap is the middle layer.

We close the gap. The customer keeps the chips. The customer keeps the open-source models. We bring the infrastructure that connects the two, the identity layer that lets the agent prove it is allowed to act, and the applications people actually talk to. Monthly subscription. Hardware in the building. Software audited at the firmware level.
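The identity step can be sketched as a signed, scoped, expiring token that is checked before every action, with each decision written to an audit log. This minimal Python sketch assumes HMAC-SHA256 over a JSON body. The token format, `SECRET`, and scope names such as `wifi.restart_ap` are hypothetical. A production deployment would more likely use an established format such as JWT with per-agent keys.

```python
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical per-site secret held on the on-prem server. Illustrative only.
SECRET = b"per-site-secret-provisioned-at-install"

# Append-only record of every authorization decision, allowed or not.
AUDIT_LOG: list = []


def sign(claims: dict) -> str:
    """Serialize the claims and append an HMAC-SHA256 tag (hypothetical token format)."""
    body = json.dumps(claims, sort_keys=True)
    tag = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{tag}"


def authorize(token: str, action: str, now: Optional[float] = None) -> bool:
    """Check the tag, expiry, and scope, then log the decision."""
    now = time.time() if now is None else now
    body, _, tag = token.rpartition(".")  # the hex tag never contains a dot
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison; a forged or altered token yields empty claims.
    claims = json.loads(body) if hmac.compare_digest(tag.encode(), expected.encode()) else {}
    allowed = bool(claims) and claims.get("exp", 0) > now and action in claims.get("scopes", [])
    AUDIT_LOG.append({"ts": now, "agent": claims.get("sub"), "action": action, "allowed": allowed})
    return allowed


token = sign({"sub": "talk-agent", "scopes": ["wifi.restart_ap"], "exp": time.time() + 300})
print(authorize(token, "wifi.restart_ap"))     # True: signed, unexpired, in scope
print(authorize(token, "firewall.open_port"))  # False: outside the granted scope
```

The log records refusals as well as approvals. The audit trail therefore shows what an agent tried to do, not only what it was allowed to do.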

Airfy Talk is the first proof: voice to action on a GPU the customer owns, with no cloud round-trip. The same orchestration that ships voice today ships agents tomorrow. Voice and agents run on one platform.

Blackwell-class server, the chassis on which the middle layer runs

The moat

Own the stack. Own the exit.

Every modern platform solves integration by moving you into their cloud. It is the same lock-in the old manufacturers sold, with a nicer dashboard. You trade one cage for a newer one.

We solve it the other way. The control plane runs on hardware you rack. The data never leaves the building. Firmware evidence is scoped per released component. The deployment is partner-owned and self-hosted, with source availability stated per release.

They integrate everything into their cloud. We integrate everything into yours.
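A rule like "the data never leaves the building" can be checked mechanically at configuration time. This minimal sketch assumes each service endpoint is configured as a URL. The `config` keys are hypothetical. The check only inspects configuration; actually keeping data on site also needs network-level egress rules.

```python
import ipaddress
from urllib.parse import urlparse


def is_on_prem(url: str) -> bool:
    """True only if the URL's host is a literal IP that Python classifies as private or loopback."""
    host = urlparse(url).hostname
    try:
        addr = ipaddress.ip_address(host)
    except (TypeError, ValueError):
        # Hostnames are rejected outright, because DNS could resolve them anywhere.
        return False
    return addr.is_private or addr.is_loopback


def off_site_endpoints(endpoints: dict) -> list:
    """Return the names of configured endpoints that would send data outside the building."""
    return [name for name, url in endpoints.items() if not is_on_prem(url)]


config = {
    "inference": "http://10.0.0.5:8000/v1",     # GPU server on the site LAN
    "speech": "http://127.0.0.1:9000",          # local voice pipeline
    "telemetry": "https://api.example.com/v1",  # a cloud endpoint
}
print(off_site_endpoints(config))  # ['telemetry']
```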

Verified, May 2026

The whole platform, in four numbers.

Counted from the running code base. Updated when it changes. Any older figure of 177 MCP tools or 1,765 tests is stale and should be replaced.

MCP · tool surface in the agentic-networking binary
API · HTTP routes across the cloud API
CI · automated release checks
Ops · callable operator surface the AI can reach

Sources: [13]. Counted by grep on the running code. Re-counted on every release.
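Counting with grep works as long as registrations follow a consistent pattern. This is a minimal Python equivalent. It assumes tools are registered with a hypothetical `register_tool("name")` call and routes with `.get("/path")`-style calls; neither pattern comes from the Airfy code base. `grep -c` counts matching lines, but this sketch counts distinct names, so a tool registered twice is counted once.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical registration patterns. The page does not say how tools or routes
# are declared in the real code base, so adjust these to match it.
PATTERNS = {
    "MCP": re.compile(r'register_tool\(\s*"([\w.]+)"'),
    "API": re.compile(r'\.(get|post|put|patch|delete)\(\s*"(/[^"]*)"'),
}
SOURCE_SUFFIXES = {".go", ".rs", ".py", ".ts"}


def count_surface(root: str) -> dict:
    """Count distinct registered tools and (method, path) routes under root."""
    found = {name: set() for name in PATTERNS}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in SOURCE_SUFFIXES:
            text = path.read_text(encoding="utf-8", errors="ignore")
            for name, pattern in PATTERNS.items():
                found[name].update(pattern.findall(text))  # sets drop duplicates
    return {name: len(items) for name, items in found.items()}


def demo() -> dict:
    """Build a tiny throwaway source tree and count it."""
    with tempfile.TemporaryDirectory() as root:
        Path(root, "tools.py").write_text('register_tool("wifi.scan")\nregister_tool("wifi.scan")\n')
        Path(root, "api.go").write_text('r.get("/v1/sites")\nr.post("/v1/sites")\n')
        return count_surface(root)


print(demo())  # {'MCP': 1, 'API': 2}
```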

Compute inventory

Silicon we ship. Silicon we skip. Silicon at the edge.

Hopper is for fleets that already paid for it. Blackwell is the floor for new builds. Blackwell Ultra is shipping now and goes into the sites that can power it. Rubin is in full production and begins shipping this fall. Jetson sits at the edge, on its own.

SKU | Architecture | Memory | FP4 dense | TDP | Airfy fit | Status
H100 SXM | Hopper | 80 GB HBM3 | n/a | 700 W | Legacy training, fast inference | Skipped
H200 SXM | Hopper Ultra | 141 GB HBM3e | n/a | 700 W | Large-context inference on existing fleets | Skipped
B100 SXM | Blackwell, dual-die | 192 GB HBM3e | ~7 PFLOPS | 700 W | Announced air-cooled drop-in, canceled before volume in favor of B200. We skipped it. | Skipped
B200 SXM | Blackwell, dual-die | 180 GB HBM3e | ~9 PFLOPS | 1,000 W | Dense inference and training, HGX/DGX building block | Shipping
GB200 NVL72 | Grace + Blackwell rack | 13.4 TB pooled HBM3e | ~720 PFLOPS / rack | ~120 kW / rack | Rack-scale frontier training and inference | Shipping
B300 / Blackwell Ultra | Blackwell Ultra, dual-die | 288 GB HBM3e | ~15 PFLOPS | 1,400 W | Long-context reasoning and test-time scaling | Shipping
GB300 NVL72 | Grace + Blackwell Ultra rack | 20+ TB pooled HBM3e | ~1,080 PFLOPS / rack | ~120 kW / rack | Rack-scale agentic inference and reasoning | Shipping
RTX PRO 6000 Blackwell | Blackwell (GB202) | 96 GB GDDR7 | ~2 PFLOPS | 600 W | Workstation + 2U air-cooled on-prem server | Shipping
Jetson AGX Thor T5000 | Blackwell edge | 128 GB LPDDR5x | ~1,035 TFLOPS | 40-130 W | Edge robotics, on-prem agentic inference | Edge
Jetson Thor T4000 | Blackwell edge | 64 GB LPDDR5x | ~600 TFLOPS | 40-70 W | Industrial AI, autonomous systems | Edge
Jetson AGX Orin 64 GB | Ampere | 64 GB LPDDR5 | n/a | 15-60 W | Industrial gateway, vision pipelines, 275 INT8 TOPS | Edge
Jetson Orin Nano Super | Ampere | 8 GB LPDDR5 | n/a | 7-25 W | Maker, starter inference, 67 INT8 TOPS at $249 | Edge

Sources: [1] [2] [3] [6] [7] [8] [16]. FP4 figures are dense (non-sparse) peaks. The B100 through GB300 figures are Nvidia-published dense numbers. The RTX PRO 6000 and Jetson Thor figures are halved from Nvidia's published sparse figures, because sparse is what Nvidia headlines for those parts. For example, Thor T5000 is rated at 2,070 sparse FP4 TFLOPS and is listed here at ~1,035 dense. The Orin INT8 TOPS are quoted as Nvidia publishes them. Real workloads land lower.
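The sparse-to-dense conversion, and a rough efficiency comparison built on it, is simple arithmetic. This is a minimal sketch. The figures are the approximate peaks from the table above, and TDP stands in for power draw, so both values are ceilings rather than measured results.

```python
def dense_from_sparse(sparse_tflops: float) -> float:
    """Nvidia's sparse peaks assume 2:4 structured sparsity, which doubles the
    headline figure; halving it gives the dense peak used in the table."""
    return sparse_tflops / 2


def tflops_per_watt(dense_tflops: float, tdp_watts: float) -> float:
    """Peak dense FP4 throughput per watt at maximum TDP (an upper bound)."""
    return dense_tflops / tdp_watts


# (dense FP4 TFLOPS, TDP in watts), taken from the table above.
PARTS = {
    "B200 SXM": (9_000, 1_000),
    "B300 / Blackwell Ultra": (15_000, 1_400),
    "RTX PRO 6000 Blackwell": (2_000, 600),
    "Jetson AGX Thor T5000": (dense_from_sparse(2_070), 130),  # top of the 40-130 W range
}

for name, (tflops, watts) in PARTS.items():
    print(f"{name}: {tflops_per_watt(tflops, watts):.2f} dense FP4 TFLOPS/W")
```

At these peaks, B300 works out to about 10.7 TFLOPS/W against 9.0 for B200. Blackwell Ultra delivers more work per watt, but its higher absolute draw is why it goes only into sites that can power it.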

Roadmap

From Hopper to Feynman, one architecture per year.

Nvidia stated the cadence at GTC 2024 and has held it through GTC 2026. We plan our partner refresh cycles against it. Each row is dated by when the architecture gets installed, not when it was announced.

2022

Hopper (H100)

TSMC 4N · 80 GB HBM3

The last pre-MoE-era flagship. Still doing the heavy lifting in many partner sites.

2023

Hopper Ultra (H200)

TSMC 4N · 141 GB HBM3e

Memory bump for long context. Compute unchanged from H100.

2024

Blackwell (B200, GB200)

TSMC 4NP · 180 GB HBM3e

Dual-die GPU on a single package, 10 TB/s chip-to-chip. NVLink 5 at 1.8 TB/s. NVFP4 native. NVL72 rack. The air-cooled B100 was canceled before volume.

2025

Blackwell Ultra (B300, GB300)

TSMC 4NP · 288 GB HBM3e

15 PFLOPS dense NVFP4, 1.5x the tensor throughput of standard Blackwell, 2x attention from doubled softmax units. NVL72 rack pools 20 TB HBM. Shipping in volume since late 2025.

2026

Rubin + Vera (Vera Rubin NVL72)

TSMC 3nm class · HBM4

Vera CPU plus Rubin GPU. In full production as of mid-2026; shipments to CoreWeave, Lambda, and Oracle begin in the fall. 50 PFLOPS dense NVFP4 per GPU, 288 GB HBM4 at 22 TB/s, NVLink 6 at 3.6 TB/s. A companion GPU, Rubin CPX, handles million-token context on cheaper GDDR7.

2027

Rubin Ultra (NVL576)

TSMC 3nm class (expected) · HBM4e

Four reticle-sized GPU dies per package, 1 TB of HBM4e per package, NVLink 7. Rack scale roughly doubles. Power and cooling retrofits become the bottleneck.

2028

Feynman

TSMC 2nm class (expected) · custom HBM

Announced at GTC 2025 and detailed at GTC 2026: 3D-stacked dies, custom HBM, the new Rosa CPU, and NVLink 8 with co-packaged optics.

Sources: [1] [4] [5] [10] [17] [18].
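Rack-level figures follow from per-GPU figures by multiplying by the 72 GPUs in an NVL72 rack. This minimal sketch uses the per-GPU numbers from the roadmap above. It does not count Grace or Vera CPU memory, and it assumes no interconnect or scaling losses.

```python
GPUS_PER_NVL72 = 72


def rack_totals(hbm_gb_per_gpu: float, fp4_pflops_per_gpu: float, gpus: int = GPUS_PER_NVL72):
    """Scale per-GPU memory and dense FP4 figures to one rack.

    Returns (pooled HBM in TB, dense FP4 PFLOPS). CPU memory is not counted,
    and no allowance is made for interconnect or scaling losses.
    """
    return hbm_gb_per_gpu * gpus / 1_000, fp4_pflops_per_gpu * gpus


# Per-GPU figures from the roadmap above.
blackwell_ultra = rack_totals(288, 15)  # B300 in a GB300 NVL72
rubin = rack_totals(288, 50)            # Rubin in a Vera Rubin NVL72

print(f"GB300 NVL72: {blackwell_ultra[0]:.1f} TB HBM, {blackwell_ultra[1]:,} PFLOPS")
print(f"Vera Rubin NVL72: {rubin[0]:.1f} TB HBM, {rubin[1]:,} PFLOPS")
print(f"Per-GPU dense FP4, Rubin vs Blackwell Ultra: {50 / 15:.2f}x")
```

For GB300, the result of about 20.7 TB and 1,080 PFLOPS lines up with the table's 20+ TB and ~1,080 PFLOPS per rack. The Rubin rack figure is only this arithmetic, not a quoted Nvidia specification.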

Sourcing

Boards from the West. Software from us. Chips from Nvidia.

EO 14392 and the FCC firmware waiver redrew the supply chain in 2026. We were already there, because in 2022 the AirZen entity was set up to ship from the United States and the European Union. The router is commodity. The firmware is the asset.

Boards we ship

Reference designs from chipmakers and ODMs, scoped per customer jurisdiction. The router is commodity; the firmware and the operating model are the asset.

Servers we resell

Nvidia DGX, MGX, and HGX systems through approved channel partners. RTX PRO Blackwell 2U air-cooled servers for SMB inference.

Silicon we depend on

Nvidia Hopper, Blackwell, Blackwell Ultra, Rubin (incoming). AMD MI300 class as a second source. Apple Silicon for on-device inference.

Software we hand back

Linux firmware on the router, identity and orchestration on the server, applications on top. Auditability and source availability are stated per released component.

Sources: [14] [15].
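Stating auditability and source availability per released component suggests a manifest that a release pipeline can check. This minimal sketch assumes a hypothetical schema (`Component`, `SOURCE_LEVELS`, report IDs such as `audit-report-001`). The page does not describe Airfy's actual manifest format.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical vocabulary for how much source is available for a component.
SOURCE_LEVELS = {"public", "partner", "none"}


@dataclass
class Component:
    name: str
    version: str
    layer: str                                    # "router firmware", "server", or "application"
    source_availability: str                      # expected to be one of SOURCE_LEVELS
    artifact_sha256: str = ""                     # digest of the shipped binary or image
    evidence: list = field(default_factory=list)  # e.g. audit report IDs


def release_problems(manifest: list) -> list:
    """Return every reason this manifest would fail a per-component disclosure check."""
    problems = []
    for c in manifest:
        label = f"{c.name} {c.version}"
        if c.source_availability not in SOURCE_LEVELS:
            problems.append(f"{label}: source availability not stated")
        if len(c.artifact_sha256) != 64:
            problems.append(f"{label}: missing artifact digest")
        if not c.evidence:
            problems.append(f"{label}: no audit evidence attached")
    return problems


firmware = b"example firmware image"  # stand-in bytes for illustration
manifest = [
    Component("router-fw", "1.0.0", "router firmware", "partner",
              hashlib.sha256(firmware).hexdigest(), ["audit-report-001"]),
    Component("orchestrator", "1.0.0", "server", "unspecified"),
]
for problem in release_problems(manifest):
    print(problem)
```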

Two decades

The same direction across product generations.

From custom router firmware to managed WiFi and the AI infrastructure modules that can be verified per deployment.

Origin

Custom router firmware

Linux-based router firmware and managed network operations for demanding deployments.

Platform

Multi-vendor operating model

One operating surface across supported hardware, sites, policies, and support workflows.

Cloud

Remote management

Monitoring, configuration, firmware updates, and evidence reporting across distributed networks.

AI

AI-assisted network diagnostics

MCP tools and cloud API routes give the assistant a bounded surface for network operations.

Roadmap

AI infrastructure modules

Identity, voice, compute, and AI workflows are introduced as deployment-ready modules.
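A bounded surface for AI-assisted diagnostics means the assistant can only reach operations that were explicitly registered. This minimal sketch implements that idea as an allowlist registry. The `tool` decorator, `dispatch`, and the `network.list_clients` name are hypothetical. In MCP itself, a server advertises its tools through `tools/list` and the client invokes them with `tools/call`, so the list the server publishes plays the role of `TOOLS` here.

```python
from typing import Callable, Dict

# The bounded surface: only functions registered here can be reached by the assistant.
TOOLS: Dict[str, Callable[..., object]] = {}


def tool(name: str):
    """Decorator that exposes a function under an explicit, reviewable name."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register


@tool("network.list_clients")
def list_clients(site: str) -> list:
    # Stub data for illustration; a real tool would query the network controller.
    return [f"{site}-laptop", f"{site}-phone"]


def dispatch(name: str, **kwargs) -> object:
    """Run a tool the assistant asked for, refusing anything not registered."""
    if name not in TOOLS:
        raise PermissionError(f"{name!r} is outside the bounded surface")
    return TOOLS[name](**kwargs)


print(dispatch("network.list_clients", site="hq"))  # ['hq-laptop', 'hq-phone']
try:
    dispatch("shell.exec", command="reboot")
except PermissionError as err:
    print(err)  # 'shell.exec' is outside the bounded surface
```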

Open questions

The things we have not figured out yet.

An honest list of what is still being benchmarked. We will update this page as the answers harden.

Sources: [5] [6] [9] [11] [12] [17] [19] [20].

Citations

References.

The numbers on this page are sourced from chipmaker datasheets, US regulatory filings, and the running Airfy code base. Every number is verifiable.

[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
[15]
[16]
[17]
[18]
[19]
[20]

The infrastructure under your AI runs on hardware you can audit.

Download the app to scan your network. Talk to the team if you are mapping a partner site to Blackwell or Rubin. Read the rest of the research below.