NVIDIA Spark vs AMD Ryzen AI Max: Which Little AI Box Do I Actually Want?

Wed, Jun 3, 2026 · 7 min read

For two years the answer to “where do I run a model that won’t fit on a normal GPU” was the same boring answer: rent a cloud instance, watch the meter, hope you remembered to shut it down. In 2026 there’s finally a second answer that lives on your desk. Two of them, actually, and they take opposite philosophical bets.

NVIDIA shrank a slice of its datacenter stack into a desktop and called it DGX Spark — a GB10 Grace Blackwell superchip with 128GB of unified memory and a literal petaFLOP of FP4 compute. AMD took the other road: Ryzen AI Max+ 395, codename Strix Halo, a normal x86 APU that happens to carry up to 128GB of unified memory and shows up in mini PCs you can buy for roughly half the price.

I run models at home. Not as a hobby — as part of how I actually work now. So this comparison isn’t academic for me. One of these is going on my desk this year. The question is which.

Why unified memory is the whole story

Start here, because everything else is downstream of it.

When you run a large language model, the bottleneck is almost never the GPU’s math. It’s memory. For every single token a dense model generates, it has to read the entire set of weights out of memory. A 70-billion-parameter model at reasonable quantization is tens of gigabytes that get streamed, in full, once per token. Run out of memory to hold the model and you’re done before you start, no matter how fast the cores are.

A normal gaming GPU gives you 16 or 24GB of fast VRAM and a hard wall after that. Unified memory tears the wall down: the CPU and the accelerator share one big pool, so a 70B or 120B model that would never fit on a consumer card just loads. That’s the entire reason these boxes exist. Both NVIDIA and AMD ship 128GB configurations, and both let the GPU side treat most of it as model memory. AMD will hand up to 96GB of its 128GB straight to the iGPU. NVIDIA’s whole pool is coherent across the Grace CPU and the Blackwell GPU by design.
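To put rough numbers on that wall, here is a minimal sketch of the weights-only arithmetic: parameters times bits per weight, divided by eight. The function names and the 4.5 bits-per-weight figure are my own illustrative assumptions (roughly what a 4-bit-class quantization costs once scales are included), and the estimate ignores the KV cache and runtime overhead, so treat the results as lower bounds.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weights-only footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


def fits(params_billion: float, bits_per_weight: float, pool_gb: float) -> bool:
    """True if the weights alone fit in the pool; KV cache and overhead are ignored."""
    return weight_gb(params_billion, bits_per_weight) <= pool_gb


# Memory pools mentioned in the text; 4.5 bits/weight is an assumed 4-bit-class quant.
POOLS = {"24GB gaming GPU": 24, "96GB iGPU carve-out": 96, "128GB unified pool": 128}

if __name__ == "__main__":
    for size in (70, 120):
        need = weight_gb(size, 4.5)
        for name, pool in POOLS.items():
            verdict = "fits" if fits(size, 4.5, pool) else "does not fit"
            print(f"{size}B @ 4.5 bpw needs ~{need:.1f} GB: {verdict} in {name}")
```

Under those assumptions a 70B model needs a little under 40 GB for weights alone, which is already past any 24GB card and comfortably inside either unified pool.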

Capacity gets the model loaded. Bandwidth decides how fast it talks back. Hold onto that — it’s where the marketing and the reality diverge.

The NVIDIA bet: bring the datacenter home

DGX Spark is NVIDIA doing the thing NVIDIA does best — vertical integration. The GB10 pairs a 20-core Arm CPU (ten Cortex-X925, ten Cortex-A725) with a Blackwell GPU and 128GB of LPDDR5x, ships with 4TB of storage, and quotes up to one petaFLOP of FP4 AI performance. It runs NVIDIA’s own Ubuntu-based stack with the full CUDA toolchain preinstalled. It landed in October 2025 at a $3,999 starting price, though real retail listings drift up toward $4,400–$5,400 depending on who’s selling.

At Build last week Microsoft extended the family with the Surface RTX Spark Dev Box — same Grace-plus-Blackwell idea, 128GB unified, a passive-cooled cheese-grater chassis, Windows 11 Pro preloaded with VS Code, Copilot, Git, Python and Node. NVIDIA also showed RTX Spark notebooks built on the N1X (essentially the same silicon, MediaTek-codesigned Arm cores, up to 6,144 CUDA cores), with Asus, Dell, HP, Lenovo and MSI all signed on for the fall.

The pitch is coherent: if your work already lives on CUDA, this is the shortest possible path from laptop prototype to the H100 cluster you deploy on. Same libraries, same kernels, same nvidia-smi. Where DGX Spark genuinely shines is prefill — chewing through a long prompt before it answers. Benchmarks have it doing prompt processing in the 1,700 tokens-per-second range, which is enormous, and it’ll fine-tune models that used to demand a rented cloud GPU. For an ML engineer iterating on training code, that combination is the real product.
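To get a feel for what that prefill rate buys, a back-of-the-envelope sketch: time to first token is roughly prompt length divided by prefill throughput. The 1,700 tokens-per-second figure is the benchmark range quoted above; the function name and the prompt sizes are illustrative assumptions, and real prefill rates vary with model and context length.

```python
def prefill_seconds(prompt_tokens: int, prefill_tok_per_s: float) -> float:
    """Rough wait before the first output token, ignoring fixed overheads."""
    if prefill_tok_per_s <= 0:
        raise ValueError("prefill rate must be positive")
    return prompt_tokens / prefill_tok_per_s


if __name__ == "__main__":
    for prompt in (2_000, 16_000, 64_000):  # illustrative prompt sizes
        print(f"{prompt:>6} tokens -> ~{prefill_seconds(prompt, 1_700):.1f} s")
```

The relationship is linear, so a box with half the prefill rate makes you wait twice as long on every long prompt, which is why this number matters more the longer your contexts get.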

The AMD bet: it’s just a PC, and that’s the point

Ryzen AI Max+ 395 is almost aggressively normal by comparison, and I mean that as praise. Sixteen Zen 5 cores, 40 RDNA 3.5 compute units, an XDNA 2 NPU good for around 50 INT8 TOPS, up to 128GB of LPDDR5X-8000 shared across the whole chip. It is x86. It boots whatever you point it at — Windows, Fedora, whatever’s in your /boot. There is no Arm translation tax, no “does this wheel have an aarch64 build” detour, no separate universe of drivers.

It also shows up in a market rather than a single SKU. GMKtec’s EVO-X2, the Framework Desktop, Sapphire’s box, and AMD’s own “Ryzen AI Halo” developer mini PC hitting Micro Center this month: all the same silicon, competing OEMs, competing prices. A 128GB Framework Desktop lands around $2,348. That’s about 60 percent of the Spark’s $3,999 starting price, and closer to half of what the Spark actually sells for at retail, for the same memory capacity. That gap is the entire argument.

The software story used to be where AMD lost this paragraph. That loss is no longer automatic. ROCm has genuinely improved, and for inference specifically the backends that matter (llama.cpp, the Vulkan path, Ollama on top of them) run fine on RDNA 3.5 without you compiling your life away. It isn’t CUDA’s twenty-year ecosystem and it won’t be soon. But for running models rather than authoring novel CUDA kernels, the gap that decides purchases has narrowed to something I can live inside.

The bandwidth number nobody prints on the box

Here’s the catch that reframes the whole comparison, and it’s nearly the same number on both machines: memory bandwidth lands around 273 GB/s on DGX Spark and about 256 GB/s on Strix Halo.

That is a fraction of what a real datacenter GPU moves, and because token generation is bandwidth-bound, it’s a hard ceiling on how fast either box answers from a big model. The petaFLOP on the NVIDIA spec sheet is real; it just doesn’t help decode, because decode is waiting on memory, not math. On a 120B-class model the published numbers tell the story: DGX Spark generates around 38 tokens per second, the Strix Halo box around 34. Close enough that you would not feel the difference reading the output. For contrast, three secondhand RTX 3090s wired together push past 120, roughly three times faster, because each card moves more than three times the memory bandwidth (about 936 GB/s), not because the rig has three times the FLOPS.
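The ceiling is simple enough to sketch. If decode is bandwidth-bound, tokens per second can’t exceed memory bandwidth divided by the bytes read per token, and the same relation run backwards shows how much data a measured speed could have been reading. The function names are mine, the 273 GB/s and 38 tokens-per-second inputs are the figures quoted above, and the 40 GB dense-model size is an assumed example, not a measurement.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, gb_read_per_token: float) -> float:
    """Upper bound on decode speed when every token must stream gb_read_per_token."""
    return bandwidth_gb_s / gb_read_per_token


def implied_gb_per_token(bandwidth_gb_s: float, measured_tok_s: float) -> float:
    """Most data that could have been read per token at the measured speed."""
    return bandwidth_gb_s / measured_tok_s


if __name__ == "__main__":
    bw = 273.0  # GB/s, the bandwidth figure quoted in the text
    # 40 GB is an assumed size for a quantized dense model, for illustration only.
    print(f"~40 GB dense model: at most {decode_ceiling_tok_s(bw, 40.0):.1f} tok/s")
    print(f"38 tok/s implies at most {implied_gb_per_token(bw, 38.0):.1f} GB read per token")
```

By this arithmetic, 38 tokens per second at 273 GB/s means no more than about 7 GB read per token. That suggests the 120B-class model in those benchmarks touches only part of its weights on each token, as a mixture-of-experts design would; a dense model of that size would decode far slower on either box.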

So the honest framing is: NVIDIA decisively wins prefill and anything involving its training/fine-tuning toolchain. AMD ties on the thing most of us actually do at a desk — decode, generating tokens from a model that’s already loaded — at roughly half the price. The petaFLOP is a prefill-and-training number. It is not a “your chatbot is twice as fast” number, and reading it that way is how people end up disappointed.
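Here is the price side of that framing as arithmetic, a rough sketch using only figures quoted in this post: the $3,999 DGX Spark starting price, the roughly $2,348 Framework Desktop, 128GB on both, and the 38 and 34 tokens-per-second decode numbers. The Box class and its field names are my own scaffolding, and street prices will move the results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    name: str
    price_usd: float
    memory_gb: int
    decode_tok_s: float  # 120B-class decode figure quoted in the post

    @property
    def usd_per_gb(self) -> float:
        """Dollars paid per gigabyte of unified memory."""
        return self.price_usd / self.memory_gb

    @property
    def usd_per_tok_s(self) -> float:
        """Dollars paid per token-per-second of decode speed."""
        return self.price_usd / self.decode_tok_s


BOXES = [
    Box("DGX Spark", 3_999, 128, 38),
    Box("Framework Desktop (Ryzen AI Max+ 395)", 2_348, 128, 34),
]

if __name__ == "__main__":
    for box in BOXES:
        print(f"{box.name}: ${box.usd_per_gb:.2f}/GB, ${box.usd_per_tok_s:.0f} per tok/s")
```

Neither metric captures prefill or the CUDA toolchain, which is exactly the part of the Spark’s price this comparison leaves out.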

What I’d actually do with each one

For me the use case is concrete: run a capable local model behind Ollama, point an agent at it, keep the private and the offline-able work off somebody’s API. Interactive chat, code assistance, document grinding, the occasional 70B when I want a second opinion that never leaves the house. Decode speed and price-per-loaded-gigabyte are what I feel daily. Prefill throughput and CUDA fine-tuning are nice — but they’re not my Tuesday.
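To see both numbers on your own hardware, one option is to read the timing fields Ollama returns from a non-streaming call to its /api/generate endpoint. A minimal sketch, assuming a local Ollama server on its default port 11434; the helper names are mine, and the model tag you pass in is whatever you have pulled locally.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint


def rates(resp: dict) -> dict:
    """Convert Ollama's token counts and nanosecond durations into tokens/second."""
    def per_s(count_key: str, dur_key: str) -> float:
        dur_ns = resp.get(dur_key, 0)
        return resp.get(count_key, 0) / (dur_ns / 1e9) if dur_ns else 0.0

    return {
        "prefill_tok_s": per_s("prompt_eval_count", "prompt_eval_duration"),
        "decode_tok_s": per_s("eval_count", "eval_duration"),
    }


def generate(model: str, prompt: str) -> dict:
    """Send one non-streaming request and return the parsed JSON response."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as r:
        return json.load(r)
```

Calling rates(generate(model_tag, prompt)) with a long prompt and a long answer gives a prefill figure and a decode figure from the same run, which are the two numbers this whole comparison turns on.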

If I were shipping production ML on NVIDIA infrastructure, the calculus flips hard. The value of developing on the exact stack you deploy to is enormous, prefill speed compounds across a workday of long contexts, and “my laptop and the cluster speak the same language” is worth real money. DGX Spark is built for that person and it’s very good at being that person’s machine. That person just isn’t me.

So which one do I want?

The AMD Ryzen AI Max+ 395. With my own money, this year.

It delivers the thing I came for — 128GB of unified memory and competitive token generation — for roughly half the price, on x86, booting Linux without an asterisk, in a box I can pick from several competing vendors instead of one. The cost-benefit isn’t close for my workload. Same model fits, the tokens come out at nearly the same speed, and the difference buys a second box or a year of electricity.

I’ll say the quiet part too: if I lived in CUDA all day, I’d buy the Spark and not blink, because then I’d be paying for the ecosystem, not just the silicon, and the ecosystem is worth it. NVIDIA “delivers more” in the absolute sense — more prefill, more software gravity, a real fine-tuning story. AMD delivers more value, which is the question I was actually asked.

And the standard caveat, because the ground is still moving: RTX Spark notebook and Surface Dev Box pricing isn’t announced, final shipping benchmarks will shift these numbers, and AMD’s next Halo part is already on the rumor mill. If NVIDIA prices the consumer RTX Spark aggressively this fall, I reserve the right to change my mind. But on what’s actually purchasable today, for what I actually do — the little AMD box wins, and it isn’t particularly close.