kwindla hultman kramer

I'm really looking forward to @NVIDIAGTC in March

February 15, 2026

I'm really looking forward to @NVIDIAGTC in March. Last year was amazing. (And I came home with a new 5090!)

I've been working on building multi-agent, local/cloud hybrid applications on my NVIDIA DGX Spark. Here's a video of an LLM-powered game, running on the Spark, in which you fly around by talking to your AI space ship.

The conversational voice agent is a @pipecat_ai pipeline built with:
- Nemotron Speech ASR
- Nemotron 3 Nano
- Magpie TTS
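
To make the shape of that pipeline concrete, here is a minimal sketch of three streaming stages chained together. It is not Pipecat code and does not call any of the real models: the function names (`microphone`, `asr`, `llm`, `tts`, `run_pipeline`) and the fake text "audio" are all assumptions for illustration.

```python
import asyncio
from typing import AsyncIterator, List


async def microphone() -> AsyncIterator[str]:
    # Stand-in for streamed audio: each item represents one finished utterance.
    for utterance in ["  Set course for the nearest port  "]:
        yield utterance


async def asr(audio: AsyncIterator[str]) -> AsyncIterator[str]:
    # Hypothetical speech-to-text stage (the post uses Nemotron Speech ASR here).
    async for chunk in audio:
        yield chunk.strip().lower()


async def llm(transcripts: AsyncIterator[str]) -> AsyncIterator[str]:
    # Hypothetical LLM stage (the post uses Nemotron 3 Nano here).
    async for text in transcripts:
        yield f"Acknowledged: {text}."


async def tts(replies: AsyncIterator[str]) -> AsyncIterator[bytes]:
    # Hypothetical text-to-speech stage; UTF-8 bytes stand in for audio frames.
    async for reply in replies:
        yield reply.encode("utf-8")


async def run_pipeline() -> List[bytes]:
    # Each stage consumes the previous stage's stream, so data flows through
    # ASR -> LLM -> TTS one item at a time.
    return [frame async for frame in tts(llm(asr(microphone())))]
```

Calling `asyncio.run(run_pipeline())` returns the list of fake audio frames. A real pipeline would stream partial results between stages rather than whole utterances.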

The Nemotron 3 Nano voice agent delegates the long-running, agent-loop tasks to bigger models in the cloud. You can see it start tasks in the video. It has tool calls to start, steer, query into, and stop tasks. It can run multiple tasks at once.
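
One way to picture the start/steer/query/stop tool surface is a small task manager built on `asyncio`. This is only a sketch under assumptions: `TaskManager`, `ManagedTask`, the `wait` helper, and the fake three-step loop in `_run` are hypothetical names, and the real game hands the loop to a cloud model instead.

```python
import asyncio
import contextlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ManagedTask:
    goal: str
    status: str = "running"
    notes: List[str] = field(default_factory=list)  # pending steering messages
    log: List[str] = field(default_factory=list)    # progress the agent can query
    handle: Optional[asyncio.Task] = None


class TaskManager:
    def __init__(self) -> None:
        self._tasks: Dict[str, ManagedTask] = {}
        self._next_id = 1

    def start(self, goal: str) -> str:
        task_id = f"task-{self._next_id}"
        self._next_id += 1
        task = ManagedTask(goal=goal)
        task.handle = asyncio.create_task(self._run(task))
        self._tasks[task_id] = task
        return task_id

    def steer(self, task_id: str, note: str) -> None:
        self._tasks[task_id].notes.append(note)

    def query(self, task_id: str) -> dict:
        task = self._tasks[task_id]
        return {"goal": task.goal, "status": task.status, "log": list(task.log)}

    async def stop(self, task_id: str) -> None:
        task = self._tasks[task_id]
        if task.handle is not None and not task.handle.done():
            task.handle.cancel()
            with contextlib.suppress(asyncio.CancelledError):
                await task.handle
            task.status = "stopped"

    async def wait(self, task_id: str) -> None:
        # Helper for the demo: block until the task finishes or is cancelled.
        with contextlib.suppress(asyncio.CancelledError):
            await self._tasks[task_id].handle

    async def _run(self, task: ManagedTask) -> None:
        # Stand-in for a long-running agent loop (assumed: three fake steps).
        for step in range(1, 4):
            await asyncio.sleep(0)  # yield so other tasks and tool calls can run
            while task.notes:
                task.log.append(f"steered: {task.notes.pop(0)}")
            task.log.append(f"step {step} toward {task.goal!r}")
        task.status = "done"


async def demo():
    manager = TaskManager()
    a = manager.start("trade run")
    b = manager.start("scout sector 12")  # runs alongside the first task
    manager.steer(a, "avoid pirates")
    await manager.stop(b)
    await manager.wait(a)
    return manager.query(a), manager.query(b)
```

Because each task is its own `asyncio.Task`, the voice agent's turn is never blocked while work is in flight; the tool calls only read or mutate small bits of shared state.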

The voice control of the user interface is a separate mini-agent running in parallel with the conversational voice agent.

The UI agent's only job is to send UI update triggers to the client. To do this well, it has to track both the conversation and the world state (what ships are where, etc). You can see in the video that it "figures out" what I mean when I say "two of our corporation ships are close together, zoom into that area on the map." The prompt for the UI agent is mostly a large number of few-shot examples across various categories of "user intent" relating to the visual interface.
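
A prompt like that could be assembled roughly as follows. This is an illustrative sketch, not the game's actual prompt: the intent categories, the JSON trigger format, and the names `FEW_SHOT` and `build_ui_prompt` are all assumptions.

```python
import json
from typing import Dict, List

# Hypothetical few-shot examples, grouped by category of user intent.
FEW_SHOT = {
    "map_zoom": [
        ("zoom in on sector 12", {"action": "map.zoom", "target": "sector-12"}),
    ],
    "panel_open": [
        ("show me my cargo", {"action": "panel.open", "target": "cargo"}),
    ],
}


def build_ui_prompt(world_state: Dict, conversation: List[str]) -> str:
    lines = ["Emit exactly one JSON UI trigger for the latest user request.", ""]
    for category, examples in FEW_SHOT.items():
        lines.append(f"## {category}")
        for utterance, trigger in examples:
            lines.append(f"User: {utterance}")
            lines.append(f"Trigger: {json.dumps(trigger)}")
        lines.append("")
    # The UI agent needs both the world state and the recent conversation
    # to resolve references like "that area on the map".
    lines.append("World state: " + json.dumps(world_state, sort_keys=True))
    lines.append("Recent conversation:")
    lines.extend(f"- {turn}" for turn in conversation[-5:])
    lines.append("Trigger:")
    return "\n".join(lines)
```

For example, `build_ui_prompt({"ships": {"alpha": "sector-3", "beta": "sector-3"}}, ["user: zoom into where our ships are close together"])` yields a prompt that ends with `Trigger:`, leaving the model to complete the JSON.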

The voice agent does automatic, non-blocking context compaction, so there's no limit on how long you can stay connected and talk to it. It can also progressively load skills from markdown files.
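
Here is one way non-blocking compaction could work, as a sketch under assumptions rather than the project's actual implementation. `CompactingContext`, its thresholds, and the fake `_summarize` call are hypothetical. A real version would ask an LLM for the summary and count tokens rather than messages.

```python
import asyncio
from typing import List, Optional


class CompactingContext:
    def __init__(self, max_messages: int = 6, keep_recent: int = 2) -> None:
        self.messages: List[str] = []
        self.max_messages = max_messages
        self.keep_recent = keep_recent
        self._compaction: Optional[asyncio.Task] = None

    def add(self, message: str) -> None:
        # Appending never waits on compaction, so the conversation keeps flowing.
        self.messages.append(message)
        over_limit = len(self.messages) > self.max_messages
        if over_limit and self._compaction is None:
            cutoff = len(self.messages) - self.keep_recent
            self._compaction = asyncio.create_task(self._compact(cutoff))

    async def _summarize(self, old: List[str]) -> str:
        await asyncio.sleep(0.01)  # stand-in for a slow LLM summarization call
        return f"[summary of {len(old)} earlier messages]"

    async def _compact(self, cutoff: int) -> None:
        summary = await self._summarize(self.messages[:cutoff])
        # Messages added while summarizing sit after `cutoff`, so they survive.
        self.messages = [summary] + self.messages[cutoff:]
        self._compaction = None

    async def wait_idle(self) -> None:
        # Helper for the demo: wait for any in-flight compaction to finish.
        if self._compaction is not None:
            await self._compaction


async def demo() -> List[str]:
    ctx = CompactingContext(max_messages=4, keep_recent=2)
    for i in range(5):
        ctx.add(f"turn {i}")  # the fifth message triggers background compaction
    ctx.add("turn 5")         # arrives while the summary is still being written
    await ctx.wait_idle()
    return ctx.messages
```

The key design choice is that only the prefix up to `cutoff` is replaced, so nothing said during summarization is lost.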

The game is multi-player. At the end of the video you can see me send a broadcast message to other players. We have NPCs running around in the game, too, also powered by Nemotron 3 Nano! The NPCs will respond to your messages and react to what you do in the game. (They're fully autonomous LLM inference loops with access to all the same tools that your voice and task agents have.)

Importantly (I think), you can see the voice agent make a mistake in the middle of the video. The main idea of this project is to have an open-ended canvas to explore the leading edge of what today's LLMs can do. And in the context of a game, having the LLM sometimes surprise you with what it can do, and sometimes surprise you by "failing" to do what you thought it would, is actually a lot of fun. The whole video is one recording, with no cuts or edits.

The voice agent has a system prompt that's about 7,000 tokens. With the Nemotron Speech, LLM, and Magpie models on the DGX Spark, we have a voice-to-voice latency of about 900ms.

In the video I'm using vLLM and the full-weights (BF16) version of Nemotron 3 Nano. You can use smaller quants, too. For Nemotron Speech ASR and Magpie I wrote custom WebSocket streaming servers. Here's a DGX Spark Docker container with all three models.

https://t.co/WurMyJ59Ye

Here's the complete source code for the game. Credit for the beautiful UI design goes to @JonPTaylor who did all the React client work.

https://t.co/M842Jz1sQT

Tagging @ctnzr and #NVIDIAGTC.

I did a talk a couple of weeks ago about the agent patterns in the game, and how they're similar to patterns we use in coding agent harnesses and in voice agents for enterprise applications.

Space Machine Sandboxes[1]

  1. https://www.youtube.com/watch?v=HnYafj9h-48