KRAFTON AI · PUBG Ally × NVIDIA Nemotron

From Workflow-Based SLM to Autonomous Agent: Evolving PUBG Ally's Architecture

PUBG Ally is an AI teammate system introduced at GDC 2026. It evolved from a workflow-based SLM design into an autonomous agent architecture, as the language model's role expanded from response generation to tool-using decision-making. This post explains that architectural transition, the design of the agent loop, and the criteria behind choosing Nemotron for on-device deployment.

1. Overview

At GDC 2026, KRAFTON and NVIDIA co-presented "Building a Co-Playable Character: PUBG Ally, an AI Teammate Powered by NVIDIA ACE." The version shown at GDC used a workflow-based SLM design. Since then, Ally has evolved into an autonomous agent architecture, as the language model's role expanded from response generation to tool-using decision-making.

The sections below cover the background of that transition, the design of the agent loop, and the criteria behind choosing Nemotron for on-device deployment.

2. How the SLM Workflow Expanded

A brief look at how PUBG Ally's SLM workflow evolved.

The version shown at GDC used a workflow-based SLM design. Fast per-frame control was handled by the Behavior Tree, while the language model was responsible for speech and high-level steerable commands — a separation often described as a System 1 / System 2 structure. It worked well, but becoming a true teammate meant expanding what System 2 could do.
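The System 1 / System 2 division of labor can be sketched as a per-frame control loop that picks up asynchronous commands from a slower language-model worker. This is an illustrative sketch only; the command names and queue-based handoff are invented, not PUBG Ally's implementation.

```python
import queue
import threading
import time

# System 2 (the language model) runs off the frame loop and posts
# high-level steerable commands; command names here are invented.
command_queue: queue.Queue = queue.Queue()

def system2_worker() -> None:
    # Stand-in for slow SLM inference producing occasional commands.
    for cmd in ["move_to_cover", "revive_teammate"]:
        time.sleep(0.01)
        command_queue.put(cmd)

def behavior_tree_tick(current_command: str) -> str:
    # System 1: cheap per-frame control, conditioned on the latest command.
    return f"executing:{current_command}"

threading.Thread(target=system2_worker, daemon=True).start()

current = "follow_player"            # default behavior until a command arrives
frames = []
for _ in range(5):                   # a few simulated frames
    try:
        current = command_queue.get_nowait()  # non-blocking: never stalls a frame
    except queue.Empty:
        pass
    frames.append(behavior_tree_tick(current))
    time.sleep(0.02)
```

The key property is that System 1 never blocks on System 2: a frame ticks with whatever the latest command is, even if the model is still thinking.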

As Ally expanded to support a true teammate experience, three capabilities were needed: remembering people and events across a session, analyzing the match to generate strategy, and speaking up proactively in response to game events.

As these capabilities were added, the workflow became more structured. Rather than handling everything in a single prompt, the system separated them into dedicated components with different roles and invocation patterns.

User voice and game events feed four components:

- Action Agent (always invoked): understand · decide · respond
- Memory Agent (invoked as needed): summarize · extract · save
- Strategic Agent (runs independently): analyze · generate strategy
- Proactive Agent: turns game events into proactive speech triggers

The most practical reason for splitting roles was prompt space and latency. A single prompt that handles everything quickly runs into context limits. Separate components mean separate token budgets and call frequencies to optimize independently.
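One way to picture "separate token budgets and call frequencies" is a small registry that records each component's budget and invocation pattern, and a selector that decides which components run for a given event. All numbers and component names below are illustrative, not PUBG Ally's actual settings.

```python
from dataclasses import dataclass

@dataclass
class ComponentBudget:
    max_prompt_tokens: int   # independent token budget per component
    invoke: str              # "always", "sometimes", or "independent"

# Illustrative values only.
BUDGETS = {
    "action_agent":    ComponentBudget(2048, "always"),
    "memory_agent":    ComponentBudget(1024, "sometimes"),
    "strategic_agent": ComponentBudget(4096, "independent"),
}

def components_for_event(is_memory_worthy: bool) -> list[str]:
    """Pick which components run for one incoming event; 'independent'
    components run on their own schedule, not per event."""
    selected = [n for n, b in BUDGETS.items() if b.invoke == "always"]
    if is_memory_worthy:
        selected += [n for n, b in BUDGETS.items() if b.invoke == "sometimes"]
    return selected
```

Because each component's prompt is bounded independently, the always-on component can stay small and fast while the independent one can afford a larger context.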

This was where the GDC talk left off. The workflow neatly defined who handles what. But how each component reasons and acts — that part still relied on fixed prompts and predefined flows.

3. When the Workflow-Based SLM Became an Autonomous Agent

The key shift was in the language model's role. In the workflow design, the system called each component in a predefined order, and the language model generated text for a given prompt — orchestration remained under system control.

As the SLM's role expanded, it began to choose which tool to call, read the result, and decide the next step on its own. Rather than receiving instructions from the system, the model became the decision-maker. That is the core transition.

Core Capabilities and Agent Loop

Ally's tools span speech, perception, memory, and action. A few examples of what a tool call looks like in practice:

- observe — get_game_overview() → phase 4 · 1m 20s left · alive teams: 8 · HP: 72 · zone: safe
- query — lookup_game_knowledge("MP5K") → high fire rate · easy recoil control · recent nerf applied
- memory — update_memory() → teammate: Minjun · style: aggressive push · weapon: AKM

For each situation, the model selects only the tools it needs, reads the results, and decides the next step — repeating this loop until the response is complete. That requires high tool-calling accuracy, strong result interpretation, and the ability to maintain consistent decision-making across multiple steps.
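The select-call-read-decide loop can be sketched as follows. The tool names mirror the examples above, but their bodies, return formats, and the scripted "model" are stand-ins so the loop is runnable; in Ally the decision at each step comes from the SLM.

```python
import json

# Hypothetical tool registry; real tools and return formats are internal.
def get_game_overview() -> dict:
    return {"phase": 4, "time_left_s": 80, "alive_teams": 8, "hp": 72, "zone": "safe"}

def lookup_game_knowledge(topic: str) -> str:
    kb = {"MP5K": "high fire rate, easy recoil control, recent nerf applied"}
    return kb.get(topic, "no entry")

TOOLS = {"get_game_overview": get_game_overview,
         "lookup_game_knowledge": lookup_game_knowledge}

def run_agent_loop(model_step, user_input: str, max_steps: int = 5) -> str:
    """Repeat: model decides -> tool runs -> result fed back, until final answer."""
    transcript = [{"role": "user", "content": user_input}]
    for _ in range(max_steps):                  # bound the loop defensively
        decision = model_step(transcript)       # model picks a tool or answers
        if decision["type"] == "final":
            return decision["content"]
        result = TOOLS[decision["name"]](*decision.get("args", []))
        transcript.append({"role": "tool",
                           "name": decision["name"],
                           "content": json.dumps(result)})
    return "max steps reached"

# A scripted stand-in for the SLM, so the loop runs without a model.
def scripted_model(transcript):
    if len(transcript) == 1:
        return {"type": "tool", "name": "lookup_game_knowledge", "args": ["MP5K"]}
    return {"type": "final", "content": "MP5K: high fire rate, easy to control."}
```

The `max_steps` bound matters in practice: a model that keeps calling tools without converging must be cut off rather than allowed to stall a response.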

4. Model Selection for On-Device Deployment: Nemotron

The criteria for choosing an on-device SLM were clear: multi-turn tool-calling accuracy strong enough to sustain the agent loop, a path to compress a larger model through knowledge distillation, and clean integration with the game engine on player hardware.

The NVIDIA Nemotron (Minitron) family and the NVIDIA stack satisfied all three.

Knowledge Distillation within the Family

We compared two training approaches:

LLM (teacher, hard labels) → trajectory data → Minitron 8B (intermediate, soft labels) → knowledge distillation → Minitron 2B (on-device student) → quantize → Q4_K_M GGUF → on-device deploy

When teacher and student share the same architecture family, logit distribution transfer is far more natural. Minitron 8B and Minitron 2B share the same tokenizer, attention structure, and training corpus, so the meaning of soft labels is preserved across model boundaries. This was the decisive advantage over other 2B candidates in the KD pipeline.
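For reference, soft-label transfer in its standard (Hinton-style) form is a KL divergence between temperature-softened teacher and student distributions over the same vocabulary. This is a generic sketch of that technique, not KRAFTON's exact training recipe; the shared tokenizer is what makes the position-by-position comparison meaningful.

```python
import numpy as np

def softmax(logits: np.ndarray, temperature: float) -> np.ndarray:
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)       # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, temperature: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened distributions.

    Because teacher and student share one tokenizer, the logit vectors
    index the same vocabulary and can be compared position-by-position.
    """
    p = softmax(np.asarray(teacher_logits, dtype=float), temperature)
    q = softmax(np.asarray(student_logits, dtype=float), temperature)
    kl = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    return float(np.mean(kl) * temperature**2)  # standard T^2 rescaling
```

With mismatched tokenizers, the two distributions would range over different vocabularies and this loss could not be computed directly — which is the concrete content of "the meaning of soft labels is preserved."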

Practical Advantages of the NVIDIA Stack

IGI (In-Game Inferencing) SDK. A single plugin API allows seamless switching between local execution (GGML/CUDA-based) and cloud execution (NIM). During early development, we validated the architecture on an LLM, then switched to on-device Minitron with a single API change. Without this seamless switching, the entire strategy of "design on LLM, deploy on SLM" would have been impractical.

CUDA in Graphics (CiG). Runs CUDA computation and graphics rendering in the same context. Since Ally repeats tool calls 3–5 times per cycle, the benefit of reduced GPU context switching compounds over repeated calls.

Two On-Device SLM Deployment Patterns

This isn't the first time KRAFTON has deployed an on-device SLM. inZOI's "Smart Zoi" used Minitron 0.5B for character reasoning tasks such as action selection and daily reflection. The two deployments solve different problems, but the contrast illustrates how much the demands on an on-device model can vary.

|  | inZOI (Smart Zoi) | PUBG Ally |
| --- | --- | --- |
| Model | Minitron 0.5B | Minitron 2B (Q4_K_M) |
| Task | Single inference (one-shot) | Iterative multi-step reasoning |
| Tool use | None | Enabled |
| Frequency | Once per event | Iterative per event |

Both run on the Nemotron family and the NVIDIA IGI stack — powering in-game AI with an on-device SLM.

5. Summary

The key architectural shift in PUBG Ally was not the replacement of one system with another, but the expansion of the language model's role — from a workflow component responsible for response generation to an agent capable of selecting tools, interpreting results, and deciding what to do next. As that role expanded, so did the requirements placed on the model.

To support this agent loop on player hardware, we chose Nemotron. Multi-turn tool-calling accuracy, Knowledge Distillation within the same architecture family, and game engine integration via IGI SDK and CiG — Nemotron and the NVIDIA runtime stack met all three conditions.

PUBG Ally is set to launch in summer 2026 through PUBG's new Arcade Mode beta.