What's so special about DuoQ? 

 
DuoQ is (as of the time of writing) the world's first FPS built entirely around conversational AI. Tala, our main character, talks to the player over real-time voice comms while simultaneously playing the game.
 
A few FPS games beat us to market on integrating this tech - see PUBG Ally and Arena Breakout: Infinite - but DuoQ is the first game designed around the AI from the beginning.
 
Accordingly, we saw two different directions for how we could best use this technology:

Option 1 - Focus on Combat 

 
Multiplayer games let you strategize with a stranger, performing tactical moves not possible as a solo player.  
An AI player lets us recreate that tactical experience in singleplayer.

Option 2 - Focus on Connection 

 
Multiplayer spaces - especially voice calls - are unique in their ability to bring different personalities together. 
 
An AI player allows conversations that are far less rigid than most games, while still following an authored narrative with twists and turns.

In the end, we incorporated both, but leaned heavily towards Connection.
 
The Combat approach seems fun, but if that's all the game is, we're simply making a watered-down multiplayer game. The Connection approach, on the other hand, lets us tell a story with a very interesting balance of player choice and authorship. Players freely choose what to say, and we use a massive bank of pre-written dialogue to respond naturally while progressing the story.

Ok, so how does the conversation actually work?

When the player presses their push-to-talk button, three things happen:

1. Transcription 

 
We use Whisper, OpenAI's fantastic open-source speech-recognition model, to transcribe the player's voice. 
 
This model runs locally on your GPU and, in testing, transcribes an average sentence of speech in under 25ms.
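If you want a feel for this step, here's a minimal sketch using the open-source `openai-whisper` Python package. The game's own integration lives inside the client, so treat this as purely illustrative - but the shape of the step is the same: load a model once, then transcribe each push-to-talk clip.

```python
# Minimal sketch of the transcription step using the open-source
# openai-whisper package. Illustrative only - not DuoQ's in-engine code.
import whisper

# Load a model once at startup; it runs locally, on the GPU when available.
model = whisper.load_model("base")

def transcribe_push_to_talk_clip(wav_path: str) -> str:
    """Transcribe one recorded push-to-talk clip into text."""
    result = model.transcribe(wav_path)
    return result["text"].strip()

print(transcribe_push_to_talk_clip("player_clip.wav"))
```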

2. Inference 

 
The transcription, along with some game state info, is sent to an LLM. 
 
The final game uses Google's Gemini 2.0 Flash model. On a typical network connection, inference for one sentence hovers around 400ms.
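Here's a rough sketch of that call using Google's `google-generativeai` Python SDK. The prompt structure and game-state fields below are invented for illustration - they are not DuoQ's actual schema.

```python
# Rough sketch of the inference step via Google's google-generativeai SDK.
# The prompt and game-state fields are hypothetical, not DuoQ's real schema.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-2.0-flash")

def infer_tala_intent(transcript: str, game_state: dict) -> str:
    prompt = (
        "You are Tala, the player's AI teammate in a tactical FPS.\n"
        f"Game state: {game_state}\n"
        f'The player just said: "{transcript}"\n'
        "Reply with the dialogue line to play and the goal Tala should pursue."
    )
    response = model.generate_content(prompt)
    return response.text

print(infer_tala_intent(
    "Push the left flank, I'll cover you",
    {"tala_health": 80, "location": "warehouse", "enemies_spotted": 2},
))
```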

3. Action Planning 

 
Back in the game client, our internal GOAP (goal-oriented action planning) framework turns the LLM output into in-game goals. 
 
This is a fairly traditional use of game AI and has negligible latency.
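Our framework is internal, but the core GOAP idea is standard: each action declares preconditions and effects, and a planner searches for a sequence of actions that satisfies the goal the LLM asked for. A stripped-down sketch, with invented actions:

```python
# Generic GOAP sketch: actions declare preconditions and effects, and a
# depth-limited forward search finds a sequence that satisfies the goal.
# DuoQ's internal framework (and its real actions/goals) will look different.
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    preconditions: frozenset  # facts that must already hold
    effects: frozenset        # facts that hold after the action runs

def plan(state: frozenset, goal: frozenset, actions: list, depth: int = 8):
    """Return a list of action names reaching the goal, or None."""
    if goal <= state:
        return []
    if depth == 0:
        return None
    for action in actions:
        # Only apply actions that are valid here and add something new.
        if action.preconditions <= state and not action.effects <= state:
            tail = plan(state | action.effects, goal, actions, depth - 1)
            if tail is not None:
                return [action.name] + tail
    return None

# Hypothetical actions for a goal like "lay down covering fire from the flank".
actions = [
    Action("move_to_left_flank", frozenset(), frozenset({"at_left_flank"})),
    Action("take_cover", frozenset({"at_left_flank"}), frozenset({"in_cover"})),
    Action("covering_fire", frozenset({"in_cover"}), frozenset({"player_covered"})),
]

print(plan(frozenset(), frozenset({"player_covered"}), actions))
# -> ['move_to_left_flank', 'take_cover', 'covering_fire']
```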

So, even on most min-spec computers, DuoQ has at most a 500ms delay between the player finishing speaking and hearing Tala's response: roughly 25ms of transcription, around 400ms of inference, and a negligible slice of action planning. That is technically worse than real voice chat, which typically aims for under 50ms of one-way latency, but it feels perceptibly close since Tala's responses are pre-recorded.
