From Text to World: Generating Fantasy Art in Real-Time

When the AI Game Master describes a torch-lit dungeon, Embertold can show you that dungeon. Here's how real-time image generation works behind the scenes.
The Challenge
Generating images during gameplay has unique constraints:
- Speed — Players are waiting. Every second counts.
- Relevance — The image must match what was just narrated.
- Style consistency — Images across an adventure should feel like they belong together.
- Cost efficiency — We can't generate a new image for every sentence.
Satisfying all four at once took careful engineering, because they pull against each other: generating more images improves relevance but costs time and money.
Prompt Engineering
The AI Game Master doesn't just send the narration text to the image generator. It crafts a specific image prompt designed for visual generation:
- Concrete over abstract. "A stone bridge over a rushing river in a pine forest at sunset" produces a better image than "a beautiful natural scene."
- Composition cues. We include hints about framing: "wide shot," "looking up at," "seen from a distance."
- Style modifiers. Each adventure can have a visual style that's appended to every prompt, ensuring consistency. A gritty dark fantasy adventure gets different modifiers than a whimsical fairy tale.
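The three rules above can be sketched as a small prompt builder. This is a minimal illustration, not Embertold's actual prompt template: the `SceneDescription` structure, its field names, and the comma-joined output format are all hypothetical. The code cannot enforce "concrete over abstract". It assumes the Game Master has already filled the fields with concrete visual language.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SceneDescription:
    """Hypothetical structure for what the Game Master wants drawn.

    The fields are expected to hold concrete, visual language already
    ("a stone bridge over a rushing river"), not abstractions ("a beautiful scene").
    """
    subject: str
    setting: str
    composition: str = ""  # framing cue, e.g. "wide shot" or "seen from a distance"


def build_image_prompt(scene: SceneDescription, style_modifiers: List[str]) -> str:
    """Combine concrete content, an optional framing cue, and the adventure's style modifiers."""
    parts = [f"{scene.subject} {scene.setting}".strip()]
    if scene.composition:
        parts.append(scene.composition)
    # The same modifiers go on every prompt in an adventure, which keeps images consistent.
    parts.extend(style_modifiers)
    return ", ".join(parts)


if __name__ == "__main__":
    bridge = SceneDescription(
        subject="a stone bridge over a rushing river",
        setting="in a pine forest at sunset",
        composition="wide shot",
    )
    print(build_image_prompt(bridge, ["gritty dark fantasy", "muted earth tones"]))
```

Putting the scene content first and the style modifiers last keeps the subject of the image prominent while still stamping every prompt with the same look.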
The Style System
Embertold supports configurable image styles. Each style includes prompt modifiers that define the visual aesthetic:
- Art style (painterly, realistic, stylized)
- Color palette tendencies
- Lighting preferences
- Level of detail
This means two adventures in the same universe can look completely different based on their style configuration. A horror-themed adventure feels darker and grittier than an epic heroic quest.
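One way to model a style configuration is as a small immutable record whose fields mirror the four aspects listed above. The `ImageStyle` class, the `apply_style` helper, and both presets are hypothetical examples for illustration, not Embertold's real style definitions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ImageStyle:
    """One configurable visual style; field names are illustrative assumptions."""
    art_style: str  # painterly, realistic, stylized...
    palette: str    # color palette tendencies
    lighting: str   # lighting preferences
    detail: str     # level of detail

    def modifiers(self) -> List[str]:
        """Prompt modifiers this style contributes, in a fixed order."""
        return [self.art_style, self.palette, self.lighting, self.detail]


# Two hypothetical presets: same universe, very different look.
DARK_HORROR = ImageStyle(
    art_style="painterly dark fantasy illustration",
    palette="desaturated greens and blacks",
    lighting="single flickering light source, deep shadows",
    detail="highly detailed, grimy textures",
)
HEROIC_EPIC = ImageStyle(
    art_style="stylized epic fantasy concept art",
    palette="vivid golds and blues",
    lighting="bright, dramatic rim lighting",
    detail="clean, moderate detail",
)


def apply_style(base_prompt: str, style: ImageStyle) -> str:
    """Append a style's modifiers to a scene prompt."""
    return ", ".join([base_prompt, *style.modifiers()])


if __name__ == "__main__":
    scene = "a ruined chapel on a hill, wide shot"
    print(apply_style(scene, DARK_HORROR))
    print(apply_style(scene, HEROIC_EPIC))
```

Making the style frozen means an adventure's look cannot drift mid-session; changing it requires choosing a different style object.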
When Images Are Generated
The AI is selective about when to generate images. Not every message gets one — that would be overwhelming and expensive. Images are generated for:
- Major location changes — Entering a new area
- Dramatic reveals — A hidden chamber, a breathtaking vista
- Key narrative moments — The villain's lair, the final battle, a moment of triumph
The AI judges the narrative importance of each moment and only triggers image generation when it will have real impact.
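The post describes this decision as a judgment the AI makes itself. As a rough, hypothetical stand-in, the rule-based check below scores each moment by the categories listed above and adds a short cooldown so images don't arrive on every message. The `MomentType` labels, the weights, the threshold, and `min_gap` are all invented for illustration.

```python
from enum import Enum


class MomentType(Enum):
    """Hypothetical labels for the kinds of moments listed above."""
    ROUTINE = "routine"                  # ordinary narration, dialogue, dice rolls
    LOCATION_CHANGE = "location_change"  # entering a new area
    DRAMATIC_REVEAL = "dramatic_reveal"  # hidden chamber, breathtaking vista
    KEY_NARRATIVE = "key_narrative"      # villain's lair, final battle, triumph


# Invented weights standing in for the model's own sense of narrative importance.
BASE_IMPORTANCE = {
    MomentType.ROUTINE: 0.1,
    MomentType.LOCATION_CHANGE: 0.6,
    MomentType.DRAMATIC_REVEAL: 0.8,
    MomentType.KEY_NARRATIVE: 0.95,
}


def should_generate_image(moment: MomentType, messages_since_last_image: int,
                          threshold: float = 0.5, min_gap: int = 3) -> bool:
    """Decide whether a moment is worth an image.

    Images are skipped within `min_gap` messages of the previous one so they
    don't pile up, except for key narrative beats, which bypass the cooldown.
    """
    if messages_since_last_image < min_gap and moment is not MomentType.KEY_NARRATIVE:
        return False
    return BASE_IMPORTANCE[moment] >= threshold


if __name__ == "__main__":
    for moment, gap in [(MomentType.ROUTINE, 10), (MomentType.LOCATION_CHANGE, 1),
                        (MomentType.DRAMATIC_REVEAL, 5), (MomentType.KEY_NARRATIVE, 0)]:
        print(moment.value, gap, should_generate_image(moment, gap))
```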
Caching at Scale
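As covered in our caching article, generated images are cached using similarity matching: a new request can reuse a stored image whose prompt is close enough, rather than requiring an exact match. A "torch-lit stone dungeon" generated for one player benefits all future players encountering similar scenes.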
Over time, common fantasy environments (taverns, forests, dungeons, cities) accumulate rich caches of images. The system gets faster and cheaper with every session played.
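A minimal sketch of similarity-matched caching, assuming a bag-of-words cosine similarity in place of whatever embedding model the real cache uses, and a linear scan in place of a vector index. Restricting matches to the same style name is also an assumption, made so a cached image never breaks an adventure's visual consistency. `SimilarityImageCache`, `get_or_generate`, and the 0.8 threshold are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


def vectorize(prompt: str) -> Counter:
    """Bag-of-words term counts; a crude stand-in for a text-embedding model."""
    return Counter(re.findall(r"[a-z]+", prompt.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors (0.0 if either is empty)."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class SimilarityImageCache:
    """Hypothetical shared cache: reuse an image when a new prompt is close enough to an old one."""
    threshold: float = 0.8
    # Each entry: (style name, prompt vector, image reference)
    entries: List[Tuple[str, Counter, str]] = field(default_factory=list)

    def add(self, style: str, prompt: str, image_ref: str) -> None:
        self.entries.append((style, vectorize(prompt), image_ref))

    def lookup(self, style: str, prompt: str) -> Optional[str]:
        """Return the most similar cached image at or above the threshold, or None."""
        query = vectorize(prompt)
        best_ref, best_score = None, self.threshold
        for entry_style, vector, image_ref in self.entries:
            if entry_style != style:
                continue  # only reuse images made in the same visual style
            score = cosine_similarity(query, vector)
            if score >= best_score:
                best_ref, best_score = image_ref, score
        return best_ref

    def get_or_generate(self, style: str, prompt: str,
                        generate: Callable[[str], str]) -> str:
        """Serve from cache when possible; otherwise generate once and store it for everyone."""
        cached = self.lookup(style, prompt)
        if cached is not None:
            return cached
        image_ref = generate(prompt)
        self.add(style, prompt, image_ref)
        return image_ref


if __name__ == "__main__":
    cache = SimilarityImageCache()
    calls = []

    def fake_generate(prompt: str) -> str:
        # Stand-in for an expensive image-generation call.
        calls.append(prompt)
        return f"image-{len(calls)}"

    first = cache.get_or_generate("dark fantasy", "torch-lit stone dungeon", fake_generate)
    second = cache.get_or_generate("dark fantasy", "dark torch-lit stone dungeon", fake_generate)
    print(first, second, len(calls))
```

The threshold is the key trade-off: lowering it raises the hit rate (faster and cheaper) but risks showing an image that only loosely matches the narration.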
The Results
The combination of crafted prompts, consistent styles, selective generation, and aggressive caching means that Embertold can show you your adventure — in real-time, with visual coherence, without breaking the bank.
It's not perfect yet. We're constantly refining prompts, adjusting style modifiers, and expanding the cache. But every day, the visual experience gets richer. From text, a world emerges.