Yesterday marked an interesting development in the gaming world as Microsoft Xbox unveiled Muse, its new generative AI model focused on "gameplay ideation." The debut came with an open-access article on Nature.com, a detailed blog post, and a YouTube video introduction. You might be wondering what exactly "gameplay ideation" entails. Well, according to Microsoft, it means generating game visuals and controller actions. Despite the intriguing name, however, Muse's practical applications are currently quite limited, and it doesn't bypass the traditional game development process.
Nevertheless, some of the numbers behind it are worth a look. The model was trained extensively on high-powered Nvidia H100 GPUs, and it took approximately one million training iterations before it could stretch a mere second of real gameplay into nine additional seconds of simulated gameplay. The training data came mainly from recordings of existing multiplayer gameplay sessions.
Microsoft didn't just run this on a single computer. It needed a massive setup, a cluster of 100 Nvidia H100 GPUs, which made for a costly and power-intensive endeavor. Even so, all that hardware only produced the extra nine seconds of gameplay at a resolution of 300×180 pixels.
One of the more curious demonstrations of Muse involved copying existing game props and enemies and replicating their behavior. That raises an obvious question, though: why spend all this technology and money to achieve what development tools can already do, such as spawning enemies or props?
Muse does succeed in some areas, maintaining object permanence and reproducing behaviors from the original game. But weighed against conventional game development processes, it feels like an expensive detour rather than a revolutionary path forward.
As Muse continues to develop, perhaps it will achieve more remarkable results. For now, though, it joins the ranks of many projects attempting to replicate gameplay through AI alone. It shows a degree of engine-like precision and object awareness, but the method is so inefficient that its appeal is hard to see. Even after exploring this topic extensively, it's still puzzling why anyone would choose this approach over established techniques.