ImmerseGen

Agent-Guided Immersive World Generation with Alpha-Textured Proxies

Jinyan Yuan1*    Bangbang Yang1*    Keke Wang1    Panwang Pan1    Lin Ma1   
Xuehai Zhang1    Xiao Liu1    Zhaopeng Cui2    Yuewen Ma1   
1PICO, Bytedance    2State Key Laboratory of CAD&CG, Zhejiang University
*Equal Contribution   
[Teaser figure]

ImmerseGen creates panoramic 3D worlds from input prompts by generating compact alpha-textured proxies through agent-guided asset design and arrangement. This alleviates the reliance on rich, complex assets while ensuring diversity and realism, and is tailored for immersive VR experiences.

Video
Gallery
*Screen recordings from a VR headset

Island

Desert

Lake

Anime Room

Forest

Futuristic City

Framework
[Framework overview figure]
Given a user's textual input, our method first retrieves a base terrain and applies terrain-conditioned texturing to synthesize an RGBA terrain texture and a skybox aligned with the base mesh, forming the base world. Next, we enrich the environment with lightweight assets: VLM-based asset agents select appropriate templates, design detailed asset prompts, and determine asset arrangement within the scene. Each placed asset is then instantiated as an alpha-textured asset through context-aware RGBA texture synthesis. Finally, we enhance multi-modal immersion by incorporating dynamic visual effects and synthesized ambient sound based on the generated scene.
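The sketch below is a minimal, purely illustrative outline of how these pipeline stages compose. All function and class names (e.g., retrieve_base_terrain, plan_assets, synthesize_rgba_texture) are hypothetical placeholders standing in for the generative models and the VLM agent; they are not part of a released API.

```python
# Hypothetical sketch of the ImmerseGen pipeline stages described above.
# Every stage below is stubbed with a placeholder; real components would be
# generative texturing models and a VLM-based asset agent.
from dataclasses import dataclass, field


@dataclass
class Scene:
    terrain: str = ""                       # retrieved base terrain proxy
    terrain_rgba: str = ""                  # terrain-conditioned RGBA texture
    skybox: str = ""                        # panoramic skybox aligned with the mesh
    assets: list = field(default_factory=list)
    effects: list = field(default_factory=list)
    ambient_audio: str = ""


# --- stubbed stages (placeholders, not real model calls) ---
def retrieve_base_terrain(prompt):
    return f"terrain<{prompt}>"

def texture_terrain_and_skybox(prompt, terrain):
    return f"rgba<{terrain}>", f"skybox<{prompt}>"

def plan_assets(prompt, scene):
    # VLM agent output: (template, detailed asset prompt, placement pose)
    return [("tree_proxy", f"a windswept tree for: {prompt}", (0.0, 0.0, 0.0))]

def synthesize_rgba_texture(asset_prompt, scene, template):
    return f"rgba<{template}:{asset_prompt}>"

def add_effects_and_audio(prompt, scene):
    return ["drifting clouds", "water ripples"], f"ambience<{prompt}>"


def immersegen(prompt: str) -> Scene:
    scene = Scene()

    # 1. Base world: retrieve a terrain, then terrain-conditioned texturing
    #    yields an RGBA terrain texture and an aligned skybox.
    scene.terrain = retrieve_base_terrain(prompt)
    scene.terrain_rgba, scene.skybox = texture_terrain_and_skybox(prompt, scene.terrain)

    # 2. VLM-based asset agents select templates, write asset prompts,
    #    and decide placements within the scene.
    for template, asset_prompt, pose in plan_assets(prompt, scene):
        # 3. Each placed asset becomes an alpha-textured proxy via
        #    context-aware RGBA texture synthesis.
        rgba = synthesize_rgba_texture(asset_prompt, scene, template)
        scene.assets.append((template, rgba, pose))

    # 4. Multi-modal immersion: dynamic visual effects and ambient sound.
    scene.effects, scene.ambient_audio = add_effects_and_audio(prompt, scene)
    return scene


if __name__ == "__main__":
    print(immersegen("a tranquil island at sunset"))
```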