Alternate views of the same speaker. This would work especially well if it's a long shot of the speaker and they turn their head, so the AI can see them from different angles. As for the room/environment, if there's a built-in depth map option we can easily darken/blur the background, which would help with the inevitable problems with generating background visuals.
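
For what it's worth, here's a rough sketch of the darken/blur idea, assuming we can get a per-pixel depth map out of whatever tool we end up using. Everything here is hypothetical: the function names (`box_blur`, `soften_background`), the `threshold`/`darken`/`radius` parameters, and the convention that depth is normalized to [0, 1] with larger values meaning farther away. A simple box blur stands in for whatever blur the real pipeline offers.

```python
import numpy as np


def box_blur(image: np.ndarray, radius: int) -> np.ndarray:
    """Blur an H x W x C float image with a (2*radius+1)^2 box filter."""
    if radius <= 0:
        return image.copy()
    k = 2 * radius + 1
    # Edge-pad so the output keeps the input size.
    padded = np.pad(image, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    out = np.zeros_like(image, dtype=np.float64)
    h, w = image.shape[:2]
    # Sum every shifted copy of the image inside the k x k window.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def soften_background(image, depth, threshold=0.5, darken=0.4, radius=2):
    """Blur and darken pixels whose depth exceeds `threshold`.

    Assumes `image` is H x W x 3 floats in [0, 1] and `depth` is H x W
    floats in [0, 1], where larger means farther from the camera.
    """
    image = np.asarray(image, dtype=np.float64)
    depth = np.asarray(depth, dtype=np.float64)
    # H x W x 1 boolean mask so it broadcasts across the color channels.
    background = (depth > threshold)[..., None]
    processed = box_blur(image, radius) * darken
    # Keep the speaker (near pixels) untouched, swap in the softened background.
    return np.where(background, processed, image)
```

A hard threshold like this will leave a visible edge around the speaker; feathering the mask (for example, blending by depth instead of cutting at one value) would probably look better, but that depends on how clean the depth map actually is.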