On Tuesday, Microsoft Research Asia unveiled VASA-1, an AI model that can create a synchronized animated video of a person talking or singing from a single photo and an existing audio track. In the future, it could power virtual avatars that render locally and don’t require video feeds—or allow anyone with similar tools to take a photo of a person found online and make them appear to say whatever they want.
This is impressive and terrifying as hell… I’m pretty tech-savvy and have watched other AI videos, but if you presented this to me as real, I’d totally believe it! I can especially imagine it being spliced with B-roll footage and maybe “real” footage to make it even more believable. I’m floored…