Deepfake Mandarin-speaking Taylor Swift goes viral in China, prompting mixed reactions

Taylor Swift can do everything, including speak Chinese. Except she can’t: the Mandarin-language videos are the product of artificial intelligence technology from HeyGen, a company founded in Shenzhen but now based in Los Angeles.

Taylor Swift is a woman of many talents. She writes her own songs, sells out stadiums around the world, and is about to make her feature directorial debut with a script she developed herself. But…can she also speak fluent Mandarin?

“I’ve been to many places lately, such as Italy, France, and Japan,” she says in Chinese in one clip, which appears to be from a late-night talk show. In another video, the American singer-songwriter flaunts her Mandarin skills again, talking about songs that got “left behind” and kept her thinking, “What could have happened? I wish people could hear this.”

Alas. Despite the face, the voice, and seemingly flawless lip movements, it wasn’t really Taylor Swift. The videos, which started popping up last week on Chinese social media platforms like Weibo and Douyin, featured an artificial-intelligence-generated doppelgänger.

โ€œThis is really impressive,โ€ a Weibo user commented on a video where the fake Swift talks about her recent travels, which has been viewed more than 6 million times since it was shared on October 21. โ€œWhatever technology this video uses is going to put translators and voice actors out of a job,โ€ said another person.

But not everyone was enthused about the deepfake. Some expressed dismay at the potential of such realistic AI manipulation, with one writing, “This is actually scary.” Others raised questions about how best to prevent malicious misuse of the deceptive technology. “I can see this being used nefariously in myriad ways, and it will cause trouble if awful people like scammers get their hands on it.”

Built on a form of artificial intelligence called deep learning, deepfakes are computer-generated videos, photos, or audio clips convincing enough to depict events that never actually happened. In a fad that took Chinese social media by storm in May, fan-made tracks using deepfaked vocals from major recording artists racked up millions of listens and even sparked an unlikely revival of interest in Stefanie Sun (孙燕姿 Sūn Yànzī), a Singaporean singer-songwriter who hasn’t released an album since 2018.

But the work that went into the deepfake Taylor Swift videos is even more complicated, as it involved translation, voice cloning, and lip syncing. According to local Chinese media, the tool behind these insanely realistic clips is a product developed by HeyGen, a Chinese startup that operated under the names Surreal and Movio before rebranding earlier this year.

HeyGen was founded in November 2020 by Joshua Xu (徐卓 Xú Zhuō), who worked for six years as an engineer at Snap. HeyGen, which launched in Shenzhen and is now headquartered in Los Angeles, immediately attracted interest from investors eager to back the next big thing in artificial intelligence. With its tool that allows online merchants to create promotional videos narrated by synthesized humans, the startup managed to secure a seed round of $2 million to $3 million from two major investors, Sequoia China and ZhenFund, only three months after its establishment.

As of last year, the startup had raised around $9 million and had more than 100,000 users, nearly 1,000 of them paying customers, according to TechCrunch.

HeyGenโ€™s โ€œVideo Translateโ€ tool, which created the Swift deepfakes, is capable of translating footage into 14 different languages โ€” including Mandarin, Hindi, and Arabic โ€” and can clone the speakerโ€™s voice and sync the personโ€™s lips in an โ€œauthentic speaking style,โ€ according to the companyโ€™s website. Its demo video, which showed a fake Elon Musk speaking French, prompted the real tech billionaire to comment on X, โ€œInteresting.โ€

On X, Xu described the program as a pivotal tool for YouTubers and those in the education sector. “Think about it: breaking down language barriers makes content accessible to the entire globe, not just the 10% who speak English,” he wrote. “What if there is a platform where every video can be viewed in any language with native-like fluency? It’s more than just a translation feature; it’s a new paradigm for content consumption.”

On Weibo, fans of the tool praised its potential to improve dubbing in foreign films, since it can match the movements of an actor’s mouth to their translated speech in Chinese. Others pointed out that it could be an AI-powered boost for Chinese ecommerce brands, which have struggled to reach global audiences for lack of bilingual livestreamers.

With its move to the U.S. last October, HeyGen is no longer subject to China’s deepfake rules, which went into effect in January. As one of the first governments to regulate hyper-realistic, AI-generated media, Beijing requires companies to obtain consent from individuals whose likenesses are being manipulated; deepfakes must be labeled as such online and can’t be used for purposes deemed harmful to national security or the economy, a category the rules define only vaguely.

However, as NBC News reported in April, the rules have not stopped China-based developers from offering apps that enable the creation of deepfake pornography; those apps remain available to download on app stores outside the country.