Vocal deepfakes of famous singers excite Mandopop listeners, but raise questions about ethics and legality


In recent months, advances in generative AI have allowed software to create songs with vocals that sound nearly identical to those of popular artists. Generators can capture the subtle differences in timbre that help people identify someone's unique vocal profile.

A screenshot of a video featuring AI versions of Stefanie Sun on Bilibili

The biggest breakout singer of 2023 in China is someone who hasn’t released an album since 2018. In recent weeks, Stefanie Sun (孙燕姿 Sūn Yànzī), a Singaporean singer-songwriter with a career spanning more than two decades in the entertainment industry, has been crowned “the queen of covers” by fans of Mandarin pop. On Chinese social media, clips of Sun singing other artists’ songs have emerged in droves since April, with some racking up millions of listens and leading to an unlikely revival of interest in the singer, who has been largely out of the public eye in recent years.

There is a catch, though: Sun didn’t actually perform any of these viral covers. Rather, they were all generated using artificial intelligence tools capable of imitating Sun’s vocals. For Mandopop listeners and tech-savvy content creators in China, Sun’s AI voice clone is a powerful tool that allows them to turn their dream collaborations into a reality. But in the music business, the rapid mainstreaming of passable voice-emulating filters has been perceived as a threat by some, especially singers, songwriters, and record labels, who have vocalized their concerns about the legal and creative risks to come.

Wāng Jìngyí 汪静怑, a 33-year-old marketing manager in Shanghai, was in disbelief when she first heard AI-generated Sun performing “Hair Like Snow” (发如雪 fàrúxuě), a popular song released in 2005 by Mandopop icon Jay Chou (周杰伦 Zhōu Jiélún). “A friend of mine told me to check it out and I was in doubt at first. I was expecting something awkwardly robotic and eerily jarring,” Wang told The China Project. But to her surprise, the track was “ridiculously good, almost too good that it’s scary.”

But Wang is already late to the frenzy of fan-made AI covers on the Chinese internet. The track she listened to, originally uploaded to China’s youth-oriented video-sharing platform Bilibili on April 13, was the first tune of its kind to go viral and remains the most popular example of the trend. Almost immediately after it dropped, the clip, which has been played more than 2 million times as of today, dominated Bilibili’s music section, inspiring a slew of content creators to try their hand at AI-powered musical manipulation.

On Bilibili, a search for “AI Stefanie Sun” now brings up hundreds of videos shared in the past few weeks. There’s fake Stefanie Sun singing “Silence is Golden” (沉默是金 chénmòshìjīn) by the late Hong Kong superstar Leslie Cheung (张国荣 Zhāng Guóróng), dueting with Jay Chou on his “Peninsula Ironbox” (半岛铁盒 bàndǎotiěhé), and even grooving over K-pop group Fifty Fifty’s viral TikTok hit “Cupid.”

Sun is not the only Mandopop singer to fall victim to (or benefit from, depending on who you ask) the ongoing craze. Jay Chou, AI Stefanie Sun’s favorite musician to cover, has also had his fair share of AI-generated sound-alikes, along with Taiwanese singer Cyndi Wang (王心凌 Wáng Xīnlíng), Mandopop’s enduring diva Faye Wong (王菲 Wáng Fēi), and Malaysian pop ballad singer Fish Leong (梁静茹 Liáng Jìngrú).

“I spent an entire weekend going down this rabbit hole of faux songs and had lots of fun,” Wang said. “I don’t really know how these were made and I don’t really see myself adding any of them to my playlist. But it’s kinda fun to judge those AI-generated vocals for how close they sound to the singers they are supposed to mimic.”

Another example of the rapid advancements in AI

Built on a form of artificial intelligence technology called deep learning, computer-generated deepfakes are convincing videos, photos, or audio produced with the end goal of portraying something that didn’t actually occur. As a relatively new type of content to emerge from the proliferation of generative AI (technology that produces text, images, or sounds based on the data it is fed), deepfakes first gained mainstream attention in 2017 when a Reddit user posted videos of famous people in fabricated sexual encounters, leading to a spate of falsified videos being created and shared on the internet.

For a long time, deepfakes were unable to infiltrate the music industry, as it was a tall task for computers to successfully synthesize someone’s voice and capture all of its complexities. But in recent months, advances in generative AI have allowed software to create songs with vocals that sound nearly identical to those of popular artists. The most powerful AI singing generators can take a set of audio recordings of a real singer and reproduce a close likeness of that person’s voice, including all the subtle differences in timbre that help people identify someone’s unique vocal profile.

For makers of deepfake audio on Bilibili, where there are plenty of videos explaining the creation process, the preferred tool is SoftVC VITS Singing Voice Conversion, or So-Vits-SVC, a free, open-source program hosted on GitHub. According to some tutorial videos, to create a simple cover song, users only need a computer with a decent GPU and some time to devote to it. For someone adept at the software, it can take as little as a few days to train a brand-new vocal model based on a specific singer and produce various tracks using the resulting artificial voice.

On Bilibili, in the comments under faux cover songs, listeners said they were impressed. “This is actually so good!” one person marveled, while another one wrote, “I’ve waited my whole life to hear this.”

And it’s not all about cultural novelty and technical innovation. Paying tribute to their favorite singers who have passed away, some creators have used the technology to bring their voices back to life. Under a cover of “Long Time No See” (好久不见 hǎojiǔbùjiàn) by Hong Kong singer Eason Chan (陈奕迅 Chén Yìxùn), which featured the vocals of Leslie Cheung, who died by suicide in 2003, sentimental notes by fans of the deceased pop star took over the comment section. “I’ve had this on repeat for several days. When I played it for the first time, tears instantly came down my face. I wish he was still here with us,” a Bilibili user wrote.

For Wang, a longtime fan of Stefanie Sun, AI-voiced songs are her temporary fix as the singer continues her years-long hiatus from making new music. “Many of Sun’s songs are like background music to my teenage years and youth. There’s a certain kind of comfort in her voice because it’s so familiar and is attached to so many memories I have,” Wang said. Admitting that she’s out of touch with the tunes that are popular among members of Gen Z and are dominating music charts nowadays, Wang added, “Listening to all these legendary singers from my generation just brings me back to what I think is the golden age of Mandopop.”

Deepfake singing around the world

Mandopop singers aren’t unique in experiencing their deepfake audio moment. In April, an AI-generated rap song called “Heart on My Sleeve” was produced by an anonymous musician who replicated the voices of the artists Drake and The Weeknd. It became a viral hit, inspiring thousands of derivative TikToks and racking up millions of listeners across various platforms.

However, just as quickly as the song took off, it was removed from virtually every mainstream streaming service after Universal Music Group (UMG), the major label that usually profits from Drake and The Weeknd songs, expressed its displeasure. In its statement, UMG said “the training of generative AI using our artists’ music” represented “both a breach of our agreements and a violation of copyright law.” Around the same time, the company also told streaming services, including Spotify and Apple, to block AI companies from scraping melodies and lyrics from its copyrighted songs. Platforms had a “legal and ethical responsibility to prevent the use of their services in ways that harm artists,” UMG wrote in its complaint.

The incident is just a sign of a coming storm, many within the music industry argue. As more tracks like “Heart on My Sleeve” keep appearing every day, music labels and artists have felt increasingly threatened by AI’s bold, creative possibilities, and concerned about the ethical and legal risks to come.

During a recent interview, American rapper Ice Cube described AI as being “demonic” and said there would be a “backlash” against it from “real people.” But some musicians have responded more favorably to the mainstreaming of AI across the industry. Canadian producer and pop singer Grimes, who has long been a proponent of technological experimentation in musical creation, has publicly endorsed AI-powered voice cloning. Earlier this month, she launched a new piece of software that allows users to replicate her voice in their songs in exchange for 50 percent of the track’s royalties.

With this gray-area genre of music exploding in popularity on Bilibili, the debate about ethics and legality has also reached China. In the past few weeks, a barrage of think pieces on this subject has appeared in local media, with the consensus for now being that it’s a murky area as China’s copyright law doesn’t cover AI technologies. Although China’s draft guidelines on generative AI services, which are currently under final review after public consultation, have a section about copyright, ambiguous phrasing leaves many questions unanswered. For example, the proposed rules fail to “explain what uses of training material by AI tools are considered an infringement,” according to China Law Translate, a crowdsourcing project that translates Chinese laws.

To avoid legal trouble regarding AI-generated media, some Chinese apps have already shifted the accountability to users of their platforms. For example, Douyin, the Chinese sibling of TikTok, unveiled a set of rules on AI-generated content earlier this month, requiring hobbyist creators to put distinguishing labels on all realistic AI deepfakes and take full responsibility for their work.

For now, there hasn’t been a major backlash against AI voice cloning in the Chinese music industry. And Wang isn’t so worried about her favorite deepfake Stefanie Suns being suddenly yanked from the internet in the near future, because the singer herself “seems to have thrown in the towel and let the AI wave wash her over,” she said, referring to Sun’s first public response to the phenomenon on May 22.

“At this point, I feel like a popcorn eater with the best seat in the theatre,” the Singaporean singer wrote on her blog. “I mean really, how do you fight with someone who is putting out new albums in the time span of minutes.”