Vocal deepfakes of famous singers excite Mandopop listeners but raise questions about ethics and legality
In recent months, advances in generative AI have allowed software to create songs with vocals that sound nearly identical to those of popular artists. Generators can capture the subtle differences in timbre that help people identify someone’s unique vocal profile.
The biggest breakout singer of 2023 in China is someone who hasn’t released an album since 2018. In recent weeks, Stefanie Sun (孙燕姿 Sūn Yànzī), a Singaporean singer-songwriter with a career spanning more than two decades in the entertainment industry, has been crowned “the queen of covers” by fans of Mandarin pop. On Chinese social media, clips of Sun singing other artists’ songs have emerged in droves since April, some racking up millions of listens and sparking an unlikely revival of interest in a singer who has been largely out of the public eye in recent years.
There is a catch, though: Sun didn’t actually perform any of these viral covers. They were all generated with artificial intelligence tools capable of imitating Sun’s vocals. For Mandopop listeners and tech-savvy content creators in China, Sun’s AI voice clone is a powerful tool for turning their dream collaborations into reality. But in the music business, the rapid mainstreaming of passable voice-emulating filters is seen as a threat by some, especially singers, songwriters, and record labels, who have voiced their concerns about the legal and creative risks to come.
Wāng Jìngyà, a 33-year-old marketing manager in Shanghai, was in disbelief when she first heard an AI-generated Sun performing “Hair Like Snow” (发如雪 fàrúxuě), a popular song released in 2005 by Mandopop icon Jay Chou (周杰伦 Zhōu Jiélún). “A friend of mine told me to check it out and I was in doubt at first. I was expecting something awkwardly robotic and eerily jarring,” Wang told The China Project. But to her surprise, the track was “ridiculously good, almost too good that it’s scary.”
But Wang was already late to the frenzy of fan-made AI covers on the Chinese internet. The track she heard, originally uploaded on April 13 to Bilibili, China’s youth-oriented video-sharing platform, was the first of its kind to go viral and remains the most popular example of the trend. Almost immediately after it dropped, the clip, which has been played more than 2 million times as of this writing, dominated Bilibili’s music section, inspiring a slew of content creators to try their hand at AI-powered musical manipulation.
On Bilibili, a search for “AI Stefanie Sun” now brings up hundreds of videos shared in the past few weeks. There’s a fake Stefanie Sun singing “Silence is Golden” (沉默是金 chénmòshìjīn) by the late Hong Kong superstar Leslie Cheung (张国荣 Zhāng Guóróng), dueting with Jay Chou on his “Peninsula Ironbox” (半岛铁盒 bàndǎotiěhé), and even grooving over “Cupid,” K-pop group Fifty Fifty’s viral TikTok hit.
Sun is not the only Mandopop singer to fall victim to (or benefit from, depending on whom you ask) the ongoing craze. Jay Chou, AI Stefanie Sun’s favorite musician to cover, has had his fair share of AI-generated sound-alikes, as have Taiwanese singer Cyndi Wang (王心凌 Wáng Xīnlíng), Mandopop’s enduring diva Faye Wang (王菲 Wáng Fēi), and Malaysian pop ballad singer Fish Leong (梁静茹 Liáng Jìngrú).
“I spent an entire weekend going down this rabbit hole of faux songs and had lots of fun,” Wang said. “I don’t really know how these were made and I don’t really see myself adding any of them to my playlist. But it’s kinda fun to judge those AI-generated vocals for how close they sound to the singers they are supposed to mimic.”
Another example of the rapid advancements in AI
Built on a form of artificial intelligence called deep learning, deepfakes are convincing computer-generated videos, photos, or audio that portray events that never actually occurred. A relatively new type of content to emerge from the proliferation of generative AI (technology that produces text, images, or sounds based on the data it is trained on), deepfakes first gained mainstream attention in 2017, when a Reddit user posted videos of famous people in fabricated sexual encounters, leading to a spate of falsified videos being created and shared on the internet.
For a long time, deepfakes were unable to infiltrate the music industry, as it was a tall order for computers to synthesize someone’s voice convincingly and capture all of its complexities. But in recent months, advances in generative AI have allowed software to create songs with vocals that sound nearly identical to those of popular artists. The most powerful AI singing generators can take a set of audio recordings of a real singer and reproduce a near-exact likeness of that person’s voice, including the subtle differences in timbre that help people identify someone’s unique vocal profile.
For makers of deepfake audio on Bilibili, where plenty of videos explain the creation process, the preferred tool is SoftVC VITS Singing Voice Conversion, or So-Vits-SVC, free, open-source software hosted on GitHub. According to some tutorial videos, creating a simple cover song requires only a computer with a decent GPU and some time. For someone adept at the software, training a brand-new vocal model based on a specific singer and producing various tracks with the resulting artificial voice can take as little as a few days.
On Bilibili, in the comments under the faux cover songs, listeners said they were impressed. “This is actually so good!” one person marveled, while another wrote, “I’ve waited my whole life to hear this.”
And it’s not all about cultural novelty and technical innovation. To pay tribute to favorite singers who have passed away, some creators have used the technology to bring their voices back to life. Under a cover of “Long Time No See” (好久不见 hǎojiǔbùjiàn) by Hong Kong singer Eason Chan (陈奕迅 Chén Yìxùn), sung in the voice of Leslie Cheung, who died by suicide in 2003, sentimental notes from fans of the late pop star took over the comment section. “I’ve had this on repeat for several days. When I played it for the first time, tears instantly came down my face. I wish he was still here with us,” one Bilibili user wrote.
For Wang, a longtime fan of Stefanie Sun, AI-voiced songs are a temporary fix while the singer continues her years-long hiatus from making new music. “Many of Sun’s songs are like background music to my teenage years and youth. There’s a certain kind of comfort in her voice because it’s so familiar and is attached to so many memories I have,” Wang said. Admitting that she’s out of touch with the tunes that are popular among Gen Z and dominate music charts nowadays, Wang added, “Listening to all these legendary singers from my generation just brings me back to what I think is the golden age of Mandopop.”
Deepfake singing around the world
Mandopop singers aren’t alone in experiencing a deepfake audio moment. In April, an anonymous musician released “Heart on My Sleeve,” an AI-generated rap song that replicated the voices of the artists Drake and The Weeknd. It became a viral hit, inspiring thousands of derivative TikToks and racking up millions of plays across various platforms.
However, just as quickly as the song took off, it was removed from virtually every mainstream streaming service after Universal Music Group (UMG), the major label behind Drake’s and The Weeknd’s music, expressed its displeasure. In its statement, UMG said “the training of generative AI using our artists’ music” represented “both a breach of our agreements and a violation of copyright law.” Around the same time, the company also told streaming services, including Spotify and Apple, to block AI companies from scraping melodies and lyrics from its copyrighted songs. Platforms had a “legal and ethical responsibility to prevent the use of their services in ways that harm artists,” UMG wrote in its complaint.
The incident is just a sign of a coming storm, many within the music industry argued. As more tracks like “Heart on My Sleeve” appear every day, music labels and artists have felt increasingly threatened by AI’s bold creative possibilities and concerned about the ethical and legal risks to come.
During a recent interview, American rapper Ice Cube described AI as being “demonic” and said there would be a “backlash” against it from “real people.” But some musicians have responded more favorably to the mainstreaming of AI across the industry. Canadian producer and pop singer Grimes, who has long been a proponent of technological experimentation in musical creation, has publicly endorsed AI-powered voice cloning. Earlier this month, she launched a new piece of software that allows users to replicate her voice in their songs in exchange for 50 percent of the track’s royalties.
With this gray-area genre of music exploding in popularity on Bilibili, the debate over ethics and legality has also reached China. In the past few weeks, a barrage of think pieces on the subject has appeared in local media, with the consensus for now being that it’s a murky area because China’s copyright law doesn’t cover AI technologies. Although China’s draft guidelines on generative AI services, which are currently under final review after public consultation, include a section on copyright, ambiguous phrasing leaves many questions unanswered. For example, the proposed rules fail to “explain what uses of training material by AI tools are considered an infringement,” according to China Law Translate, a crowdsourced project that translates Chinese laws.
To avoid legal trouble regarding AI-generated media, some Chinese apps have already shifted the accountability to users of their platforms. For example, Douyin, the Chinese sibling of TikTok, unveiled a set of rules on AI-generated content earlier this month, requiring hobbyist creators to put distinguishing labels on all realistic AI deepfakes and take full responsibility for their work.
For now, there hasn’t been a major backlash against AI voice cloning in the Chinese music industry. And Wang isn’t too worried about her favorite deepfake Stefanie Suns suddenly being yanked from the internet, because the singer herself “seems to have thrown in the towel and let the AI wave wash over her,” she said, referring to Sun’s first public response to the phenomenon on May 22.
“At this point, I feel like a popcorn eater with the best seat in the theatre,” the Singaporean singer wrote on her blog. “I mean really, how do you fight with someone who is putting out new albums in the time span of minutes.”