Below is a complete transcript of the Sinica Podcast with Kendra Schaefer and Jeremy Daum.
Kaiser Kuo: Welcome to the Sinica Podcast, a weekly discussion of current affairs in China, produced in partnership with The China Project. Subscribe to Access from The China Project to get access not only to our great daily newsletter, but to all of the original writing on our website at thechinaproject.com. We’ve got reported stories, essays and editorials, great explainers and trackers, regular columns, and of course, a growing library of podcasts. We cover everything from China’s fraught foreign relations to its ingenious entrepreneurs, from the ongoing repression of Uyghurs and other Muslim peoples in China’s Xinjiang region, to China’s ambitious plans to shift the Chinese economy onto a post-carbon footing. It’s a feast of business, political, and cultural news about a nation that is reshaping the world. We cover China with neither fear nor favor.
I’m Kaiser Kuo, coming to you from Chapel Hill, North Carolina.
_________
Earlier this month, the Cyberspace Administration of China published a draft, now in its public comment phase, of regulations that would govern generative AI in China. Generative AI is, as I trust most listeners to this show are by now well aware, well, I’ll let a generative AI large language model tell you. I asked for a succinct definition from ChatGPT version 4, and here’s what it says.
Bot: Generative AI is a subset of artificial intelligence that focuses on creating new content or data by learning patterns from existing data. It employs algorithms such as deep learning models to generate outputs in the form of text, images, audio, or other data types, often mimicking humanlike creativity and decision-making capabilities.
Kaiser: So, think ChatGPT, obviously, Google’s Bard, Bing’s Chat, things like DALL-E and Stable Diffusion, which create images. Less familiar, but to me equally impressive, are AI music generators like the one I used to create this rather unimpressive background music you’re hearing right now, which I gave the very uncreative name Generative AI PG. Okay. Anyway, today, to talk about the new rules on generative AI from the CAC, I am joined by Jeremy Daum of the Yale-China Center, who published a translation of the new draft regulations on his incredibly useful website, China Law Translate. Jeremy Daum, welcome back to Sinica. And I need to ask, did you use ChatGPT at all in the translation of the regulations?
Jeremy Daum: I did not. I have played around with it in the past, though.
Kaiser: Okay. Good, good, good. All right. Also, joining for the first time of what I hope will be many appearances on Sinica is Kendra Schaefer, who is a partner at the Beijing-based strategic advisory consultancy, Trivium, where she’s the resident tech person. She is someone whose perspectives on China’s tech scene I’ve found to be immensely valuable. Kendra joins us from Portland, Oregon. Welcome to Sinica, Kendra.
Kendra Schaefer: Hi, Kaiser. Happy to be here.
Kaiser: Yeah, well, we’re very, very happy to have you, both of you, and I know the two of you know each other and have worked together, so I look forward to this conversation. Before we go on, though, I want to give a big shout out to the DigiChina team at Stanford, headed up by my good friend Graham Webster, who also did a translation of the new draft regulations. And they also have a kind of round table discussion. It was called a DigiChina Forum, which I highly recommend, which is all about the draft regulations. Really smart folks like Helen Toner who’s at CSET, who’s actually on the board of OpenAI; a go-to lawyer in Beijing at Covington & Burling named Luo Yan, who I’ve had the pleasure of meeting, who’s brilliant; Rogier Creemers; Matt Sheehan over at Carnegie. Anyway, you get the idea. Definitely check it out.
I don’t know whether you got a peek at that, you guys, but it came out after we’d already scheduled this recording session. Anyway, some of the observations are very interesting. Maybe we can talk about that. Listeners, you might want to check that out before listening to the conversation. And also, of course, read the draft regulation in translation by Jeremy. You might also want to check out a piece by Paul Triolo on The China Project’s website. It’s called ChatGPT and China: How to think about Large Language Models and the generative AI race. And Paul also offers some of the comments in that DigiChina forum. So, you guys, this draft is clearly in dialogue, as they say, with other important pieces of legislation, and particularly with the Personal Information Protection Law and with the 2022 regulation on recommendation algorithms with its registry of recommendation algorithms.
Jeremy, as you pointed out in your excellent overview of the regs, its direct ancestor is a set of regulations called provisions on the administration of deep synthesis internet information services, which you translated back in March. Now, this hasn’t gotten all that much attention, in part because I think people don’t immediately know what CAC means by deep synthesis. Looking at that definition, which you helpfully provided in the addendum to those regulations, it’s translated there, I’m not sure that there is much that doesn’t overlap with generative AI. What is the difference between deep synthesis internet services and generative AI?
Jeremy: I don’t think there is much of a difference from the way that they’re defined. You could argue for a little bit about the scope of each of them in that the earlier regs are referring only to services provided through the internet and the generative AI models are maybe a subset of those deep synthesis, but they’re both about artificial intelligence or computer-generated content. That can be media in any form. It can be text, images, voice, video, and all of the above. So, they’re really going at the same thing. It’s kind of surprising in that sense that we had an update to those rules so quickly after they took effect. And those earlier rules took effect in January this year.
Kaiser: Yeah. So, maybe we can talk a little bit about the difference in focus of the two sets of regs: one maybe more focused on IP violations than on deep fakery and things like that, and the other more on sort of content regulation. Kendra, what did you make of the difference?
Kendra: Yeah, I had an aha moment when I was kind of looking at the new regulations. First of all, I just want to pull back a little bit for your listeners and outline the three regulations that we’re talking about, right? Because the spirit, what was going on in China at the time that those regulations were released, and what was going on in the policy conversation more broadly, is incredibly important. The first major set of regulations on AI, arguably speaking, were these, you’ve talked about it before, these rules on recommendation engines. And they took a really broad lens. Usually, when people think of recommendation engines, we think of two things. We think of e-commerce recommendations, and then we think of recommendations for ads or content on social media platforms, right? That’s what we think of.
Kaiser: Sure.
Kendra: But those regulations took an actually broader view, looking at the core function of how the algorithm works. What I mean by that is they also included any kind of algorithm that, for example, determined worker delivery schedules.
Kaiser: Oh, wow.
Kendra: So, when they said recommendation algorithms, what they were looking at was an algorithm that constantly takes, and this isn’t exactly from the regulation text, but this is basically what they were tackling, any algorithm that takes constantly updated data from a large subset of users of some kind, whether that be delivery drivers on a restaurant delivery platform like Meituan, whether that is ride-hailing drivers, whether that is people shopping on an e-commerce site.
Kaiser: Or just people swiping up and down on Douyin.
Kendra: Right. Exactly. Anything that’s taking this constant feed of information from people using the platform, and then the machine makes a recommendation about that. The policy conversation at the time was really focused on information cocoons, disinformation, companies that were abusing algorithms to infringe on workers’ rights and labor rights. Delivery driver A makes a delivery and gets from point A to point B in 15 minutes by running a bunch of red lights. Now the second driver who makes that delivery is expected to do the same by the algorithm, right? Not enough rest time, all this kind of stuff, including also ad engines, what’s allowed to be distributed through content recommendations, etc. What’s interesting and notable there is it’s looking at the function of the algorithm. The function of the algorithm is what got regulated, which is pretty interesting. Then you move on to the second regulation that you just mentioned, which is the second one that came out that surprised everyone, which was a regulation focused on deep synthesis technologies. In common parlance, what they meant by that is deep fakes.
Kaiser: Right.
Kendra: The background to that regulation was that there was all of this concern, the conversation was happening in the U.S. and Europe, it was also happening in China, what happens when machines start, I don’t know, making videos of Xi Jinping singing the U.S. National Anthem or something, right? Or you’ve got news misinformation proliferating, you have revenge pornography proliferating, that kind of stuff. There was a big concern that users and viewers wouldn’t know that content was generated by a machine. So, they’d be tricked. And there was a concern it would cause social instability issues. That was the background of the deep synthesis regulation. Now, again, they didn’t just say no deep fake videos. They looked at the function of the algorithm. They took a broader lens. They said any AI editing of content without human intervention essentially, right? Video, audio, text language, and they included chatbots in that bucket.
But the spirit of that regulation was not this. It was not generative AI because that wasn’t a thing yet really on anybody’s radar. That brings us to the present iteration of this regulation. Suddenly ChatGPT emerges, it’s a big deal. And now there’s concern or a question, does ChatGPT fit under the deep fake regulation? It clearly did, but I think they wanted to underscore the fact that this also… There were other considerations about ChatGPT, which we’ll get into in a second, I’m sure, about the training data for generative AI that was not included in the deep fake regulation. That’s really the kind of background, but what’s fascinating in China is that regulators are looking at algorithmic function, which takes a broader lens than looking at the end application, like a video or a text or a chatbot, right? So, it’s pretty broad.
Kaiser: When I look at recent legislation that’s come out, recent regulations that have come out governing different aspects of life in our high-tech world, I think it’s fair to say that not many other governments have been as on top of things, as focused on getting out regulations. You look at PIPL, I mean, I don’t know this myself, to be clear, but I’ve heard from smart people that it takes a lot from Europe’s GDPR, but is in some ways actually more robust, even in its protections, at least on paper. I am not aware of other deep synthesis laws or regulations in other major markets, or, for that matter, algorithm laws. Of course, this is certainly the first set of regulations from a major nation governing generative AI. What can we generally say when we step back and look at these steps that the regulators are taking? I mean, it doesn’t seem like a clumsy ossified bureaucracy at all. CAC seems like it’s fairly nimble and fairly on top of it.
Jeremy: In some ways, they’re nimble, but that’s because they’re willing to walk some of it back and they like to put out rules to get ahead of the problems that are coming forward. But they’re always, in all of these regulations, dealing with essentially the same problems just as new tech applies to them. I find myself thinking back on the now much maligned quote from Bill Clinton back in the day about trying to regulate the internet being like trying to nail Jell-O to a wall.
Kaiser: Sure.
Jeremy: People often scoff and say, “Look, China’s done so well with censorship. He really had that wrong.” But no, I think he had it right. What we see is China constantly trying to nail that Jell-O to the wall as the internet takes on new forms, new shapes. We’ve seen a stream, not just with AI and algorithms as Kendra laid out, but as livestreaming became big. We saw things on the comment sections where people could interact more freely on articles and publish non-news content, instant messengers, algorithms, AI, e-commerce. We see separate rules put out on all of these as we see how they’re going to lead to those core concerns of the government about content that can cause social instability, misinformation, and more recently, protections of personal information and data security.
Kaiser: Yeah. Yeah. Yeah.
Kendra: Yeah. I think Jeremy made a very strong point. Just to underscore that, two things to note. I’m sure many of your listeners already know this, but functionally speaking, China’s a one-party system. What that basically means is that they’re free to put out a regulation and then go back and revise it without having to argue with somebody every time that they go back and revise it, right? That allows policy to go out the door quickly that is not done, or that is very broad, or that is in essence a statement of intent with no strong mechanism in place yet for enforcing it. And then when you look at, for example, Europe or the United States, it takes years to formulate this stuff and finalize it, because once it’s out the door, executing on a revision is a very long and difficult process with lots of back and forth and debate.
I always like to say that Chinese regulators like to fail fast and often, right? They act a little bit like startups in the sense that they push something out: oh, it’s not working; oh, we need this supporting regulation; okay, roll that out; okay, you know what? Make a revision. And so they do iterate on regulation very, very quickly. I think that’s what we’re seeing here in terms of speed.
Kaiser: They don’t let the perfect be the enemy of the good.
Kendra: Right.
Jeremy: I would just add that while a lot of new laws come out, especially in emerging areas, they try to leave space for that new industry to develop. Often it is aspirational with just a lot of principles laid out. But speech is an area that was regulated long before the internet. And there are very real consequences. In addition to addressing training data in these new draft AI regs, what they’ve done is that they’ve made it clear that the people who create the tools will be liable as the content producers.
Kaiser: Yeah. We’ll get into the whole issue of the content regulation of this in quite a bit of depth later on in the conversation. I don’t want to jump the gun here and talk about that just yet, but yeah, I mean, that’s obviously one of the things. I think what’s striking is also that, as you say, the intention is not to stifle this in the cradle. They want, within their structures, obviously, to allow this industry to actually develop. And they think that in providing these guidelines earlier on, they’re allowing it to develop in a healthy manner, right?
Kendra: Yeah. I was just going to pull back on that point and just take a broader lens view of all of these algorithm regulations. As Jeremy’s been underscoring, looking at tech regulations more broadly, internet regulations more broadly, almost all Chinese internet policy tends to include what I call a little bit of sweet and a little bit of sour. On the one hand, it almost universally requires new technologies to uphold socialist core values and not promote subversive ideas. It increases state control over the distribution of information and content on the internet, right? On the other hand, it also often includes strong and genuine consumer protections that in many cases go beyond what Western democracies have instituted. On paper right now, Chinese consumers and Chinese internet users have far more privacy protections than U.S. users, frankly.
I think what often tends to happen is that because there’s a focus on censorship in these policies, and there absolutely is, the rest of it gets brushed off and brushed aside as, oh, all of these privacy protections aren’t genuine, they’re not real, the Chinese government doesn’t care about privacy. I would actually argue that’s not the case at all. I would argue that the conception of privacy, as implemented, is simply different. When you’re looking at EU privacy policies, privacy is the law that stands between me and my data on one side, and the government, marketers, hackers, and bad actors on the other, right?
Kaiser: Right.
Kendra: When you look at Chinese privacy policy, it’s the government and I protecting my data from hackers, marketers, and companies, right? When you look at it through that lens, where the government has excused itself from the privacy conversation to a certain degree (and that’s a wild oversimplification), the government has every incentive to enforce consumer privacy protections for Chinese consumers with regard to companies. I think it’s important to think of those regulations with that in mind.
Kaiser: Yeah. Kendra, you have a way of bringing that kind of clarity to things. I remember the data localization laws, for example. The sort of conventional wisdom was they’re just sort of trying to lock down user data in China. But your take was, no, they want to do this because they believe that this is an essential step to building a market for data. I thought that was a really interesting take, and correct. Let me just stay with you for a second here and ask you, let’s get a sense for where generative AI is in China at the moment. There’s been a ton of buzz, with every company talking about what it’s doing in this space. I mean, Wang Xing’s partner, the guy who cofounded Meituan, he has a new startup in this space and it’s raised a ton of money. Baidu launched its Ernie Bot. SenseTime’s got something. iFlytek, obviously, is a big player in this. ByteDance has announced a big initiative. They’re already very, very big in AI. They’re trying to sort of help other companies. I guess I’ve seen a piece about their whole approach, very interesting. I’ve actually lost count now of how many announcements I’ve seen.
How far along are they? I mean, I’ve seen a lot of these appear to fall quite flat on their face, like the Baidu launch was really widely criticized. How far along are they and what’s their focus? Are they also focused primarily on text for large language models or have there been image companies that I haven’t…? Yeah.
Kendra: Yeah. It’s obviously pretty hard to generalize, but if I had to generalize, I would say it’s not just companies, right? It’s also research institutions that are jumping on this.
Kaiser: Yeah.
Kendra: I don’t know. You might have heard about Tsinghua kind of unveiled another large language model that was supposed to be on par with ChatGPT. They called it GLM-130B or something like that. About a month ago, researchers, I believe it was at Fudan University, put out a model as well. All of these models are in their infancy and it’s almost impossible, at least from the outside, to separate fact from hype. For example, one of the first China-built models that emerged after ChatGPT 3 was released was out of Fudan University.
State media picked up this article that said China’s got its own ChatGPT, and it flew around the internet. Everyone was really excited about it, prompting the researchers to release a follow-up that essentially said, “Everyone needs to calm down. We’re a tiny team with no resources to provide data processing for this commercially. This is 70% accurate. We would rank this as maybe 70% of the way done. And it is just an experiment, sorry.” And there’s been a lot of hype around Ernie Bot. It’s gotten a lot of criticism domestically. There’s been a ton of conversation around all these different companies and how far along they are. It’s very difficult to get access to those models. There are long waiting lists for testing them out, and we’re on the waiting list for Ernie Bot, but we still haven’t gotten access ourselves.
Ultimately, I think there’s a ton of hype in the industry right now. These regulations that we’re about to really dig into have kind of thrown a huge wrench in there. There were also some rumors, for example, that regulators had come to Alibaba and said, “Don’t go releasing a major ChatGPT clone without talking to us.” This is before this regulation came out. I think, from the picture I’ve gotten from the domestic conversation, what has happened is that ChatGPT kicked off a furor in the tech community; entrepreneurs got really excited about putting something together; regulators, including the minister of science and technology, said, “Whoa, ho, ho, ho, there are security risks here. We haven’t fully assessed them. We don’t have any clear regulations in place on this, and we haven’t given any guidance. Everyone needs to chill.” And then the guidance that has come out, as we’ll discuss in a minute, I’m sure, has seemed to throw a big wrench in the ability of these companies to basically train large models. Right?
Kaiser: Right.
Kendra: So, that’s kind of where we’re at overall.
Jeremy: I would just add a single point. I think the release of ChatGPT really was a ground-shaking moment in China. From everyone I’ve spoken to who works in even adjacent areas, there had been this assumption that China was sort of way ahead in AI. I don’t know if ahead is even a term that makes sense in talking about AI. I don’t like the idea of a race, but there was an assumption that China’s tech was more advanced, and ChatGPT sort of wowed people. People weren’t ready for that. And I think that even within the U.S., even the people responsible for ChatGPT are feeling that: it’s progressing at an incredibly rapid rate. It is self-teaching and is improving really fast. I think that this caused a big shift of priorities in terms of research and development in China as well as here.
Kendra: Absolutely. I think everybody’s got the tiger by the tail. I mean, Jeremy couldn’t be more correct. Our team spends all day every day digging through the China regulatory and media documents and space. As soon as ChatGPT was released, no one was talking about anything else. I mean, it was a real watershed moment.
Kaiser: So, the things they were talking about. Like you said, there were security concerns, and there were concerns immediately about consonance with China’s core socialist values. But there are a whole bunch of other concerns that overlap perfectly with the same things that are being raised in the United States.
Kendra: Absolutely.
Kaiser: About ethics, about bias, about potentially harmful consequences, even from the basic things like using ChatGPT or large language models to cheat on exams, all the way down the line. The ethical concern, it’s interesting that there is some convergence around there. I wanted to finish off this preliminary batch of questions by talking about where we are in the United States in terms of regulation. I mean, I haven’t seen serious efforts yet to write a set of rules. But the rules do seem to be coming out of companies themselves. It’s interesting how they have converged around a set of things. These days, if you’re on ChatGPT 4, it’s pretty hard to get it to say terrible racist or misogynistic or sexist things now, right? They seem to have self-imposed some order. Can you talk about that? What’s the federal government doing? What are the companies themselves doing?
Jeremy: In the U.S., our model for dealing with new tech and the various concerns that it raises, including personal privacy and the content generated by some of these new tools, has been to let the corporations have a really long leash to experiment and allow industry self-regulation and corporate self-regulation. The problem with that is you see things, like what happened with Twitter recently. Twitter had a decent set of standards and community guidelines, and suddenly there’s new management in town, and that changes, and there’s an influx of the kind of content that they’d worked hard to eliminate. So, yes, the people behind ChatGPT have been working hard to make it not become racist, as we’ve seen previous examples of bots trained entirely on free open internet source material, and they very, very quickly became offensive.
Kendra: Much like the global internet.
Kaiser: Yeah.
Jeremy: Well, that’s the thing โ garbage in, garbage out.
Kaiser: Exactly. Right.
Jeremy: They’re working hard, but it is not a legally mandated regime at this point. While there’s efforts in other areas like privacy and data security where they have put forward legislation that hasn’t been put in effect yet, this is too new. And it’s some of that flexibility that Kendra and I were talking about earlier with China, where they’re able to sort of try and get out ahead of the problems. Even if they get it wrong, they can walk it back a little later more quickly and more easily. As opposed to the democratic process, for all its many virtues, sometimes our bipartisan system leads to real slow rulemaking.
Kaiser: Yeah.
Kendra: Yeah. And just to underscore the point a thousand times over, the United States still does not have an effective data privacy regulation. Not really, not for consumer protections. Like, we are so behind. It’s outrageous. At this point, it’s outrageous, right? It’s well past time that U.S. consumers have some federally mandated data privacy, and we just haven’t been able to sort that out yet. Comparing Chinese regulations to what’s going on in the United States is frankly embarrassing. Comparing Chinese regulations to what’s going on in the EU is a little bit more…
Kaiser: Slightly less embarrassing. Yeah.
Kendra: Yeah. Slightly less embarrassing.
Kaiser: Yeah. So, that’s where we are. In the deep synthesis regulations, as Kendra suggested, the shorthand way of understanding them is that they were about deep fakes and the worries about that. The solution, both in that set of regs and in this new proposed set of regs, seems to be to tag everything: to label anything that’s created using generative AI. I mean, this is clearly one of the pitfalls of the early trials of LLMs. Is this enough? Is this even remotely sufficient just to label everything that’s created using a generative AI program as such? Is that even workable?
Kendra: As Jeremy can probably tell you better than I can, the stipulation to label content generated by a machine came from the deep synthesis, the previous… I don’t want to call it a previous iteration. It’s a separate policy or a separate regulation.
Kaiser: It stayed in this set of regs too, right?
Kendra: It stayed in this set of regulations. What they were trying to do there is enforce traceability. Again, you can see what they were thinking in the spirit of the regulation, right? Let’s say you put out a video and it’s been modified by AI, there should be some kind of label on it that says machine generated. Or if it’s an audio file, it should include some kind of label, or maybe a statement in the audio itself, that it’s been generated by a machine. That just needs to be transparent and inform people that they’re not reading human-generated content, right? And so that has carried forward into this regulation now on AI-generated content. I think that’s one example of something that’s a nice statement of purpose, but I’m not 100% sure how that’s enforceable. Because, for example, you ask a chatbot a question, it spits out an answer, and even if that answer comes with a text-based label that says, “This was generated by a machine,” you copy and paste that out of the interface, and the label is gone.
Kaiser: Yeah, it’s gone.
Kendra: I’m not really sure that that is as effective here. What I will be interested to see is how China’s… Usually what happens in cases like this is that we see regulation via enforcement or implementation. What I mean by that is that companies start to roll out generative AIs, they attempt to comply with this kind of vague notice, and then, after internal discussion, regulators in the CAC or somebody comes in and says, “No, you have to do it this way, you have to do it that way.” And maybe that rule gets ratified or clarified later, maybe it doesn’t. I think that’s kind of where we’re at.
Jeremy: There’s actually two different labeling requirements, and it’s worth noting that because I think they serve different functions. One of them, the one we’ve been discussing, is a visible label, which is clearly for public facing information that’s meant to alert people who are viewing it, who probably didn’t use the tool to create it themselves that they’re looking at something that was machine generated, may not be an actual photograph, is machine-generated text, etc. The other is what I call the technical labeling requirement, which is not necessarily a visible thing, but is some kind of mark that could be in the code behind the image or text, which is to allow traceability primarily. To make it so that we can find out what tool generated this information. Most users wouldn’t have access to that, but it’s a part of the record-keeping system that the creators of these tools have to have in place.
There are requirements that address some of these concerns about how enforcement will work: removing the tags, concealing them, or minimizing them is forbidden. How will that be enforced and who will it be enforced against? If I retweet something that was tweeted by Kendra, that was copied and pasted from AI, am I going to be liable for removing a tag if I never knew it was there? We’ll have to see. They’re going to have to feel that out as they move forward. But I do think that requiring labeling sends the message that you have to notify your audience when you’re using AI generated information. And that principle was there. We’ll see how well the actual tagging does. It’s only a first step, and only a first step in dealing with the misinformation issue.
Kendra: Can I actually also underscore, go a little bit down a small, brief rabbit hole there? Because I think there’s a really interesting point about the traceability of content. Something that’s very interesting about Chinese internet regulation in general is that it functions on a couple of core principles. One of those core principles is no anonymity. According to Chinese regulators, ideally, nobody who accesses the internet, no content that is generated on the Chinese internet should be unattached to a responsible party in some way. That goes all the way back to things like every website that has user registrations, you’re required to register with your phone number that is attached to your ID. When you get a phone, you have to show your national ID, you get the phone number, and therefore, it is a real name registration requirement on that, right?
Interestingly, they extended those kinds of regulations to smart cars. Every smart car on an internet of vehicles network is required to have a chip in it that’s registered to the owner of the car, and that has to be identified on the network. Even a car on the network is identified. Users on the network have to be identified. So, this kind of, as Jeremy called it, the technical labeling requirement, this idea that content generated by an AI has to be traceable back to the original platform that it was issued by is kind of in the same vein, right? You can’t have content floating around that was generated, was created by a specific party, and then it is impossible to trace that content back to the source from whence it came.
I think regulators are going to struggle now to do that. As Jeremy said, there’s now all these technical concerns. It’s easy to do that with a sim card. It’s easy to do that with hardware. It’s easier to do that with a single user account. It’s much more difficult to track and trace back content generated by a machine. We’ve been trying to track and trace back content generated by humans and online behavior perpetrated by humans. Now we’re trying to trace back non-human content generation. I mean, this is a bit of a new frontier.
Kaiser: Yeah. And it’s obviously going to place a burden on the developers of these generative AI platforms. One of a number of things that will prove to be burdensome, but the really big one, of course is, I guess to nobody’s surprise, and I think it’s not inappropriate either, the focus that has been on censorship. When people hear that China has this new set of rules around generative AI, people I think immediately go to, well, this is certainly about controlling the flow of information. After all, they blocked the ChatGPT and Bard and all the other major LLMs right away, right off the door. So, to what extent is this motivated by this desire to censor, and to what extent is it the same kind of generic concerns over AI ethics and governance that we’re all worried about here? I guess there’s no way to break it up, but it’s both, right? I mean, it’s not just one or the other. Jeremy?
Jeremy: Before we even go there, I have to say that the anonymity discussion, which is a really important one, because it links back to what Kendra was saying at the beginning of our conversation about a different notion of privacy. Clearly, what the Chinese government has done with technology regulation is akin to what they’ve tried to do with surveillance cameras in physical spaces. It’s not that you’re constantly being watched at all times. It’s that there’s a record of what happened in every place. So, when a problem gets reported that has to be resolved in order to maintain that all important social stability, they can find the record and go back to it. Kendra was talking about how privacy in the EU and U.S. model is often about limiting government powers. In China, it’s often about the government being the protector of your privacy. The government has access to this information. It’s other users, corporate or individuals who don’t have it.
With real name systems, that’s very much the case where the general rule is that you must register your real name to get a phone to log into an account on a website. But usually, upfront, you’re allowed to use whatever you want, so other users can see whatever alias I’d choose, but there has to be a record connecting that alias that I chose to my account back to my real name registration. Now, to answer the question you actually asked, Kaiser, yes, censorship is part of it, but I think the better way to view it is to view it as China is trying to import all of its data regime and existing laws onto the new technology. The Jell-O is slipping off the nail again. We got to put another nail or put the nail in a new place, but it’s the same principles going through.
And they’re global. A lot of the concerns are global when it comes to this AI type stuff. I broke it down in my overview into censorship and content controls, prevention of discrimination, intellectual property rights, curbing misinformation, and privacy and data protection. The censorship concern, obviously, takes a unique form in China where they have a notoriously expansive censorship regime. But the other concerns are things that we’re all sort of trying to deal with.
Kendra: Yeah. And I’ve got, right in front of me, the actual text, right? I mean, I think this probably leads me into what-
Kaiser: Yeah, me too.
Kendra: … what you want to talk about there.
Kaiser: Shall we read it together?
Kendra: Yeah. Let’s read it together.
Kaiser: Article 4, content generated using…
Kendra: Yeah, I mean, essentially what it says about training data for AIs, right? Whatever data that you feed the AI, first of all, has to comply with essentially existing requirements. It also can’t infringe on intellectual property rights. If you’re using somebody’s personal information, you have to get the consent of the person whose personal information you’re using. And this is the kicker. This is the thing that everybody’s been chit-chatting about, this number four here, fourth requirement for training data. The person training the machine or the entity training the machine must be able to guarantee the authenticity, accuracy, objectivity and diversity of the data.
That speaks to many of the big concerns that have been raised about ChatGPT and OpenAI. Where are you getting all this stuff? Why is ChatGPT generating false information very confidently, right? How do we know that the data that it’s been trained on is accurate? But the opposite side of that question is how much does this particular stipulation kill innovation? Because what you’re essentially saying is that authenticity, accuracy, and objectivity, if that’s the requirement, you cannot train an AI on the entirety of the internet because it’s full of inaccurate, inauthentic and unobjective information, right?
Kaiser: Yeah. I mean, walk me through how this is supposed to work. It seems like a huge burden to the actual developers, I mean, they have to ensure this is being trained on accurate and diverse information. It’s got to be politically and socially kosher as well, I mean, just to boot. We’re talking about oceanic volumes of text, even if it’s drawn from servers that are within China and supposedly already pre-cleaned and pre-screened and doesn’t contain data that’s in violation… We know that’s not possible. We know there’s always going to be problems with the underlying data. In the case of U.S.-developed LLMs and other generative AI systems here in the West, what you do is you have this human feedback or RLHF, right? Reinforcement learning from human feedback.
You’re supposed to weed out the vulgar, the anti-Semitic and the sexist and the racist stuff. But I mean, it seems to be working. I’ve played with ChatGPT 3, 3.5, now 4 – 4 is a lot better. This whole human feedback thing seems to be working. Wouldn’t it be pretty much the same for China? Can’t they just sort of apply stricter parameters and limitations, but arrive at the same kind of…? It’s burdensome, sure, but it’s only been a couple of months, and ChatGPT, OpenAI was able to do a much better job.
Kendra: I think the question you’re dealing with here is whether or not regulators will give companies and the general public the leeway to make mistakes in public as these problems get sorted out. Because, well, backing up a step, I think it’s really worth spending some time on this issue of the accuracy, authenticity, objectivity, and diversity of the training data. Essentially, and again, I’m wildly simplifying here, but basically what ChatGPT did was feed its machine the internet and say, “Hey, why don’t you chew on this.” This stipulation makes that impossible. When you get companies, Tsinghua University or these other companies that are saying, “Oh, we trained our models similarly to ChatGPT,” really? Using what data? Is that accurate and authentic? Also, what does authenticity mean? Who gets to decide what’s authentic? Who gets to decide what’s accurate? Who gets to decide what’s objective? I’m not sure. I mean, Jeremy, are there legal definitions for those things? I don’t think so, right?
Jeremy: No, of course not.
Kaiser: Well, it’s what the Party says.
Jeremy: Yeah. Well, no, it’s not even that simple. I gave the example in the overview of if you feed it a work of fiction or a fictional piece of art, like I used the example of an old fresco of a unicorn, clearly that content is false in that unicorns don’t exist. But it’s accurate in that it’s an accurate reproduction of that fresco by a famous artist. It’s not just a question of misinformation. They’ve gone too far. I would argue that the other data source requirements beyond even this sexier one of requiring accurate and verifiable information are also really problematic, including the IP one. At first glance, it looks like you can’t use any copyrighted works at all in your training data, which copyright isn’t always that, it might not be clear who the copyright holder is or if there is one, or what rights they’re claiming in something. Beyond that, they’ve defined it in sort of a circular way where they’re saying you can’t use data that is infringing on copyright.
Well, the whole question is, is it infringing on copyright for this use? Copyrights aren’t absolute rights. They protect certain reproduction rights and things, but they have what we usually call fair use exceptions. And the whole issue with AI is, is this fair use to use it as training data? It probably depends on what kind of output I use based on it, how far afield it is, whether I’m creating a product that actually competes with the original author’s content, and things like this. But they’ve defined it in a circular way where they say, “Well, you can’t use data that infringes on someone’s copyright,” but that’s circular ’cause we don’t know what infringes on somebody’s copyright yet.
Personal information – We have the Personal Information Protection Law, which creates a robust regime about protecting, mainly identifying information and especially biometric information and what they call sensitive personal information. That’s information that if leaked or used improperly would lead to discrimination or disadvantage of the subject of it. Is all of that going to require consent? If it was a random photo of a person, a street scene with someone in the background, which is allowed under the Personal Information Protection Law, where are the lines going to be drawn in there? So, yes, there’s an unworkable standard in the false information, but I think even the IP protections cause problems, as well as the personal information.
There is a nod towards human checking where all of these platform creators are responsible for having their rules and records of human tagging of data and human review of information. But that will only go so far.
Kendra: I think the result of all that, I mean, a lot of the conversation I’ve been having with other people in the data policy space is that the likely result of all of that, at least in the short term, is that you get generative AI used for discrete enterprise applications. In the medium term. What I mean by that is, okay, well, I’m an auto company and I’m an auto manufacturer – I know I own all the data, all the customer data that I’ve collected. I’ve anonymized it according to the rules. It’s not a big confusing data set of pictures and texts and web scraping I got from God knows where and all this kind of stuff. I’m very confident that this is my corporate data. I have a discrete use case for it. That discrete use case can be verified by regulators. Clearly, if I go to regulators and ask them to verify the security of this, it is a narrow scope.
Again, I just want to remind everybody, this is a draft regulation. The most interesting part of this is going to be… As Jeremy said, it’s unworkable. Chinese companies will not be able to legally compete with foreign companies who are using massive data sets without these restrictions, given this restriction. I think what we’re going to see now is pushback from those companies as they send in their complaints to the CAC and the releases of this draft and say, “You guys are going to kill us on this,” and there’ll be a discussion. And so, it’ll be fascinating to see what the final version looks like. Because when you compare the final version and the draft, you can usually, in some cases, read between the lines and go, “Oh, I can see what everybody complained of. I can see what the problem probably was here.”
Kaiser: Right.
Kendra: I think that that will likely be the case, but enterprise applications with a narrower scope may end up being the outcome depending on what the final version of this is.
Kaiser: Yeah. This is exactly what Rogier Creemers, actually, of Leiden University, what he argued about in that DigiChina Roundtable. He said that Chinese services, those that are going to be subject to political censorship, might be able to emerge within a different industrial policy landscape – “One”, and I’m quoting here, “which sees the future of these technologies as being closely intertwined with existing products and services.” Baidu has announced partnerships for Ernie with household goods and car manufacturers. This means these services will likely evolve in a more delineated task-specific manner. People usually don’t ask their toaster for relationship advice or political opinions.
But this seems like gigantic overkill. You don’t need a trillion node neural network to create out of a small, really finite data set just to create a set of very task-specific functions. It’s using a shotgun to kill a fly. I mean, it’s insane.
Kendra: Yeah. I mean, I think you’re right.
Jeremy: Yeah, I’ll be a little bit of a devil’s advocate. I think that you’re going to continue to see content generated by AI throughout China’s social media landscape. The reason for that is that people want it as a practical matter. It’s going to be hard to limit access to foreign tools and illegitimate tools. But most importantly, as with all of China’s content regulations, they’re qualified. China’s content regulations are qualified to make it so that it’s disturbances that are the problem, not the content itself. And what they’re looking for, as we saw even back in 2013, which is a decade ago, it has gone so fast, but they put out network regulations about stopping online rumors, which expanded sort of street crimes to cover online activity, including the notorious picking quarrels offense.
They defined the internet as a public venue that you could cause a disturbance in for picking quarrels. And what they qualify it with is that it has to lead to either an embarrassment of the nation internationally, a harm to social stability, a great disruption in social order. So this is what they can do. The content will be out there and it’ll be enforced against situations that cause a real problem. Inevitably, that will also have a chilling effect on people who are spreading information benignly, which is the nature of all of China’s speech regulation is that by keeping that shotgun ready, it keeps the activity down, but it doesn’t totally stifle it.
Kaiser: Paul Triolo raised another issue in this piece about another thing that might really slow down the development of generative AI in China, and that is, of course, these export restrictions on higher-end GPUs from companies like Nvidia. GPUs, graphics processing units, are what most deep learning systems, these huge neural nets, are built on. Huawei and Baidu, as he points out in this piece, have their own neural processing units, these NPUs, these dedicated AI chips. But at least in the case of Huawei, they can’t get them anymore because they were being manufactured at TSMC. So, Kendra, what is your take on this? What’s the extent to which these export restrictions are going to hobble China’s efforts in generative AI, if at all?
Kendra: Well, being perfectly transparent, I think the entire policy community is scrambling around to put a number or a timeline on exactly what that means for China, right?
Kaiser: Right.
Kendra: I think that there’s two issues here that often get confused. Issue number one is whether or not the export controls will stymie China’s ability to manufacture advanced semiconductors domestically. Issue number two is whether or not a lack of access to advanced semiconductors will materially damage China’s AI ambitions. Those are two separate questions. In terms of question one, I think absolutely. I think you’re already seeing that. The semiconductor industry, and you’ve done a podcast on that, it’s a very good one. Semiconductor industry in China’s in a bit of a panic right now. There’s material and very valid concerns that a lack of access to advanced manufacturing equipment and chips is going to cause problems, etc. We’ll set that aside, maybe come back to it on another podcast.
But in terms of AI, there’s this huge open question about how much processing power do you need domestically to train these models? And how many of these models do you need? Right now, in the near term, and again, these are all just points for consideration, I don’t have a solid conclusion here, but in the near term, here’s what you’re looking at. One, the chips that currently are available domestically, so we’ve seen a lot of moves for those companies and a lot of reporting on the fact that those companies that currently cannot actually buy the chips themselves are renting server space at cloud centers to run their models. There’s no rules in the export controls that say you can’t rent out use of these chips to XX company or XX server.
So, there’s a bunch of loopholes. We’ve also heard reports from I think FT that there’s purchasing of chips through subsidiary companies. There’s holes in all export controls, right? For the time being, that’s not a sustainable situation. That’s not a sustainable situation. For the time being, that’s buying some time is really what’s going on there. The other interesting thing to note, here’s something that never gets talked about, and again, maybe we should spend some time on this elsewhere, but there’s a huge national initiative right now called the National Unified Computing Power Network. What’s going on is that the central government is trying to create basically what’s an electrical grid of data centers and supercomputing centers where eight major hubs across the country – and by hub, I mean a location that has a cluster of data centers located there, those data centers being run by different companies – are interconnected by high-speed fiber-optic.
Then an AI-based platform is installed to take requests for data processing, particularly AI training, from companies and push them to data centers that are underutilized or can handle the load.
Kaiser: Capacity. Yeah.
Kendra: Have capacity to handle that load. One of the reasons that they’re doing that is because there’s a lack of data processing capacity in the east. And along the eastern seaboard is where all the big tech cities are. That’s where all the data processing requirements are. Those data centers are powered by coal, right? That’s dirty energy. And they’re overburdened where you’ve also got data centers out west that are powered by renewable energy and they’re not being used because there’s no big tech cities out there, basically.
Kaiser: Right.
Kendra: So, they thought, okay, we’re going to hook all these. Now this project is in its infancy, but there’s a real question here about whether or not pooled compute will also buy China some time. This is all about buying time. This is all about buying time, because eventually what you end up with is such a large gap between what China has access to, the chips that China has access to and the bleeding edge that now you start to see a competitive disadvantage. It’s very hard to quantify how long that is. There could be a huge breakthrough in semiconductor technology or chip architecture or chip materials that changes the game. So, the question is, does the game change, and does China develop its domestic industry to the point, buy itself enough time to make up the gap?
Right now they’re in pretty dire straits. Frankly, the local policy conversation is we’re in trouble. The newer chips, even one generation newer of Nvidia’s chips, are significantly faster and more energy intensive than the generation that China has access to now. That gap will widen year by year. The question then is, what are we going to see here? What are we going to see as the response? And how actually damaging is it? How much power does an AI need? How many models do you need to be competitive domestically? Three? Five? 50? A zillion? What do you mean competitive? There’s a lot of open questions there and a lot of kind of moving parts, but I don’t think it’s as simple as China doesn’t get chips anymore, boo-hoo, they lose.
Kaiser: Watch this space.
Jeremy: Just a quick statement, also sort of bring us back to AI regs. What Kendra’s talking about illustrates also something of the complete structure of the regulatory and policy framework. That while we get things like these new draft AI regs, which are really a band aid, these are something that are put out to plug a leak in the dyke or another nail to hold the Jell-O to the wall. These aren’t grand policy schemes. These are just something to deal with an issue that’s emerging and do it quickly. But we also have these long-term policies and plans that include things like pooled computing, Kendra’s mentioning, creation of a data market where there’s a controlled ability to sell and exchange in data, identifying specific locations for that. And it’s very easy for foreign governments and media alike to treat it all sort of as equal and to seize on whatever the latest, newest thing is as opposed to seeing the difference between this grand infrastructure and policy development plans versus these patchwork regs that come out occasionally to address an immediate problem. The draft AI regs are important. Absolutely. They tell us a lot about what challenges China sees in terms of generative AI, but we also have to keep an eye on this bigger plan, and they’re able to do both of these things at once.
Kaiser: That’s ultimately why I asked the two of you to join me because I knew that you’d be able to step back and see the shape of the forest, see the sort of bigger picture, and not just zoom in on these individual incomplete ad hoc sorts of individual sets of regulations. Thank you so much. This is incredibly valuable. I love listening to you guys talk, and I will have you both back on really soon. Let’s move on to recommendations. Before we do that, let me just put out a quick reminder that if you like the work that we’re doing here with the Sinica Podcast and the other shows in the network, please support us by becoming an Access member. Just that.
Subscription is the way that we keep the lights on here. We depend entirely on that. You may have noticed we’ve stopped running those stupid ads at the beginning of the podcast. They weren’t doing anything for you. They weren’t doing much for me, so, hey, they’re gone. But so all the more reason to become a subscriber. Okay. Let’s move on to recommendations. I know I’ve got a couple of good ones, but Kendra, why don’t you go first. What you got for us?
Kendra: Sure. I’m going to recommend a big fat downer.
Kaiser: Oh.
Kendra: It’s a wonderful book called Cobalt Red: How the Blood of the Congo Powers Our Lives. It’s by an amazing journalist named Siddharth Kara. It kind of tackles claims that there is any such thing as clean cobalt on the supply chain and kind of some of the institutionalized slavery in the Congo that is essentially powering the phones and devices that we use every day. And it’s really a conversation that hasn’t gotten enough media attention as far as I’m concerned. So, definitely check it out.
Kaiser: Yeah. I don’t know if you heard our interview with Henry Sanderson who wrote Volt Rush, but he talks quite a bit about the problematic nature of cobalt in the Congo. I mean, you basically have a choice between working with this guy who’s basically been… he has a global Magnitsky sanction against him, a real bad actor, or Chinese who are buying from these artisanal mines and pulling this stuff. It’s pretty awful everywhere you look. But yeah, it sounds like a great recommendation. A downer, yeah, to be sure, but I’m definitely going to check that out. Okay. Jeremy, what do you have for us?
Jeremy: I feel bad. I’ve also got a downer. I was going to recommend-
Kaiser: Okay, don’t worry. I’ve got uppers.
Jeremy: Oh, thank God. I’m going to recommend the last book I read, a while ago, but it was the last book that really made me want to tell people about it and get other people to read, which is a work of fiction, The School for Good Mothers by Jessamine Chan. I liked it because it really addresses some of the emotional as well as practical complexities in child welfare systems through a work of science fiction, getting at them in a way that any level of analysis will never get to while at the same time really heartbreakingly looking at what it means to be both a kid and a parent.
Kendra: Sounds joyous. Sounds like a Sunday morning reading.
Jeremy: It’s a really great book, with a surprise China connection too, so, you know, there-
Kaiser: Yeah. Okay. Well, I’m going to offer two recommendations pretty quickly. One is a book I recommended a couple weeks ago, a book by Peter Frankopan, The Silk Roads: A New History of the World. I’m going to recommend another now, I’m about a third of the way through it, but it’s just great. It’s his brand-new book, just came out – it’s called The Earth Transformed: An Untold History, which I bought actually as an audiobook. Frankopan reads it himself. He’s got a very nice reading voice. Now, the content of the book is kind of… It’s the stuff that I just love. It’s very current on all the science of sort of stretching back billions of years, literally billions of years, the whole natural history of the earth. He presents quite cautiously and aptly caveated the case for climate as a major factor in historical events from incursions of steppe nomads to the collapse of civilizations. I mean, medieval warm periods and mini ice ages and things like that. But just correlating all this data and from all over the place, from all these different disciplines that seem to show climatological, and in some cases meteorological data, and how it affects major world events. So, I’m totally digging it. Okay, maybe I’ve sold it as an upper. It’s not an upper. It’s a good book. The lessons aren’t so cheery necessarily.
Kendra: Nobody listened to us. Okay. Nobody listened to us.
Kaiser: But it’s good. Okay. I also wanted to make another recommendation. Just this morning I saw one of the Say Farewell to Harry Belafonte, whose music was just the soundtrack to so much of my childhood. My dad had this old reel to reel, and he had all the Harry Belafonte recordings from Carnegie Hall. There’s two like double live albums from Carnegie Hall, which are just amazing. Those songs, I mean, I know them super well. I sing them all the time. My siblings and I, I’m always sort of like leaving little voice messages where I’ll sing a Harry Belafonte, he’d snatch at them just for fun. And mom had this gigantic crush on him, but my dad would get all teary when he’d hear certain songs. So it just reminds me of them so much. And of course, he was just a great American. He had just such amazing contributions to the whole civil rights struggle. He was a big hero to my folks, a big hero to me. So, rest in power, Harry Belafonte. And thank you guys. Kendra, what a delight to have you on the show finally.
Kendra: So fun to be here.
Kaiser: Yeah. Well, we’re going to do it again. And Jeremy, yeah, what a pleasure as always.
Jeremy: Thank you, Kaiser.
Kaiser: All right. The Sinica Podcast is powered by The China Project and is a proud part of the Sinica Network. Our show is produced and edited by me, Kaiser Kuo. We would be delighted if you would drop us an email at sinica@thechinaproject.com or just give us a rating and a review on Apple Podcasts as this really does help people discover the show. Meanwhile, follow us on Twitter or on Facebook at @thechinaproj, and be sure to check out all of the shows in the Sinica Network. Thanks for listening, and we’ll see you next week. Take care.