Open-Source Intelligence in Crisis: Navigating China’s Restrictions
D.J. Bobbs from C4ADS and Skip Schiphorst from I-Intelligence delve into the ins and outs of China's growing restrictions on data.
This webinar, moderated by Lizzi C. Lee, featured guests D.J. Bobbs from C4ADS and Skip Schiphorst from I-Intelligence. It covers how to navigate data restrictions in China: our panelists spoke about how their own work and their clients’ work have been affected, ways to analyze and cross-check data to ensure reliability, and tips and methods for accessing data from abroad.
The China Project’s CEO, Bob Guterma, also made an exciting announcement about ChinaEDGE, a platform with over 9 million company profiles and dozens of data points that will get you the information you need for your due diligence on China — fully translated into English.
For more information on ChinaEDGE, please go here and register your interest.
See the transcript below of our video webinar, held on September 18, for more details. Our next free webinar in the series, Complying with New U.S. Laws on Investment in China, will be announced soon.
Lizzi C. Lee: Hello, everyone, and welcome to this much-anticipated webinar powered by The China Project. We are so delighted to have you join us today to delve into this subject that has been on the minds of many business leaders, data analysts, policymakers, and academics alike. Our topic today is open-source intelligence in crisis. I am Lizzi, your host for today. I’ve been in the China journalism space for over two years now after a decade of training as an academic economist. I’m thrilled to be moderating this incredibly relevant and timely discussion. But first, let me introduce our two amazing panelists.
First, we have Skip. Skip is a true open-source research practitioner with a deep understanding of China. He pursued China studies at the universities of Leiden and Xiamen. He now shares his expertise by teaching courses on Chinese online research techniques, aiming to help researchers, academics, and business professionals harness the potential of Chinese-language online sources. Skip now represents I-Intelligence, a Switzerland-based company, which excels at the convergence of intelligence, foresight, strategy, and policy. Welcome, Skip.
Next up, we have D.J. D.J. is a China analyst who played a pivotal role on the state-sponsored threat team at C4ADS. His work involves harnessing the power of high-value datasets and open-source intelligence to delve into matters such as malign investment, influence operations, and other threats emanating from the Chinese government. Before joining C4ADS, D.J. accumulated valuable analytical experience in both public and private sectors. He collaborated closely with international trade specialists at the U.S. Department of Commerce, where he translated China’s five-year plans. In addition, D.J. also contributed his expertise to a wide array of due diligence and litigation activities during his tenure at Portman Square Group. D.J. is an alumnus of NYU-Shanghai, and he furthered his education by earning a Master’s degree in applied intelligence from Georgetown University. Welcome, D.J.
We’re going to start today’s conversation by focusing on what’s currently going on in China’s data control space. The media has lately focused a lot on China’s tightening control over its data. In this context, some key events to note include China’s introduction of its data transfer rules in July 2022, which came roughly a year after China scrutinized Didi. In September of 2022, Wind Information was asked to limit overseas data access, which caught significant media attention. By March of this year, the CNKI academic database had restricted some modules, and the U.S. firm Mintz Group was raided. As recently as April and May of this year, we also saw the enactment of a new anti-espionage law and raids on the offices of Bain & Company and Capvision. But when we talk to insiders in China, they paint a more nuanced picture. Many see this as less of a full-scale crackdown and more as an evolving legal landscape. China is now treating data exports much like physical exports, requiring a license for them. Despite all the headlines we see in the media, I think it’s important to note that platforms like Wind, Wanfang, and CNKI (Zhiwang) are still largely accessible.
We’re going to start today’s conversation with a series of overarching questions. Then, we’re going to get to some audience questions pre-submitted to us. We also welcome questions as we go along. Please just submit them via the questions and answers function you see in the Zoom window that you’re currently in. We’ll try to get to as many of them as we can by the end of today’s webinar. I’m going to start with Skip. Skip, can you please shed light on how those recent data restrictions we just mentioned in China are complicating the due diligence process for foreign firms planning joint ventures, mergers, or in some cases, acquisitions? Okay, our team just informed me that Skip is not here. So, I’m going to turn to D.J. first. D.J., your research focuses a lot on the use of high-value datasets and open-source intelligence to study malign investment and other threats. I wonder if you can help us understand the broad implications of those restrictions. How do you think limited data access from China exacerbates concerns over malign investment, defense trade, and intellectual property security issues?
D.J. Bobbs: Sure, before I continue, I just want to make sure I’m coming through clearly.
Lizzi: Yes, your audio is perfect, thank you.
D.J.: Thank you. Great. Well, first of all, thanks for having me today. Chinese publicly available information is kind of my bread and butter here at C4ADS, so I’m more than happy to chat about this. I’d say the limited access to data in the Chinese data environment raises concerns over a variety of threats, including malign investments, wartime defense trade of dual-use goods, influence operations and soft power projection, IP theft, human rights violations, and so much more.
Let’s say that a university in Europe, for example, is trying to establish a joint research facility for aerospace technology with a Chinese defense university. It turns out that the Chinese defense university conducts research for the Chinese military. The European university might want to reconsider the collaboration if it knew who the Chinese university’s end user is. We’ve seen a lot of people and organizations, in academia and in the corporate world, end up in these seemingly innocuous situations simply because the data in this environment is scarce, and they just didn’t know who exactly they were working with. In other words, the climate of the Chinese data environment is leading to preventable breaches in security: corporate security, research security, economic security, and even national security.
I’d say that organizations like C4ADS rely heavily on the Chinese data environment to investigate these issues, and we leverage a variety of data verticals to do so: corporate data, trade data, judicial, procurement, patent, property, concession, employment, and even leaked data. There are just so many different types of data that we work with in this space. Even though we’ve been able to capture a lot of it, sometimes I feel like we’ve merely scratched the surface. We’ve noticed that the amount of publicly available information in this space has steadily decreased in the past five years, and while the opacity makes our job a little more difficult, it’s certainly not impossible. It actually even encourages us to become more creative in our data collection strategy, which is cool. When it comes to data from Chinese sources, you have to get it while the getting is good, because there’s no telling how long it will be accessible for. We’ve seen this happen to QiChaCha, a once reliable source of Chinese corporate data that became pretty restricted in the past year. The silver lining is that when one site closes, another one emerges. It’s all about identifying those alternative sources of equally valuable and equally reliable data.
Lizzi: Thank you so much, D.J. So I see Skip is back online. To follow up on the question I just put to D.J.: Skip, do you think those data restrictions will have a knock-on effect on other governments and maybe third-party data sources? What do you think is the likelihood of a domino effect in the context of future data limitations?
Skip Schiphorst: Good to be back. By the way, I love the internet. I think restricting access to mainstream commercial data from abroad is going to have a twofold domino effect. On the one hand, I think this will create opportunities for companies which have soft connections in China. When I say soft, I mean unofficial. For instance, pulling corporate data from Chinese websites can be done easily if you have someone in China, or abroad, with a Chinese phone number. Then that person can query the internet to find or retrieve that data and push it forward. This creates go-betweens, and this model actually exists in quite a few countries already.
Simply put, someone can easily just send a request for whatever information is required to someone in China, who pulls it from the internet and sends it back. It’s another link in the chain to get to the information, whether it’s corporate or academic data, but it’s still a very effective way to get to it, albeit a bit cumbersome, right? This simple scheme also heightens the risk that the go-between in China is perceived as breaking the law at a local level.
On the other hand, the other domino effect is that I think this is also going to present opportunities for entities which do have licenses to pull data from the internet in China, so we will witness an increase in demand for these services. It also implies, I think, that researchers from any background are going to have to be creative to get to that data without using a data pool, if it dries out: just by searching themselves, which requires knowledge of how to type and search in Chinese online, which can seem intimidating due to the linguistic aspects of the language, as well as how to navigate the Chinese internet. That’s overall what I-Intelligence is about, and it’s not that difficult. It takes a bit of breaking the ice, but it’s still possible, actually.
Lizzi: Fantastic. D.J., what guidance do you have for our audience who are grappling with those data limitations? How can they identify and use still-available, trustworthy sources in this increasingly opaque environment?
D.J.: Sure, yeah. My advice would be for companies, government agencies, universities, and other organizations in this space to take advantage of the scalable, low-cost, and non-resource-intensive solutions to the data limitations. Just because the data appears to be limited and hard to find doesn’t mean that it’s not there. From official government websites to third-party data aggregators, we’re constantly identifying new sources of high-value data. QiChaCha is not the only source of Chinese corporate data. Similarly, CNKI is not the only source of Chinese academic data. We’ve noticed that as these once reliable sources start to close, new ones appear with the same if not better data. That’s why I and others at C4ADS attach a lot of importance to data foraging and just seeing what’s out there. Despite the limitations, the sheer amount of information that remains is quite staggering.
Also, if you know what you’re looking for, and how to look for it, the data is essentially handed to you. Basic search techniques in Mandarin, VPN services, and the will to dig deep will take you quite far and at a relatively low cost, just as they have for us. For example, companies, universities, and government agencies in China post a surprising amount of information on their official platforms, just like they do here in the States and elsewhere. If you want to identify shared facilities between a Western research institute and a Chinese defense university, you can do that. If you want to track shipments of integrated circuits between Shenzhen and Moscow, you can do that. If you want to scrape tender records where units of the PLA are listed as suppliers, you can do that too. It’s all about knowing what you’re looking for, where to look for it, and how to go about finding it.
Lizzi: Fantastic. I see we have a few audience questions on alternatives to QiChaCha; we’re gonna leave those to the end. D.J., we’re gonna hear your thoughts on those questions, but I’m going to turn to Skip first. We also have a few questions on the CNKI database, which we know researchers in the hard sciences have relied heavily on. People might remember that a 2013 master’s thesis on CNKI was pivotal for Indian scientists researching the origin of COVID-19. But under the new regulations in China, accessing this crucial information from outside China has become quite problematic, as a few of our webinar participants have pointed out in the Q&A box. Skip, can you discuss the ripple effects of those restrictions on other industries? Have those restrictions, especially on CNKI, affected your work or your clients’ work? Can you share the strategies that you’ve been using to circumvent those limitations on the CNKI database?
Skip: Yeah, sure. This presents a pretty big problem for getting to that academic information, which is so important to the advancement of society and to understanding how other countries work, look at information, and share information. We talked about pools of data drying out, we talked about corporate information pools drying out; in this case, we’re looking at academic pools drying out as well. This means that, and I echo the thoughts of D.J., one has to learn how to search online just as we would in English. Searching for data without going to data access points like ResearchGate, etc., and learning how to get to that data, is going to be the same in Chinese. You just have to switch. When I say that, it might sound difficult, but it’s pretty much the same system or methodology you have to apply when researching in any foreign language.
I would say that Chinese, as intimidating as it might look with the different characters, is probably one of the easiest languages to search in online. Coming back to your question, I think it means that you have to identify keywords, which is something we really emphasize. It’s a bit like if you have a library card, for instance, and then the library says, “I don’t like you anymore, you can’t come into my library.” Well, you still want to read that book, right? It just means that you’re going to have to look at other places where you can consume that information, or look for other libraries.
Lizzi: Definitely. D.J., we also have a question from one of our audience members. The question asks how significant the gap is in the volume of data available inside versus outside China. Do you think any of this restricted data will be available again to international users in the near future, or is this something more permanent? Are there sort of under-the-radar sources of Chinese data that our audience should be aware of, that you’ve been finding helpful?
D.J.: Sure, yeah. Great question. In my experience, I’ve noticed that the gap in data availability inside versus outside of China is pretty significant, mostly because a lot of Chinese websites with high-value data will require you to create an account with them in order to access the data they hold. To create an account, you have to verify your identity with a Chinese phone number. Folks in China who have that Chinese phone number would in theory be able to do this quite easily, whereas it’s a little bit harder for us to do. Using VPNs, for example, has taken us quite far. But there are certain sites that require you to create an account with a Chinese phone number in the end. Regardless of where your internet traffic is coming from, you’re ultimately going to have to submit some sort of Chinese phone number.
In terms of re-availability, I haven’t seen any data-rich websites that were once accessible and reliable suddenly reappear with fewer restrictions. However, I have seen data migrate to other, newer sites, perhaps ones with similar names. So, for example, as QiChaCha and TianYanCha were starting to close, other third-party corporate data aggregators in this space were opening up with the same corporate data containing the same corporate identifiers that we look for: Chinese names, unified social credit codes, registered capital, shareholders, directors, etc. It’s possible that these alternative third-party data sources were scraping from QiChaCha itself, knowing that QiChaCha would one day become inaccessible. When it comes to under-the-radar sources of Chinese data, first of all, there are several, and second, they aren’t so under the radar after all. We’ve been able to identify these alternative sources of high-value data through basic internet searches and just some simple data foraging.
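One cheap way to cross-check corporate identifiers pulled from an unfamiliar aggregator is to test whether each unified social credit code is at least well-formed. The sketch below is a minimal illustration, not a C4ADS method: the character set and weights are written as I understand them from the published standard for these codes (GB 32100-2015) and should be verified against it, and the function names and the sample code body are invented for the example.

```python
# Assumed from GB 32100-2015: 31 symbols (letters I, O, S, V, Z are not used)
# and 17 positional weights equal to 3**i mod 31. Verify before relying on this.
USCC_CHARSET = "0123456789ABCDEFGHJKLMNPQRTUWXY"
USCC_WEIGHTS = [pow(3, i, 31) for i in range(17)]


def uscc_check_char(body: str) -> str:
    """Compute the 18th (check) character from the first 17 characters."""
    if len(body) != 17:
        raise ValueError("expected the first 17 characters of the code")
    # str.index raises ValueError if a character is outside the charset.
    total = sum(USCC_CHARSET.index(ch) * w for ch, w in zip(body, USCC_WEIGHTS))
    return USCC_CHARSET[(31 - total % 31) % 31]


def is_plausible_uscc(code: str) -> bool:
    """True if the string has the right length, alphabet, and check character."""
    code = code.strip().upper()
    if len(code) != 18 or any(ch not in USCC_CHARSET for ch in code):
        return False
    return uscc_check_char(code[:17]) == code[17]


if __name__ == "__main__":
    body = "91110000123456789"  # invented 17-character body, not a real company
    code = body + uscc_check_char(body)
    print(code, is_plausible_uscc(code))
```

A passing check only shows that a scraped code is internally consistent; it says nothing about whether the company exists or whether the rest of the record is accurate, so it complements rather than replaces comparison across sources.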
Lizzi: D.J., I wanted to step back from the nitty gritty details of data restrictions a little bit. I wanted to talk about perception a little bit. Do you believe that this negative image being reinforced by these restrictions is something that the Chinese government is aware of? Do you think Beijing is cognizant of the global impact of their recent actions, not just on policy researchers but on academic researchers in general? Do you think there are indications that they might take steps to improve the situation?
D.J.: Sure, yeah. I don’t specialize in political analysis, but I don’t think Beijing cares about the negative image being reinforced by these restrictions. If they did, we wouldn’t be operating in this opaque data environment. They seem to make an attempt to be transparent in the 14th Five-Year Plan and other publicly available national plans, but I’ve found that it’s kind of difficult to draw tight conclusions from these. I think the data in this environment, or lack thereof, speaks for itself. I’d like to think that economic growth and data transparency go hand in hand. So, if economic growth truly is a priority for the party, then we should, in theory, have a data environment that facilitates and reflects that.
We know that the limited access to data is making it more difficult for foreign companies to do business with China and to do their due diligence. So in this way, perhaps the negative image is warranted. Surely Beijing is aware of these effects but in my work, I haven’t seen any indication of the situation noticeably improving. With that said, we’ve been able to develop some pretty replicable solutions to the limited data access and they’ve already been proven to be pretty successful. So all to say that not all hope is lost.
Lizzi: Thank you so much, D.J. So speaking of data transfer, Skip, I’m gonna turn this question to you. Are your colleagues or partners in China generally willing or open to sending data to you or to overseas counterparts? I’ve heard stories, for instance, of U.S.-based academics reaching out to their Chinese colleagues for document sharing, because it’s easier to do this from within China. Is this kind of collaboration potentially problematic or dangerous? Is it also becoming increasingly restricted? Are professionals within China, whether in academia, trade, or finance, becoming more cautious about sharing data as part of their projects or work?
Skip: I think the first group that really got cautious with this was the people engaging in corporate research, and now we’re seeing a pivot from that to the academic field, which is worrisome because, like we said before, it’s so important to keep those exchanges going. So that means that not only corporate researchers, but also journalists and academics in general, need to be sensitized to the implications of data requests and data sharing with their Chinese counterparts, as it might put the person on the other end in hot water. That matters from a research perspective and, more importantly, from an economics perspective. If I can just jump on this, I’m thinking of those wishing to get more access to Chinese data and really understand how that works, if China is their bread and butter and that is what they’re going to be focused on.
It also means that one would have to spend some time over there to get a taste of what is really readily available within the country. How do people engage with data requests coming from abroad? Are people reluctant about getting that information? It does seem like they’re getting a bit more reluctant, and more aware that they want to do some research on whether it is actually okay to do so. That is something I actually encourage any China scholar or professional to do regardless of the present data restrictions: now that the pandemic is over and everybody can travel quite freely, do go over there and help keep those lines of communication open and keep the cultural exchanges going. In terms of the implications of data sharing, I think it means that not only the receiver of the data needs to operate within the boundaries of the law, or be aware of it, but also the sender of the information. From outside of China, that’s something quite foreign to us, since in most instances information from abroad has always been quite readily accessible. It’s going to be interesting to see how faculties in China are going to deal with an increase in unofficial requests, peer-to-peer emails saying, “Hey, I can’t get to CNKI, can you forward me that PDF? I still need that for my research.” You’re going to see an increase in that demand, which also means these faculties really having to think about whether they’re going to share that or not, since these data points have become restricted to foreign entities.
Lizzi: Fantastic. Thank you so much, Skip. So we also have a question from the audience on buying accounts from websites like Taobao. What are some of the ethical concerns or potential issues associated with purchasing data from a website like Taobao, and how would you evaluate the quality of the data from those online marketplaces, which are basically e-commerce markets? Is it valid data? Is it safe to have? Is it ethical to include in your research? Skip, what are your thoughts on that?
Skip: It’s a tricky one, right? Because if it’s sold on what you could call a secondhand marketplace, how do you know whether the person putting that information up for sale online is reliable? It’s a pretty new concept, selling academic information on a secondhand website. It’s a tricky gray area, like so many areas with the Chinese internet or China in general. Whether it’s the internet or politics, it’s a gray area you can’t really put your finger on, but it’s definitely something that people are going to have to be aware of. As I said, if you’re outside of China and you want to access that information, you can go to Taobao and buy it. You might be okay. There might be no repercussions for you, but there might be repercussions for the person putting the data online.
Lizzi: Fantastic. Another question from our audience on basically how to obtain a Chinese phone number to be used for verification. I’m going to direct this question to D.J. Our audience member asks whether online temporary Chinese phone numbers would be useful to receive the messages that work for verification. Have you had experience with that?
D.J.: Yes, we’ve played around with trying to obtain some sort of Chinese SIM card to run those verification checks on those sites with high-value data. We’ve had very little luck with that. It’s not super easy to obtain one, especially when you’re in a place like the United States or somewhere in Europe. That’s just kind of a common obstacle we face. There are ways around it, but it’s easier said than done, mostly because you have to be in China to obtain a Chinese phone number or a SIM card. Those are technically the rules that you have to follow. You have to submit some personal identifiers when registering for a Chinese phone number. Also, those phone numbers will eventually expire. So, it’s hard to find a replicable solution to that problem. Of course, you can try to reach out to someone within China. There are risks associated with that, perhaps putting that person in some sort of danger. That’s not a risk that we or other people in this space are willing to take. Those are just some important ethical dilemmas to keep in mind.
Lizzi: Thank you so much, D.J. Another question for you, D.J. I know the financial services and banking sectors depend heavily on data services. How are they coping with the data restrictions in China now?
D.J.: Another good question. I would say that who to lend money to, who to do business with, who to hire: these are difficult decisions to make when you don’t have the data to support them. Nowadays, in the China space, it’s not enough to determine whether or not a vendor, a client, or a potential partner is a legitimate, well-managed business or institution. In this space, you also have to consider whether or not they’re a state-owned enterprise, a front organization, a defense university, a company that relies on forced labor, or a company that is run by former state bureaucrats. Basic corporate and economic activities like trade, joint ventures, mergers and acquisitions, setting an interest rate, pricing, hiring, and background checks become burdensome when you can’t do your standard due diligence.
I think financial institutions, insurers, and banks are relying heavily on their partners in the private sector to overcome this, specifically on organizations like C4ADS and other nonprofits that specialize in these areas and are more familiar with the Chinese data environment. We can’t expect people to know how to navigate a Chinese government website or interpret a Chinese court record or even use Chinese social media platforms. But that’s where we come in.
Lizzi: So, we also have another question related to this idea of data providers besides open-source data. The question is: is Singapore emerging as a kind of go-between in terms of data transfer? When it comes to financial services, how would you compare what’s happening with open-source intelligence gathering versus data provided by major for-profit data providers? And that goes in both directions, coming out of China and going in. To you, D.J. I see. Well, thank you so much.
The next question is for Skip. When it comes to access to overseas data from inside China, how would you describe the current situation and trend? We know people inside China have for years not been able to access Western news sites directly. But now we’re seeing this trend of further restrictions on other kinds of specific Western data sources, like U.S. economic data, financial data, or other databases from the UN or IMF, which could potentially be very useful for Chinese researchers. What’s the current situation there?
D.J.: Singapore has not come up in my research and analysis as a jurisdiction for finding alternative Chinese data sources. I will say that we’ve worked heavily with mirrored data in the past. I would say that, in general, most of the Chinese data I work with comes from one of three places: official government websites, third-party data aggregators, and mirrored datasets.
So let’s take trade data as an example. If I’m trying to identify Chinese exports of dual-use goods, I’ll look for Chinese shipments that are mirrored in the import records of Russia, Myanmar, or another country or transshipment site of interest. We’ve been doing this for a while now, especially since the Russian invasion of Ukraine, and it’s helped us identify previously undetected shipments of military equipment produced in China, which is pretty cool. I think that’s a good example of what you’re referring to: looking at the datasets of other countries to find Chinese data that is mirrored elsewhere.
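The mirrored-data idea D.J. describes can be sketched in a few lines: aggregate what a partner country reports importing from China, and flag the flows that have no matching, or a much smaller, entry in the Chinese export records you can still access. Everything below is an illustrative assumption rather than C4ADS’s actual workflow: the record fields, the `mirror_gaps` name, the 10% tolerance, and the sample rows are all invented.

```python
from collections import defaultdict


def mirror_gaps(china_exports, partner_imports, tolerance=0.10):
    """Flag partner-reported imports from China that exceed China-reported
    exports for the same (HS code, partner, month) by more than `tolerance`."""
    exported = defaultdict(float)
    for row in china_exports:
        exported[(row["hs_code"], row["destination"], row["month"])] += row["value_usd"]

    imported = defaultdict(float)
    for row in partner_imports:
        if row["origin"] != "CN":  # only flows the partner attributes to China
            continue
        imported[(row["hs_code"], row["reporter"], row["month"])] += row["value_usd"]

    gaps = []
    for key, imp_value in sorted(imported.items()):
        exp_value = exported.get(key, 0.0)
        if imp_value > exp_value * (1 + tolerance):
            gaps.append((*key, exp_value, imp_value))
    return gaps


if __name__ == "__main__":
    # Invented sample rows; HS codes are used purely as illustrative labels.
    china_exports = [
        {"hs_code": "8542", "destination": "RU", "month": "2023-01", "value_usd": 100.0},
    ]
    partner_imports = [
        {"hs_code": "8542", "reporter": "RU", "origin": "CN", "month": "2023-01", "value_usd": 105.0},
        {"hs_code": "8526", "reporter": "RU", "origin": "CN", "month": "2023-01", "value_usd": 50.0},
        {"hs_code": "8542", "reporter": "RU", "origin": "TR", "month": "2023-01", "value_usd": 999.0},
    ]
    for gap in mirror_gaps(china_exports, partner_imports):
        print(gap)
```

In the sample, only the second import row is flagged, because nothing on the export side matches it. With real data, the two sides rarely agree exactly even for legitimate trade (valuation basis, shipping lags, and transshipment all introduce differences), so the tolerance is a crude stand-in and any flagged gap is a lead for an analyst to investigate, not a finding.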
Skip: Very interesting. The very first thing people download when they have a smartphone is going to be WeChat, and then a VPN. It’s a bit different than a few years ago. Back in the day, most VPNs, whether downloaded outside of China or within China, did the trick for accessing non-Chinese websites. When I was there, everyone was using ExpressVPN. Some of you might have heard of it; it was the best. Every once in a while you’d get a message saying, “Hey, you need to download this plug-in or this upgrade to keep it working.” This is because the regulators were enforcing restrictions or blocks on the VPN. It’s a bit of a gray area. It’s legal to have a VPN, but it’s illegal to provide a VPN.
To give you an example, when I was over there at university, there were these standalone terminals with a computer. You didn’t have to log in, and they had built-in VPNs, right at the university library, made not so much for the few foreign students that were there but more for the domestic students, Chinese students, to go online and consume the information they needed for what they had to write. Now this might not cut it these days. The best way to do it now is to download a VPN abroad, but this is cumbersome if you’re in China. People really have to struggle to find ways around it. This is usually something most people figure out and share directly with their friends. There’s a useful website, actually, called Circumvention Central, at cc.greatfire.org. It measures VPN speed against other VPNs in China. The top ones in terms of speed and stability, for example, are going to be Bluecloud and Astrill, two VPNs I’ve never heard about, but there’s no cookie-cutter way around it. People have to be creative to get around it and get to the information they need.
Lizzi: Fantastic. We have lots of questions on VPNs to use, how to set up VPNs, etc. Are there any safe VPNs besides what Skip just told us that have servers within Mainland China? Do you have any thoughts on that?
D.J.: Yeah, I’ve come across a wide range of VPN services that I have personally used in my workflow, just to see if accessing a specific dataset is more feasible when rerouting my internet traffic through another country. I think services like Bright Data and NordVPN are good go-tos, because you’re able to reroute your traffic through multiple different jurisdictions and just test it out to see which one works best for your specific workflow or problem set. So yes, VPN services are definitely a big part of my day-to-day here, and they have proven to be very successful when mining for that really high-value data that we look for.
Lizzi: Fantastic. We also have a couple of great questions on research ethics and research safety. The question from the audience is: isn’t this discussion giving Chinese authorities more clues about the gaps they need to fill in their current control structure, which is already increasingly strict? In some sense, once we find a way to get around those restrictions, can’t they then close those loopholes with further restrictions? Is there any way to tell the line between doing due diligence and spying, which is subject to increasingly harsh punishment under China’s current anti-espionage law? The audience also has a question on how we should describe our methodologies for accessing data in published research, or whether doing that will put that access at risk.
Skip, I’ll start with you. What are your thoughts on those?
Skip: It’s very difficult to put your finger on exactly why China restricts data. I’ll go back to a more general question; actually, it’s a question I ask myself quite often when I teach the corporate section of the course. Why does it go to such lengths to restrict corporate data? If they made it more transparent, people would be signing deals more quickly and the due diligence would be done quicker; KYC and KYB, know your customer and know your business, would be easier. I don’t have an answer for that. I don’t know why there isn’t enough transparency. You would think they would want to provide it to advance their business interests and advance international trade.
In terms of data or corporate research, and in general, there’s this perception that everything is top-down and logical. It’s not always, and it does take some time and research to figure out when something is going to be restricted, when they’re going to lift a restriction, or why they actually restrict things. It’s not uncommon, whether it’s politics or data or something else, that it baffles all the researchers, and then it does a 180 without warning anyone. It’s very unpredictable.
Lizzi: Thank you so much. D.J., any thoughts on that? How should we think about our safety when it comes to research and being specific about the ways we access data?
D.J.: Yeah, you’ve got to be smart when you’re operating in this space. It’s important not to show your entire hand, because if you do that, the site that you’re trying to scrape data from might close. I try to be careful about what I make publicly known. Obviously, we want these things to be in the spotlight, specifically instances of malign investments, defense, trade and other issues that I referred to earlier. You also have to consider that the data we’re working with in this space could be obfuscated or incomplete. That’s why it’s important for analysis to be done by analysts who are language-enabled and who have high context, to mitigate these issues. I think I could also talk about leaked data here. You have to take leaked data with a grain of salt and also pay attention to the data provenance. Where exactly is that data coming from? Is it reliable? Because we want to be comfortable standing by the analysis that we ultimately conduct on that data. I would say my biggest points here are being public with the knowledge that you’re gathering and the data that you’re capturing, but also being smart about it and not spilling all the beans.
Lizzi: We have another question from our audience on language proficiency. I’m going to direct this to Skip. For many of the still-accessible sources, proficiency in Chinese is usually necessary to understand and use the data properly. Are there quality English-language sources that you would recommend, Skip?
Skip: So the common consensus is that one has to be fluent in a language in order to find information in that language. That might be the case when you’re physically reading a book or consuming information. It’s not so much the case online. There are so many tools out there that have really helped us crack languages. The one that comes to everyone’s mind is Google Translate, but be careful there, because one of the things we emphasize in our courses, whether it’s the Arabic, Russian or Chinese courses, is to be careful with overreliance on these tools. I usually crack a couple of jokes about that.
For example, when you see a menu badly translated by Google Translate, it’s funny. But it becomes a problem, and you’re playing dangerously, when news outlets overly rely on translation tools, translate things wrongly and give that to the public to consume. I see laziness in that, and I think you don’t need to be proficient in the language to figure out if a translation from one of these tools is off.
In general, what I usually say is if it sounds weird or off, that’s probably because it is, and if you’re in the business of presenting the news, or reporting or doing research, you have to spend time on that, right? It sounds weird. It might be an idiom, so how do I figure it out? How can I find out if it’s an idiom, right? If you translate some of these idioms literally, whether it’s from Chinese to English or French to Spanish, they’re going to sound weird. A typical example is political texts being badly translated, which, at the end of the day, gets the message out in the wrong way.
Going back to proficiency, what we often see in businesses, corporations, journalists or other bureaus is that every time something is found in Chinese, they usually go to that one person who speaks Chinese, and that’s going to be the go-to person for Chinese. What you can actually do is find a lot of information yourself in Chinese without speaking the language. Then, once it gets really difficult, that’s when you pivot to a person who is really proficient in that language, telling them, “Hey, listen, I got really far but now it’s getting really too complex, can you take it from here?”
Lizzi: Well, thank you so much, Skip. We also have a few specific questions on what data you use to do a certain type of research. We have a question for D.J. on accessing social media datasets, such as trending searches or mentions on WeChat, which is a source of data that is of tremendous interest, especially to social scientists. Are there things related to WeChat, similar to Google search or Twitter data, that researchers can take advantage of, D.J.?
D.J.: Absolutely. I’d say when it comes to Chinese social media, for whatever we have here in the States or elsewhere, there’s an equivalent or near-equivalent platform in China. So, you know, Baidu is Chinese Google, Weibo is Chinese Twitter, and of course WeChat is the paramount one-stop-shop social media platform that is widely used in mainland China and frankly around the world as well. I remember using it for virtually everything during my time in China, and if a company is registered in China, for example, it’s likely that they have an official account on these platforms, especially WeChat, just like companies do anywhere else in the world.
These are pretty interesting sources of biographical and corporate data I’ve noticed especially if that company or individual has been around for a while or is public facing. For example, Chinese social media has become invaluable in our investigations of Chinese wildlife traffickers in Sub-Saharan Africa, who often use these platforms to communicate with their contacts back in China. I’ll just say that social media, social media pages, and official websites can be quite revealing.
Regarding specific trends and mentions, it’s hard for me to say, because you can, in theory, use WeChat with a foreign number, and not just WeChat, but Weibo and other Chinese social media platforms. But in order to reap the full benefits of them, and to gain full access to WeChat and its services, you need to register for an account with a Chinese phone number. As I mentioned previously, that’s kind of hard to do. A lot of data-rich Chinese websites will allow you to access their data if you register for an account, and you can theoretically do that through a WeChat or Weibo account, but again, only if your account was set up with a Chinese phone number in the first place. You might be wondering, why don’t we just go out and get a Chinese SIM card? Well, again, that’s a little easier said than done. You technically have to be in China to get one, they eventually expire, and you need to submit personal identifiers like a shenfenzheng, or a passport number if you’re a foreigner, so that’s a common obstacle we face in this space.
Lizzi: I see. Another audience member asked about tools for gathering information on publicly traded Chinese companies, AKA A-share companies, and other private companies. What are the key tools that you rely on for that kind of company research, and which ones have become unavailable? Skip, do you have any thoughts on that?
Skip: Yeah, so during the corporate research section of the courses we run, I used to have quite a long list of about six or seven really cool data points like that. TianTianCha was there, and my favorite was Weimao, with which you could make these really cool graphs of companies and everything. Then, once every two or three months, one would dry out, and at the end of the day, you’re ending up with one or two, and then they dry out. There’s one thing that they have in common, and it’s keywords. This is the importance of keywords, whether it’s corporate research, politics, academics, or pharmaceuticals. The identification of a keyword is important. It takes time and it’s not easy, but it’s not impossible. Finding vocabulary lists posted online by people who are enthusiasts in one field can be done. There’s a ton of information being posted online in English; actually, English is still the leading language online. This will probably be very different in a few years, but it’s still very easy or doable to find keywords. That’s one of the things we really emphasize: how to find these keywords. Once you have such a keyword, you can pivot back to corporate research.
If I remember correctly, D.J. mentioned the unified social credit code. That’s a keyword, right? If you know how to write that keyword, and you know how to write the name of a Chinese company, you might not end up with one of these data points which are drying out, but you’ll probably end up on the website of the company itself. If these data points dry out, that doesn’t mean that companies themselves aren’t going to put their data on their own websites, and there are so many websites out there. Many companies are actually quite keen to put business license information as a PDF or in text on their website. So you have to be creative, and that comes back to a couple of things: finding keywords and being proficient in your Google or Baidu operators.
The good news is that these advanced search operators for Google are quite similar to the ones in Baidu. With a couple of keywords and mastery of these operators, you’ll get quite far and actually, that’s one of the things we actually teach you. How do you find those keywords? How do you find those data points?
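[Editor’s illustration, not part of the webinar.] The keyword-plus-operator approach Skip describes can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the company name and domain are hypothetical placeholders, the two Chinese keywords are the standard terms for “unified social credit code” and “business license,” and the `site:` and `filetype:` operators and the search URL patterns are assumed to behave the same way on both engines, which should be checked against each engine’s own documentation.

```python
from urllib.parse import urlencode

# Chinese keywords worth knowing for corporate research.
USCC_KEYWORD = "统一社会信用代码"  # "unified social credit code"
LICENSE_KEYWORD = "营业执照"        # "business license"


def build_queries(company_name, domain=None):
    """Return search-operator query strings for one company.

    company_name: the company's name in Chinese characters.
    domain: optional company website to restrict the search to.
    """
    exact_name = f'"{company_name}"'  # quotes force an exact-phrase match
    queries = [
        # Pages that mention the company next to its registration code.
        f"{exact_name} {USCC_KEYWORD}",
        # Business licenses are often published as PDF scans.
        f"{exact_name} {LICENSE_KEYWORD} filetype:pdf",
    ]
    if domain:
        # Restrict the keyword search to the company's own website.
        queries.append(f"site:{domain} {USCC_KEYWORD}")
    return queries


def search_urls(query):
    """Build a search URL for each engine (URL patterns are assumptions)."""
    return {
        "google": "https://www.google.com/search?" + urlencode({"q": query}),
        "baidu": "https://www.baidu.com/s?" + urlencode({"wd": query}),
    }


if __name__ == "__main__":
    # Hypothetical company ("Example Technology Co., Ltd.") and domain.
    for q in build_queries("示例科技有限公司", "example.com"):
        print(q)
        print(search_urls(q))
```

The script only builds query strings and URLs; it sends no requests. The queries can be pasted into either search engine by hand, which keeps a human in the loop to judge whether a result looks off.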
Lizzi: Fantastic. We also have an audience member who’s interested in learning more about interfirm transactions like buyer-supplier relationships. Skip, do you have good resources for that kind of research?
Skip: No, not at the moment. I would just pivot back to what I just said: finding the right keywords for the buyer and supplier and then pivoting back.
Lizzi: Well, thank you so much. Finally, we have a few questions on safety concerns. The first is about Hong Kong and whether Hong Kong is still a workable place in terms of data access. How does Hong Kong compare with mainland China at this point? Is Hong Kong a safe platform to use to access data on mainland China, if the data is no longer in mainland China but is located in Hong Kong? D.J. or Skip, any thoughts on that?
D.J.: Yeah, I can jump in here. I deal more with data from mainland Chinese sources. However, I have come across some Hong Kong entities in my investigations. I would say that the data in the Hong Kong space, specifically corporate data, is a little bit more accessible. When it comes to data safety and data security while conducting those investigations, I would just do my best to make sure that the investigation is not tied to any specific individual who may ultimately be put in danger by, for example, using their Chinese phone number to access a high-value dataset and things like that. I think knowing the current political and economic climate of China, of Greater China, is also good when operating in this data environment. Hong Kong is a great example of that.
Lizzi: Fantastic. Skip, one more question for you. Can you speak to the risk of accessing data through open-source data sources, in terms of the Chinese government’s ability to monitor and track who is looking at what through those sites, whether through a subscription service or through a public interface? What are the safety concerns or risks associated with that behavior?
Skip: It really depends on what kind of research you do. Are you researching academic information to further your studies, or do you work for law enforcement? If you’re going to be searching sensitive things, whether it’s China or another country, you always have to be aware that, in principle, the other side is going to be able to see what you are looking for.
There’s this saying: is the juice worth the squeeze? You may know that you can get to the information, but going to that information and getting it might actually just put you on the radar. It’s not only applicable to China; I think it’s applicable across the whole scope of the internet.
Lizzi: I see. Well, thank you so much. Good. Final question for D.J.: doing the work that you do now, do you still feel that China is a safe place for you to visit?
D.J.: I’d love to go back. I had a really great time in China. I learned a lot. It’s a great place to learn and grow as a person. I could definitely see myself going back one day I think. I think we’re always trying to build partnerships with foreign governments, especially here at C4ADS. I would love to one day build a bridge between us and them, to collaborate on these issues together. But I don’t see that being possible in the near future. However, I remain hopeful that it will be one day.
Lizzi: What I hear from you on the ground is that there’s also this growing trend of obtaining an export license for the data that you want to use. Currently it’s not fully formulated in terms of legislation, but in the future that could be a further trend. If you want to obtain a piece of data, you need to go through this whole administrative process, making sure the data you want to use is safe to use and that it’s okay to be transferred outside of the country or to be published. Once that process is done and the application is crossed off, researchers will have more transparent access to the data they want. But I think the legislation on that front is still very much a moving process. We’ll need to see further actions and movements on that front.
We are close to the end of today’s conversation. Thank you so much for joining us today in this super interesting, very insightful discussion on open-source intelligence in the context of evolving data restrictions in China. We, myself included, have had the privilege of hearing from two true experts on this front, Skip and D.J. They have shared valuable insights and expertise. Please also remember that the conversation we have today doesn’t end here. If you have further questions or would like to explore those topics at greater length, please feel free to reach out to our panelists or to The China Project team for additional resources.
We are also very excited to announce ChinaEDGE, which is a new open-source intelligence product powered by The China Project. It provides real-time data on over 9 million Chinese companies. Bob will have more to say on that front.
Also, in the coming weeks, there will be further events to continue our discussion today on open-source research. The next event is about complying with the new U.S. laws and how to vet Chinese suppliers, customers, partners and investments, and it will be held online the second week of October, so please stay tuned for further updates from The China Project team. Once again, thank you so much for being part of this engaging conversation. We look forward to seeing you at future webinars and events. Please stay curious and stay informed. Bob, over to you.
Bob Guterma: Thanks so much, Lizzi, D.J. and Skip. That was a fascinating conversation. I want to take just two or three minutes of everyone’s time before you drop off to tell you a little bit more about why we hosted this webinar and why we’ll be hosting similar ones.
For those of you who don’t know me, my name is Bob Guterma, and I’m the CEO here at The China Project. This topic of open-source intelligence is near and dear to our hearts because it’s basically what journalism has become, especially for smaller outfits like us. It’s what most China journalists are forced to do as the number of on the ground reporters and also academic researchers for that matter in China is dwindling, and the freedom of those left there is at constant risk.
In a world where primary research and on-the-ground investigation or even human-to-human interaction, as we heard about during this meeting today, are ever more constrained, open-source intelligence is more important than it has ever been. About three years ago, we saw this happening to us and our ability to get information and sources in China. So we began ideating around data products that could solve systemic information gaps between China and the world.
After a couple of soft launches and trial projects in recent years, we’re excited to announce that we are bringing to market, in the next week or two, the newest version of our open-source intelligence product, called ChinaEDGE. It features some of the information we’ve talked about today, namely the corporate registration details, shareholding structures, directors and officers, operating histories, legal histories, legal alerts, and other official data on more than 9 million Chinese companies. It will include publicly traded companies, private companies, and state-owned enterprises, and the data will be available in English on a global basis, updated in real time directly from Chinese public records.
Perhaps most interestingly, related to today’s conversation, there was a lot of talk about VPNs and whether what you’re doing will be perceptible to the original sources of information. Our product will not require a VPN to access. Because of the data channel partners we work with to legally, compliantly, and commercially procure the data, anything you’re looking at will be joining a flow of data literally four to five orders of magnitude larger than even what we’re doing, let alone what you’re doing. So what you’re looking at won’t stick out like a sore thumb, so to speak, or, you know, at least it probably won’t; I don’t know what you’re looking at. Anyway, the point is that this in and of itself will be a feature of the product. You will be firewalled, for lack of a better term, from the places you’re looking at.
We’re going to be reaching out to all of you in the coming weeks to see if any of you would be interested in taking a tour of the new product that we’re launching. In the meantime, we’re going to continue to hold a series of events on this topic or topics like it to continue to connect, inform, and provide solutions to researchers who are facing difficulties conducting open-source research on China. As Lizzi mentioned, the next event is called Complying with new U.S. laws: How to vet Chinese suppliers, customers, partners, and investments, without breaking Chinese laws. It will take place in the first week or so of October. We’ll send you the specific date in the next day or two. It will go a level deeper than this, into specific processes and methodology depending on what you’re looking at. Thank you again for joining us. Thank you to our speakers, and we look forward to hearing from or speaking with all of you soon.
Lizzi: Thank you, great. Thank you so much.