With Twitter and Facebook blocked in China, the stream of information from Chinese domestic social media provides a case study of social media behavior under the influence of active censorship. While much work has looked at efforts to prevent access to information in China (including IP blocking of foreign Web sites or search engine filtering), we present here the first large–scale analysis of political content censorship in social media, i.e., the active deletion of messages published by individuals.
In a statistical analysis of 56 million messages (212,583 of which have been deleted out of 1.3 million checked, more than 16 percent) from the domestic Chinese microblog site Sina Weibo, and 11 million Chinese–language messages from Twitter, we uncover a set of politically sensitive terms whose presence in a message leads to anomalously higher rates of deletion. We also note that the rate of message deletion is not uniform throughout the country, with messages originating in the outlying provinces of Tibet and Qinghai exhibiting much higher deletion rates than those from eastern areas like Beijing.
1. Introduction
2. Internet censorship in China
3. Microblogs
4. Message deletion
5. Twitter vs. Sina comparison
6. Search blocking
7. Deletion rates of politically sensitive terms
8. Geographic distribution
9. Conclusion
1. Introduction
Much research on Internet censorship has focused on only one of its aspects: the IP and DNS filtering, by censoring countries, of Web sites beyond their jurisdiction. Examples include the so–called “Great Firewall of China” (GFW), which prevents Chinese residents from accessing foreign Web sites such as Google and Facebook (FLOSS, 2011; OpenNet Initiative, 2009; Roberts, et al., 2009), and Egypt’s temporary blocking of social media Web sites such as Twitter during its protests in early 2011.
Censorship of this sort is by definition designed to be complete, in that it aims to prevent all access to such resources. In contrast, a more relaxed “soft” censorship allows access, but polices content. Facebook, for example, removes content that is “hateful, threatening, or pornographic; incites violence; or contains nudity or graphic or gratuitous violence” (Facebook, 2011). Aside from their own internal policies, social media organizations are also governed by the laws of the country in which they operate. In the United States, these include censoring the display of child pornography, libel, and media that infringe on copyright or other intellectual property rights; in China this extends to forms of political expression as well.
The rise of domestic Chinese microblogging sites has provided an opportunity to look at the practice of soft censorship in online social media in detail. Twitter and Facebook were blocked in China in July 2009 after riots in the western province of Xinjiang (Blanchard, 2009). In their absence, a number of domestic services have arisen to take their place; the largest of these is Sina Weibo (http://www.weibo.com), with over 200 million users (Fletcher, 2011).
We focus here on leveraging a variety of information sources to discover and then characterize censorship and deletion practices in Chinese social media. In particular, we exploit three orthogonal sources of information: message deletion patterns on Sina Weibo; differential popularity of terms on Twitter vs. Sina; and, terms that are blocked on Sina’s search interface. Taken together, these information sources lead to three conclusions:
External social media sources like Twitter (i.e., Chinese language speakers outside of China) can be exploited to detect sensitive phrases in Chinese domestic sites since they provide an uncensored stream for contrast, revealing what is not being discussed in Chinese social media.
While users may be prohibited from searching for specific terms at a given time (e.g., “Egypt” during the Arab Spring), content censorship allows users to publish politically sensitive messages, which are occasionally, though not always, deleted retroactively.
The rate of posts that are deleted in Chinese social media is not uniform across the entire country; provinces in the far west and north, such as Tibet and Qinghai, have much higher rates of deletion (53 percent) than eastern provinces and cities (ca. 12 percent).
Note that we are not looking at censorship as an abstraction (e.g., detecting keywords that are blocked by the GFW, regardless of whether anyone uses them). By comparing social media messages on Twitter with those on domestic Chinese social media sites and assessing statistically anomalous deletion rates, we are identifying keywords that are currently highly salient in real public discourse. By examining the deletion rates of specific messages by real people, we can see censorship in action.
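As a rough illustration of what comparing a term's deletion rate against the overall rate involves, the following is a minimal sketch and not the analysis pipeline itself. It assumes messages are available as (text, was_deleted) pairs. The function name `deletion_rates`, the placeholder terms `TERM_A` and `TERM_B`, and the toy data are all hypothetical, and plain substring matching stands in for whatever term extraction and significance testing a full analysis would require. Only the baseline figures (212,583 deletions out of roughly 1.3 million checked messages) come from the text above.

```python
def deletion_rates(messages, terms):
    """Compute the overall deletion rate and per-term deletion rates.

    messages: list of (text, was_deleted) pairs (hypothetical input format).
    terms:    list of candidate terms, matched by simple substring search.
    Returns (overall_rate, {term: (n_containing, n_deleted, rate_or_None)}).
    """
    total = len(messages)
    deleted_total = sum(1 for _, deleted in messages if deleted)
    overall = deleted_total / total if total else None

    stats = {}
    for term in terms:
        # Deletion flags of every message that contains the term.
        flags = [deleted for text, deleted in messages if term in text]
        n = len(flags)
        d = sum(flags)
        stats[term] = (n, d, d / n if n else None)
    return overall, stats


# Baseline from the figures quoted in the abstract (1.3 million is rounded).
abstract_rate = 212_583 / 1_300_000  # a little over 16 percent

# Invented toy data, for illustration only; not drawn from any real corpus.
toy = [
    ("weather is nice today", False),
    ("TERM_A rumor spreading", True),
    ("TERM_A again", True),
    ("lunch with friends", False),
    ("TERM_A discussion", False),
    ("TERM_B concert tonight", False),
    ("traffic jam downtown", True),
    ("TERM_B tickets", False),
]

overall, stats = deletion_rates(toy, ["TERM_A", "TERM_B"])
for term, (n, d, rate) in stats.items():
    print(f"{term}: {d}/{n} deleted, rate={rate:.2f}, overall={overall:.2f}")
```

A term whose rate sits well above the overall rate is a candidate sensitive term. On a sample this small the difference means nothing, which is why a real analysis needs both large message counts and a statistical test of whether the excess is anomalous.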
2. Internet censorship in China
MacKinnon (2011) and the OpenNet Initiative (2009) provide a thorough overview of the state of Internet filtering in China, along with current tactics in use to sway public discourse online, including cyberattacks, stricter rules for domain name registration, localized disconnection (e.g., Xinjiang in July 2009), surveillance, and astroturfing (Bandurski, 2008).
Prior technical work in this area has largely focused on four dimensions. In the security community, a number of studies have investigated network filtering due to the GFW, discovering a list of blacklisted keywords that cause a GFW router to sever the connection between the user and the Web site they are trying to access (Crandall, et al., 2007; Xu, et al., 2011; Espinoza and Crandall, 2011); in this domain, the Herdict Project (http://www.herdict.org) and Sfakianakis, et al. (2011) leverage a global network of users to report unreachable URLs. Villeneuve (2008b) examines the search filtering practices of Google, Yahoo, Microsoft and Baidu in China, noting extreme variation between search engines in the content they censor, echoing earlier results by Human Rights Watch (2006). Knockel, et al. (2011) and Villeneuve (2008a) reverse engineer the TOM–Skype chat client to detect a list of sensitive terms that, if used, lead to chat censorship. MacKinnon (2009) evaluates the blog censorship practices of several providers, noting a similarly dramatic level of variation in suppressed content, with the most common forms of censorship being keyword filtering (not allowing some articles to be posted due to sensitive keywords) and deletion after posting.
This prior work strongly suggests that domestic censorship in China is deeply fragmented and decentralized. It uses a porous network of Internet routers usually (but not always) filtering the worst of blacklisted keywords, but the censorship regime relies more heavily on domestic companies to police their own content under penalty of fines, shutdown and criminal liability (Crandall, et al., 2007; MacKinnon, 2009; OpenNet Initiative, 2009).
3. Microblogs
Chinese microblogs have, over the past two years, taken center stage in this debate, both in their capacity to virally spread information and organize individuals, and in several high–profile cases of government control. One of the most famous of these occurred in October 2010, when a 22–year–old named Li Qiming killed one person and injured another in a drunk driving accident at Hebei University. His response after the accident, “Go ahead, sue me if you dare. My dad is Li Gang!” (Li Gang being the deputy police chief in a nearby district), rapidly spread on social media, fanning public outrage at government corruption and leading censors to instruct media sources to stop all “hype regarding the disturbance over traffic at Hebei University” (Xiao, 2011; Wines, 2010). In December 2010, Nick Kristof of the New York Times opened an account on Sina Weibo to test its level of censorship; his first posts were “Can we talk about Falun Gong?” and “Delete my weibos if you dare! My dad is Li Gang!” (Kristof, 2011b). A post on Tiananmen Square was deleted by moderators within twenty minutes; after attracting the wider attention of the media, his entire user account was shut down as well (Kristof, 2011a).
Beyond such individual stories of content censorship, there is a far greater number of reports of search censorship, in which users are prohibited from searching for messages containing certain keywords. An example of this is shown in Figure 1, where an attempt to search for “Liu Xiaobo” on 30 October 2011 is met with a message stating that, “according to relevant laws, regulations and policies, the search results were not shown.” Reports of other search terms being blocked on Sina Weibo include “Jasmine” (sc. Revolution) (Epstein, 2011) and “Egypt” (Wong and Barboza, 2011) in early 2011, “Ai Weiwei” on his release from prison in June 2011 (Gottlieb, 2011), “Zengcheng” during migrant protests in that city in June 2011 (Kan, 2011), “Jon Huntsman” after his attendance at a Beijing protest in February 2011 (Jenne, 2011), “Chen Guangcheng” (jailed political activist) in October 2011 (Spegele, 2011) and “Occupy Beijing” and several other place names in October 2011 following the “Occupy Wall Street” movement in the United States (Hernandez, 2011).
4. Message deletion