Data Collection in Communist China and Indian Statisticians
By Ashok K Nag, consultant and retired adviser, Reserve Bank of India*
In 1949, after the Communist Party took over and set up the People’s Republic of China (PRC), the country embarked on ambitious reconstruction plans to revive an economy ravaged by civil war and, before that, by the plundering of the Japanese occupation. One of the early challenges the party leaders faced was rebuilding a reliable body of data on economic, demographic, defense, geological and other matters. Without good statistics, devising policies and measuring their impact would have little value.
For a brief period, the Chinese government sought solutions from Indian statisticians. This happened after Premier Zhou Enlai visited the Indian Statistical Institute in Kolkata in 1956, where he saw value in the work of Prasanta Chandra Mahalanobis and his team.
Making It Count
The collection and analysis of official data in China were shaped by decades of ideological as well as epistemological struggles between statisticians, who wanted to use the best scientific methods, and their communist masters, who pursued political goals. Arunabh Ghosh recounts this fascinating story in his book Making It Count: Statistics and Statecraft in the Early People’s Republic of China. Having conducted research in Beijing, Guangzhou, New Delhi and Kolkata, he digs deep into original Chinese materials and Indian sources to tell the tale.
The book, published by Princeton University Press, is based on Ghosh’s Ph.D. thesis at Columbia University, New York. He also studied at Haverford College, near Philadelphia, and at Tsinghua University in Beijing. Ghosh teaches history at Harvard University. His research projects include a history of dam and reservoir construction in twentieth-century China and a history of China-India networks of science from 1920 to 1980.
A bourgeois discipline
The etymology of the word “statistics” reveals that its roots lie in “counting for the state.” From a communist perspective, the “State” is an instrument of oppression by one dominant class over other classes. So, ipso facto, a discipline used by the state to count can be colored by the interests of the ruling class. Communists in China, the Soviet Union and elsewhere therefore looked down upon the discipline as “bourgeois statistics.”
But any government, communist or capitalist, needs good data to formulate, implement and measure the effectiveness of policies, as well as for day-to-day administration. Ironically, state-compiled data, also known as administrative or register data, are becoming ever more important in the current Big Data wave in all countries, irrespective of their ideological moorings. Numerous operations, from housing construction and business loans to detecting fraud and security risks in web networks, rely on state-collected data.
Focus on data collection
While chronicling the story of the struggle between the science and the politics of statistics, Ghosh classifies its methodologies into three groups: “the Ethnographic,” “the Exhaustive” and “the Stochastic.” This classification helps him describe the socialist critique of “bourgeois statistics.”
However, I find the classification somewhat convoluted, as it deals only with the data collection component of statistical methodology and ignores the two other equally important components: summarizing data and drawing inferences from data samples. If, for instance, you want to study the demographics and occupations of households in a village, you would start by collecting data. Then, using clustering and other techniques, you could summarize the data by household size. Finally, with regression analysis, you could draw various inferences from the data.
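To make the three components concrete, here is a minimal Python sketch of the village example. The household records are invented for illustration, a simple grouping by household size stands in for the clustering the example mentions, and the regression step is reduced to an ordinary least-squares line of earners on household size. The function names are hypothetical and nothing here is taken from Ghosh’s book.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical survey records (the collection step): (household_size, number_of_earners).
# The numbers are invented purely for illustration.
households = [(2, 1), (3, 1), (3, 2), (4, 2), (4, 2), (5, 2), (5, 3), (6, 3)]


def summarize_by_size(records):
    """Summarize: group households by size, return {size: (count, mean earners)}."""
    groups = defaultdict(list)
    for size, earners in records:
        groups[size].append(earners)
    return {size: (len(e), mean(e)) for size, e in sorted(groups.items())}


def fit_line(records):
    """Infer: ordinary least-squares fit of earners = intercept + slope * size."""
    xs = [size for size, _ in records]
    ys = [earners for _, earners in records]
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


summary = summarize_by_size(households)
intercept, slope = fit_line(households)
print(summary)
print(f"earners ~ {intercept:.2f} + {slope:.2f} * household size")
```

Each stage needs its own methods, which is why a classification built only around collection leaves much of the discipline out.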
The Ethnographic method refers to an immersive study of the object of inquiry. This work is primarily qualitative, similar to what anthropologists routinely undertake, and counting is only one part of the process. The Exhaustive method is essentially a complete enumeration of the entire population of units or people being studied. A good example is a government statistical system that collects data generated in the course of the state’s routine operations, such as births, employment and tax collection. The statisticians involved in such work have no say in what data are collected.
Adopting a centralized Soviet-style system
Ghosh begins by describing how a centralized statistical system was first set up in communist China. Chinese statisticians, bureaucrats and politicians argued vehemently about the appropriateness and political correctness of complete enumeration. They viewed it as superior to probability sampling, the third method in Ghosh’s classification, which they vilified as the “bourgeois method.”
The Chinese adopted their centralized system from fellow communists in the Soviet Union, implementing it particularly in Northeast China with help from Soviet experts. Key Chinese officials described the process as building a “New Statistics for a New China.” Shorn of jargon, there was nothing new in the systems being built.
Similarly, the champions of a centralized system of data collection, referred to as the “Exhaustive” method, hailed it as “socialist statistical work.” In reality, this reflected a fundamental misunderstanding of statistical methodology. This becomes clearer when Ghosh discusses measures of central tendency. The Chinese statisticians debated the merits and demerits of measures such as the Arithmetic Mean (AM), Geometric Mean (GM), Harmonic Mean (HM) and Median.
Ghosh notes that although Chinese statisticians denounced GM as “bourgeois statistics,” they considered it less biased than AM and HM. But there was no discussion of the nature of that bias or of its dependence on the presumed distribution of the variable under consideration. The Chinese statisticians rejected mathematical formalism, that is, the mathematical formulation of any technique, because they considered it a bourgeois practice.
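A small Python sketch shows why the question of bias cannot be settled without reference to the distribution. It applies the standard library’s statistics module to two invented data sets. For positive data, AM ≥ GM ≥ HM always holds; for data symmetric around a centre, the AM coincides with the median, while for data spread evenly on a logarithmic scale (as incomes or prices often are) it is the GM that coincides with the median. Which mean is “less biased” therefore depends on the distribution assumed. The helper name central_tendency is hypothetical.

```python
from statistics import geometric_mean, harmonic_mean, mean, median


def central_tendency(values):
    """Return AM, GM, HM and median for a list of positive numbers."""
    return {
        "AM": mean(values),
        "GM": geometric_mean(values),
        "HM": harmonic_mean(values),
        "median": median(values),
    }


# Invented data: one sample symmetric around 10, one spread evenly on a log scale.
symmetric = [8, 9, 10, 11, 12]
log_spread = [1, 10, 100, 1_000, 10_000]

for name, data in [("symmetric", symmetric), ("log-spread", log_spread)]:
    m = central_tendency(data)
    # For positive data the ordering AM >= GM >= HM always holds.
    assert m["AM"] >= m["GM"] >= m["HM"]
    print(name, {k: round(v, 2) for k, v in m.items()})
```

For the log-spread data, the GM and the median both come out at 100, while a single large value pulls the AM above 2,000; for the symmetric data it is the AM that lands exactly on the median.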
Cracks in the Centralized System
Within five years of its implementation, major fault lines had become evident in the PRC’s centralized system. The insistence on complete enumeration, for instance, led to excessive report generation at the cost of data accuracy. Ghosh traces how communist party, government, industry and agricultural officials gradually realized the cost of inaccurate data, particularly for an economy embarking on centralized planning.
There were two official responses to the growing crisis, Ghosh writes. The first was to expand the country’s pool of statistical skills. The earlier neglect of the probabilistic foundations of statistical methodology, dismissed as “bourgeois statistics,” had created an acute shortage of professionals with adequate knowledge of sampling methodology.
Seeking Indian experts
Second, the value of sampling became clear to Premier Zhou Enlai and other Chinese officials. In 1956, during a visit to India, Zhou discussed it with experts at the Indian Statistical Institute (ISI) in Kolkata. ISI was founded in 1931 by P.C. Mahalanobis, a physicist turned statistician who had studied at Cambridge University. It was funded by the British administration, which ruled India at that time. ISI soon gained an international reputation, particularly for sampling design and for conducting large-scale nationwide sample surveys. Mahalanobis was also the architect of India’s second five-year economic plan (1956-1961).
During his visit, Zhou Enlai was impressed with ISI’s experience in conducting nationwide surveys. He proposed and set up exchange visits for leading statisticians from the two countries. Soon after Zhou’s visit, a team of senior Chinese statisticians spent a month at ISI. The next year, Mahalanobis and D.B. Lahiri, his colleague and sampling expert, visited China.
During these bilateral exchanges, it was obvious that the Chinese leaders were deeply interested in learning about random sampling methodology and its probabilistic foundation. At the time, it seemed that science had prevailed over dogma, says Ghosh. But it was not to be.
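The core idea the Chinese visitors came to learn can be sketched in a few lines of Python: estimate a population mean from a simple random sample and attach a standard error, instead of enumerating every unit. This is a textbook simple random sample drawn without replacement, assumed here for illustration; ISI’s actual survey designs were considerably more elaborate. The population of plot yields is invented, and the function name srs_estimate is hypothetical.

```python
import math
import random
from statistics import mean, variance


def srs_estimate(population, n, seed=0):
    """Estimate a population mean from a simple random sample without replacement.

    Returns (estimate, standard_error). The standard error includes the
    finite-population correction (1 - n/N), so it shrinks to zero when n == N.
    """
    N = len(population)
    if not 2 <= n <= N:
        raise ValueError("sample size must be between 2 and the population size")
    rng = random.Random(seed)
    sample = rng.sample(population, n)
    fpc = 1 - n / N
    standard_error = math.sqrt(fpc * variance(sample) / n)
    return mean(sample), standard_error


# Hypothetical population: yields of 10,000 farm plots (invented numbers).
pop_rng = random.Random(42)
population = [pop_rng.gauss(50, 10) for _ in range(10_000)]

estimate, se = srs_estimate(population, n=400)
print(f"sample of 400: {estimate:.2f} +/- {1.96 * se:.2f} (approx. 95% margin)")
print(f"complete enumeration: {mean(population):.2f}")
```

Setting n equal to the population size drives the sampling error to zero through the finite-population correction, which is the complete-enumeration case; the appeal of sampling is that a modest n already gives a usable, quantified margin of error at a fraction of the cost.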
Great Leap Forward statistics under Mao
The earlier dogma of centralized “socialist statistics” was dumped because it “fundamentally crippled the ability of leadership to assess reliable data.” It was replaced, not by the “stochastic” methodology of random sampling used by Mahalanobis and his team, but by a method blessed by Chairman Mao Zedong.
In 1958, the “Great Leap Forward” (GLF) movement launched by Mao brought about another radical restructuring of the Chinese statistical system, ushering in an era of “Great Leap Forward statistics.” It pushed through a complete decentralization of data collection and relied on the “ethnographic” method of subjective sampling.
Within three years of its launch, it had become obvious that the GLF movement had failed on most fronts, including data collection. There were then attempts to have skilled professionals manage the country’s statistical systems, but this effort ended in 1966, when Mao launched the Cultural Revolution. Instead, Mao’s preferred method of data collection by subjective sampling (diaocha baogao, or “investigation report” in Chinese) survived until the end of his era.
Chinese statisticians bow to communist dictates
Ghosh covers only two technical issues of measurement in depth: measures of central tendency and index numbers. It seems absurd that controversy could arise over these two standard methods of summarizing data, though both have their inadequacies.
But the Chinese were less concerned about the methods themselves than they were about their uses. In the case of central tendency, for instance, Gross Domestic Product (GDP) per capita in a country does not reveal anything about inequality of income. So, a high GDP per capita may conceal the fact that a large proportion of a country’s population does not have access to two full meals a day. But that does not make “GDP per capita” an example of “bourgeois statistics.”
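The point about GDP per capita is simple arithmetic, but a short Python sketch makes it explicit. The two ten-person economies below are invented; both have a per-capita income of 100, yet in one of them 80 percent of people fall below a hypothetical threshold of 50, which here stands in for not being able to afford two full meals a day. The helper share_below is likewise hypothetical.

```python
from statistics import mean, median


def share_below(incomes, threshold):
    """Fraction of people whose income falls below the threshold."""
    return sum(1 for x in incomes if x < threshold) / len(incomes)


# Two invented ten-person economies with identical per-capita income.
equal = [100] * 10
unequal = [10] * 8 + [460, 460]

for name, incomes in [("equal", equal), ("unequal", unequal)]:
    print(
        f"{name}: per capita {mean(incomes)}, median {median(incomes)}, "
        f"share below 50: {share_below(incomes, 50):.0%}"
    )
```

The mean is doing its job correctly in both cases; it simply answers a different question from the median or the share below a threshold, which is why the choice of measure is a matter of purpose rather than class.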
Similarly, from a scientific point of view, it made little sense for the Chinese to condemn Fisher’s Ideal Index. Because the index is the geometric mean (GM) of two other index numbers, the Laspeyres and Paasche indices, it was viewed as another bourgeois technique, like everything built on the GM.
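For readers unfamiliar with it, Fisher’s Ideal Index is the geometric mean of the Laspeyres index (weighted by base-period quantities) and the Paasche index (weighted by current-period quantities). The Python sketch below, using invented prices and quantities for two goods, computes all three and checks the time-reversal test: the Fisher index computed forward and then with the periods swapped multiplies to 1, while the Laspeyres index alone does not. That symmetry, which comes directly from taking the geometric mean, is one of the index’s scientific merits. The function names are illustrative only.

```python
import math


def laspeyres(p0, p1, q0, q1):
    """Price index weighted by base-period quantities q0 (q1 unused, kept for a uniform signature)."""
    return sum(p * q for p, q in zip(p1, q0)) / sum(p * q for p, q in zip(p0, q0))


def paasche(p0, p1, q0, q1):
    """Price index weighted by current-period quantities q1."""
    return sum(p * q for p, q in zip(p1, q1)) / sum(p * q for p, q in zip(p0, q1))


def fisher(p0, p1, q0, q1):
    """Fisher's Ideal Index: the geometric mean of Laspeyres and Paasche."""
    return math.sqrt(laspeyres(p0, p1, q0, q1) * paasche(p0, p1, q0, q1))


# Invented prices and quantities for two goods in periods 0 and 1.
p0, p1 = [1.0, 2.0], [1.5, 2.0]
q0, q1 = [10, 5], [6, 6]

forward = fisher(p0, p1, q0, q1)
backward = fisher(p1, p0, q1, q0)  # same index with the two periods swapped
print(
    f"Laspeyres {laspeyres(p0, p1, q0, q1):.4f}, "
    f"Paasche {paasche(p0, p1, q0, q1):.4f}, Fisher {forward:.4f}"
)
# Time-reversal test: Fisher forward * backward equals 1; Laspeyres alone does not.
print(forward * backward, laspeyres(p0, p1, q0, q1) * laspeyres(p1, p0, q1, q0))
```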
On both topics, Ghosh’s uncritical reproduction of the Chinese viewpoints betrays a reluctance to assess their scientific merit. In fact, index numbers remain an important tool, and several research articles on their construction are published each year.
But in Ghosh’s defense, he does point out that Chinese statisticians who had or sought political patronage were eager to tailor their views to appease their communist masters. For instance, they disowned the mathematical formulation of any technique, in keeping with communist orthodoxy. Ghosh also gives examples of foreign-trained Chinese statisticians publicly recanting their earlier use of mathematical formalism.
Insights for statisticians
Despite amassing a treasure trove of primary materials, Ghosh has not been able to establish his main contention that in communist China, “statistics, a putatively neutral field, became the site for a fundamental theoretical battle about the nature of social reality.” It was a battle, no doubt, but one between political dogma and a questioning mind. There is nothing political about the three methodologies of data collection that he discusses. Each has its own uses, depending on the objectives of a study, as well as its shortcomings. The debate about their political correctness is completely superfluous, except to the communists and perhaps other politicians seeking to manipulate data to their advantage.
Overall, though, Ghosh’s Making It Count is a must-read for anyone wanting to understand how the Chinese statistical system evolved under the communist regime. In the process, the book also offers valuable insights for statisticians who want to understand how politics and politicians affect their work.
As a statistician, I find it comforting that every year since 1979, hundreds of doctoral degrees have been awarded in China in probability and mathematical statistics. Science has prevailed over dogma in China, at least for now.
*Ashok Kumar Nag is an alumnus of the Indian Statistical Institute, Kolkata, India. After spending more than two decades in the Statistics and Information Management department of the Reserve Bank of India, he retired as an adviser. He currently works as a consultant in information management and data analysis.
To receive Global Indian Times stories each week:
send your name and email to: gitimescontact@gmail.com
Or connect through Twitter or Facebook:
https://twitter.com/GlobalIndianTi2
fb.me/GlobalIndianTimes