China is stepping up efforts to cultivate both upstream and downstream enterprises in the data industry chain, aiming to expand the sector to 7.5 trillion yuan (about 1.04 trillion U.S. dollars) by 2030 and to drive breakthroughs in artificial intelligence (AI) technology.
As the first country in the world to classify data as a factor of production, China has established a preliminary but complete data industry chain, according to officials at the Data Security Development Conference 2025, which was held from Friday to Sunday in Wenzhou City in east China's Zhejiang Province.
Data show that in 2024, the total amount of data generated nationwide reached 41.06 zettabytes, representing a year-on-year growth of 25 percent.
The country now has more than 190,000 data-related enterprises, with the industry's scale exceeding 2 trillion yuan (about 277.51 billion U.S. dollars). At an average annual growth rate of over 20 percent, the industry is expected to reach 7.5 trillion yuan by 2030.
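The growth arithmetic behind this projection can be checked with a short compound-growth calculation. The sketch below is illustrative only: the article does not state the base year for the 2 trillion yuan figure, so both 2024 and 2025 are tried as assumptions, and the function names are hypothetical.

```python
def implied_annual_growth(start: float, target: float, years: int) -> float:
    """Compound annual growth rate needed to grow from start to target in `years` years."""
    return (target / start) ** (1 / years) - 1


def project(start: float, rate: float, years: int) -> float:
    """Value reached after compounding `rate` annually for `years` years."""
    return start * (1 + rate) ** years


START_TRILLION_YUAN = 2.0   # "more than 2 trillion yuan" (from the article)
TARGET_TRILLION_YUAN = 7.5  # 2030 target (from the article)

# Assumption: the base year is not stated in the article, so try both candidates.
for base_year in (2024, 2025):
    years = 2030 - base_year
    rate = implied_annual_growth(START_TRILLION_YUAN, TARGET_TRILLION_YUAN, years)
    at_20_percent = project(START_TRILLION_YUAN, 0.20, years)
    print(f"base {base_year}: implied growth {rate:.1%}, "
          f"size at exactly 20% growth {at_20_percent:.2f} trillion yuan")
```

Under either assumed base year, growth of exactly 20 percent a year from 2 trillion yuan falls short of 7.5 trillion yuan, which is consistent with the article's wording of a growth rate "over" 20 percent and a current scale of "more than" 2 trillion yuan.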
"At present, we are planning to build a horizontally connected, vertically integrated and powerfully coordinated data infrastructure system. The main structure of the national data infrastructure is expected to be basically completed by 2029," said Liu Liehong, director of the National Data Administration (NDA).
Open sharing of public data has become an important breakthrough for the marketization of data. In 2024, the number of open platforms of public data in city-level regions across China increased by 7.5 percent, with the volume of open data up 7.1 percent and the number of high-quality datasets up 27.4 percent.
In terms of the integration of data and industries, China is accelerating the removal of barriers to the open sharing of public data, promoting the deep integration of public data and enterprise data, and activating a vast amount of "dormant data."
Data have surpassed traditional production factors and become the core driving force for breakthroughs in AI technology and industrial transformation. High-quality datasets are not only the cornerstone of leaps in AI model performance, but are also reshaping the entire industrial chain from technology research and development to commercial application.
In Wenzhou, a "test field" for the national market-oriented reform of data, a data security and compliance system has been established to ensure the large-scale flow of data and to form a data trading ecosystem, enabling more data to be utilized.
"We have developed 469 practical, user-friendly and secure data products. A number of high-quality datasets have been built in fields such as healthcare, transportation and the low-altitude economy," said Jin Chuanla, deputy head of the Wenzhou Municipal Data Bureau.
Building a dataset for large models involves core steps such as data collection, data cleaning, data annotation, and quality assessment. Each step requires targeted research, development and adaptation to the characteristics of such datasets, namely their large scale, broad diversity, and strong ties to specific vertical industries.
Data annotation and cleaning are the key steps in the construction of high-quality datasets. Data annotation teaches AI to "understand the world" through "labeling," such as tagging photos with "cats" or "dogs." Unlabeled data are like garbled textbooks, preventing AI from learning effectively. Data cleaning purifies data by eliminating duplicates and correcting errors. Messy data directly degrade training results.
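The cleaning and annotation steps described above can be illustrated with a toy sketch. It is not any conference participant's actual pipeline: the function names, the keyword-based labeling rule and the sample captions are all hypothetical, and real annotation relies on human labelers or model-assisted tools rather than keyword matching.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Record:
    text: str
    label: Optional[str] = None  # None means the item is still unlabeled


def clean(raw: List[str]) -> List[str]:
    """Normalize whitespace and case, drop empty items, remove duplicates (order kept)."""
    seen = set()
    cleaned = []
    for item in raw:
        normalized = " ".join(item.split()).lower()
        if normalized and normalized not in seen:
            seen.add(normalized)
            cleaned.append(normalized)
    return cleaned


def annotate(items: List[str], keyword_labels: Dict[str, str]) -> List[Record]:
    """Toy stand-in for annotation: assign the first label whose keyword appears."""
    records = []
    for text in items:
        label = next((lab for kw, lab in keyword_labels.items() if kw in text), None)
        records.append(Record(text, label))
    return records


def labeled_fraction(records: List[Record]) -> float:
    """Simple quality check: share of records that received a label."""
    if not records:
        return 0.0
    return sum(r.label is not None for r in records) / len(records)


# Hypothetical sample captions, including a duplicate and an empty entry.
raw = ["A photo of a cat", "a photo of a  cat ", "", "Dog in the park", "a blurry street"]
records = annotate(clean(raw), {"cat": "cats", "dog": "dogs"})
for record in records:
    print(record)
print(f"labeled: {labeled_fraction(records):.0%}")
```

Here the duplicate and empty captions are removed before labeling, and the share of labeled records serves as a minimal stand-in for the quality assessment step.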
"Only when data cover a wide enough range of scenarios and are professionally labeled can AI models break through the precision limits of the laboratory and truly acquire the ability to be applied in industries, driving the development of the digital economy," said Liu Quan, deputy chief engineer of the China Center for Information Industry Development (CCID), which is affiliated with the Ministry of Industry and Information Technology.
With the iteration of AI and large model technologies, the output value of China's data annotation industry has exceeded 8 billion yuan (about 1.11 billion U.S. dollars), and the construction of high-quality data has entered a new stage of large-scale and standardized development, according to a report released at the Data Security Development Conference 2025.
Last year, the number of companies in China developing or applying AI grew by 36 percent year-on-year, while the number of high-quality datasets rose by 27.4 percent, providing strong support for the training and application of AI.
Also in 2024, the numbers of data technology enterprises and data application enterprises that use large models increased by 57.21 percent and 37.14 percent year-on-year, respectively.
"The parameters of China's big data have reached the level of hundreds of billions. The construction of seven data annotation bases has been promoted across the country, and 335 high-quality datasets in fields such as healthcare, industry and education have been established, with a total annotation scale of 1.7 trillion terabytes, supporting the research and development of 121 domestic large models," said Liu Wenqiang, vice president of the CCID.
China's data industry expected to reach new high by 2030, driving AI development
