The success of DeepSeek is a testament to the power of engineering innovation in AI development, said Gao Wen, an academician at the Chinese Academy of Engineering and director of the Pengcheng Laboratory, in an interview with China Central Television (CCTV) aired on Sunday.
DeepSeek-R1, a large language model developed by Hangzhou DeepSeek Technology, has garnered global attention for performance comparable to top-tier international models, achieved at roughly one-thirtieth the development cost of similar products.
The DeepSeek series of models has been made available on open-source platforms, where domestic developers can test and verify them with support from the Pengcheng Laboratory led by Gao, who is also a deputy to the National People's Congress, China's national legislature.
The success of DeepSeek-R1 is partly attributed to the robust infrastructure and technological advancements in China's computing power network. In 2022, the first phase of the "China Computing Power Network," known as the "Intelligent Computing Network," was officially launched.
This network connects and manages more than 20 computing power centers of different types across different locations, with its aggregate computing power gradually increasing to 5 exaflops, or five quintillion calculations per second.
One of its key computing power hub nodes is the "Pengcheng Cloud Brain II," the Artificial Intelligence (AI) computing platform of the Pengcheng Laboratory.
It began operation in 2020 with a peak computing power of one exaflop, or one quintillion calculations per second. This represents a tenfold increase over its predecessor, "Pengcheng Cloud Brain I," which could perform 100 petaflops, or 100 quadrillion calculations per second. The upgrade was completed in just one year.
Gao said the growth was driven mainly by the heavy computing demands of language models.
"When developing 'Pengcheng Cloud Brain I,' the focus was on discriminative AI, which is primarily used for image recognition tasks, such as identifying individuals in photos. This type of AI typically requires less computing power, requiring just 100 Peta. As we've calculated, language models require higher computing and storage capabilities due to the vast availability of language data. As a result, the computing power for language processing needs to be 10 times greater than that for image processing," he said.
"Pengcheng Cloud Brain II" has achieved remarkable success in global high-performance computing benchmarks. It has clinched the top spot nine times in a row in the IO500 overall ranking, which measures data throughput capabilities of high-performance platforms. It has also topped the international AI computing power performance AIPerf500 ranking for four consecutive sessions.
Based on "Pengcheng Cloud Brain II," the Pengcheng Laboratory has built an AI training platform capable of handling ultra-large-scale AI models with hundreds of billions of parameters. "Pengcheng Mind" is one such ultra-large-scale natural language processing model trained and operated on "Pengcheng Cloud Brain II."
Reflecting on DeepSeek's success, Gao emphasized the importance of engineering optimization in AI development. By focusing on efficient training and deployment strategies, DeepSeek has set a new benchmark for large language models.
"Actually, this is where DeepSeek's ingenuity lies. The technology behind ‘Pengcheng Mind’ and ChatGPT is exactly the same. There is a model called the attention mechanism. For example, when a computer processes an article, it forgets the beginning by the time it reaches the end. However, GPT is a transformer. It invented the attention mechanism, or attention model, allowing it to focus on relevant information while filtering out the unnecessary, or zeroing in on what matters most and disregarding the trivial," he said.
"In terms of engineering, DeepSeek has done something that no one else can do. What is its technical approach? It's called a Mixture of Experts (MoE) system. DeepSeek has done this by training it in specific domains with specific expressions, so the training cost is not that high. It has a total of 256 expert models, but you don't need to load all 256 when using it; you can get by with just eight at most. This means the cost of using it is very low, and the training time can be saved. I believe DeepSeek is not an innovation in theory, but more in engineering," he added.
