Vidu, a text-to-video large AI model capable of creating a 16-second, high-definition video in 1080p resolution with a single click, was unveiled on Saturday at the 2024 Zhongguancun Forum in Beijing.
Developed by Tsinghua University and Chinese AI firm ShengShu Technology, Vidu is China's first large video AI model to offer extended duration, exceptional consistency and dynamic capabilities.
Vidu can not only simulate the real physical world with precision but also has a rich imagination, featuring multi-camera generation and high spatio-temporal consistency, said Zhu Jun, vice dean of the Institute for Artificial Intelligence at Tsinghua University.
"For example, the dust billowing up we see as the car moves, and the sunlight and shadow effects at different times of the day, can be rendered quite realistically. Another feature is, it can comprehend some of the language of multiple camera angle utilization, and focusing and light tracking effects, even some illusory scenes you can envision," Zhu explained.
As a large AI model developed in China, Vidu is able to understand and generate Chinese content such as giant pandas and dragons.
It marks a pioneering breakthrough in video modeling following the release of Sora, which was developed by the U.S.-based company OpenAI.
"Vidu utilizes our proprietary technology architecture. Sora does not disclose its technical roadmap. We have been developing core technologies such as deep artificial intelligence and diffusion models independently," Zhu said.
The company said that Vidu's core architecture was proposed as early as 2022.
Development of Vidu made major progress, with the model generating four-second videos in January and eight-second videos in March.
Although Vidu's videos are still shorter than the one-minute videos Sora can generate, Zhu said that the technology roadmap has been realized over the past two months and that Vidu is iterating at a faster pace.
The release of Vidu has once again ignited deeper discussions about AI among the public, encompassing hopes, concerns, and visions for the future.
"AI has yet to achieve the capability of generating full-length films with a single click. We still have a long way to go. However, once certain conditions are met, we will undoubtedly make it available to everyone. In this process, we need to make it controllable, safe and accountable. In our country, development and governance are valued equally and we have actually attached equal importance to both of them," Zhu said.