Alibaba Cloud is putting money into a new kind of artificial intelligence that aims to better mimic the real world, using a different approach than chatbots like OpenAI’s ChatGPT. This shift acknowledges the limitations of “large language models” (LLMs), which are mainly trained on text. Instead, developers are starting to focus more on “world models” built from videos and real-life physical situations.
Joining that trend, Alibaba led a 2 billion yuan ($290 million) investment in ShengShu, the startup behind the AI video generation tool Vidu, the company announced Friday. TAL Education and Baidu Ventures also took part in the Series B funding round. The investment comes about two months after ShengShu raised 600 million yuan from Qiming Venture Partners and other backers. The startup did not disclose its valuation.
ShengShu stated that the new funding will help develop a “general world model.” This model uses AI to connect two currently separate areas: the digital world of games and AI-generated video, and the physical world of self-driving cars and robots. “ShengShu believes that a general world model, built on data from various senses like sight, sound, and touch, more naturally captures how the physical world works than large language models,” the three-year-old startup explained.
“We aim to connect what AI perceives with how it acts,” added Zhu Jun, founder of ShengShu. This would allow AI systems to model and predict real-world behavior consistently. ShengShu’s latest Vidu Q3 Pro model, released in January, is ranked among the top 10 AI models for creating videos from text and images, according to Artificial Analysis.
The company launched Vidu globally months before OpenAI made its once-restricted Sora tool for AI video generation widely available. Chinese short-video companies Kuaishou and ByteDance have also released competing AI tools for generating videos.