China’s DeepSeek launches next-gen AI model. Here’s what makes it different


Chinese startup DeepSeek’s latest experimental model promises to increase efficiency and improve AI’s ability to handle long stretches of information at a fraction of the cost, but questions remain over how effective and safe the architecture is.

DeepSeek sent Silicon Valley into a frenzy when it launched its R1 model seemingly out of nowhere earlier this year, showing that it’s possible to train large language models (LLMs) quickly, on less powerful chips, using fewer resources.

The company released DeepSeek-V3.2-Exp on Monday, an experimental update to its current model, DeepSeek-V3.1-Terminus, that builds on its mission to make AI systems more efficient, according to a post on the AI platform Hugging Face.

“DeepSeek V3.2 continues the focus on efficiency, cost reduction, and open-source sharing,” Adina Yakefu, Chinese community lead at Hugging Face, told CNBC. “The big improvement is a new feature called DSA (DeepSeek Sparse Attention), which makes the AI better at handling long documents and conversations. It also cuts the cost of running the AI in half compared to the previous version.”
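DeepSeek has not spelled out DSA’s exact mechanics in the post, but the general idea behind sparse attention is that each token scores every other token and then keeps only its strongest matches, rather than weighing them all. The sketch below is a minimal, hypothetical illustration of that idea in plain NumPy, using simple top-k key selection; it is not DeepSeek’s actual method, and all names in it are made up for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dense_attention(Q, K, V):
    # Standard attention: every query scores every key -- cost grows
    # quadratically with sequence length.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores) @ V

def topk_sparse_attention(Q, K, V, k=16):
    # Sparse attention (illustrative top-k variant): each query keeps only
    # its k highest-scoring keys and masks out the rest, so long inputs
    # need far fewer attention weights.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    topk = np.argpartition(scores, -k, axis=-1)[:, -k:]  # k best keys per query
    mask = np.full_like(scores, -np.inf)                 # -inf -> zero weight
    np.put_along_axis(mask, topk, np.take_along_axis(scores, topk, axis=-1), axis=-1)
    return softmax(mask) @ V

rng = np.random.default_rng(0)
n, d = 128, 64  # 128 tokens, 64-dim attention head
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))

dense = dense_attention(Q, K, V)
sparse = topk_sparse_attention(Q, K, V, k=16)
print(np.abs(dense - sparse).mean())  # small gap despite keeping 16 of 128 keys
```

In a production model the savings come from never computing most of the masked scores in the first place; this toy version computes and then discards them, purely to show that output quality can hold up while most of the attention weights are dropped.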

“It’s significant because it should make the model faster and more cost-effective to use without a noticeable drop in performance,” said Nick Patience, vice president and practice lead for AI at The Futurum Group. “This makes powerful AI more accessible to developers, researchers, and smaller companies, potentially leading to a wave of new and innovative applications.”

Even as some warn of a bubble forming, AI remains at the centre of geopolitical competition, with the U.S. and China vying for the winning spot. Yakefu noted that DeepSeek’s models work “right out of the box” with Chinese-made AI chips, such as Ascend and Cambricon, meaning they can run locally on domestic hardware without any extra setup.

DeepSeek also shared the actual programming code and tools needed to use the experimental model, she said. “This means other people can learn from it and build their own improvements.”

But for venture capitalist Ekaterina Almasque, that very openness means the tech may not be defensible. “The approach is not super new,” she said, noting the industry has been “talking about sparse models since 2015” and that DeepSeek cannot patent its technology because it is open source. DeepSeek’s competitive edge, therefore, must lie in how it decides what information to include, she added.

The company itself acknowledges V3.2-Exp is an “intermediate step toward our next-generation architecture,” per the Hugging Face post.

As Patience pointed out, “this is DeepSeek’s value prop all over: efficiency is becoming as important as raw power.”

“DeepSeek is playing the long game to keep the community invested in their progress,” Yakefu added. “People will always go for what is cheap, reliable, and effective.”
