DeepSeek: An Analysis of the American Overreaction
This week, Silicon Valley, Wall Street, and Washington were all fixated on one thing: DeepSeek. Earlier this month, the Chinese artificial intelligence (AI) company debuted a free chatbot app that stunned many researchers and investors. While there is a lot of uncertainty around some of DeepSeek’s assertions, its latest model’s performance rivals that of ChatGPT, and yet it appears to have been developed for a fraction of the cost.
Using creative methods to increase efficiency, DeepSeek’s developers seemingly figured out how to train their models with far less computing power than other large language models require. As a result, they say, they were able to rely largely on less sophisticated chips instead of the more advanced Nvidia hardware that is subject to U.S. export controls.
On Monday, American tech stocks tumbled as investors reacted to the breakthrough. (Prices recovered partially later in the week.) If a Chinese upstart mostly using less advanced semiconductors was able to mimic the capabilities of the Silicon Valley giants, the markets feared, then not only was Nvidia overvalued, but so was the entire American AI industry.
Some also argued that DeepSeek’s ability to train its model without access to the best American chips suggests that U.S. export controls are ineffective, or even counterproductive. Many have called the DeepSeek shock a “Sputnik moment” for AI—a wake-up call that should sow doubt about U.S. competitiveness and spur renewed investment to secure America’s lead. Others view this as an overreaction, arguing that DeepSeek’s claims should not be taken at face value; it may have used more computing power and spent more money than it has professed.
Why was there such a profound reaction to DeepSeek? And what does it mean for U.S.-Chinese competition? To make sense of this week’s commotion, I asked several of CFR’s fellows to weigh in.
Sebastian Elbaum, Technologist-in-Residence, explained how DeepSeek was able to match the performance of other AI models while incurring far lower training costs:
Paradoxically, some of DeepSeek’s impressive gains were likely driven by the limited resources available to the Chinese engineers, who did not have access to the most powerful Nvidia hardware for training. This constraint led them to develop a series of clever optimizations in model architecture, training procedures, and hardware management.
Two optimizations stand out. First is the low-level programming of the hardware to work around bandwidth limitations. (Using the latest Nvidia hardware would have been easier, but they did not have access to it.) Second is the use of “reinforcement learning” without human intervention, which allows the model to improve itself.
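To give a sense of the second idea, here is a minimal, hypothetical sketch of reinforcement learning driven by an automatically verifiable reward rather than human feedback. This is not DeepSeek’s actual training code, and the first optimization (low-level, hardware-specific programming) does not lend itself to a portable snippet; the toy task, strategy names, and REINFORCE-style update below are all illustrative assumptions.

```python
import math
import random

# Toy "model": a policy over three candidate strategies for answering
# questions like "what is a + b?". The reward is computed by checking
# the answer programmatically, so no human labels are involved.
STRATEGIES = [
    lambda a, b: a + b,                  # correct arithmetic
    lambda a, b: a - b,                  # systematic mistake
    lambda a, b: random.randint(0, 20),  # random guess
]

logits = [0.0, 0.0, 0.0]  # policy parameters, updated from the reward signal
LEARNING_RATE = 0.5

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sample(probs):
    r, cum = random.random(), 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

for step in range(500):
    a, b = random.randint(0, 10), random.randint(0, 10)
    probs = softmax(logits)
    choice = sample(probs)
    answer = STRATEGIES[choice](a, b)

    # Verifiable reward, no human intervention: 1 if correct, else 0.
    reward = 1.0 if answer == a + b else 0.0

    # REINFORCE update: raise the log-probability of the chosen strategy
    # in proportion to the reward it earned.
    for i in range(len(logits)):
        grad = (1.0 if i == choice else 0.0) - probs[i]
        logits[i] += LEARNING_RATE * reward * grad

print("learned strategy probabilities:",
      [round(p, 3) for p in softmax(logits)])
```

Run repeatedly, the policy concentrates nearly all its probability on the correct strategy; the broader point is that when a reward can be checked by a program, the model can improve itself without expensive human-labeled feedback.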
Kat Duffy, Senior Fellow for Digital and Cyberspace Policy, argued that these innovative methods highlight a downside to the United States’ approach to AI:
The focus in the American innovation environment on developing artificial general intelligence and building larger and larger models is not aligned with the needs of most countries around the world. For them, the greatest interest is in seizing the potential of functional AI as quickly as possible. Existing chips and open models can go a long way toward achieving that.
The more the United States pushes Chinese developers to build within a highly constrained environment, the more it risks positioning China as the global leader in developing cost-effective, energy-saving approaches to AI. These will be far more compelling to many governments and entrepreneurs than the “compute or bust” mindset that has been driving AI investments and innovation priorities in the United States.