Peter Alexander 2 months ago
China Morning Missive

Interesting to see that the latest iteration of the DeepSeek AI model isn't getting much attention in the business media. Thankfully there's always Tom's Hardware. There is a trend becoming increasingly apparent: given the ongoing restrictions placed on China by the American government, AI developers there are having to create unique solutions to the compute problem. Here is just one example taken from the linked article:

"Chinese developers of Deepseek AI have released a new model that leverages its multi-modal capabilities to improve the efficiency of its handling of complex documents and large blocks of text, by converting them into images first. Vision encoders were able to take large quantities of text and convert them into images, which, when accessed later, required between seven and 20 times fewer tokens, while maintaining an impressive level of accuracy."

Again, this is just one example, but it shows the sort of roadmap being used by AI developers across China. Keep in mind as well that the Chinese models are all open source and, in nearly all cases, open weight. Iteration among the larger players, such as Qwen and Kimi K2, explains why these groups have been so aggressively quick to release enhanced models. The same holds for the application of these models: in the DeepSeek example, Chinese companies ranging from automotive to logistics are finding ways to use the models to improve performance in production and servicing. The primary objective is to build for scale, and in doing so deliver real usability with deep cost-effectiveness. A stark difference from the financial shenanigans taking place among the various American AI groups.
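
For a feel of the arithmetic behind the quoted "seven to 20 times fewer tokens" figure, here is a minimal, illustrative sketch, not DeepSeek's actual pipeline: it renders a long block of text to page images with Pillow and compares the raw text-token count against an assumed per-page budget of compressed vision tokens. The tokenizer choice, page size, characters-per-page density, and the 100-vision-tokens-per-page figure are all assumptions made for illustration.

```python
# Illustrative bookkeeping for the "text as images" idea described above.
# NOT DeepSeek's pipeline: the page layout and the compressed vision-token
# budget per page are assumptions chosen only to show the token accounting.
import textwrap

from PIL import Image, ImageDraw
from transformers import AutoTokenizer

paragraph = (
    "Quarterly revenue rose as logistics automation expanded across three "
    "regional hubs, with documentation handled by a multimodal model. "
)
long_text = paragraph * 500  # stand-in for a large document

# 1) Token cost of feeding the raw text to an LLM as text tokens.
tokenizer = AutoTokenizer.from_pretrained("gpt2")  # any tokenizer works for the comparison
text_tokens = len(tokenizer.encode(long_text))

# 2) Token cost if the same text is rendered as page images and each page is
#    represented by a small, compressed set of vision tokens.
CHARS_PER_PAGE = 4_000          # rough density of one rendered page (assumption)
VISION_TOKENS_PER_PAGE = 100    # assumed compressed budget per page (assumption)

pages = []
for start in range(0, len(long_text), CHARS_PER_PAGE):
    page = Image.new("RGB", (1024, 1024), "white")
    # Crude rendering; a real pipeline would lay the page out properly.
    wrapped = "\n".join(textwrap.wrap(long_text[start:start + CHARS_PER_PAGE], width=100))
    ImageDraw.Draw(page).multiline_text((16, 16), wrapped, fill="black")
    pages.append(page)

vision_tokens = len(pages) * VISION_TOKENS_PER_PAGE
print(f"text tokens:   {text_tokens}")
print(f"vision tokens: {vision_tokens}  ({len(pages)} pages)")
print(f"ratio:         {text_tokens / vision_tokens:.1f}x fewer tokens")
```

With these assumptions the ratio lands near the low end of the quoted 7-20x range; the real savings depend on how aggressively the vision encoder compresses each rendered page.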

Replies (2)

### Latest on DeepSeek AI Model: Recent Releases and Media Attention

DeepSeek, the Chinese AI startup known for cost-effective models rivaling Western counterparts like GPT-4, has been generating significant buzz in the media. Their latest release, announced on October 21, 2025, is an innovative multimodal AI model called **DeepSeek-OCR**. It leverages visual encoders to convert blocks of text into high-resolution images before processing, drastically cutting token usage (by 7-20 times) while maintaining up to 97% accuracy for tasks involving complex documents, tables, graphs, or large text volumes. This "vision-text compression" approach aims to reduce computational costs for AI models, and is particularly useful in fields like finance, science, and medicine.

#### Key Highlights of the Latest Release (DeepSeek-OCR)

- **How it works**: The model pairs a **DeepEncoder** (which compresses high-resolution page images into a small set of vision tokens) with a **DeepSeek3B-MoE-A570M decoder** (which reconstructs the text from those tokens). Long-context data is first rendered as high-res images of text, then decoded using far fewer tokens than traditional text processing.
- **Performance**: At lower compression ratios (<10x), accuracy stays near 97%. At 20x, it drops to ~60%, but even partial compression can slash costs (see the rough sketch at the end of this reply). The model is open-sourced and available on platforms like Hugging Face and GitHub for developers.
- **Implications**: This could make AI more accessible by optimizing for edge devices or resource-limited environments, addressing scalability issues in models like Llama or GPT. However, it's experimental and part of ongoing R&D into efficient transformers.

#### Media and Industry Attention

DeepSeek has been a hot topic since early 2025, when V3 shocked markets by matching U.S. models (e.g., OpenAI's GPT-4o, Google's Gemini) at a fraction of the training cost ($6M vs. billions). Media coverage has focused on its potential to disrupt AI economics amid U.S.-China tech tensions:

- **Recent coverage**:
  - **South China Morning Post (Oct 22, 2025)**: Highlights the model's efficiency gains and DeepSeek's strategy to compete with global leaders by focusing on compression tech.
  - **Reuters (Sep 29, 2025)**: Reports on V3.2-Exp, an "intermediate" model with **DeepSeek Sparse Attention (DSA)**, which cuts API costs by up to 50% by optimizing attention mechanisms. This was positioned as a step toward next-gen efficiency.
  - **BBC (Feb 4, 2025)**: Discusses DeepSeek's rise as a cost disruptor, topping app charts and rattling stocks. It notes Beijing's praise but media restrictions on the team.
  - **CNBC (Sep 30, 2025)**: Covers DSA's role in reducing compute needs, emphasizing open-source sharing to challenge proprietary models.
  - **TechCrunch (Sep 29, 2025)**: Details the Sparse Attention model's GitHub paper, praising its potential for lighter AI inference.
- **Broader context**: DeepSeek's efficiency focus (e.g., training V3 with far fewer Nvidia H800 GPUs) has drawn scrutiny from U.S. firms, who fear it undercuts their high-cost models. Reports from NBC and Reuters highlight the geopolitical implications, with U.S. officials such as Commerce Secretary Lutnick calling it a "national security" issue. The earlier V3 release (Jan 2025) caused market dips, as it outperformed rivals in benchmarks like MMLU and HumanEval at lower costs.

DeepSeek's trajectory positions it as a key player in democratizing AI, but sustainability questions remain (e.g., reliance on subsidized hardware). For more, see the technical papers on arXiv and the accompanying write-ups on Medium.
If you meant a specific model (e.g., V3.1), let me know for deeper dives!
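
As referenced above, here is a back-of-envelope sketch of the compression/accuracy trade-off quoted in this reply (~97% accuracy below roughly 10x compression, ~60% at 20x). The linear interpolation between those two quoted points is a simplifying assumption for illustration, not a DeepSeek result, and the 120,000-token document is a made-up example.

```python
# Back-of-envelope model of the vision-text compression trade-off quoted above:
# ~97% accuracy below ~10x compression, dropping to ~60% around 20x.

def estimated_accuracy(ratio: float) -> float:
    """Rough accuracy estimate for a given compression ratio."""
    if ratio <= 10:
        return 0.97
    if ratio >= 20:
        return 0.60
    # Linear interpolation between the quoted 10x and 20x figures (assumption).
    return 0.97 - (ratio - 10) * (0.97 - 0.60) / 10


def vision_tokens(text_tokens: int, ratio: float) -> int:
    """Tokens needed after compressing a document by `ratio`."""
    return max(1, round(text_tokens / ratio))


doc_tokens = 120_000  # e.g., a long report that would otherwise blow the context budget
for ratio in (7, 10, 15, 20):
    print(f"{ratio:>2}x -> {vision_tokens(doc_tokens, ratio):>6} tokens, "
          f"~{estimated_accuracy(ratio):.0%} estimated accuracy")
```

Whether a given ratio is worth it then becomes a simple cost question: how many tokens (and how much compute) are saved versus how much accuracy loss the task can tolerate.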