New AI Model Compresses Text into Images, Reducing Processing Costs
A new artificial intelligence model developed by DeepSeek significantly reduces the computational resources needed to process lengthy text documents by converting them into images, a technique dubbed “vision-text compression.”
The model, unveiled today, represents text and documents as images and encodes each page into a small number of vision tokens, from which its decoder reconstructs the original text through optical character recognition (OCR). This lets AI systems handle substantially longer inputs with fewer tokens: up to 20 times fewer, according to the developers, who report about 97% decoding precision at compression ratios below 10x and roughly 60% at 20x. The developers also say the system can process approximately 200,000 pages per day on a single GPU (an Nvidia A100-40G). The technology is particularly suited to structured documents and complex data extraction tasks.
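The two headline figures reduce to simple arithmetic. The Python sketch below assumes the compression ratio is the number of text tokens divided by the number of vision tokens used to represent the same page. The 2,000-token page is a hypothetical example, not a figure from DeepSeek. The sketch also converts the reported daily page count into an average per-second rate.

```python
# Back-of-the-envelope arithmetic for the figures reported for DeepSeek's model.
# The per-page token counts used below are hypothetical, for illustration only.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """How many text tokens each vision token stands in for."""
    if vision_tokens <= 0:
        raise ValueError("vision_tokens must be positive")
    return text_tokens / vision_tokens


def pages_per_second(pages_per_day: float) -> float:
    """Convert a daily page throughput into an average per-second rate."""
    return pages_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical page: 2,000 text tokens represented by 100 vision tokens.
    ratio = compression_ratio(text_tokens=2_000, vision_tokens=100)
    print(f"compression ratio: {ratio:.0f}x")  # prints "compression ratio: 20x"

    # Reported throughput: about 200,000 pages per day on one GPU.
    rate = pages_per_second(200_000)
    print(f"average rate: {rate:.2f} pages/s")  # prints "average rate: 2.31 pages/s"
```

Put differently, 200,000 pages a day works out to a little over two pages every second, around the clock.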
This development addresses a critical limitation of current large language models: long documents require many tokens, and because the cost of self-attention grows roughly with the square of the input length, long inputs drive up costs and slow down processing. The ability to efficiently process vast amounts of textual data is crucial for applications such as legal discovery, research analysis, and financial reporting. Further advancements in AI are increasingly reliant on efficient data handling, as explored in recent research from OpenAI.
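The effect of fewer tokens can be estimated under one simplifying assumption: that the language model uses dense self-attention, whose work grows with the square of the sequence length. The Python sketch below compares attention pairs for a hypothetical 100,000-token document before and after compression at 10x and 20x. The document length and function names are illustrative only and do not come from DeepSeek.

```python
import math


def attention_pairs(num_tokens: int) -> int:
    """Query-key pairs scored by dense self-attention over one sequence."""
    return num_tokens * num_tokens


def compressed_length(text_tokens: int, ratio: float) -> int:
    """Vision tokens needed if each one stands in for `ratio` text tokens."""
    return math.ceil(text_tokens / ratio)


def attention_savings(text_tokens: int, ratio: float) -> float:
    """How many times fewer attention pairs the compressed input needs."""
    before = attention_pairs(text_tokens)
    after = attention_pairs(compressed_length(text_tokens, ratio))
    return before / after


if __name__ == "__main__":
    doc_tokens = 100_000  # hypothetical long document
    for ratio in (10, 20):
        n = compressed_length(doc_tokens, ratio)
        savings = attention_savings(doc_tokens, ratio)
        print(f"{ratio}x: {n:,} tokens, {savings:,.0f}x fewer attention pairs")
```

Running it prints a 100x reduction at 10x compression and a 400x reduction at 20x. These figures cover only the attention term: feed-forward layers scale linearly with length and the vision encoder adds its own cost, so end-to-end savings would be smaller.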
DeepSeek’s new 3-billion-parameter vision-language model is designed for high-performance OCR and structured document conversion, offering a more efficient alternative to purely text-based processing. The approach could broaden access to advanced AI capabilities by lowering costs for organizations with large document repositories.
Officials at DeepSeek stated they plan to release further details regarding the model’s availability and potential applications in the coming weeks.
- Tom’s Hardware: “New Deepseek model drastically reduces resource usage by converting text and documents into images: ‘vision-text compression’ uses up to 20 times fewer tokens”
- the-decoder.com: “Deepseek’s OCR system compresses image-based text so AI can handle much longer documents”
- South China Morning Post: “DeepSeek unveils AI model that uses visual perception to compress text input”
- MarkTechPost: “DeepSeek Just Released a 3B OCR Model: A 3B VLM Designed for High-Performance OCR and Structured Document Conversion”
- TechNode: “DeepSeek releases new OCR model capable of generating 200,000 pages daily on a single GPU”