CMC’s Vietnamese AI Model Soars to Top 12 in Document Insight

CMC’s Vietnamese AI model ranks Top 12 globally in document understanding

The research team at CMC has announced that its AI model, CATI-VLM, ranked in the global Top 12 and first in Vietnam in the Document Visual Question Answering (DocVQA) category of the Robust Reading Competition (RRC), according to results revealed in June 2025.

Created by the CMC Applied Technology Institute (CMC ATI), CATI-VLM is a visual document understanding model developed using a 5TB dataset. The result marks a notable milestone for AI research and development in Vietnam.

“We are excited that CMC’s research talents have been acknowledged on such a prominent international stage as the RRC,” stated Dang Minh Tuan, Director of CMC ATI. “We take great pride in reaching this point swiftly, standing alongside top global institutions. Crucially, this reflects our capability to harness technology and address unique Vietnamese and sector-specific issues.”

As Vietnam undergoes rapid digital transformation, AI adoption is becoming increasingly widespread. Optical character recognition (OCR) tools are essential for digitizing materials, streamlining workflows, cutting costs, and improving management effectiveness.

However, the Vietnamese language is intricate, with tonal diacritics and varied handwriting styles, so recognizing its text takes more than reading characters. It requires an understanding of the surrounding context.

CATI-VLM distinguishes itself from conventional OCR systems by not only extracting characters but also decoding various layers of information. This encompasses textual details, non-text elements (such as checkboxes, graphs, signatures, formulas), structural layouts (like pages, tables, and forms), and stylistic features (including fonts and highlighted text).
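To make the four layers concrete, here is a minimal Python sketch of a container that holds them side by side. It is an illustration only: the class and field names (`DocumentReading`, `NonTextElement`, `LayoutBlock`, `StyleSpan`, `plain_ocr_reading`) are hypothetical and are not taken from CATI-VLM, whose internal representation the article does not describe.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical container types for illustration; none of these names
# come from CATI-VLM itself.


@dataclass
class NonTextElement:
    kind: str  # e.g. "checkbox", "graph", "signature", "formula"
    page: int  # 1-based page number where the element appears


@dataclass
class LayoutBlock:
    kind: str  # e.g. "page", "table", "form"
    page: int


@dataclass
class StyleSpan:
    text: str
    font: str = ""
    highlighted: bool = False


@dataclass
class DocumentReading:
    text_lines: List[str] = field(default_factory=list)
    non_text: List[NonTextElement] = field(default_factory=list)
    layout: List[LayoutBlock] = field(default_factory=list)
    styles: List[StyleSpan] = field(default_factory=list)

    def layers_present(self) -> List[str]:
        """Return the names of the layers that hold at least one item."""
        layers = {
            "text": self.text_lines,
            "non_text": self.non_text,
            "layout": self.layout,
            "style": self.styles,
        }
        return [name for name, items in layers.items() if items]


def plain_ocr_reading(lines: List[str]) -> DocumentReading:
    """What a character-only OCR pass would fill in: the text layer alone."""
    return DocumentReading(text_lines=list(lines))
```

In this sketch, a character-only OCR pass fills just the text layer, while a reading of the kind described above would also populate the non-text, layout, and style layers.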

The model can answer questions about document images in a conversational way, similar to ChatGPT, without any previous exposure to the specific template format.

The Robust Reading Competition is a well-regarded international scientific event organized by the Computer Vision Center at the Universitat Autònoma de Barcelona (UAB), a respected institution in the field of computer vision.

Since it first took place in 2011, the competition has maintained a close affiliation with the International Conference on Document Analysis and Recognition (ICDAR), which serves as a major platform for document analysis and computer vision discussions. It routinely attracts researchers and engineers from prestigious universities, research bodies, and leading technology firms, including Tsinghua University, Hyundai Motor Group, and Tencent.

Tasks set by the RRC aim to stimulate technological progress and tackle real-world challenges, covering domains such as translation, enterprise data management, urban analysis, and historical document processing.

Thai Khang

