GeniiVision is a new feature of Genii, our SaaS tool for automating customer support with advanced language models. Previously, Genii could process only the text extracted from input sources such as PDFs, Word documents, and URLs, so it could not use rich content like images, tables, and graphs. GeniiVision removes this limitation by ingesting and interpreting the rich content within documents, which improves the tool’s accuracy and expands its use cases.
Why We Developed GeniiVision
Genii previously extracted only the text from documents, ignoring valuable information in images, tables, and graphs. Users had to reformat documents manually to make them usable for Genii, which reduced efficiency and still risked losing critical context. GeniiVision was developed to:
- Fully exploit the information within documents, including rich content.
- Increase the accuracy and breadth of responses by interpreting images, tables, and graphs.
- Open up use cases that were previously impractical because documents had to be reformatted manually.
How GeniiVision Works
When GeniiVision is enabled, the system processes documents page by page, identifying and isolating rich formats like tables, images, and graphs. The process has five steps:
- Identification: The system first identifies rich content within the document, such as tables, images (e.g., screenshots), and graphs.
- Isolation and Processing: Identified rich content is isolated and sent to a multimodal model along with a prompt tailored to the specific use case.
- Captioning and Interpretation: The multimodal model describes or reconstructs the content in text form, making it readable and usable for Genii.
- Integration: The interpreted content is automatically integrated into Genii’s knowledge base.
- Usage: When users query this content, it behaves like any other document in Genii. The dashboard shows the context paragraphs extracted via GeniiVision, so users can assess how well the model interpreted the content and how relevant the result is.
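The steps above can be sketched in code. This is a minimal illustration only, not GeniiVision's actual implementation: every name in it (Block, Page, KnowledgeBase, build_prompt, ingest, fake_caption) is hypothetical, the identification step is assumed to have already labeled each block with its kind, and the multimodal model is replaced by a stand-in function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical sketch: none of these names come from GeniiVision itself.

RICH_KINDS = {"table", "image", "graph"}


@dataclass
class Block:
    kind: str     # "text", "table", "image", or "graph" (assumed pre-detected)
    payload: str  # raw text, or a stand-in for the isolated region's data


@dataclass
class Page:
    number: int
    blocks: List[Block]


@dataclass
class KnowledgeBase:
    paragraphs: List[Dict[str, object]] = field(default_factory=list)

    def add(self, text: str, page: int, source: str) -> None:
        # Record where each paragraph came from so a dashboard could show it.
        self.paragraphs.append({"text": text, "page": page, "source": source})


def build_prompt(kind: str, use_case: str) -> str:
    # Prompt tailored to the content kind and the use case.
    return f"Use case: {use_case}. Describe or reconstruct this {kind} as plain text."


def ingest(
    pages: List[Page],
    caption: Callable[[str, str], str],
    kb: KnowledgeBase,
    use_case: str,
) -> None:
    for page in pages:  # documents are processed page by page
        for block in page.blocks:
            if block.kind in RICH_KINDS:
                # Isolation and captioning: send the block with its prompt
                # to the (stand-in) multimodal model.
                text = caption(build_prompt(block.kind, use_case), block.payload)
                kb.add(text, page.number, source="vision")
            else:
                # Plain text is integrated unchanged.
                kb.add(block.payload, page.number, source="text")


def fake_caption(prompt: str, payload: str) -> str:
    # Stand-in for a real multimodal model call.
    return f"[caption] {payload}"
```

Calling ingest(pages, fake_caption, KnowledgeBase(), "customer support") fills the knowledge base with one paragraph per block. Tagging each paragraph with its source is one possible way to let a dashboard distinguish content that came through the vision path from ordinary extracted text.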
Best Practices and Limitations
Best Practices