HUN-REN SZTAKI and Bosch Collaboration in AI-Powered Enterprise Search
The Department of Distributed Systems (DSD) at HUN-REN SZTAKI is proud to announce the successful completion of a three-year collaborative research project with the Bosch Group in Hungary. This partnership, centered on the "Graph Database Project," has produced a new methodology for information retrieval that significantly improves how large enterprises can access and use knowledge held in vast stores of unstructured data.
The Challenge: Unlocking Enterprise Knowledge
Modern enterprises like Bosch possess a wealth of information spread across countless documents, from technical specifications in PDFs to project reports in presentations. The core challenge was to move beyond simple keyword searches and create a system that could understand the context and relationships within this data, providing employees with precise and comprehensive answers to complex queries.
The Collaboration: A Journey of Innovation with GraphRAG
The joint project team set out to build a sophisticated information retrieval system by exploring the frontiers of AI and knowledge graph technology. The collaboration evolved systematically, investigating several cutting-edge techniques:
- Building the Foundation: The project began by establishing a "Graph Database" to map and connect Bosch's diverse data assets, aiming to create a unified and accessible knowledge system.
- Exploring Advanced RAG: The team delved into Retrieval-Augmented Generation (RAG), a technique that enhances Large Language Models (LLMs) with information from private databases. The research quickly advanced to GraphRAG, which leverages the power of knowledge graphs to find connections and context that traditional RAG methods miss.
- A Novel Approach to Knowledge Extraction: A key innovation from the collaboration was the development of a unique framework for building the knowledge graph. The team used advanced LLMs to automatically extract, categorize, and group over 14,000 key terms from hundreds of technical documents. This process transformed disconnected documents into a rich, interconnected web of institutional knowledge.
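To make the idea of connecting documents through shared key terms more concrete, the following is a minimal sketch, not the project's actual framework. It assumes the key terms have already been extracted (in the project this was done with LLMs) and only shows how they might link documents together. The class `TermGraph`, its methods, and the sample file names and terms are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class TermGraph:
    """Hypothetical bipartite index linking documents and key terms."""

    def __init__(self) -> None:
        self.docs_by_term: Dict[str, Set[str]] = defaultdict(set)
        self.terms_by_doc: Dict[str, Set[str]] = defaultdict(set)

    def add_document(self, doc_id: str, terms: Iterable[str]) -> None:
        """Register a document with its (already extracted) key terms."""
        for term in terms:
            key = term.strip().lower()  # normalise so spelling variants match
            if not key:
                continue
            self.docs_by_term[key].add(doc_id)
            self.terms_by_doc[doc_id].add(key)

    def related_documents(self, doc_id: str) -> Dict[str, Set[str]]:
        """Map every other document to the terms it shares with doc_id."""
        related: Dict[str, Set[str]] = defaultdict(set)
        for term in self.terms_by_doc.get(doc_id, set()):
            for other in self.docs_by_term[term]:
                if other != doc_id:
                    related[other].add(term)
        return dict(related)


# Illustrative data only; these are not terms or files from the project.
graph = TermGraph()
graph.add_document("spec.pdf", ["Torque Sensor", "CAN bus"])
graph.add_document("report.pptx", ["CAN bus", "Calibration"])
graph.add_document("notes.pdf", ["calibration"])

links = graph.related_documents("report.pptx")
```

In this toy example, `report.pptx` is connected to `spec.pdf` through "can bus" and to `notes.pdf` through "calibration", even though none of the files reference each other directly.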
The Graph-based Answer
The most significant achievement of the project was the development of a multi-layered search strategy that combines several techniques to achieve very high retrieval recall. The final system integrates:
- Full-text search for broad discovery.
- Vector search for semantic understanding.
- Graph-based keyword queries for contextual connections.
Crucially, the team introduced a final LLM-powered post-validation step. This algorithm intelligently re-ranks the combined search results, filtering for relevance and ensuring the highest quality answers.
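The combination of several retrievers followed by a validation pass can be sketched as follows. This is an illustration under stated assumptions, not the project's algorithm: the article does not say how the three result lists are merged, so reciprocal rank fusion is used here as a stand-in, and the LLM is represented by a hypothetical `judge` callable that returns a relevance score between 0 and 1. The function names, the threshold, and the sample document IDs are all assumptions.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def fuse_rankings(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked lists of document IDs with reciprocal rank fusion."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def post_validate(
    query: str,
    candidates: List[str],
    judge: Callable[[str, str], float],
    threshold: float = 0.5,
    top_n: int = 10,
) -> List[str]:
    """Drop candidates the judge scores below threshold, then re-rank the rest."""
    scored = [(judge(query, doc_id), doc_id) for doc_id in candidates]
    kept = [pair for pair in scored if pair[0] >= threshold]
    # Stable sort: candidates with equal scores keep their fused order.
    kept.sort(key=lambda pair: -pair[0])
    return [doc_id for _, doc_id in kept[:top_n]]


# Illustrative result lists standing in for the three retrievers.
full_text_hits = ["doc3", "doc1", "doc7"]
vector_hits = ["doc1", "doc4", "doc3"]
graph_hits = ["doc9", "doc1"]

fused = fuse_rankings([full_text_hits, vector_hits, graph_hits])

# Toy judge: a lookup table in place of a real LLM call.
toy_scores = {"doc1": 0.9, "doc3": 0.2, "doc4": 0.6, "doc7": 0.1, "doc9": 0.8}


def toy_judge(query: str, doc_id: str) -> float:
    return toy_scores.get(doc_id, 0.0)


final = post_validate("example query", fused, toy_judge)
```

Keeping the judge as a plain callable means the merging logic can be tested without calling a model, and the validation step can be swapped out independently of the retrievers.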
Results and Impact
The performance of this novel, combined approach is remarkable. In tests on a set of 365 technical documents, the method improved the recall@10 score of retrieval from 0.8 (using traditional full-text and embedding search) to 1.0. This means that, for each query, the first ten retrieved documents contained all of the relevant documents (between one and five per query). This is a promising step forward in enterprise search and has led to the publication of an internal invention report by the SZTAKI and Bosch researchers.
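For readers unfamiliar with the metric, recall@10 is the fraction of a query's relevant documents that appear among the first ten results. The sketch below shows one common way to compute it. The article does not state how scores were aggregated across queries, so averaging over queries is an assumption here, and the function names and sample data are illustrative only.

```python
from typing import List, Set, Tuple


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the relevant documents found in the top k results."""
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant documents")
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(runs: List[Tuple[List[str], Set[str]]], k: int = 10) -> float:
    """Average recall@k over (retrieved, relevant) pairs, one pair per query."""
    if not runs:
        raise ValueError("at least one query is required")
    return sum(recall_at_k(r, rel, k) for r, rel in runs) / len(runs)


# Illustrative data: two queries with made-up document IDs.
runs = [
    (["d1", "d2", "d3", "d4", "d5"], {"d2", "d5", "d40"}),  # 2 of 3 found
    (["d7", "d8"], {"d8"}),                                  # 1 of 1 found
]

per_query = [recall_at_k(retrieved, relevant) for retrieved, relevant in runs]
average = mean_recall_at_k(runs)
```

Under this definition, an averaged score of 1.0 can only occur when every query has all of its relevant documents in the top ten.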
The successful partnership between SZTAKI and Bosch exemplifies the power of combining deep academic expertise with industrial ambition. The "Graph Database Project" has delivered strong results on a critical business challenge for Bosch and has contributed new experimental knowledge to the field of AI-driven information retrieval.