Constrained Linked Entity ANnotation using RAG (CLEANR)
CLEF 2025
Benedikt Kantz, Stefan Lengauer, Peter Waldert, Tobias Schreck

Background

The GutBrainIE challenge is part of the HEREDITARY efforts to promote research on medical retrieval systems for knowledge representation. We participated in this challenge. Although our submission did not succeed, it provides a novel approach that requires few resources.

Abstract

Structured information extraction from text relies heavily on natural language processing tools and a robust understanding of the target structure. Language Models (LMs) provide the text understanding needed for long and unstructured input, even in domain-specific data. The generative output of these systems, however, can be unstructured and may return data that does not conform to the intended structural constraints. Our system, Constrained Linked Entity ANnotation using RAG (CLEANR), introduces structured output by imposing the ontological constraints on the LM through a grammar. This addition enables us to reliably use relatively small and inexpensive models in our pipeline to process domain-specific data for information extraction in the CLEF GutBrainIE task, resulting in good precision in the Relation Extraction (RE) tasks and improving on the Graphwise solution when the union of both systems' predictions is taken.
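To illustrate the idea of grammar-based ontological constraints, the following is a minimal sketch and not the CLEANR implementation. It assumes a toy ontology of allowed (subject type, predicate, object type) triples; all type and predicate names, as well as the helper functions `build_grammar`, `conforms`, and `union_predictions`, are hypothetical and chosen only for illustration. The sketch renders the ontology as a BNF-style grammar string, checks whether a candidate triple conforms to it, and combines the predictions of two systems by taking their union.

```python
# Hypothetical toy ontology: allowed (subject type, predicate, object type)
# triples. These names are illustrative and NOT the actual GutBrainIE schema.
ALLOWED_TRIPLES = {
    ("bacteria", "located_in", "anatomical_location"),
    ("bacteria", "influences", "disease"),
    ("chemical", "influences", "disease"),
}


def build_grammar(allowed):
    """Render the ontology as a BNF-style grammar string.

    Each allowed triple becomes one alternative of the root rule, so a
    decoder restricted to this grammar can only emit ontology-conformant
    triples. The notation is generic and not tied to a specific library.
    """
    alternatives = [
        f'"{subj}" " " "{pred}" " " "{obj}"'
        for (subj, pred, obj) in sorted(allowed)
    ]
    return "root ::= " + " | ".join(alternatives)


def conforms(triple, allowed=ALLOWED_TRIPLES):
    """Return True if the triple is permitted by the ontology."""
    return tuple(triple) in allowed


def union_predictions(system_a, system_b):
    """Combine the relation predictions of two systems by set union."""
    return set(system_a) | set(system_b)


if __name__ == "__main__":
    print(build_grammar(ALLOWED_TRIPLES))
    print(conforms(("bacteria", "influences", "disease")))
    print(conforms(("disease", "located_in", "bacteria")))
```

In this sketch, a triple such as ("disease", "located_in", "bacteria") is rejected because the ontology does not list it. With grammar-constrained decoding, such a triple could not be generated in the first place, which is why small models can be used reliably under this kind of constraint.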