ROUGE-K: A New Metric for Keyword-Aware Summary Evaluation

By: Prof. Dr. Kai Eckert | Sat, 15 Jun 2024

Our paper “ROUGE-K: Do Your Summaries Have Keywords?”, authored by Sotaro Takeshita, Simone Paolo Ponzetto, and Kai Eckert, was published at *SEM 2024.

The Importance of Keywords

Keywords, the content-relevant words in a summary, play a crucial role in conveying information efficiently. They help readers quickly grasp the main points of a summary and decide whether to engage with the full content. It is therefore critical that evaluation checks whether system-generated summaries contain such keywords.

A Critical Gap in Evaluation

The paper identifies a significant limitation in existing evaluation metrics for extreme summarization models: they do not explicitly pay attention to keywords in summaries. As a result, developers cannot tell whether their systems include the most informative words, and may end up with summaries that are grammatically correct but miss essential content.

Introducing ROUGE-K

To address this gap, the researchers developed ROUGE-K, a keyword-oriented evaluation metric that provides a quantitative answer to a fundamental question: “How well do summaries include keywords?”
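To make the question concrete, here is a minimal Python sketch of a keyword-inclusion score. It assumes the score is the fraction of a given keyword list that appears in the candidate summary (keyword recall). The function names, the simple tokenizer, and the exact-match rule are illustrative assumptions, not the paper's implementation; in particular, this post does not describe how keywords are obtained from reference summaries, so the sketch takes them as input.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters (a simplification)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_recall(candidate: str, keywords: list[str]) -> float:
    """Return the fraction of keywords that occur in the candidate summary.

    Hypothetical helper: a keyword (possibly multi-word) counts as present
    when its tokens appear as a contiguous run in the candidate's tokens.
    """
    keyword_tokens = [tokenize(kw) for kw in keywords]
    keyword_tokens = [kw for kw in keyword_tokens if kw]  # drop empty keywords
    if not keyword_tokens:
        raise ValueError("at least one non-empty keyword is required")

    cand = tokenize(candidate)

    def contains(kw: list[str]) -> bool:
        n = len(kw)
        return any(cand[i:i + n] == kw for i in range(len(cand) - n + 1))

    hits = sum(1 for kw in keyword_tokens if contains(kw))
    return hits / len(keyword_tokens)


if __name__ == "__main__":
    summary = "We propose a keyword-aware metric for summarization."
    keys = ["keyword-aware metric", "summarization", "extreme summarization"]
    print(f"{keyword_recall(summary, keys):.2f}")  # prints 0.67
```

In this toy example two of the three keywords appear in the summary, so the score is 2/3. A real setup would also need to decide on stemming, stopword handling, and how keywords are derived, all of which are left open here.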

Surprising Findings

Viewed through this keyword-aware metric, current strong baseline models turn out to often miss essential information in their summaries. This finding suggests that models optimized for traditional metrics may not be capturing the content-relevant words that make summaries truly useful for readers.

Implications for Future Development

ROUGE-K provides developers with a new tool to ensure their summarization systems not only produce fluent text but also capture the keywords that convey essential information. This metric can guide the development of next-generation summarization models that better serve users’ informational needs by explicitly optimizing for keyword inclusion.

The work contributes to ongoing efforts to develop more comprehensive and meaningful evaluation frameworks for natural language generation systems, particularly in the critical domain of text summarization.

Citation: Sotaro Takeshita, Simone Paolo Ponzetto, Kai Eckert (2024): ROUGE-K: Do Your Summaries Have Keywords? In Proceedings of the 13th Joint Conference on Lexical and Computational Semantics (*SEM 2024), pages 69-79.