A Comprehensive Evaluation on Event Reasoning of Large ...

08 Jul.,2024

 

Zhengwei Tao¹,², Zhi Jin¹,², Yifan Zhang¹,², Xiancai Chen¹,², Xiaoying Bai³, Yue Fang¹,², Haiyan Zhao¹,², Jia Li¹,², Chongyang Tao⁴


¹Key Laboratory of High Confidence Software Technologies (PKU), MOE, China
²School of Computer Science, Peking University
³Advanced Institute of Big Data
⁴Beihang University

{tttzw, xiancaich, yifanzhang, y.fang}@stu.pku.edu.cn
{zhijin, zhhy.sei, lijiaa}@pku.edu.cn

*Corresponding authors.

Event reasoning is a fundamental ability that underlies many applications. It requires event schema knowledge to perform global reasoning and must cope with diverse inter-event relations and reasoning paradigms. How well LLMs accomplish event reasoning across these relations and paradigms remains unknown. To address this gap, we comprehensively evaluate the event reasoning abilities of LLMs. We introduce a novel benchmark, EV², for EValuation of EVent reasoning. EV² evaluates at two levels, schema and instance, and is comprehensive in relations and reasoning paradigms. We conduct extensive experiments on EV². We find that LLMs are capable of event reasoning, but their performance is far from satisfactory. We also observe that their event reasoning abilities are imbalanced. Moreover, LLMs possess event schema knowledge; however, they are not aligned with humans in how they utilize it. Based on these findings, we introduce two methods that guide LLMs to utilize event schema knowledge. Both methods achieve improvements. Code and dataset are available at https://github.com/TZWwww/EV2.

1 Introduction

Figure 1: An example of event reasoning. The words in red are event schema knowledge; the sentences below them are event instances. Event reasoning involves various paradigms, such as Contextual Event Classification (CEC) and Contextual Relation Reasoning (CRR), and diverse inter-event relations.

Events are instances or occurrences that form the basic semantic building units, encompassing the meanings of Activities, Accomplishments, Achievements, and States Vendler (). Event reasoning is the ability to process and analyze events and their complex interconnections. Compared with other abilities, event reasoning is unique in two respects. First, it requires knowledge in the form of event schemas, which capture how events evolve in a scenario, and then global reasoning over that knowledge Li et al. (a); Mao et al. (). As shown in Figure 1, each event instance is associated with an event type. All event types and their relations form the event schema knowledge, which reflects the logic and mechanism of event evolution. Knowing that “Memory” often happens after “Learn” can help answer the reasoning question. Second, inter-event relations and reasoning paradigms are diverse. Event reasoning spans two paradigms: reasoning about events under a given relation Du et al. (); Sap et al. (b) and reasoning about the relations between events Ning et al. (); Caselli and Vossen (). The queried relations also vary, including causality Roemmele et al. (), temporality Zhou et al. (), and hierarchy Glavaš et al. ().
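The schema/instance distinction can be made concrete with a small data structure. The following is a minimal Python sketch, assuming event schema knowledge is stored as a graph of typed edges between event types; the class names, the relation label "before", and the example sentence are hypothetical and not taken from the paper's code. An instance is first lifted to its event type, and the schema is then consulted, which mirrors the global reasoning step described above.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class EventSchema:
    """Schema-level knowledge (hypothetical layout): event types linked by typed relations."""
    # Maps (head_type, relation) -> set of tail event types.
    edges: defaultdict = field(default_factory=lambda: defaultdict(set))

    def add(self, head: str, relation: str, tail: str) -> None:
        self.edges[(head, relation)].add(tail)

    def query(self, head: str, relation: str) -> set:
        # .get avoids inserting empty entries for unseen (head, relation) keys.
        return set(self.edges.get((head, relation), set()))


@dataclass
class EventInstance:
    """Instance-level event: a concrete mention tagged with its event type."""
    text: str
    event_type: str


def candidate_types(schema: EventSchema, instance: EventInstance, relation: str) -> set:
    """Lift the instance to its type, then consult the schema (global reasoning)."""
    return schema.query(instance.event_type, relation)


# Hypothetical example mirroring Figure 1: "Memory" often happens after "Learn".
schema = EventSchema()
schema.add("Learn", "before", "Memory")
studied = EventInstance("She studied the vocabulary list.", "Learn")
print(candidate_types(schema, studied, "before"))  # {'Memory'}
```

Because the lookup is keyed on the event type rather than the sentence, the same schema edge answers questions about any instance of "Learn".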

As a fundamental competency of LLMs, event reasoning supports many Natural Language Processing (NLP) applications, including recommendation engines Yang et al. (), interactive question-answering systems Souza Costa et al. (), and AI agents Liu et al. (). Improving event reasoning is therefore essential for advancing LLMs.

LLMs such as the LLaMA series Touvron et al. () and the GPT series Brown et al. () have demonstrated exceptional performance on various natural language reasoning tasks Bang et al. (); Xu et al. (b). Existing research has evaluated a broad spectrum of LLM reasoning abilities, such as commonsense Bian et al. (), sentence relations Chan et al. (), and math Arora et al. (). However, comprehensive evaluations of event reasoning in LLMs are scarce. Current work focuses only on instance-level events, leaving it unclear how LLMs understand and utilize event schema knowledge Chan et al. (). Moreover, it ignores the diversity of relations and paradigms Yuan et al. (). These gaps hinder the development of this crucial ability in LLMs.

In this paper, we comprehensively evaluate event reasoning in terms of both knowledge and abilities. Since no existing dataset is comprehensive in relations and paradigms while covering both the schema and instance levels, we introduce a novel benchmark, EV², for the EValuation of EVent reasoning. EV² evaluates at two aligned levels: schema and instance. The schema-level evaluation probes the event schema knowledge of LLMs, while the instance-level evaluation tests their event reasoning abilities. In addition, to cover various relation types and reasoning paradigms, EV² includes two event reasoning tasks, Contextual Event Classification (CEC) and Contextual Relation Reasoning (CRR), as shown in Figure 1. EV² is constructed from both GPT generation and human annotation. Using EV², we comprehensively evaluate how well LLMs perform event reasoning in terms of abilities and knowledge. Specifically, we explore four research questions: 1) How proficient are LLMs at event reasoning? 2) To what extent do LLMs possess event schema knowledge? 3) Are LLMs aligned with humans in leveraging event schema knowledge? 4) Can LLMs reason about events better when explicitly guided to leverage event schema knowledge?
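The benchmark design crosses two evaluation levels with two tasks. One way to picture this is the sketch below, which assumes a multiple-choice item format scored by accuracy; the `Item` fields, the example questions, and `accuracy_by_cell` are illustrative assumptions, not the released EV² data format or evaluation script. Scoring each (level, task) cell separately is what allows comparisons such as schema-level versus instance-level accuracy.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    SCHEMA = "schema"      # questions posed over event types
    INSTANCE = "instance"  # questions posed over concrete event mentions


class Task(Enum):
    CEC = "Contextual Event Classification"  # pick the event that fits a queried relation
    CRR = "Contextual Relation Reasoning"    # pick the relation that holds between events


@dataclass
class Item:
    """Hypothetical multiple-choice item; field names are illustrative only."""
    level: Level
    task: Task
    context: str
    question: str
    options: list
    answer: int  # index of the gold option


def accuracy_by_cell(items, predictions):
    """Accuracy per (level, task) cell; predictions are chosen option indices."""
    totals, correct = {}, {}
    for item, pred in zip(items, predictions):
        key = (item.level, item.task)
        totals[key] = totals.get(key, 0) + 1
        correct[key] = correct.get(key, 0) + int(pred == item.answer)
    return {key: correct[key] / totals[key] for key in totals}


# Hypothetical items, one per cell, with one correct and one wrong prediction.
items = [
    Item(Level.SCHEMA, Task.CEC, "Learn", "Which event type tends to follow?",
         ["Memory", "Birth"], 0),
    Item(Level.INSTANCE, Task.CRR, "She studied. She remembered the words.",
         "How is the first event related to the second?", ["before", "after"], 0),
]
scores = accuracy_by_cell(items, [0, 1])
print(scores)  # SCHEMA/CEC -> 1.0, INSTANCE/CRR -> 0.0
```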

We conduct extensive experiments on EV² to answer these questions. The results yield the following insights into event reasoning: 1) LLMs are capable of event reasoning, but their performance is far from satisfactory and is imbalanced across relations and reasoning paradigms. 2) LLMs possess event schema knowledge: they answer schema-level questions with accuracy similar to that on instance-level questions. However, the development of schema-level abilities lags behind that of instance-level abilities. 3) LLMs are not aligned with humans in how they leverage event schema knowledge. 4) Based on these findings, we design two mentoring methods that guide LLMs to utilize event schema knowledge. The first adds event schema knowledge directly to the prompt; the second provides guidance in a chain-of-thought format. With this guidance, LLMs perform better event reasoning, and direct guidance yields especially significant improvements.
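The two mentoring methods differ mainly in how the prompt is built. The sketch below assumes plain-text prompts; the template wording, the three reasoning steps, and both function names are hypothetical illustrations of "direct" and "chain-of-thought" guidance, not the authors' actual prompts.

```python
def direct_guidance_prompt(question: str, schema_facts: list) -> str:
    """Method 1 (sketch): state the relevant schema knowledge directly in the prompt."""
    # Each fact is a hypothetical (head_type, relation, tail_type) triple.
    knowledge = "\n".join(f"- {head} {relation} {tail}" for head, relation, tail in schema_facts)
    return f"Event schema knowledge:\n{knowledge}\n\nQuestion: {question}\nAnswer:"


def cot_guidance_prompt(question: str) -> str:
    """Method 2 (sketch): ask the model to derive and apply schema knowledge step by step."""
    # These steps are an assumed decomposition, not the paper's exact chain of thought.
    steps = [
        "1. Identify the event type of each event in the question.",
        "2. Recall how these event types are generally related.",
        "3. Use those type-level relations to answer.",
    ]
    return f"Question: {question}\nLet's think step by step:\n" + "\n".join(steps) + "\nAnswer:"


q = "She studied the word list. What most likely happens next?"
print(direct_guidance_prompt(q, [("Learn", "happens before", "Memory")]))
print(cot_guidance_prompt(q))
```

The direct variant requires the relevant schema facts to be supplied up front, whereas the chain-of-thought variant asks the model to recall them itself.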

We summarize our contributions as follows:

  • We evaluate event reasoning at both the schema and instance levels, across various relations and paradigms.

  • We construct a novel benchmark, EV², which features two levels of evaluation and is comprehensive in relations and reasoning paradigms. We conduct extensive experiments to probe how LLMs perform event reasoning.

  • We draw several insights. Based on our findings, we design mentoring methods that guide LLMs to utilize event schema knowledge, which improves their event reasoning.
