GraphRAG: "From Local to Global: A Graph RAG Approach to Query-Focused Summarization" (Translation and Commentary)

Overview: The paper proposes Graph RAG, a knowledge-graph-based retrieval-augmented generation method for answering global questions posed over an entire text corpus, helping people build a comprehensive understanding of large volumes of data.

Background and pain points:
>> Conventional retrieval-augmented generation (RAG) is designed mainly for local question answering and handles global questions over an entire corpus poorly.
>> Conventional query-focused summarization (QFS) methods do not scale to the quantities of text indexed by typical RAG systems.

Core idea: Graph RAG answers global questions as follows.
>> Build a two-level index structure on top of a knowledge graph. First, an LLM extracts entities and relationships from the source documents to construct a knowledge graph; second, a community detection algorithm partitions the graph into communities of closely related entities.
>> An LLM generates a report-style summary for each community, producing a modular graph index that covers the source documents and the knowledge graph derived from them.
>> At query time, each community summary is used independently and in parallel to generate a partial answer with the LLM; all relevant partial answers are then summarized again by the LLM into a global answer that is returned to the user.

Data flow: source documents → text chunks → entity and relationship instances → entity and relationship descriptions → knowledge graph → graph communities → community summaries → community answers → global answer.

Overall, Graph RAG builds a hierarchical knowledge-graph index and exploits its inherent modularity to process communities in parallel; it then applies a map-reduce style procedure to answer global queries, improving efficiency while keeping answers comprehensive. This is the core of how it supports global question answering.

Key characteristics:
>> Fully exploits the inherent modularity of the knowledge graph to enable parallel processing.
>> Entities and relationships are described in detail within each community, which helps generate more comprehensive and diverse answers.
>> Compared with feeding source documents to the LLM directly, the graph index saves a large amount of context and answers queries more efficiently.

Advantages:
>> Experiments show that, compared with a conventional RAG baseline and with direct global summarization of source text, Graph RAG markedly improves the comprehensiveness and diversity of answers while using far fewer context tokens; root-level community summaries in particular achieve very good query performance. The approach makes complex question answering over large corpora scalable.

In short, the paper combines knowledge graphs, RAG, and query-focused summarization to answer global questions over large text corpora, supporting both deep understanding and big-picture sensemaking.

Paper information
Paper: https://arxiv.org/abs/2404.16130
Date: April 24, 2024
Authors: Microsoft team

Abstract
The use of retrieval-augmented generation (RAG) to retrieve relevant information from an external knowledge source enables large language models (LLMs) to answer questions over private and/or previously unseen document collections. However, RAG fails on global questions directed at an entire text corpus, such as "What are the main themes in the dataset?", since this is inherently a query-focused summarization (QFS) task, rather than an explicit retrieval task. Prior QFS methods, meanwhile, fail to scale to the quantities of text indexed by typical RAG systems. To combine the strengths of these contrasting methods, we propose a Graph RAG approach to question answering over private text corpora that scales with both the generality of user questions and the quantity of source text to be indexed. Our approach uses an LLM to build a graph-based text index in two stages: first to derive an entity knowledge graph from the source documents, then to pre-generate community summaries for all groups of closely-related entities. Given a question, each community summary is used to generate a partial response, before all partial responses are again summarized in a final response to the user. For a class of global sensemaking questions over datasets in the 1 million token range, we show that Graph RAG leads to substantial improvements over a naïve RAG baseline for both the comprehensiveness and diversity of generated answers. An open-source, Python-based implementation of both global and local Graph RAG approaches is forthcoming at https://aka.ms/graphrag.

Figure 1: Graph RAG pipeline using an LLM-derived graph index of source document text. This index spans nodes (e.g., entities), edges (e.g., relationships), and covariates (e.g., claims) that have been detected, extracted, and summarized by LLM prompts tailored to the domain of the dataset. Community detection (e.g., Leiden, Traag et al., 2019) is used to partition the graph index into groups of elements (nodes, edges, covariates) that the LLM can summarize in parallel at both indexing time and query time. The "global answer" to a given query is produced using a final round of query-focused summarization over all community summaries reporting relevance to that query.
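To make the pipeline in Figure 1 and the data flow described above concrete before walking through the paper, here is a minimal orchestration sketch. It is not the forthcoming official implementation; the step functions (chunk_corpus, extract_graph, detect_communities, summarize_community) and the llm callable are placeholders supplied by the caller, not names from the paper.

    from typing import Callable, Dict, Iterable, List

    LLM = Callable[[str], str]   # any text-in/text-out completion function

    def graph_rag_index(docs: List[str], llm: LLM,
                        chunk_corpus: Callable[[List[str]], List[str]],
                        extract_graph: Callable[[List[str], LLM], object],
                        detect_communities: Callable[[object], Dict[int, list]],
                        summarize_community: Callable[[object, list, LLM], str]) -> Dict[int, str]:
        """Indexing time: documents -> chunks -> graph elements -> communities -> summaries."""
        chunks = chunk_corpus(docs)                        # 2.1 source documents -> text chunks
        graph = extract_graph(chunks, llm)                 # 2.2-2.3 LLM-extracted, summarized elements
        communities = detect_communities(graph)            # 2.4 hierarchical community partition
        return {cid: summarize_community(graph, members, llm)   # 2.5 report-like community summaries
                for cid, members in communities.items()}

    def graph_rag_query(question: str, summaries: Iterable[str], llm: LLM) -> str:
        """Query time (2.6): map over community summaries, then reduce the partial answers."""
        partial = [llm(f"Using only this community summary, answer: {question}\n\n{s}")
                   for s in summaries]                     # map step, parallelizable per summary
        return llm(f"Combine these partial answers into one global answer to '{question}':\n\n"
                   + "\n---\n".join(partial))              # reduce step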
1 Introduction

Human endeavors across a range of domains rely on our ability to read and reason about large collections of documents, often reaching conclusions that go beyond anything stated in the source texts themselves. With the emergence of large language models (LLMs), we are already witnessing attempts to automate human-like sensemaking in complex domains like scientific discovery (Microsoft, 2023) and intelligence analysis (Ranade and Joshi, 2023), where sensemaking is defined as "a motivated, continuous effort to understand connections (which can be among people, places, and events) in order to anticipate their trajectories and act effectively" (Klein et al., 2006a). Supporting human-led sensemaking over entire text corpora, however, needs a way for people to both apply and refine their mental model of the data (Klein et al., 2006b) by asking questions of a global nature.

Retrieval-augmented generation (RAG, Lewis et al., 2020) is an established approach to answering user questions over entire datasets, but it is designed for situations where these answers are contained locally within regions of text whose retrieval provides sufficient grounding for the generation task. Instead, a more appropriate task framing is query-focused summarization (QFS, Dang, 2006), and in particular, query-focused abstractive summarization that generates natural language summaries and not just concatenated excerpts (Baumel et al., 2018; Laskar et al., 2020; Yao et al., 2017). In recent years, however, such distinctions between summarization tasks that are abstractive versus extractive, generic versus query-focused, and single-document versus multi-document, have become less relevant. While early applications of the transformer architecture showed substantial improvements on the state-of-the-art for all such summarization tasks (Goodwin et al., 2020; Laskar et al., 2022; Liu and Lapata, 2019), these tasks are now trivialized by modern LLMs, including the GPT (Achiam et al., 2023; Brown et al., 2020), Llama (Touvron et al., 2023), and Gemini (Anil et al., 2023) series, all of which can use in-context learning to summarize any content provided in their context window.

The challenge remains, however, for query-focused abstractive summarization over an entire corpus. Such volumes of text can greatly exceed the limits of LLM context windows, and the expansion of such windows may not be enough given that information can be "lost in the middle" of longer contexts (Kuratov et al., 2024; Liu et al., 2023). In addition, although the direct retrieval of text chunks in naïve RAG is likely inadequate for QFS tasks, it is possible that an alternative form of pre-indexing could support a new RAG approach specifically targeting global summarization.
In this paper, we present a Graph RAG approach based on global summarization of an LLM-derived knowledge graph (Figure 1). In contrast with related work that exploits the structured retrieval and traversal affordances of graph indexes (subsection 4.2), we focus on a previously unexplored quality of graphs in this context: their inherent modularity (Newman, 2006) and the ability of community detection algorithms to partition graphs into modular communities of closely-related nodes (e.g., Louvain, Blondel et al., 2008; Leiden, Traag et al., 2019). LLM-generated summaries of these community descriptions provide complete coverage of the underlying graph index and the input documents it represents. Query-focused summarization of an entire corpus is then made possible using a map-reduce approach: first using each community summary to answer the query independently and in parallel, then summarizing all relevant partial answers into a final global answer.

To evaluate this approach, we used an LLM to generate a diverse set of activity-centered sensemaking questions from short descriptions of two representative real-world datasets, containing podcast transcripts and news articles respectively. For the target qualities of comprehensiveness, diversity, and empowerment (defined in subsection 3.4) that develop understanding of broad issues and themes, we both explore the impact of varying the hierarchical level of community summaries used to answer queries, as well as compare to naïve RAG and global map-reduce summarization of source texts. We show that all global approaches outperform naïve RAG on comprehensiveness and diversity, and that Graph RAG with intermediate- and low-level community summaries shows favorable performance over source text summarization on these same metrics, at lower token costs.

2 Graph RAG Approach & Pipeline

We now unpack the high-level data flow of the Graph RAG approach (Figure 1) and pipeline, describing key design parameters, techniques, and implementation details for each step.

2.1 Source Documents → Text Chunks

A fundamental design decision is the granularity with which input texts extracted from source documents should be split into text chunks for processing. In the following step, each of these chunks will be passed to a set of LLM prompts designed to extract the various elements of a graph index. Longer text chunks require fewer LLM calls for such extraction, but suffer from the recall degradation of longer LLM context windows (Kuratov et al., 2024; Liu et al., 2023). This behavior can be observed in Figure 2 in the case of a single extraction round (i.e., with zero gleanings): on a sample dataset (HotPotQA, Yang et al., 2018), using a chunk size of 600 tokens extracted almost twice as many entity references as when using a chunk size of 2400. While more references are generally better, any extraction process needs to balance recall and precision for the target activity.
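The paper does not prescribe a particular chunker. As an illustration only, a token-based splitter matching the chunk sizes and overlaps used for the evaluation datasets in subsection 3.1 (600-token chunks, 100-token overlaps) might look like the sketch below; tiktoken and the cl100k_base encoding are assumptions here, not choices stated in the paper.

    from typing import List
    import tiktoken   # assumption: an OpenAI-style tokenizer; any tokenizer with encode/decode works

    def chunk_text(text: str, chunk_size: int = 600, overlap: int = 100) -> List[str]:
        """Split text into chunks of roughly chunk_size tokens, sharing `overlap` tokens between neighbors."""
        enc = tiktoken.get_encoding("cl100k_base")
        tokens = enc.encode(text)
        chunks, start = [], 0
        while start < len(tokens):
            chunks.append(enc.decode(tokens[start:start + chunk_size]))
            if start + chunk_size >= len(tokens):
                break
            start += chunk_size - overlap   # step forward while keeping some trailing context
        return chunks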
2.2 Text Chunks → Element Instances

The baseline requirement for this step is to identify and extract instances of graph nodes and edges from each chunk of source text. We do this using a multipart LLM prompt that first identifies all entities in the text, including their name, type, and description, before identifying all relationships between clearly-related entities, including the source and target entities and a description of their relationship. Both kinds of element instance are output in a single list of delimited tuples.

The primary opportunity to tailor this prompt to the domain of the document corpus lies in the choice of few-shot examples provided to the LLM for in-context learning (Brown et al., 2020). For example, while our default prompt extracting the broad class of "named entities" like people, places, and organizations is generally applicable, domains with specialized knowledge (e.g., science, medicine, law) will benefit from few-shot examples specialized to those domains. We also support a secondary extraction prompt for any additional covariates we would like to associate with the extracted node instances. Our default covariate prompt aims to extract claims linked to detected entities, including the subject, object, type, description, source text span, and start and end dates.

To balance the needs of efficiency and quality, we use multiple rounds of "gleanings", up to a specified maximum, to encourage the LLM to detect any additional entities it may have missed on prior extraction rounds. This is a multi-stage process in which we first ask the LLM to assess whether all entities were extracted, using a logit bias of 100 to force a yes/no decision. If the LLM responds that entities were missed, then a continuation indicating that "MANY entities were missed in the last extraction" encourages the LLM to glean these missing entities. This approach allows us to use larger chunk sizes without a drop in quality (Figure 2) or the forced introduction of noise.
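As a rough illustration of this gleaning loop, the sketch below shows one possible shape of the extraction step. The prompt wording, the plain yes/no follow-up check (which the paper instead forces with a logit bias of 100), and the function names are all assumptions for illustration, not the paper's actual prompts.

    from typing import Callable, List

    LLM = Callable[[str], str]   # text-in/text-out completion function; model choice is an assumption

    EXTRACT_PROMPT = (           # illustrative stand-in for the paper's multipart extraction prompt
        "Identify all entities (name, type, description) in the text below, then all relationships "
        "between clearly-related entities (source, target, description). "
        "Output one delimited tuple per line.\n\nTEXT:\n{chunk}\n{continuation}"
    )

    def extract_elements(chunk: str, llm: LLM, max_gleanings: int = 1) -> List[str]:
        """Extract entity/relationship tuples from one chunk, with up to max_gleanings extra rounds."""
        tuples: List[str] = []
        continuation = ""
        for _ in range(1 + max_gleanings):
            response = llm(EXTRACT_PROMPT.format(chunk=chunk, continuation=continuation))
            tuples.extend(line for line in response.splitlines() if line.strip())
            # The paper forces this yes/no check with a logit bias of 100; here it is a plain prompt.
            verdict = llm("Were ALL entities in the text extracted? Answer YES or NO.\n\nTEXT:\n" + chunk)
            if verdict.strip().upper().startswith("YES"):
                break
            continuation = "MANY entities were missed in the last extraction. Extract only the missing ones."
        return tuples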
2.3 Element Instances → Element Summaries

The use of an LLM to "extract" descriptions of entities, relationships, and claims represented in source texts is already a form of abstractive summarization, relying on the LLM to create independently meaningful summaries of concepts that may be implied but not stated by the text itself (e.g., the presence of implied relationships). To convert all such instance-level summaries into single blocks of descriptive text for each graph element (i.e., entity node, relationship edge, and claim covariate) requires a further round of LLM summarization over matching groups of instances.

A potential concern at this stage is that the LLM may not consistently extract references to the same entity in the same text format, resulting in duplicate entity elements and thus duplicate nodes in the entity graph. However, since all closely-related "communities" of entities will be detected and summarized in the following step, and given that LLMs can understand the common entity behind multiple name variations, our overall approach is resilient to such variations provided there is sufficient connectivity from all variations to a shared set of closely-related entities.

Overall, our use of rich descriptive text for homogeneous nodes in a potentially noisy graph structure is aligned with both the capabilities of LLMs and the needs of global, query-focused summarization. These qualities also differentiate our graph index from typical knowledge graphs, which rely on concise and consistent knowledge triples (subject, predicate, object) for downstream reasoning tasks.

2.4 Element Summaries → Graph Communities

The index created in the previous step can be modelled as a homogeneous undirected weighted graph in which entity nodes are connected by relationship edges, with edge weights representing the normalized counts of detected relationship instances. Given such a graph, a variety of community detection algorithms may be used to partition the graph into communities of nodes with stronger connections to one another than to the other nodes in the graph (e.g., see the surveys by Fortunato, 2010 and Jin et al., 2021). In our pipeline, we use Leiden (Traag et al., 2019) on account of its ability to recover hierarchical community structure of large-scale graphs efficiently (Figure 3). Each level of this hierarchy provides a community partition that covers the nodes of the graph in a mutually-exclusive, collective-exhaustive way, enabling divide-and-conquer global summarization.
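A single (non-hierarchical) level of such a partition can be reproduced with off-the-shelf tooling. The sketch below assumes python-igraph plus leidenalg as the stack; the paper's pipeline recovers a full community hierarchy rather than the one flat partition shown here, and the toy entities in the usage example are made up.

    from typing import Dict, List, Tuple
    import igraph as ig     # assumption: python-igraph + leidenalg as the community-detection stack
    import leidenalg

    def leiden_communities(weighted_edges: List[Tuple[str, str, float]]) -> Dict[str, int]:
        """One level of Leiden community detection over an undirected weighted entity graph.

        weighted_edges holds (source_entity, target_entity, weight) triples, where weight stands in
        for the normalized count of detected relationship instances; returns entity -> community id.
        """
        graph = ig.Graph.TupleList(weighted_edges, directed=False, weights=True)
        partition = leidenalg.find_partition(graph, leidenalg.ModularityVertexPartition,
                                             weights="weight")
        return {graph.vs[v]["name"]: cid
                for cid, members in enumerate(partition) for v in members}

    # Toy usage with hypothetical entities: two dense triangles joined by a single weak edge.
    edges = [("A", "B", 2.0), ("B", "C", 1.5), ("C", "A", 2.0),
             ("D", "E", 2.0), ("E", "F", 1.5), ("F", "D", 2.0), ("C", "D", 0.1)]
    print(leiden_communities(edges))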
2.5 Graph Communities → Community Summaries

The next step is to create report-like summaries of each community in the Leiden hierarchy, using a method designed to scale to very large datasets. These summaries are independently useful in their own right as a way to understand the global structure and semantics of the dataset, and may themselves be used to make sense of a corpus in the absence of a question. For example, a user may scan through community summaries at one level looking for general themes of interest, then follow links to the reports at the lower level that provide more details for each of the subtopics. Here, however, we focus on their utility as part of a graph-based index used for answering global queries.

Community summaries are generated in the following way:
>> Leaf-level communities. The element summaries of a leaf-level community (its nodes, edges, and covariates) are prioritized by the prominence of the elements involved and then iteratively added to the LLM context window until the token limit is reached.
>> Higher-level communities. If all element summaries fit within the token limit of the context window, proceed as for leaf-level communities and summarize all element summaries within the community; otherwise, sub-community summaries are substituted for their (longer) associated element summaries until the content fits within the context window.

2.6 Community Summaries → Community Answers → Global Answer

Given a user query, the community summaries generated in the previous step can be used to generate a final answer in a multi-stage process. The hierarchical nature of the community structure also means that questions can be answered using the community summaries from different levels, raising the question of whether a particular level in the hierarchical community structure offers the best balance of summary detail and scope for general sensemaking questions (evaluated in section 3).

For a given community level, the global answer to any user query is generated as follows (a minimal sketch of this map-reduce loop appears after the list):
>> Prepare community summaries. Community summaries are randomly shuffled and divided into chunks of pre-specified token size. This ensures relevant information is distributed across chunks, rather than concentrated (and potentially lost) in a single context window.
>> Map community answers. Generate intermediate answers in parallel, one for each chunk. The LLM is also asked to generate a score between 0-100 indicating how helpful the generated answer is in answering the target question. Answers with score 0 are filtered out.
>> Reduce to global answer. Intermediate community answers are sorted in descending order of helpfulness score and iteratively added into a new context window until the token limit is reached. This final context is used to generate the global answer returned to the user.
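The following sketch mirrors the three steps above for one community level. The prompt texts, the word-count approximation of token budgets, and the score-parsing convention are assumptions made for illustration, not the paper's exact implementation.

    import random
    from typing import Callable, List, Tuple

    LLM = Callable[[str], str]

    def global_answer(question: str, community_summaries: List[str], llm: LLM,
                      chunk_budget: int = 8000, final_budget: int = 8000) -> str:
        """Map-reduce answer generation over the community summaries of one community level."""
        # Prepare: shuffle summaries and pack them into context-sized chunks
        # (token counts are approximated by whitespace word counts in this sketch).
        summaries = list(community_summaries)
        random.shuffle(summaries)
        chunks, current, used = [], [], 0
        for s in summaries:
            if current and used + len(s.split()) > chunk_budget:
                chunks.append("\n".join(current))
                current, used = [], 0
            current.append(s)
            used += len(s.split())
        if current:
            chunks.append("\n".join(current))

        # Map: one intermediate answer per chunk, prefixed by a 0-100 helpfulness score.
        scored: List[Tuple[int, str]] = []
        for chunk in chunks:
            reply = llm("Answer the question using only the community summaries below. "
                        "Put a helpfulness score (0-100) on the first line, then the answer.\n"
                        f"Question: {question}\n\nSummaries:\n{chunk}")
            first, _, answer = reply.partition("\n")
            score = int(first.strip()) if first.strip().isdigit() else 0
            if score > 0:                               # filter out unhelpful answers
                scored.append((score, answer.strip()))

        # Reduce: keep the most helpful answers until the final context budget is reached, then summarize.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        context, used = [], 0
        for score, answer in scored:
            if used + len(answer.split()) > final_budget:
                break
            context.append(answer)
            used += len(answer.split())
        return llm(f"Question: {question}\n\nCombine these partial answers into one global answer:\n\n"
                   + "\n---\n".join(context))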
3 Evaluation

3.1 Datasets

We selected two datasets in the one million token range, each equivalent to about 10 novels of text and representative of the kind of corpora that users may encounter in their real-world activities:
>> Podcast transcripts. Compiled transcripts of podcast conversations between Kevin Scott, Microsoft CTO, and other technology leaders (Behind the Tech, Scott, 2024). Size: 1669 × 600-token text chunks, with 100-token overlaps between chunks (∼1 million tokens).
>> News articles. Benchmark dataset comprising news articles published from September 2013 to December 2023 in a range of categories, including entertainment, business, sports, technology, health, and science (MultiHop-RAG; Tang and Yang, 2024). Size: 3197 × 600-token text chunks, with 100-token overlaps between chunks (∼1.7 million tokens).

3.2 Queries

Many benchmark datasets for open-domain question answering exist, including HotPotQA (Yang et al., 2018), MultiHop-RAG (Tang and Yang, 2024), and MT-Bench (Zheng et al., 2024). However, the associated question sets target explicit fact retrieval rather than summarization for the purpose of data sensemaking, i.e., the process through which people inspect, engage with, and contextualize data within the broader scope of real-world activities (Koesten et al., 2021). Similarly, methods for extracting latent summarization queries from source texts also exist (Xu and Lapata, 2021), but such extracted questions can target details that betray prior knowledge of the texts.

To evaluate the effectiveness of RAG systems for more global sensemaking tasks, we need questions that convey only a high-level understanding of dataset contents, and not the details of specific texts. We used an activity-centered approach to automate the generation of such questions: given a short description of a dataset, we asked the LLM to identify N potential users and N tasks per user, then for each (user, task) combination, we asked the LLM to generate N questions that require understanding of the entire corpus. For our evaluation, a value of N = 5 resulted in N × N × N = 125 test questions per dataset. Table 1 shows example questions for each of the two evaluation datasets.
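A hedged sketch of this activity-centered question generation is shown below. The prompts and the line-based parsing of model output are illustrative assumptions, and real use would constrain the output format more tightly.

    from typing import Callable, List

    LLM = Callable[[str], str]

    def sensemaking_questions(dataset_description: str, llm: LLM, n: int = 5) -> List[str]:
        """Generate N users x N tasks x N questions from a short dataset description."""
        def numbered_list(prompt: str) -> List[str]:
            # Assumes the model returns one item per line; strip any leading numbering.
            return [line.strip("0123456789. -") for line in llm(prompt).splitlines() if line.strip()][:n]

        questions: List[str] = []
        users = numbered_list(f"List {n} potential users of this dataset:\n{dataset_description}")
        for user in users:
            tasks = numbered_list(f"List {n} tasks that '{user}' would perform with this dataset:\n"
                                  f"{dataset_description}")
            for task in tasks:
                questions += numbered_list(
                    f"For the user '{user}' and task '{task}', write {n} high-level questions that "
                    f"require understanding of the entire corpus, not specific low-level facts.\n"
                    f"Dataset: {dataset_description}")
        return questions   # len == n**3 when the model follows the format, e.g., 125 for n = 5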
3.3 Conditions

We compare six different conditions in our analysis, including Graph RAG using four levels of graph communities (C0, C1, C2, C3), a text summarization method applying our map-reduce approach directly to source texts (TS), and a naïve "semantic search" RAG approach (SS).

3.4 Metrics

LLMs have been shown to be good evaluators of natural language generation, achieving state-of-the-art or competitive results compared against human judgements (Wang et al., 2023a; Zheng et al., 2024). While this approach can generate reference-based metrics when gold standard answers are known, it is also capable of measuring the qualities of generated texts (e.g., fluency) in a reference-free style (Wang et al., 2023a) as well as in head-to-head comparison of competing outputs (LLM-as-a-judge, Zheng et al., 2024). LLMs have also shown promise at evaluating the performance of conventional RAG systems, automatically evaluating qualities like context relevance, faithfulness, and answer relevance (RAGAS, Es et al., 2023).
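As an illustration of how such head-to-head, LLM-as-a-judge comparisons might be scripted, the sketch below computes a win rate for one pair of conditions and one target metric. The judging prompt and the tie-handling convention are assumptions, not the paper's exact evaluation protocol.

    from typing import Callable, List

    LLM = Callable[[str], str]

    def head_to_head_win_rate(questions: List[str], answers_a: List[str], answers_b: List[str],
                              metric: str, judge: LLM) -> float:
        """Fraction of questions on which condition A beats condition B for one target metric."""
        wins = 0.0
        for q, a, b in zip(questions, answers_a, answers_b):
            verdict = judge(
                f"Metric: {metric}.\nQuestion: {q}\n\nAnswer 1:\n{a}\n\nAnswer 2:\n{b}\n\n"
                "Which answer is better on this metric? Reply with exactly '1', '2', or 'tie'."
            ).strip().lower()
            if verdict == "1":
                wins += 1.0
            elif verdict == "tie":
                wins += 0.5       # count a tie as half a win for each side
        return wins / len(questions) if questions else 0.0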
3.6 Results

The indexing process resulted in a graph consisting of 8564 nodes and 20691 edges for the Podcast dataset, and a larger graph of 15754 nodes and 19520 edges for the News dataset. Table 3 shows the number of community summaries at different levels of each graph community hierarchy.

Global approaches vs. naïve RAG. As shown in Figure 4, global approaches consistently outperformed the naïve RAG (SS) approach in both comprehensiveness and diversity metrics across datasets. Specifically, global approaches achieved comprehensiveness win rates between 72-83% for Podcast transcripts and 72-80% for News articles, while diversity win rates ranged from 75-82% and 62-71% respectively. Our use of directness as a validity test also achieved the expected results, i.e., that naïve RAG produces the most direct responses across all comparisons.

Community summaries vs. source texts. When comparing community summaries to source texts using Graph RAG, community summaries generally provided a small but consistent improvement in answer comprehensiveness and diversity, except for root-level summaries. Intermediate-level summaries in the Podcast dataset and low-level community summaries in the News dataset achieved comprehensiveness win rates of 57% and 64%, respectively. Diversity win rates were 57% for Podcast intermediate-level summaries and 60% for News low-level community summaries. Table 3 also illustrates the scalability advantages of Graph RAG compared to source text summarization: for low-level community summaries (C3), Graph RAG required 26-33% fewer context tokens, while for root-level community summaries (C0), it required over 97% fewer tokens. For a modest drop in performance compared with other global methods, root-level Graph RAG offers a highly efficient method for the iterative question answering that characterizes sensemaking activity, while retaining advantages in comprehensiveness (72% win rate) and diversity (62% win rate) over naïve RAG.

Empowerment. Empowerment comparisons showed mixed results for both global approaches versus naïve RAG (SS) and Graph RAG approaches versus source text summarization (TS). Ad-hoc LLM use to analyze LLM reasoning for this measure indicated that the ability to provide specific examples, quotes, and citations was judged to be key to helping users reach an informed understanding. Tuning element extraction prompts may help to retain more of these details in the Graph RAG index.

4 Related Work

4.1 RAG Approaches and Systems

When using LLMs, RAG involves first retrieving relevant information from external data sources, then adding this information to the context window of the LLM along with the original query (Ram et al., 2023). Naïve RAG approaches (Gao et al., 2023) do this by converting documents to text, splitting text into chunks, and embedding these chunks into a vector space in which similar positions represent similar semantics. Queries are then embedded into the same vector space, with the text chunks of the nearest k vectors used as context. More advanced variations exist, but all solve the problem of what to do when an external dataset of interest exceeds the LLM's context window.

Advanced RAG systems include pre-retrieval, retrieval, and post-retrieval strategies designed to overcome the drawbacks of naïve RAG, while Modular RAG systems include patterns for iterative and dynamic cycles of interleaved retrieval and generation (Gao et al., 2023). Our implementation of Graph RAG incorporates multiple concepts related to other systems. For example, our community summaries are a kind of self-memory (Selfmem, Cheng et al., 2024) for generation-augmented retrieval (GAR, Mao et al., 2020) that facilitates future generation cycles, while our parallel generation of community answers from these summaries is a kind of iterative (Iter-RetGen, Shao et al., 2023) or federated (FeB4RAG, Wang et al., 2024) retrieval-generation strategy. Other systems have also combined these concepts for multi-document summarization (CAiRE-COVID, Su et al., 2020) and multi-hop question answering (ITRG, Feng et al., 2023; IR-CoT, Trivedi et al., 2022; DSP, Khattab et al., 2022). Our use of a hierarchical index and summarization also bears resemblance to further approaches, such as generating a hierarchical index of text chunks by clustering the vectors of text embeddings (RAPTOR, Sarthi et al., 2024) or generating a "tree of clarifications" to answer multiple interpretations of ambiguous questions (Kim et al., 2023). However, none of these iterative or hierarchical approaches use the kind of self-generated graph index that enables Graph RAG.
4.2 Graphs and LLMs

Use of graphs in connection with LLMs and RAG is a developing research area, with multiple directions already established. These include using LLMs for knowledge graph creation (Trajanoska et al., 2023) and completion (Yao et al., 2023), as well as for the extraction of causal graphs (Ban et al., 2023; Zhang et al., 2024) from source texts. They also include forms of advanced RAG (Gao et al., 2023) where the index is a knowledge graph (KAPING, Baek et al., 2023), where subsets of the graph structure (G-Retriever, He et al., 2024) or derived graph metrics (Graph-ToolFormer, Zhang, 2023) are the objects of enquiry, where narrative outputs are strongly grounded in the facts of retrieved subgraphs (SURGE, Kang et al., 2023), where retrieved event-plot subgraphs are serialized using narrative templates (FABULA, Ranade and Joshi, 2023), and where the system supports both creation and traversal of text-relationship graphs for multi-hop question answering (Wang et al., 2023b). In terms of open-source software, a variety of graph databases are supported by both the LangChain (LangChain, 2024) and LlamaIndex (LlamaIndex, 2024) libraries, while a more general class of graph-based RAG applications is also emerging, including systems that can create and reason over knowledge graphs in both Neo4J (NaLLM, Neo4J, 2024) and NebulaGraph (GraphRAG, NebulaGraph, 2024) formats. Unlike our Graph RAG approach, however, none of these systems use the natural modularity of graphs to partition data for global summarization.

5 Discussion

Limitations of evaluation approach. Our evaluation to date has only examined a certain class of sensemaking questions for two corpora in the region of 1 million tokens. More work is needed to understand how performance varies across different ranges of question types, data types, and dataset sizes, as well as to validate our sensemaking questions and target metrics with end users. Comparison of fabrication rates, e.g., using approaches like SelfCheckGPT (Manakul et al., 2023), would also improve on the current analysis.
Trade-offs of building a graph index. We consistently observed Graph RAG achieve the best head-to-head results against other methods, but in many cases the graph-free approach to global summarization of source texts performed competitively. The real-world decision about whether to invest in building a graph index depends on multiple factors, including the compute budget, expected number of lifetime queries per dataset, and value obtained from other aspects of the graph index (including the generic community summaries and the use of other graph-related RAG approaches).

Future work. The graph index, rich text annotations, and hierarchical community structure supporting the current Graph RAG approach offer many possibilities for refinement and adaptation. This includes RAG approaches that operate in a more local manner, via embedding-based matching of user queries and graph annotations, as well as the possibility of hybrid RAG schemes that combine embedding-based matching against community reports before employing our map-reduce summarization mechanisms. This "roll-up" operation could also be extended across more levels of the community hierarchy, as well as implemented as a more exploratory "drill down" mechanism that follows the information scent contained in higher-level community summaries.

6 Conclusion

We have presented a global approach to Graph RAG, combining knowledge graph generation, retrieval-augmented generation (RAG), and query-focused summarization (QFS) to support human sensemaking over entire text corpora. Initial evaluations show substantial improvements over a naïve RAG baseline for both the comprehensiveness and diversity of answers, as well as favorable comparisons to a global but graph-free approach using map-reduce source text summarization. For situations requiring many global queries over the same dataset, summaries of root-level communities in the entity-based graph index provide a data index that is both superior to naïve RAG and achieves competitive performance to other global methods at a fraction of the token cost.

An open-source, Python-based implementation of both global and local Graph RAG approaches is forthcoming at https://aka.ms/graphrag.