We are excited to introduce the second installment of Galileo's Hallucination Index: the RAG Special!
The LLM landscape has changed significantly since we launched our first Hallucination Index in November 2023, with larger, more powerful open- and closed-source models announced monthly. Since then, two things have happened: the term "hallucinate" became Dictionary.com’s Word of the Year, and Retrieval-Augmented Generation (RAG) has become the leading method for building AI solutions. Yet even as models continue to grow in size and capability, the risk of hallucinations remains.
For our second installment of the Hallucination Index, we tested 22 leading foundation models from OpenAI, Anthropic, Meta, Google, and others in real-world RAG-based use cases.
Context Length and Model Performance
Given the growing popularity of RAG, understanding how context length impacts model performance was a key focus. We tested each model across three context-length scenarios.
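To make the evaluation setup concrete, here is a minimal sketch of grouping retrieved RAG context into length buckets before prompting a model. This is an illustrative assumption, not the Index's actual methodology: the bucket names and token cutoffs are hypothetical, and the whitespace-based token estimate stands in for a real tokenizer.

```python
# Hypothetical sketch: bucket retrieved context by approximate token length so
# model performance can be compared across short, medium, and long scenarios.
# Cutoffs and the whitespace tokenizer below are illustrative assumptions.

def approx_token_count(text: str) -> int:
    """Rough token estimate; a real evaluation would use the model's tokenizer."""
    return len(text.split())

def context_bucket(context: str) -> str:
    """Assign a retrieved context to an illustrative length bucket."""
    n = approx_token_count(context)
    if n < 5_000:
        return "short"
    elif n < 25_000:
        return "medium"
    return "long"
```

For example, a 100-word passage would fall in the "short" bucket, while a book-length context would land in "long"; scoring each bucket separately is what lets an index compare how the same model degrades (or doesn't) as context grows.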
Open-Source vs. Closed-Source Models
The open-source vs. closed-source software debate has raged on since the Free Software Movement of the 1980s, and it has reached a fever pitch during the LLM arms race. The assumption is that closed-source LLMs, with their access to proprietary training data, will perform better — we wanted to put that assumption to the test.
Measuring Performance with Galileo's Context Adherence Evaluation Model
LLM performance was measured using Galileo's Context Adherence Evaluation Model. Context Adherence uses ChainPoll, a proprietary method developed by Galileo Labs, to measure how closely a model's output adheres to the information it was given — helping spot when a model makes up information that is not in the source text.
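ChainPoll is described here only at a high level, but the core idea — polling a chain-of-thought judge several times and aggregating its votes into a score — can be sketched as follows. This is a rough illustration of the polling pattern, not Galileo's implementation; `judge` is a hypothetical stand-in for a call to an LLM that returns whether the answer is context-adherent.

```python
from collections.abc import Callable

def chainpoll_score(judge: Callable[[str], bool], prompt: str, n: int = 5) -> float:
    """ChainPoll-style scoring sketch (illustrative, not Galileo's code):
    poll a chain-of-thought judge `n` times on the same prompt and return
    the fraction of runs that vote the answer context-adherent.

    `judge` is a hypothetical placeholder for an LLM call that is prompted
    to reason step by step and then emit a yes/no adherence verdict.
    """
    votes = [judge(prompt) for _ in range(n)]
    return sum(votes) / n
```

Because LLM judgments are stochastic, averaging several independent chain-of-thought verdicts yields a smoother, better-calibrated adherence score than a single yes/no call would.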
We hope this index helps AI builders make informed decisions about which LLM is best suited for their particular use case and need.
With that, dig into the rankings, model-specific insights, and our methodology at www.rungalileo.io/hallucinationindex.