Llama-2-7b-chat

The smallest chat model in the Llama 2 family of LLMs developed and publicly released by Meta. The model was pretrained on 2 trillion tokens of data from publicly available sources and fine-tuned on publicly available instruction datasets plus over one million human-annotated examples.

Details

Model	Llama-2-7b-chat
Developer	Meta

Model Performance Across Task-Types

Here's how Llama-2-7b-chat performed across all three task types:

Metric	ChainPoll Score
QA without RAG	0.52
QA with RAG	0.65
Long form text generation	0.72

Model Info Across Task-Types

Digging deeper, here’s a look at how Llama-2-7b-chat performed on specific datasets:

Tasks	Insights	Dataset Name	Dataset Performance
QA without RAG	The model performs poorly, which shows high bias and errors in factual knowledge.	Truthful QA	0.42
QA without RAG		Trivia QA	0.63
QA with RAG	The model struggles on this task, which demonstrates poor reasoning and comprehension skills. It is weak at mathematical reasoning, scoring relatively low on DROP compared to the other datasets. Even so, it performs much better than Falcon-40b-instruct, a model roughly 6x larger.	MS Marco	0.86
QA with RAG		Hotpot QA	0.59
QA with RAG		Drop	0.42
QA with RAG		Narrative QA	0.74
Long form text generation	The model performs nearly satisfactorily, which shows an ability to generate long text without factual errors.	Open Assistant	0.72
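
The page does not say how the per-dataset scores roll up into the task-level ChainPoll scores. One plausible reading, offered here only as an assumption, is a plain unweighted mean over each task's datasets. The Python sketch below checks that reading against the numbers reported above; the names `dataset_scores`, `reported`, and `task_means` are illustrative and do not come from the source.

```python
from statistics import mean

# Per-dataset scores copied from the table above.
dataset_scores = {
    "QA without RAG": {"Truthful QA": 0.42, "Trivia QA": 0.63},
    "QA with RAG": {
        "MS Marco": 0.86,
        "Hotpot QA": 0.59,
        "Drop": 0.42,
        "Narrative QA": 0.74,
    },
    "Long form text generation": {"Open Assistant": 0.72},
}

# Task-level ChainPoll scores as reported on the page.
reported = {
    "QA without RAG": 0.52,
    "QA with RAG": 0.65,
    "Long form text generation": 0.72,
}


def task_means(scores):
    """Unweighted mean per task (an assumed aggregation, not a documented one)."""
    return {task: mean(per_dataset.values()) for task, per_dataset in scores.items()}


for task, avg in task_means(dataset_scores).items():
    gap = abs(avg - reported[task])
    print(f"{task}: mean={avg:.4f} reported={reported[task]:.2f} gap={gap:.4f}")
```

Under this assumption the means come out at about 0.525, 0.6525, and 0.72, each within 0.01 of the reported task score. That is consistent with simple averaging, but it does not rule out another weighting or rounding scheme.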

💰 Cost insights

The model offers a decent balance of cost and performance. It is 13x cheaper than GPT-3.5 and 6x cheaper than the Llama 70b variant. Even so, we suggest using Zephyr-7b-beta instead of this model.
