Details
Attribute | Value |
---|---|
Developer | OpenAI |
License | NA (private model) |
Model parameters | NA (private model) |
Supported context length | 128k |
Price for prompt token | $5/Million tokens |
Price for response token | $15/Million tokens |
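To make the listed prices concrete, here is a minimal sketch of a per-request cost estimate. The two prices come from the details above; the function name and the example token counts are hypothetical and chosen only for illustration.

```python
# Prices taken from the details above (USD per million tokens).
PROMPT_PRICE_PER_MILLION = 5.00
RESPONSE_PRICE_PER_MILLION = 15.00


def estimate_cost_usd(prompt_tokens: int, response_tokens: int) -> float:
    """Estimate the cost of one request in USD (hypothetical helper)."""
    total = (
        prompt_tokens * PROMPT_PRICE_PER_MILLION
        + response_tokens * RESPONSE_PRICE_PER_MILLION
    )
    return total / 1_000_000


# Hypothetical request: 25,000 prompt tokens and 500 response tokens.
cost = estimate_cost_usd(25_000, 500)
print(f"Estimated cost: ${cost:.4f}")
```

With long prompts and short answers, which is the usual shape of a RAG request, the prompt price dominates the bill even though the response price per token is three times higher.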
Chainpoll Score
Context | Score |
---|---|
Short Context | 0.96 |
Medium Context | 1.00 |
Long Context | 0.99 |
Digging deeper, here’s a look at how gpt-4o-2024-05-13 performed across specific datasets.
This heatmap indicates the model's success in recalling information at different locations in the context. Green signifies success, while red indicates failure.
Tasks | Task insight | Cost insight | Dataset | Context adherence | Avg response length |
---|---|---|---|---|---|
Short context RAG | The model demonstrates exceptional reasoning and comprehension skills, excelling at short context RAG. It outperforms other models in mathematical proficiency, as evidenced by its strong performance on the DROP and ConvFinQA benchmarks. | It is a great model but is nearly 2x costlier than Sonnet 3.5. If cost is your concern, it's better to try Gemini-1.5-Pro or Llama-3-70b. | Drop | 0.97 | 360 |
Short context RAG | | | Hotpot | 0.93 | 360 |
Short context RAG | | | MS Marco | 0.95 | 360 |
Short context RAG | | | ConvFinQA | 1.00 | 360 |
Medium context RAG | Flawless performance, making it suitable for any context length up to 25000 tokens. | Great performance, but we recommend using the 50x cheaper Gemini Flash. | Medium context RAG | 1.00 | 360 |
Long context RAG | Flawless performance, making it suitable for any context length up to 100000 tokens. | Great performance, but we recommend using either the 1.5x cheaper Claude 3.5 Sonnet or the 50x cheaper Gemini Flash. | Long context RAG | 0.99 | 360 |
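The short context score of 0.96 reported earlier is consistent with a plain average of the four short context dataset scores. The page does not state how the headline score is aggregated, so the unweighted mean below is an assumption; this is only a quick check, not the official calculation.

```python
from statistics import mean

# Context adherence scores for the short context RAG datasets, as listed above.
short_context_scores = {
    "Drop": 0.97,
    "Hotpot": 0.93,
    "MS Marco": 0.95,
    "ConvFinQA": 1.00,
}

# Assumption: the headline score is an unweighted mean of the dataset scores.
average = mean(list(short_context_scores.values()))
print(f"Mean short context score: {average:.4f}")
print(f"Rounded to two decimals: {average:.2f}")
```

The mean comes out at about 0.9625, which rounds to the 0.96 shown for short context.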