Claude 3.5 Haiku Review

2024-11-05
#claude, #anthropic
460 Words
3 min
  1. Ranks 4th on the aider code leaderboard.

Ranks 4th on aider code leaderboard

Pending rankings on livebench and lmarena leaderboards—code capability is expected to not match claude 3.5 sonnet(new).

  1. Model API pricing

Model API pricing

Currently only accessible via API, expected to later replace claude 3 haiku on claude.ai.

  1. Claude 3.5 haiku highlights: fast, updated knowledge cutoff date (July 2024—can’t help but complain about OpenAI, when will they push their model knowledge cutoff date forward, stuck at October 2023, affecting model accuracy and user experience).

Note: Doesn’t support image input.

  1. Official main use cases for claude 3.5 haiku [customized the model in lobechat, using the latest haiku model from the official API]:

Claude 3.5 haiku main use cases

  1. Benchmarks—feels like haiku is a distilled version of claude 3.5 sonnet(new)

Benchmarks

  1. Claude 3.5 Sonnet evaluation
  • Knowledge cutoff date test [no political factors involved, purely for testing purposes]

Knowledge cutoff test image 1

Funny thing about claude 3.5 sonnet(new)—initially refused to respond [politically sensitive, cautious], started a new chat and got accurate response.

Knowledge cutoff test image 2

Knowledge cutoff test image 3

What happened? Feels like claude 3.5 sonnet haiku really is a distilled version of claude 3.5 sonnet(new).

Knowledge cutoff test image 4

Knowledge cutoff test image 5

I specifically checked Anthropic’s console logs—it really was requesting the claude 3.5 haiku model.

Anthropic console logs

Tried asking in English—claude 3.5 haiku directly said the cutoff date was February 2024.

Knowledge cutoff test image 6

Knowledge cutoff test image 7

Instantly lost motivation to continue testing. Actually, a later LLM knowledge cutoff doesn’t mean the LLM learned all knowledge from that period—for example, cohere’s command r plus has the latest cutoff on lmarena (August 2024), but that model doesn’t know comprehensively about events in 2024. Can’t help but appreciate the importance of data for LLM training.

Cohere’s command r plus doesn’t know comprehensively about 2024 events

Maybe I just haven’t found claude 3.5 haiku’s advantages compared to claude 3.5 sonnet(new) yet. According to Alex Albert (Anthropic’s Head of Developer Relations), claude 3.5 haiku still has lots of potential.

Alex Albert’s description of claude 3.5 haiku

  • Reasoning ability

A pile of shit.

Reasoning ability test image 1

Claude 3.5 sonnet(new) also has hallucinations in reasoning.

Reasoning ability test image 2

Reasoning ability test image 3

  1. Let’s briefly look at Musk’s xAI [free $25 monthly quota until year end]. The librechat open-source project supports xai.

xai test image 1

xai test image 2

Musk’s xai grok was trained on data from X (formerly Twitter), coincidentally covered it? xai’s knowledge date shows March 2024 on lmarena—did grok time-travel? And it actually got it right.

xai test image 3

No motivation to explore further.

xai test image 4

xai test image 5

Better stick with claude 3.5 sonnet(new). What truly excites is always the strongest LLM.

The LLM field is destined to be winner-takes-all. Except for some cost-conscious application scenarios, who would want to waste time chatting with lower-tier LLMs in most other scenarios?

Currently claude 3.5 haiku is so much more expensive than gpt-4o mini—considering costs, I can’t find any reason to use claude 3.5 haiku.

gpt 4o-mini pricing

As I said, maybe I just don’t have any use cases for claude 3.5 haiku currently. If you have scenarios like those described in the image below, you can try using it with prompt caching.

Alex Albert’s promotion of claude 3.5 haiku

Of course, cost reductions will come later.

Claude 3.5 haiku cost reduction coming later

via: https://www.anthropic.com/claude/haiku


Emoji Reaction


© 2022-2026 Made with ❤️ By Jiakai