
o3 model just got teased?

Intern at Taro Community, 2 months ago

What are people's thoughts on it?

https://x.com/polynoamial/status/1870172996650053653

They're saying it scores 2700 on Codeforces, which would put it in the 99th percentile, compared to o1 at 1600, which is in the 60th-70th percentile. GPT-4o lands in the 11th percentile.

It's also crushing reasoning benchmarks: https://x.com/arcprize/status/1870169260850573333

About ARC-AGI: https://arcprize.org/ (it's basically visual puzzles)

It's literally beating humans (the MTurk score):

https://x.com/iScienceLuvr/status/1870172171886067774


Discussion

(2 comments)
  • 1
    Tech Lead @ Robinhood, Meta, Course Hero
    2 months ago

    I'm sure it's good, but it's important to remember that with any new AI release (and really any tech announcement), the data will be cherry-picked to make it look better than it actually is.

    On top of using AI as part of your workflow, I think way more engineers need to think about how they can leverage this new technology to build better side projects. There is a gold mine of slick AI utilities that can be built.

  • 1
    AI/ML Eng @ Series C startup
    2 months ago

    To add to Alex's statement, it's easy to misread data. For example, metrics like MMLU don't mean much when you're trying to build "AI agents".

    Benchmark results, by design, look really good. But AI projects and papers often solve a very specific problem, so blindly hoping research results generalize to your use case doesn't work.

    When learning any science, the real skill is finding which information is relevant/actionable to you. It's 99.9% noise out there in GenAI. You need to do side projects/implement papers yourself to get any real alpha in GenAI tbh.
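
One way to act on the point above about benchmarks not generalizing is to score a model on a handful of examples from your own use case. This is only a minimal sketch: `evaluate`, `toy_model`, and the test cases are hypothetical names made up for illustration, and `toy_model` stands in for whatever real model call you would use. Exact-match scoring is also an assumption; many tasks need a looser comparison.

```python
from typing import Callable, List, Tuple


def evaluate(model: Callable[[str], str], cases: List[Tuple[str, str]]) -> float:
    """Return the fraction of cases where the model's answer matches the expected one."""
    if not cases:
        raise ValueError("need at least one test case")
    correct = 0
    for prompt, expected in cases:
        output = model(prompt)
        # Case-insensitive exact match; swap in a task-specific check if needed.
        if output.strip().lower() == expected.strip().lower():
            correct += 1
    return correct / len(cases)


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call."""
    return "4" if prompt == "2 + 2 = ?" else "unknown"


# Hypothetical cases; in practice these come from your own problem domain.
cases = [("2 + 2 = ?", "4"), ("Capital of France?", "Paris")]
accuracy = evaluate(toy_model, cases)
print(f"accuracy on my own cases: {accuracy:.2f}")
```

Even a small set of cases like this says more about fit for your project than a headline benchmark number does.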