Samedi 21 décembre 2024 à 20:40

Chose amusante, alors que ce matin même, j'ai découvert l'existence de o1, sortie il y a seulement quelques jours, le 5 décembre 2024.
Voilà que je découvre ce soir, dans ce thread Hacker News la sortie de o3 le 20 décembre 2024 : "OpenAI O3 breakthrough high score on ARC-AGI-PUB".

Les releases sont très réguliers en ce moment, il est difficile de suivre le rythme 😮 !

Dans ce thread, j'ai découvert le prix ARC (https://arcprize.org), lancé le 11 juin 2024, par le français François Chollet, basé sur le papier de recherche "On the Measure of Intelligence" sorti en 2019, il y a 5 ans.

ARC est un outil de mesure de AGI.

#JaimeraisUnJour prendre le temps de lire On the Measure of Intelligence.

Je lis ici :

OpenAI o3 Breakthrough High Score on ARC-AGI-Pub

OpenAI's new o3 system - trained on the ARC-AGI-1 Public Training set - has scored a breakthrough 75.7% on the Semi-Private Evaluation set at our stated public leaderboard $10k compute limit. A high-compute (172x) o3 configuration scored 87.5%.

This is a surprising and important step-function increase in AI capabilities, showing novel task adaptation ability never seen before in the GPT-family models. For context, ARC-AGI-1 took 4 years to go from 0% with GPT-3 in 2020 to 5% in 2024 with GPT-4o. All intuition about AI capabilities will need to get updated for o3.

source

Plus loin, je lis :

However, it is important to note that ARC-AGI is not an acid test for AGI – as we've repeated dozens of times this year. It's a research tool designed to focus attention on the most challenging unsolved problems in AI, a role it has fulfilled well over the past five years.

Passing ARC-AGI does not equate to achieving AGI, and, as a matter of fact, I don't think o3 is AGI yet. o3 still fails on some very easy tasks, indicating fundamental differences with human intelligence.

source

Donc, j'en conclus qu'il ne faut pas s'emballer outre mesure sur les résultats de ce test, bien que les progrès soient impressionnants.

La première partie du thread semble aborder la thématique du coût financier de o3 versus un humain : 309 commentaires.

Dans ce commentaire #JaiDécouvert le papier de recherche "H-ARC: A Robust Estimate of Human Performance on the Abstraction and Reasoning Corpus Benchmark" qui date de 2024.

Quitter le mode Zen