Leading AI models accused of cheating benchmark tests – Computing UK

GPT-4 o1 on OpenAI’s SWE-Bench Verified benchmark. In independent testing, GPT-4 o1 scored only 30%, well below the 50% that OpenAI claimed.
