Leading AI models accused of cheating benchmark tests – Computing UK
… GPT-4 o1 on OpenAI’s SWE-Bench Verified benchmark. In independent testing, GPT-4 o1 scored only 30%, well below the 50% that OpenAI claimed.