Vision language models struggle to solve simple visual puzzles that humans find intuitive

Vision language models struggle to solve simple visual puzzles that humans find intuitive
GPT-4o, currently considered the most advanced multimodal model, could only solve 21 out of 100 visual puzzles. Other well-known AI models, including …

See more –> Source

Connect with us on X