Vision language models struggle to solve simple visual puzzles that humans find intuitive
GPT-4o, currently considered the most advanced multimodal model, solved only 21 of 100 visual puzzles. Other well-known AI models, including …