Posted inChatGPT Technology News
Vision language models struggle to solve simple visual puzzles that humans find intuitive
Vision language models struggle to solve simple visual puzzles that humans find intuitive GPT-4o, currently considered the most advanced multimodal model, could only solve 21 out of 100 visual puzzles.…