Planetarium: A New Benchmark to Evaluate LLMs on Translating Natural Language …
Direct plan generation using LLMs has shown limited success, with GPT-4 achieving only 35% accuracy on simple planning tasks. This low accuracy …