Planetarium: A New Benchmark to Evaluate LLMs on Translating Natural Language …

Direct plan generation using LLMs has shown limited success, with GPT-4 achieving only 35% accuracy on simple planning tasks. This low accuracy …
