TEXT2REWARD: Automated Reward Function Design for Reinforcement Learning Using Large Language Models
Researchers have developed TEXT2REWARD, a framework that uses large language models (LLMs) to automate the design of reward functions in reinforcement learning (RL). Given a natural language description of a goal, the framework generates an executable program that computes a dense reward for that goal, offering a convenient alternative to hand-crafted, domain-specific reward engineering. On robotic manipulation and locomotion benchmarks, policies trained with the generated rewards matched or outperformed those trained with expert-designed reward functions on most tasks. The framework also supports iterative refinement of the reward code through human feedback, and policies trained in simulation have been deployed on a real robot. The generated code has an error rate of about 10%, largely due to syntax errors or shape mismatches, but TEXT2REWARD still marks a promising advance at the intersection of RL and LLMs.
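To make the idea of a reward function as an executable program concrete, here is a minimal Python sketch. It is not taken from TEXT2REWARD itself: the instruction, the function name `compute_dense_reward`, its parameters, the shaping constants, and the helpers `load_reward_function` and `check_reward_function` are all hypothetical, chosen only for illustration. The sketch shows the kind of staged dense reward an LLM might emit for "move the gripper to the cube and lift it to a target height", plus a simple check that surfaces the two failure modes mentioned above, syntax errors and shape mismatches, before any training starts.

```python
import math

# Hypothetical program text, standing in for what an LLM might return.
# Positions are assumed to be (x, y, z) sequences in meters.
GENERATED_REWARD_SOURCE = '''
def compute_dense_reward(gripper_pos, cube_pos, target_height):
    # Stage 1: reward the gripper for getting close to the cube.
    reach_dist = math.dist(gripper_pos, cube_pos)
    reach_reward = 1.0 - math.tanh(5.0 * reach_dist)
    # Stage 2: reward the cube for approaching the target height.
    lift_error = abs(target_height - cube_pos[2])
    lift_reward = 1.0 - math.tanh(5.0 * lift_error)
    return reach_reward + lift_reward
'''


def load_reward_function(source):
    """Compile generated source and return the reward callable.

    Raises SyntaxError if the source is not valid Python.
    """
    namespace = {"math": math}
    exec(compile(source, "<generated_reward>", "exec"), namespace)
    return namespace["compute_dense_reward"]


def check_reward_function(reward_fn):
    """Call the reward on dummy inputs; return (ok, message)."""
    try:
        value = reward_fn((0.0, 0.0, 0.0), (1.0, 1.0, 1.0), 0.2)
    except Exception as exc:  # e.g. a shape mismatch raises ValueError
        return False, f"{type(exc).__name__}: {exc}"
    if not isinstance(value, float) or not math.isfinite(value):
        return False, "reward must be a finite float"
    return True, "ok"


reward_fn = load_reward_function(GENERATED_REWARD_SOURCE)
ok, message = check_reward_function(reward_fn)
```

In a workflow like the one described, a failed check or human feedback on the trained behavior could be passed back to the LLM as part of a revised prompt. Note that `exec` runs arbitrary code, so generated programs should only be executed in a sandboxed environment.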