Paper reading: A REAL-WORLD WEBAGENT WITH PLANNING, LONG CONTEXT UNDERSTANDING, AND PROGRAM SYNTHESIS
A REAL-WORLD WEBAGENT WITH PLANNING, LONG CONTEXT UNDERSTANDING, AND PROGRAM SYNTHESIS
Deepmind, ICLR 2024
Motivation
LLM Agents in real-world websites has still suffered from open domainness, limited context length and lack of inductive bias on HTML.
They introduce WebAgent, an LLM-driven agent that learns from self experience to complete tasks on real websites following natural language instructions.
Strength
- method:
- two-model method.
- HTML-T5 to predict the next sub-instruction (planning) and generate related HTML snippets (summarization)
- Flan-U-PaLM prompted to generate Python programs
- Action is represented by python programs with selenium driver calls.
- a large action space
- flexbility
- two-model method.
- Supports any website without predefined action spaces.
Challenges
- method: a bit complicated
- action representation: safety? Run generated code from LLM is not safe.
- Use a smaller model for planning – WHY?