Paper reading: WILBUR: Adaptive In-Context Learning for Robust and Accurate Web Agents [extensive reading]
UCB and Bardeen; it seems to have been rejected by CoLM 2024.
The paper proposes WILBUR, an approach that uses a differentiable ranking model and a novel instruction synthesis technique to optimally populate a black-box large language model's prompt with task demonstrations from previous runs.
Strengths:
- Interesting claim: learning how specific websites work is necessary for both people and LLMs.
- Implementation:
- explore, reflect, and backtrack: verify whether each action succeeds; if not, backtrack to the last successful state while storing the failure in the model's context (see the first sketch below)
- retrieve demonstrations from a scalable knowledge bank: demonstrations can teach the agent to perform a similar task on a potentially unseen website, or to act on a similar web page regardless of the task
- demonstration ranking model that selects which retrieved demonstrations populate the prompt (see the second sketch below)
- state of the art on WebVoyager (53%)
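
How I picture the explore/reflect/backtrack loop, as a minimal sketch. The helper names (`propose_action`, `verify`, `snapshot`, `restore`) are my assumptions, not the paper's API:

```python
# Minimal sketch of an explore-reflect-backtrack loop; env/llm helpers are
# hypothetical placeholders, not the authors' actual implementation.
from dataclasses import dataclass, field


@dataclass
class AgentState:
    url: str
    dom: str
    context: list[str] = field(default_factory=list)  # failures fed back into the prompt


def run_episode(goal: str, env, llm, max_steps: int = 20) -> AgentState:
    state = env.reset(goal)            # initial page state
    checkpoints = [env.snapshot()]     # successful states we can backtrack to

    for _ in range(max_steps):
        action = llm.propose_action(goal, state)   # explore: pick the next action
        new_state = env.execute(action)

        if llm.verify(goal, action, new_state):    # reflect: did the action succeed?
            checkpoints.append(env.snapshot())     # remember the new good state
            state = new_state
        else:
            # backtrack: restore the last successful state and remember the failure
            env.restore(checkpoints[-1])
            state.context.append(f"Action failed, avoid repeating: {action}")

        if env.is_done(goal, state):
            break
    return state
```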
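
And a rough sketch of how knowledge-bank retrieval plus the ranking model could fit together to populate the prompt. Embedding-based retrieval, the `RankingModel` interface, and the `Demo` fields are my assumptions, not details from the paper:

```python
# Sketch of knowledge-bank retrieval followed by demonstration ranking.
from dataclasses import dataclass

import numpy as np


@dataclass
class Demo:
    task: str          # natural-language goal of the past run
    page: str          # textual page representation at that step
    trajectory: str    # actions taken (and whether the run succeeded)
    embedding: np.ndarray


def retrieve(query_emb: np.ndarray, bank: list[Demo], k: int = 20) -> list[Demo]:
    """Nearest-neighbour retrieval over the knowledge bank by cosine similarity."""
    sims = [float(query_emb @ d.embedding /
                  (np.linalg.norm(query_emb) * np.linalg.norm(d.embedding)))
            for d in bank]
    order = np.argsort(sims)[::-1][:k]
    return [bank[i] for i in order]


class RankingModel:
    """Placeholder for the differentiable ranker that scores candidate demos."""

    def score(self, goal: str, page: str, demo: Demo) -> float:
        raise NotImplementedError


def build_prompt_demos(goal: str, page: str, bank: list[Demo],
                       embed, ranker: RankingModel, budget: int = 5) -> list[Demo]:
    # Retrieve by similarity to the current goal/page, then keep the
    # highest-scoring demonstrations under the prompt budget.
    candidates = retrieve(embed(goal + "\n" + page), bank)
    candidates.sort(key=lambda d: ranker.score(goal, page, d), reverse=True)
    return candidates[:budget]
```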
Drawbacks:
- No structural information in their DOM representation: why?
- I would first like to see experiments that verify the core hypothesis (that models need to learn how to use unseen websites).
- Too similar to the RAP and Agent Hospital papers.