Paper reading: MIND2WEB: Towards a Generalist Agent for the Web

MIND2WEB: Towards a Generalist Agent for the Web

OSU-NLP, NeurIPS 2023 Datasets and Benchmarks Track

A dataset of real tasks on real-world websites. Each task pairs a natural-language description with the trace of user interactions that completes it.

Dataset Format

  • Action Traces
    • cleaned HTML and raw HTML of the page at each step
    • a representation of the action taken at that step
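
To make the trace layout above concrete, here is a minimal sketch of how one task and its steps could be held in memory. The class and field names (`Step`, `TaskTrace`, `cleaned_html`, `raw_html`, `action_repr`) and the sample values are illustrative assumptions based only on the bullets above, not the dataset's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """One point in an action trace (field names are hypothetical)."""
    cleaned_html: str  # simplified page snapshot at this step
    raw_html: str      # unprocessed page snapshot at this step
    action_repr: str   # human-readable representation of the action


@dataclass
class TaskTrace:
    """A task description plus the ordered steps that complete it."""
    task: str
    steps: List[Step] = field(default_factory=list)

    def action_sequence(self) -> List[str]:
        # Return the action representations in the order they were taken.
        return [step.action_repr for step in self.steps]


# Made-up example data, for illustration only.
trace = TaskTrace(
    task="Search for flights to Boston",
    steps=[
        Step(
            cleaned_html="<input id='dest'>",
            raw_html="<html><body><input id='dest' class='x'></body></html>",
            action_repr="[textbox] Destination -> TYPE: Boston",
        ),
        Step(
            cleaned_html="<button id='go'>Search</button>",
            raw_html="<html><body><button id='go' class='y'>Search</button></body></html>",
            action_repr="[button] Search -> CLICK",
        ),
    ],
)
```

Calling `trace.action_sequence()` returns the two action strings in order, which is the view of a trace that is most useful when skimming examples.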

Strengths

  • Very good project homepage – https://osu-nlp-group.github.io/Mind2Web/
  • Annotation process:
    • the dataset is human-annotated
    • annotators first select the target element, then select the action to perform on it
  • Method:
    • two-step pipeline
      1. Select candidate DOM elements
        1. assign a 0-1 score to each candidate element
        2. train with randomly sampled negative elements
      2. Generate the action
        1. pose a multiple-choice question over the candidate elements
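
The two-step method above can be sketched roughly as follows. This is a toy illustration, not the paper's implementation: `overlap_score` is a word-overlap stand-in for the learned scorer, and all function names, the prompt wording, and the sample elements are assumptions made up for this note.

```python
import random
import string
from typing import Callable, List, Tuple


def overlap_score(task: str, element: str) -> float:
    """Toy stand-in for the learned scorer: a value in [0, 1] equal to
    the fraction of task words that also appear in the element text."""
    task_words = set(task.lower().split())
    if not task_words:
        return 0.0
    return len(task_words & set(element.lower().split())) / len(task_words)


def sample_training_pairs(positive: str, others: List[str], n_neg: int,
                          rng: random.Random) -> List[Tuple[str, float]]:
    """Build scorer training pairs: the annotated element gets label 1.0,
    randomly sampled other elements get label 0.0."""
    negatives = rng.sample(others, min(n_neg, len(others)))
    return [(positive, 1.0)] + [(neg, 0.0) for neg in negatives]


def select_candidates(task: str, elements: List[str],
                      score_fn: Callable[[str, str], float],
                      k: int) -> List[str]:
    """Step 1: score every element (one scorer call each), keep the top k."""
    scored = [(score_fn(task, el), el) for el in elements]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [el for _, el in scored[:k]]


def build_multichoice_prompt(task: str, candidates: List[str]) -> str:
    """Step 2: pose the surviving candidates as a multiple-choice question."""
    lines = [f"Task: {task}", "Which element should be acted on next?"]
    for letter, cand in zip(string.ascii_uppercase, candidates):
        lines.append(f"{letter}. {cand}")
    return "\n".join(lines)


task = "search flights to boston"
elements = ["button search flights", "link sign in",
            "input destination boston", "link help"]
top = select_candidates(task, elements, overlap_score, k=2)
prompt = build_multichoice_prompt(task, top)
pairs = sample_training_pairs("button search flights",
                              ["link sign in", "link help"], 1,
                              random.Random(0))
```

Here `top` keeps the two highest-scoring elements, and `prompt` lists them as options A and B for a generation model to choose from. The split keeps the expensive generation step limited to a short list instead of the whole page.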

Challenges

  • Assigning a 0-1 score to each candidate element is too expensive
    • a single action requires many model calls
    • this is infeasible in practice
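
A back-of-the-envelope count shows why this adds up. The sketch below assumes one scorer call per candidate element plus one generation call per action; the function names and the per-step element counts are made-up numbers for illustration, not measurements from the dataset.

```python
from typing import List


def calls_per_action(num_candidates: int, generation_calls: int = 1) -> int:
    """Model calls for one action, assuming one scorer call per candidate
    element plus the generation call(s) that pick the final action."""
    return num_candidates + generation_calls


def calls_per_task(candidates_per_step: List[int]) -> int:
    """Total model calls over a whole trace, one entry per step."""
    return sum(calls_per_action(n) for n in candidates_per_step)


# Hypothetical three-step task with 500, 800 and 650 candidate elements.
total = calls_per_task([500, 800, 650])
print(total)  # 1953
```

Under these assumptions the cost grows linearly with page size, so batching the scorer or pre-filtering elements would be the obvious levers to reduce it.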