Paper reading: SteP: Stacked LLM Policies for Web Actions

SteP: Stacked LLM Policies for Web Actions

Cornell and ASAPP research; CoLM 2024

Author’s claim: Specifying a large prompt to handle all possible behaviors and states is extremely complex; decomposition to distinct policies requires careful handling of control between policies. They propose Step, an approach to dynamically compose policies to solve a diverse set of web tasks.

Use Markov Decision Tree where the state is a stack of policies.

Enable dynamic control where any policy can choose to invoke any other policy

image-20240716193601410

Strengths

  • some sort of “mid-level intent” prompt for completing high-level tasks

Challenges

  • I didn’t realize how policies are constructed when reading the introduction and the abstract.
    • Okay, I see why they are trying to avoid saying that. policies are manually crafted.
  • Then the need for manual crafting policies is a big issue.