While large language models (LLMs) have demonstrated impressive performance on a range of decision-making tasks, they rely on simple acting processes and fall short of broad deployment as autonomous agents. We introduce LATS (Language Agent Tree Search), a general framework that synergizes the capabilities of LLMs in planning, acting, and reasoning. Drawing inspiration from Monte Carlo tree search in model-based reinforcement learning, LATS employs LLMs as agents, value functions, and optimizers, repurposing their latent strengths for enhanced decision-making. What is crucial in this method is the use of an environment for external feedback, which offers a more deliberate and adaptive problem-solving mechanism that moves beyond the limitations of existing techniques. Our experimental evaluation across diverse domains, such as programming, HotPotQA, and WebShop, illustrates the applicability of LATS for both reasoning and acting. In particular, LATS achieves 94.4% for programming on HumanEval with GPT-4 and an average score of 75.9 for web browsing on WebShop with GPT-3.5, demonstrating the effectiveness and generality of our method.
That’s true, and I appreciate it, but it reads better when it is marked as a quotation in some way: an initial “> ”, italics (*...*), or a first line saying something like “From the linked article:” or “Abstract:”.
- Combines reasoning (from chain-of-thought), acting (from ReAct), and planning (from tree-of-thought) into a general framework for LLM problem solving
- Adapts MCTS (from AlphaZero) for LLM high-level planning (rough sketch of the loop after this list)
- Strong performance on question-answering (HotPotQA), programming (HumanEval), and web browsing (WebShop)
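For anyone trying to picture the loop: here is a minimal sketch of the MCTS adaptation, with an LLM playing both the action-proposal and value-function roles. The names (`propose_actions`, `score_state`, the UCT constant) are my own assumptions, not the paper's or any library's API:

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    state: str                       # trajectory so far: prompt + actions/observations
    parent: "Node | None" = None
    children: list["Node"] = field(default_factory=list)
    visits: int = 0
    value: float = 0.0               # running sum of backed-up rewards

def uct(node: Node, c: float = 1.0) -> float:
    # Standard UCT: exploit the mean value, explore rarely-visited children.
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits
    )

def lats_step(root: Node, propose_actions, score_state, is_terminal) -> None:
    # 1. Selection: descend by UCT until we reach a leaf.
    node = root
    while node.children:
        node = max(node.children, key=uct)
    # 2. Expansion: the LLM samples several candidate actions from this state
    #    (assumed to return at least one action).
    if not is_terminal(node.state):
        for action in propose_actions(node.state):
            node.children.append(Node(state=node.state + action, parent=node))
        node = node.children[0]
    # 3. Evaluation: the LLM (plus any environment reward) scores the new state.
    reward = score_state(node.state)
    # 4. Backpropagation: push the reward up through the ancestors.
    while node is not None:
        node.visits += 1
        node.value += reward
        node = node.parent
```

The key difference from AlphaZero-style MCTS is that `propose_actions` and `score_state` are just prompted LLM calls rather than a trained policy/value network.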
Is there a comparison of Language Agent Tree Search to Graph of Thoughts somewhere? They reference it only in passing while talking about "search algorithms", but I understand it's a fair bit more than that.
Currently I am creating different agent types for planned subtasks using langchain, so perhaps I should implement a custom AgentExecutor? Or would I need to lift it higher in the logic stack? I am not sure I understand how the graph search and the thought-action-reflection selection process decide when and how to reflect if a branch fails, or how the failure gets backpropagated to other nodes.
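From my read of the paper, the search sits *above* any single agent run: the tree search is an outer loop, and each node expansion is one agent step or trajectory, so in langchain terms you would wrap your AgentExecutor calls inside the search rather than customize the executor itself. Reflection is triggered at terminal nodes that fail (reward below some threshold): the LLM writes a self-critique of that trajectory, which is stored and injected into the prompts of *future* expansions, while the low scalar reward is backpropagated through that branch's ancestors so UCT steers selection away from it. A hypothetical sketch, reusing the `Node` shape from the sketch upthread (`FAIL_THRESHOLD`, `llm_reflect`, and the `reflections` list are my own names, not the paper's):

```python
FAIL_THRESHOLD = 0.5          # assumed cutoff for "this branch failed"
reflections: list[str] = []   # shared memory, injected into future expansion prompts

def evaluate_and_backprop(node, reward, llm_reflect) -> None:
    # Reflection: only on failed terminal trajectories. The critique goes into
    # shared memory so every *future* expansion prompt can include it; the
    # failure is never written into sibling nodes directly.
    if reward < FAIL_THRESHOLD:
        reflections.append(llm_reflect(node.state))
    # Backpropagation: the low reward flows up this branch's ancestors,
    # shrinking their UCT scores so selection favors other subtrees.
    while node is not None:
        node.visits += 1
        node.value += reward
        node = node.parent
```

So the failure never touches sibling nodes directly; it reaches them indirectly, through the shared reflection memory and the lowered UCT values along the failed path.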