I am confused about how the MCTSr algorithm actually works. It is not clear how it is better than simply mutating potential answers (via an LLM) and sorting them by LLM self-eval.
I have a hard time understanding how many LLM evals MCTSr actually does. How the rollout limit is implemented is not described at all. It doesn't seem like it can mean the same thing as in normal MCTS, because there is no definitive "end" to the tree search. Additionally, MCTS is normally limited by the number of nodes expanded.
Aside from theoretical concerns, it is not clear why they have not included a > 8 rollout version in the tables or used an LLM stronger than Llama 3 8B. If the concept scales well, it should be able to beat GPT-4 and friends by a rather large margin.
Obviously marrying search with LLMs is a ripe area for research, but I find it hard to actually take anything away from this paper.

EDIT: Added a missing greater than sign
Two ways MCTS (or their MCTSr) differs from what you describe: First, treating a single self-eval as an authoritative score, rather than as one sample from a distribution, increases the likelihood that one low-quality (i.e. inaccurate) evaluation throws off the entire search process. Second, with MCTS a low-quality action (in MCTSr, a solution) whose weakness only becomes apparent several refinement rounds later doesn't throw off the entire search either, since the tree search can go back toward the root and experiment with a different earlier branch.
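To make the first point concrete, here is a minimal Python sketch of scoring a candidate by sampling the self-eval several times instead of trusting a single number. This is my own illustration, not anything from the paper, and `llm_score` is a hypothetical function returning a numeric grade.

```python
import statistics

def sampled_score(llm_score, answer, samples=3):
    # Treat the self-eval as a noisy sample rather than an oracle:
    # ask several times and aggregate, so one inaccurate evaluation
    # only shifts the estimate instead of deciding the branch outright.
    scores = [llm_score(answer) for _ in range(samples)]
    return statistics.mean(scores)
```

MCTS then compounds this during backpropagation by averaging rewards over visits, which is what makes a single bad evaluation recoverable.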
A single rollout is the full process described in section 3: "The algorithm iterates through these stages until a termination condition T is met, including rollout constraints or maximum exploration depth".
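If that reading is right, a rollout would look roughly like the Python sketch below. This is my reconstruction of the section 3 loop, not code from the paper; `llm_refine` and `llm_score` are hypothetical stand-ins for the self-refine and self-evaluation prompts, and the selection details are guesses.

```python
import math

class Node:
    def __init__(self, answer, parent=None):
        self.answer = answer          # a candidate solution (text)
        self.parent = parent
        self.children = []
        self.visits = 0
        self.reward_sum = 0.0

def uct(node, c=1.4):
    # Exploit high average reward, with an exploration bonus that keeps
    # sparsely visited earlier branches selectable.
    if node.visits == 0:
        return float("inf")
    total = node.parent.visits if node.parent is not None else node.visits
    return (node.reward_sum / node.visits
            + c * math.sqrt(math.log(total + 1) / node.visits))

def all_nodes(node):
    yield node
    for child in node.children:
        yield from all_nodes(child)

def one_rollout(root, llm_refine, llm_score):
    # One "rollout" under this reading: select the most promising node
    # anywhere in the tree, self-refine it into one new child,
    # self-evaluate that child, and backpropagate the reward. There is
    # no random simulation to a terminal state, unlike classic MCTS.
    node = max(all_nodes(root), key=uct)
    child = Node(llm_refine(node.answer), parent=node)  # 1 refine call
    node.children.append(child)
    reward = llm_score(child.answer)                    # >= 1 self-eval calls
    while child is not None:                            # backpropagate to the root
        child.visits += 1
        child.reward_sum += reward
        child = child.parent
```

If that is what's happening, an 8-rollout run adds one node per rollout and costs roughly 8 refine calls plus 8x however many self-eval samples are taken, on top of the initial answer.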
I can't say why they didn't try using a bigger model than Llama 3 8B, or more rollouts.
> They did include an 8-rollout version in the tables? I can't say why they didn't try using a bigger model than Llama 3 8B.
That was a typo. I meant a > 8 rollout version. It doesn't seem like they have hit massively diminishing returns yet.
> A single rollout is the full process described in section 3: "The algorithm iterates through these stages until a termination condition T is met, including rollout constraints or maximum exploration depth".
A rollout is not the entire process. The paper's summary of normal MCTS correctly identifies rollouts as "random simulations by selecting moves arbitrarily until a game’s conclusion is reached, thereby evaluating the node’s potential". Nothing resembling an actual rollout is ever described.
Typically MCTS is limited by the number of nodes expanded, which is likely what they mean, but because they correctly described rollouts, it seems like I am missing something. They also mention AlphaGo, which replaces rollouts with a neural evaluation; maybe that is relevant.
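For contrast, here is roughly what the two kinds of node evaluation look like (hand-wavy sketch; `random_policy`, `is_terminal`, `outcome`, and `value_net` are made-up stand-ins):

```python
def classic_rollout(state, random_policy, is_terminal, outcome):
    # Classic MCTS: play arbitrary moves until the game ends and use
    # the final result as the value estimate for the expanded node.
    while not is_terminal(state):
        state = random_policy(state)
    return outcome(state)

def alphago_style_value(state, value_net):
    # AlphaGo-style: no simulation at all; a learned value function
    # scores the position directly. MCTSr's LLM self-evaluation seems
    # to fill this slot, which would explain why nothing resembling a
    # literal rollout is ever described.
    return value_net(state)
```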
If rollouts means nodes expanded, 4 and 8 are both just really low numbers.
Has anyone implemented this successfully? I'm 90% of the way there but I'm not sure I fully understand the node expansion process. Results are so-so. Improved for sure, but not solving my toy math problem...