MCTS-EF: LLM Mathematical Reasoning Enhancement Framework Based on Monte Carlo Tree Search and Self-consistency

doi:10.15888/j.cnki.csa.010141

Fund project: 2023 Shenyang Science and Technology Plan (23-407-3-29)

Abstract:

Improving the mathematical reasoning ability of large language models (LLMs) is a central focus of current research, and chain-of-thought techniques have markedly improved model performance through prompt optimization. However, most existing methods rely on distillation from very large models and overlook the potential of the model itself. This paper proposes MCTS-EF, a reasoning enhancement framework that combines the model's own verification-feedback capability with Monte Carlo tree search (MCTS) to achieve dynamic error correction, and uses the model's self-consistency to mitigate hallucinations and strengthen reasoning, without relying on distillation, fine-tuning, or external reward models. Through the cooperation of self-reward, verification feedback, and in-context learning, the framework couples MCTS with an evaluation-feedback loop to optimize reasoning paths and draw out the model's intrinsic mathematical reasoning potential. Experiments show that the accuracy of Qwen2-7B on the MATH-500 dataset rises from 44% to 68%, surpassing Qwen2-72B; other models also improve substantially. The paper also systematically analyzes how the evaluated models behave under different conditions within the framework, offering methodological guidance and technical directions for future research.
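The abstract names two standard building blocks, MCTS and self-consistency, without giving implementation details. As a rough illustration only (this is not the authors' code), a minimal Python sketch of the textbook UCT selection score and of a majority vote over sampled answers might look as follows. The function names, the exploration constant `c`, and the whitespace-only answer normalization are all assumptions made for this example.

```python
import math
from collections import Counter


def uct_score(total_reward: float, visits: int, parent_visits: int, c: float = 1.4) -> float:
    """Textbook UCT score: mean reward plus an exploration bonus.

    `c` is a hypothetical exploration constant, not a value from the paper.
    """
    if visits == 0:
        return math.inf  # unvisited children are explored first
    mean_reward = total_reward / visits
    return mean_reward + c * math.sqrt(math.log(parent_visits) / visits)


def self_consistent_answer(answers: list[str]) -> tuple[str, float]:
    """Majority vote over sampled final answers.

    Returns the most frequent answer and its share of the votes.
    Normalization here is only whitespace stripping (an assumption).
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    normalized = [a.strip() for a in answers]
    winner, count = Counter(normalized).most_common(1)[0]
    return winner, count / len(normalized)
```

In a search loop of this kind, `uct_score` would typically rank candidate reasoning steps during selection, while `self_consistent_answer` would aggregate the final answers of several completed reasoning paths. How MCTS-EF actually combines the two is described in the full paper, not in this abstract.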

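Cite this article

卜立平, 靳明飞, 于碧辉, 孙林壮. MCTS-EF: LLM Mathematical Reasoning Enhancement Framework Based on Monte Carlo Tree Search and Self-consistency. 计算机系统应用 (Computer Systems & Applications),,():1-11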

History

Received: 2025-10-15
Last revised: 2025-11-14
Published online: 2026-04-01
