As the volume of business and the number of users grow, improving the efficiency of server clusters becomes increasingly important. In this study, a machine learning algorithm trained on historical data is used to predict the response time of new requests. Based on the predicted response time of each server node, each request is dispatched to the node with the lowest predicted response time. This improves the balance of request allocation across the cluster and thereby the cluster's efficiency. Experiments with three machine learning algorithms show that this strategy can reduce the average response time of the system in small-scale, high-concurrency clusters.
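The dispatch strategy described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the node names, the historical data, and the use of a simple least-squares linear fit as the per-node response-time predictor are all assumptions standing in for the three machine learning algorithms evaluated in the study.

```python
# Sketch of predictive load balancing: fit a per-node model on historical
# (load, response time) samples, then route each new request to the node
# with the lowest predicted response time. A least-squares linear fit is
# used here as a placeholder predictor (hypothetical choice).

def fit_linear(samples):
    """Ordinary least squares for y = a*x + b over (x, y) samples."""
    n = len(samples)
    sx = sum(x for x, _ in samples)
    sy = sum(y for _, y in samples)
    sxx = sum(x * x for x, _ in samples)
    sxy = sum(x * y for x, y in samples)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

def dispatch(models, current_loads):
    """Return the node whose predicted response time is lowest."""
    predictions = {
        node: a * current_loads[node] + b
        for node, (a, b) in models.items()
    }
    return min(predictions, key=predictions.get)

# Hypothetical history: (concurrent load, observed response time in ms).
history = {
    "node-1": [(10, 120.0), (20, 180.0), (30, 260.0)],
    "node-2": [(10, 100.0), (20, 210.0), (30, 330.0)],
}
models = {node: fit_linear(data) for node, data in history.items()}
print(dispatch(models, {"node-1": 25, "node-2": 25}))  # -> node-1
```

At load 25, node-1's fitted model predicts roughly 222 ms versus roughly 271 ms for node-2, so the request goes to node-1; any regression model with a `predict`-style interface could replace the linear fit.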