At present, the research on video semantic segmentation is mainly divided into two aspects. The first one is how to improve the accuracy of image segmentation by using timing information between video frames, while the second one is how to use the similarity between the frames to determine the key frame, reduce the amount of calculation, and improve the running speed of the model. In terms of improving segmentation accuracy, new modules are generally designed and combined with existing CNNs. In terms of reducing computation load, the low-level feature correlation of frame sequence is used to select the key frame, which reduces computation load and operation time at the same time. Firstly, this paper introduces the development background and operation datasets Cityscapes and CamVid of video semantic segmentation. Secondly, the existing video semantic segmentation methods are introduced. Finally, it summarizes the current development of video semantic segmentation, and gives some prospects and suggestions for future development.