A lightweight network based on depthwise separable convolution and an attention mechanism is proposed for fast detection of surface defects on semiconductor wafers, and it is evaluated on the WM-811K dataset. Because the nine defect categories in this dataset are imbalanced, data augmentation is used to expand the categories with few samples. Depthwise separable convolution reduces the number of parameters and improves the inference speed of the model. The attention mechanism directs the model toward the defective regions of the wafer image, which improves classification. Experiments show that the proposed method reaches an average accuracy of 96.5% on the WM-811K dataset, higher than that of ANN, VGG16, and MobileNetv2. In addition, its number of parameters and amount of computation are only 73.5% and 28.6%, respectively, of those of the classical lightweight network MobileNetv2.
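
The parameter saving attributed to depthwise separable convolution can be illustrated with a short calculation. The sketch below is not the proposed network. It compares the weight count of one standard convolution layer with that of its depthwise separable counterpart, assuming a square kernel and no bias terms. The function names and the layer sizes (64 input channels, 128 output channels, 3 x 3 kernel) are hypothetical and chosen only for illustration.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a k x k standard convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution (one filter per input
    channel) followed by a 1 x 1 pointwise convolution (bias omitted)."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer sizes, not taken from the paper.
c_in, c_out, k = 64, 128, 3

std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)

# The ratio simplifies to 1/c_out + 1/k**2.
ratio = sep / std

print(f"standard: {std}, separable: {sep}, ratio: {ratio:.3f}")
```

For this assumed layer, the separable form needs roughly 12% of the weights of the standard form. The whole-network figures reported above are measured against MobileNetv2, so they depend on the full architecture and not on this per-layer ratio alone.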