Shaking of a handheld camera and movement of targets introduce motion blur into video image data, which degrades the perceived image quality. Shifting the focus from how to obtain sharp images to how to obtain them efficiently, this study proposes a new model for real-time video image deblurring based on a lightweight Generative Adversarial Network (GAN). The discriminator network is built on PatchGAN and extended into a dual-scale discriminator that operates on both the global image and local features. The generator network uses the lightweight MobileNetV3 as its backbone and introduces a feature pyramid for feature extraction. Together, these designs address the low utilization of feature information in the discriminator network and the low inference efficiency of the generator network. The model deblurs video images efficiently in an end-to-end manner. Experiments on the GoPro and Kohler datasets show that the images restored by this model achieve a high peak signal-to-noise ratio (PSNR) and high structural similarity (SSIM), and that inference is 1.7–127 times faster than that of other models.
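For readers unfamiliar with the first quality metric named above, the following is a minimal sketch of the standard peak signal-to-noise ratio definition, 10·log10(MAX² / MSE). It is not the evaluation code used in this study: the function name `psnr`, the nested-list grayscale image representation, and the 8-bit peak value of 255 are assumptions made for illustration, and the abstract does not state the exact evaluation protocol (color handling, data range).

```python
import math

def psnr(reference, restored, max_value=255.0):
    """Return PSNR in dB between two equally sized grayscale images.

    Images are nested lists of pixel intensities (an illustrative
    representation, not the format used in the study).
    """
    ref_flat = [p for row in reference for p in row]
    out_flat = [p for row in restored for p in row]
    if not ref_flat or len(ref_flat) != len(out_flat):
        raise ValueError("images must be non-empty and the same size")

    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref_flat, out_flat)) / len(ref_flat)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded

    # Standard definition: 10 * log10(MAX^2 / MSE).
    return 10.0 * math.log10(max_value ** 2 / mse)

# Toy 2x2 example: every pixel of the restored image is off by 1, so MSE = 1.
sharp = [[100, 120], [140, 160]]
deblurred = [[101, 119], [141, 159]]
print(round(psnr(sharp, deblurred), 2))  # 48.13
```

A higher value means the restored frame is closer to the sharp reference. Structural similarity is a separate metric and is not covered by this sketch.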