To address the problem of partially focused images caused by improper focusing on far and near fields during digital image capture, a multi-focus image fusion network with Cascade fusion and enhanced reconstruction (CasNet) was proposed. Firstly, a cascade sampling module was constructed to compute and merge the residuals of feature maps sampled at different depths, so that focused features at different scales were used efficiently. Secondly, a lightweight multi-head self-attention mechanism was improved to perform residual calculation on the feature maps along different dimensions, which enhanced the image features and gave the feature maps a better distribution across dimensions. Thirdly, stacked convolutional channel attention was used to complete feature reconstruction. Finally, interval convolution was used for up- and down-sampling, so that more features of the original images were retained. Experimental results show that, on the multi-focus image benchmark datasets Lytro, MFFW, grayscale, and MFI-WHU, CasNet achieves better results than popular methods such as SESF-Fuse (Spatially Enhanced Spatial Frequency-based Fusion) and U2Fusion (Unified Unsupervised Fusion network) in metrics such as Average Gradient (AG) and Gray-Level Difference (GLD).
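To make the sampling and cascade ideas above concrete, the following is a minimal PyTorch sketch, not the authors' implementation: it illustrates how a strided (interval) convolution can replace pooling for down- and up-sampling, and how the residual between feature maps sampled at different depths can be merged. All module names, channel counts, and kernel sizes are illustrative assumptions.

```python
# Minimal sketch (assumed structure, not the CasNet source code):
# interval/strided convolution for sampling plus residual merging of
# features taken at two different depths.
import torch
import torch.nn as nn

class IntervalDown(nn.Module):
    """Down-sample with a strided convolution instead of pooling,
    so the sampling is learned and more source detail can be kept."""
    def __init__(self, ch):
        super().__init__()
        self.conv = nn.Conv2d(ch, ch, kernel_size=3, stride=2, padding=1)

    def forward(self, x):
        return self.conv(x)

class IntervalUp(nn.Module):
    """Up-sample with a strided transposed convolution."""
    def __init__(self, ch):
        super().__init__()
        self.conv = nn.ConvTranspose2d(ch, ch, kernel_size=2, stride=2)

    def forward(self, x):
        return self.conv(x)

class CascadeSample(nn.Module):
    """Sample features at two depths, compute the residual between the
    shallow feature and the re-upsampled deep feature, then merge both."""
    def __init__(self, ch):
        super().__init__()
        self.down = IntervalDown(ch)
        self.up = IntervalUp(ch)
        self.fuse = nn.Conv2d(2 * ch, ch, kernel_size=1)

    def forward(self, x):
        deep = self.down(x)       # coarser-scale (deeper) feature
        restored = self.up(deep)  # back to the input resolution
        residual = x - restored   # detail lost during down-sampling
        return self.fuse(torch.cat([restored + residual, x], dim=1))

if __name__ == "__main__":
    feat = torch.randn(1, 32, 128, 128)   # dummy feature map
    print(CascadeSample(32)(feat).shape)  # torch.Size([1, 32, 128, 128])
```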