A light-adaptive image fusion algorithm based on gradient enhancement and text guidance was developed to address the limitations of existing fusion algorithms, which suffer from loss of detail, edge degradation, and unclear salient features under complex lighting conditions. First, a feature extraction module based on gradient enhancement and linear spatial equations was constructed to extract global features with linear computational complexity while enhancing edge gradient information. Second, scene description text was embedded to guide the fusion network to generate fused images in different styles for different lighting conditions, improving the robustness of the fusion algorithm in complex lighting environments. Finally, a Gradient Enhanced Fusion Module (GEFM) based on a cross-attention mechanism was designed to achieve gradient enhancement and fusion of multimodal information. Experimental results on three benchmark datasets, TNO, MSRS (MultiSpectral Road Scenarios), and LLVIP (Low-Light Visible-Infrared Paired), demonstrate that the proposed algorithm outperforms comparison algorithms such as LRRNet (Low-Rank Representation Network), CAMF (Class Activation Mapping Fusion), DATFuse (Dual Attention Transformer Fusion), UMF-CMGR (Unsupervised Misaligned Fusion via Cross-Modality image Generation and Registration), and GANMcC (GAN with Multi-classification Constraints) on five quantitative metrics. Specifically, the Spatial Frequency (SF) metric improved by 22%, 59%, and 61%, and the Visual Information Fidelity (VIF) metric improved by 31%, 53%, and 37%, respectively. The algorithm effectively reduces edge blurring and ensures that fused images maintain high clarity and contrast under different lighting conditions.
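The abstract states that the GEFM combines a cross-attention mechanism with gradient enhancement but gives no implementation details. The following is a minimal PyTorch sketch of that general idea, written purely for illustration: the module name, the Sobel-based gradient branch, the bidirectional cross-attention wiring, and all layer sizes are assumptions and do not reflect the paper's actual architecture.

```python
# Minimal sketch (assumptions only) of a gradient-enhanced cross-attention
# fusion block for visible/infrared feature maps. Not the paper's GEFM.
import torch
import torch.nn as nn
import torch.nn.functional as F


def sobel_gradient(feat: torch.Tensor) -> torch.Tensor:
    """Per-channel Sobel gradient magnitude of a feature map (B, C, H, W)."""
    c = feat.shape[1]
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=feat.device).view(1, 1, 3, 3).repeat(c, 1, 1, 1)
    ky = kx.transpose(-1, -2)  # Sobel-y kernel is the transpose of Sobel-x
    gx = F.conv2d(feat, kx, padding=1, groups=c)
    gy = F.conv2d(feat, ky, padding=1, groups=c)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)


class CrossAttentionGradientFusion(nn.Module):
    """Hypothetical fusion block: gradient enhancement + bidirectional cross-attention."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn_ir_to_vis = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_vis_to_ir = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Conv2d(2 * dim, dim, kernel_size=1)

    def forward(self, f_vis: torch.Tensor, f_ir: torch.Tensor) -> torch.Tensor:
        b, c, h, w = f_vis.shape
        # Emphasise edge structure in each modality before cross-attention.
        f_vis = f_vis + sobel_gradient(f_vis)
        f_ir = f_ir + sobel_gradient(f_ir)
        # Flatten spatial dims into token sequences of shape (B, H*W, C).
        t_vis = f_vis.flatten(2).transpose(1, 2)
        t_ir = f_ir.flatten(2).transpose(1, 2)
        # Cross-attention in both directions: each modality queries the other.
        vis_att, _ = self.attn_ir_to_vis(t_vis, t_ir, t_ir)
        ir_att, _ = self.attn_vis_to_ir(t_ir, t_vis, t_vis)
        # Concatenate both attended streams and project to one fused map.
        fused = torch.cat([vis_att, ir_att], dim=-1).transpose(1, 2)
        return self.proj(fused.reshape(b, 2 * c, h, w))


if __name__ == "__main__":
    # Toy usage: fuse random visible/infrared feature maps of matching shape.
    vis = torch.randn(1, 32, 64, 64)
    ir = torch.randn(1, 32, 64, 64)
    out = CrossAttentionGradientFusion(dim=32)(vis, ir)
    print(out.shape)  # torch.Size([1, 32, 64, 64])
```

In this sketch, adding the Sobel gradient magnitude back onto each feature map is one simple way to realise "gradient enhancement" before fusion; the paper's actual mechanism may differ.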