Image data in real business scenarios typically exhibits rich content and complex distortions, which poses a great challenge to the generalization ability of objective Image Quality Assessment (IQA) algorithms. To address this problem, a No-Reference IQA (NR-IQA) algorithm was proposed, composed of three sub-networks: a Feature Extraction Network (FEN), a Feature Fusion Network (FFN), and an Adaptive Prediction Network (APN). Firstly, the global view, local patch, and saliency view of a sample were fed into the FEN together, and the global distortion, local distortion, and saliency features were extracted by a Swin Transformer. Then, a cascaded Transformer encoder was used to fuse the global and local distortion features and explore the latent correlation patterns between them. Inspired by the human visual attention mechanism, the saliency features were used in the FFN to activate the attention module, enabling the module to pay extra attention to visually salient regions and thereby improving the semantic parsing ability of the algorithm. Finally, the prediction score was computed by a dynamically constructed Multi-Layer Perceptron (MLP) regression network. Experimental results on mainstream synthetic and real-world distortion datasets show that, compared with the DSMix (Distortion-induced Sensitivity map-guided Mixed augmentation) algorithm, the proposed algorithm improves the Spearman Rank-order Correlation Coefficient (SRCC) by 4.3% on the TID2013 dataset and the Pearson Linear Correlation Coefficient (PLCC) by 1.4% on the KonIQ dataset. The proposed algorithm also demonstrates excellent generalization ability and interpretability: it can effectively handle the complex distortions found in business scenarios and make adaptive predictions according to the individual characteristics of each sample.
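To make the described three-branch pipeline (FEN → FFN → APN) concrete, the following is a minimal PyTorch sketch. All module names, dimensions, and the form of the saliency gating are illustrative assumptions rather than the paper's implementation: the Swin Transformer backbone is replaced by a small stand-in CNN so the sketch runs without pretrained weights, and the dynamically constructed regression network is simplified to a plain MLP head.

```python
# Minimal sketch of the three-branch NR-IQA pipeline described in the abstract.
# Assumptions are marked in comments; this is not the authors' implementation.

import torch
import torch.nn as nn


class StandInBackbone(nn.Module):
    """Placeholder for the shared Swin Transformer feature extractor (FEN)."""

    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.GELU(),
            nn.Conv2d(32, dim, 3, stride=2, padding=1), nn.GELU(),
        )

    def forward(self, x):                      # (B, 3, H, W)
        f = self.net(x)                        # (B, dim, H/4, W/4)
        return f.flatten(2).transpose(1, 2)    # (B, N, dim) token sequence


class FeatureFusionNetwork(nn.Module):
    """FFN: cascaded Transformer encoder over concatenated global/local tokens,
    with a saliency-driven gate standing in for the attention activation."""

    def __init__(self, dim=128, layers=2):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.gate = nn.Linear(dim, dim)  # assumed form of the saliency gating

    def forward(self, f_global, f_local, f_saliency):
        tokens = torch.cat([f_global, f_local], dim=1)   # fuse the two views
        fused = self.encoder(tokens)                     # explore correlations
        # Saliency features modulate the fused tokens (visual-attention cue).
        sal = torch.sigmoid(self.gate(f_saliency.mean(dim=1, keepdim=True)))
        return (fused * sal).mean(dim=1)                 # (B, dim) descriptor


class NRIQAModel(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.fen = StandInBackbone(dim)
        self.ffn = FeatureFusionNetwork(dim)
        # APN stand-in: a fixed MLP head; the paper's per-sample dynamic
        # construction of the regression network is not reproduced here.
        self.apn = nn.Sequential(
            nn.Linear(dim, 64), nn.GELU(), nn.Linear(64, 1))

    def forward(self, x_global, x_local, x_saliency):
        fg = self.fen(x_global)
        fl = self.fen(x_local)
        fs = self.fen(x_saliency)
        return self.apn(self.ffn(fg, fl, fs)).squeeze(-1)  # quality score


if __name__ == "__main__":
    model = NRIQAModel()
    g = torch.randn(2, 3, 64, 64)   # global view
    p = torch.randn(2, 3, 64, 64)   # local patch
    s = torch.randn(2, 3, 64, 64)   # saliency view
    print(model(g, p, s).shape)     # torch.Size([2])
```

The sketch keeps the abstract's structure visible: one shared backbone processes all three views, the encoder fuses global and local tokens before the saliency gate is applied, and a regression head maps the pooled descriptor to a scalar score.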