Existing age estimation methods typically employ ordinal regression based on Convolutional Neural Networks (CNNs). However, CNNs have difficulty capturing global feature representations, which reduces prediction accuracy when distinguishing between adjacent ages. To address this problem, an age estimation method that combines an enhanced CloFormer model with ordinal regression was proposed. Compared with traditional CNN-based ordinal regression, CloFormer makes better use of the self-attention mechanism to capture relationships between different regions of an image, which improves the learning of feature differences between adjacent ages. In the proposed method, the CloFormer model was first optimized, and the optimized model was then combined with ordinal regression to better exploit age sequence information and achieve more precise age estimation. The improved CloFormer model and the ordinal regression model were then trained jointly end to end, allowing the method to better learn the relationships between facial features and age sequences. Finally, comparative experiments were conducted on several publicly available datasets. Experimental results show that the proposed method achieves a Root Mean Square Error (RMSE) of 7.36, 4.62, and 8.28 on the CACD, AFAD, and UTKFace datasets, respectively. Compared with existing age estimation methods such as Ordinal Regression with CNN (OR-CNN) and COnsistent RAnk Logits (CORAL), the proposed method reduces RMSE by 0.25 and 0.05, respectively, on CACD, by 0.18 and 0.03 on AFAD, and by 0.97 and 0.53 on UTKFace, indicating that it achieves better age estimation results.
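The ordinal regression component can be illustrated with the generic extended-binary formulation used by OR-CNN and CORAL. An age range of K classes is recast as K-1 binary tasks of the form "is the age greater than rank k?", and the predicted age is recovered by counting the tasks whose probability exceeds 0.5. The following is a minimal NumPy sketch under that assumption, not the authors' implementation: the abstract does not specify the exact ordinal head attached to CloFormer, and the function names (`encode_ordinal`, `decode_ordinal`, `ordinal_bce_loss`, `rmse`) and parameters (`min_age`, `num_classes`) are hypothetical. In the full method, the logits would come from the CloFormer backbone.

```python
import numpy as np

def encode_ordinal(age, min_age, num_classes):
    """Hypothetical helper: map an integer age to K-1 binary rank labels.

    Label k is 1 if (age - min_age) > k, otherwise 0.
    """
    rank = age - min_age
    return (rank > np.arange(num_classes - 1)).astype(np.float32)

def decode_ordinal(probs, min_age, threshold=0.5):
    """Predicted age = min_age + number of tasks with P(rank > k) above threshold."""
    return min_age + int(np.sum(probs > threshold))

def ordinal_bce_loss(logits, levels):
    """Binary cross-entropy summed over the K-1 tasks, averaged over the batch.

    Uses softplus via np.logaddexp for numerical stability:
    -log(sigmoid(z)) = softplus(-z), -log(1 - sigmoid(z)) = softplus(z).
    """
    loss = levels * np.logaddexp(0, -logits) + (1 - levels) * np.logaddexp(0, logits)
    return float(np.mean(np.sum(loss, axis=-1)))

def rmse(pred, true):
    """Root Mean Square Error between predicted and ground-truth ages."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    return float(np.sqrt(np.mean((pred - true) ** 2)))

# Example with an assumed age range of 16..20 (K = 5 classes, 4 binary tasks).
labels = encode_ordinal(18, min_age=16, num_classes=5)   # [1, 1, 0, 0]
age_hat = decode_ordinal(np.array([0.9, 0.8, 0.3, 0.1]), min_age=16)  # 18
```

CORAL shares one weight vector across all binary tasks and uses only per-task biases, which keeps the predicted probabilities rank-consistent. OR-CNN uses independent classifiers, which can produce non-monotonic probabilities. The decoding step above works the same way for both.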