To address the difficulty of effectively capturing structural features, temporal features, and the interactions between them during the dynamic evolution of information diffusion, an information diffusion prediction model based on Transformer and Relational Graph Convolutional Network (TRGCN) was proposed. Firstly, a dynamic heterogeneous graph composed of the social network graph and the diffusion cascade graph was constructed, and the structural features of each node in this graph were extracted using a Relational Graph Convolutional Network (RGCN). Secondly, the time embedding of each node was re-encoded using a Bi-directional Long Short-Term Memory (Bi-LSTM) network, and a time decay term was introduced to assign different weights to nodes at different time positions, yielding the temporal features of the nodes. Finally, the structural and temporal features were input into a Transformer and then fused to obtain the spatial-temporal features used for information diffusion prediction. Experimental results on three real-world datasets (Twitter, Douban, and Memetracker) show that, compared with the best-performing baseline model, TRGCN improves Hits@100 by 3.18%, 5.96%, and 3.34%, respectively, and MAP@100 by 11.60%, 19.72%, and 8.47%, respectively, which demonstrates the effectiveness and rationality of the model.
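For readers unfamiliar with the two ranking metrics reported here, Hits@k and mean average precision at k, the following is a minimal sketch of how they are commonly computed for next-user prediction. It assumes that each prediction step has exactly one ground-truth user, so that average precision reduces to the reciprocal rank truncated at k; the abstract does not state the exact evaluation protocol, and the function names (`rank_of_target`, `hits_at_k`, `map_at_k`) and the toy scores are hypothetical, not taken from the TRGCN implementation.

```python
from typing import Sequence


def rank_of_target(scores: Sequence[float], target: int) -> int:
    """Return the 1-based rank of `target` under descending score order."""
    target_score = scores[target]
    # Count candidates scored strictly higher; ties resolve in the target's favour.
    return 1 + sum(1 for s in scores if s > target_score)


def hits_at_k(all_scores: Sequence[Sequence[float]],
              targets: Sequence[int], k: int) -> float:
    """Fraction of prediction steps whose true user appears in the top k."""
    hits = sum(1 for scores, t in zip(all_scores, targets)
               if rank_of_target(scores, t) <= k)
    return hits / len(targets)


def map_at_k(all_scores: Sequence[Sequence[float]],
             targets: Sequence[int], k: int) -> float:
    """Mean of 1/rank over steps, counting 0 when the true user is below rank k.

    Assumption: one relevant user per step, so AP@k equals the truncated
    reciprocal rank.
    """
    total = 0.0
    for scores, t in zip(all_scores, targets):
        r = rank_of_target(scores, t)
        if r <= k:
            total += 1.0 / r
    return total / len(targets)


# Toy example (hypothetical scores): three prediction steps over three candidate users.
example_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
example_targets = [1, 1, 0]  # true next user at each step; ranks are 1, 2, 3

print(hits_at_k(example_scores, example_targets, k=2))
print(map_at_k(example_scores, example_targets, k=2))
```

In the toy example the true users are ranked 1st, 2nd, and 3rd, so with k = 2 two of the three steps count as hits, and the third step contributes zero to the mean average precision.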