By comprehensively comparing the models of traditional knowledge graph representation learning, including the advantages and disadvantages and the applicable tasks, the analysis shows that the traditional single-modal knowledge graph cannot represent knowledge well. Therefore, how to use multimodal data such as text, image, video, and audio for knowledge graph representation learning has become an important research direction. At the same time, the commonly used multimodal knowledge graph datasets were analyzed in detail to provide data support for relevant researchers. On this basis, the knowledge graph representation learning models under multimodal fusion of text, image, video, and audio were further discussed, and various models were summarized and compared. Finally, the effect of multimodal knowledge graph representation on enhancing classical applications, including knowledge graph completion, question answering system, multimodal generation and recommendation system in practical applications was summarized, and the future research work was prospected.