As a critical task in natural language processing, fact verification requires retrieving relevant evidence from large amounts of plain text for a given claim and then reasoning over that evidence to verify the claim. Previous studies usually represent the relationships among pieces of evidence either by concatenating the evidence sentences or by using a graph structure, but neither approach clearly captures the internal relevance among the evidence. Therefore, a collaborative reasoning network model based on graph and text fusion, CNGT (Co-attention Network with Graph and Text fusion), was designed, in which semantic fusion of the evidence sentences is achieved by constructing an evidence knowledge graph. Firstly, the evidence knowledge graph was constructed from the evidence sentences, and its graph representation was learned by a graph transformer encoder. Then, the BERT (Bidirectional Encoder Representations from Transformers) model was used to encode the claim and the evidence sentences. Finally, the reasoning graph information and the text features were fused through a two-layer collaborative reasoning network. Experimental results on the FEVER (Fact Extraction and VERification) dataset show that the proposed model outperforms the strong baseline KGAT (Kernel Graph Attention neTwork), improving Label Accuracy (LA) by 0.84 percentage points and FEVER score by 1.51 percentage points. These results indicate that the model pays closer attention to the relationships among evidence sentences, and the evidence graph makes the model's handling of these relationships interpretable.
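The abstract does not give the fusion equations, so the following is only a rough illustration of what graph-text co-attention stacked two layers deep can look like. It assumes token features stand in for BERT outputs and node features stand in for the graph transformer encoder's outputs. The function names, the scaled dot-product affinity, the residual updates, and the mean pooling are all hypothetical choices; there are no learned weights, and this is not CNGT's actual implementation.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    # Numerically stable softmax over a single vector.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # (n x k) @ (k x m) -> (n x m)
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def co_attention_layer(text: Matrix, graph: Matrix) -> Tuple[Matrix, Matrix]:
    """Hypothetical co-attention step: text (n x d) tokens vs. graph (m x d) nodes."""
    scale = 1.0 / math.sqrt(len(text[0]))
    # Scaled dot-product affinity between every token and every node: n x m.
    affinity = [[v * scale for v in row] for row in matmul(text, transpose(graph))]
    text_to_graph = [softmax(row) for row in affinity]  # each token attends over nodes
    graph_to_text = [softmax(col) for col in transpose(affinity)]  # each node attends over tokens
    # Residual updates keep the feature size d so layers can be stacked.
    new_text = add(text, matmul(text_to_graph, graph))
    new_graph = add(graph, matmul(graph_to_text, text))
    return new_text, new_graph


def mean_pool(a: Matrix) -> List[float]:
    return [sum(col) / len(a) for col in transpose(a)]


def fuse(text: Matrix, graph: Matrix, num_layers: int = 2) -> List[float]:
    """Stack co-attention layers, then concatenate pooled text and graph vectors."""
    for _ in range(num_layers):
        text, graph = co_attention_layer(text, graph)
    # Length-2d vector that a claim classifier (omitted here) would consume.
    return mean_pool(text) + mean_pool(graph)


if __name__ == "__main__":
    text = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3 toy "token" vectors, d = 2
    graph = [[1.0, 0.0], [0.0, 1.0]]  # 2 toy "node" vectors
    print(len(fuse(text, graph)))  # 4
```

Using residual addition rather than concatenation is a simplification that keeps the feature size fixed, so the two layers can share one function; a trained model would typically add learned projections before and after the attention.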