Deep Subspace Clustering (DSC) rests on the assumption that the original data lie in a union of low-dimensional nonlinear subspaces. Existing multi-scale representation learning methods for deep subspace clustering are built on a deep auto-encoder and insert a fully connected layer between each encoder layer and its corresponding decoder layer to capture multi-scale features; however, they neither analyze the nature of these multi-scale features in depth nor consider the multi-scale reconstruction losses between the input and output data. To address these problems, firstly, a reconstruction loss function was established for each network layer to supervise the learning of the encoder parameters at the corresponding level; then, a more effective multi-scale self-representation module was proposed, based on the block diagonality of the sum of the common self-representation matrix and the unique self-representation matrix of each scale; finally, the diversity among the unique self-representation matrices of different scales was analyzed in depth, so that the multi-scale feature matrices were exploited effectively. On this basis, a method called MSCD-DSC (Multiscale Self-representation learning with Consistency and Diversity for Deep Subspace Clustering) was proposed. Experimental results on the Extended Yale B, ORL, COIL20 and Umist datasets show that, compared with the second-best method MLRDSC (Multi-Level Representation learning for Deep Subspace Clustering), MSCD-DSC reduces the clustering error rate by 15.44%, 2.22%, 3.37% and 13.17% respectively, indicating that MSCD-DSC outperforms the existing methods.
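The abstract does not spell out the objective, but its three ingredients (per-layer reconstruction losses, self-expression through the sum of a common matrix and scale-specific matrices with block diagonality enforced on that sum, and diversity among the scale-specific matrices) can be sketched concretely. The PyTorch fragment below is a minimal illustration under assumed choices: the eigenvalue-based block-diagonal regularizer and the Frobenius inner-product diversity penalty stand in for whatever terms the paper actually uses, and all function names, parameters, and loss weights are hypothetical rather than the paper's formulation.

```python
import torch

def block_diag_reg(C, k):
    """Encourage roughly k diagonal blocks in C: sum of the k smallest
    eigenvalues of the Laplacian of the affinity |C| + |C|^T
    (an assumed choice of block-diagonal regularizer)."""
    A = 0.5 * (C.abs() + C.abs().t())
    L = torch.diag(A.sum(dim=1)) - A
    return torch.linalg.eigvalsh(L)[:k].sum()  # eigenvalues in ascending order

def mscd_objective(recon_pairs, feats, C0, C_uniq, k,
                   lam_se=1.0, lam_bd=0.1, lam_div=0.1):
    """Hypothetical MSCD-DSC-style objective (names and weights assumed).

    recon_pairs : list of (layer_input, layer_output) tensor pairs for
                  the per-layer reconstruction losses.
    feats       : list of multi-scale feature matrices Z_i, shape (n, d_i).
    C0          : (n, n) common self-representation matrix.
    C_uniq      : list of (n, n) scale-specific self-representation matrices.
    k           : assumed number of clusters (diagonal blocks).
    """
    # 1) A reconstruction loss at every layer supervises each encoder level.
    loss_rec = sum(((x - y) ** 2).sum() for x, y in recon_pairs)

    # 2) Self-expression per scale, Z_i ~ (C0 + C_i) Z_i, plus block
    #    diagonality of the *sum* C0 + C_i, as the abstract describes.
    loss_se = sum(((z - (C0 + Ci) @ z) ** 2).sum()
                  for z, Ci in zip(feats, C_uniq))
    loss_bd = sum(block_diag_reg(C0 + Ci, k) for Ci in C_uniq)

    # 3) Diversity among the unique matrices; penalizing pairwise Frobenius
    #    inner products is one illustrative way to discourage overlap.
    loss_div = sum((Ci * Cj).sum().abs()
                   for i, Ci in enumerate(C_uniq)
                   for Cj in C_uniq[i + 1:])

    return loss_rec + lam_se * loss_se + lam_bd * loss_bd + lam_div * loss_div

# Toy check with random tensors (3 scales, n = 8 samples, k = 2 clusters).
n, k = 8, 2
feats = [torch.randn(n, 4) for _ in range(3)]
C0 = torch.randn(n, n, requires_grad=True)
C_uniq = [torch.randn(n, n, requires_grad=True) for _ in range(3)]
recon_pairs = [(torch.randn(n, 4), torch.randn(n, 4)) for _ in range(3)]
mscd_objective(recon_pairs, feats, C0, C_uniq, k).backward()
```

In a formulation of this shape, the final affinity for spectral clustering would typically be built from the learned C0 and the unique matrices; that step, like the specific regularizers above, is an assumption for illustration rather than the paper's stated procedure.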