Motor imagery electroencephalography (MI-EEG) signals play a significant role in non-invasive brain-computer interfaces (BCIs) and are widely used in clinical rehabilitation training. As a subjective paradigm, MI-EEG has high sample collection costs, large individual differences, complex time variability, and a low signal-to-noise ratio, so constructing cross-subject MI-EEG decoding models has become a critical research focus. However, most recent cross-subject decoding models adopt a single-stage adversarial learning strategy and learn deep representations only by minimizing marginal and conditional distribution differences, which seriously constrains MI-EEG decoding performance. Therefore, a Multi-Stage Distribution Adaptation (MSDA) model is proposed for cross-subject MI-EEG decoding. First, sample covariance is employed to align the marginal distributions of samples across subjects. Second, marginal distribution-invariant deep representations are obtained through a pre-trained feature extractor and a domain discriminator. Finally, a joint distribution-invariant mapping of the deep representations is constructed using the L2 distance, and this mapping and the classifiers are trained alternately to learn joint distribution-invariant deep representations, which are then used for cross-subject MI-EEG decoding. In the MSDA model, distribution adaptation between subjects is conducted in three stages, covering the marginal distribution of the samples, the marginal distribution of the deep representations, and the joint distribution of the deep representations, thereby effectively addressing the limitations of single-stage distribution adaptation. Experimental results on the public BCI Competition IV-2a and BCI Competition IV-2b datasets demonstrate that the MSDA model surpasses the latest decoding models in both accuracy and Kappa coefficient.
These results indicate that the MSDA model enhances the ability to learn cross-subject domain-invariant deep representations, which offers a new option for building MI-BCI systems.
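
As an illustration of the first stage only, the following sketch whitens each subject's trials by the inverse square root of that subject's mean sample covariance, in the style of Euclidean alignment. This is an assumption about how covariance-based alignment of the sample marginal distribution might be realized, not the MSDA implementation itself; the function names `mean_covariance` and `align_subject`, the `eps` floor, and the array layout (trials x channels x samples) are hypothetical.

```python
import numpy as np


def mean_covariance(trials):
    """Average sample covariance over trials.

    trials: array of shape (n_trials, n_channels, n_samples).
    """
    n_samples = trials.shape[2]
    # Per-trial covariance X X^T / T, then the mean over trials.
    covs = np.einsum("tcs,tds->tcd", trials, trials) / n_samples
    return covs.mean(axis=0)


def align_subject(trials, eps=1e-10):
    """Whiten one subject's trials by the inverse square root of the
    subject's mean covariance (assumed Euclidean-alignment-style step)."""
    ref = mean_covariance(trials)
    # The reference matrix is symmetric, so eigh gives real eigenpairs.
    eigvals, eigvecs = np.linalg.eigh(ref)
    eigvals = np.maximum(eigvals, eps)  # guard against tiny or negative values
    ref_inv_sqrt = (eigvecs / np.sqrt(eigvals)) @ eigvecs.T
    # Apply the same spatial transform to every trial.
    return np.einsum("cd,tds->tcs", ref_inv_sqrt, trials)


# Synthetic example: two "subjects" with different channel scaling.
rng = np.random.default_rng(0)
subject_a = rng.standard_normal((20, 4, 50))
subject_b = 5.0 * rng.standard_normal((20, 4, 50))

aligned_a = align_subject(subject_a)
aligned_b = align_subject(subject_b)
```

After this step the mean covariance of each subject's aligned trials is approximately the identity matrix, so the two subjects share a common second-order reference before any deep representation is learned.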