To improve the training stability of temporal convolutional networks (TCNs) under varying batch sizes, and to address the low prediction accuracy that arises when quality prediction models for batch processes fail to capture long-term dependencies and global correlations, a batch process quality prediction method (BMTCN-MHSA) was proposed. It combines a residual-structure TCN enhanced with Batch Group Normalization (BGN) and the Mish activation function (BMTCN) with a multi-head self-attention mechanism (MHSA). First, the three-dimensional batch process data were unfolded into a two-dimensional matrix and normalized, and singular spectrum analysis (SSA) was applied to decompose and reconstruct the data. Next, BGN was integrated into the residual branch of the temporal convolution to reduce the model's sensitivity to changes in batch size, and the Mish activation function was introduced to improve the model's generalization ability. The multi-head self-attention mechanism was then used to relate and weight feature information from different positions in the sequence, extracting key features and their interdependencies and thereby better capturing the dynamic characteristics of the batch process. Finally, the model was validated on data from penicillin simulation experiments. The experimental results show that, compared with the TCN model, BMTCN-MHSA reduces the mean absolute error (MAE) by 56.86% and the mean squared error (MSE) by 48.80%, and achieves a coefficient of determination (R²) of 99.48%, indicating that BMTCN-MHSA improves the accuracy of quality prediction for batch processes.
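The abstract does not state which unfolding is used. The sketch below assumes the raw data are stored as an array of shape (I batches, K time steps, J variables) and shows the two common options, batch-wise unfolding to (I, K*J) and variable-wise unfolding to (I*K, J), followed by column-wise z-score normalization. All function names are hypothetical.

```python
import numpy as np

def batchwise_unfold(X):
    """Unfold (I, K, J) into (I, K*J).

    Each row is one batch; columns run through all J variables at the
    first time step, then all J variables at the second, and so on.
    """
    I, K, J = X.shape
    return X.reshape(I, K * J)

def variablewise_unfold(X):
    """Unfold (I, K, J) into (I*K, J): time samples of every batch stacked row-wise."""
    I, K, J = X.shape
    return X.reshape(I * K, J)

def zscore(M, eps=1e-12):
    """Column-wise z-score; constant columns are centred but not rescaled."""
    mu = M.mean(axis=0)
    sd = M.std(axis=0)
    sd_safe = np.where(sd < eps, 1.0, sd)
    return (M - mu) / sd_safe, mu, sd_safe

# Hypothetical example: 3 batches, 4 time steps, 2 process variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4, 2))
B = batchwise_unfold(X)
V = variablewise_unfold(X)
Vn, mu, sd = zscore(V)
print(B.shape, V.shape)  # (3, 8) (12, 2)
```

The returned mean and scale would be reused to normalize test batches, so that test data are scaled with training statistics only.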
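A minimal sketch of SSA reconstruction for a single variable, assuming a window length `L` and a number of retained components `r` chosen by the user (the abstract specifies neither): the series is embedded into a trajectory (Hankel) matrix, decomposed by SVD, truncated to the `r` leading components, and mapped back to a series by diagonal averaging. The function name is hypothetical.

```python
import numpy as np

def ssa_reconstruct(x, L, r):
    """Reconstruct a 1-D series from its r leading SSA components."""
    x = np.asarray(x, dtype=float)
    N = x.size
    if not 1 < L < N:
        raise ValueError("window length L must satisfy 1 < L < N")
    K = N - L + 1
    if not 1 <= r <= min(L, K):
        raise ValueError("r must be between 1 and min(L, K)")
    # Trajectory matrix: column j is the lagged window x[j:j+L], so T[i, j] = x[i + j].
    T = np.column_stack([x[j:j + L] for j in range(K)])  # shape (L, K)
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    # Rank-r approximation built from the leading singular triples.
    Tr = (U[:, :r] * s[:r]) @ Vt[:r, :]
    # Diagonal averaging: every entry with i + j = t contributes to time index t.
    rec = np.zeros(N)
    counts = np.zeros(N)
    for i in range(L):
        for j in range(K):
            rec[i + j] += Tr[i, j]
            counts[i + j] += 1
    return rec / counts

t = np.arange(100)
x = np.sin(2 * np.pi * t / 20)
smooth = ssa_reconstruct(x, L=20, r=2)
```

A pure sinusoid has a rank-2 trajectory matrix, so two components reproduce it; on noisy process data, the discarded small components are the ones treated as noise.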
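The following sketch shows one possible residual unit with BGN and Mish on 1-D convolutional features of shape (N, C, L). It assumes the usual BGN formulation (channels and time steps merged, split into G groups, statistics computed per group over the batch and within-group axes), uses training-mode statistics only, and places BGN and Mish after a single dilated causal convolution. The exact position inside the BMTCN residual block, the kernel size, the dilation, and all function names are assumptions, not the paper's implementation.

```python
import numpy as np

def mish(x):
    """Mish(x) = x * tanh(softplus(x)), with softplus(x) = ln(1 + e^x)."""
    return x * np.tanh(np.logaddexp(0.0, x))

def batch_group_norm(x, num_groups, gamma, beta, eps=1e-5):
    """Training-mode BGN sketch for x of shape (N, C, L); running stats omitted."""
    N, C, L = x.shape
    if (C * L) % num_groups:
        raise ValueError("C*L must be divisible by num_groups")
    g = x.reshape(N, num_groups, -1)  # merge C and L, then split into groups
    mean = g.mean(axis=(0, 2), keepdims=True)  # statistics over batch and group
    var = g.var(axis=(0, 2), keepdims=True)
    y = ((g - mean) / np.sqrt(var + eps)).reshape(N, C, L)
    return gamma.reshape(1, C, 1) * y + beta.reshape(1, C, 1)  # per-channel affine

def causal_conv1d(x, w, dilation):
    """Dilated causal convolution: x (N, C_in, L), w (C_out, C_in, k)."""
    N, C_in, L = x.shape
    k = w.shape[2]
    pad = (k - 1) * dilation
    xp = np.pad(x, ((0, 0), (0, 0), (pad, 0)))  # left padding keeps it causal
    y = np.zeros((N, w.shape[0], L))
    for j in range(k):
        # Tap j reads x[t - (k - 1 - j) * dilation] for output time t.
        y += np.einsum("oc,ncl->nol", w[:, :, j], xp[:, :, j * dilation:j * dilation + L])
    return y

def bgn_mish_residual_unit(x, w, dilation, num_groups, gamma, beta):
    """Hypothetical unit: x + Mish(BGN(causal_conv(x))); needs C_out == C_in."""
    h = causal_conv1d(x, w, dilation)
    return x + mish(batch_group_norm(h, num_groups, gamma, beta))

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6, 10))  # batch 4, 6 channels, 10 time steps
w = rng.normal(scale=0.3, size=(6, 6, 3))
out = bgn_mish_residual_unit(x, w, 2, 3, np.ones(6), np.zeros(6))
```

Because BGN pools statistics over both the batch axis and a larger within-group axis, each estimate uses more samples than batch normalization alone, which is the intuition behind its lower sensitivity to batch size.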
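The multi-head self-attention step can be illustrated with standard scaled dot-product attention. This is a sketch assuming a single sequence `X` of shape (T, d_model) and randomly initialized projection matrices `Wq`, `Wk`, `Wv`, `Wo`; the head count, dimensions, and names are hypothetical, and the paper's exact configuration is not given in the abstract.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """X: (T, d_model). Returns (output (T, d_model), weights (heads, T, T))."""
    T, d_model = X.shape
    if d_model % num_heads:
        raise ValueError("d_model must be divisible by num_heads")
    d_k = d_model // num_heads

    def split(M):  # (T, d_model) -> (heads, T, d_k)
        return M.reshape(T, num_heads, d_k).transpose(1, 0, 2)

    Q, K, V = split(X @ Wq), split(X @ Wk), split(X @ Wv)
    # Every position attends to every other position in the sequence.
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)  # (heads, T, T)
    A = softmax(scores, axis=-1)
    heads = A @ V  # weighted sum of values, (heads, T, d_k)
    concat = heads.transpose(1, 0, 2).reshape(T, d_model)
    return concat @ Wo, A

rng = np.random.default_rng(1)
T, d_model, h = 5, 8, 2
X = rng.normal(size=(T, d_model))
Wq, Wk, Wv, Wo = (rng.normal(scale=d_model ** -0.5, size=(d_model, d_model)) for _ in range(4))
out, A = multi_head_self_attention(X, Wq, Wk, Wv, Wo, h)
```

Each row of `A[head]` is a probability distribution over sequence positions, which is how the mechanism weights features from different time steps.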
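The reported metrics follow their standard definitions. The sketch below computes MAE, MSE, and R², plus a relative reduction in percent; it assumes the reported reductions are computed as (baseline − new) / baseline × 100, which the abstract does not state explicitly. Function names are hypothetical.

```python
def mae(y, yhat):
    """Mean absolute error."""
    if len(y) != len(yhat) or not y:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def mse(y, yhat):
    """Mean squared error."""
    if len(y) != len(yhat) or not y:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)

def r2(y, yhat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def relative_reduction(baseline, new):
    """Percentage by which `new` is lower than `baseline`."""
    return (baseline - new) / baseline * 100.0

print(r2([1, 2, 3], [1, 2, 4]))  # 0.5
```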