Existing deep learning methods for high-precision imperfect grain recognition still face two challenges. First, the key discriminative features of imperfect grains are often distributed across image regions of varying scales and random positions, which makes these regions difficult to perceive stably and comprehensively. Second, the fine-grained discriminative features of different imperfect grain categories are represented in diverse ways, so a single unified modeling path struggles to optimize recognition performance for all categories simultaneously. To address these issues, a globally guided two-stage local feature learning framework was proposed based on TransNeXt. Deep representations of key discriminative regions were extracted under holistic perception and then refined through fine-grained modeling. Independently optimized network branches were designed for different categories, with all branches sharing the backbone to enable efficient adaptation and lightweight scalability. To support the method, an imperfect grain dataset was constructed that covers multiple grain varieties and provides standardized annotations of categories and discriminative region locations. Experimental results show that the proposed method achieves an accuracy of 99.62% on the test set, verifying the framework's effectiveness and scalability in complex fine-grained image recognition tasks.