Abstract: Document images captured by a camera are often distorted, which can cause recognition errors in Optical Character Recognition (OCR) software. In this paper, connected component labeling is used to detect words and text lines, and linear fitting on the midpoints of the words is then used to obtain the word baselines. Finally, each word is rotated and vertically displaced according to its baseline and its vertical displacement distance to produce the corrected image. Unlike the traditional method, the computation of the word baselines and of the vertical displacement distances in this paper is independent of the document content, which preserves the precision of the word slopes and aligns all words of a text line on the same line. The computational complexity of the algorithm is discussed at the end of the paper, and the method is compared experimentally with the traditional method. The experimental results show that the proposed method is efficient and robust.
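As a rough illustration of the baseline-fitting step, the following minimal sketch fits a least-squares line through the word midpoints of one text line and derives a rotation angle and a vertical shift for each word. It is not the paper's implementation: the function names `fit_line` and `word_corrections`, the use of a single fitted line per text line, the target height `target_y`, and the sign convention of the angle are all assumptions made for illustration.

```python
import math


def fit_line(points):
    """Least-squares fit of y = slope * x + intercept through (x, y) points."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all points share the same x coordinate")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def word_corrections(midpoints, target_y):
    """Return one (angle_degrees, vertical_shift) pair per word midpoint.

    Assumption: every word in the text line shares the slope of the fitted
    line, and each word is shifted so that the fitted line lands on target_y.
    The sign of the angle depends on the image coordinate system in use.
    """
    slope, intercept = fit_line(midpoints)
    angle = -math.degrees(math.atan(slope))
    return [(angle, target_y - (slope * x + intercept)) for x, _ in midpoints]


# Hypothetical midpoints of three words on a slanted text line.
midpoints = [(0.0, 10.0), (10.0, 12.0), (20.0, 14.0)]
corrections = word_corrections(midpoints, target_y=10.0)
```

In this sketch the words further along the slanted line receive larger vertical shifts, while the rotation angle is the same for every word because only one line is fitted.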