To deal with the shortage of labeled pronunciation data in mispronunciation detection, other types of data were used to improve feature discriminability within the framework of the Tandem system. Taking Chinese learners of English as the research object, unlabeled data, native Mandarin data and native English data, all of which are relatively easy to obtain, were selected as the auxiliary data. The experiments show that all of these data types can effectively improve system performance, with the unlabeled data performing best. The effects on system performance of different frame-context lengths, of shallow versus deep networks, represented by the Multi-Layer Perceptron (MLP) and the Deep Neural Network (DNN), and of different Tandem feature structures were also discussed. Finally, a strategy of merging multiple data streams was used to further improve system performance, and the best result was achieved by combining the DNN-based unlabeled data stream with the native English stream. Compared with the baseline system, the recognition accuracy is increased by 7.96% and the diagnostic accuracy of mispronunciation type is increased by 14.71%.
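As a rough illustration of the Tandem idea described above, the sketch below shows one common way such features are built: phone posteriors from an MLP or DNN trained on auxiliary data are log-compressed, decorrelated, and concatenated with the original acoustic features. All function and variable names are hypothetical and the processing choices (log compression, PCA dimension) are assumptions, not the paper's exact pipeline.

```python
# Minimal sketch of a Tandem feature, assuming (T, D) acoustic features and
# (T, K) phone posteriors from an MLP/DNN trained on auxiliary data.
import numpy as np

def tandem_features(acoustic_feats, posteriors, pca_dim=13):
    """Concatenate acoustic features with PCA-reduced log posteriors."""
    log_post = np.log(posteriors + 1e-10)            # log compression
    centered = log_post - log_post.mean(axis=0)      # zero-mean before PCA
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    reduced = centered @ vt[:pca_dim].T              # decorrelate and reduce
    return np.hstack([acoustic_feats, reduced])      # Tandem feature vector

# Example with random stand-ins for real MFCCs and posteriors
T, D, K = 100, 39, 40
feats = tandem_features(np.random.rand(T, D),
                        np.random.dirichlet(np.ones(K), size=T))
print(feats.shape)   # (100, 52)
```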
The single-queue job scheduling algorithm in a homogeneous Hadoop cluster makes short jobs wait and leads to low resource utilization; multi-queue scheduling algorithms address unfairness and low execution efficiency, but most of them require manual parameter setting, compete with each other for resources and are more complex. To resolve these problems, a three-queue scheduling algorithm was proposed. The algorithm uses job classification, dynamic priority adjustment, a shared resource pool and job preemption to achieve fairness, simplify the scheduling flow of normal jobs and improve concurrency. Comparison experiments with the First In First Out (FIFO) algorithm were conducted under three situations: a high percentage of short jobs, similar percentages of all job types, and mostly normal jobs with occasional long and short jobs. The proposed algorithm reduced the running time of jobs. The experimental results show that the gain in execution efficiency is not obvious when short jobs dominate; however, when the job types are balanced, the improvement is remarkable. This is consistent with the design principles of the algorithm: prioritizing short jobs, simplifying the scheduling flow of normal jobs and taking long jobs into account, which together improve scheduling performance.
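The sketch below illustrates only two of the ingredients named in the abstract, job classification into three queues and dynamic priority adjustment by waiting time; it does not model the shared resource pool or preemption. The classification thresholds, field names and aging factor are assumptions for illustration, not the paper's actual parameters.

```python
# Sketch of a three-queue scheduler: classify jobs by estimated size,
# serve short jobs first, and age waiting jobs so long jobs are not starved.
import heapq
import time
from dataclasses import dataclass, field

SHORT_MAP_TASKS, LONG_MAP_TASKS = 10, 100   # assumed classification thresholds

@dataclass(order=True)
class Job:
    priority: float
    submit_time: float = field(compare=False)
    name: str = field(compare=False)
    map_tasks: int = field(compare=False)

class ThreeQueueScheduler:
    def __init__(self):
        self.queues = {"short": [], "normal": [], "long": []}

    def classify(self, job):
        if job.map_tasks <= SHORT_MAP_TASKS:
            return "short"
        return "long" if job.map_tasks >= LONG_MAP_TASKS else "normal"

    def submit(self, job):
        heapq.heappush(self.queues[self.classify(job)], job)

    def next_job(self, aging=0.01):
        # Dynamic priority adjustment: the longer a job waits,
        # the lower (better) its priority value becomes.
        now = time.time()
        for q in self.queues.values():
            for job in q:
                job.priority -= aging * (now - job.submit_time)
            heapq.heapify(q)
        # Short jobs first, then normal, then long.
        for name in ("short", "normal", "long"):
            if self.queues[name]:
                return heapq.heappop(self.queues[name])
        return None
```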
Matching points cannot be judged reliably by residuals that rely only on epipolar geometry, which affects the selection of the optimal inlier set. Therefore, a novel fundamental matrix calculation algorithm based on a three-view constraint was proposed. Firstly, the initial fundamental matrices were estimated by the traditional RANdom SAmple Consensus (RANSAC) method. Then the matching points that appear in every view were selected, and during fundamental matrix estimation the epipolar lines of the points outside the common view were calculated. The distances between each point in the common view and the intersection of its matching points' epipolar lines were computed, and by judging these distances a new optimal inlier set was obtained. Finally, the M-Estimators (ME) algorithm was used to calculate the fundamental matrices on the new optimal inlier set. Extensive experiments with mismatches and noise indicate that the algorithm can effectively reduce their influence on the accurate calculation of fundamental matrices. It achieves better accuracy than traditional robust algorithms, limiting the point-to-epipolar-line distance to about 0.3 pixels, and also improves stability. Therefore, it can be widely applied in fields such as image-sequence-based 3D reconstruction and photogrammetry.
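A hedged sketch of the three-view check described above follows: the epipolar lines that a point's matches in the other two views induce in the common view should intersect near that point, and points whose distance to the intersection is small form the new inlier set before robust re-estimation. The OpenCV calls are standard; the distance threshold, the choice of view 1 as the common view, and the use of LMedS as a stand-in for the paper's M-estimator step are assumptions. Inputs are assumed to be Nx2 float arrays of matched points.

```python
# Sketch of selecting inliers with a three-view constraint before
# re-estimating the fundamental matrix on the cleaned matches.
import cv2
import numpy as np

def line_intersection(l1, l2):
    """Intersection of two homogeneous lines (a, b, c), dehomogenized."""
    p = np.cross(l1, l2)
    return p[:2] / p[2]          # assumes the lines are not parallel

def three_view_inliers(pts1, pts2, pts3, threshold=1.0):
    # Initial fundamental matrices from pairwise RANSAC (view 1 is the common view);
    # F21 @ [p2, 1] is the epipolar line of p2 in view 1, likewise for F31.
    F21, _ = cv2.findFundamentalMat(pts2, pts1, cv2.FM_RANSAC)
    F31, _ = cv2.findFundamentalMat(pts3, pts1, cv2.FM_RANSAC)
    keep = []
    for i, (p1, p2, p3) in enumerate(zip(pts1, pts2, pts3)):
        l2 = F21 @ np.append(p2, 1.0)     # epipolar line of p2 in view 1
        l3 = F31 @ np.append(p3, 1.0)     # epipolar line of p3 in view 1
        x = line_intersection(l2, l3)
        if np.linalg.norm(x - p1) < threshold:
            keep.append(i)                # point lies close to the intersection
    keep = np.array(keep)
    # Robust re-estimation on the new inlier set (LMedS used here as a
    # stand-in for the M-estimator step described in the abstract).
    F, _ = cv2.findFundamentalMat(pts2[keep], pts1[keep], cv2.FM_LMEDS)
    return F, keep
```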