Novel speaker identification framework based on narrative unit and reliable label
Tianyu LIU, Ye TAO, Chaofeng LU, Jiawang LIU
Journal of Computer Applications    2025, 45 (4): 1190-1198.   DOI: 10.11772/j.issn.1001-9081.2024030331

Speaker Identification (SI) in novels aims to determine the speaker of a quotation from its context, which helps assign appropriate voices to different characters when producing audiobooks. However, existing methods mainly use fixed window sizes when selecting a quotation's context, which is inflexible and may produce redundant segments, making it difficult for the model to capture useful information. Besides, owing to the large differences in the number of quotations and the writing styles across novels, a small number of labeled samples cannot make the model generalize fully, and labeling datasets is expensive. To solve these problems, a novel speaker identification framework integrating narrative units and reliable labels was proposed. Firstly, a Narrative Unit-based Context Selection (NUCS) method was used to select a context of suitable length, letting the model focus on the segment closest to the quotation's attribution. Secondly, a Speaker Scoring Network (SSN) was constructed with the generated context as input. In addition, self-training was introduced, and a Reliable Pseudo Label Selection (RPLS) algorithm was designed to compensate for the lack of labeled samples and to screen out higher-quality, more reliable pseudo-labeled samples. Finally, a Chinese Novel Speaker Identification corpus (CNSI) containing 11 Chinese novels was built and labeled. To evaluate the proposed framework, experiments were conducted on two public datasets and the self-built dataset. The results show that the proposed framework outperforms methods such as CSN (Candidate Scoring Network), E2E_SI, and ChatGPT-3.5.
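The confidence-based screening step of self-training can be illustrated with a minimal sketch (the threshold and the max-probability criterion are illustrative assumptions; the paper's RPLS algorithm may combine further quality measures):

```python
def select_reliable_pseudo_labels(speaker_probs, threshold=0.9):
    """Keep unlabeled quotations whose predicted speaker distribution is
    confident enough; return (sample_index, speaker_index) pairs.

    speaker_probs: one probability list over candidate speakers per
    quotation, e.g. as produced by a scoring network such as SSN.
    """
    selected = []
    for i, probs in enumerate(speaker_probs):
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:       # confident prediction only
            selected.append((i, best))     # adopt as a pseudo-label
    return selected
```

In a self-training loop, the pairs returned here would be added to the labeled pool before retraining the scoring network.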

Speaker-emotion voice conversion method with limited corpus based on large language model and pre-trained model
Chaofeng LU, Ye TAO, Lianqing WEN, Fei MENG, Xiugong QIN, Yongjie DU, Yunlong TIAN
Journal of Computer Applications    2025, 45 (3): 815-822.   DOI: 10.11772/j.issn.1001-9081.2024010013

Aiming at the problems that speaker conversion and emotional voice conversion have rarely been studied jointly, and that the emotional corpora of a target speaker in real scenes are usually too small to train a well-generalizing model from scratch, a Speaker-Emotion Voice Conversion with Limited corpus (LSEVC) method was proposed that fuses a large language model with a pre-trained emotional speech synthesis model. Firstly, the large language model was used to generate text with the required emotion tags. Secondly, the pre-trained emotional speech synthesis model was fine-tuned on the target speaker's corpus to embed the target speaker. Thirdly, emotional speech was synthesized from the generated text for data augmentation. Fourthly, the synthesized speech and the original source and target speech were used to co-train the speaker-emotion voice conversion model. Finally, to further enhance the speaker similarity and emotional similarity of the converted speech, the model was fine-tuned with the target speaker's emotional source speech. Experiments were conducted on publicly available corpora and a Chinese fiction corpus. Experimental results show that the proposed method outperforms CycleGAN-EVC, Seq2Seq-EVC-WA2, SMAL-ET2 and other methods on the evaluation metrics Emotional similarity Mean Opinion Score (EMOS), Speaker similarity Mean Opinion Score (SMOS), Mel Cepstral Distortion (MCD), and Word Error Rate (WER).
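The augmentation stage (steps one to three) can be outlined as a data-flow sketch; every model call below is a hypothetical stub standing in for the actual LLM and fine-tuned TTS model, which the abstract does not name:

```python
def generate_text(emotion):
    """Stub for the LLM: return text carrying the requested emotion tag."""
    return f"<{emotion}> sample sentence"

def synthesize(text, speaker):
    """Stub for the emotional TTS model fine-tuned on the target speaker."""
    return {"speaker": speaker, "text": text}

def augment_corpus(emotions, speaker, n_per_emotion=2):
    """Generate emotional text with the LLM, then synthesize it in the
    target speaker's voice to enlarge the limited training corpus."""
    corpus = []
    for emotion in emotions:
        for _ in range(n_per_emotion):
            corpus.append(synthesize(generate_text(emotion), speaker))
    return corpus
```

The enlarged corpus would then be mixed with the real recordings to co-train the conversion model, as the abstract describes.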

Underwater image enhancement algorithm based on artificial under-exposure fusion and white-balancing technique
Ye TAO, Wenhai XU, Luqiang XU, Fucheng GUO, Haibo PU, Guangtong CHEN
Journal of Computer Applications    2021, 41 (12): 3672-3679.   DOI: 10.11772/j.issn.1001-9081.2021010065

Acquiring clear and accurate underwater images is an important prerequisite for exploring the underwater world. However, compared with regular images, underwater images suffer from low contrast, loss of detail and color distortion, resulting in poor visual quality. To solve these problems, a new underwater image enhancement algorithm based on Artificial Under-exposure Fusion and White-Balancing technique (AUF+WB) was proposed. Firstly, Gamma correction was applied to the original underwater image to generate 5 corresponding under-exposed images. Then, contrast, saturation and well-exposedness were employed as fusion weights, and a multi-scale fusion method was used to generate the fused image. Finally, the images compensated on the different color channels were each combined with the Gray-World white balance assumption to generate the corresponding white-balanced images, and these images were evaluated with the Underwater Color Image Quality Evaluation (UCIQE) metric and the Underwater Image Quality Measure (UIQM). Selecting different types of underwater images as experimental samples, the proposed AUF+WB algorithm was compared with existing state-of-the-art underwater image defogging algorithms. The results show that AUF+WB outperforms the comparison algorithms in both qualitative and quantitative analyses of image quality: it effectively improves the visual quality of underwater images by removing color distortion, enhancing contrast and recovering details.
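The under-exposure generation and per-pixel weighting can be sketched with NumPy (the gamma values and the Gaussian width for well-exposedness are illustrative assumptions, and this single-scale average omits the multi-scale fusion the paper uses):

```python
import numpy as np

def underexposed_sequence(img, gammas=(1.5, 2.0, 2.5, 3.0, 3.5)):
    """Gamma correction with gamma > 1 darkens an image in [0, 1],
    yielding 5 artificially under-exposed versions."""
    return [np.power(img, g) for g in gammas]

def fusion_weight(img):
    """Per-pixel weight from contrast, saturation and well-exposedness
    (Mertens-style exposure-fusion quality measures)."""
    gray = img.mean(axis=2)
    # contrast: absolute Laplacian response of the grayscale image
    contrast = np.abs(np.gradient(np.gradient(gray, axis=0), axis=0)
                      + np.gradient(np.gradient(gray, axis=1), axis=1))
    saturation = img.std(axis=2)                         # spread across RGB
    well_exposed = np.exp(-((img - 0.5) ** 2) / (2 * 0.2 ** 2)).prod(axis=2)
    return contrast * saturation * well_exposed + 1e-12  # avoid zero weights

def fuse(images):
    """Single-scale weighted average of the exposure sequence."""
    weights = np.stack([fusion_weight(im) for im in images])
    weights /= weights.sum(axis=0, keepdims=True)        # normalize per pixel
    return (weights[..., None] * np.stack(images)).sum(axis=0)
```

A pyramid-based (multi-scale) blend of the same weights avoids the halo artifacts a single-scale average can produce.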

New security analysis of several kinds of high-level cryptographic S-boxes
ZHAO Ying, YE Tao, WEI Yongzhuang
Journal of Computer Applications    2017, 37 (9): 2572-2575.   DOI: 10.11772/j.issn.1001-9081.2017.09.2572
Focusing on whether several kinds of high-level cryptographic S-boxes have new security flaws, an algorithm for solving the nonlinear invariant function of an S-box was proposed, based mainly on the algebraic relationship between the input and output of the S-box. Using the proposed algorithm, several kinds of S-boxes were tested, and it was found that several of them share the same nonlinear invariant function. In addition, if these S-boxes were used in the non-linear parts of the block cipher Midori-64, a new variant algorithm would be obtained. Its security was analyzed by the nonlinear invariant attack. The analytical results show that this Midori-64 variant faces a serious security vulnerability: there exist 2^64 weak keys under the nonlinear invariant attack, while the data, time and storage complexities are negligible; consequently, some high-level cryptographic S-boxes have security flaws.
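The object the algorithm searches for can be illustrated by brute force on a 4-bit S-box: a nonlinear invariant is a Boolean function g whose value g(S(x)) XOR g(x) equals the same constant for every input x. The exhaustive enumeration below (a sketch over all 2^16 truth tables, not the paper's algebraic method) makes the definition concrete:

```python
def nonlinear_invariants(sbox):
    """Return the truth tables (as 16-bit integers) of every Boolean
    function g on 4 bits with g(sbox[x]) ^ g(x) constant over all x."""
    n = len(sbox)  # 16 entries for a 4-bit S-box
    found = []
    for tt in range(1 << n):                       # all 2^16 candidate g
        g = [(tt >> x) & 1 for x in range(n)]      # truth table of g
        diffs = {g[sbox[x]] ^ g[x] for x in range(n)}
        if len(diffs) == 1:                        # constant difference
            found.append(tt)
    return found
```

The two constant functions are invariants of every permutation; a shared non-trivial (and non-linear) invariant across the S-box layer is exactly the structural property the nonlinear invariant attack exploits to obtain the 2^64 weak keys on the Midori-64 variant.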