In recent years, the rapid development of the Internet of Things (IoT) has spurred the emergence of the Internet of Behavior (IoB), which leverages IoT-derived data and information to achieve higher levels of knowledge and wisdom, and is rapidly evolving into a promising technology with broad application potential. IoB involves the extensive collection, processing, and utilization of user behavioral data, thereby exposing user data security and privacy to significant risks. Therefore, it is vital to protect IoB user data through effective data management and control. After introducing the fundamental concepts and characteristics of IoB, its development trends and the security and privacy risks associated with user data were analyzed. Furthermore, the current state of behavioral data management and control was elaborated, the main problems and challenges existing in IoB were discussed, and potential research directions for achieving user data management and control in IoB were proposed.
Aiming at the safety and controllability problems caused by biases in the output of Large Language Models (LLMs), the research status, techniques, and limitations related to biases in existing LLMs were reviewed in depth and analyzed from three aspects: bias identification, evaluation, and mitigation. Firstly, three key techniques of LLMs were summarized to study the root causes of the inevitable intrinsic biases of LLMs. Secondly, biases in LLMs were categorized into three types: linguistic bias, demographic bias, and evaluation bias, and the characteristics and causes of these biases were explored. Thirdly, a systematic review of existing LLM bias evaluation benchmarks was carried out, and the strengths and weaknesses of general-purpose, language-specific, and task-specific benchmarks were discussed. Finally, current LLM bias mitigation techniques were analyzed in depth from both model bias mitigation and data bias mitigation perspectives, and directions for their future refinement were pointed out. In addition, through this analysis, future research directions for biases in LLMs were indicated: multi-cultural attribute evaluation of bias, lightweight bias mitigation techniques, and enhancement of the interpretability of biases.
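Many of the bias evaluation benchmarks surveyed above score a model on minimally differing sentence pairs. As a minimal illustrative sketch (not taken from any specific surveyed benchmark), a CrowS-Pairs-style metric counts how often a model assigns a higher likelihood to the stereotypical sentence of each pair; 0.5 indicates no measured preference. The toy scores below stand in for real model log-likelihoods.

```python
def bias_score(pair_scores):
    """pair_scores: list of (stereo_score, antistereo_score) tuples,
    where a higher score means the sentence is more probable under the model.
    Returns the fraction of pairs where the stereotypical sentence wins."""
    if not pair_scores:
        raise ValueError("no pairs to evaluate")
    preferred = sum(1 for s, a in pair_scores if s > a)
    return preferred / len(pair_scores)

# Toy log-likelihood pairs standing in for real model outputs.
toy = [(-10.2, -11.5), (-9.8, -9.1), (-12.0, -12.4), (-8.5, -8.9)]
print(bias_score(toy))  # 0.75: the model prefers 3 of the 4 stereotypical sentences
```

In a real benchmark run, the scores would come from a language model's (pseudo) log-likelihoods over the paired sentences; everything else about the metric stays the same.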
Current deep multi-view clustering methods have the following shortcomings: 1) when extracting features from a single view, only the attribute information or the structural information of the samples is considered, and the two types of information are not integrated, so the extracted features cannot fully represent the latent structure of the original data; 2) feature extraction and clustering are treated as two separate processes, without establishing a relationship between them, so the feature extraction process cannot be optimized by the clustering process. To solve these problems, a Deep Fusion based Multi-view Clustering Network (DFMCN) was proposed. Firstly, the embedding space of each view was obtained by combining an autoencoder and a graph convolutional autoencoder to fuse the attribute information and structural information of the samples. Then, the embedding space of the fused view was obtained through weighted fusion, and clustering was carried out in this space. During clustering, the feature extraction process was optimized by a two-layer self-supervision mechanism. Experimental results on the FM (Fashion-MNIST), HW (HandWritten numerals), and YTF (YouTube Face) datasets show that the accuracy of DFMCN is higher than that of all comparison methods: on the FM dataset, DFMCN improves accuracy by 1.80 percentage points over the suboptimal CMSC-DCCA (Cross-Modal Subspace Clustering via Deep Canonical Correlation Analysis) method, and its Normalized Mutual Information (NMI) is 1.26 to 14.84 percentage points higher than that of all methods except CMSC-DCCA and DMSC (Deep Multimodal Subspace Clustering networks). These results verify the effectiveness of the proposed method.
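The weighted-fusion-then-cluster pipeline described above can be sketched as follows. This is a simplified stand-in, not the DFMCN implementation: the per-view embeddings are random placeholders for the outputs of the autoencoder and graph convolutional autoencoder branches, the view weights and projections are fixed rather than learned, and plain k-means replaces the two-layer self-supervised clustering head.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder per-view embeddings (2 views, 100 samples, dims 8 and 12),
# standing in for the fused autoencoder / graph autoencoder outputs per view.
view_embeddings = [rng.normal(size=(100, 8)), rng.normal(size=(100, 12))]
weights = np.array([0.6, 0.4])  # view weights; learned in DFMCN, fixed here

# Weighted fusion: project each view to a common dimension, then combine.
common_dim = 8
fused = np.zeros((100, common_dim))
for w, z in zip(weights, view_embeddings):
    proj = rng.normal(size=(z.shape[1], common_dim)) / np.sqrt(z.shape[1])
    fused += w * (z @ proj)

def kmeans(x, k, iters=20, seed=1):
    """Plain k-means in the fused space (stand-in for the self-supervised
    clustering stage)."""
    r = np.random.default_rng(seed)
    centers = x[r.choice(len(x), k, replace=False)].copy()
    for _ in range(iters):
        labels = np.argmin(((x[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(0)
    return labels

labels = kmeans(fused, k=3)
print(labels.shape)  # one cluster assignment per sample: (100,)
```

In DFMCN itself, the clustering assignments would feed gradients back into the fusion and per-view encoders via the self-supervision mechanism; the sketch only shows the forward fuse-and-cluster path.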
In the field of Natural Language Processing (NLP), contrastive learning, as an efficient method for sentence representation learning, effectively mitigates the anisotropy of Transformer-based pre-trained language models and significantly enhances the quality of sentence representations. However, existing research focuses on English, especially under supervised settings. Due to the lack of labeled data, it is difficult to use contrastive learning effectively to obtain high-quality sentence representations in most non-English languages. To address this issue, a cross-lingual knowledge transfer method for contrastive learning models was proposed, which transfers knowledge across languages by aligning the structures of different language representation spaces. On this basis, a simple and effective cross-lingual knowledge transfer framework, TransCSE, was developed to transfer the knowledge of supervised English contrastive learning models to non-English models. Through knowledge transfer experiments from English to six directions, including French, Arabic, Spanish, Turkish, and Chinese, knowledge was transferred successfully by TransCSE from the supervised contrastive learning model SimCSE (Simple Contrastive learning of Sentence Embeddings) to the multilingual pre-trained language model mBERT (Multilingual Bidirectional Encoder Representations from Transformers). Experimental results show that the model trained with the TransCSE framework achieves accuracy improvements of 17.95 and 43.27 percentage points on the XNLI (Cross-lingual Natural Language Inference) and STS (Semantic Textual Similarity) 2017 benchmark datasets, respectively, compared to the original mBERT, proving the effectiveness of TransCSE. Moreover, compared to cross-lingual knowledge transfer methods based on shared parameters and representation alignment, TransCSE achieves the best performance on both the XNLI and STS 2017 benchmark datasets.
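One classical way to realize "aligning the structures of different language representation spaces" (not necessarily the mechanism used inside TransCSE) is orthogonal Procrustes alignment: given paired sentence embeddings X in the source language and Y in the target language, find the orthogonal matrix W minimizing ||XW - Y||_F via an SVD of X^T Y. The sketch below uses synthetic embeddings where the target space is an exact rotation of the source space, so the mapping is recovered exactly.

```python
import numpy as np

def procrustes_align(X, Y):
    """Orthogonal W minimizing ||X W - Y||_F, via SVD of X^T Y."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(42)
W_true = np.linalg.qr(rng.normal(size=(16, 16)))[0]  # hidden orthogonal map
X = rng.normal(size=(200, 16))   # "source language" sentence embeddings
Y = X @ W_true                   # "target language" space: rotated copy

W = procrustes_align(X, Y)
err = np.abs(X @ W - Y).max()
print(err < 1e-8)  # True: the recovered mapping reproduces the rotation
```

Real cross-lingual spaces are only approximately isometric, so in practice such an alignment is a starting point; frameworks like TransCSE additionally distill knowledge from the supervised source model into the multilingual encoder.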