Time series similarity measure based on Siamese neural network

doi:10.11772/j.issn.1001-9081.2018081837

Abstract

In data mining tasks such as time series classification, class-based similarity behaves markedly differently from one dataset to another, so a reasonable and effective similarity measure is crucial. Traditional measures such as Euclidean Distance (ED), cosine distance and Dynamic Time Warping (DTW) compute similarity with a fixed formula applied to the data alone, and ignore the influence that the label knowledge contained in each dataset has on the similarity measure. To solve this problem, a method for learning a time series similarity measure based on a Siamese Neural Network (SNN) is proposed. The method learns the neighborhood relationships among the data from the supervision information in the sample labels and establishes an efficient distance measure between time series. Similarity measurement and confirmatory classification experiments were performed on time series datasets from the UCR archive. Experimental results show that, compared with ED-based and DTW-based one-Nearest-Neighbor (1NN) classification, SNN significantly improves overall classification quality. Although DTW-based 1NN classification outperforms SNN-based 1NN classification on some datasets, SNN has lower complexity and higher speed than DTW in the similarity computation performed during classification. The results show that the proposed method can significantly improve the efficiency of similarity measurement on classification datasets and performs well on high-dimensional, complex time series data.
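To make the comparison in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It shows the two fixed-formula baselines (ED and unconstrained DTW), a 1NN classifier that accepts any pairwise distance, and how a Siamese-style distance would plug in. All function names are hypothetical. The `embed` function stands in for a trained network and is not learned here, and the contrastive loss is an assumed training objective that the abstract does not specify.

```python
import numpy as np

def euclidean(a, b):
    """Euclidean Distance (ED) between two equal-length series: O(n)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))

def dtw(a, b):
    """Unconstrained DTW with absolute-difference cost: O(n * m)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            cost[i, j] = step + min(cost[i - 1, j],
                                    cost[i, j - 1],
                                    cost[i - 1, j - 1])
    return float(cost[n, m])

def siamese_distance(embed):
    """Turn one shared embedding function into a pairwise distance.

    Both inputs pass through the same `embed`, mirroring the shared
    weights of a Siamese network. `embed` is assumed to be trained elsewhere.
    """
    def distance(a, b):
        return euclidean(embed(a), embed(b))
    return distance

def contrastive_loss(d, same_class, margin=1.0):
    """Contrastive loss for one pair (assumed objective, not from the paper).

    Same-class pairs are pulled together; different-class pairs are
    pushed apart until their distance reaches `margin`.
    """
    if same_class:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2

def predict_1nn(train_x, train_y, query, distance):
    """Label the query with the class of its nearest training series."""
    dists = [distance(x, query) for x in train_x]
    return train_y[int(np.argmin(dists))]

# Toy data: the query is the first training series shifted by one step.
train_x = [[0, 1, 2, 1, 0, 0], [2, 1, 0, 0, 1, 2]]
train_y = ["peak", "valley"]
query = [0, 0, 1, 2, 1, 0]

# Placeholder embedding (hand-written, NOT learned): max and mean of the series.
toy_embed = lambda s: np.array([np.max(s), np.mean(s)])

for name, dist in [("ED", euclidean), ("DTW", dtw),
                   ("SNN-style", siamese_distance(toy_embed))]:
    print(name, dist(train_x[0], query), predict_1nn(train_x, train_y, query, dist))
```

In this sketch, each DTW comparison fills an n-by-m table, whereas the Siamese-style distance costs one embedding pass per series plus a Euclidean distance in the embedding space. That difference is the kind of speed advantage during classification that the abstract attributes to SNN.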

Key words: time series, similarity measure, neural network, Siamese Neural Network (SNN)


CLC Number: TP391

JIANG Yifan, YE Qing. Time series similarity measure based on Siamese neural network[J]. Journal of Computer Applications, 2019, 39(4): 1041-1045.

姜逸凡, 叶青. 基于孪生神经网络的时间序列相似性度量[J]. 计算机应用, 2019, 39(4): 1041-1045.
