Journal of Computer Applications ›› 2015, Vol. 35 ›› Issue (11): 3092-3096. DOI: 10.11772/j.issn.1001-9081.2015.11.3092

• Papers of the 2015 National Annual Conference on Open Distributed and Parallel Computing (DPCS 2015) •

Incremental learning method for fault diagnosis in large-scale InfiniBand network

HU Yinhui, CHEN Lin

  1. College of Computer, National University of Defense Technology, Changsha, Hunan 410073, China
  • Received: 2015-06-17 Revised: 2015-07-28 Published: 2015-11-13
  • Corresponding author: HU Yinhui (born 1987), male, from Fenggang, Guizhou, M.S. candidate. Main research interest: network fault management.
  • About the author: CHEN Lin (born 1976), female, from Longhai, Fujian, associate professor, Ph.D. Main research interests: data center resource management, network fault management.
  • Supported by:
    National High Technology Research and Development Program (863 Program) of China (2012AA01A50606).

Abstract: To address the problems of effectively monitoring abnormal network events and locating performance bottlenecks and potential points of failure in large-scale data center networks, an incremental-learning fault diagnosis method for large-scale InfiniBand (IB) networks, IL_Bayes, is proposed. Building on an in-depth analysis of the characteristics of IB networks, the method introduces a feature selection strategy and an incremental learning strategy: it takes Bayesian classification as its basis and adds an incremental learning mechanism, which effectively improves fault classification accuracy. The diagnostic accuracy and misdiagnosis rate of the algorithm were verified in the real network environment of Tianhe-2, and the results show that IL_Bayes achieves relatively high fault classification accuracy and a relatively low misdiagnosis rate.

Key words: data center, InfiniBand, fault diagnosis, Bayes classification, incremental learning
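
The abstract gives no implementation details of IL_Bayes, so the following is only a generic sketch of how a Bayes classifier can learn incrementally, not the authors' algorithm. It assumes categorical features and Laplace smoothing; the class name `IncrementalNaiveBayes`, the feature values, and the fault labels are all hypothetical and chosen for illustration. The key idea is that the model stores only counts, so each newly labeled sample updates the classifier without retraining on earlier data.

```python
from collections import defaultdict
import math


class IncrementalNaiveBayes:
    """Categorical naive Bayes whose counts are updated one sample at a time.

    Illustrative only: this is not the IL_Bayes algorithm from the paper.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)
        self.total = 0  # number of samples seen so far
        self.class_counts = defaultdict(int)
        # feature_counts[label][feature_index][value] -> count
        self.feature_counts = defaultdict(
            lambda: defaultdict(lambda: defaultdict(int))
        )
        # distinct values observed for each feature index
        self.feature_values = defaultdict(set)

    def update(self, features, label):
        """Fold one labeled sample into the stored counts."""
        self.total += 1
        self.class_counts[label] += 1
        for i, value in enumerate(features):
            self.feature_counts[label][i][value] += 1
            self.feature_values[i].add(value)

    def log_posteriors(self, features):
        """Return the unnormalized log posterior of every known class."""
        scores = {}
        for label, n_label in self.class_counts.items():
            score = math.log(n_label / self.total)  # log prior
            for i, value in enumerate(features):
                count = self.feature_counts[label][i].get(value, 0)
                # count an unseen value as one extra category when smoothing
                n_values = len(self.feature_values[i] | {value})
                score += math.log(
                    (count + self.alpha) / (n_label + self.alpha * n_values)
                )
            scores[label] = score
        return scores

    def predict(self, features):
        """Return the class with the highest log posterior."""
        scores = self.log_posteriors(features)
        return max(scores, key=scores.get)


# Hypothetical samples: (error-counter level, port state) -> fault label
model = IncrementalNaiveBayes()
model.update(("high", "up"), "congestion")
model.update(("high", "up"), "congestion")
model.update(("low", "down"), "link_failure")
print(model.predict(("high", "up")))

# A later labeled sample is absorbed without revisiting the earlier ones.
model.update(("low", "down"), "link_failure")
print(model.predict(("low", "down")))
```

Keeping only counts makes each update cost proportional to the number of features, which is one reason count-based Bayes classifiers are a common base for incremental learning; how IL_Bayes selects features or decides which samples to learn from is not stated in the abstract.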

CLC Number: