Cloud Data Center Fault Detection Based on PyTorch and Neural Network

Fund Project: 2019 Headquarters Science and Technology Project of State Grid Corporation of China (5700-201917224A-0-0-00)
    Abstract:

    With the rapid development of modern technology, data centers have become the IT infrastructure of the information society, storing and managing large amounts of critical data. At present, data center management relies mostly on experienced operation and maintenance personnel, who use computers to automatically monitor equipment-room indicators and then inspect the equipment repeatedly, which is time-consuming and tedious. Deep learning and artificial intelligence are attracting growing attention and have been applied successfully in many Internet and industrial settings. This study designs a deep learning framework based on the Gated Recurrent Unit (GRU) that automatically diagnoses equipment faults in cloud data center equipment rooms and uses temporal information to predict future states from past operating states. Sequence data are split into fixed time windows and fed to a bidirectional GRU layer, so that the network learns the forward and backward temporal dependencies among data points. On top of the GRU output, we add a self-attention layer and an embedding layer, which help the network learn features that are more effective for fault prediction and further reduce their dimensionality. Finally, a multilayer perceptron classifies the reduced features. Experimental results on a real data set show that the proposed GRU-based framework detects cloud data center faults more accurately than commonly used models such as LSTM, SVM, and KNN.
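    The pipeline described in the abstract (fixed time windows, bidirectional GRU, self-attention, embedding layer, multilayer perceptron) can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `make_windows` and `BiGRUAttentionClassifier`, all layer sizes, the use of a linear projection as the "embedding layer", mean pooling over time, and the synthetic input are hypothetical choices that the abstract does not specify.

```python
import math

import torch
import torch.nn as nn


def make_windows(series: torch.Tensor, window: int, stride: int) -> torch.Tensor:
    """Split a (time, features) series into fixed-length windows.

    Returns a tensor of shape (num_windows, window, features).
    """
    # unfold over the time axis gives (num_windows, features, window);
    # permute so that each window is laid out as (window, features).
    return series.unfold(0, window, stride).permute(0, 2, 1).contiguous()


class BiGRUAttentionClassifier(nn.Module):
    """Bidirectional GRU -> self-attention -> embedding -> MLP (assumed sizes)."""

    def __init__(self, n_features: int, hidden: int = 64,
                 embed_dim: int = 16, n_classes: int = 2):
        super().__init__()
        d = 2 * hidden  # forward and backward GRU states are concatenated
        self.gru = nn.GRU(n_features, hidden, batch_first=True, bidirectional=True)
        # Learned query/key/value projections for single-head self-attention.
        self.q = nn.Linear(d, d)
        self.k = nn.Linear(d, d)
        self.v = nn.Linear(d, d)
        # Assumption: the "embedding layer" is a linear dimensionality reduction.
        self.embed = nn.Linear(d, embed_dim)
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, 32), nn.ReLU(), nn.Linear(32, n_classes)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, _ = self.gru(x)                                  # (batch, window, d)
        scores = self.q(h) @ self.k(h).transpose(1, 2)      # (batch, window, window)
        weights = torch.softmax(scores / math.sqrt(h.size(-1)), dim=-1)
        attended = weights @ self.v(h)                      # (batch, window, d)
        pooled = attended.mean(dim=1)                       # average over time steps
        z = torch.relu(self.embed(pooled))                  # (batch, embed_dim)
        return self.mlp(z)                                  # class logits


# Synthetic stand-in for monitoring data: 100 time steps, 8 metrics.
torch.manual_seed(0)
series = torch.randn(100, 8)
windows = make_windows(series, window=20, stride=10)        # shape (9, 20, 8)
model = BiGRUAttentionClassifier(n_features=8)
logits = model(windows)                                     # shape (9, 2)
print(windows.shape, logits.shape)
```

    In this sketch each window is classified independently, and the logits would normally be trained with a cross-entropy loss against per-window fault labels. The window length and stride control how much history the GRU sees and how much adjacent windows overlap.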

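Cite this article

来风刚, 刘军, 李济伟, 王怀宇, 牟霄寒, 刘赛. Cloud Data Center Fault Detection Based on PyTorch and Neural Network. Computer Systems & Applications (计算机系统应用), 2020, 29(11): 40-46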

History
  • Received: 2020-04-02
  • Last revised: 2020-04-28
  • Published online: 2020-10-30