Computer Systems Applications, 2020, 29(11): 40-46
Cloud Data Center Fault Detection Based on Pytorch and Neural Network
(1. State Grid Information and Telecommunication Branch, Beijing 100761, China; 2. Nari Group Corporation (State Grid Electric Power Research Institute), Nanjing 211106, China; 3. Nanjing Nari Information Communication Technology Co. Ltd., Nanjing 210003, China)
Received:April 02, 2020    Revised:April 28, 2020
Abstract: With the rapid development of modern technology, data centers have become the core IT infrastructure of the information society, storing and managing large volumes of critical data. At present, data center management relies mostly on experienced operation and maintenance staff, who use computers to monitor the indicators of equipment-room devices and inspect the equipment repeatedly, which is time-consuming and tedious. Deep learning and artificial intelligence are attracting growing attention and have been applied successfully in the Internet and industrial fields. This study designs a deep learning framework based on the Gated Recurrent Unit (GRU) to diagnose equipment faults in cloud data center equipment rooms automatically, and uses temporal information to predict future states from past equipment operating status. The sequence data are split into fixed time windows and fed into a bidirectional GRU layer, so that the network learns the forward and backward temporal dependencies among data points. On top of the GRU output, we add a self-attention layer and an embedding layer, which help the network learn features that are more effective for fault prediction and further reduce their dimensionality. Finally, a multi-layer perceptron classifies the reduced-dimension data. Experimental results on a real data set show that the proposed GRU-based framework detects cloud data center faults more accurately than commonly used models such as LSTM, SVM, and KNN.
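The pipeline described in the abstract (fixed time windows, bidirectional GRU, self-attention, embedding layer, multi-layer perceptron) can be sketched in PyTorch roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract gives no hyperparameters, so the window length, feature count, layer sizes, non-overlapping windows, single-head scaled dot-product attention, mean pooling over time, and the use of a linear projection as the "embedding" layer are all assumptions, and the names `split_into_windows` and `GRUFaultDetector` are hypothetical.

```python
import torch
import torch.nn as nn


def split_into_windows(series: torch.Tensor, window: int) -> torch.Tensor:
    """Cut a (time, features) series into non-overlapping fixed-length windows."""
    n_windows = series.shape[0] // window
    trimmed = series[: n_windows * window]  # drop the incomplete tail (assumption)
    return trimmed.reshape(n_windows, window, series.shape[1])


class GRUFaultDetector(nn.Module):
    """Bidirectional GRU -> self-attention -> embedding -> MLP classifier."""

    def __init__(self, n_features=8, hidden=32, embed_dim=16, n_classes=2):
        super().__init__()
        d = 2 * hidden  # forward and backward GRU states are concatenated
        self.gru = nn.GRU(n_features, hidden, batch_first=True, bidirectional=True)
        # Single-head scaled dot-product self-attention (assumed variant).
        self.query = nn.Linear(d, d)
        self.key = nn.Linear(d, d)
        self.value = nn.Linear(d, d)
        # "Embedding" layer modelled here as a linear projection to fewer dimensions.
        self.embed = nn.Linear(d, embed_dim)
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, embed_dim),
            nn.ReLU(),
            nn.Linear(embed_dim, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, _ = self.gru(x)                                   # (batch, window, 2*hidden)
        q, k, v = self.query(h), self.key(h), self.value(h)
        scores = q @ k.transpose(1, 2) / (h.shape[-1] ** 0.5)
        attended = torch.softmax(scores, dim=-1) @ v         # (batch, window, 2*hidden)
        pooled = attended.mean(dim=1)                        # average over time steps
        z = torch.relu(self.embed(pooled))                   # reduced-dimension features
        return self.mlp(z)                                   # class logits


# Synthetic stand-in for monitoring data: 103 time steps, 8 indicators.
series = torch.randn(103, 8)
windows = split_into_windows(series, window=10)
model = GRUFaultDetector()
logits = model(windows)
```

Each window is classified independently here; the logits would normally be passed to a cross-entropy loss during training. Real monitoring data would also need normalization and labelled fault windows, which the abstract does not describe.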
Funding: 2019 Headquarters Science and Technology Project of State Grid Corporation of China (5700-201917224A-0-0-00)
Citation:
LAI Feng-Gang, LIU Jun, LI Ji-Wei, WANG Huai-Yu, MOU Xiao-Han, LIU Sai. Cloud Data Center Fault Detection Based on Pytorch and Neural Network. Computer Systems Applications, 2020, 29(11): 40-46