Computer Systems & Applications: 2017, 26(8): 9-15
Reservoir Data Mining and Analysis Based on Spark
(College of Computer and Communication Engineering, China University of Petroleum (East China), Qingdao 266580, China)
Received: December 09, 2016
Abstract: To support the analysis of reservoir data characteristics and the exploration and development of oil fields, this paper analyzes reservoir data with the Spark parallel computing framework, uses data mining algorithms to uncover latent relationships among reservoir attributes, and classifies and predicts different reservoir intervals. The main work includes: building a Spark distributed cluster and a data processing and analysis platform, where Spark is a popular big data parallel computing framework that, compared with some traditional analysis methods and tools, can carry out data mining tasks quickly and accurately; establishing a multidimensional outlier detection function based on the characteristics of reservoir data and adding a new discriminant attribute Pr, the permeability-to-porosity ratio; and, for imbalanced data, proposing a cross-recall training model with an optimized cost function for logistic regression classification, and proposing KR-SMOTE to oversample minority-class samples for decision tree classification. Both methods handle data imbalance effectively and improve classification accuracy.
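The abstract does not describe how KR-SMOTE works, so the following is only a minimal sketch of the standard SMOTE oversampling idea that the name refers to, not the authors' variant. The function name `smote`, its parameters, and the pure-Python single-machine implementation are assumptions made for illustration; the paper itself runs on Spark.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by standard SMOTE interpolation.

    minority: list of equal-length numeric tuples (minority-class samples).
    This is plain SMOTE, NOT the paper's KR-SMOTE, whose details are not
    given in the abstract.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base sample (Euclidean distance)
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # New point lies at a random position on the segment base -> neighbour
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic
```

Each synthetic sample lies on a line segment between two real minority samples, so the minority class is enlarged without duplicating existing rows exactly.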
Citation:
WU Zhi-Jun, XIA Sheng-Yu, WANG Peng. Reservoir Data Mining and Analysis Based on Spark. Computer Systems & Applications, 2017, 26(8): 9-15