Sinopec-IBM Big Data Solution Introduction.ppt (44 pages)
IBM Big Data Solution Introduction
Zeng Xiang, IBM Information Management Software
Thursday, May 29, 2014
© 2012 IBM Corporation

Agenda
- Application scenarios and insights
- IBM's big data platform

Where does big data come from?
- 2 billion people browse the web every day (end of 2011)
- RFID tag data: 3 billion per day (1.3B in 2005)
- 4.6 billion smartphones
- 25+ TB of log data per day
- 76 million smart meters in 2009, 200M by 2014
- 12+ TB per day
- Smart watches and wearable computers
- Hundreds of millions of new GPS devices every year
- ? TBs

The four dimensions of big data
- Variety: many data formats
- Velocity: fast data transmission
- Volume: large data volumes
- Veracity: uncertainty

Why big data now?

What big data brings
Top-performing enterprises use five times as many analytics techniques as poor performers (see Figure 1). The survey covered nearly 3,000 executives, managers and analysts from more than 100 countries and over 30 industries.

What big data brings: consumption intent
Sample posts:
- "… duke/unc and take it to the court http:/…"
- "I'm at Mickey's Irish Pub Downtown (206 3rd St, Court Ave, Raleigh) w/ 2 others http:/…"
- "silliesylvia good! U shouldn't! Think about the important stuff, like ur 43rd birthday ;) btw happy birthday Sylvia ;)"
- "silliesylvia I <3 your leather leggings! It's so katniss!"
- "bamagirl can't wait to watch sherlock with you! Oh, robert downey jr, I still love you but bbc is so amazing"
- "silliesylvia $10 dollars says matthew&mary get married next season :) #downtownabbey"
- "OMG OMG... just dropped my new ipad3 crappola!"
- John Carter review: "Other than the crap cinematography and that it seems like a lord of the flies in the thunderdome, it's still disney and deserves at least a trilogy. I'd be sad about the money, but I did just pay to see American Reunion."
- "dear redbox please have kings speech for my new tv colin firth movie marathon"

The slide derives the following profile from these posts. Its callouts are address, age, personal attributes, likes and dislikes, interests/behavior, predicted purchases and attitude:
- Personal attributes: Sylvia Campbell, female, in a relationship, 32 years old, birthday on 7/17, lives near Raleigh, NC, college graduate, income of 80-120k
- Likes and dislikes: retweets BF's comments; interest in BBC shows (Downton Abbey, Sherlock, Fringe, P&P?); Sherlock Holmes, Robert Downey Jr.; Hunger Games, Katniss/J. Lawrence
- Interests/behavior: watches movies and TV shows; romance plots, "hero types", strong women; uses iPad 3, Redbox, Hulu; shopping, interest in sales/deals; Duke/UNC basketball

Result: a 360-degree customer view of consumption intent (Consumption).
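The profile above is assembled from attributes pulled out of free-text posts. For example, a check-in yields a venue and an address, and a hashtag hints at an interest. The following is a minimal sketch of that extraction step, assuming simple regular expressions are good enough for these two post shapes. The `PostExtractor` class, the `PostFacts` record and the patterns are hypothetical. They are not IBM SystemT or any product API. The record type requires Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PostExtractor {

    // Hypothetical structured record extracted from one free-text post.
    record PostFacts(String venue, String address, List<String> hashtags) {}

    // Assumed check-in format: "I'm at <venue> (<address>)"; the apostrophe is optional.
    static final Pattern CHECK_IN = Pattern.compile("I'?m at ([^(]+?)\\s*\\(([^)]*)\\)");
    // Assumed hashtag format: '#' followed by word characters.
    static final Pattern HASHTAG = Pattern.compile("#(\\w+)");

    // Collect capture group 1 of every match, in order of appearance.
    static List<String> findAll(Pattern pattern, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    // Turn one unstructured post into a structured record; venue/address stay null when absent.
    static PostFacts extract(String post) {
        String venue = null;
        String address = null;
        Matcher m = CHECK_IN.matcher(post);
        if (m.find()) {
            venue = m.group(1).trim();
            address = m.group(2).trim();
        }
        return new PostFacts(venue, address, findAll(HASHTAG, post));
    }

    public static void main(String[] args) {
        List<String> posts = List.of(
            "I'm at Mickey's Irish Pub Downtown (206 3rd St, Court Ave, Raleigh) w/ 2 others",
            "silliesylvia $10 dollars says matthew&mary get married next season :) #downtownabbey");
        for (String post : posts) {
            System.out.println(extract(post));
        }
    }
}
```

Real social-media text is far noisier than these two samples, which is why the deck later lists jargon, acronyms and spam under Veracity. A production extractor would need dictionaries and rules well beyond two regular expressions.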
Step 1 of the analysis: extracting information from big data
Behavior: "Maybe our politicians should take a playbook out of the rivalry between …"

Integrating data from multiple sources produces a roll-up view of millions of audience members. The sources are CRM, marketing, campaign performance, interactions, third-party news sources, social media activity, consumer subs & distribution, and web & mobile app behavior. The resulting view covers:
- Demographic data: age, gender, location, education, income, etc.
- Life stage: marital status, employment, family members, property ownership, etc.
- Product affinity and behaviors: brand and product affinity, intent, and purchases/ownership
- Media affinity and behaviors: a comprehensive view of content preferences and consumption (magazines, apps, TV, movies, music, games, etc.)
- Lifestyle: hobbies, interests, activities
- Associated communities: professional/educational memberships, social groups, and other associations
- Brand sentiment: general sentiment toward media franchises and competitive brands and products

Now we have a 360-degree view of the customer.

Step 2 of the analysis: integrating and mining the information
Chart (analytics complexity vs. data volume) showing three approaches: curated panels (polling and extrapolation); 360-degree profiles (micro-segmentation, behavior prediction); social listening and monitoring (sentiment, buzz, key influencers).

Challenges in the analysis process: Social Media Analytics, a big data problem
- Volume: a growing volume of social media and other media-source data; extract concepts from several hundred million messages per day; 100M+ active users per source
- Variety: heterogeneous data; combine and correlate information across hundreds of sources (sites, forums, message boards, newswires)
- Velocity: timely decision-making; make decisions in near real time over 10K+ messages per second
- Veracity: from noisy data to trustworthy insights; understand jargon and acronyms, eliminate spam

IBM big data platform
- PureData for Analytics: based on Netezza; analysis and mining of massive relational data
- PureData for Transactional Analytics: based on the DB2 data warehouse; real-time analysis of massive data
- InfoSphere Streams: a platform for real-time analysis of massive data
- InfoSphere Information Server: integration and transformation of large data volumes
- InfoSphere BigInsights: Hadoop-based; storage and analysis of massive unstructured data
- PureData for Hadoop: based on IBM BigInsights; analysis of massive unstructured data
Platform tiers named in the diagram: integrated data analytics platform, stream computing platform, information integration platform, low-latency high-performance analytics platform, Hadoop platform.

Problems the IBM big data platform addresses
- Analyzing big data in many formats (Variety): novel analytics on a broad set of mixed information that could not be analyzed before
- Analyzing big data in real time (Velocity): streaming data analysis; large-volume data bursts and ad hoc analysis
- Analyzing extremely large data volumes (Volume): cost-efficiently process and analyze PBs of information; manage and analyze high volumes of structured, relational data
- Analysis and presentation: ad hoc analytics, data discovery and experimentation
- Governance (Veracity): enforce data structure, integrity and control to ensure consistency for repeatable queries

The same platform diagram is shown again. In this version, the "integrated data analytics platform" label is replaced by "relational data warehouse platform", and BigInsights appears as "InfoSphere BigInsights & Explorer".

IBM BigInsights: a Hadoop-based big data analytics platform
Hadoop's computing model:
- Data is stored on a distributed file system running on a cluster of inexpensive computers.
- The application logic is split so that it runs on each data chunk, and the partial results are then aggregated.
- It scales to a nearly unlimited number of nodes and PB-scale data.
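The computing model above ships the same function to every data chunk and then aggregates the partial results. The following is a single-JVM analogy only: threads stand in for cluster nodes, and an in-memory list stands in for a file on a distributed file system. It is not Hadoop or BigInsights code, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedAggregation {

    // Split the input into fixed-size chunks; an in-memory stand-in for file blocks.
    static List<List<Integer>> split(List<Integer> data, int chunkSize) {
        List<List<Integer>> chunks = new ArrayList<>();
        for (int start = 0; start < data.size(); start += chunkSize) {
            chunks.add(data.subList(start, Math.min(start + chunkSize, data.size())));
        }
        return chunks;
    }

    // Run the same computation on every chunk in parallel, then aggregate the partial results.
    static long sumOfSquares(List<Integer> data, int chunkSize, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (List<Integer> chunk : split(data, chunkSize)) {
                // Per-chunk work: each task sees only its own slice of the data.
                Callable<Long> task = () -> {
                    long partial = 0;
                    for (int x : chunk) {
                        partial += (long) x * x;
                    }
                    return partial;
                };
                partials.add(pool.submit(task));
            }
            // Aggregation: combine the partial results into one answer.
            long total = 0;
            for (Future<Long> f : partials) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 1; i <= 1000; i++) {
            data.add(i);
        }
        System.out.println(sumOfSquares(data, 100, 4)); // 333833500
    }
}
```

This pattern works because the per-chunk function (sum of squares) combines associatively, so the order in which partial results arrive does not matter. In Hadoop, the framework also moves computation to the node that already holds each block, which this sketch does not model.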
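The MapReduce flow on the slide:
1. Map phase: break the job into small parts.
2. Shuffle: transfer interim output for final processing.
3. Reduce phase: boil all output down to a single result set.
The MapReduce application distributes map tasks to the cluster of Hadoop data nodes, shuffles the interim output, and returns a single result set.

Word-count code shown on the slide:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // Map phase: emit (word, 1) for every token in the input line.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text val, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(val.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reduce phase: after the shuffle groups values by word, sum the counts.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> val, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : val) {
                sum += v.get();
            }
            // The slide's code is cut off here; these two lines follow the standard Hadoop WordCount example.
            result.set(sum);
            context.write(key, result);
        }
    }
}
```

How InfoSphere BigInsights differs from open-source Hadoop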
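The Map, Shuffle and Reduce phases and the word-count code on the MapReduce slide can be traced without a cluster. The hypothetical `LocalWordCount` class below mirrors `TokenizerMapper` (map), Hadoop's grouping of intermediate pairs by key (shuffle) and `IntSumReducer` (reduce) using plain Java collections. It is a teaching sketch and does not use the Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {

    // Map phase (cf. TokenizerMapper): emit a (word, 1) pair for every token in one input split.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(split);
        while (itr.hasMoreTokens()) {
            pairs.add(Map.entry(itr.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle phase: group every intermediate value by key; TreeMap keeps keys sorted.
    static Map<String, List<Integer>> shuffle(List<List<Map.Entry<String, Integer>>> mapOutputs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> output : mapOutputs) {
            for (Map.Entry<String, Integer> pair : output) {
                groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        return groups;
    }

    // Reduce phase (cf. IntSumReducer): sum all values that share a key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Drive the three phases sequentially over a list of input splits.
    static Map<String, Integer> wordCount(List<String> splits) {
        List<List<Map.Entry<String, Integer>>> mapOutputs = new ArrayList<>();
        for (String split : splits) {
            mapOutputs.add(map(split));
        }
        Map<String, Integer> result = new TreeMap<>();
        shuffle(mapOutputs).forEach((word, counts) -> result.put(word, reduce(word, counts)));
        return result;
    }

    public static void main(String[] args) {
        List<String> splits = List.of("big data big value", "data in motion", "data at rest");
        System.out.println(wordCount(splits));
        // {at=1, big=2, data=3, in=1, motion=1, rest=1, value=1}
    }
}
```

Hadoop runs many map and reduce tasks on different data nodes and moves the shuffled pairs over the network. This sketch runs all three phases in one thread so that only the data flow is visible.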
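GPFS-SNC parallel file system vs. HDFS
BigInsights' underlying storage layer, GPFS-SNC, is derived from GPFS. Compared with HDFS, it offers major advantages in performance, reliability and ease of operation, and it is the foundation of BigInsights' capabilities.

Enhanced data analysis capabilities; improved enterprise-class management and processing
- SystemT text analyzer: text analytics on Hadoop MapReduce that extracts structured and semi-structured data from unstructured text so that it can be analyzed and processed
- Jaql: a simple yet highly extensible language
- R for statistical analysis, and SystemML for machine learning
- BigSheets: a visualization tool used for …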