An Introduction of Big Data (PPT, 34 slides)
WEB GROUP, 2011.9.24

Outline
- What is Big Data
- The Framework of Big Data
- The Applications of Big Data
- The Challenges of Big Data
- Research Works Related to Big Data
- Conclusions

What is Big Data

Information explosion
- 57% every year (IDC); doubles every 1.5 years
- 988 EB (1 EB = 1024 PB) of data will be produced in 2010 (IDC): 18 million times all the information in books
- IT: 850 million photos & 8 million videos/day (Facebook); 50 PB of web pages, 500 PB of logs (Baidu)
- Telco (logs, multimedia data)
- Enterprise storage
- Public utilities
  - Health care (medical images, photos)
  - Public traffic (surveillance videos)

Definition
- Big data is the confluence of three trends: Big Transaction Data, Big Interaction Data, and Big Data Processing.
- Question: Big Data = Large-Scale Data (Massive Data)?
- [Figure labels: Structured and Semi-Structured Transaction Data; Unstructured Data; Interaction Data]

The properties of Big Data
- Huge
- Distributed: dispersed over many servers
- Dynamic: items added/deleted/modified continuously
- Heterogeneous: many agents access/update data
- Noisy: inherent, unintentional, malicious
- Unstructured/semi-structured: no database schema; complex interrelationships

The Framework of Big Data
- [Figure: framework diagram; no text recoverable]

The Applications of Big Data
- Celestial bodies: exobiology
- Inheritance: sequence of cancer
- Advertisement: finding communities
- SNA: finding communities
- Data mining: consuming habits
- Changing router

The Challenges of Big Data

Efficiency requirements for algorithms
- Traditionally, "efficient" algorithms
  - Run in (small) polynomial time: O(n log n)
  - Use linear space: O(n)
- For large data sets, efficient algorithms
  - Must run in linear or even sub-linear time: o(n)
  - Must use at most poly-logarithmic space: (log n)^2

Mining Big Data
- Association rules and frequent patterns: two parameters, support and confidence
- Clustering: distance measures (L1, L2, L∞, edit distance, etc.)
- Graph structure: social networks, degree distribution (heavy tail)

Clean Big Data
- Noise in data distorts
  - Computation results
  - Search results
- Need automatic methods for "cleaning" the data
  - Duplicate elimination
  - Quality evaluation
- Computing model
  - Accuracy and approximation
  - Efficiency

Computing Model of Big Data

Abstract model of computing
- [Figure: Data (n is very large) -> Computer Program -> Approximation of f(x)]
- An approximation of f(x) is sufficient
- The program can be randomized
- Examples: mean, parity

Random sampling
- Query a few data items
- Examples: mean needs O(1) queries; parity needs n queries
- Advantages
  - Ultra-efficient: sub-linear running time and space (could even be independent of the data set size)
- Disadvantages
  - May require random access
  - Doesn't fit many problems

Data streams
- Stream through the data; use limited memory
- Examples: mean needs O(1) memory; parity needs 1 bit of memory
- Advantages
  - Sequential access
  - Limited memory
- Disadvantages
  - Running time is at least linear
  - Too restricted for some problems

Sketching
- [Figure: Data1 -> Sketch1, Data2 -> Sketch2]
- Compress each data segment into a small "sketch"
- Compute over the sketches
- Examples
  - Equality: O(1) size sketch
  - Hamming distance: O(1) size sketch
  - Lp distance (p > 2): Ω(n^(1-2/p)) size sketch

Research Works Related to Big Data
- Finding Maximal Cliques in Massive Networks by H*-Graph (SIGMOD 2010)
- Large-Scale Collective Entity Matching (VLDB 2011)
- Estimating Sizes of Social Networks via Biased Sampling (WWW 2011)

Massive graph data
- Graph is a powerful modeling tool for analyzing massive networks.
- Graph data is ever
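
The three computing models from the "Computing Model of Big Data" slides (random sampling, data streams, sketching) can be illustrated with a minimal Python sketch. It is not taken from the slides: the function names are hypothetical, the sample size is an arbitrary choice, and SHA-256 is used here only as a stand-in for a constant-size equality fingerprint, which is exact only up to hash collisions.

```python
import hashlib
import random
from typing import Iterable, Sequence


def sampled_mean(data: Sequence[float], num_queries: int, rng: random.Random) -> float:
    """Random sampling: estimate the mean from a fixed number of random-access
    queries. The number of queries does not depend on len(data)."""
    total = 0.0
    for _ in range(num_queries):
        total += data[rng.randrange(len(data))]
    return total / num_queries


def streaming_mean(stream: Iterable[float]) -> float:
    """Data stream: exact mean in one sequential pass, keeping only a counter
    and a running sum in memory."""
    count = 0
    total = 0.0
    for x in stream:
        count += 1
        total += x
    if count == 0:
        raise ValueError("empty stream")
    return total / count


def streaming_parity(bits: Iterable[int]) -> int:
    """Data stream: parity of a bit stream using a single bit of state.
    Every item must be read, so the running time is linear."""
    parity = 0
    for b in bits:
        parity ^= b & 1
    return parity


def equality_sketch(segment: bytes) -> bytes:
    """Sketching: compress a data segment into a fixed-size (32-byte) digest.
    Two segments are judged equal when their sketches match (assumption:
    hash collisions are ignored)."""
    return hashlib.sha256(segment).digest()


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.random() for _ in range(200_000)]
    print("streaming mean:", streaming_mean(data))
    print("sampled mean:  ", sampled_mean(data, 2_000, rng))
    print("parity:        ", streaming_parity(int(x > 0.5) for x in data))
    print("equal sketches:", equality_sketch(b"segment A") == equality_sketch(b"segment A"))
```

The sampled mean touches only 2,000 of the 200,000 items, which matches the slide's point that sampling can be independent of the data set size, while parity has no such shortcut because flipping any single bit changes the answer.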