Google Cloud Computing Technology: MapReduce (overseas course slides, 48 pages)
MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean & Sanjay Ghemawat, OSDI '04

"The density of transistors on a chip doubles every 18 months, for the same cost" (Moore's law, 1965)

The Free Lunch Is Almost Over!

The Future is Multi-core!
- Supercomputer (web graphic, Janet E. Ward, 2000) vs. cluster of desktops
- Replace specialized, powerful supercomputers with large clusters of commodity hardware.
- But distributed programming is inherently complex.

Google's MapReduce Paradigm
- A platform for reliable, scalable parallel computing.
- Abstracts the issues of the distributed and parallel environment away from the programmer.
- Runs over the Google File System.

What is MapReduce?
- A programming model and an associated implementation (library) for processing and generating large data sets on large clusters.
- A new abstraction that lets us express the simple computations we were trying to perform while hiding the messy details of parallelization, fault tolerance, data distribution and load balancing in a library.

References
- Jeffrey Dean, Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI 2004, pp. 137-150.
- Also: Rob Pike, Sean Dorward, Robert Griesemer, Sean Quinlan. Interpreting the Data: Parallel Analysis with Sawzall. Google Labs.

Google File System (GFS)
- A highly scalable distributed file system for large, data-intensive applications.
- Provides redundant storage of massive amounts of data on cheap, unreliable computers.
- Provides a platform on which other systems, such as MapReduce and BigTable, operate.

GFS Architecture [figure]

MapReduce: Insight
- "Consider the problem of counting the number of occurrences of each word in a large collection of documents."
- How would you do it in parallel?

One Possible Solution [figure]

MapReduce Programming Model
- Inspired by the map and reduce operations commonly used in functional programming languages such as Lisp.
- Users implement an interface of two primary methods:
  1. Map: (key1, val1) → [(key2, val2)]
  2. Reduce: (key2, [val2]) → [val3]
- Many real-world tasks are expressible in this model.
- Assumption: the data has no correlation, or it is small.

Big Picture [figure]

Map Operation
- Map, a pure function written by the user, takes an input key/value pair, e.g. (docid, doc-content), and produces a set of intermediate key/value pairs.
- By analogy with SQL, map can be viewed as the GROUP BY clause of an aggregate query.

Reduce Operation
- On completion of the map phase, all the intermediate values for a given output key are combined into a list and given to a reducer.
- Reduce can be viewed as the aggregate function (e.g., average) computed over all the rows with the same group-by attribute.

Example
- The problem of counting the number of occurrences of each word in a large collection of documents.

Pseudo-code

    map(String input_key, String input_value):
      // input_key: document name
      // input_value: document contents
      for each word w in input_value:
        EmitIntermediate(w, "1");

    reduce(String output_key, Iterator intermediate_values):
      // output_key: a word
      // intermediate_values: a list of counts
      int result = 0;
      for each v in intermediate_values:
        result += ParseInt(v);
      Emit(AsString(result));

More Examples: Distributed Grep
- Map: (key, whole doc or a line) → (the matched line, key)
- Reduce: identity function

More Examples: Count of URL Access Frequency
- Map: logs of web page requests → (URL, 1)
- Reduce: sums the counts for each URL → (URL, total count)

More Examples: Reverse Web-Link Graph
- Map: (source, target) → (target, source)
- Reduce: (target, list(source)) → (target, list(source))

MapReduce: Execution Overview [figure]

Architecture [figure]

Master Data Structure
- Task state: idle, in-progress, completed.
- Identity of the worker machine: for in-progress tasks.
- Locations of the intermediate file regions of map tasks: received from map tasks and pushed to reduce tasks.

Execution overview …
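The word-count pseudo-code can be made runnable as a minimal, single-process Python sketch. The names `run_mapreduce`, `map_fn` and `reduce_fn` are hypothetical and belong to this illustration, not to Google's library. The driver calls map on every input pair, groups the intermediate values by key (the step the SQL GROUP BY analogy describes), and calls reduce once per key. Real MapReduce runs these phases in parallel across many machines, and none of that distribution is modeled here.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical type aliases for this sketch.
Mapper = Callable[[str, str], Iterable[Tuple[str, str]]]
Reducer = Callable[[str, List[str]], str]


def map_fn(input_key: str, input_value: str) -> Iterator[Tuple[str, str]]:
    # input_key: document name (unused here)
    # input_value: document contents
    for w in input_value.split():
        yield (w, "1")  # EmitIntermediate(w, "1")


def reduce_fn(output_key: str, intermediate_values: List[str]) -> str:
    # output_key: a word
    # intermediate_values: a list of counts
    result = 0
    for v in intermediate_values:
        result += int(v)  # ParseInt(v)
    return str(result)  # Emit(AsString(result))


def run_mapreduce(inputs: Iterable[Tuple[str, str]], mapper: Mapper,
                  reducer: Reducer) -> Dict[str, str]:
    # Map phase: call the user's map on every input pair and group the
    # intermediate values by key (the "shuffle", like SQL GROUP BY).
    groups: Dict[str, List[str]] = defaultdict(list)
    for k1, v1 in inputs:
        for k2, v2 in mapper(k1, v1):
            groups[k2].append(v2)
    # Reduce phase: one reduce call per intermediate key, in sorted key order.
    return {k2: reducer(k2, vals) for k2, vals in sorted(groups.items())}


docs = [("doc1", "the quick brown fox"), ("doc2", "the lazy dog the end")]
counts = run_mapreduce(docs, map_fn, reduce_fn)
print(counts)
# prints {'brown': '1', 'dog': '1', 'end': '1', 'fox': '1', 'lazy': '1', 'quick': '1', 'the': '3'}
```

Keeping the values as strings mirrors the pseudo-code's `EmitIntermediate(w, "1")` / `ParseInt(v)` pairing. A Python-only version could emit the integer `1` directly.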
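The distributed grep example can be sketched in the same single-process style. This sketch assumes each input record is one line keyed by a hypothetical `file:lineno` string, and it uses an illustrative pattern `error`. The map emits (matched line, key) only for matching lines, and the identity reduce passes the grouped keys through unchanged.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical pattern supplied by the user of the grep job.
PATTERN = re.compile(r"error")


def grep_map(key: str, line: str) -> Iterator[Tuple[str, str]]:
    # key: where the line came from (here "file:lineno"); value: one line.
    # Emit (matched line, key) only when the line matches the pattern.
    if PATTERN.search(line):
        yield (line, key)


def identity_reduce(line: str, keys: List[str]) -> List[str]:
    # Identity function: copy the intermediate data to the output unchanged.
    return keys


def run_grep(inputs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    groups: Dict[str, List[str]] = defaultdict(list)
    for key, line in inputs:
        for matched, origin in grep_map(key, line):
            groups[matched].append(origin)
    return {m: identity_reduce(m, keys) for m, keys in groups.items()}


lines = [
    ("a.log:1", "startup ok"),
    ("a.log:2", "disk error"),
    ("b.log:7", "disk error"),
    ("b.log:8", "network error"),
]
print(run_grep(lines))
# prints {'disk error': ['a.log:2', 'b.log:7'], 'network error': ['b.log:8']}
```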
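For the URL access-frequency example, the map parses one log line and emits (URL, 1), and the reduce sums the ones for each URL. The log format used below (`client method URL status`) is an assumption made only for this sketch, because real web-server logs vary. Records are keyed by their hypothetical line offset.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def url_map(offset: int, log_line: str) -> Iterator[Tuple[str, int]]:
    # Assumed log format: "<client> <method> <URL> <status>".
    fields = log_line.split()
    if len(fields) >= 3:
        yield (fields[2], 1)  # emit (URL, 1) for each request


def url_reduce(url: str, counts: List[int]) -> Tuple[str, int]:
    return (url, sum(counts))  # (URL, total count)


def run_url_count(log_lines: Iterable[str]) -> List[Tuple[str, int]]:
    groups: Dict[str, List[int]] = defaultdict(list)
    for offset, line in enumerate(log_lines):
        for url, one in url_map(offset, line):
            groups[url].append(one)
    return [url_reduce(url, counts) for url, counts in sorted(groups.items())]


requests = [
    "10.0.0.1 GET /index.html 200",
    "10.0.0.2 GET /about.html 200",
    "10.0.0.1 GET /index.html 304",
]
print(run_url_count(requests))  # [('/about.html', 1), ('/index.html', 2)]
```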
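The reverse web-link graph needs only a swap in the map. Grouping by the new key (the target) then gives each page the list of pages that link to it. This is a minimal sketch with hypothetical function names. Sorting the sources in the reduce is a choice of this sketch that makes the output deterministic, not part of the model.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def link_map(source: str, target: str) -> Iterator[Tuple[str, str]]:
    # Input: one (source, target) link; output: the reversed pair.
    yield (target, source)


def link_reduce(target: str, sources: List[str]) -> Tuple[str, List[str]]:
    # Collect every source that links to this target (sorted for determinism).
    return (target, sorted(sources))


def reverse_links(links: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    groups: Dict[str, List[str]] = defaultdict(list)
    for source, target in links:
        for t, s in link_map(source, target):
            groups[t].append(s)
    return dict(link_reduce(t, srcs) for t, srcs in groups.items())


links = [
    ("a.html", "b.html"),
    ("a.html", "c.html"),
    ("b.html", "c.html"),
    ("c.html", "a.html"),
]
print(reverse_links(links))
```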
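The master's bookkeeping can be pictured as a small data structure. This is a hypothetical, single-process model of the three items listed on the Master Data Structure slide: task state, worker identity, and intermediate-region locations. The class and method names are invented for illustration, and fault handling, timeouts and RPC are left out.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Tuple


class TaskState(Enum):
    IDLE = "idle"
    IN_PROGRESS = "in-progress"
    COMPLETED = "completed"


@dataclass
class Task:
    task_id: int
    state: TaskState = TaskState.IDLE
    # Identity of the worker machine, set when the task starts. This sketch
    # keeps it after completion because a map task's output lives on that worker.
    worker: Optional[str] = None


@dataclass
class Master:
    num_reduce: int
    map_tasks: Dict[int, Task] = field(default_factory=dict)
    reduce_tasks: Dict[int, Task] = field(default_factory=dict)  # same bookkeeping
    # regions[r] = (worker, path) of every intermediate file region for partition r
    regions: Dict[int, List[Tuple[str, str]]] = field(default_factory=dict)

    def assign(self, task: Task, worker: str) -> None:
        # idle -> in-progress on a specific worker
        if task.state is not TaskState.IDLE:
            raise ValueError(f"task {task.task_id} is not idle")
        task.state = TaskState.IN_PROGRESS
        task.worker = worker

    def map_completed(self, task_id: int, region_paths: List[str]) -> None:
        # Received from a map task: one intermediate region per reduce partition.
        task = self.map_tasks[task_id]
        if task.state is not TaskState.IN_PROGRESS:
            raise ValueError(f"task {task_id} is not in progress")
        if len(region_paths) != self.num_reduce:
            raise ValueError("expected one region per reduce partition")
        task.state = TaskState.COMPLETED
        for r, path in enumerate(region_paths):
            self.regions.setdefault(r, []).append((task.worker, path))

    def locations_for_reduce(self, r: int) -> List[Tuple[str, str]]:
        # The region locations the master would push to reduce task r.
        return list(self.regions.get(r, []))


m = Master(num_reduce=2)
m.map_tasks[0] = Task(task_id=0)
m.assign(m.map_tasks[0], "worker-7")
m.map_completed(0, ["/tmp/m0-r0", "/tmp/m0-r1"])
print(m.locations_for_reduce(1))  # [('worker-7', '/tmp/m0-r1')]
```

Keying the region list by reduce partition lets the master hand each reduce task exactly the files it needs as map tasks finish, which matches the slide's "received from map tasks, pushed to reduce tasks" flow.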