Cloud Computing Technology and Applications (course slides).ppt
Cloud Computing Technology and Applications
  School of Computer Science and Technology, Dalian University of Technology
  Spring 2010

Basic Information
  申彦明, B810
  Teaching assistant: 齐恒, B812
  Office hour: Fri 3:30-4:30 PM
  Course website: http:/

Course Content
  Overview of distributed systems
    Basic concepts of distributed systems and clusters
    Distributed databases
  Distributed file systems: GFS
  Distributed programming: introduction to the MapReduce algorithm
  Search engines and PageRank
  Other related technologies
    Data Center
    BigTable
    AppEngine

Grading
  HW: 40%
  Final Project: 60%
    Final project proposal
    Project reports
    12 teams, 4-5 students

Syllabus (Subject to change)
  Week 2
    Mar 8: Lecture 1: Introduction
    Mar 10: Lecture 2: Map/Reduce Theory and Implementation, Hadoop
  Week 3
    Mar 15: Lecture 3&4: Guest Speaker (8:00 AM-11:35 AM, Graduate Teaching Building, Room 102)
    Mar 17: Lecture 5: Distributed File System and the Google File System
  Week 4
    Mar 22: Lecture 6&7: Guest Speaker (8:00 AM-11:35 AM, Graduate Teaching Building, Room 102)
    Mar 24: Lecture 8: Distributed Graph Algorithms and PageRank
  Week 5
    Mar 29: Lecture 9: Introduction to Some Projects
    Mar 31: Lecture 10: Data Centers
  Week 6
    Apr 5: Lecture 11: Some Google Technologies
    Apr 7: Lecture 12: Virtualization
  Week 7
    Lecture 13&14: Project Presentation
  Week 8
    No class
  Week 9
    Lecture 15&16: Project Presentation

Gartner Report
  Top 10 Strategic Technology Areas for 2009
    Virtualization
    Cloud Computing
    Servers: Beyond Blades
    Web-Oriented Architectures
    Enterprise Mashups
    Specialized Systems
    Social Software and Social Networking
    Unified Communications
    Business Intelligence
    Green Information Technology
  Top 10 Strategic Technology Areas for 2010
    Cloud Computing
    Advanced Analytics
    Client Computing
    IT for Green
    Reshaping the Data Center
    Social Computing
    Security Activity Monitoring
    Flash Memory
    Virtualization for Availability
    Mobile Applications

From Desktop/HPC/Grids to Internet Clouds in 30 Years
  Over the last 30 years, HPC has moved from centralized supercomputers to geographically distributed desktops, clusters, and grids, and then to clouds.
  R&D efforts on HPC, clusters, grids, P2P, and virtual machines have laid the foundation of cloud computing, which has been greatly advocated since 2007.
  Computing infrastructure is being located in areas with lower costs for hardware, software, datasets, space, and power, moving from desktop computing to datacenter-based clouds.

What is Cloud Computing?
  1. Web-scale problems
  2. Large data centers
  3. Different models of computing
  4. Highly-interactive Web applications

1. “Web-Scale” Problems
  Characteristics:
    Definitely data-intensive
    May also be processing-intensive
  Examples:
    Crawling, indexing, searching, mining the Web
    Data warehouses
    Sensor networks
    “Post-genomics” life sciences research
    Other scientific data (physics, astronomy, etc.)
    Web 2.0 applications

How much data?
  Google processes 20 PB a day (2008)
  “All words ever spoken by human beings”: 5 EB
  CERN's LHC will generate 10-15 PB a year
  “640K ought to be enough for anybody.”

What to do with more data?
  Answering factoid questions
    Pattern matching on the Web
    Works amazingly well
  Learning relations
    Start with seed instances
    Search for patterns on the Web
    Use the patterns to find more instances

How do I make money?
  Petabytes of valuable customer data
    Sitting idle in existing data warehouses
    Overflowing out of existing data warehouses
    Simply being thrown away
  Sources of data:
    OLTP
    User behavior logs
    Call-center logs
    Web crawls, public datasets
  Structured data (today) vs. unstructured data (tomorrow)
  How can an organization derive value from all this data?

2. Large Data Centers
  Web-scale problems? Throw more machines at it!
  Centralization of resources in large data centers
    Necessary ingredients: fiber, juice, and land
    What do Oregon, Iceland, and abandoned mines have in common?
  Important issues:
    Efficiency
    Redundancy
    Utilization
    Security
    Management overhead

3. Different Computing Models
  Utility computing
    Why buy machines when you can rent cycles?
    Example: Amazon's EC2
  Platform as a Service (PaaS)
    Give me a nice API and take care of the implementation
    Example: Google App Engine
  Software as a Service (SaaS)
    Just run it for me!
    Example: Gmail
  “Why do it yourself if you can pay someone to do it for you?”

4. Web Applications
  What is the nature of future software applications?
    From the desktop to the browser
    SaaS = Web-based applications
    Examples: Google Maps, Facebook
  How do we deliver highly-interactive Web-based applications?
    AJAX (asynchronous JavaScript and XML)
    A hack on top of a mistake built on sand, all held together by duct tape and chewing gum?

Some Cloud Definitions
  Ian Foster et al. defined cloud computing as a large-scale distributed computing paradigm that is driven by economies of scale, in which a pool of abstracted, virtualized, dynamically-scalable, managed computing power, storage, platforms, and services are delivered on demand to external customers over the Internet.
    (Cloud computing is a commercial computing model. It distributes computing tasks across a resource pool made up of a large number of computers, so that application systems can obtain computing power, storage space, and software services as needed.)
  IBM experts consider clouds as systems that can:
    Host a variety of different workloads, including batch-style back-end jobs and interactive, user-facing applications
    Allow workloads to be deployed and scaled out quickly through the rapid provisioning of virtual machines or physical machines
    Support redundant, self-recovering, highly scalable programming models that allow workloads to recover from HW/SW failures
    Monitor resource use in real time to rebalance allocations on demand

Internet Cloud Goals
  Sharing of peak-load capacity among a large pool of users, improving overall resource utilization
  Separation of infrastructure maintenance duties from domain-specific application development
  Major cloud applications include upgraded web services, distributed data storage, raw supercomputing, and access to specialized Grid, P2P, data-mining, and content networking services

Three Aspects in Hardware that are New in Cloud Computing
  The illusion of infinite computing resources available on demand, which eliminates the need for cloud users to plan far ahead for provisioning
  The elimination of an up-front commitment by cloud users, which allows companies to start small and increase hardware resources only when needed
  The ability to pay for computing resources on a short-term basis as needed (e.g., processors by the hour and storage by the day) and to release them when done, which rewards resource conservation

Some Innovative Cloud Services and Application Opportunities
  Smart and pervasive cloud applications for individuals, homes, communities, companies, governments, etc.
  Coordinated calendar, itinerary, job management, events, and customer relationship management (CRM) services
  Coordinated word processing, on-line presentations, web-based desktops, and sharing of on-line documents, datasets, photos, video, databases, etc.
  Deploying conventional cluster, grid, P2P, and social networking applications in cloud environments more cost-effectively
  Earthbound applications that demand elasticity and parallelism rather than data movement costs

Operations in Cloud Computing
  Users interact with the cloud to request service
  The provisioning tool carves out systems from the cloud and handles configuration, reconfiguration, or deprovisioning
  The servers can be either real or virtual machines
  Supporting resources include distributed storage systems, datacenters, security devices, etc.

Cloud Computing Instances
  Google
  Amazon
  Microsoft Azure
  IBM Blue Cloud

Google Cloud Infrastructure
  [Figure: Google cloud infrastructure. Labels: Scheduler, Chubby, GFS master, Node, User, Application, Scheduler slave, GFS chunkserver, Linux, MapReduce job, BigTable server]

Amazon Elastic Compute Cloud
  SQS: Simple Queue Service
  EC2: running instances of virtual machines
  EBS: Elastic Block Store, providing the block interface and storing virtual machine images
  S3: Simple Storage Service, SOAP, object interface
  SimpleDB: simplified database

Microsoft Azure Platform

IBM Blue Cloud
  [Figure: IBM Blue Cloud. Labels: Developer, Monitoring, Application Server, Provisioning Manager, User, Open Source Linux with Xen, Tivoli Monitoring Agent]

Cost Considerations: Power, Cooling, Physical Plant, and Operational Costs
  Cost
    Technology costs
    Cost of security
    etc.
  Benefit
    Availability
    Opportunity
    Consolidation
    etc.

Cost Breakdown
  + Storage ($/MByte/year)
  + Computing ($/CPU cycles)
  + Networking ($/bit)

Research Challenges
  Service availability
    S3 outage: authentication service overload leading to unavailability
    AppEngine partial outage: programming error
    Gmail: site unavailable
  Solutions:
    The management of a Cloud Computing service by a single company results in a single point of failure (SPF).
    In the Internet, a large ISP uses multiple network providers so that failure by a single company will not take it off the air.
    Similarly, we need multiple Cloud Computing providers to support each other to eliminate the SPF.

Research Challenges
  Data security
    Current cloud offerings are essentially public rather than private networks, which exposes the system to more attacks, such as DDoS attacks.
  Solutions:
    There are many well-understood technologies, such as encrypted storage, virtual local area networks, and network middleboxes.

Research Challenges
  Data transfer bottlenecks
    Applications continue to become more data-intensive.
    If we assume applications may be “pulled apart” across the boundaries of clouds, this may complicate data placement and transport.
    Both WAN bandwidth and intra-cloud networking technology are performance bottlenecks.
  Industrial solutions:
    It is estimated that 2/3 of the cost of WAN bandwidth is consumed by high-end routers, whereas only 1/3 is charged by the fiber industry.
    We can lower the cost by using simpler routers built from commodity components with centralized control, but research is heading towards using high-end distributed routers.

Research Challenges
  Software licensing
    Current software licenses commonly restrict the computers on which the software can run.
    Users pay for the software and then pay an annual maintenance fee.
    Many cloud computing providers originally relied on open source software, in part because the licensing model for commercial software is not a good match for Utility Computing.
  Some ideas:
    We can encourage the sales forces of software companies to sell products into Cloud Computing.
    Or they can adopt a pay-per-use model for their software to adapt to a cloud environment.

Research Challenges
  Scalable storage
  Differences between common storage and cloud storage
    The system is built from many inexpensive commodity components that often fail.
    The system stores a modest number of large files.
    The workloads primarily consist of two kinds of reads: large streaming reads and small random reads.
    The workloads also have many large, sequential writes that append data to files; once written, files are seldom modified again.
  The cloud storage (file) system needs to share many of the same goals as previous distributed file systems, such as performance, scalability, reliability, and availability.
  In addition, its design needs to be driven by key observations of the specific workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system design assumptions.
  GFS
    Files are divided into fixed-size chunks.
    Chunk size is one of the key design parameters. GFS chooses 64 MB, which is much larger than typical file system block sizes.
    The master stores three major types of metadata: the file and chunk namespaces, the mapping from files to chunks, and the locations of each chunk's replicas.
    GFS supports the usual operations to create, delete, open, close, read, and write files.

Research Challenges
  Transparent programming model
    Programs written for cloud implementation need to be automatically parallelized and executed on a large cluster of commodity machines.
    The run-time system should take care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication.
    The programming model should allow programmers without much experience with parallel and distributed systems to easily utilize the resources of a large distributed system.
  MapReduce: Scalable Data Processing on Large Clusters
    A web programming model for quickly processing and generating large datasets
    Applied mainly in web-scale search and cloud computing applications
    Users specify a map function to generate a set of intermediate key/value pairs
    Users specify a reduce function to merge all intermediate values that share the same intermediate key

Steve Ballmer's View on the Future of Cloud
  Cloud creates opportunities and responsibilities
  Cloud learns and helps you learn, decide, and take action
  Cloud enhances social and professional interactions
  The cloud wants smarter devices
  Cloud drives server advances that, in turn, drive the cloud

Cloud Computing Skepticism
  “Cloud computing is simply a buzzword used to repackage grid computing and utility computing, both of which have existed for decades.”

Definition of Cloud Computing
  Larry Ellison:
  “The interesting thing about cloud computing is that we've redefined cloud computing to include everything that we already do. The computer industry is the only industry that is more fashion-driven than women's fashion. Maybe I'm an idiot, but I have no idea what anyone is talking about. What is it? It's complete gibberish. It's insane. When is this idiocy going to stop?”
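
Illustration for the MapReduce slide: the map and reduce roles described there can be sketched as a small, single-process word count. This is only a minimal sketch under stated assumptions, not Google's or Hadoop's implementation: the names `map_fn`, `reduce_fn`, and `run_mapreduce` and the sample documents are hypothetical, and the partitioning, scheduling, and failure handling that a real run-time system provides are left out.

```python
from collections import defaultdict


def map_fn(doc_id, text):
    """Map: emit an intermediate (word, 1) pair for every word in the text."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: merge all intermediate values that share the same key."""
    return word, sum(counts)


def run_mapreduce(inputs, mapper, reducer):
    """Hypothetical driver: map, group by intermediate key, then reduce.

    A real runtime would spread this work over many machines; this sketch
    runs everything in one process to show the data flow only.
    """
    groups = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in mapper(key, value):
            groups[ikey].append(ivalue)  # grouping step ("shuffle")
    return dict(reducer(k, v) for k, v in sorted(groups.items()))


# Made-up sample input: (document id, document text) pairs.
docs = [
    ("d1", "the cloud wants smarter devices"),
    ("d2", "the cloud drives server advances"),
]
word_counts = run_mapreduce(docs, map_fn, reduce_fn)
print(word_counts)
```

The user writes only `map_fn` and `reduce_fn`; everything inside `run_mapreduce` stands in for what the slide assigns to the run-time system.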