The Design and Architecture of the Microsoft Cluster Service: selected lecture slides (.ppt)
The Design and Architecture of the Microsoft Cluster Service (MSCS)
W. Vogels et al.
ECE 845 Presentation by Sandeep Tamboli, April 18, 2000

Outline
- Prerequisites
- Introduction
- Design Goals
- Cluster Abstractions
- Cluster Operation
- Cluster Architecture
- Implementation Examples
- Summary

Prerequisites
- Availability = MTTF / (MTTF + MTTR)
  - MTTF: Mean Time To Failure
  - MTTR: Mean Time To Repair
- High Availability:
  - Modern taxonomy of High Availability: A system that has sufficient redundancy in its components to mask certain defined faults has High Availability (HA).
  - IBM High Availability Services: The goals of high availability solutions are to minimize both the number of service interruptions and the time needed to recover when an outage does occur. High availability is not a specific technology nor a quantifiable attribute; it is a goal to be reached. This goal is different for each system and is based on the specific needs of the business the system supports.
  - The presenter: A highly available system may have degraded performance while a component is down

MSCS (a.k.a. Wolfpack)
- Extension of Windows NT to improve availability
- First phase of implementation
  - Scalability limited to 2 nodes
- MSCS features:
  - Failover
  - Migration
  - Automated restart
- Differences from previous HA solutions:
  - Simpler user interface
  - More sophisticated modeling of applications
  - Tighter integration with the OS (NT)

MSCS (2)
- Shared-nothing cluster model:
  - Each node owns a subset of the cluster resources
  - Only one node may own a resource at a time
  - On failure, another node may take over ownership of the resource

Design Goals
- Commodity
  - Commercial off-the-shelf nodes
  - Windows NT Server
  - Standard Internet protocols
- Scalability
- Transparency
  - Presented as a single system to the clients
  - System management tools manage the cluster as if it were a single server
  - Service and system execution information available in a single cluster-wide log

Design Goals (2)
- Availability
  - On failure detection
    - Restart the application on another node
    - Migrate ownership of the other resources
  - Restart policy can specify the availability requirements of the application
  - Hardware/software upgrades possible in a phased manner

Cluster Abstractions
- Node: Runs an instance of the Cluster Service
  - Defined and active
- Resource
  - Functionality offered at a node
  - Physical: printer
  - Logical: IP address
  - Applications implement logical resources
    - Exchange mail database
    - SAP applications
- Quorum Resource
  - Persistent storage for the Cluster Configuration Database
  - Arbitration mechanism to control membership
  - Partition on a fault-tolerant shared SCSI disk

Cluster Abstractions (2)
- Resource Dependencies
  - Dependency trees: Sequence in which to bring resources online
- Resource Groups
  - Unit of migration
- Virtual servers
  - Application runs within a virtual server environment
  - Illusion to applications, administrators, and clients of a single stable environment
  - Client connects using the virtual server name
  - Enables many application instances to run on the same physical node

Cluster Abstractions (3)
- Cluster Configuration Database
  - Replicated at each node
  - Accessed through the NT registry
  - Updates applied using the Global Update Protocol

Cluster Membership Operation

Member Join
- Sponsor broadcasts the identity of the joining node
- Sponsor informs the joining node about
  - Current membership
  - Cluster configuration database
- The joining member's heartbeats start
- Sponsor waits for the first heartbeat
- Sponsor signals the other nodes to consider the joining node a full member
- Acknowledgement is sent to the joining node
- On failure
  - Join operation is aborted
  - Joining node is removed from the membership

Member Regroup
- Upon suspicion that an active node has failed, the member regroup operation is executed to detect any membership changes
- Reasons for suspicion:
  - Missing heartbeats
  - Power failures
- The regroup algorithm moves each node through 6 stages
- Each node sends periodic messages to all other nodes, indicating which stage it has finished
  - Barrier synchronization

Regroup Algorithm
- Activate: After a local clock tick, each node sends and collects status messages
  - A node advances if all responses have been collected or a timeout occurs
- Closing: It is determined whether partitions exist and whether the current node's partition should survive
- Pruning: All nodes that are pruned for lack of connectivity halt
- Cleanup phase one: All the surviving nodes
  - Install the new membership
  - Mark the halted nodes as inactive
  - Inform the cluster network manager to filter out the halted nodes' messages
  - Make the event manager invoke local callback handlers announcing node failures
- Cleanup phase two: A second cleanup callback is invoked to allow a coordinated two-phase cleanup
- Stabilized: The regroup has finished

Partition Survival
- A partition survives if any of the following is satisfied:
  - n(new membership) > 1/2 * n(original membership)
  - The following three conditions are satisfied together
    - n(new membership) = 1/2 * n(Origina
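
The majority condition listed under Partition Survival can be sketched in a few lines of Python. This is only an illustration, not MSCS code: the names `Verdict` and `partition_verdict` are hypothetical, and the comparison for the first condition is assumed to be "strictly more than half". The exactly-half case depends on additional conditions that are cut off in the slide text, so the sketch reports it as a tie instead of deciding it.

```python
from enum import Enum


class Verdict(Enum):
    """Hypothetical outcome labels for one partition after a regroup."""
    SURVIVES = "survives"
    HALTS = "halts"
    TIE = "tie"  # exactly half: needs the extra conditions, not modeled here


def partition_verdict(n_new: int, n_original: int) -> Verdict:
    """Apply the majority rule to one partition.

    n_new      -- number of nodes in the new (post-regroup) membership
    n_original -- number of nodes in the original membership
    """
    if n_original <= 0:
        raise ValueError("original membership must contain at least one node")
    if not 0 <= n_new <= n_original:
        raise ValueError("new membership must be between 0 and the original size")

    # Compare 2 * n_new with n_original to avoid fractional arithmetic.
    doubled = 2 * n_new
    if doubled > n_original:
        return Verdict.SURVIVES
    if doubled == n_original:
        # Assumption: the slide's second rule (three joint conditions)
        # applies here; its details are not available, so no decision is made.
        return Verdict.TIE
    return Verdict.HALTS
```

For example, `partition_verdict(2, 3)` reports that a two-node partition of a three-node cluster survives, while `partition_verdict(1, 2)` returns `Verdict.TIE`, the situation in which an arbitration mechanism such as the quorum resource would be needed.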