The Design and Architecture of the Microsoft Cluster Service (selected lecture slides, PPT)

The Design and Architecture of the Microsoft Cluster Service (MSCS)
W. Vogels et al.
ECE 845 Presentation by Sandeep Tamboli
April 18, 2000

Outline
- Prerequisites
- Introduction
- Design Goals
- Cluster Abstractions
- Cluster Operation
- Cluster Architecture
- Implementation Examples
- Summary

Prerequisites
- Availability = MTTF / (MTTF + MTTR)
  - MTTF: Mean Time To Failure
  - MTTR: Mean Time To Repair
- High Availability
  - A Modern Taxonomy of High Availability: a system that has sufficient redundancy in its components to mask certain defined faults has High Availability (HA).
  - IBM High Availability Services: The goals of high availability solutions are to minimize both the number of service interruptions and the time needed to recover when an outage does occur. High availability is not a specific technology nor a quantifiable attribute; it is a goal to be reached. This goal is different for each system and is based on the specific needs of the business the system supports.
  - The presenter: an HA system may have degraded performance while a component is down.

MSCS (a.k.a. Wolfpack)
- Extension of Windows NT to improve availability
- First phase of implementation
  - Scalability limited to 2 nodes
- MSCS features:
  - Failover
  - Migration
  - Automated restart
- Differences from previous HA solutions:
  - Simpler user interface
  - More sophisticated modeling of applications
  - Tighter integration with the OS (NT)

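The availability formula explains why failover helps. Failover shortens the effective repair time of a service even when the hardware fails just as often. The Python sketch below shows this. The MTTF/MTTR figures and the function names `availability` and `downtime_minutes_per_year` are hypothetical illustrations, not values or APIs from the MSCS paper.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # non-leap year


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability = MTTF / (MTTF + MTTR)."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


def downtime_minutes_per_year(avail: float) -> float:
    """Expected minutes per year during which the service is unavailable."""
    return (1.0 - avail) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Hypothetical node that fails on average once every 1000 hours.
    manual_repair = availability(1000, 1)       # assume 1 hour to repair the node
    failover = availability(1000, 1 / 60)       # assume ~1 minute to fail over
    for label, a in [("manual repair", manual_repair), ("failover", failover)]:
        print(f"{label}: A = {a:.6f}, ~{downtime_minutes_per_year(a):.1f} min/year down")
```

With these assumed inputs, the sketch gives about 525 minutes of yearly downtime for the one-hour repair. The one-minute failover gives under 9 minutes. MTTF is the same in both cases; only MTTR changed.
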
MSCS (2)
- Shared-nothing cluster model:
  - Each node owns a subset of the cluster resources
  - Only one node may own a resource at a time
  - On failure, another node may take ownership of the resource

Design Goals
- Commodity
  - Commercial off-the-shelf nodes
  - Windows NT Server
  - Standard Internet protocols
- Scalability
- Transparency
  - Presented to clients as a single system
  - System management tools manage the cluster as if it were a single server
  - Service and system execution information available in a single cluster-wide log

Design Goals (2)
- Availability
  - On failure detection:
    - Restart the application on another node
    - Migrate ownership of the other resources
  - Restart policy can specify the availability requirements of the application
  - Hardware/software upgrades possible in a phased manner

Cluster Abstractions
- Node: runs an instance of the Cluster Service
  - Defined and active
- Resource
  - Functionality offered at a node
  - Physical: printer
  - Logical: IP address
  - Applications implement logical resources, e.g. the Exchange mail database and SAP applications
- Quorum Resource
  - Persistent storage for the Cluster Configuration Database
  - Arbitration mechanism to control membership
  - Partition on a fault-tolerant shared SCSI disk

Cluster Abstractions (2)
- Resource Dependencies
  - Dependency trees: the sequence in which resources are brought online
- Resource Groups
  - Unit of migration
- Virtual servers
  - The application runs within a virtual server environment
  - Gives applications, administrators, and clients the illusion of a single stable environment
  - Clients connect using the virtual server name
  - Enables many application instances to run on the same physical node

Cluster Abstractions (3)
- Cluster Configuration Database
  - Replicated at each node
  - Accessed through the NT registry
  - Updates applied using the Global Update Protocol

Cluster Membership Operation
[Figure: cluster membership operation]

Member Join
- The sponsor broadcasts the identity of the joining node
- The sponsor informs the joining node about:
  - The current membership
  - The cluster configuration database
- The joining member's heartbeats start
- The sponsor waits for the first heartbeat
- The sponsor signals the other nodes to consider the joining node a full member
- An acknowledgement is sent to the joining node
- On failure:
  - The join operation is aborted
  - The joining node is removed from the membership

Member Regroup
- Upon suspicion that an active node has failed, the member regroup operation is executed to detect any membership changes
- Reasons for suspicion: missing heartbeats, power failures
- The regroup algorithm moves each node through 6 stages
  - Each node sends periodic messages to all other nodes, indicating which stage it has finished
  - Barrier synchronization

Regroup Algorithm
- Activate: after a local clock tick, each node sends and collects status messages
  - A node advances when all responses have been collected or a timeout occurs
- Closing: determines whether partitions exist and whether the current node's partition should survive
- Pruning: all nodes that are pruned for lack of connectivity halt
- Cleanup phase one: all the surviving nodes
  - Install the new membership
  - Mark the halted nodes as inactive
  - Inform the cluster network manager to filter out messages from the halted nodes
  - Make the event manager invoke local callback handlers announcing the node failures
- Cleanup phase two: a second cleanup callback is invoked to allow a coordinated two-phase cleanup
- Stabilized: the regroup has finished

Partition Survival
- A partition survives if any of the following is satisfied, where $n_{orig}$ is the size of the original membership and $n_{new}$ the size of the new membership:
  - $n_{new} > \frac{1}{2} n_{orig}$
  - All three of: $n_{new} = \frac{1}{2} n_{orig}$; $n_{new} \geq 2$; tiebreaker node $\in$ new membership
  - All three of: $n_{orig} = 2$; $n_{new} = 1$; quorum disk $\in$ new membership

Resource Management
- A resource control DLL for each type of resource
- The polymorphic design allows easy management of varied resource types
- Resource state transition diagram:
[Figure: resource state transition diagram]

Resource Migration: Pushing a Group
- Executed when:
  - A resource fails at the original node
  - The resource group prefers to execute at another node
  - The administrator moves the group
- Steps involved:
  - All resources are taken to the offline state
  - A new active host node is selected
  - The group is brought online at the new node

Resource Migration: Pulling a Group
- Executed when the original node fails
- Steps involved:
  - A new active host node is selected
  - The group is brought online at the new node
- Nodes can determine the new owner hosts without communicating with each other, with the help of the replicated cluster database

Resource Migration: Fail-back
- No automatic migration to the preferred owner
- Constrained by a fail-back window:
  - How long the node must have been up and running
  - Blackout periods
- Fail-back may be deferred for cost or availability reasons

Cluster Architecture
[Figure: cluster architecture]

Global Update Management
- Atomic broadcast protocol
  - If one surviving member receives an update, all the surviving members eventually receive the update
- The locker node has a central role
- Steps in normal execution:
  - A node wanting to start a global update contacts the locker
  - Once the locker accepts, the sender RPCs each active node to install the update, in node-ID order starting with the node immediately after the locker
  - Once the global update is over, the sender sends the locker an unlock request to indicate successful termination

Failure Conditions
- If all the nodes that received the update fail, the update is treated as never having occurred
- If the sender fails during the update operation:
  - The locker reconstructs the update and sends it to each active node
  - Nodes ignore the duplicate update
- If the sender and the locker both fail after the sender has installed the update at any node beyond the locker:
  - The next node in the update list is assigned as the new locker
  - The new locker completes the update

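The global update steps and the sender-failure case above can be sketched as a single-process Python simulation. This is a toy model, not MSCS code, and it rests on several assumptions:

- There is no real RPC; each "RPC" is a method call.
- The locker is simply a node ID passed in; these slides do not say how MSCS chooses it.
- The locker keeps a copy of the update when it grants the lock. That copy is what lets it replay the update if the sender dies.
- `ToyCluster`, `delivery_order`, `crash_before` and the per-update sequence numbers used for duplicate suppression are hypothetical names.

The case where the sender and the locker both fail is not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class Node:
    node_id: int
    alive: bool = True
    # Installed updates, keyed by a hypothetical global sequence number.
    applied: Dict[int, str] = field(default_factory=dict)

    def install(self, seq: int, update: str) -> None:
        # Idempotent: a duplicate delivery of the same sequence number is ignored.
        if seq not in self.applied:
            self.applied[seq] = update


class ToyCluster:
    """Single-process toy model of the global update protocol; no real RPC."""

    def __init__(self, node_ids: Iterable[int], locker_id: int) -> None:
        self.nodes = {i: Node(i) for i in sorted(node_ids)}
        self.locker_id = locker_id  # assumption: locker chosen by the caller
        self.next_seq = 1
        # Copy of the in-flight update, kept by the locker while the lock is held.
        self.pending: Optional[Tuple[int, str]] = None

    def delivery_order(self) -> List[int]:
        """Active non-locker nodes in node-ID order, starting just after the locker and wrapping."""
        ids = sorted(i for i, n in self.nodes.items() if n.alive and i != self.locker_id)
        return [i for i in ids if i > self.locker_id] + [i for i in ids if i < self.locker_id]

    def global_update(self, sender_id: int, update: str,
                      crash_before: Optional[int] = None) -> None:
        """Run one update; crash_before=k simulates the sender dying before the k-th delivery."""
        if self.pending is not None:
            raise RuntimeError("another global update holds the lock")
        if crash_before is not None and sender_id == self.locker_id:
            raise ValueError("this toy only models a sender that is not the locker")
        # 1. Sender contacts the locker; the locker accepts, keeps a copy and installs it.
        seq = self.next_seq
        self.next_seq += 1
        self.pending = (seq, update)
        self.nodes[self.locker_id].install(seq, update)
        # 2. Sender "RPCs" each active node, in order, to install the update.
        for position, node_id in enumerate(self.delivery_order()):
            if crash_before is not None and position == crash_before:
                self.nodes[sender_id].alive = False
                self.locker_replays()
                return
            self.nodes[node_id].install(seq, update)
        # 3. Sender sends the unlock request: successful termination.
        self.pending = None

    def locker_replays(self) -> None:
        """Sender failure: the locker resends its copy to every active node."""
        assert self.pending is not None
        seq, update = self.pending
        for node in self.nodes.values():
            if node.alive:
                node.install(seq, update)  # nodes that already have it ignore the duplicate
        self.pending = None


if __name__ == "__main__":
    cluster = ToyCluster([1, 2, 3, 4], locker_id=1)
    cluster.global_update(sender_id=3, update="x=1")
    # Sender 3 reaches node 2 and then dies; locker 1 replays the update,
    # so node 4 receives it and node 2 ignores the duplicate.
    cluster.global_update(sender_id=3, update="y=2", crash_before=1)
    for node in cluster.nodes.values():
        print(node.node_id, node.alive, node.applied)
```

Making `install` idempotent is what lets the locker resend to every active node without tracking who already received the update. This matches the "nodes ignore the duplicate update" rule on the Failure Conditions slide.
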
Support Components
- Cluster Network: extension to the basic OS
  - Heartbeat management
- Cluster Disk Driver: extension to the basic OS
  - Shared SCSI bus
- Cluster-wide Event Logging
  - Events sent via RPC to all other nodes (periodically)
- Time Service
  - Clock synchronization

Implementation Examples
- MS SQL Server
  - A SQL Server resource group configured as a virtual server
  - A 2-node cluster can host 2 or more HA SQL Servers
- Oracle servers
  - Oracle Parallel Server
    - Shared-disk model
    - Uses MSCS to track cluster organization and membership notifications
  - Oracle Fail-Safe server
    - Each instance of a Fail-Safe database is a virtual server
    - Upon failure:
      - The virtual server migrates to the other node
      - The clients reconnect under the same name and address

Implementation Examples (2)
- SAP R/3
  - Three-tier client/server system
  - Normal operation:
    - One node hosts the database virtual server
    - The other provides the application components, combined in a server
  - Upon failure:
    - The failed virtual server migrates to the surviving node
    - The application servers are failover-aware
    - Migration of the application server requires a new login session

Scalability Issues
[Figures: join latency, regroup messages, GUP latency, GUP throughput]

Summary
- A highly available 2-node cluster design using commodity components
- The cluster is managed in 3 tiers:
  - Cluster abstractions
  - Cluster operation
  - Cluster Service components (interaction with the OS)
- The design does not scale beyond about 16 nodes

Relevant URLs
- A Modern Taxonomy of High Availability: interlog/resnick/HA.htm
- An overview of Clustering in Windows NT Server 4.0, Enterprise Edition: microsoft/ntserver/ntserverenterprise/exec/overview/clustering.asp
- Scalability of MSCS: cs.cornell.edu/rdc/mscs/nt98/
- IBM High Availability Services: as.ibm/asus/highavail2.html
- High-Availability Linux Project: linux-ha.org/

Discussion Questions
- Is clustering the only choice for HA systems?
- Why is MSCS in use today despite its scalability concerns?
- Does performance suffer because of HA provisions? Why?
- Are geographical HA solutions needed (to take care of site disasters)? This is good for transaction-oriented services. What about, say, scientific computing?
- Hierarchical clustering?

Glossary
- NetBIOS: Short for Network Basic Input Output System, an application programming interface (API) that augments the DOS BIOS by adding special functions for local-area networks (LANs). Almost all LANs for PCs are based on NetBIOS. Some LAN manufacturers have even extended it, adding additional network capabilities. NetBIOS relies on a message format called Server Message Block (SMB).
- SMB: Short for Server Message Block, a message format used by DOS and Windows to share files, directories and devices. NetBIOS is based on the SMB format, and many network products use SMB. These SMB-based networks include LAN Manager, Windows for Workgroups, Windows NT, and LAN Server. There are also a number of products that use SMB to enable file sharing among different operating system platforms. A product called Samba, for example, enables UNIX and Windows machines to share directories and files.
