Active-Active Data Center Solution Training Deck (双活数据中心解决方案课件.pptx)
Active-Active Data Center and Disaster Recovery Solution: Technical Section

Agenda

Overview of Virtualization-Based Business Continuity Solutions
[Diagram: local site and DR site]
- Disaster recovery: asynchronous replication at the virtualization layer; synchronous and asynchronous replication on hardware devices; automated application failover management; metro clusters
- Local high availability: application-aware high availability; zero-downtime protection for critical applications; live migration of VMs and dynamic allocation of compute and storage resources (vMotion and Storage vMotion)
- Data protection: efficient data backup and recovery; operations can be automated through run plans and scripts
Solution highlights:
- Independent of applications and operating systems
- Independent of hardware
- Comprehensive protection
- Simple and cost-effective

Agenda

The Active-Active Data Center Ensures Availability at Every Level

Active-Active Data Center Overall Architecture
[Diagram: a stretched vSphere cluster spanning Site A and Site B (up to 200 km apart) on top of an active-active storage cluster]
- Behaves the same as a single vSphere cluster
- Maximum stretch distance of 200 km, typically under 50 km
- Automatic DR protection through VMware HA and vMotion
- Requires an active-active storage cluster, such as EMC VPLEX or NetApp MetroCluster

Compute Resource Design

Making an Application Service Highly Available
- vSphere HA
- vSphere App HA

vSphere App HA
- Policy-based
- Protects off-the-shelf applications
[Diagram: VMware vFabric tc Server protected by vSphere App HA]

Fault Tolerance vs. High Availability
- Fault tolerance: ability to recover from component loss (example: hard drive failure)
- High availability

Fault Tolerance with Multi-vCPU Support
[Diagram: primary and secondary VMs with 4 vCPUs each on vSphere, kept in sync by fast checkpointing, with instantaneous failover]

Long-Distance vMotion
- vSphere 6.0 supports vMotion across Layer 3 networks and across vCenter Server instances

vCenter Availability
- Run the vCenter Server application in a VM
- Run the vCenter Server database in a VM
- Run both in the same VM?
- Protect with vSphere HA
  - Set the restart priority of the vCenter and database VMs to High
  - Enable guest OS and application monitoring
  - App HA can protect the SQL Server database
- Back up the vCenter Server VM and database
  - Image-level backup for the vCenter Server VM
  - Application-level backup, using an agent, for the database

Network Resource Design

Active-Active Data Center Network Architecture

NSX vSphere Multi-Site Use Cases
- NSX for vSphere supports three multi-site deployment models:
  - VXLAN with stretched clusters (vSphere Metro Storage Cluster)
  - VXLAN with separate clusters
  - L2 VPN
- All solutions provide L2 extension over an L3 network, enabling workload and IP mobility without the need to stretch VLANs
- Local egress is supported, but it adds complexity
- The appropriate deployment model depends on the customer's requirements and environment

NSX Uses Overlay Networks to Implement the Active-Active Data Center
[Diagram: active-active storage as a vSphere Metro Storage Cluster with Datastore 1 and Datastore 2; a single vCenter Server; Site A and Site B connected over an L3 network]

VMware NSX Multi-Site Single VC, Stretched Cluster
Solution Detail
- Requires a supported vSphere Metro Storage Cluster (vMSC) configuration
- In a vMSC deployment, storage is active/active and spans both sites. Examples of active/active storage are EMC VPLEX and NetApp MetroCluster (see the VMware HCL for more information)
- Stretched clusters support live vMotion of workloads
- Use L3 for all VMkernel networks: Management, vMotion, IP Storage
- All management components, such as vCenter Server, NSX Manager and the Controllers, are located in Site A
- Latency and bandwidth requirements are dictated by the vMSC storage vendor, e.g., 10 ms RTT for VPLEX, which also aligns with vMotion using Enterprise Plus
- vMSC enables disaster avoidance and basic disaster recovery (without the orchestration or testing capabilities of SRM)
- Loss of either the NSX components or the Datacenter Interconnect results in a fallback to data-plane-based learning using the existing network state, so data forwarding continues without an outage; however, while vCenter Server is unavailable, no VM provisioning or migration operations are possible
- NSX and vMSC are complementary technologies, and this design fits a sweet spot for NSX (single vCenter Server)

VMware NSX Multi-Site Single VC, Stretched Cluster
Cluster Configuration
- vMSC enables stretched clusters across two physical sites
- In an NSX deployment, the Management, Edge and Workload clusters are all stretched
- Under normal conditions, all management components run in Site A and are protected by vSphere HA. They are automatically restarted at Site B in the event of a site outage.
- The management network is not stretched and must be enabled at Site B as part of the recovery run book
- Depending on the design, NSX Edge Services Gateways are active either in both sites or in a single site, and can also leverage HA
- VMs in the workload clusters are recovered automatically

VMware NSX Multi-Site Single VC, Stretched Cluster
- In a vMSC environment, DRS is used to balance resource utilization, provide site affinity, improve availability and ensure optimal traffic flow
- Use "should" rules rather than "must" rules, as this allows vSphere HA to take precedence
- Example DRS groups, rules and settings for NSX Edges:

VMware NSX Multi-Site Single VC, Stretched Cluster
NSX Configuration (Option 1 - Preferred)
- The transport zone spans both sites, and VXLAN logical switches provide L2 connectivity to VMs
- Distributed logical routing is used for all VMs to provide a consistent default gateway vMAC
- Local egress is provided by using separate uplink LIFs and Edge GWs per site. Hosts in Site A have their DLR default gateway configured via the Site A Edge GW using the net-vdr CLI, while the Site B DLR default gateway is via the Site B Edge GW
- Caveat: dynamic routing cannot be enabled on the DLR, nor can a static route be set via NSX Manager
- NSX Edge Gateways will have a static route for any networks directly connected to the DLR
- Consistent IP addressing simplifies routing by allowing a supernet to be used
- The DFW provides vNIC policy enforcement independent of the VM's location
[Diagram: Web (172.16.10.0/24), App (172.16.20.0/24) and DB (172.16.30.0/24) logical switches with VM1-VM7 across Site A and Site B, attached to the Distributed Logical Router via internal LIFs (.1); Site A NSX Edge GW 192.168.10.1 on Uplink Net A 192.168.10.0/29 (Uplink A LIF 192.168.10.2); Site B NSX Edge GW 192.168.20.1 on Uplink Net B 192.168.20.0/29 (Uplink B LIF 192.168.20.2)]

VMware NSX Multi-Site Single VC, Stretched Cluster
NSX Configuration (Option 2)
- As in Option 1, the transport zone spans both sites and VXLAN logical switches provide L2 connectivity for VMs
- NSX Edge Gateways are deployed per site with the same internal IP address
- NSX DFW L2 Ethernet rules are defined, using MAC sets, to block ARP to the remote GW; this provides local egress because only the site-local Edge GW is learnt. A future enhancement is planned to enable the ESXi host object for the DFW*
- Caveats:
  - Traffic flow between application tiers may be asymmetric if the tiers are split across sites and DRS rules aren't used
  - Does not leverage distributed logical routing, and is limited to 10 vNICs per Edge
  - vMotion results in a network interruption, as the VM's ARP cache entry for the site-specific GW needs to time out
- Can be used if Option 1 isn't a fit (e.g., dynamic routing or vSphere 5.1 support is required)
[Diagram: VM1-VM3 on Logical Switch 192.168.10.0/24 across Site A and Site B]

VMware NSX Multi-Site Single VC, Separate Clusters (2)
[Diagram: Datastore 1 in Site A and Datastore 2 in Site B, a single vCenter Server, sites connected over an L3 network; Storage vMotion required for VM mobility]

VMware NSX Multi-Site Single VC, Separate Clusters
Solution Detail
- Separate vSphere clusters are used at each site, so DRS rules and groups are not required
- Storage is local to a site
- Enhanced vMotion (simultaneous vMotion and Storage vMotion) can provide live vMotion without shared storage
- Use L3 for all VMkernel networks: Management, vMotion, IP Storage
- All management components, such as vCenter Server, NSX Manager and the Controllers, are located in Site A
- The supported latency for Enhanced vMotion is 100 ms RTT (vSphere 6). vMotion requires 250 Mbps of bandwidth per concurrent vMotion
- This solution provides disaster avoidance where live vMotion is supported, by enabling workloads to be moved proactively between sites
- Does not provide automated disaster recovery

VMware NSX Multi-Site Single VC, Separate Clusters
Cluster Configuration
- Clusters do not span beyond a physical site
- All management components run in Site A and will not be automatically recovered in the event of a site outage. Storage replication to a standby cluster in Site B and a manual recovery process could be implemented
- Separate Edge and Workload clusters are used per site
- NSX Edge Services Gateways are active in a single site, with HA local to that site
- Workloads are active across both sites and can optionally support live vMotion
- DRS affinity rules for workloads are not required

VMware NSX Multi-Site Single VC, Separate Clusters
NSX Configuration
- Option 1 with distributed logical routing is unchanged from the stretched cluster configuration and is still recommended
- For Option 2, because vCenter objects are not shared, NSX DFW L2 Ethernet rules scoped to the datacenter can provide local egress, as only the site-local Edge GW is learnt. No enhancements are required
- The same caveats as for Option 2 with stretched clusters also apply
[Diagram: VM1-VM3 on Logical Switch 192.168.10.0/24 across Site A and Site B]

To Local Egress/Ingress, or Not To?
- As a first step, ask the customer whether they have stateful services for traffic entering and exiting the datacenter. This is generally the case, and if so they will require a solution that provides local ingress for their applications, e.g., NAT, GSLB, Anycast, LISP, RHI, etc.
- If they can address this, a multi-site NSX solution providing local egress is a good fit
- If they cannot, other questions to ask are:
  - Do they have high bandwidth between sites?
  - Is reducing operational complexity a goal?
- An active NSX Edge Gateway at one site, with failover to the secondary site, may meet the customer's requirements and is much simpler than providing local egress and ingress

VMware NSX Multi-Site L2 VPN (3)
[Diagram: Site A (or on-premises) and Site B (or off-premises), each with its own vCenter Server and datastore (Datastore 1, Datastore 2), connected by an SSL-based L2 VPN]

Storage Resource Design

Storage Requirements
[Diagram: Site A and Site B up to 200 km apart over dark fiber with DWDM at each end; MetroCluster aggregate (Aggr X, Plex1)]
- Latency requirements: vSphere requires an RTT within 100 ms; synchronous storage replication requires an RTT within 5 ms

Two Implementations of Metro Storage: Uniform and Non-Uniform

How vSphere Metro Storage Cluster Works
[Diagram: a vSphere HA cluster stretched across a campus or metro area on vMSC-certified MetroCluster storage, with array-based synchronous replication between mirrored plexes]

How vSphere Metro Storage Cluster Works
[Diagram: standard vMotion of virtual machines between sites within the vSphere HA cluster]

How vSphere Metro Storage Cluster Works
[Diagram: one site shut down for maintenance]

How vSphere Metro Storage Cluster Works
[Diagram: maintenance performed and the site restored; the plexes resynchronize automatically]

How vSphere Metro Storage Cluster Works
[Diagram: vSphere HA cluster on vMSC-certified NetApp MetroCluster storage]

Storage Device Selection
- Compatibility site: http:/
- Six categories of Metro Cluster storage:
  1. iSCSI
  2. FC
  3. NFS
  4. iSCSI-SVD
  5. FC-SVD
  6. NFS-SVD

EMC VPLEX for Stretched Metro Clusters
Established VPLEX active-active solution:
- Instant vMotion across distance
- VMware HA automatically restarts VMs at either site after a system or site failure
- Balance workloads across both sites with VMware DRS
- Supports VMware FT out of the box
Additional flexibility of VPLEX Metro:
- Doesn't require an FC cross-connect
- Choose IP or FC connectivity between sites
- Third-site IP connectivity to a witness VM
- No SPOF: if you lose a director, there is no loss of access at any site
[Diagram, labelled "Roadmap": a stretched vSphere cluster under one vCenter across Site A (active) and Site B (active), 10 ms apart over IP or FC, with VPLEX at each site presenting VPLEX distributed virtual volumes; dual-site DRS, dual-site HA, instant vMotion; Site C (optional witness)]

Stretched Storage with IBM SAN Volume Controller
- Single system image across two sites provides single-pane-of-glass management for day-to-day storage management activity
  - Simplifies management of your environment while deploying active-active storage
- Based upon a rich and mature platform
  - Provides Real-time Compression, Easy Tier, non-disruptive migrations and long-distance replication
  - 40,000 engines installed worldwide, 11 years of field experience
- 250+ storage devices supported to provide back-end capacity
  - Retain your existing investment in storage devices
  - Keep flexibility for the future
- Active quorum device enables automatic failover
  - No external management software
  - Prevents split-brain
  - Supports recovery in full unplanned site failure scenarios
[Diagram: SVC Stretched Cluster with Storage Pool 1 at Site 1, Storage Pool 2 at Site 2 and the quorum at Site 3]

Reference Guides from Storage Vendors
- Implementing VMware vSphere Metro Storage Cluster with HP LeftHand Multi-Site storage: http:/
- Implementing vSphere Metro Storage Cluster using HP 3PAR Peer Persistence: http:/
- Deploy VMware vSphere Metro Storage Cluster on Hitachi Virtual Storage Platform: http:/
- IBM SAN and SVC Stretched Cluster and VMware Solution Implementation: http:/
- VMware vSphere 5.5 vMotion on EMC VPLEX Metro: http:/

VSAN for Metro Cluster, 2015 Q3 (Planned)
[Diagram: a Virtual SAN cluster with Sites A, B and C as Fault Domains A, B and C; vmdk replicas at the two data sites and witness components at Site C]
- Upgrades from rack awareness to site awareness:
  1. A mini fault-tolerance site dedicated to the witness
  2. Data is read preferentially from the local site to improve performance

Agenda

RTO, RPO, and MTD
- Recovery Time Objective (RTO): how long it should take to recover
- Recovery Point Objective (RPO): the amount of data loss that can be incurred
- Maximum Tolerable Downtime (MTD): the downtime that can occur before significant loss is incurred. Examples: financial, reputational

The Three Building Blocks for Disaster Recovery
[Diagram: Backup and Recovery (VDP Advanced, ecosystem backup copies), Replication (vSphere Replication, array-based replication) and DR Orchestration (Site Recovery Manager), built on compute (vSphere) and storage (Virtual SAN, external storage), with VMware and ecosystem offerings in each]

Off-Site (Same-City) Disaster Recovery Solution: Overall Architecture

Off-Site (Same-City) Disaster Recovery Solution: Site Mapping Models
- Active-standby failover (Production site, Recovery site): the most common scenario; relatively expensive
- Active-active failover (Production site, Recovery site): the DR infrastructure is mainly used for non-production workloads such as test, development and training, which effectively lowers cost
- Bidirectional failover (Production site, Production site): both sites run production applications, and each site provides DR for the other
- Active-active data center (Site 1 and Site 2, both Production): applications can move freely between sites, with zero downtime for planned events; limited to metro distances

Network Resource Design
[Diagram: "Protected" site and "Recovery" site, each with storage holding VMFS/NFS datastores, linked by replication]

SRM with NSX for vSphere: Firewall Rules & Security Groups

SRM with NSX for vSphere
What has been validated:
- SRM can map VMs from one VXLAN logical switch on the primary site to a different logical switch on the recovery site
- These logical switches can be connected to pre-created NSX Distributed Logical Routers or NSX Edge Services GWs
- Placeholder VMs can be added to security groups, and in a DR event, when these VMs become active, they are protected by the DFW
- Dynamic routing can be used to advertise networks on the primary site. Using metric/weight, these networks can be re-advertised on the recovery site if there is a site failover
- This maps very closely to the vCAC deployment model for pre-created networks, which is used for production workloads.
- Test/Dev workloads using on-demand networking do not typically require DR
Currently being tested:
- Automating synchronization of the NSX Distributed Firewall ruleset and security groups between two NSX Managers
- Tying into SRM, so that when VMs are added to a protection group, the placeholder VMs are automatically added to the appropriate security groups
- Working closely with EMC, as part of their Enterprise Private Cloud Reference Architecture project, to turn this into a productized solution including vCAC

Logical Architecture View
[Diagram, "No Network Readdressing (Dynamic Routing)": at both the "Protected" and "Recovery" sites, 192.168.0.0/24 (gateway 192.168.0.1) on VXLAN behind a Distributed Logical Router (192.168.10.2, peer 192.168.10.1) running dynamic routing (OSPF, BGP), with VLAN uplinks 2.2.2.2 (2.2.2.0/28) at the protected site and 3.3.3.3 (3.3.3.0/28) at the recovery site; primary VMs at the protected site, placeholder VMs at the recovery site; vCenter + SRM and VMFS datastores at each site]

SRM with NSX for vSphere
[Diagram labels, truncated in the source: No Network Readdressing (Dynamic Routing), VXLAN, VLAN, vCenter + SRM, Primary VMs, Placeholder VMs, 192.168.0.0/24, 192.168.0.1, 2.2.2.0/28, 3.3.3.0/28, 3.3.3.3, Distributed Logical Router, Dynamic Routing (OSPF, BGP), Dyna]
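The SRM with NSX for vSphere slides state that dynamic routing advertises the protected networks from the primary site and that, using metric/weight, the same networks can be re-advertised from the recovery site after a site failover, so no readdressing is needed. The Python sketch below is a toy model of that idea, not NSX or SRM code. It assumes both sites advertise 192.168.0.0/24, takes the next hops 2.2.2.2 and 3.3.3.3 from the diagram labels, uses purely hypothetical metric values (10 and 100), and assumes a lower metric wins, as with OSPF cost (with BGP weight or local preference, the higher value wins instead). The `Advertisement`, `best_route` and `withdraw_site` names are hypothetical.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address, IPv4Network
from typing import Iterable, Optional


@dataclass(frozen=True)
class Advertisement:
    """One route advertisement as seen by an upstream router (toy model)."""
    prefix: IPv4Network
    next_hop: IPv4Address
    site: str
    metric: int  # lower is preferred, as with OSPF cost


def best_route(ads: Iterable[Advertisement], prefix: IPv4Network) -> Optional[Advertisement]:
    """Return the preferred advertisement for `prefix`, or None if no site advertises it."""
    candidates = [ad for ad in ads if ad.prefix == prefix]
    return min(candidates, key=lambda ad: ad.metric, default=None)


def withdraw_site(ads: Iterable[Advertisement], site: str) -> list[Advertisement]:
    """Model a site failure: every advertisement originated by `site` disappears."""
    return [ad for ad in ads if ad.site != site]


APP_NET = IPv4Network("192.168.0.0/24")

# Hypothetical metrics: the recovery site advertises the same prefix with a worse metric.
ADVERTISEMENTS = [
    Advertisement(APP_NET, IPv4Address("2.2.2.2"), "protected", metric=10),
    Advertisement(APP_NET, IPv4Address("3.3.3.3"), "recovery", metric=100),
]

if __name__ == "__main__":
    before = best_route(ADVERTISEMENTS, APP_NET)
    after = best_route(withdraw_site(ADVERTISEMENTS, "protected"), APP_NET)
    print(f"normal operation: {APP_NET} via {before.next_hop} ({before.site})")
    print(f"after failover:   {APP_NET} via {after.next_hop} ({after.site})")
```

Running it prints the protected-site next hop for normal operation and the recovery-site next hop once the protected site's advertisement is withdrawn; the VMs keep their 192.168.0.0/24 addresses throughout, which is the point of the "No Network Readdressing" design.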
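The storage slides quote RTT budgets (within 100 ms for vSphere, within 5 ms for synchronous storage replication, 10 ms for VPLEX) alongside site distances of up to 200 km. A back-of-the-envelope Python sketch relating the two follows. It assumes about 5 µs of one-way propagation delay per km of fiber (light travels at roughly 200,000 km/s in glass) and ignores DWDM, switch and array latency, so its results are lower bounds on RTT, not a substitute for the storage vendor's qualification figures. The helper names are hypothetical.

```python
# Assumption: ~5 microseconds of one-way propagation delay per km of fiber.
FIBER_US_PER_KM = 5.0

# RTT budgets quoted in the slides, in milliseconds.
RTT_BUDGETS_MS = {
    "vSphere": 100.0,
    "VPLEX Metro": 10.0,
    "synchronous storage replication": 5.0,
}


def propagation_rtt_ms(distance_km: float) -> float:
    """Round-trip propagation delay over `distance_km` of fiber, in milliseconds."""
    return 2 * distance_km * FIBER_US_PER_KM / 1000.0


def max_distance_km(rtt_budget_ms: float) -> float:
    """Longest fiber run whose propagation delay alone fits in the RTT budget."""
    return rtt_budget_ms * 1000.0 / (2 * FIBER_US_PER_KM)


if __name__ == "__main__":
    for distance in (50, 200):  # typical and maximum stretch distances from the deck
        rtt = propagation_rtt_ms(distance)
        fits = [name for name, budget in RTT_BUDGETS_MS.items() if rtt <= budget]
        print(f"{distance} km -> {rtt:.1f} ms RTT (propagation only); within: {', '.join(fits)}")
    for name, budget in RTT_BUDGETS_MS.items():
        print(f"{name}: {budget:g} ms RTT allows at most ~{max_distance_km(budget):,.0f} km of fiber")
```

Under these assumptions, 200 km of fiber costs about 2 ms of RTT before any equipment latency is added, which is consistent with the deck's 200 km upper bound fitting inside the 5 ms synchronous-replication budget while leaving headroom for DWDM and array overhead.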
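The stretched-cluster DRS guidance says to use "should" rules rather than "must" rules so that vSphere HA can take precedence. The toy Python model below illustrates why, under the assumption that a "must" VM-host rule is a hard constraint HA will not violate, while a "should" rule is a preference HA may ignore when no preferred host survives. The host names, the rule representation and the `restart_candidates` helper are hypothetical; this is not a vSphere API.

```python
from dataclasses import dataclass
from enum import Enum


class RuleType(Enum):
    MUST = "must"      # hard constraint: never placed outside the host group
    SHOULD = "should"  # soft preference: outside the group only as a fallback


@dataclass(frozen=True)
class VmHostRule:
    vm_group: str
    host_group: frozenset[str]
    rule_type: RuleType


def restart_candidates(rule: VmHostRule, surviving_hosts: set[str]) -> set[str]:
    """Hosts on which HA could restart a VM covered by `rule` (toy model)."""
    preferred = rule.host_group & surviving_hosts
    if preferred or rule.rule_type is RuleType.MUST:
        # Either a preferred host survived, or the rule forbids any other host.
        return set(preferred)
    # "Should" rule with no preferred host left: fall back to any surviving host.
    return set(surviving_hosts)


SITE_A = frozenset({"esx-a1", "esx-a2"})  # hypothetical Site A hosts
SITE_B = frozenset({"esx-b1", "esx-b2"})  # hypothetical Site B hosts

if __name__ == "__main__":
    survivors = set(SITE_B)  # Site A has failed
    for rule_type in RuleType:
        rule = VmHostRule("nsx-edge-site-a", SITE_A, rule_type)
        hosts = restart_candidates(rule, survivors)
        print(f"{rule_type.value}: restart on {sorted(hosts) or 'nowhere (VM stays down)'}")
```

In the Site A failure scenario, the "must" variant leaves the NSX Edge VM with nowhere to restart, while the "should" variant lets HA bring it up on Site B, which is the behaviour the slide's recommendation is after.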