Bilingual Teaching in Practice: "Introduction to High Performance Computing".ppt
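Bilingual Teaching in Practice: Introduction to High Performance Computing, Xiaoge Wang (王小鸽), Department of Computer Science and Technology; Common Platform Division of the National Laboratory
2, Outline, Overview; Courseware examples; Experience and reflections; Looking ahead
3, Overview, Three stages in the development of the Introduction to High Performance Computing course: (1) Subject course taught in English (required course, 1998-2002). Positioning: replaces the Specialized English course. Characteristics: teaching is organized around high performance computing, with the use of English as the primary goal. (2) Subject course taught in English (elective, 2003-2006). Positioning: teaching subject knowledge is the goal, and English is the means. Characteristics: equal emphasis on subject knowledge and English training. (3) Bilingual subject course (elective, 2007-). Positioning: teaching subject knowledge is the goal, and bilingual instruction is the means. Characteristics: equal emphasis on subject knowledge and English training, with closer attention to how well the subject knowledge is taught.
4, Overview, Course organization. Textbook: an original English-language edition; a textbook rather than a monograph; a reprint edition is available in China. Class hours: 32 per semester. Format: lectures + discussion + labs + homework + exam.
5, Overview, Bilingual requirements. For the teacher: advance notice of upcoming material; lecturing in English with Chinese explanations where appropriate; questions set in English. For students: preparation before class; speaking in class in English with Chinese explanations where appropriate; homework in English is encouraged (bonus points). Exam: open book, dictionaries and notes allowed; written in English with a few Chinese annotations.
6, Courseware examples, Course introduction; Lecture notes example; Homework example; Exam questions and answer sheet examples
7, Courseware examples, Course introduction; Lecture notes example; Homework example; Exam questions and answer sheet examples
8, Introduction to High Performance Computing, Xiaoge Wang
9, Course Syllabus, Text book: [1] Ian Foster, "Designing and Building Parallel Programs" (Posts & Telecom Press, English edition). References: [1] Ananth Grama, Anshul Gupta, George Karypis, Vipin Kumar, "Introduction to Parallel Computing" (China Machine Press, Chinese and English editions). [2] Timothy G. Mattson, Beverly A. Sanders, Berna L. Massingill, "Patterns for Parallel Programming" (Tsinghua University Press, translated edition). [3] Michael Quinn, "Parallel Programming in C with MPI and OpenMP" (Tsinghua University Press, reprint edition).
10, Course Syllabus, Objectives: Answer the questions: What is HPC? Why HPC? How to do HPC? Learn some basic concepts, algorithms and tools of HPC. Improve English skills. Instruction, discussion, homework, presentation.
11, [Diagram: HPC. Concepts: task partition, scheduling, performance model. Tools: MPI, OpenMP, HPF. Algorithms: linear algebra, search, sort.]
12, Course Syllabus, Grade Policy: Homework 45%; Classroom Performance 15%; Final exam 40%. No tolerance for cheating. Office Hour (English corner): Tuesday, 8-9 pm, FIT Building, room 3-412.
13, Courseware examples, Course introduction; Lecture notes example; Homework example; Exam questions and answer sheet examples
14, Lesson One: Introduction
15, Introduction, What is HPC? Current development of HPC. Overview of concepts.
16, What is HPC?, Definition; Components; Applications
17, What is HPC? - Definitions, Definitions of High Performance Computing on the Web: A branch of computer science that concentrates on developing supercomputers and software to run on supercomputers. A main area of this discipline is developing parallel processing algorithms and software: programs that can be divided into little pieces so that each piece can be executed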
simultaneously by separate processors. The field of high performance computing (HPC) comprises computing applications on (parallel) supercomputers and computer clusters. Most ideas for the new wave of grid computing were originally borrowed from HPC.
18, What is HPC? - Components, Hardware: supercomputer, cluster, switch, network. Software: OS, shared/distributed memory management, file systems, parallel programming tools. Algorithm: parallel/distributed algorithm design.
19, What is HPC? - Applications, Modern science and engineering. Grand challenges: quantum chemistry, cosmology, astrophysics, CFD, material design, biology, genome sequencing, global weather and environmental modeling. Information technology: Web services, data mining, search engines, information retrieval.
20, Current Development of HPC, Trends in computer design; Trends in networking; Trends in software design
21, Current Development of HPC - Trends in Computer Design, High performance is still an important goal. Multicore technology is maturing. The multiprocessor is still the main architecture. The multicomputer is becoming the foundation of large-scale cyberinfrastructure (CI).
22, (no text extracted)
23, Earth Simulator, Based on the NEC SX architecture. 640 nodes, each with 8 vector processors (8 Gflop/s peak per processor), 2 ns cycle time, 16 GB shared memory. A total of 5,120 processors, 40 Tflop/s peak, and 10 TB memory. It has a single-stage crossbar (1,800 miles of cable), 83,000 copper cables, 16 GB/s cross-section bandwidth. 700 TB disk space, 1.6 PB mass store. Area of computer = 4 tennis courts, 3 floors.
24, (no text extracted)
25, BlueGene/L, Site: DOE/NNSA/LLNL. System Model: eServer Blue Gene Solution. Vendor: IBM. Application area: Research. Main Memory: 32768 GB. Installation Year: 2005. Operating System: CNK/SLES 9. Interconnect: Proprietary. Processor: PowerPC 440 700 MHz (2.8 GFlops).
26, BlueGene/L, BlueGene/L boasts a peak speed of over 360 teraFLOPS, a total memory of 32 tebibytes, total power of 1.5 megawatts, and machine floor space of 2,500 square feet. The full system has 65,536 dual-processor compute nodes. Multiple
communications networks enable extreme application scaling: Nodes are configured as a 32 x 32 x 64 3D torus; each node is connected in six different directions for nearest-neighbor communications. A global reduction tree supports fast global operations such as global max/sum in a few microseconds over 65,536 nodes. Multiple global barrier and interrupt networks allow fast synchronization of tasks across the entire machine within a few microseconds. 1,024 gigabit-per-second links to a global parallel file system support fast input/output to disk.
27, BlueGene/L by IBM
28, HPC in China, According to the HPC Top500 statistics (June 2007): The top system installed in China is listed 43rd in the Top500 (IBM). See also: for China Top 100 supercomputer list.
29, HPC Facilities in Tsinghua, TH-Discovery 3. Architecture: cluster with 128 nodes, 256 CPUs, 1.3 TFLOP/s peak performance. Node: HP Server rx2600, 4 GB PC2100 DDR-SDRAM memory quad (4 x 1 GB DIMMs). Storage: 200 TB. Software: Redhat Linux AS 3.0 ia64, kernel 2.4.21-20.EL; LSF Job Management System; MPI for parallel programming; mathematical libraries; ChinaGrid Monitor.
30, Current Development of HPC - Networking, High-speed interconnections: proprietary (IBM, Cray); commercial products: InfiniBand, Ethernet, Myrinet, Quadrics. Internet: Internet usage in China.
31, Internet Users in China, Data source: 中国互联网络发展状况统计报告 (January 2007), the annual report on the status of China's Internet development. Compared with two other sets of data (from the CIA World Factbook): World: 1,018,057,389 / 6,602,224,175 = 15.4%. US: 208,824,428 / 301,139,947 = 69.3%. China: 137,000,000 / 1,321,851,888 = 10.4%.
32, Internet Users in China, Facts: The rate of increase in the number of users has slowed. Internet users are only 10.4% of the total population (8.5% the previous year).
33, Internet Machines, *Internet hosts: China (43rd): 232,780 (2006) vs. US (1st): 195,139,000 (2005)
34, Internet Machines, Facts: The growth rate has increased slightly. Dial-in and special connections are decreasing, while broadband connections are increasing.
35, Bandwidth for International Links, Total
bandwidth going international reached 256,696 Mbps, an increase of 120,590 Mbps. The growth rate is 88.6%.
36, Current Development of HPC Grid Applications, Image Processing Grid: medical images for diagnosis; remote sensing image processing and application; digital human. Bio-Informatics Grid: sharing of resources (computation power). Online-Courses Grid: online courseware sharing; online course broadcast. Computational Fluid Dynamics Grid: sharing of software simulation tools. Information Processing Grid: digital museum.
37, History of Computing in Tsinghua, First computer degree program: 1956. Establishment of the Department of Computer Science and Technology: 1978. Establishment of the Computing Center: 1975. Single-user computer: DJS130. Imported mainframes: Honeywell, Fujitsu, IBM*PC Labs and campus information systems. Establishment of the Common Platform Division in TNLIST: 2004. TH-Discovery3 (2005).
38, Computer Systems Research, Computers made in Tsinghua: 1959-1964: J-911, vacuum tube. 1960: analog computer. 1966: J-112, transistor. 1972-1974: J-724, a real-time computer. 1974: DJS-100 series, integrated circuit. 1987: THUDS, concurrent computer, transputer. 1993: RISC processor. 1998: Linux cluster, peak 32 Gflops. 2003: TH-MANS, a massive storage networked system. 2005: TH-Discovery3, 25th in China Top100.
39, HPC Activities, Computational Science and Engineering Research (23 ongoing projects by Jan. 2006): Recombination rate estimation and hotspot detection in the human genome; The molecular evolution of microRNAs; Microstructures and Thermo-physical Properties of Alloy Melts and Their Effects on Solidification Structures; Parallel Computing of Fast Multipole Boundary Element Method; Investigation on the integral equation method for the numerical computation of electromagnetic fields; DNS of multiphase flow with mesh-less method; Efficient sub-graph mining algorithm and its applications; Theoretical study of the catalytic dissociation of hydrogen on Ni-Fe alloy surfaces; Investigation on unfolding dynamics of the smallest protein; Pattern recognition and molecular
validation on alternatively spliced genes; Computation optimization of thermo-acoustic engine.
40, Overview of Concepts, Parallel machine models; Parallel programming models; Parallel algorithm examples
41, Parallel Machine Models: The requirements. General: allow the study of algorithms and programming languages to be independent of improvements in architecture. Simple: to facilitate understanding and programming. Realistic: to ensure that programs developed for the model execute with reasonable efficiency on real computers.
42, The Von Neumann Computer: A central processing unit (CPU); a storage unit (memory); a control unit; an I/O unit.
43, The Multiplicity - From Von Neumann Machine to Modern Parallel Machines, Multiple computers; Multiple CPUs; Multiple function units; Multiple instruction execution; Multiple levels of cache; Multiple
44, Flynn's Taxonomy, [Table axes: data stream (single, multiple); instruction stream (single, multiple)]
45, Parallel Programming Models
46, Additional Properties of Parallel Software: Concurrency: each node executes its own program. Scalability: the number of nodes can vary. Locality: the cost of access to local memory is less than the cost of access to remote memory.
47, Parallel Program Requirements: A good parallel program has: Concurrency: the ability to perform many actions simultaneously. Locality: a high ratio of local memory access to remote memory access. Scalability: resilience to increasing processor counts.
48, Example, Bridge construction: A bridge is to be assembled from girders being constructed at a foundry. [Figure labels: (a) foundry, girders, bridge; (b) foundry, girders, bridge, request]
49, A Parallel Programming Model: Tasks and channels: One or more tasks that can execute concurrently. A task encapsulates a sequential program, local memory, and an interface to its environment (in-ports and out-ports). Four additional functions of a task: send and receive messages, create new tasks, and terminate. Channels: message queues connecting in-port/out-port pairs. The mapping (tasks to physical processors) does not affect the
semantics of a program.
50, Other Models: Message-passing: similar to the tasks and channels model. Shared-memory. Data parallel: A+B, 2*A, ... Other models: PRAM, BSP, C3, LogP, ... [Figure labels: foundry, bridge, storage]
51, Parallel Algorithm Examples
52, Scientific Computing: Mathematical models of real-world problems: PDE, ODE, etc. Numerical solution of mathematical problems: discrete methods (finite difference, finite elements, etc.); solving linear equations (direct method or iterative method). Implementation of numerical methods.
53, Finite Differences: To solve the equation $f''(x) = 0$, use the finite difference method: $f'(x - h/2) = (f(x) - f(x-h))/h + O(h^2)$; $f'(x + h/2) = (f(x+h) - f(x))/h + O(h^2)$; $f''(x) = ([f(x-h) - f(x)] - [f(x) - f(x+h)])/h^2 + O(h^2) = (f(x-h) - 2f(x) + f(x+h))/h^2 + O(h^2)$. Discretize: $f(x_{i-1}) - 2f(x_i) + f(x_{i+1}) = 0$, $i = 0, 1, \ldots, n-1$. Use an iterative method to solve the equations: $f(x_i)^{(t+1)} = (f(x_{i-1})^{(t)} + 2f(x_i)^{(t)} + f(x_{i+1})^{(t)})/4$, $t = 1, 2, \ldots, T$; $i = 0, 1, \ldots, n-1$.
54, Finite Differences: A vector X is used to contain N points of f(x) on the problem domain. Create N tasks, one for each point. Each task is given the initial value $f(x_i)^{(0)}$ and computes $f(x_i)^{(t)}$, $t = 1, 2, \ldots, T$. At each step it sends its data $f(x_i)^{(t)}$ on its left and right outports, receives $f(x_{i-1})^{(t)}$ and $f(x_{i+1})^{(t)}$ from its left and right inports, and uses these values to compute $f(x_i)^{(t+1)}$. [Figure: tasks numbered 1 to 8]
55, Pair-wise interactions: The computation of all N(N-1) pair-wise interactions $I(X_i, X_j)$, $i \neq j$, between N data $X_0, X_1, \ldots, X_{N-1}$. Parallel algorithm: Create N tasks. Task i is given $X_i$ and is responsible for computing the interactions $I(X_i, X_j)$, $i \neq j$. Q: How many communication channels are needed?
56, Pair-wise interactions: Answer #1: N(N-1) channels. Task i sends $X_i$ to its N-1 outports and receives $X_j$, $j \neq i$, from its N-1 inports. [Figure: tasks numbered 0 to 7]
57, Pair-wise interactions: Answer #2: N channels. Each task sends the most recently received data to its outport. Repeat N-1 times. [Figure: tasks numbered 0 to 7]
58, Pair-wise interactions: Answer #3 (symmetric case): N+N channels. Each task sends the most recently received data and the associated accumulator to its outport. Repeat (N-1)/2 times. [Figure: tasks numbered 0 to 7]
59, Search:
procedure search(A)
begin
    if (solution(A)) then
        score = eval(A)
        report solution and score
    else
        foreach child A(i) of A
            search(A(i))
        endfor
    endif
end
60, Search: A single task is created for the root of the tree. Create a new task for each search call. Create a channel for each new task to return to its parent any solutions located in its sub-tree. Q: Can the search be terminated completely when a solution is found?
61, Search: (no text extracted)
62, Parameter study: A range of different input parameters is read from an input file. The same computation is performed using different input values. The results of the different computations are written to an output file. [Figure labels: y=f(x); x1, x2, x3; y1, y2, y3]
63, Parameter study: Case 1: The execution time per problem is constant and each processor has the same computation power. [Figure labels: four y=f(x) tasks; x1 to x8; y1 to y8]
64, Parameter study: Case 2: The execution time per problem is not constant and/or the processors do not all have the same computation power. [Figure labels: I, O, W, W, W, W; x1, x2, x3, x4; y?]
65, Parameter study: Non-deterministic. Q: In what order are the computed results written? Q: On which processor is y5=f(x5) computed? Prefetching. Q: A worker that has sent a request to the input task has to wait for the parameter to arrive. Could the worker keep working while waiting for the response from the input task?
66, Summary, Overview of this course; Overview of HPC development; Overview of concepts: machine model, program model, algorithms
67, That's all for today. Next class: Programming with MPI. Thanks. Good bye.
68, Courseware examples, Course introduction; Lecture notes example; Homework example; Exam questions and answer sheet examples
69, Courseware examples, Course introduction; Lecture notes example; Homework example; Exam questions and answer sheet examples
70, Reflections, (1) "The original flavor": "Kumarajiva, of the fifth century A.D., one of the greatest translators of the Buddhist texts into Chinese, said that the work of translation is just like chewing food that is to be fed to others. If one cannot chew the food oneself, one has to be given food that has already been chewed. After such an operation, however, the food is bound to be poorer in taste and flavor than the original." "A translation, after all, is only an interpretation." Quoted from Feng Youlan, A Short History of Chinese Philosophy. (2) "Practice makes perfect": foreign-language ability comes from being pushed; foreign-language proficiency comes from practice; foreign-language potential comes from being tapped.
71, Looking ahead, External conditions keep improving: original-edition textbooks are being published; courseware resources are becoming richer; demand for English is growing; the authorities have policies that encourage it. The people involved are increasingly capable: English proficiency is improving across the board. Comments and corrections are welcome. Thank you!
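Answer #2 on the pair-wise interactions slides (slide 57), the ring of N channels, can be sketched as a sequential Python simulation. This is only an illustration under stated assumptions: the names `ring_pairwise`, `brute_force`, and `pull` are hypothetical and do not come from the course material, and summing the interactions into one accumulator per task is an assumption about what each task does with I(Xi, Xj).

```python
def ring_pairwise(xs, interact):
    """Simulate the N-channel ring: task i holds xs[i] and, over N-1
    steps, receives every other task's datum exactly once."""
    n = len(xs)
    totals = [0.0] * n      # one accumulator per task (assumed use of I)
    buffers = list(xs)      # the datum each task will forward next
    for _ in range(n - 1):
        # Every task sends its buffer to its right neighbour on the ring,
        # so task i now holds what task i-1 held in the previous step.
        buffers = [buffers[(i - 1) % n] for i in range(n)]
        for i in range(n):
            totals[i] += interact(xs[i], buffers[i])
    return totals


def brute_force(xs, interact):
    """Reference result: sum I(X_i, X_j) over all j != i directly."""
    return [
        sum(interact(xi, xj) for j, xj in enumerate(xs) if j != i)
        for i, xi in enumerate(xs)
    ]


def pull(a, b):
    """Hypothetical interaction I(a, b), chosen only for the demo."""
    return b - a


if __name__ == "__main__":
    data = [1.0, 4.0, 9.0, 16.0]
    print(ring_pairwise(data, pull))
    print(brute_force(data, pull))
```

Each task uses a single outport and a single inport, which is why N channels suffice; the price is N-1 communication steps instead of one.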