High-Performance Processors (《高性能处理器》): PPT lecture slides
2023/5/21, USTC CS AN Hong

Slide 1: Structural hazards: a structural hazard caused by memory access
- Instruction fetch and data access both go to the same memory.
- Detection is easy in this case! (In the diagram, a highlighted right half means read and a highlighted left half means write.)

Slide 2: Solution to the structural hazard: stall
- Instruction fetch is delayed by one cycle.

Slide 3: Control hazards: What's the problem?
- Diagram labels: "Need address here", "Compute address here", "Branch Delay".
- Example: BEQ rs, rt, offset    if R[rs] = R[rt] then PC ← PC + offset
- Branch handling splits into two subproblems:
  - Deciding the branch direction (branch condition dependence).
  - For taken branches, minimizing the execution delay by obtaining the target address as early as possible (branch address dependence).

Slide 4: Control Hazard Solution #1: Stall
- Stall: wait until the decision is clear.
- Impact: 2 lost cycles (i.e. 3 clock cycles per branch instruction) => slow.
- Move the decision to the end of decode: saves 1 cycle per branch.

Slide 5: Control Hazard Solution #2: Predict
- Predict: guess one direction, then back up if wrong.
- Impact: 0 lost cycles per branch instruction if right, 1 if wrong (right 50% of the time).
- More dynamic scheme: history of 1 branch (90%).

Slide 6: Control Hazard Solution #3: Delayed Branch
- Delayed branch: redefine branch behavior so that the branch takes effect after the next instruction.
- Impact: 0 clock cycles per branch instruction if an instruction can be found to put in the "slot" (50% of the time).
- Less useful as more instructions are launched per clock cycle.

Slide 7: Data Hazard on R1
- Read After Write (RAW): InstrJ tries to read an operand before InstrI writes it.
- Caused by a "dependence" (in compiler nomenclature). This hazard results from an actual need for communication.

Slide 8: Data Hazard on r1: Read after write hazard (RAW)
    add r1,r2,r3
    sub r4,r1,r3
    and r6,r1,r7
    or  r8,r1,r9
    xor r10,r1,r11

Slide 9: Data Hazard on r1: Read after write hazard (RAW)
- [Pipeline diagram: the five instructions of slide 8 in program order against time (clock cycles), passing through the stages IF, ID/RF, EX, MEM, WB.]
- Dependencies backwards in time are hazards.

Slide 10: Data Hazard Solution: Forwarding
- [Pipeline diagram: the same five instructions and stages.]
- "Forward" the result from one stage to another.

Slide 11: Forwarding (or Bypassing): What about Loads?
    lw  r1,0(r2)
    sub r4,r1,r3
- [Pipeline diagram: the two instructions against time (clock cycles), stages IF, ID/RF, EX, MEM, WB.]
- Dependencies backwards in time are hazards.
- A data hazard remains even with forwarding: it can't be solved with forwarding, so an instruction that depends on a load must be delayed (stalled).

Slide 12: Forwarding (or Bypassing): What about Loads?
- [Pipeline diagram: the same lw/sub pair, now with a "Stall" inserted before the dependent sub.]
- Dependencies backwards in time are hazards.
- A data hazard remains even with forwarding: it can't be solved with forwarding, so an instruction that depends on a load must be delayed (stalled).

Slide 13: Software Scheduling to Avoid Load Hazards
- Try producing fast code for a = b + c; d = e - f; assuming a, b, c, d, e, and f are in memory.
- Slow code:
    LW  Rb,b
    LW  Rc,c
    ADD Ra,Rb,Rc
    SW  a,Ra
    LW  Re,e
    LW  Rf,f
    SUB Rd,Re,Rf
    SW  d,Rd
- Fast code:
    LW  Rb,b
    LW  Rc,c
    LW  Re,e
    ADD Ra,Rb,Rc
    LW  Rf,f
    SW  a,Ra
    SUB Rd,Re,Rf
    SW  d,Rd
- Compiler optimizes for performance. Hardware checks for safety.

Slide 14: Data Hazard Solution (3): Out-of-Order Execution
- Need to detect data dependences at run time.
- Need for precise exceptions: out-of-order execution, in-order completion.
- [Timing table over cycles T1 to T12; stage sequence per instruction, with * marking a stall:]
    sub $2,$1,$3      IF ID EX ME WB
    add $14,$5,$4     IF ID EX ME WB
    sw  $15,100($6)   IF ID EX ME WB
    and $12,$2,$3     IF * ID EX ME WB
    or  $13,$6,$2     IF ID EX ME WB

Slide 15: Data Hazard Solution (4): Data Speculation
- In a wide-issue processor, e.g. 8-12 instructions per clock cycle:
  - The issue width is larger than a basic block (5-7 instructions).
  - Multiple branches: use multiple-branch prediction (e.g. trace cache).
  - Multiple data dependence chains: very hard to execute them in the same clock cycle.
- Value speculation is primarily used to resolve data dependences:
  - In the same clock cycle.
  - For long-latency operations (e.g. load operations).

Slide 16: Data Hazard Solution (4): Data Speculation
- Why is speculation useful? Speculation lets all these instructions run in parallel on a superscalar machine:
    addq $3 $1 $2
    addq $4 $3 $1
    addq $5 $3 $2
- What is value prediction? Predict the values of instructions before they are executed.
- Compare:
  - Branch prediction eliminates control dependences; the predicted data are just two values (taken or not taken).
  - Value prediction eliminates data dependences; the predicted data are taken from a much larger range of values.

Slide 17: Data Hazard Solution (4): Data Speculation
- Value locality: the likelihood of a previously seen value recurring repeatedly within a storage location.
- Observed in any storage location: registers, cache memory, main memory.
- Most work focuses on values stored in registers to break potential data dependences: register value locality.
- Why value prediction?
  - Results of many instructions can be accurately predicted before they are issued or executed.
  - Dependent instructions are no longer bound by the serialization constraints imposed by data dependences.
  - More parallelism can be exploited.
  - Prediction of values for dependent instructions can lead to beneficial speculative execution.

Slide 18: Redundant instructions
- If the dynamic instances of each static instruction generated during program execution are cached, every result-producing dynamic instruction falls into one of three types:
  - New-result instructions: dynamic instructions that produce a new value for the first time (5%).
  - Repeated-result instructions: dynamic instructions whose result equals that of another dynamic instance of the same static instruction (80%-90%).
  - Derivable instructions: dynamic instructions whose result can be derived from earlier results (5%).
- Redundant instructions: repeated-result instructions and derivable instructions.

Slide 19: Source of Value Locality (Sources of value predictability)
- Question: Where does value locality occur?
- How often does the same value result from the same instruction twice in a row?
    Instruction type                              Value locality?
    Single-cycle arithmetic (e.g. addq $1 $2)     Somewhat
    Single-cycle logical (e.g. bis $1 $2)         Yes
    Multi-cycle arithmetic (e.g. mulq $1 $2)      No
    Register move (e.g. cmov $1 $2)               Yes
    Integer load (e.g. ldq $1 8($2))              Yes
    Store with base register update               No
    FP multiply                                   Somewhat
    FP add                                        Somewhat
    FP move                                       Yes
    FP load                                       Yes

Slide 20: Source of Value Locality (Sources of predictability)
- Data redundancy: text files with white space, empty cells in spreadsheets.
- Error checking.
- Program constants.
- Computed branches.
- Virtual function calls.
- Glue code: allows calling from one compilation unit to another.
- Addressability: pointer tables store constant addresses loaded at run time.
- Call contexts: caller-saved/callee-saved registers.
- Memory alias resolution: conservative assumptions from the compiler regarding aliasing.
- Register spill code.

Slide 21: Three Generic Data Hazards
- Write After Read (WAR): InstrJ writes an operand before InstrI reads it.
- Called an "anti-dependence" by compiler writers. It results from reuse of the name "r1".
- Can't happen in the DLX 5-stage pipeline because:
  - All instructions take 5 stages, and
  - Reads are always in stage 2, and
  - Writes are always in stage 5.

Slide 22: Three Generic Data Hazards
- Write After Write (WAW): InstrJ writes an operand before InstrI writes it.
- Called an "output dependence" by compiler writers. It also results from reuse of the name "r1".
- Can't happen in the DLX 5-stage pipeline because:
  - All instructions take 5 stages, and
  - Writes are always in stage 5.
- WAR and WAW hazards will appear in more complicated pipelines.

Slide 23: Summary: factors that affect instruction-level parallelism
- Pipeline CPI = Ideal pipeline CPI + Structural stalls + RAW stalls + WAR stalls + WAW stalls + Control stalls
- Improve the ideal CPI: multiple issue (static/dynamic).
- Overcome the hazards in the pipeline:
  - Structural hazards: caused by resource conflicts. Solution: add resources.
  - Data hazards: caused by RAW, WAW, and WAR.
    - Software solutions: static scheduling by the compiler, loop unrolling, register renaming, software pipelining.
    - Hardware solutions: forwarding, register renaming, dynamically scheduled out-of-order execution (scoreboard, Tomasulo's algorithm).
  - Control hazards: caused by branches. Solution: static/dynamic prediction and speculative execution.

Slide 24: Summary: data dependences
- Within a basic block of a program, data dependences take the following forms.
- True data dependence: data flows between the two instructions, so they are genuinely dependent.
  - RAW (Read After Write) dependence: for instructions i and j, instruction j is RAW-dependent on instruction i if (1) j uses a result produced by i. Also, (2) if j is RAW-dependent on i and instruction k is RAW-dependent on j, then k is RAW-dependent on i.
- False data dependence (also called name dependence): the register or memory location an instruction uses is called a name. Two instructions use the same name, but no data flows between them. There are two cases:
  - WAR (Write After Read) dependence: for instructions i and j, if i executes first and the name written by j is a name read by i, then j is WAR-dependent on i (also called an anti-dependence).
  - WAW (Write After Write) dependence: for instructions i and j, if i and j write the same name, then j is WAW-dependent on i (also called an output dependence).

Slide 25: Summary: techniques for exploiting instruction-level parallelism
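
The RAW, WAR, and WAW definitions from the data-dependence summary can be sketched in a few lines of Python. This is a minimal illustration and not part of the original slides: the `Instr` record and the `dependences` function are hypothetical names, each instruction is assumed to write at most one register, and only a direct pair of instructions is checked (the transitive RAW case and any intervening write to the same name are ignored).

```python
from typing import NamedTuple, Optional, Set, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction record: one destination name, several source names."""
    text: str
    dest: Optional[str]
    srcs: Tuple[str, ...]


def dependences(i: Instr, j: Instr) -> Set[str]:
    """Classify how the later instruction j depends on the earlier instruction i."""
    found: Set[str] = set()
    if i.dest is not None and i.dest in j.srcs:
        found.add("RAW")  # j reads a name that i writes (true dependence)
    if j.dest is not None and j.dest in i.srcs:
        found.add("WAR")  # j writes a name that i reads (anti-dependence)
    if i.dest is not None and i.dest == j.dest:
        found.add("WAW")  # both write the same name (output dependence)
    return found


add = Instr("add r1,r2,r3", "r1", ("r2", "r3"))
sub = Instr("sub r4,r1,r3", "r4", ("r1", "r3"))

print(dependences(add, sub))  # add then sub: sub reads r1 written by add
print(dependences(sub, add))  # sub then add: add overwrites r1 read by sub
```

With the `add`/`sub` pair from the "Data Hazard on r1" slide, the first call reports a RAW dependence; swapping the order turns the same register reuse into a WAR dependence, which is why renaming the register removes it.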
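
The benefit of the "fast code" on the Software Scheduling to Avoid Load Hazards slide can be checked with a small stall counter. The sketch below assumes a 5-stage pipeline with full forwarding in which the only stall is a single cycle when an instruction reads a register loaded by the immediately preceding `LW`; the tuple encoding and the function names are hypothetical, chosen for illustration only.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding: (opcode, destination register or None, source registers).
Op = Tuple[str, Optional[str], Tuple[str, ...]]


def load_use_stalls(program: List[Op]) -> int:
    """Count one stall for each instruction that reads the register
    loaded by the instruction directly before it."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        if prev_op == "LW" and prev_dest in cur[2]:
            stalls += 1
    return stalls


def total_cycles(program: List[Op], depth: int = 5) -> int:
    """Cycles to drain the pipeline: fill time + one per extra instruction + stalls."""
    return depth + len(program) - 1 + load_use_stalls(program)


SLOW: List[Op] = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("ADD", "Ra", ("Rb", "Rc")),
    ("SW", None, ("Ra",)), ("LW", "Re", ()), ("LW", "Rf", ()),
    ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]
FAST: List[Op] = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("LW", "Re", ()),
    ("ADD", "Ra", ("Rb", "Rc")), ("LW", "Rf", ()), ("SW", None, ("Ra",)),
    ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]

print(load_use_stalls(SLOW), total_cycles(SLOW))
print(load_use_stalls(FAST), total_cycles(FAST))
```

Under these assumptions the slow sequence stalls twice (ADD after `LW Rc`, SUB after `LW Rf`), while the reordered sequence places an independent instruction after every load and does not stall at all.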
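
The question on the value-locality slide, "How often does the same value result from the same instruction twice in a row?", can be measured with a last-value predictor. The following is a minimal sketch under stated assumptions, not the scheme used in the slides: the table is unbounded and indexed by instruction address, the trace format `(pc, value)` is hypothetical, and the sample trace is made-up illustrative data rather than a measurement.

```python
from typing import Dict, Hashable, Iterable, Tuple

_MISSING = object()  # sentinel: no value has been recorded for this instruction yet


class LastValuePredictor:
    """Predicts that an instruction produces the same value as its previous instance."""

    def __init__(self) -> None:
        self.table: Dict[int, Hashable] = {}

    def predict(self, pc: int):
        return self.table.get(pc, _MISSING)

    def update(self, pc: int, value: Hashable) -> None:
        self.table[pc] = value


def value_locality(trace: Iterable[Tuple[int, Hashable]]) -> float:
    """Fraction of dynamic results that equal the previous result
    of the same static instruction (identified by pc)."""
    predictor = LastValuePredictor()
    hits = 0
    total = 0
    for pc, value in trace:
        if predictor.predict(pc) == value:
            hits += 1
        predictor.update(pc, value)
        total += 1
    return hits / total if total else 0.0


# Made-up trace: the instruction at 0x10 keeps producing 7,
# the one at 0x14 produces a new value every time.
trace = [(0x10, 7), (0x14, 1), (0x10, 7), (0x14, 2), (0x10, 7), (0x14, 3)]
print(value_locality(trace))
```

In this toy trace only the second and third instances of the instruction at `0x10` repeat the previous value, so two of six results are predictable. A real predictor would also need a bounded table and a confidence mechanism before speculating on the predicted value.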