DSPs: Back to the Future
W. Patrick Hays
Published online: 23 June 2004
Springer Science + Business Media B.V. 2004

Abstract
Our complex world is characterized by representation, transmission, and storage of information, and information is mostly processed in digital form. With the advent of DSPs (digital signal processors), engineers are able to implement complex algorithms with relative ease. Today we find DSPs all around us: in cars, digital cameras, MP3 and DVD players, modems, and so forth. Their widespread use and deployment in complex systems has triggered a revolution in DSP architectures, which in turn has enabled engineers to implement algorithms of ever-increasing complexity. A DSP programmer today must be proficient in not only digital signal processing but also computer architecture and software engineering.

Keywords: DSP; Motor control system

The Intel 2920 included on-chip D/A (digital/analog) and A/D (analog/digital) converters but lacked a hardware multiplier and soon faded from the market. An NEC project resulted in the NEC µPD7720, one of the most successful DSPs of all time. The Bell Labs DSP-1 and NEC µPD7720 were announced at ISSCC '80. DSP-1 achieved a 5-MHz clock speed, executing 1.25 million multiply-accumulates per second at four clock cycles each, enough to allow Touch-Tone receiver filters to execute in realtime.

The once formidable performance demands of the Touch-Tone receiver are now ludicrously easy, but new applications in turn arose throughout the last 20 years to put new demands on DSP technology (see figure 1).

According to Will Strauss, president and principal analyst at Forward Concepts, “DSP shipments were up a healthy 24 percent in 2003, and we are forecasting a bit higher growth for 2004, at 25 percent.
Longer term, we forecast a 22.6 percent compound growth rate through 2007.”

So the game has been: boost DSP performance, run the algorithm at an acceptable cost, and open up a new commercial market. It is perhaps too glib to project this trend indefinitely into the future. In fact, savvy analysts have periodically predicted the demise of the DSP. Will performance requirements outstrip the ability of programmable DSP architectures to keep up, thus demanding a new approach? Or, if DSPs are to maintain their historical growth curve, what kinds of tools and architectures are needed? Ultimately, these questions will be answered by creative architects, market competition, and application demands. The goal of this article is to illuminate current and future trends by reviewing how technology and application pressures have shaped DSP architecture in the past.

WHAT IS A DSP?
At the outset, it is important to distinguish between digital signal processing and digital signal processors. The techniques and applications of digital signal processing, as compared with analog signal processing, are well established and are more important commercially than ever. Throughout this article, DSP refers to the VLSI (very large-scale integration) processor component. Therefore, what special demands in digital signal processing make a DSP different from another programmable processor? In other words, what makes a DSP a DSP?

The Realtime Requirement. The essential application characteristic driving DSP architecture is the requirement to process realtime signals. Realtime means that the signal represents physical or “real” events. DSPs are designed to process realtime signals and must therefore be able to process the samples at the rate they are generated and arrive. Adding significant delay, or latency, to the output can be objectionable.
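The requirement to keep up with the arriving samples can be put in numbers. The following is a minimal sketch, not taken from the article: the function name `backlog_seconds` and all of the figures used with it are illustrative assumptions. It models a source that produces samples at a fixed rate and a processor that needs a fixed time per sample, and it reports how far the output has fallen behind the input after a given running time.

```python
def backlog_seconds(sample_rate_hz, seconds_per_sample, run_seconds):
    """Return how many seconds of signal are still unprocessed after run_seconds.

    Hypothetical model: samples arrive at a constant rate and each one takes
    a constant time to process. If that time fits inside one sample period,
    the lag stays at zero; otherwise the lag keeps growing.
    """
    arrived = sample_rate_hz * run_seconds               # samples produced so far
    capacity = run_seconds / seconds_per_sample          # samples the processor could finish
    processed = min(arrived, capacity)                   # it cannot process samples not yet arrived
    return (arrived - processed) / sample_rate_hz        # backlog expressed as seconds of signal


# Assumed figures: a 100-Hz signal has a 10-ms sample period.
print(backlog_seconds(100, 0.005, 1000))    # 5 ms per sample: keeps up
print(backlog_seconds(100, 0.0125, 1000))   # 12.5 ms per sample: falls steadily behind
```

Under these assumed figures the second call shows a processor that is only 25 percent too slow accumulating minutes of lag within a quarter of an hour, which is why the sample period acts as a hard limit and not as a target to be met on average.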
While high realtime rates often demand that DSPs be “fast,” fast and realtime are different concepts. For example, simulations of VLSI designs must be fast (the faster the better), but the application doesn't fail if the simulator completes a little slower. Conversely, a realtime application need not be fast. For example, a hospital room heart monitor doesn't need to be fast (30-Hz sample rate) but does need to be realtime; it would be disastrous if the processing of a sample took so long that after a few hours, the monitor was displaying five-minute-old data.

Not all digital signal processing applications require realtime processing. Many applications are performed offline. For instance, encoding high-fidelity audio for mastering CD-ROMs uses sophisticated digital signal processing algorithms, but the work isn't done in realtime. Consequently, a DSP isn't required; any old processor fast enough for the engineer to get home for dinner will do. To summarize, the most important distinguishing characteristic of DSPs is that they process realtime signals: the signals can be fast or slow, but they must be realtime.

Programmability. Do DSPs need to be programmable? No: it's quite feasible to process digital signals without a programmable architecture. In this article, however, DSP refers to programmable DSP, and more specifically to user-programmable DSPs, because my bias is that that's where the most interesting architectural issues lie. Often, the most demanding applications have required nonprogrammable architectures. For instance, first-generation programmable DSPs could execute a single channel of the 32-Kbps ADPCM/DLQ (adaptive differential pulse code modulation/dynamic locking quantizer) codec, whereas a special custom integrated circuit that was not programmable but deeply pipelined could run eight channels in the same technology. The reason for this is that programmability comes at a cost: every single operation in a programmable chip, no matter how simple, requires fetch-decode-execute.
That's a lot of silicon area and power devoted to, say, shifting left by two bits. Nonprogrammable architectures succeed when the shift-left-by-two-bits function is a small building block, allowing other building blocks to operate simultaneously. It's easy to imagine many building blocks working simultaneously to achieve a 10x performance advantage in nonprogrammable logic.

The problem with specialized DSP hardware is that you have to develop a new chip for each application. As development costs increase, the break-even point is constantly shifting in favor of using a programmable architecture.

More Power. Higher clock speed permits more instructions to be executed during a fixed time interval. In 1980, the Bell Labs team struggled to run DSP-1 at 5 MHz; today, in 130-nm technology, clock speeds greater than 500 MHz can be attained. The advantage of more instructions in a fixed time period can be used to achieve one or more of the following.
1. At a fixed data rate, more complicated algorithms can be programmed.
2. At a fixed data rate, more channels of the same algorithm can be programmed.
3. At a higher data rate, algorithms of similar complexity can be programmed.

An example of the first case is G.729A, a CELP (code-excited linear prediction) speech codec. It allows good quality at low data rates. The algorithm requires about 30 times more computations per sample than G.711 PCM. Examples of the second case are VoIP (voice over IP) applications, where four channels are supported for SoHo (small office/home office) products and up to 256 or more channels for CO (central office) products. Channel density is the key metric for VoIP processing.
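The second case, more channels at a fixed data rate, comes down to a cycle budget. The sketch below is only an illustration under stated assumptions: the function name `max_channels`, the 8-kHz sample rate, and the cost of 150 cycles per sample per channel are hypothetical values chosen for the arithmetic and are not figures from any codec or product mentioned above. It also assumes that the per-channel cost does not change with the number of channels.

```python
def max_channels(clock_hz, sample_rate_hz, cycles_per_sample):
    """Number of channels that fit between two consecutive samples.

    Hypothetical model: every channel costs the same fixed number of clock
    cycles per sample, and all channels share one processor.
    """
    cycles_available = clock_hz // sample_rate_hz    # clock cycles per sample period
    return cycles_available // cycles_per_sample     # whole channels that fit in the budget


# Assumed workload: 8-kHz sampling, 150 cycles per sample per channel.
print(max_channels(5_000_000, 8_000, 150))      # a 5-MHz clock
print(max_channels(500_000_000, 8_000, 150))    # a 500-MHz clock
```

With the algorithm and data rate held fixed, the channel count scales almost linearly with clock speed in this model, which is one way to read the statement that channel density is the key metric.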
An example of the third case is the MPEG-2 video compression algorithm applied to decode DVDs at different picture resolutions. The computational power required is directly proportional to the video resolution. Stretching MPEG-2 from NTSC (National Television System Committee) resolution to high definition requires not only a sixfold increase in processing power but also new blue-laser DVD technology for faster readout of the data from the disk.

In addition, advancing VLSI permits the programmable architecture to reduce power and/or cost for a fixed algorithm at a fixed data rate. Advancing technology conspires in many ways to move the boundary in favor of programmable DSPs. Applications that require highly specialized design today become programs for inexpensive DSPs tomorrow; costly, power-hungry DSPs today become the jelly beans of tomorrow. The past 25 years have seen the ascendancy of the user-programmable DSP as the dominant architectural approach to implementing digital signal processing applications.

DSP architecture is driven by a number of specialized application characteristics. Let's look at a few of these before returning to the architectural influence of the all-important realtime constraint. The DSP program must sustain processing at the realtime rate under all circumstances, and, in fact, the programmer must somehow know that this has been accomplished so that the application doesn't fail in the field. In other words, the DSP program must deterministically allocate realtime. Sources of indeterminacy, common in desktop CPUs, can be catastrophic for the DSP programmer.
For example, page faults and cache misses can cause hundreds of cycles to be missed by the CPU as it is idled while the operation is satisfied. If you must sample a value every microsecond, then the page fault or cache miss could cause the window to be missed. As a result, DSPs need either fixed memories or caches that can be locked after the program is booted. Other less critical examples of indeterminacy include branch prediction and data-dependent termination of functions such as “divide.” Although these features are nice for the average case, the DSP program must also allow for the worst case.

Deterministic allocation of realtime not only must be achieved, but traditionally DSPs have made it straightforward to achieve. In newer DSPs, realtime allocation is indeed knowable at compile time, but very careful profiling and iterative programming are often required to achieve the desired outcome.

ILLUSTRATION: THE TI TMS320C54XX
To bring the discussion down to earth, let's illustrate with a real DSP. Targeted at the cellphone, TI's TMS320C54xx was introduced in 1994; in a sense, it is the fruition of TI's 16-bit DSP product line, which started with the introduction of the TMS32010 in 1983 and moved through the C1x, C2x, C2xx, and C5x generations to the C54xx. Although strict compatibility wasn't maintained, the follow-on architectures were close enough for TI to migrate its growing customer base with each new product generation, much as Intel has done with the x86 family. Earlier TI DSPs sacrificed much performance to improve ease of use. Numerous other shortcomings, such as the lack of accumulator guard bits, were also rectified over the years. Table 1 shows how the TMS320C54xx has addressed each of the DSP features discussed here. For later comparison, the TMS320C62xx is also listed.
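The worst-case reasoning in the earlier discussion of indeterminacy (a sample every microsecond, a stall of hundreds of cycles) can be checked with simple arithmetic. This is a rough sketch under assumptions of my own: the function name `meets_deadline`, the 500-MHz clock, the 300-cycle compute cost, and the 300-cycle stall are hypothetical values, not measurements of any processor discussed here.

```python
def meets_deadline(clock_hz, sample_period_ns, compute_cycles, worst_case_stall_cycles):
    """Check whether the worst case still fits inside one sample period.

    Hypothetical model: the work per sample is a fixed number of cycles plus
    the largest stall (for example, one cache miss) that could occur.
    Integer arithmetic is used so the budget is exact.
    """
    budget_cycles = clock_hz * sample_period_ns // 1_000_000_000   # cycles per sample period
    return compute_cycles + worst_case_stall_cycles <= budget_cycles


# Assumed figures: 500-MHz clock, 1-microsecond sample period (500-cycle budget).
print(meets_deadline(500_000_000, 1_000, 300, 0))     # no stall possible (locked cache or fixed memory)
print(meets_deadline(500_000_000, 1_000, 300, 300))   # a single 300-cycle miss is possible
```

In this model the average case is irrelevant: a program that usually finishes in 300 cycles still misses its window if one stall can push a single sample past the budget, which is the argument for fixed memories or lockable caches.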
THE 16-BIT DSP RUNS OUT OF GAS
Another form of “accumulation,” other than the multiply-accumulate arithmetic operation, was taking place in the TI DSP product line: by the mid-1990s, the TI architecture had grown to more than 130 instructions. New specialized instructions are one way of improving performance (the way early DSPs used to meet cost goals). It became difficult to pack new instructions into the TMS320C54xx's burdened instruction set. Clock speed can be increased over time, but this does not take full advantage of advancing technology if the CISC instruction growth continues. Somehow the DSP architecture needed to find a way to use the extra transistors of later-generation IC technology to increase performance. Deeper pipelining gives little benefit because the deeper pipeline must benefit all critical paths, and CISC instructions have many complex critical paths. An alternative strategy, VLIW (very long instruction word) parallelism, boosts performance by executing multiple instructions in parallel. VLIW is relatively ineffective on CISC instruction sets because it's difficult to identify instructions that are commonly executed in parallel.

It is also important to note that compilers have had little success with complex 16-bit DSP instruction sets. Yet as higher clock speeds and larger local memories permit larger programs, the demand for good DSP compilers becomes paramount. Consequently, the 16-bit DSP is out of gas: it's too complicated to scale performance with Moore's law and too complicated to support good compilation. While the C54xx was running at 160 MHz using full custom 0.15-µm circuit design, the StrongARM RISC broke 600 MHz in 0.18 µm.

Faced with this crisis, in 1997 TI introduced the all-new 32-bit VelociTI instruction set with its TMS320C62xx architecture.
The TMS320C62xx has had enormous publicity as an eight-issue VLIW architecture (thus, the real instruction length is 8 x 32, or 256, bits, and it is possible to execute eight 32-bit instructions in parallel on the chip). Less remarked, but equally important, is that each instruction is a relatively simple 32-bit RISC-like instruction. In fact, it's ironic that RISCs have included multiply-accumulate instructions since the mid-1990s, but TI, the company that has shipped more multiply-accumulates than any other vendor, chose to “out-RISC the RISCs” by requiring a multiply followed by an add instruction to implement the common DSP kernel.

I call the new RISC-like DSP instruction sets “RISC-DSP.” For an illustration of RISC-DSP, let's return to the FIR filter program. We saw that the instruction count of the inner loop is nine times better in the DSP case than for the conventional RISC. Keep in mind, though, that clock speeds today are 100 times that of the 1980s, when DSPs started down the CISC path. As a result, the RISC can execute the FIR almost 10 times faster than a 1980s DSP but at one-tenth the speed of an optimized DSP architecture with the same clock speed, assuming of course that the ...

In 1980 the dial needed to be turned all the way to “DSP” in order to meet the minimal performance goals; today the architect can choose different points on the spectrum that give up some performance. At today's clock speeds, RISC-DSP performance will be sufficient for many applications and have other advantages as well.

Sources and destinations from a general-purpose register file are easily encoded in a 32-bit RISC-DSP instruction, making compilers more successful. Decoupling data loads from execution permits higher clock speed because data can be preloaded into a general-purpose register file.
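The “common DSP kernel” referred to above is the multiply-accumulate at the heart of an FIR filter. The following sketch is not the FIR program the article refers to, and it is not TI code; it is a plain illustration, with hypothetical function names, of the inner loop written as a multiply followed by a separate add, which is the decomposition the RISC-DSP style requires.

```python
def fir_sample(coeffs, history):
    """One FIR output sample: the sum of coeffs[k] * history[k] over all taps."""
    acc = 0
    for c, x in zip(coeffs, history):
        product = c * x          # the "multiply" step
        acc = acc + product      # the separate "add" step (a MAC would fuse the two)
    return acc


def fir_filter(coeffs, samples):
    """Run the filter over a sequence, newest sample first in the history."""
    history = [0] * len(coeffs)          # delay line, initially empty
    out = []
    for s in samples:
        history = [s] + history[:-1]     # shift the delay line by one sample
        out.append(fir_sample(coeffs, history))
    return out


# Feeding a unit impulse returns the coefficients themselves.
print(fir_filter([1, 2, 3], [1, 0, 0, 0]))
```

Each tap costs one multiply and one add in this form. A classic DSP instruction set folds both into a single multiply-accumulate, often together with the data fetches and pointer updates, which is where the large difference in inner-loop instruction count comes from.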
For each special feature, careful study of the potential number of instructions saved, critical path impact, interrupt overhead, and, of course, compilation is required.

To summarize, 32-bit RISC-DSP instruction sets have moved DSPs onto the historic RISC technology learning curve.

APPLICATIONS OF RISC + DSP
We've seen that the need for good tools and continued performance scaling has forced DSP architects to break with the complex 16-bit instruction sets of the past. RISC-DSP, however, really comes to fruition in applications combining both “RISC tasks” and “DSP tasks.” Applications of this type are proliferating commensurate with DSP applications on packet networks. An important example is the 3G wireless handset of the near future, with video communicat