Graduation project on an FPGA-based fast image processing system (Chinese-English translation).doc
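Design of an FPGA-based fast image processing system

Abstract

We evaluate the performance of a hardware/software architecture intended to handle a wide variety of image processing tasks. The system architecture is based on a field-programmable gate array (FPGA) and a host computer. A LabVIEW application on the PC controls the frame grabber and the video capture from an industrial camera, and exchanges data with the FPGA over USB 2.0. The FPGA controller is based on an Altera Cyclone II chip and is built as a system-on-a-programmable-chip (SOPC) with an embedded Nios II core. The SOPC integrates the CPU, on-chip and external memory, the communication channel and the image data processing system. Transfer over the standard protocol and processing in the hardware/software logic are assessed for various frame sizes. A comparison with other solutions is given and a range of applications is discussed.

Keywords: Hardware/software co-design; Image processing; FPGA; Embedded processor

1 Introduction

Traditional hardware implementations of image processing generally use DSPs or application-specific integrated circuits (ASICs). However, the demand for higher speed and lower cost has shifted solutions toward field-programmable gate arrays (FPGAs), whose parallelism yields better performance. When an application requires real-time processing, such as video or television signal processing or the control of a robotic manipulator, the requirements are very strict and are better met by an FPGA. Computationally demanding functions such as filtering, motion estimation, two-dimensional discrete cosine transforms (2D DCTs) and fast Fourier transforms (FFTs) are better optimized on FPGAs. In terms of features, more hardware multipliers, larger memory capacity and a higher degree of system integration let FPGAs outperform conventional DSP designs.

Combining a computer-based imaging application with an FPGA-based parallel controller requires a hardware/software interface capable of high-speed transfer. The system presented here is a typical mixed hardware/software design. On one side, a host computer runs a LabVIEW imaging application equipped with a camera and a frame grabber. On the other side, an Altera FPGA development board runs the image filter and other system components. Image data are transferred at high speed over USB 2.0. The hardware parts and the control logic on the FPGA board are interconnected by the embedded Nios II processor, with USB 2.0 as the communication channel.

2 Design tools overview

The design of a DSP system on an FPGA often uses high-level algorithm development tools, such as MATLAB, together with hardware description languages. It can also use third-party intellectual property (IP) cores that implement typical DSP functions or high-speed communication protocols. In our application we use model-based design tools such as MathWorks Simulink to build the DSP design, generate HDL code from it, and then synthesize it together with the other hardware design files in Quartus II. SOPC Builder resides as a tool in the Quartus environment. Its purpose is to integrate the Nios II with external hardware logic or standard peripherals. SOPC Builder provides an interface fabric that interconnects the Nios II with external memory, the filters and the host computer.

3 Modeling and implementation of the filter design

The main goal of this work is to evaluate the image processing performance of a host/co-processor architecture, including the embedded Nios II and the USB 2.0 transfer between the host computer and the FPGA board. The capabilities of existing FPGA devices limit the image processing that can be implemented. For this purpose we built a typical image-processing application that targets the FPGA co-processor, comprising a noise filter and an edge detector. Noise reduction and edge detection are two elementary processes used in many machine vision applications, such as object recognition, medical imaging, lane detection in next-generation automotive technology, people tracking and control systems.

We model the noise and edge filters using the Altera DSP Builder libraries in Simulink. An example of this procedure can be found in [11]. Noise reduction uses a Gaussian 3 × 3 kernel, and edge detection uses typical Prewitt or Sobel filters. These functions can be combined in series to perform edge detection after noise reduction. Fig. 1 shows the block diagram of the filter design.

Fig. 1. Block diagram of the filter design.

Apart from the noise and edge filter blocks, there is a block of intermediate logic that coordinates the Nios II data and control paths with the timing of the filter blocks. This intermediate hardware fabric follows the Avalon interface [12]. The interface cannot be simulated in the Simulink environment and is instead inserted into the system as a Verilog file. The Avalon implementation consists of a 16-bit data input and output path, the corresponding read and write control signals, and a control interface that selects between the intermediate Gaussian-filter output and the edge-detector output. Data input and output pass through FIFO registers with the help of the logic blocks. Each received image frame is stored in an external SDRAM buffer and converted into a 16-bit data stream suitable for Nios II operation. Nios II code is discussed in Sections 5 and 6.

Incoming pixels are processed by a simple two-dimensional digital finite impulse response (FIR) convolution filter that works on the grayscale intensities of neighboring pixels within a 3 × 3 region. The buffering principle is shown in Fig. 2.

Fig. 2.

We assume an image size of 640 × 480 pixels. The buffer circuit is implemented in the same way for both filters. If the frame size changes, the circuit must be redesigned and recompiled. The number of delay blocks depends on the size of the kernel, while the delay depth depends on the number of pixels in each line. Deep delay lines are implemented in dedicated RAM blocks and therefore do not consume FPGA logic elements. Fig. 3 shows, from left to right, the original image, the Gaussian-filtered image and the edge-filtered image.

Fig. 3.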
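The line-buffering scheme of Section 3 can be modelled in software to make the data flow concrete. The following is a minimal Python sketch, not the authors' DSP Builder design. The function name `stream_convolve`, the 1-2-1 Gaussian weights with a divide-by-16 shift, and the horizontal Sobel mask are assumptions chosen for illustration, since the text does not list the kernel coefficients. A single delay line of 2 × width + 3 samples stands in for the one-pixel and one-line delay blocks.

```python
from collections import deque

# Assumed integer kernels (not given in the text): a 1-2-1 Gaussian
# approximation whose weights sum to 16, and the horizontal Sobel mask.
GAUSS_3X3 = ((1, 2, 1), (2, 4, 2), (1, 2, 1))
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))


def stream_convolve(pixels, width, kernel, shift=0):
    """Apply a 3x3 mask to a raster-ordered pixel stream.

    The delay line holds the last 2*width + 3 samples. The tap at
    offset r*width + c is the pixel r rows above and c columns to the
    left of the newest sample, so nine taps form the 3x3 window.
    The mask is applied without flipping (correlation form).
    """
    depth = 2 * width + 3
    taps = deque([0] * depth, maxlen=depth)  # zero-initialised delay line
    out = []
    for p in pixels:
        taps.appendleft(p)  # newest sample sits at index 0
        acc = 0
        for r in range(3):
            for c in range(3):
                acc += kernel[2 - r][2 - c] * taps[r * width + c]
        out.append(acc >> shift)  # shift=4 divides the Gaussian sum by 16
    return out
```

Each output sample belongs to the window centred one row and one column before the newest pixel, so the result lags the input by width + 1 samples. As in a hardware delay line, windows in the first two rows and columns of a frame contain start-up zeros or pixels wrapped from the previous line, and would be discarded or masked in practice. Changing `width` plays the role of the frame-size parameter that, in the hardware version, requires a redesign and recompile.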
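4 Embedded system design

The co-processor described above is implemented as a component attached to the Nios II processor. The role of the Nios II processor here is to handle the data stream. This kind of design is often used in industrial and academic projects. Once the synthesis software is installed, the Nios II is available as a component in Quartus. DSP Builder converts the designed model into HDL code so that it can be combined with other hardware components. With the synthesis software, the filter can be easily integrated into the SOPC and combined with the Nios II. The Nios II soft core and the other modules form a complete system, including external memory controllers, DMA channels and a custom USB high-speed communication IP core. A VGA controller can output the final result to a monitor. Functions of this kind can be implemented with open-source IP cores or with evaluation versions of IP cores from third-party vendors.

The USB 2.0 high-speed interface is added to the FPGA main board through a daughter board, as a system-level solution. The daughter board plugs into Altera main boards through the Santa Cruz peripheral connector. It provides a USB 2.0 transceiver based on the CY7C68000 PHY and a UTMI-compliant interface to the Nios II system, which integrates the USB controller function. In Section 8 we evaluate the actual performance of the IP core. Fig. 4 shows the FPGA design flow, and Fig. 5 shows the FPGA development board and the image acquisition part.

Fig. 4. FPGA design flow.

Fig. 5.

5 Nios soft-core design

After the Nios is configured, the Nios code is downloaded. The Nios code, written in C, serves two purposes:

(a) It controls hardware operations such as DMA transfers between hardware units. It also provides a programming interface to the data channels, controlled through API commands such as open, read, write and close.

(b) It lets the system perform simple software processing of the input data instead of using dedicated hardware. For example, Nios instruction code can convert the image array into a suitable one-dimensional data stream.

6 Activity flow

In terms of software and hardware activity, the operation of the mixed architecture can be summarized as follows:

(a) The image stream travels from the host computer over the USB 2.0 high-speed serial bus to the FPGA board. The application programming interface that moves data in and out over USB is described in the next section.

(b) On-chip DMA transfers the data from memory to the Nios data bus, and from there over Avalon to the hardware logic.

(c) The hardware accelerator processes the data stream.

(d) After the hardware logic has filtered the image data, the result is transferred back to memory by DMA.

(e) The final result is output to the VGA digital-to-analog converter channel, which, as a peripheral of the Nios processor, supports DMA transfers.

However, the digital-to-analog converter chip of the VGA interface does not convert all data in real time. A plausible alternative is therefore to return the data to the host computer over USB for further processing as simple image data. It should be noted that this design is not merely a black box for one specific application. It represents a design approach that can be widely customized.

7 Interface design and application

The PC-based application software and part of the vision system are implemented so as to suit various industrial applications. The system includes the Windows XP operating system, a Pentium 4 processor, a USB 2.0 high-speed host controller and an NI 1408 PCI frame grabber. The host application is a LabVIEW virtual instrument that controls image acquisition and performs initial image processing. Fig. 6 shows the LabVIEW control interface on the PC.

Fig. 6. LabVIEW control interface.

The frame grabber supports up to five industrial cameras performing different tasks. In our system a CCD camera captures full-size monochrome frames of 640 × 480 pixels, but the frames finally acquired are 320 × 240, which produces a smaller data volume that is easier to stream continuously. Communication between the LabVIEW main program and USB uses API functions and a dynamic link library. The advantage of LabVIEW is that it integrates an image processing platform capable of fast image processing or pre-processing. Once the FPGA board has received a complete image array, the system passes the image to the filter and, after filtering, sends the data to the buffer module of the VGA controller.

8 System performance evaluation

With the image capture setup described above, we sent test data to measure the USB receive and transmit performance between the PC and the FPGA board. In these tests the payload sent and received between host and target board was 307200 bytes (640 × 480 bytes). With version 12 of the Nios HAL driver, the receive rate reached 65 Mbit/s and the transmit rate reached 80 Mbit/s, and the full-speed transfer efficiency is stated as 9 seconds.

9 Comparison with other systems

Below we compare our system with other image processing solutions in terms of performance and flexibility. To this end we built other solutions and ran a series of experiments to obtain comparison data. We designed different filters to examine computational complexity. The results, compared with those obtained on a computer with a Pentium 4 processor and 512 MB of memory, are shown in Fig. 7.

Fig. 7.

10 Conclusions

This paper presented a design that combines a host computer with an FPGA and studied the image processing performance of the resulting system. It also represents a design approach that can be used for a wide range of custom applications, based on an FPGA programmable device with an embedded Nios processor.

Design and evaluation of a hardware/software FPGA-based system for fast image processing

J.A. Kalomiros, J. Lygouras

Abstract

We evaluate the performance of a hardware/software architecture designed to perform a wide range of fast image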
processing tasks. The system architecture is based on hardware featuring a Field Programmable Gate Array (FPGA) co-processor and a host computer. A LabVIEW host application controlling a frame grabber and an industrial camera is used to capture and exchange video data with the hardware co-processor via a high-speed USB 2.0 channel, implemented with a standard macrocell. The FPGA accelerator is based on an Altera Cyclone II chip and is designed as a system-on-a-programmable-chip (SOPC) with the help of an embedded Nios II software processor. The SOPC system integrates the CPU, external and on-chip memory, the communication channel and typical image filters appropriate for the evaluation of the system performance. Measured transfer rates over the communication channel and processing times for the implemented hardware/software logic are presented for various frame sizes. A comparison with other solutions is given and a range of applications is also discussed.

Keywords: Hardware/software co-design; Image processing; FPGA; Embedded processor

1 Introduction

The traditional hardware implementation of image processing uses Digital Signal Processors (DSPs) or Application Specific Integrated Circuits (ASICs). However, the growing need for faster and cost-effective systems triggers a shift to Field Programmable Gate Arrays (FPGAs), where the inherent parallelism results in better performance [1,2]. When an application requires real-time processing, like video or television signal processing or real-time trajectory generation of a robotic manipulator, the specifications are very strict and are better met when implemented in hardware [3-5]. Computationally demanding functions like convolution filters, motion estimators, two-dimensional Discrete Cosine Transforms (2D DCTs) and Fast Fourier Transforms (FFTs) are better optimized when targeted on FPGAs [6,7]. Features like embedded hardware multipliers, an increased number of memory blocks and system-on-a-chip integration enable video applications in FPGAs that can outperform conventional DSP
designs [2,8].

On the other hand, solutions to a number of imaging problems are more flexible when implemented in software rather than in hardware, especially when they are not computationally demanding or when they need to be executed sporadically in the overall process. Moreover, some hardware components are hard to re-design and transfer to an FPGA board from scratch when they are already a functional part of a computer-based system. Such components are frame grabbers and multiple-camera systems already installed as part of an imaging application or other robotic control equipment.

Following the above considerations, we conclude that it is often necessary to integrate components from an already installed computer-based imaging application, dedicated to some automation system, with FPGA-based accelerators that exploit the low-level parallelism inherent in hardware structures. Thus, a critical need arises for an embedded software/hardware interface that allows high-bandwidth communication between the host application and the hardware accelerators.

In this paper we apply and evaluate the performance of an example mixed hardware/software design that includes, on the one side, a host computer running a National Instruments (NI) LabVIEW imaging application equipped with a camera and a frame grabber and, on the other side, an Altera FPGA board [9] running an image filter hardware accelerator and other system components. The communication channel transferring image data from the host computer to the hardware board is a high-speed USB 2.0 port, implemented by means of an embedded macrocell. The various hardware parts and peripherals on the FPGA board are controlled and interconnected by a Nios-II embedded soft processor.

As a result of this evaluation, one can explore the range of applications suitable for a host/co-processor architecture including an embedded Nios-II processor and utilizing a USB 2.0 communication channel.

In the following we first give a short account of the tools we used for system design. We also
present an overview of the particular image filtering application we embedded in the FPGA chip for the evaluation of the host/co-processor system architecture. We describe the modular interconnection of different system parts and assess the performance of the system. We examine the speed and frame-size limits of such a design when it is dedicated to image processing. Finally, we compare our mixed host/co-processor, USB-based design with other architectures and other communications media.

2 Design tools overview

The design of a DSP system with FPGAs often utilizes both high-level algorithm development tools and hardware description language (HDL) tools. It can also make use of third-party intellectual property (IP) cores implementing typical DSP functions or high-speed communication protocols [1].

In our application we use model-based design tools like The MathWorks Simulink, based on MathWorks MATLAB, with the libraries of Altera's DSP-Builder. The DSP-Builder uses model design to produce and synthesize HDL code, which can then be integrated with other hardware design files within a synthesis tool like the Quartus II development environment. In the present work we designed image filter components using DSP-Builder libraries, and the resulting blocks were integrated with the rest of the system in Quartus System-On-a-Programmable-Chip (SOPC) Builder.

SOPC-Builder design software resides as a tool in the Quartus environment. Its purpose is to integrate an embedded software processor like Altera's Nios-II with hardware logic and custom or standard peripherals within an overall system, often called a System-On-a-Programmable-Chip (SOPC). SOPC-Builder provides an interface fabric in order to interconnect the Nios-II processing path with embedded and external memory, the filter co-processors, other peripherals and the channels of communication with the host computer.

Nios-II applications were written in ANSI C and were compiled and downloaded to the FPGA board by means of Altera's Nios II Integrated Development
Environment (IDE), a tool dedicated to assembling code for Nios processors. The purpose of Nios-II applications is to control processing and data streaming between the components of the system and its peripherals.

On the host side, one may develop a control application by means of any suitable language like C. We use LabVIEW software by National Instruments Corporation [10], which provides a very flexible platform for image acquisition, image processing and industrial control.

3 Modeling and implementation of the filter design

The main target of this work is to evaluate the performance of a host/co-processor architecture including an embedded Nios-II processor and utilizing a communication channel between host and hardware board, like a USB 2.0 channel. The task logic performed by the embedded accelerator can be any image function within the limitations of existing FPGA devices.

For our purpose we built a typical image-processing application in order to target the FPGA co-processor. It consists of a noise filter followed by an edge detector. Noise reduction and edge detection are two elementary processes required for most machine vision applications, like object recognition, medical imaging, lane detection in next-generation automotive technology, people tracking, control systems, etc.

We model noise and edge filtering using the Altera DSP-Builder libraries in Simulink. An example of this procedure can be found in [11]. Noise reduction is applied with a Gaussian 3 × 3 kernel, while edge detection is designed using typical Prewitt or Sobel filters. These functions can be applied combined in series to achieve edge detection after noise reduction. The main block diagram of our filter accelerator is shown in Fig. 1. Apart from noise and edge filter blocks, there is also a block representing the intermediate logic between the Nios-II data and control paths and our filter task logic. Such intermediate hardware fabric follows a specific protocol referred to as the Avalon interface [12]. This interface cannot be modeled in the
Simulink environment and is rather inserted in the system as a Verilog file. Design examples implementing the Avalon protocol can be found in Altera reference designs and technical reports [13]. In brief, our Avalon implementation consists of a 16-bit data-input and output path, the appropriate Read and Write control signals, and a control interface that allows for selection between the intermediate output from the Gauss filter or the output from the edge detector. Data input and output to and from the task logic blocks is implemented with the help of Read and Write instances of a 4800-byte FIFO register.

Each image frame, when received by the hardware board, is loaded into an external SDRAM memory buffer and is converted into an appropriate 16-bit data stream by means of Nios-II instruction code. Data transfer between external memory buffers and the Nios-II data bus is achieved through Direct Memory Access (DMA) operations controlled by appropriate instruction code for the Nios-II soft processor. Nios-II code flow for this system is discussed in Sections 5 and 6.

Fig. 1.

Incoming pixels are processed by means of a simple 2D digital Finite Impulse Response (FIR) filter convolution kernel working on the grayscale intensities of each pixel's neighbors in a 3 × 3 region. Image lines are buffered through delay lines, producing primitive 3 × 3 cells where the filter kernel applies. The line-buffering principle is shown in Fig. 2. A z^-1 delay block produces a neighboring pixel in the same scan line, while a z^-640 delay block produces the neighboring pixel in the previous image scan line.

We assume an image size of 640 × 480 pixels. The line-buffer circuit is implemented in the same manner for both noise and edge filters. Frame resolution is incorporated in the line-buffer diagram as a hardware built-in parameter. If a change in frame size is required, we need to re-design and re-compile. The number of delay blocks depends on the size of the convolution kernel, while delay line depth depends on the number of
pixels in each line. Each incoming pixel is at the center of the mask, and the line buffers produce the neighboring pixels in adjacent rows and columns. Delay lines with considerable depth are implemented as dedicated RAM blocks in the FPGA chip and do not consume logic elements.

Fig. 2.

After line buffering, pipelined adders and embedded multipliers calculate the convolution result for each central pixel. Fig. 3 shows the model design for implementation of the 3 × 3 Gauss kernel calculations. As shown in Fig. 3, model-based design transfers the necessary arithmetic into a parallel digital structure in a straightforward manner. Logic-consuming calculations like multiplications are implemented using dedicated multipliers available in medium-scale Altera FPGAs like the Cyclone II chip.

Fig. 3.

When the two filters work in combination, the output of the Gaussian kernel is input to a 3 × 3 Sobe