High Performance, Efficient, and Power-Aware Hardware Design for Video Processing Systems

Despite remarkable advances in the processing speed of conventional computers, their processing power does not satisfy the high throughput demands of data-intensive computer vision applications. Application-specific integrated circuits (ASICs) can be designed to meet the processing speed requirements of a particular computer vision application; however, ASIC devices are not flexible enough to support a variety of computer vision algorithms. In this project, we attempt to design a massively parallel architecture that combines the high processing speed of an ASIC device with the flexibility of a general-purpose processor. Parallel architectures can achieve much higher processing rates than conventional computers because they exploit the inherent parallelism in computer vision applications, that is, the ability to carry out many operations simultaneously. The designed architecture is expected to process high-resolution images (typically 2 Mpixels) in real time (at least 30 frames per second). Some of the applications that the designed architectures would support are developed within the laboratory; they include image enhancement, skin extraction, face detection, face tracking, and face recognition.
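The real-time target stated above implies a sustained pixel throughput that can be checked with simple arithmetic. The following is a minimal sketch, assuming that "2 Mpixels" means 2,000,000 pixels per frame; the function name is illustrative only.

```python
def required_pixel_rate(pixels_per_frame: int, frames_per_second: int) -> int:
    """Return the sustained throughput, in pixels per second, needed for real-time processing."""
    return pixels_per_frame * frames_per_second


# Assumption: "2 Mpixels" is taken as 2,000,000 pixels per frame.
rate = required_pixel_rate(2_000_000, 30)
print(f"{rate / 1e6:.0f} Mpixels per second")  # 60 Mpixels per second
```

Under the same assumption, a datapath that accepts one pixel per clock cycle would need a clock of at least 60 MHz, before accounting for any blanking intervals.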

Design of high performance systolic-pipelined architecture for real-time enhancement of color video stream based on an Illuminance-Reflectance model

A high-performance digital architecture for a nonlinear image enhancement technique is designed and implemented in an FPGA environment. The enhancement is based on an illuminance-reflectance model, which improves the visual quality of digital images and video captured under insufficient or non-uniform lighting conditions. The algorithm involves a large number of complex computations and therefore requires a specialized hardware implementation for real-time applications. Systolic, pipelined, and parallel design techniques are used in an FPGA-based architecture to achieve real-time performance, and estimation techniques are introduced in the hardware algorithmic design to achieve high throughput. The video enhancement system is implemented on Xilinx’s multimedia development board, which contains a VirtexII-X2000 FPGA, and it is capable of processing approximately 63 megapixels (Mpixels) per second.
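The description above does not give the equations of the enhancement algorithm, so the following is only a software sketch of the general illuminance-reflectance idea, not the implemented design. It assumes that a pixel intensity is the product of illuminance and reflectance, that illuminance can be estimated by a 3x3 local mean, and that the illuminance is compressed with a power law. The function names, `gamma`, and `eps` are hypothetical choices made for illustration.

```python
def box_mean(img, r, c):
    """Mean of the 3x3 neighborhood around (r, c), clipped at the image border."""
    rows, cols = len(img), len(img[0])
    vals = [img[i][j]
            for i in range(max(0, r - 1), min(rows, r + 2))
            for j in range(max(0, c - 1), min(cols, c + 2))]
    return sum(vals) / len(vals)


def enhance(img, gamma=0.5, eps=1.0):
    """Brighten dark regions of an 8-bit grayscale image given as a list of rows.

    Assumptions (not from the source): illuminance is a 3x3 local mean and is
    compressed with a power law; eps avoids division by zero.
    """
    out = []
    for r in range(len(img)):
        row = []
        for c in range(len(img[0])):
            illum = box_mean(img, r, c) + eps               # estimated illuminance L
            refl = (img[r][c] + eps) / illum                # reflectance R = I / L
            illum_new = 255.0 * (illum / 255.0) ** gamma    # compress dynamic range
            row.append(min(255, max(0, round(illum_new * refl - eps))))
        out.append(row)
    return out
```

In the logarithmic domain, the division and the power law in this sketch become a subtraction and a multiplication, which is one reason logarithmic arithmetic is attractive for hardware implementations of this kind of model.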

Overview block diagram of FPGA-based video enhancement system

Original images

Enhanced images by software means

Enhanced images by hardware architecture

Demonstrations

Demo of video enhancement system (uniform lighting)

Demo of video enhancement system (nonuniform lighting)

Design and implementation of an efficient and power-aware architecture for skin segmentation in color video stream

An efficient, high-performance, power-aware architecture for extracting skin-like regions in a video stream is presented. Skin segmentation is an important step in many image processing and computer vision applications, such as face detection and hand gesture recognition. The design exploits the high correlation and similarity of neighboring pixels in video streams to reduce switching activity, and hence dynamic power dissipation, in the arithmetic unit. The proposed design is implemented and fitted in Altera’s Cyclone II FPGA, which is available on the DE2 development and education board. The pipelined system performs skin segmentation in real time at a processing rate of 654 frames per second for video frames of the standard 640×480 size. The proposed design is observed to reduce operations and switching activity in the processing unit by up to 42 percent, which lowers dynamic power dissipation at a low hardware overhead.
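The idea of exploiting neighboring-pixel similarity can be illustrated in software. The sketch below is not the implemented circuit: it uses the simplest possible case, in which a pixel identical to its left neighbor reuses the previous classification so that the arithmetic is not repeated. The rectangular Cb-Cr thresholds are placeholder values assumed for illustration, since the actual decision boundary is not given in the text, and all function names are hypothetical. The color conversion uses the standard JFIF full-range YCbCr equations.

```python
def rgb_to_cbcr(r, g, b):
    """Chrominance of an 8-bit RGB pixel (JFIF full-range YCbCr equations)."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def is_skin(pixel, cb_range=(77, 127), cr_range=(133, 173)):
    """Hypothetical rectangular decision boundary in the Cb-Cr plane."""
    cb, cr = rgb_to_cbcr(*pixel)
    return cb_range[0] <= cb <= cb_range[1] and cr_range[0] <= cr <= cr_range[1]


def segment_row(row):
    """Classify a row of RGB pixels, reusing the previous result when a pixel
    is identical to its left neighbor. Returns (mask, evaluations)."""
    mask, evaluations = [], 0
    prev_pixel, prev_result = None, False
    for pixel in row:
        if pixel != prev_pixel:           # new value: the arithmetic is exercised
            prev_result = is_skin(pixel)
            prev_pixel = pixel
            evaluations += 1
        mask.append(prev_result)          # repeated value: reuse the held result
    return mask, evaluations


row = [(200, 150, 120)] * 3 + [(0, 0, 255)] * 2
print(segment_row(row))  # ([True, True, True, False, False], 2)
```

In this example, five pixels are classified with only two evaluations of the decision rule. In hardware, holding the inputs of the arithmetic unit steady in the same situation would avoid toggling its internal nodes, which is the source of the dynamic power saving described above.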

Overview block diagram of FPGA-based video skin segmentation system

Decision boundary used in the skin classification procedure

Percentage of pixels that have 2, 3, or 4 insignificant bits (equal to 0) in 22 video sequences

Demonstrations

Hardware design capabilities

Low Power Design Approach

Neighborhood dependency considerations for low-power design
Switching activity control (logic level) in low-power digital design
Data dependency considerations in low power design of discrete cosine transform architecture
Data dependency considerations in low power design of 2D convolution architecture for video processing systems

Design Module

Design of log and anti-log computational modules
Quadrant symmetry design approach for 2-D convolution module
Design of an efficient multiplier-less architecture for multi-dimensional convolution
Hardware module for normalized cross-correlation computation
Multilane architecture for modular-PCA implementation
Design of custom logic for video interface modules
Design of CORDIC based trigonometric functional modules
Unidirectional CORDIC modules for asynchronous applications
VLSI architecture for pre-computation of rotation bits in unidirectional Flat-CORDIC
VLSI efficient discrete time cellular neural network processor

Application Specific Architectures

Systolic implementation of feedforward neural networks for pattern recognition
Parallel-pipelined design approach for nonlinear enhancement of color images
Pipelined architecture for distortion correction in wide-angle camera images
High performance architecture for multi-sensor image fusion
High storage capacity architecture for pattern recognition using an array of Hopfield neural networks
Hardware implementation of Fuzzy-ART based image compression
Vector processor based architecture for gradient and normal computation in real-time volume rendering
A parallel VLSI architecture for real-time segmentation of images with complex background environment
A generalized cellular neural network architecture for high storage capacity pattern recognition
A multilevel architecture for FPGA based implementation of feed-forward neural network for pattern recognition
A modular architecture for a recurrent neural network for character recognition
Systolic array implementation of block based Hopfield neural network for pattern association
System level design of real time face recognition architecture based on composite PCA
A fully pipelined architecture for barrel-distortion correction based on back mapping and linear interpolation
A flexible and efficient hardware architecture for real time face recognition based on eigenface approach
A real-time parallel system for video skin segmentation
A modular approach for a face detection system for real-time applications

Vision Lab

Dr. Khan Iftekharuddin