Seria ELECTRONICĂ și TELECOMUNICAȚII TRANSACTIONS on ELECTRONICS and COMMUNICATIONS

Volume 53(67), Issue 1, 2008

# FPGA Implementation of Morphological Decomposition Filters for Image Contrast Enhancement

Radu Oprean, Alin Brindusescu, Ioan Jivet<sup>1</sup>

Abstract - The paper presents the results of a prototype FPGA implementation of a morphological multiple-dimension kernel filter for image contrast enhancement. The main objective of the work was the optimization of silicon area and frame throughput; real-time operation was the second constraint of the target application. A mixed schematic and VHDL/Verilog description of the decomposition filters was synthesized, and the performance of the architecture was found adequate for real-time operation. An extension of the architecture with a soft microprocessor for the contrast enhancement calculation is also presented.

Keywords: morphological image filtering, multidimensional kernel, FPGA implementation.

# I. INTRODUCTION

The demand for high computational volume and speed in image processing is a well-known issue. Real-time applications of image processing algorithms often cannot be implemented in software alone. Hardware acceleration solutions implemented in FPGAs are becoming increasingly popular [1] [2] [3].

The major difficulty in using morphological filters in real-time image processing applications is their inherent complexity and very intensive computation.

The paper presents an FPGA implementation of a morphological multiple-dimension kernel filter for image contrast enhancement.

Section II presents the image enhancement principle used in the implementation. The target application is images with a wide dynamic range, for which direct human perception is far from optimal.

Section III describes the architecture of the FPGA implementation of the morphological image decomposition. A single-scale variant using structuring elements (SE) of multiple sizes was chosen for implementation.

Successive open-closing operations using a single elementary (3x3) SE were also considered. This solution required far more hardware resources than the multiple-size SE solution, mainly because of the storage needed for the intermediate images between successive morphological filtering passes.

Section IV presents the conclusions and an outline of further work.

# II. MORPHOLOGICAL DECOMPOSITION FILTER

Contrast enhancement methods are abundant in recent literature [5], [6], [8]. It is also well known that application-specific enhancement methods usually need to be developed.

The implementation of morphological filters in the present paper closely follows recent work that proposed a human visual system (HVS) centered approach to contrast enhancement [4].

Hardware implementations of image processing algorithms aim to cope with the complexity of the algorithm and the large amount of computation it requires [9], [10], [11]. The chosen FPGA solution decomposes the image into a pyramid of content details. The decomposed images form the basis for local contrast enhancement before the final assembly of the processed image.

Based on the adaptation and sensitivity properties of human vision, the proposed method performs a decomposition followed by local contrast enhancement.

An outline of the principle of morphological multiresolution decomposition is presented in Fig. 1.



Fig. 1. Morphological filtering principle ($f(x,y)$: original image; $OC_B$: morphological opening-closing with SE $B$; $d_k(x,y)$: detail image at level $k$).

<sup>1</sup> Faculty of Electronics and Telecommunications, Communications Department, Bd. V. Pârvan Nr. 2, 300223 Timișoara, e-mail ioan.jivet@etc.upt.ro

The *dilation* and *erosion* operators for a structuring element (SE) $\mathbf{B}(j,k)$ are defined as:

$$\mathbf{D}(\mathbf{A},\mathbf{B})(r,s) = \max_{\mathbf{B}(j,k)} \left( \mathbf{A}(r-j,\,s-k) + \mathbf{B}(j,k) \right) \tag{1}$$

$$\mathbf{E}(\mathbf{A},\mathbf{B})(r,s) = \min_{\mathbf{B}(j,k)} \left( \mathbf{A}(r-j,\,s-k) - \mathbf{B}(j,k) \right) \tag{2}$$

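A succession of dilation and erosion operators defines the morphological closing and opening operations as follows:

$$\mathrm{Close}(\mathrm{Image},\mathbf{B}) = \mathbf{E}(\mathbf{D}(\mathrm{Image},\mathbf{B}),\mathbf{B}) \tag{3}$$

$$\mathrm{Open}(\mathrm{Image},\mathbf{B}) = \mathbf{D}(\mathbf{E}(\mathrm{Image},\mathbf{B}),\mathbf{B}) \tag{4}$$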

The multi-resolution representation followed in the present work is based on a single-scale morphological decomposition. The classes of objects are selected at each level using a *closing-opening (CO)* morphological operator pair. It can be shown that this succession of operators acts similarly to a low-pass filter with respect to the size (area) of image objects. The selective detail sub-image is obtained by subtracting the filtered image from the original.
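The decomposition of Fig. 1 can be written compactly as below. This is a sketch under an assumption: that each level filters the output of the previous level, as Fig. 1 and the *no detail image* description suggest. $B_k$ denotes the SE of level $k$, and $\mathrm{CO}_{B}(f) = \mathrm{Open}(\mathrm{Close}(f,B),B)$ follows equations (3) and (4):

$$
f_0 = f, \qquad f_k = \mathrm{CO}_{B_k}(f_{k-1}), \qquad d_k = f_{k-1} - f_k, \qquad k = 1,\dots,5
$$

$$
f = f_5 + \sum_{k=1}^{5} d_k
$$

The sum telescopes, so under this reading the original image is recovered exactly from the five detail images and the no detail image $f_5$. Because closing can raise and opening can lower gray levels, $d_k$ can be negative: for 8-bit input, $d_k \in [-255, 255]$ needs a 9-bit signed representation.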

The SE matrix size ranges from 3 to 33, taking the values $\{3,5,9,17,33\}$. The SE at each level is the dilation of the previous level's SE with itself.
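These sizes follow from the self-dilation rule: dilating a flat $n \times n$ square with itself gives a $(2n-1) \times (2n-1)$ square. Assuming flat square SEs, the recurrence and its closed form are:

$$
n_1 = 3, \qquad n_{k+1} = 2n_k - 1 \;\Longrightarrow\; n_k = 2^k + 1, \qquad \frac{n_k - 1}{2} = 2^{k-1}
$$

This gives $n_k = 3, 5, 9, 17, 33$ with half-widths $1, 2, 4, 8, 16$ for $k = 1,\dots,5$.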

Five levels of detail have been shown to be sufficient for contrast enhancement within a given image intensity range. Larger objects in the image, corresponding to SEs larger than 33, have been shown not to need contrast enhancement: such detail areas subtend more than 1° of the viewer's field of view and are subject to the background adaptation of the eye [4].

At each level the output is a portion of the details (*objects*) of the original image. The output of the fifth filter level is an image that still carries a considerable amount of information, called the *no detail image*.



Fig. 2. Example of the target application: contrast enhancement for wide dynamic range images.

According to the original work [4], the spatial frequencies selected at each level of detail can be chosen to be optimal for HVS perception. The sensitivity parameter values were determined from a contrast perception study.

An alternative to a single-scale decomposition with SEs of multiple sizes is the use of a single elementary structuring element. The levels of detail can then be obtained by multiple successive applications of the elementary erosion and dilation operators. Such an architecture is attractive for hardware implementation since it promises simple circuits.

Storing the intermediate images while they wait for the next elementary filtering pass increases the memory requirements. Since storage is the major resource bottleneck of this solution, it loses its simplicity advantage. The real drawback of this architecture, however, is the exponential increase in the computation resources or time required.
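The exponential growth can be made concrete with a short count. Assume flat square SEs and that each level is computed independently from its input. Then $m$ successive $3 \times 3$ dilations (or erosions) are equivalent to one $(2m+1) \times (2m+1)$ operation, so a level with SE size $n_k \in \{3,5,9,17,33\}$ needs $m_k = (n_k - 1)/2$ elementary passes per operator. Counting the two dilations and two erosions of each closing-opening pair:

$$
m_k = \frac{n_k - 1}{2} = 1,\,2,\,4,\,8,\,16, \qquad
4\sum_{k=1}^{5} m_k = 4 \cdot 31 = 124 \ \text{elementary passes}
$$

If, as a further assumption, each $3 \times 3$ pass runs in a raster-scan pipeline that buffers two image lines, a 256-pixel, 8-bit line costs $2 \times 256 = 512$ bytes per pass, or $124 \times 512 = 63\,488$ bytes for all passes, before any full intermediate frames are stored.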

# III. VERILOG IMPLEMENTATION AND SYNTHESIS RESULTS

In real-time applications, computational complexity and frame rate are two constraints that must be satisfied simultaneously.

The solution adopted for the implementation of the decomposition algorithm is outlined in Fig. 3. Only one image line covered by the SE is shown, since the others have the same architecture.

Two problems inherent to most filter algorithms need examination: the margin (border) effect, and the redundant reprocessing of the same pixel values by two windows only one step apart [11], [12].

The margin effect has a simple solution for gray-level min/max calculations: the missing parts of the window are initialized to the maximum of the range for the min calculation, or to the minimum of the range for the max calculation, and thus do not interfere with the actual data.
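A minimal Verilog sketch of this border handling is given below. It assumes 8-bit gray levels and a hypothetical `in_frame` signal produced by the window address logic; the module and port names are illustrative, not taken from the implementation. Outside the frame, the comparator input is forced to the neutral element of the operator, so it can never win a comparison:

```verilog
// Hypothetical border-padding stage (names are illustrative).
// Outside the image, the window is fed the neutral element of the operator:
// the range maximum for a min (erosion) path, the range minimum for a max (dilation) path.
module border_pad #(
  parameter W      = 8,   // gray-level width in bits
  parameter IS_MIN = 1    // 1: feeds a min comparator bank, 0: feeds a max comparator bank
) (
  input  wire [W-1:0] pix_in,   // pixel fetched for this window position
  input  wire         in_frame, // 1 when the window position lies inside the image
  output wire [W-1:0] pix_out   // value actually presented to the comparators
);
  // Neutral value: all ones (255 for W = 8) for min, all zeros for max.
  localparam [W-1:0] NEUTRAL = (IS_MIN != 0) ? {W{1'b1}} : {W{1'b0}};

  assign pix_out = in_frame ? pix_in : NEUTRAL;
endmodule
```

Selecting the neutral value with a parameter lets the same stage serve both the erosion (min) and the dilation (max) paths.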

The redundancy of the calculations was avoided by an orthogonal decomposition of the SE into vertical and horizontal directions.
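The saving from the orthogonal decomposition follows from the separability of the min (and likewise the max) over a flat square window. With half-width $h = (n-1)/2$ and the offset convention of equation (2):

$$
\min_{|j|\le h,\;|k|\le h} \mathbf{A}(r-j,\,s-k) \;=\; \min_{|k|\le h}\Big(\min_{|j|\le h} \mathbf{A}(r-j,\,s-k)\Big)
$$

A direct evaluation costs $n^2 - 1$ two-input comparisons per output pixel, while the inner pass along one direction followed by the outer pass along the other costs $2(n-1)$; for $n = 33$ that is 1088 versus 64 comparisons. Computing each inner (column) result once and reusing it for every window that contains that column avoids recomputing it for neighboring windows.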



Fig. 3. Block diagram of the FPGA implementation of the morphological decomposition.

The proposed design of the min/max evaluation module was found to be the most appropriate for the FPGA architecture. It is based on sequential comparisons between the new column covered by the SE and the min/max values of the previous columns, accumulated in a horizontal line queue.

The principle of the min/max calculation over the image matrix covered by the SE is presented in Fig. 4. The actual circuit is a pyramid of comparators, each requiring just eight FPGA CLB slices per pixel.
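Because the column windows of height 3, 5, 9, 17 and 33 are nested around the same center pixel, each larger window contains the smaller one, so a single comparator chain can tap all five results. The Verilog sketch below illustrates this for the min path under stated assumptions: it is purely combinational (a practical design would pipeline the chain with registers), the 33 pixels of a column arrive on one flattened bus with the center pixel at index 16, and the module and port names are hypothetical. It is a tapped linear chain rather than a balanced tree, so the comparator arrangement of Fig. 4 may differ; the max path is identical with the comparison reversed.

```verilog
// Hypothetical nested-window column minimum (min path only, combinational).
// Pixel k of the column is col[k*W +: W]; the center pixel is k = 16.
module column_min_pyramid #(
  parameter W = 8                      // gray-level width in bits
) (
  input  wire [33*W-1:0] col,          // 33 pixels of one image column
  output reg  [W-1:0]    m3,           // minimum over rows 15..17
  output reg  [W-1:0]    m5,           // minimum over rows 14..18
  output reg  [W-1:0]    m9,           // minimum over rows 12..20
  output reg  [W-1:0]    m17,          // minimum over rows  8..24
  output reg  [W-1:0]    m33           // minimum over rows  0..32
);
  integer d;
  reg [W-1:0] acc;                     // running minimum of the growing window

  always @* begin
    acc = col[16*W +: W];              // start from the center pixel
    m3 = acc; m5 = acc; m9 = acc; m17 = acc; m33 = acc;  // defaults (no latches)
    for (d = 1; d <= 16; d = d + 1) begin
      // extend the window by the pair of pixels at distance d from the center
      if (col[(16-d)*W +: W] < acc) acc = col[(16-d)*W +: W];
      if (col[(16+d)*W +: W] < acc) acc = col[(16+d)*W +: W];
      // tap the chain when the window height 2d+1 reaches 3, 5, 9, 17 or 33
      case (d)
        1:  m3  = acc;
        2:  m5  = acc;
        4:  m9  = acc;
        8:  m17 = acc;
        16: m33 = acc;
        default: ;
      endcase
    end
  end
endmodule
```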

The circuit operates in the following main phases:

#### 1. Column min/max

The comparators are cascaded in a pyramid that compares and stores min/max values, with a final accumulator for the result. The sequence compares the pixel values and outputs, in parallel, one result for each detail window line length of 3, 5, 9, 17 and 33.

#### 2. Line min/max

In the second phase, the most recent column comparison results for the image matrix currently covered by the SE are processed to determine the min/max over the whole SE.

#### 3. Shift phase

The stored intermediate values for each image line are shifted by one step to make room for the next complete computation cycle.
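The line min/max and shift phases can be sketched as a shift-register queue of column results followed by a comparison over the queue. The Verilog module below is a hypothetical illustration for a single SE width `N` (min path only, `N` at least 2); the port names and the active-low synchronous reset that preloads the neutral value (all ones) are assumptions, not details of the paper's design.

```verilog
// Hypothetical horizontal line queue (min path only).
// Phase 3 (shift) pushes the newest column result and drops the oldest;
// phase 2 (line min) takes the minimum over the N queued column results.
module line_min_queue #(
  parameter W = 8,                 // gray-level width in bits
  parameter N = 3                  // SE width in columns (3, 5, 9, 17 or 33), N >= 2
) (
  input  wire         clk,
  input  wire         rst_n,       // active-low synchronous reset
  input  wire         en,          // one new column result per enabled clock
  input  wire [W-1:0] col_min,     // phase 1 result for the newest column
  output reg  [W-1:0] win_min      // minimum over the last N column results
);
  reg [N*W-1:0] q;                 // q[0 +: W] holds the newest column result
  integer j;

  // Shift phase: preload the neutral value on reset, then shift one column per step.
  always @(posedge clk) begin
    if (!rst_n)
      q <= {(N*W){1'b1}};
    else if (en)
      q <= {q[(N-1)*W-1:0], col_min};
  end

  // Line min phase: combinational minimum over the queue.
  always @* begin
    win_min = q[0 +: W];
    for (j = 1; j < N; j = j + 1)
      if (q[j*W +: W] < win_min) win_min = q[j*W +: W];
  end
endmodule
```

Only one comparison result per column is stored, so each column minimum is computed once and reused by the `N` windows that contain it.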



Fig. 4. The principle of operation of the SE comparator bank

The two most important parts of the level extraction module are the register bank and the FIFO.

The image storage FIFO for the implementation was described in Verilog.

As an illustrative example, the interface between the storage window and the comparator module is described below. It passes the original image data to the shift phase, the new multi-pixel column to the min/max module, and the partial column result to the column shift.

The FIFO was implemented both as a simplified version written from scratch and as a core-generated version. As an example, the FIFO filling part of the code is presented below.

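```verilog
module FIFO (rst, wrclk, rdclk, we, oe, d_in, d_out, eff, fff);
  parameter FW = 8;              // FIFO width
  parameter FL = 16;             // FIFO length

  input             rst;         // active-low asynchronous reset
  input             wrclk, rdclk;
  input             we, oe;      // write enable, output enable
  input  [FW-1:0]   d_in;
  output [FW-1:0]   d_out;
  output            eff;         // empty FIFO flag
  output            fff;         // full FIFO flag

  reg    [FW-1:0]   d [0:FL-1];  // storage words
  reg    [4:0]      bc;          // fill counter, sized for FL = 16
  reg    [FW-1:0]   d_out_reg;   // output register (read process not shown)
  integer           i;

  assign d_out = d_out_reg;
  assign eff   = (bc == 0);
  assign fff   = (bc == FL);

  // write process
  always @(posedge wrclk or negedge rst) begin
    if (rst == 1'b0) begin
      // reset: clear every storage word, the counter and the output register
      for (i = 0; i < FL; i = i + 1)
        d[i] <= {FW{1'b0}};
      bc        <= 5'd0;
      d_out_reg <= {FW{1'b0}};
    end
    else if (we == 1'b1 && !fff) begin
      // write: the new sample enters d[0], older samples move one place up
      d[0] <= d_in;
      for (i = 1; i < FL; i = i + 1)
        d[i] <= d[i-1];
      bc <= bc + 5'd1;
    end
  end // of write process
endmodule
```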



Fig. 5. Xilinx ISE synthesis results for a part of the circuit.

Table I summarizes the estimated resource usage of the image decomposition module implemented in a Virtex-4 FPGA for a 256 x 256 image.

Table I. Synthesis and performance

| Solution    | CLB usage | Step duration (clock cycles) |
|-------------|-----------|------------------------------|
| Multiple SE | 32%       | 15                           |
| Single SE   | 20%       | 120                          |
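To relate the step durations in Table I to the frame-rate constraint, consider the 256 x 256 image at the 50 Hz frame rate used later in this section. Assuming, since the table does not say so explicitly, that one step produces one output pixel, the required clock frequencies are:

$$
256 \times 256 \times 50\ \mathrm{Hz} = 3\,276\,800\ \mathrm{pixels/s}
$$

$$
f_{\mathrm{clk}} = 3\,276\,800 \times 15 \approx 49.2\ \mathrm{MHz}\ (\text{multiple SE}), \qquad
f_{\mathrm{clk}} = 3\,276\,800 \times 120 \approx 393.2\ \mathrm{MHz}\ (\text{single SE})
$$

Under that reading, the multiple SE solution reaches 50 frames/s at a modest clock, while the single SE solution needs an eight times faster clock for the same throughput.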



Fig. 6. The DMA extension to the PicoBlaze extended architecture file.

Computing an adjusted contrast scale for contrast enhancement is as computationally intensive as the min/max filtering. To meet this requirement, the image decomposition architecture was extended.

The local contrast assessment and the adjustment of the gray-level values for improved contrast both use 8-bit gray-level values. The partial detail images need appropriate scaling according to the contrast enhancement scheme.

A simple, speed-optimized 8-bit soft microprocessor, the PicoBlaze from Xilinx, was used. Fig. 6 presents a special port extension of the microprocessor. Its intended purpose is to give the PicoBlaze Direct Memory Access (DMA) capability for partial detail image processing.

The port links the sliding window registers directly to the internal environment of the soft microprocessor. The input and output ports of the standard PicoBlaze architecture could be used as well, but they are address-based and require more instructions and address management overhead.

The strength of the PicoBlaze lies in its almost 40 MIPS at an FPGA resource usage of only a few percent of the device. An array of 32 processors can be assembled and linked to the filter window for parallel processing.

A quick estimate shows that for a 256 x 256, 8-bit image at a frame rate of 50 Hz there is still room for calculations about 10 instructions long for each pixel. A precomputed shift procedure can implement the gray-level adjustment on the fly.
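The budget quoted above follows from dividing the PicoBlaze instruction rate by the pixel rate. Taking 40 MIPS as the upper bound for a single processor:

$$
\frac{40 \times 10^{6}\ \mathrm{instructions/s}}{256 \times 256 \times 50\ \mathrm{pixels/s}} = \frac{40\,000\,000}{3\,276\,800} \approx 12.2\ \mathrm{instructions/pixel}
$$

This is consistent with the figure of about 10 instructions per pixel once some overhead is allowed. If the pixels were distributed evenly over the array of 32 processors (an assumption about how the array would be scheduled), the budget would scale to roughly $32 \times 12.2 \approx 390$ instructions per pixel.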

# IV. CONCLUSIONS

The paper has presented the results of an FPGA implementation of a morphological multiple-dimension kernel filter. The study was centered on image contrast enhancement for visualization.

The focus of the study was the optimization of silicon area and frame throughput to meet the real-time constraint of image viewing.

Synthesis results of a mixed schematic and HDL description of the decomposition filters indicate a resource count low enough to fit in an FPGA.

The performance of the architecture was found to exceed the real-time operating requirements.

An extension of the architecture with a soft microprocessor for the contrast enhancement calculation was also presented.

The results obtained so far are encouraging as far as decomposition performance is concerned. The calculation of a new contrast for each detail image and the assembly of the enhanced image remain tasks for future work.

#### REFERENCES

[1] I. Jivet, B. Dragaoi, "FPGA Implementation of the Curve Generator Algorithm for H/W Acceleration Applications", WSEAS Transactions on Circuits and Systems, Jan. 2008, pp. 7-12.

[2] R. Andraka, "A survey of CORDIC algorithms for FPGA based computers", Proceedings of the ACM/SIGDA Sixth International Symposium on FPGAs, Mar. 1998, pp. 191-200.

[3] T. Vladimirova, H. Tiggeler, "FPGA Implementation of Sine and Cosine Generators Using the CORDIC Algorithm", Proceedings of the 2006 MAPLD International Conference, Washington D.C., Sept. 26-28, 2006.

[4] P. Tschirner, "Human Visual System-Based Image Contrast Enhancements", PhD Thesis, Universität Bremen, 2004.

[5] J. A. Stark, "Adaptive Image Contrast Enhancement Using Generalizations of Histogram Equalization", IEEE Transactions on Image Processing, Vol. 9, Issue 5, May 2000, pp. 889-896.

[6] V. Caselles, J. L. Lisani, J. M. Morel, G. Sapiro, "Shape Preserving Local Contrast Enhancement", Proc. International Conference on Image Processing, Vol. 1, Oct. 26-29, 1997, pp. 314-317.

[7] H. D. Cheng, H. Xu, "A novel fuzzy logic approach to contrast enhancement", Pattern Recognition, Vol. 33, Issue 5, May 2000, pp. 809-819.

[8] J. P. Oakley, H. Bu, "Correction of Simple Contrast Loss in Color Images", IEEE Transactions on Image Processing, Vol. 16, Issue 2, Feb. 2007, pp. 511-522.

[9] D. Coltuc, I. Pitas, "Fast computation of a class of running filters", IEEE Transactions on Signal Processing, Vol. 46, Issue 3, Mar. 1998, pp. 549-553.

[10] M. Van Droogenbroeck, M. J. Buckley, "Morphological Erosions and Openings: Fast Algorithms Based on Anchors", Journal of Mathematical Imaging and Vision, Springer Netherlands, ISSN 0924-9907 (Print) 1573-7683 (Online), Vol. 22, No. 2-3, May 2005, pp. 121-142.

[11] M. van Herk, "A fast algorithm for local minimum and maximum filters on rectangular and octagonal kernels", Pattern Recognition Letters, Vol. 13, No. 7, July 1992, pp. 517-521.

[12] K. Sivakumar, M. J. Patel, et al., "A Constant-time Algorithm for Erosions/Dilations with Applications to Morphological Texture Feature Computation", Real-Time Imaging, Vol. 6, 2000, pp. 223-239.