ParallelKmeansImageCompressor
|
This project implements a parallel KMeans-based image colors compressor, aimed at reducing the number of colors in a natural image while preserving its overall visual appearance. The program clusters similar colors using the KMeans algorithm and applies parallel computing techniques to compress the image through the color quantization technique. It supports sequential, OpenMP, MPI, and CUDA implementations to explore different levels of performance and scalability.
5 September 2024
Leonardo Ignazio Pagliochini Master's Degree student in High-Performance Computing Engineering at Politecnico di Milano
GitHub: leonardopagliochini
Email: leona.nosp@m.rdoi.nosp@m.gnazi.nosp@m.o.pa.nosp@m.glioc.nosp@m.hini.nosp@m.@mail.nosp@m..pol.nosp@m.imi.i.nosp@m.t
Francesco Rosnati Master's Degree student in High-Performance Computing Engineering at Politecnico di Milano
GitHub: RosNaviGator
Email: franc.nosp@m.esco.nosp@m..rosn.nosp@m.ati@.nosp@m.mail..nosp@m.poli.nosp@m.mi.it
This project was developed for the course Advanced Methods for Scientific Computing,
Professor: Luca Formaggia
Assistant Professor: Matteo Caldana
Politecnico di Milano
The documentation of the project can be found here.
A comprehensive library for computer vision and image processing tasks. You can refer to the official page to download.
A C compiler wrapper for parallel programming with the MPI library. It is advised to use openmpi
, official page, as we experienced some bugs with mpich
due to experimental version of g++
.
A C++ API for parallel programming on shared-memory systems.
Boost is a versatile, cross-platform, and comprehensive collection of highly optimized, portable, reliable, and robust C++ libraries designed to enhance software development efficiency and extensibility.
Standard coloning with git clone
, no submodules are implemented in this repo.
Program can be built with or without CUDA, you obviously need nvcc
to be able to compile with CUDA. To compile the project, navigate to the project root directory in your terminal and run the following commands.
Program uses a version of g++
with very recent standards, that were not supported by mkmodules
, it is important to unload modules, including gcc-glibc
in order to succesfully compile.
For a standard run program will guide you to choose an image path, parallel method, configuration settings. See section "What to expect" for more infromation.
If you want to avoid having to input all the information through the prompts requested by the program, you can preconfigure the options in the .config
file.
Once the program is started, the following screen appears, through which it is possible to compress a new image or decompress an already compressed image.
If you choose the "Compress an image" option you will be asked to select:
nvcc
was used during the compiling process), the type of compression, and the path of the original image..config
file).After prompting all required settings the menu
executable will exploit boost/process
to launch a specific process (executable) relative to the chosen method, using the given settings as arguments.
The menu will launch the decoder
process (again with boost/process
), there you will be asked to choose which .kc
('kmeans-compressed') to decode and visualize. Program creates list of .kc
available to decode from the output
folder.
The project is organized as follows:
benchmarkImages
: Images used for benchmarking the program. It can be used to test the program's performance.outputs
: Contains the compressed images. After installing the program, you may notice that the outputs folder is not present. However, don't worry! It will be automatically created during the first execution of the program.include
: Header files of the project. These define the classes and functions that are used in the program.src
: Contains the source files of the project. These files contain the implementation of the classes and functions defined in the header files.build
: Object files generated during the compilation process.dependencyInstaller
: This folder contains the two scripts that can be used to install the required libraries.performanceEvaluation
: Python codes used to evaluate performance and scalability of the four type of executions.menu
: This is the executable file generated after compiling the project. It is the main program that can be executed to compress or decompress images.Makefile
: This file contains the instructions for compiling the project. It specifies the dependencies and the commands to compile the project..config
: This file contains the configuration of the program. It is used to store some hyperparameters that can be modified to change the behavior of the program.Doxyfile
: Is a configuration file used by Doxygen to customize the generation of documentation from annotated source code.Readme.md
: You already know that buddy ;)KMeans is a widely used clustering technique that partitions data into a given number K of clusters. In the context of image compression KMeans is employed to reduce the color palette by grouping similar colors (color quantization), possibly minimizing the data required to represent the image.
The algorithm begins with an initialization phase, where $k$ initial cluster centers (means) $\mu_1^0, \mu_2^0, \ldots, \mu_k^0$ are chosen. After initialization, the algorithm enters an iterative phase, often referred to as Lloyd’s algorithm, which repeatedly executes two main steps until convergence is reached.
It is important to note that K-means-based compression is a form of lossy compression. Unlike lossless compression techniques, where the original data can be perfectly reconstructed,lossy compression involves some level of data loss. In the context of K-means, each pixel's color is approximated by the nearest centroid among the $k$ chosen colors. This approximation inevitably leads to a loss of some color information, making the compression irreversible. The degree of perceptible data loss is often minimal when the number of clusters $k$ is adequately chosen, but it can become noticeable if $k$ is too low, resulting in a more significant approximation error
The program uses several parallelization techniques to enhance performance. These techniques include:
That is a really good question... as everithing in computer science it depends. Here you can see an overview of the execution time behaviour for increasing complexity tasks:
For any additional information on the methods and implementation, please refer to the report
For a thorough and detailed explanation of the code, please refer to the code manual (doxygen) in html or in pdf