Page 35 - Fister jr., Iztok, Andrej Brodnik, Matjaž Krnc and Iztok Fister (eds.). StuCoSReC. Proceedings of the 2019 6th Student Computer Science Research Conference. Koper: University of Primorska Press, 2019
Comparison of clustering optimization for classification with PSO algorithms

Klemen Berkovič
University of Maribor, Faculty of Electrical Engineering and Computer Science
Koroška cesta 46, Maribor, Slovenia
klemen.berkovic@student.um.si

Uroš Mlakar
University of Maribor, Faculty of Electrical Engineering and Computer Science
Koroška cesta 46, Maribor, Slovenia
uros.mlakar@um.si

Borko Bošković
University of Maribor, Faculty of Electrical Engineering and Computer Science
Koroška cesta 46, Maribor, Slovenia
borko.boskovic@um.si

Iztok Fister
University of Maribor, Faculty of Electrical Engineering and Computer Science
Koroška cesta 46, Maribor, Slovenia
iztok.fister@um.si

Janez Brest
University of Maribor, Faculty of Electrical Engineering and Computer Science
Koroška cesta 46, Maribor, Slovenia
janez.brest@um.si

ABSTRACT

In this paper, we compare Particle Swarm Optimization (PSO) algorithms for classification based on clustering. Clustering is presented within the proposed PSO algorithms as an optimization problem and captured in a so-called fitness function. Clusters are represented by centers that should correspond to the classes hidden in the data. With the help of PSO algorithms and a proper fitness function, we optimize the centers of the clusters. Because clustering belongs to the field of unsupervised learning methods, we redefine the clustering problem as a supervised learning problem with a new fitness function that helps PSO algorithms find cluster centers that can classify instances of data into the right class. Two experiments are performed in our study. In the first, various fitness functions for classification based on clustering are tested. The second measures the performance of PSO algorithms using the best fitness function from the first experiment. For testing the PSO algorithms, we use three different datasets. The Friedman non-parametric statistical test and the Nemenyi and Wilcoxon post-hoc tests are conducted to properly compare the PSO algorithms.

Categories and Subject Descriptors

I.5.3 [Pattern recognition]: Clustering; G.1.6 [Numerical analysis]: Optimization

Keywords

clustering optimization, classification, particle swarm optimization, statistical comparison

1. INTRODUCTION

In many fields, including physics, bioinformatics, engineering and economics, we encounter problems where, from a plethora of solutions, we are interested in finding the most suitable one. These so-called optimization problems (OPs) can be sub-categorized into discrete, continuous, and mixed-variable problems [18]. The difference between these three groups of OPs is the type of variables in their solution space. Discrete problems work with discrete variables that are elements of the set $\mathbb{N}^+$, continuous problems have variables that are elements of the set $\mathbb{R}$, while mixed-variable problems can contain variables from both of the aforementioned sets.

Since we are usually limited by various computational resources (e.g., computational power and space, problem's constraints), the complexity of OPs increases, and we are therefore forced to be satisfied with "good enough" solutions for real-world applications. The definition of "good enough" depends exclusively on the type of problem. The majority of OPs have huge, yet finite, solution spaces and can be solved approximately using meta-heuristics. In our work, we focus on meta-heuristic algorithms that are inspired by nature and are a part of Swarm Intelligence (SI) algorithms [4]. From the field of SI algorithms, we use PSO algorithms.

In this work, we focus on the minimization of single-objective OPs, which can be defined mathematically as follows:

$$ f(x^*) \le \min_{x \in \Omega} f(x), \tag{1} $$

where $f : \mathbb{R}^D \to \mathbb{R}$ represents the fitness function of the problem to be solved, $x$ is a $D$-dimensional vector consisting of the problem variables, and $x^*$ is a global minimum, i.e., the best solution of the problem. The fitness function $f$ is realized as a mathematical formula or a set of empirical data that refer to the problem of interest. In order to make the problem simpler, we demand that the solutions should be in some

StuCoSReC. Proceedings of the 2019 6th Student Computer Science Research Conference, Koper, Slovenia, 10 October. DOI: https://doi.org/10.26493/978-961-7055-82-5.35-42
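As an illustration of the single-objective minimization problem in Eq. (1), the following minimal sketch approximates a minimizer by uniform random sampling. Several choices here are assumptions made only for illustration and are not taken from this paper: the sphere function stands in for the fitness function f, the search space Ω is assumed to be the box [-5, 5]^D, and plain random search is used as a baseline rather than a PSO algorithm. The names `sphere` and `random_search` are hypothetical.

```python
import random


def sphere(x):
    """Hypothetical fitness function f: R^D -> R (sum of squares).

    Its global minimum is f(0, ..., 0) = 0.
    """
    return sum(v * v for v in x)


def random_search(f, lower, upper, dim, samples=2000, seed=0):
    """Approximate a minimizer of f over the box [lower, upper]^dim.

    Draws `samples` points uniformly at random and keeps the one
    with the lowest fitness value seen so far.
    """
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(samples):
        x = [rng.uniform(lower, upper) for _ in range(dim)]
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f


# Assumed setting: D = 3 and Omega = [-5, 5]^3.
best_x, best_f = random_search(sphere, -5.0, 5.0, dim=3)
print("approximate minimizer:", best_x)
print("fitness value:", best_f)
```

The returned point is only a "good enough" solution in the sense discussed above: its fitness is an upper bound on the true minimum over the sampled box, and the bound tightens as the number of samples grows.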