# Classification Parameters & Algorithms

These tools are used to select the classification algorithm and the maps to be used for classification.

## Chemical system: how to add/remove maps?

The goal is to set the map input of the classification function. It is not required to use all available maps for classification; maps containing only noise are usually excluded.

The list of maps used by the classification function is displayed in this text field.

The button Add Maps for Classification adds all maps available in the intensity tab of the primary menu.

The button Edit Selected Map has two modes: add (plus icon) or eliminate (minus icon), depending on whether the map selected in the primary menu is already in the list. Clicking “plus” adds the selected map, whereas clicking “minus” removes it from the list.

## Algorithm selection

The machine learning algorithm used for classification can be selected via the algorithm menu in the section Classification Parameters.

The classification algorithms will be described in an upcoming publication (Lanari & Tedeschi, in prep.); a short summary is provided below.

• Random Forest: An ensemble learning method for classification that constructs a multitude of decision trees during training. The output of the random forest is the class selected by most trees (it is a majority vote!).
• Discriminant Analysis: Classification method that assumes that different classes generate data based on different Gaussian distributions. To train a classifier, it estimates the parameters of a Gaussian distribution for each class.
• Naive Bayes: Classification algorithm applying density estimation to the data and generating a probability model. The decision rule is then based on Bayes’ theorem.
• Support Vector Machine: Data points (p-dimensional vector) are separated into n classes by separating them with a (p-1)-dimensional hyperplane. The algorithm chooses the hyperplane so that the distance from it to the nearest data point on each side is maximised.
• Classification Tree: A decision tree is used as predictive model to classify the input features into classes via a series of decision nodes. Each leaf of the tree is labelled with a class.
• k-Nearest Neighbour: An object is classified by a plurality vote of its neighbours, with the object being assigned to the class most common among its k nearest neighbours.
• k-Means: Classification method that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest cluster centroid, serving as a prototype of the cluster.
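As an illustration of how such a classifier operates on map data, the sketch below trains a Random Forest on synthetic intensity maps. It assumes each map is a 2-D array and that each pixel becomes one feature vector (one value per map); the maps, labels, and parameters are all invented for the example and are not the tool's actual data or settings.

```python
# Hypothetical sketch: pixel-wise classification with a Random Forest,
# assuming each intensity map is a 2-D array of the same size.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Three synthetic 10x10 "intensity maps" (stand-ins for real data).
maps = [rng.random((10, 10)) for _ in range(3)]

# Stack the maps so that each pixel becomes a row of per-map intensities.
X = np.stack([m.ravel() for m in maps], axis=1)   # shape (100, 3)

# Synthetic training labels for a subset of pixels (a toy rule, not real phases).
train_idx = rng.choice(X.shape[0], size=40, replace=False)
y_train = (X[train_idx, 0] > 0.5).astype(int)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X[train_idx], y_train)

# Predict a class for every pixel and reshape back into map form.
class_map = clf.predict(X).reshape(maps[0].shape)
print(class_map.shape)  # (10, 10)
```

Any of the other listed algorithms (e.g. k-nearest neighbour or a classification tree) could be swapped in by replacing the estimator, since they share the same fit/predict pattern in scikit-learn.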

## Principal component analysis (PCA) – Optional

The principal components of a collection of points in a real coordinate space are a sequence of vectors consisting of best-fitting lines, each of them defined as one that minimizes the average squared distance from the points to the line. These directions constitute an orthonormal basis in which different individual dimensions of the data are linearly uncorrelated. The first principal component can equivalently be defined as a direction that maximizes the variance of the projected data. Principal component analysis (PCA) is the process of computing the principal components and using them to perform a change of basis on the data.
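The two defining properties above, components ordered by decreasing variance and linearly uncorrelated projected coordinates, can be checked numerically. The sketch below uses synthetic correlated 2-D data; the data and parameters are illustrative only.

```python
# Minimal sketch of PCA as a change of basis, using synthetic correlated data.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
# Correlated data: the second coordinate depends on the first.
x = rng.normal(size=500)
data = np.stack([x, 2.0 * x + rng.normal(scale=0.3, size=500)], axis=1)

pca = PCA(n_components=2)
scores = pca.fit_transform(data)  # the data expressed in the PC basis

# Components are ordered by decreasing explained variance ...
assert pca.explained_variance_[0] >= pca.explained_variance_[1]
# ... and the coordinates in the new basis are (numerically) uncorrelated.
cov = np.cov(scores.T)
assert abs(cov[0, 1]) < 1e-6
```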

Clicking the button Generate Maps of the Principal Components (PCA) generates a map for each principal component; the maps are stored in the section Other of the primary menu.

If the tick-box incl. PCA is selected, the maps of the principal components are included as additional dimensions for the classification. Example: if 8 intensity maps are considered, a total of 14 PC maps are added to the classification input: 7 from a normal PCA and 7 from a normalised PCA.
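The idea of appending PC maps as extra feature dimensions can be sketched as below. The counts mirror the example above (8 maps, 7 + 7 PC features), but how the tool normalises data and truncates components is an assumption here; standardising each map to zero mean and unit variance is used as a stand-in for the "normalised" PCA.

```python
# Hedged sketch of the "incl. PCA" option: append principal-component
# scores to the original per-pixel intensity features.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.random((100, 8))          # 100 pixels, 8 intensity maps

# "Normal" PCA on the raw intensities, keeping 7 components.
pca_raw = PCA(n_components=7).fit_transform(X)

# "Normalised" PCA: standardise each map before the decomposition
# (an assumption about what "normalised" means here).
Xn = (X - X.mean(axis=0)) / X.std(axis=0)
pca_norm = PCA(n_components=7).fit_transform(Xn)

# Augmented input for the classifier: 8 original + 14 PC features.
X_aug = np.hstack([X, pca_raw, pca_norm])
print(X_aug.shape)  # (100, 22)
```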