Performance Comparison of Multi-Class Classification Algorithms

Gursev Pirge
7 min read · Dec 28, 2020
Photo by Ben Wicks on Unsplash.

This article covers the application and comparison of supervised multi-class classification algorithms on a dataset of stainless steels, where the chemical compositions are the features and the alloy type (one of four major types) is the target. The dataset is quite small, but very accurate.

Stainless steel alloy datasets are commonly limited in size, which restricts the application of Machine Learning (ML) techniques for classification. I explored the potential of six different classification algorithms for predicting the alloy type in the context of a small dataset of 62 samples.

In this article, multi-class classification was analyzed using various algorithms, with the goal of classifying the stainless steels by their chemical compositions (15 elements). There are four basic types of stainless steel, and some alloys have very similar compositions. Hyperparameter tuning with Grid Search was also applied to the Random Forest and XGBoost algorithms in order to observe possible improvements in the metrics. After the application of these algorithms, the models were evaluated with appropriate performance metrics and their performances were compared.

The dataset was prepared using “High-Temperature Property Data: Ferrous Alloys”.

Wikipedia’s definition for multi-class classification is: “In machine learning, multiclass or multinomial classification is the problem of classifying instances into one of three or more classes (classifying instances into one of two classes is called binary classification).”

The following algorithms were used for classification analysis:

· Decision Tree Classifier,

· Random Forest Classifier,

· XGBoost Classifier,

· Naïve Bayes,

· Support Vector Machines (SVM),

· AdaBoost.

The following questions define the scope of this study:

  • Which algorithm provided the best results for multi-class classification?
  • Was hyperparameter tuning successful in improving the metrics?
  • What explains the poor metrics, if any?
  • Is it safe to use these methods for multi-class classification of alloys?

Data Cleaning

The first step is to import and clean the data (if needed) using pandas before starting the analysis.

There are 25 austenitic (A), 17 martensitic (M), 11 ferritic (F) and 9 precipitation-hardening (P) stainless steels in the dataset.

There are 62 rows (stainless steels) and 17 columns (attributes) of data. Fifteen columns cover the chemical composition of the alloys; the first column is the AISI designation and the last column is the type of the alloy, which is the target we want to predict.
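As a minimal sketch of this first step (the file name and the column labels "AISI" and "Type" are my assumptions, since the article does not state them), loading and inspecting the data with pandas could look like this:

import pandas as pd

# Hypothetical file name for the prepared dataset
df = pd.read_csv("stainless_steels.csv")

print(df.shape)                   # expected: (62, 17)
print(df["Type"].value_counts())  # counts of the A, M, F and P steels
print(df.isna().sum().sum())      # 0 means no missing values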

Descriptive statistics of the dataset are shown below.
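Assuming the DataFrame from the sketch above, these statistics come from a pandas one-liner:

# count, mean, std, min, quartiles and max for each element percentage
print(df.describe().T)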

Some element percentages are nearly constant (sulphur, for example), but the chromium and nickel percentages have a very wide range, and these two elements are the defining elements of stainless steels.

Correlation can be defined as a measure of the dependence of one variable on another. When two features are highly correlated, one of them adds little new information for predicting the target. The heatmap below shows that the highest correlation is between manganese and nitrogen. Manganese is present in every steel, but nitrogen is absent from most of the alloys, so I kept both.
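A sketch of such a heatmap with seaborn, assuming the df and column names from above:

import matplotlib.pyplot as plt
import seaborn as sns

# Correlation matrix over the 15 composition columns only
corr = df.drop(columns=["AISI", "Type"]).corr()

plt.figure(figsize=(10, 8))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Correlation between element percentages")
plt.show()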

The dataset is clean (there are no NaNs and the dtypes are correct), so we can go straight to the train-test split and then apply the algorithms.
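A minimal sketch of the split; the 80/20 ratio, the random seed and the stratification are my assumptions, since the article does not state them:

from sklearn.model_selection import train_test_split

X = df.drop(columns=["AISI", "Type"])  # the 15 composition features
y = df["Type"]                         # A, M, F or P

# Stratify so every steel type appears in both splits despite the small n
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)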

Decision Tree Classifier

The first algorithm is the Decision Tree Classifier. It uses a decision tree (as a predictive model) to go from observations about an item (represented in the branches) to conclusions about the item’s target value (represented in the leaves).
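A hedged sketch of the fit and evaluation, reusing the split from above (hyperparameters are left at the scikit-learn defaults, which the article does not confirm):

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

dt = DecisionTreeClassifier(random_state=42)
dt.fit(X_train, y_train)

# Per-class precision, recall and f1 for the four steel types
print(classification_report(y_test, dt.predict(X_test)))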

The results are very good; in fact, only one alloy was misclassified.

Random Forest Classifier

Random forests or random decision forests are an ensemble learning method for classification, regression and other tasks that operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean/average prediction (regression) of the individual trees.

Random forests generally outperform individual decision trees, but their accuracy tends to be lower than that of gradient-boosted trees. However, data characteristics can affect their performance [ref].
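The fit follows the same pattern as the decision tree; this sketch again assumes default hyperparameters:

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

rf = RandomForestClassifier(random_state=42)  # 100 trees by default
rf.fit(X_train, y_train)
print(classification_report(y_test, rf.predict(X_test)))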

Hyperparameter Tuning with Grid Search

Even though I got satisfactory results with the Random Forest analysis, I applied hyperparameter tuning with Grid Search. Grid search is a common method for tuning a model’s hyperparameters. The algorithm is simple: feed it a set of hyperparameters and the values to be tested for each one, and it runs an exhaustive search over all possible combinations of these values, training one model for each combination. It then compares the scores of the models and keeps the best one. A sketch of the search follows.
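The parameter grid below is purely illustrative; the article does not list the values that were actually searched:

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

param_grid = {
    "n_estimators": [100, 200, 500],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 5],
}

grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    scoring="f1_weighted",  # weighted f1 accounts for the class imbalance
    cv=5,
)
grid.fit(X_train, y_train)

print(grid.best_params_)
print(classification_report(y_test, grid.best_estimator_.predict(X_test)))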

Hyperparameter tuning with Grid Search took the results to a perfect score, which may equally be a sign of overfitting.

XGBoost Classifier

XGBoost is well known for providing strong results on structured (tabular) data. In fact, since its introduction it has often been considered the state-of-the-art algorithm for such problems.
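A sketch of the fit; recent XGBoost versions expect integer class labels, so the A/M/F/P strings are encoded first (the encoding step is my addition, not necessarily part of the original notebook):

from xgboost import XGBClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import classification_report

le = LabelEncoder()
y_train_enc = le.fit_transform(y_train)  # A/F/M/P -> 0..3

xgb = XGBClassifier(random_state=42)
xgb.fit(X_train, y_train_enc)

# Decode the integer predictions back to type letters before scoring
y_pred = le.inverse_transform(xgb.predict(X_test))
print(classification_report(y_test, y_pred))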

The results of the XGBoost Classifier provided the best results for this classification study.

Hyperparameter Tuning with Grid Search

Once again, I applied the hyperparameter tuning with Grid Search, even though the results were near perfect.
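As before, the grid values are illustrative assumptions, and the sketch reuses the encoded labels from the previous snippet:

from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

param_grid = {
    "n_estimators": [100, 200],
    "max_depth": [3, 5, 7],
    "learning_rate": [0.05, 0.1, 0.3],
}

grid = GridSearchCV(
    XGBClassifier(random_state=42),
    param_grid,
    scoring="f1_weighted",
    cv=5,
)
grid.fit(X_train, y_train_enc)
print(grid.best_params_)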

Naïve Bayes Classifier

The Naïve Bayes algorithm is a supervised learning algorithm based on Bayes’ theorem and used for solving classification problems. It is one of the simplest and most effective classification algorithms, and it helps build fast machine learning models that can make quick predictions.

The results are shown below:
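A minimal sketch of how these results can be produced; GaussianNB is an assumption on my part, since the article does not say which Naïve Bayes variant was used:

from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report

nb = GaussianNB()
nb.fit(X_train, y_train)
print(classification_report(y_test, nb.predict(X_test)))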

Support Vector Machines (SVM)

Support-vector machines (SVMs, also support-vector networks) are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. An SVM maps training examples to points in space so as to maximize the width of the gap between the categories. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall.

The results are shown below:
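A sketch of the fit; the RBF kernel and the scaling step are my assumptions (SVMs are sensitive to feature scale, and scikit-learn’s SVC handles the multi-class case internally via one-vs-one):

from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report

# Standardize the composition percentages before the SVM
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", random_state=42))
svm.fit(X_train, y_train)
print(classification_report(y_test, svm.predict(X_test)))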

AdaBoost

AdaBoost, short for Adaptive Boosting, is a machine learning meta-algorithm, which can be used in conjunction with many other types of learning algorithms to improve performance. The output of the other learning algorithms (‘weak learners’) is combined into a weighted sum that represents the final output of the boosted classifier.

The results are shown below:
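A sketch with scikit-learn’s implementation; n_estimators=100 is an assumed value:

from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import classification_report

# The default base estimator is a depth-1 decision tree (a "stump")
ada = AdaBoostClassifier(n_estimators=100, random_state=42)
ada.fit(X_train, y_train)
print(classification_report(y_test, ada.predict(X_test)))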

Conclusion

In this article, I used six different supervised machine learning (classification) algorithms with the purpose of classifying four types of stainless steels (multi-class) according to their chemical compositions, comprising 15 elements in the alloy. The dataset included 62 alloys, which made it small but very accurate (all the information was taken from ASM International sources; ASM was formerly known as the American Society for Metals).

The analysis provides evidence that:

· Considering the f1 scores, Random Forest and XGBoost methods produced the best results (0.94).

· After hyperparameter tuning with Grid Search, the RF and XGBoost f1 scores jumped to 100%.

· Repeated runs of the same algorithm produced widely varying results, most probably due to the limited dataset size.

· The poorest f1 scores were mostly for the types with the fewest samples, namely the ferritic and precipitation-hardening steels.

· Finally, the test classification accuracy of 95% achieved by three models (DT, RF and XGBoost), and of 100% by the two tuned models, demonstrates that the ML approach can be applied effectively to steel classification despite the small number of alloys and the heterogeneous input parameters (chemical compositions). Based on only 62 cases, the models achieved a very high level of performance for multi-class alloy type classification.

You can access this article and similar ones here.
