A Comparative Study of Various Clustering Algorithms

This article applies several cluster analysis algorithms to group U.S. senators, using their votes on 15 different bills in the 114th United States Congress as features. It then compares how well the algorithms perform at clustering. I used the following data and was inspired by the studies here.

  • Were the algorithms successful in detecting the clusters?
  • How do the performances of these algorithms compare?
  • All three algorithms gave similar results: Republican senators came out as a tight cluster, which means that their voting was quite similar.
  • Yellowbrick results suggested a possible four clusters instead of two.
  • On the other hand, checking for more than two clusters showed that, on some bills, some Democrats tended to vote with the opposing party. This most probably does not indicate two additional clusters. A more reasonable explanation is that these senators departed from party-line voting on only a few bills.
  • Scaling did not significantly improve the K-Means clustering results.
  • Silhouette scores for the K-Means and hierarchical clustering methods did not match.
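
To make the silhouette comparison in the points above more concrete, here is a minimal sketch of how a mean silhouette score can be computed for two competing labelings of the same vote data. Everything in it is an assumption for illustration: the six senators, the four bills, the 1 = yea / 0 = nay coding, and the labelings themselves are invented toy values, not the data or results from this study. It also uses plain Python with Euclidean distance rather than whatever library calls the original analysis relied on.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    n = len(points)
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i in range(n):
        # Mean distance from point i to each cluster, never counting i itself.
        mean_dist = {}
        for c in clusters:
            members = [j for j in range(n) if labels[j] == c and j != i]
            if members:
                dists = [math.dist(points[i], points[j]) for j in members]
                mean_dist[c] = sum(dists) / len(dists)

        own = labels[i]
        if own not in mean_dist:
            # Point i is alone in its cluster: its silhouette is taken as 0.
            continue

        a = mean_dist[own]  # cohesion: mean distance within own cluster
        b = min(d for c, d in mean_dist.items() if c != own)  # nearest other cluster
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


# Hypothetical toy data: rows are senators, columns are bills (1 = yea, 0 = nay).
votes = [
    [1, 1, 1, 1],  # party A
    [1, 1, 1, 1],  # party A
    [1, 1, 1, 1],  # party A
    [0, 0, 0, 0],  # party B
    [0, 0, 0, 0],  # party B
    [0, 0, 1, 0],  # party B, crossing the party line on one bill
]

# Two candidate labelings, as two different algorithms or settings might produce.
two_clusters = [0, 0, 0, 1, 1, 1]    # pure party split
three_clusters = [0, 0, 0, 1, 1, 2]  # crossover senator in a separate cluster

two_score = silhouette_score(votes, two_clusters)
three_score = silhouette_score(votes, three_clusters)
print(f"two clusters:   {two_score:.4f}")
print(f"three clusters: {three_score:.4f}")
```

On this toy data the three-cluster labeling scores slightly higher (about 0.83 versus 0.82), even though the extra cluster is only a single senator who broke ranks on one bill. That is one way a cluster-count diagnostic can point to more groups than the two parties without those groups being meaningful, and it also shows why two methods that assign even one senator differently will report different silhouette scores.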