
How accurate is k-means?

In the first attempt, only the clusters found by KMeans are used to train a classification model. These clusters alone give a decent model, with an accuracy of 78.33%.

KMeans Clustering for Classification

Background

Clustering is an unsupervised machine learning technique that looks for homogeneous subgroups in the data, so that objects in the same cluster are more similar to each other than to objects in other clusters. It is widely used in applications such as market segmentation, where the goal is to find structure in unlabelled observations. Although clustering is unsupervised, the cluster assignments it produces can be used as features in a supervised machine learning model.

KMeans is a clustering algorithm that divides observations into k clusters. Because we choose k ourselves, it fits naturally into a classification workflow: we can ask for as many clusters as there are classes, or more. I’ll use the handwritten digits dataset that ships with scikit-learn, an MNIST-style collection of labelled digit images, to find clusters with KMeans and test how useful they are as a feature.
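To make the idea concrete, here is a minimal sketch of running KMeans on the digits data. It assumes the dataset is the one returned by scikit-learn's load_digits and that we ask for 10 clusters (one per digit class); the random_state and n_init values are illustrative choices, not taken from the original experiment.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

# Load the bundled handwritten digits: each 8x8 image is flattened to 64 pixels.
X, y = load_digits(return_X_y=True)

# Ask for one cluster per digit class. The labels y are never shown to KMeans.
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

print(X.shape)           # (1797, 64)
print(cluster_ids[:10])  # cluster indices for the first ten images
```

Note that the cluster indices are arbitrary: cluster 3 is not "the digit 3". A classifier trained on these indices has to learn the mapping from cluster to class.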

Implementation

I created a class named clust for this purpose. When initialized, it takes a scikit-learn dataset and splits it into train and test sets. Its KMeans function fits KMeans on the train data, using the number of classes as the number of clusters, and then produces cluster labels for both the train and test data. The output parameter controls how these labels are used: ‘add’ appends them to the dataset as an extra feature, while ‘replace’ uses the labels instead of the original features to train the classification model.
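The original clust class is not reproduced here, so the following is only a rough sketch of what such a class could look like. The name ClusterFeatures, the kmeans method name, the 70/30 split, and the random_state values are all assumptions made for illustration; only the ‘add’ and ‘replace’ behaviour follows the description above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split


class ClusterFeatures:
    """Hypothetical stand-in for the article's clust class (not the original code)."""

    def __init__(self, dataset, test_size=0.3, random_state=0):
        # `dataset` is a scikit-learn Bunch exposing .data and .target.
        self.n_classes = len(np.unique(dataset.target))
        (self.X_train, self.X_test,
         self.y_train, self.y_test) = train_test_split(
            dataset.data, dataset.target,
            test_size=test_size, random_state=random_state)

    def kmeans(self, output="add"):
        # Fit on the training rows only, then label both splits with that model.
        km = KMeans(n_clusters=self.n_classes, n_init=10, random_state=0)
        train_ids = km.fit_predict(self.X_train)
        test_ids = km.predict(self.X_test)

        if output == "add":
            # Append the cluster id as one extra column.
            self.X_train = np.column_stack([self.X_train, train_ids])
            self.X_test = np.column_stack([self.X_test, test_ids])
        elif output == "replace":
            # Keep only the cluster id, as a single-column feature matrix.
            self.X_train = train_ids.reshape(-1, 1)
            self.X_test = test_ids.reshape(-1, 1)
        else:
            raise ValueError("output must be 'add' or 'replace'")
        return self


features = ClusterFeatures(load_digits()).kmeans(output="add")
print(features.X_train.shape, features.X_test.shape)
```

Fitting KMeans on the train split and only calling predict on the test split keeps the test data out of the clustering step, so the test accuracy is not flattered by leakage.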

Results

In the first attempt, only the clusters found by KMeans are used to train a classification model. These clusters alone give a decent model, with an accuracy of 78.33%. For comparison, I trained an out-of-the-box Logistic Regression model on the original features only (the greyscale intensity values). It is a much better model, with an accuracy of 95.37%. In the final iteration, I added the cluster labels as an extra feature (column) and trained the same Logistic Regression model. The results show an improvement over the features-only model.
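The three runs can be sketched as a single comparison loop. This is an approximation under stated assumptions: it uses load_digits, a 70/30 split, fixed random seeds, and Logistic Regression for all three feature sets (the classifier used for the clusters-only run is not specified above). The accuracies it prints depend on those choices and will not necessarily match the figures reported here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Cluster on the training pixels only, then assign test rows to those clusters.
km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X_train)
c_train = km.predict(X_train).reshape(-1, 1)
c_test = km.predict(X_test).reshape(-1, 1)

feature_sets = {
    "clusters only": (c_train, c_test),
    "pixels only": (X_train, X_test),
    "pixels + cluster": (np.hstack([X_train, c_train]),
                         np.hstack([X_test, c_test])),
}

scores = {}
for name, (train, test) in feature_sets.items():
    # max_iter is raised from the default to give the solver room to converge.
    model = LogisticRegression(max_iter=5000)
    model.fit(train, y_train)
    scores[name] = model.score(test, y_test)
    print(f"{name}: {scores[name]:.4f}")
```

One design choice worth testing: the cluster id is passed in as a plain integer column, which implies an ordering between clusters that does not really exist. One-hot encoding the id is a reasonable alternative to try.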

Takeaway

Although clustering is an unsupervised technique, the clusters it finds can also be used as features to improve classification models. On their own they aren’t enough for classification, as the results show, but when added as a feature they improve model accuracy. You can use the class I created to tweak and test different models, for example a Random Forest Classifier, and share anything I missed in the comments.
