Day 3 — K-Nearest Neighbors and Bias–Variance Tradeoff

Hello, new neighbors :-)
Today we'll learn our first classification model, KNN, and discuss the bias–variance tradeoff and cross-validation. We'll also see how to choose K using cross-validation.
K-Nearest Neighbors (KNN)

The k-nearest neighbors algorithm (k-NN) is a non-parametric, lazy learning method used for classification and regression. The output is based on the majority vote (for classification) or the mean (or median) of the k nearest neighbors in the feature space (for regression).

KNN is one of the simplest models, since it is a non-parametric, lazy learning method. What does that mean? When we say a model is non-parametric, we mean that it makes no assumption about the data distribution. That is pretty useful in real-world applications, since most data do not follow any particular distribution. What about lazy learning? As opposed to eager learning, lazy learning is a method in which generalization of the training data is delayed until a query is made to the system. In other words, there is no explicit training stage, or it is very minimal, which also means that training is very fast in KNN.

The intuition behind KNN is pretty simple. There is an old saying that one takes on the behavior of one's company. Imagine a group of educated, young, and smart people. It's not hard to imagine that their friends are the same style: educated, young, and smart.

Same kind of people tend to group together

There are only 3 steps in KNN (a short code sketch follows at the end of this section):

1. Calculate distances (e.g. Euclidean distance, Hamming distance, etc.)
2. Find the k closest neighbors
3. Vote for the label or calculate the mean

3 steps for KNN. Credit: https://www.datacamp.com/community/tutorials/k-nearest-neighbor-classification-scikit-learn

Pretty easy, right? Now the problem is: what is K? How do we choose K? Different K could give different results.
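To make the three steps concrete, here is a minimal from-scratch sketch in NumPy. This is an illustration added here, not the post's own code (the post uses scikit-learn's KNeighborsClassifier later on):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    # Step 1: Euclidean distance from the query point to every training point
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Step 2: indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Step 3: majority vote among their labels (for regression, take the mean instead)
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Tiny usage example with made-up 2-D points
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))  # predicts class 0

Note that all the work happens at prediction time, which is exactly what "lazy learning" means.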
Bias–Variance Tradeoff

Before we choose K, I want to explain an important concept in machine learning: the bias–variance tradeoff. First of all, what is bias? And what is variance?

Bias is an error from erroneous assumptions in the learning algorithm. High bias can cause an algorithm to miss the relevant relations between features and target outputs. In other words, a model with high bias pays very little attention to the training data and oversimplifies the model.

Variance is an error from sensitivity to small fluctuations in the training set. High variance can cause an algorithm to model the random noise in the training data rather than the intended outputs. In other words, a model with high variance pays a lot of attention to the training data and does not generalize to data it hasn't seen before.

More specifically, we can define bias and variance in mathematical terms. Again, if you are not comfortable with math, you can skip this; I'll explain it with a graph in the next paragraph. Bias is the difference between the true function and the expectation of our prediction, and variance is, as defined in statistics, the expectation of the squared deviation of a random variable from its mean:

Bias[f̂(x)] = E[f̂(x)] − f(x)
Var[f̂(x)] = E[(f̂(x) − E[f̂(x)])²]

Here, f represents the true model of the world. There is random noise that we cannot avoid, which we represent by ϵ, so the true label is

y = f(x) + ϵ, where E[ϵ] = 0 and Var(ϵ) = σ²

And we can compute the expected error of our prediction:

E[(y − f̂(x))²] = Bias[f̂(x)]² + Var[f̂(x)] + σ²

Let's use a graph to explain. Imagine that the center of the target (the red part) is the correct value we want to predict. As our predictions move away from that region, the error becomes larger and larger: that is higher bias. Now imagine we get a number of separate predictions, which vary because of variability in the training data. The more spread out they are, the higher the variance.

Graphical illustration of bias and variance. Credit: http://scott.fortmann-roe.com/docs/BiasVariance.html

There are two other terms related to bias and variance: underfitting and overfitting. Underfitting means the model does not fit, in other words does not predict, the (training) data very well. On the other hand, overfitting means the model predicts the (training) data too well. It is too good to be true: when a new data point comes in, the prediction may be wrong. Normally, underfitting implies high bias and low variance, and overfitting implies low bias but high variance. Dealing with the bias–variance problem is really about dealing with under- and over-fitting.

Bias is reduced and variance is increased as model complexity grows. Why? If the model is more complex, it has more power to capture the distribution of the data, so it can fit the training set perfectly; in other words, it overfits. (The short simulation at the end of this section makes this concrete.)

Bias–variance tradeoff compared to model complexity. The left model is more complicated: it captures all the data points but has high variance. The middle model is the simplest and has high bias. The right model has low variance and low bias, which is what we want. Credit: https://towardsdatascience.com/understanding-the-bias-variance-tradeoff-165e6942b229

The relationship between bias–variance and model complexity. Credit: http://scott.fortmann-roe.com/docs/BiasVariance.html

Now we have another problem. How do we know if we are under- or over-fitting? In a real-world application, we would not know the ground truth of the test set. How do we compute the error if we don't know the answer?
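To see the tradeoff numerically, here is a small simulation added for illustration; the true function, noise level, and polynomial degrees are assumptions of this sketch, not from the post. We repeatedly sample noisy training sets from a known function, fit a simple and a complex model, and estimate the squared bias and the variance of their predictions:

import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    return np.sin(2 * np.pi * x)  # the "true world" function f

x_test = np.linspace(0, 1, 50)    # fixed query points where we measure bias and variance
n_train, n_repeats, noise_sd = 20, 200, 0.3

for degree in (1, 9):             # simple model vs. complex model
    preds = np.empty((n_repeats, x_test.size))
    for r in range(n_repeats):
        x_tr = rng.uniform(0, 1, n_train)
        y_tr = true_f(x_tr) + rng.normal(0, noise_sd, n_train)   # y = f(x) + noise
        coef = np.polyfit(x_tr, y_tr, degree)                    # fit a polynomial
        preds[r] = np.polyval(coef, x_test)
    bias2 = np.mean((preds.mean(axis=0) - true_f(x_test)) ** 2)  # squared bias
    variance = np.mean(preds.var(axis=0))                        # variance across training sets
    print(f"degree={degree}  bias^2={bias2:.3f}  variance={variance:.3f}")

The simple model (degree 1) should come out with high bias and low variance, and the complex one (degree 9) the other way around.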
Cross-validation

The answer is pretty simple: just split the training set. Say we split the training set into A (80% of the data) and B (20% of the data). We then train our model on A and test it on B, since we do know the ground truth of B. B is called the validation set, and we can use it to tune our parameters, such as K in KNN. Remember that we must not use the ground truth of the validation set during training.

We can use a more stable method, called cross-validation, to test the result. We rotate the validation set and use the rest of the data to train: split the data into K folds, train on (K−1) folds, and test on the remaining fold as the validation set. This is called K-fold cross-validation. After that, we average the errors over all folds to get the final accuracy. (A minimal K-fold loop is sketched below.)

K-fold cross validation.

When K equals the number of training samples, it is called leave-one-out CV, since each time we test on only one sample and use the rest of the data to train.
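Here is a minimal sketch of that K-fold loop written by hand with scikit-learn's KFold splitter. It is an added illustration using the Iris data that the post works with later; cross_val_score, used in the programming section, wraps essentially this loop:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

scores = []
for train_idx, val_idx in kfold.split(X):
    knn = KNeighborsClassifier(n_neighbors=5)
    knn.fit(X[train_idx], y[train_idx])               # train on K-1 folds
    scores.append(knn.score(X[val_idx], y[val_idx]))  # validate on the held-out fold

print("fold accuracies:", np.round(scores, 3))
print("mean CV accuracy:", round(np.mean(scores), 3))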
Back to KNN

Let's go back to KNN. How do bias and variance relate to KNN? I said at the beginning that KNN is a lazy learner; how does that relate to model complexity?

Consider an extreme case, K=1. What will happen? The training data will be perfectly predicted, right? The bias will be 0 when K=1. However, when it comes to new data (the test set), the prediction has a higher chance of being wrong, which means high variance. When we increase K, the training error will increase (higher bias), but the test error may decrease at the same time (lower variance). We can think of it this way: when K becomes larger, the prediction averages over more neighbors, so the decision boundary becomes smoother and the model is effectively simpler; K=1 gives the most complex model. Now we can split the data into a training and a validation set and decide what K should be. (The short check after the figure below shows the K=1 effect in code.)

Error rate and K. Credit: http://sameersingh.org/courses/gml/fa17/sched.html
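A quick way to see the K=1 behavior is to compare training and held-out accuracy for a small and a large K. This is an illustrative check added here (the split and the values of K are assumptions, not from the post), using the Iris data from the next section:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

for k in (1, 15):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    # With K=1 the training accuracy is perfect (each training point is its own
    # nearest neighbor); the validation accuracy is what actually matters.
    print(f"K={k}  train acc={knn.score(X_tr, y_tr):.3f}  val acc={knn.score(X_val, y_val):.3f}")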
Programming it

We use the Iris dataset to train our model. It has 3 classes, which represent 3 different kinds of iris, and our goal is to classify which iris a sample is.

Iris dataset

There are 4 features in this dataset. We do a little bit of data exploration. It seems that we can separate the classes with only 2 features. However, since we want to see how K affects the result, we choose features 1 and 2 because they have more overlap.

Different pairs of features

We can build our model now! Remember, KNN has no training step, so we predict on the data directly. There are 3 steps in KNN: calculate the distances, find the K nearest neighbors, and count the votes for each label. We test different values of K and visualize the decision boundary.

Visualize the result based on different K

We can see that when K is small, the regions around green outliers are still predicted as green, and the regions around red outliers are still predicted as red. When K becomes larger, the boundary is more consistent and reasonable.

Second, we use the scikit-learn built-in KNN model and test the cross-validation accuracy. There is only one line to build the model:

knn = KNeighborsClassifier(n_neighbors=k)

And one line for the cross-validation test (here kfold is the number of folds or a K-fold splitter):

cross_val_score(knn, X, y, cv=kfold, scoring='accuracy')

The result shows that we could choose K around 13 or 20, which gives the highest cross-validation accuracy. (A compact, self-contained version of this experiment is sketched at the end of this section.)

The cross-validation accuracy based on different K

You can find the whole implementation through this link. Feel free to play around with it!
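For reference, here is a compact, self-contained sketch of the cross-validation experiment above. It is a reconstruction, not the linked notebook: it assumes "features 1 and 2" means the first two columns (sepal length and sepal width) and uses 10-fold cross-validation.

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X2 = X[:, :2]   # the two overlapping features (assumed: sepal length and sepal width)

cv_scores = {}
for k in range(1, 31):
    knn = KNeighborsClassifier(n_neighbors=k)
    cv_scores[k] = cross_val_score(knn, X2, y, cv=10, scoring='accuracy').mean()

best_k = max(cv_scores, key=cv_scores.get)
print("best K:", best_k, "with CV accuracy:", round(cv_scores[best_k], 3))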
Summary

Today we learned about the KNN model, which has 3 steps:

1. Calculate distances (e.g. Euclidean distance, Hamming distance, etc.)
2. Find the k closest neighbors
3. Vote for the label or calculate the mean

And a few pros and cons of KNN.

Pros: no assumptions about the data distribution, which is useful in real-world applications; essentially no training stage, so training is very fast.