# \(k\) Nearest Neighbor and the Bias-variance Trade-off

Ruoqing Zhu

**Abstract**: This is the supplementary R file for \(k\) Nearest Neighbor in the lecture note “KNN”.

## Basic Concepts

\(k\) nearest neighbor is a simple nonparametric method. Suppose we collect a set of observations \(\{x_i, y_i\}_{i=1}^n\). The prediction at a new target point \(x_0\) is
\[\widehat y = \frac{1}{k} \sum_{x_i \in N_k(x_0)} y_i,\]
where \(N_k(x_0)\) denotes the \(k\) samples from the training data that are closest to \(x_0\). By default, closeness is defined using a distance measure such as the Euclidean distance. Here is a demonstration of the fitted regression function.

```r
library(kknn)
set.seed(1)

# generate training data with 2*sin(x) and random Gaussian errors
x <- runif(15, 0, 2*pi)
y <- 2*sin(x) + rnorm(length(x))

# generate testing data points where we evaluate the prediction function
test.x = seq(0, 1, 0.001)*2*pi

# 1-nearest neighbor regression
k = 1
knn.fit = kknn(y ~ x, train = data.frame(x = x, y = y),
               test = data.frame(x = test.x),
               k = k, kernel = "rectangular")
test.pred = knn.fit$fitted.values

# plot the data
par(mar=rep(2,4))
plot(x, y, xlim = c(0, 2*pi), pch = "o", cex = 2,
     xlab = "", ylab = "", cex.lab = 1.5)
title(main=paste(k, "-Nearest Neighbor Regression", sep = ""), cex.main = 1.5)

# plot the true regression line
lines(test.x, 2*sin(test.x), col = "deepskyblue", lwd = 3)
box()

# plot the fitted line
lines(test.x, test.pred, type = "s", col = "darkorange", lwd = 3)
```

## Tuning \(k\) and the Bias-variance Trade-off

\(k\) is a crucial tuning parameter for KNN. Let's observe its effect.

```r
# generate more data
set.seed(1)
n = 200
x <- runif(n, 0, 2*pi)
y <- 2*sin(x) + rnorm(length(x))
test.y = 2*sin(test.x) + rnorm(length(test.x))

# 1-nearest neighbor
k = 1
knn.fit = kknn(y ~ x, train = data.frame(x = x, y = y),
               test = data.frame(x = test.x),
               k = k, kernel = "rectangular")
test.pred = knn.fit$fitted.values

par(mar=rep(2,4))
plot(x, y, xlim = c(0, 2*pi), pch = 19, cex = 1, axes=FALSE, ylim = c(-4.25, 4.25))
title(main=paste(k, "-Nearest Neighbor Regression", sep = ""))
lines(test.x, 2*sin(test.x), col = "deepskyblue", lwd = 3)
lines(test.x, test.pred, type = "s", col = "darkorange", lwd = 3)
box()
```

Overall, we can evaluate the prediction error of this model:

```r
# prediction error
mean((test.pred - test.y)^2)
## [1] 2.097488
```

If we consider different values of \(k\), we can observe the trade-off between bias and variance.

```r
par(mfrow=c(2,3))

for (k in c(1, 5, 10, 33, 66, 100))
{
  knn.fit = kknn(y ~ x, train = data.frame(x = x, y = y),
                 test = data.frame(x = test.x),
                 k = k, kernel = "rectangular")
  test.pred = knn.fit$fitted.values

  par(mar=rep(2,4))
  plot(x, y, xlim = c(0, 2*pi), pch = 19, cex = 0.7, axes=FALSE, ylim = c(-4.25, 4.25))
  title(main=paste("K =", k))
  lines(test.x, 2*sin(test.x), col = "deepskyblue", lwd = 3)
  lines(test.x, test.pred, type = "s", col = "darkorange", lwd = 3)
  box()
}
```

As \(k\) increases, the model becomes more stable, i.e., the variance decreases, but the bias increases. As \(k\) decreases, the bias decreases, but the model is less stable.
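Before formalizing this, here is a minimal sketch, not part of the original note, that empirically estimates the squared bias and the variance of the KNN fit at a single target point by repeatedly simulating training sets from the same model. The target point \(x_0 = \pi/2\), the number of replications, and the values of \(k\) below are arbitrary illustrative choices, and running this chunk advances the random-number state, so later chunks may not reproduce their printed output exactly.

```r
# A minimal sketch (not from the original note): estimate the squared bias and
# the variance of the KNN fit at a single target point x0 by simulating many
# training sets from the same model y = 2*sin(x) + N(0, 1).
# x0, nsim, and the k values below are arbitrary illustrative choices.
library(kknn)
set.seed(1)

x0   <- pi/2      # target point; the true regression value is 2*sin(x0) = 2
nsim <- 200       # number of simulated training sets
n    <- 200       # training sample size, matching the example above

for (k in c(1, 10, 100))
{
  pred <- numeric(nsim)
  for (i in 1:nsim)
  {
    # use new names (xs, ys) so the note's training data is not overwritten
    xs <- runif(n, 0, 2*pi)
    ys <- 2*sin(xs) + rnorm(n)
    fit <- kknn(ys ~ xs, train = data.frame(xs = xs, ys = ys),
                test = data.frame(xs = x0), k = k, kernel = "rectangular")
    pred[i] <- fit$fitted.values
  }
  cat("k =", k,
      " squared bias =", round((mean(pred) - 2*sin(x0))^2, 3),
      " variance =", round(var(pred), 3), "\n")
}
```

Across these values of \(k\), the variance should shrink and the squared bias should grow as \(k\) increases, matching the fitted curves above.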
Formally, the prediction error at a given target point \(x_0\) can be broken into three parts: the irreducible error, the squared bias, and the variance.

\[\begin{aligned}
E\Big[ \big( Y - \widehat f(x_0) \big)^2 \Big] &= E\Big[ \big( Y - f(x_0) + f(x_0) - E[\widehat f(x_0)] + E[\widehat f(x_0)] - \widehat f(x_0) \big)^2 \Big] \\
&= E\Big[ \big( Y - f(x_0) \big)^2 \Big] + E\Big[ \big(f(x_0) - E[\widehat f(x_0)] \big)^2 \Big] + E\Big[ \big(E[\widehat f(x_0)] - \widehat f(x_0) \big)^2 \Big] + \text{Cross Terms}\\
&= \underbrace{E\Big[ \big( Y - f(x_0) \big)^2 \Big]}_{\text{Irreducible Error}} + \underbrace{\Big(f(x_0) - E[\widehat f(x_0)]\Big)^2}_{\text{Bias}^2} + \underbrace{E\Big[ \big(\widehat f(x_0) - E[\widehat f(x_0)] \big)^2 \Big]}_{\text{Variance}}
\end{aligned}\]

## Cross-Validation

Tuning the parameter \(k\) can be tricky since we cannot know the bias in practice. Cross-validation can be used to directly estimate the prediction error. A 10-fold cross-validation works as follows:

```r
# 10-fold cross-validation
nfold = 10
infold = sample(rep(1:nfold, length.out=length(x)))

mydata = data.frame(x = x, y = y)

# we may consider a set of possible k values
allk = c(1:5, seq(10, 100, 10))

# save prediction errors of each fold
errorMatrix = matrix(NA, length(allk), nfold)

# loop across all possible k and all folds
for (l in 1:nfold)
{
  for (k in 1:length(allk))
  {
    knn.fit = kknn(y ~ x, train = mydata[infold != l, ],
                   test = mydata[infold == l, ], k = allk[k])
    errorMatrix[k, l] = mean((knn.fit$fitted.values - mydata$y[infold == l])^2)
  }
}

# plot the results for each fold
plot(rep(allk, nfold), as.vector(errorMatrix), pch = 19, cex = 0.5,
     xlab = "k", ylab = "prediction error")
points(allk, apply(errorMatrix, 1, mean), col = "red", pch = 19, type = "l", lwd = 3)

# which k is the best? average across all folds
allk[which.min(apply(errorMatrix, 1, mean))]
## [1] 30
```
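As a follow-up, not part of the original note, here is a minimal sketch of refitting KNN with the cross-validated choice of \(k\) and checking its error on the test grid generated earlier. It reuses `allk`, `errorMatrix`, `mydata`, `test.x`, and `test.y` from above and keeps `kknn`'s default kernel so that it matches the cross-validation loop.

```r
# A minimal sketch (not from the original note): refit KNN with the
# cross-validated k and evaluate it on the test grid generated earlier.
# The default kernel is kept to match the cross-validation loop above.
k.best <- allk[which.min(apply(errorMatrix, 1, mean))]

knn.best <- kknn(y ~ x, train = mydata,
                 test = data.frame(x = test.x), k = k.best)

# estimated prediction error at the selected k
mean((knn.best$fitted.values - test.y)^2)
```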
## KNN for Classification

For hard classification, KNN is not much different from the regression model, except that majority voting is used instead of averaging. For 1NN, the decision regions form a Voronoi tessellation of the training points.

```r
# knn for classification
library(class)
set.seed(1)

# generate 20 random observations, with random class 1/0
x <- matrix(runif(40), 20, 2)
g <- rbinom(20, 1, 0.5)

# generate a grid for plot
xgd1 = xgd2 = seq(0, 1, 0.01)
gd = expand.grid(xgd1, xgd2)

# fit a 1-nearest neighbor model and get the fitted class
knn1 <- knn(x, gd, g, k=1)
knn1.class <- matrix(knn1, length(xgd1), length(xgd2))

# plot the data
par(mar=rep(0.2,4))
plot(x, col=ifelse(g==1, "darkorange", "deepskyblue"), pch = 19, cex = 3,
     axes = FALSE, xlim = c(0, 1), ylim = c(0, 1))
symbols(0.7, 0.7, circles = 2, add = TRUE)
points(0.7, 0.7, pch = 19)
box()

# Voronoi tessellation plot (1NN)
library(deldir)
par(mar=rep(0.2,4))
z <- deldir(x = data.frame(x = x[,1], y = x[,2], z=as.factor(g)), rw = c(0, 1, 0, 1))
w <- tile.list(z)

plot(w, fillcol=ifelse(g==1, "bisque", "cadetblue1"))
points(x, col=ifelse(g==1, "darkorange", "deepskyblue"), pch = 19, cex = 3)
```

## Example 1: An artificial dataset

We use artificial data from the ElemStatLearn package.

```r
library(ElemStatLearn)

x <- mixture.example$x
y <- mixture.example$y
xnew <- mixture.example$xnew

par(mar=rep(2,4))
plot(x, col=ifelse(y==1, "darkorange", "deepskyblue"), axes = FALSE)
box()
```

The decision boundary is highly nonlinear. We can utilize the `contour()` function to demonstrate the result.

```r
# knn classification
k = 15
knn.fit <- knn(x, xnew, y, k=k)

px1 <- mixture.example$px1
px2 <- mixture.example$px2
pred <- matrix(knn.fit == "1", length(px1), length(px2))

contour(px1, px2, pred, levels=0.5, labels="", axes=FALSE)
box()
title(paste(k, "-Nearest Neighbor", sep= ""))
points(x, col=ifelse(y==1, "darkorange", "deepskyblue"))
mesh <- expand.grid(px1, px2)
points(mesh, pch=".", cex=1.2, col=ifelse(pred, "darkorange", "deepskyblue"))
```

As a comparison, we use a linear regression to fit the data and obtain a naive decision boundary.

```r
# using linear regression to fit the data (not logistic regression)
lm.fit = glm(y~x)
lm.pred = matrix(as.matrix(cbind(1, xnew)) %*% as.matrix(lm.fit$coef) > 0.5,
                 length(px1), length(px2))

par(mar=rep(2,4))
plot(mesh, pch=".", cex=1.2, col=ifelse(lm.pred, "darkorange", "deepskyblue"), axes=FALSE)
abline(a = (0.5 - lm.fit$coef[1])/lm.fit$coef[3],
       b = -lm.fit$coef[2]/lm.fit$coef[3], lwd = 2)
points(x, col=ifelse(y==1, "darkorange", "deepskyblue"))
title("Linear Regression of 0/1 Response")
box()
```
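As a quick check, not part of the original note, here is a minimal sketch comparing the in-sample misclassification rates of the 15-nearest-neighbor rule and the linear regression rule, reusing `x`, `y`, and `lm.fit` from above.

```r
# A minimal sketch (not from the original note): compare the training
# misclassification rates of 15-NN and the linear regression rule.
knn.train <- knn(x, x, y, k = 15)
mean((knn.train == "1") != (y == 1))

lm.train <- as.matrix(cbind(1, x)) %*% as.matrix(lm.fit$coef) > 0.5
mean(lm.train != (y == 1))
```

Keep in mind that training error is optimistic for KNN with small \(k\) (1NN always attains zero training error), so a held-out comparison would be more informative.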