Today we are going to learn about the new feature of Scikit-learn version 0.22 called KNN Imputation. This feature now enables us to support imputation for completing missing values using k-Nearest Neighbours (KNN). To track our tutorials on other new releases from scikit-learn, read our blog here and here.
Each sample’s missing values are imputed using the mean value from nearest neighbours found in the training set. Two samples are close if the features that are neither missing are close. By default, a Euclidean distance metric that supports missing values, nan_euclidean_distances, is used to find the nearest neighbours.
Input and Output
So, what we do first is to import libraries like NumPy and run them. Then we create as many rows as we wish to. Then we run the function KNN Imputer and we can decide how many neighbours we want. We first, as is the procedure to use scikit-learn goes, create an object and then run it. Then we can directly put the input values in imputer.fit_transform and get the output values in the form of patterns detected in the input values.
The code sheet for this tutorial is provided in a Github repository here
For more on this do watch the video attached herewith. This tutorial was brought to you by DexLab Analytics. DexLab Analytics is a premiere Machine Learning institute in Gurgaon.
Watch the video here.