In the last article, we discussed the MUSESelector, a kydavra selector that performs feature selection on a data frame. The biggest drawback of this method is that it works only for binary classification problems. This is where an extension of the method comes into play: M3U (Minimum Mean Minimum Uncertainty), implemented in kydavra as M3USelector for multiclass classification.
If you still haven’t installed Kydavra, just type the following in the command line:
pip install kydavra
If you have already installed the first version of kydavra, please upgrade it by running the following command:
pip install --upgrade kydavra
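Once upgraded, the M3USelector can be applied like any other kydavra selector. Below is a minimal sketch, assuming the select(df, target) interface shared by kydavra selectors; the file and column names are illustrative:

import pandas as pd
from kydavra import M3USelector

df = pd.read_csv('train.csv')  # hypothetical data set with a multiclass 'target' column

selector = M3USelector()
selected_columns = selector.select(df, 'target')  # names of the features kept
print(selected_columns)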
One of the most intuitive ways to select features is to measure how different the distributions of the classes are from each other. However, on some intervals the distribution of a feature across the classes can differ, while on other intervals it can be practically the same. So, we can deduce that the features with the most intervals where the class distributions differ are the best features. This logic is implemented in Minimum Uncertainty and Sample Elimination (or, shortly, MUSE), implemented in kydavra as MUSESelector.
If you still haven’t installed Kydavra, just type the following in the command line:
pip install kydavra
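With that done, the selector can be applied as follows; a minimal sketch, again assuming kydavra's usual select(df, target) interface, with illustrative names and a binary 'target' column:

import pandas as pd
from kydavra import MUSESelector

df = pd.read_csv('train.csv')  # hypothetical data set with a binary 'target' column

selector = MUSESelector()
selected_columns = selector.select(df, 'target')
print(selected_columns)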
PCA — more than just dimensional reduction.
Principal Component Analysis is known as one of the most popular dimension reduction techniques. However, few know that it has a very interesting property: the reduced data can be brought back to the original dimension. Even more, the data brought back to its original size is cleaner. So, at Sigmoid we decided to create a module to easily apply this property to pandas data frames.
Principal Component Analysis is a dimensional reduction technique that reduces your data frame into n predefined columns; however, unlike LDA, it doesn’t take into account the target vector.
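The reduce-and-restore property itself is easy to demonstrate directly with scikit-learn's PCA, on which such a module can be built; a minimal sketch, with an illustrative file name and component count:

import pandas as pd
from sklearn.decomposition import PCA

df = pd.read_csv('data.csv')  # hypothetical numeric data frame

pca = PCA(n_components=5)                  # reduce to 5 columns
reduced = pca.fit_transform(df)            # lower-dimensional representation
restored = pca.inverse_transform(reduced)  # back to the original dimension

# restored has the original shape, but the least significant variation
# (often noise) has been stripped away
cleaned_df = pd.DataFrame(restored, columns=df.columns, index=df.index)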
Many times, we have some features that are strongly correlated with the target column. However, sometimes they are also correlated with each other, creating the problem of multicollinearity. One way is to remove one of these columns. But we at Sigmoid want to propose a new solution to this problem, implemented in kydavra.
Linear Discriminant Analysis is a dimensional reduction technique that reduces your data frame into n predefined columns; however, unlike PCA, it takes into account the target vector.
At Sigmoid, we wondered what would happen if, instead of reducing the whole data frame…
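The idea hinted at above (reducing only the group of correlated columns with LDA, rather than the whole data frame) can be sketched with scikit-learn; the column names here are hypothetical:

import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

df = pd.read_csv('data.csv')  # hypothetical data set with a 'target' column
correlated = ['feature_a', 'feature_b', 'feature_c']  # mutually correlated columns

# Project only the correlated columns onto one LDA component,
# guided by the target vector, then swap them for the new column
lda = LinearDiscriminantAnalysis(n_components=1)
df['lda_feature'] = lda.fit_transform(df[correlated], df['target'])
df = df.drop(columns=correlated)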
We all know Occam’s Razor:
From a set of solutions, take the one that is the simplest.
This principle is applied in the regularization of linear models in Machine Learning. L1-regularisation (also known as LASSO) tends to shrink the weights of the linear model to 0, while L2-regularisation (known as Ridge) tends to keep the overall complexity as low as possible by minimizing the norm of the model’s weight vector. One of Kydavra’s selectors uses Lasso for selecting the best features, so let’s see how to apply it.
If you still haven’t installed Kydavra, just type the following in the command line:
pip install kydavra
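Then the Lasso-based selector can be applied like the others; a minimal sketch, assuming kydavra's usual select(df, target) interface, with illustrative names:

import pandas as pd
from kydavra import LassoSelector

df = pd.read_csv('train.csv')  # hypothetical data set with a numeric 'target' column

selector = LassoSelector()
selected_columns = selector.select(df, 'target')  # columns kept by the Lasso-based selection
print(selected_columns)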
As we said in previous articles about the Kydavra library, feature selection is a very important part of Machine Learning model development. Unfortunately, there is no single way to get the ideal model, mostly because data comes in different forms almost every time, and this implies different approaches. In this article, I would like to share a way to select categorical features using the Kydavra ChiSquaredSelector created by Sigmoid.
As always, for those that are here mostly just for the solution to their problem, here are the commands and the code:
To install kydavra, just write the following command in the terminal:
pip install kydavra
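And the code itself, a minimal sketch assuming kydavra's usual select(df, target) interface, with an illustrative file name and a categorical 'target' column:

import pandas as pd
from kydavra import ChiSquaredSelector

df = pd.read_csv('train.csv')  # hypothetical data set with categorical features

selector = ChiSquaredSelector()
selected_columns = selector.select(df, 'target')
print(selected_columns)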
Almost every person in Data Science or Machine Learning knows that one of the easiest ways to find relevant features for a predicted value y is to find the features that are most correlated with y. However, few (unless they are mathematicians) know that there are many types of correlation. In this article, I will shortly tell you about the 3 most popular types of correlation and how you can easily apply them with Kydavra for feature selection.
Pearson correlation.
Pearson’s correlation coefficient is the covariance of two variables divided by the product of their standard deviations.
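This definition is easy to verify directly with pandas (the numbers below are purely illustrative); the same corr method also accepts 'spearman' and 'kendall', the other two types of correlation this article covers:

import pandas as pd

x = pd.Series([1, 2, 3, 4, 5])
y = pd.Series([2, 4, 5, 4, 5])

# By definition: covariance divided by the product of standard deviations
r_manual = x.cov(y) / (x.std() * y.std())

# The same value computed by pandas directly
r_pandas = x.corr(y, method='pearson')
print(r_manual, r_pandas)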
Maths almost always has a good answer to questions related to feature selection. However, sometimes good old brute-force algorithms can bring a better and more practical answer into the game.
Genetic algorithms are a family of algorithms inspired by biological evolution that basically repeat a cross, mutate, try cycle, developing the best combination of states depending on the scoring metric. So, let’s get to the code.
To install kydavra, just write the following command in the terminal:
pip install kydavra
Now you can import the selector and apply it to your data set as follows:
from kydavra import GeneticAlgorithmSelector

selector = GeneticAlgorithmSelector()
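A fuller sketch of the typical usage, assuming this selector's select method takes an sklearn model alongside the data frame and target column (the model choice and names are illustrative, and the exact signature may differ between kydavra versions):

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from kydavra import GeneticAlgorithmSelector

df = pd.read_csv('train.csv')  # hypothetical data set with a 'target' column
model = RandomForestClassifier()

selector = GeneticAlgorithmSelector()
selected_columns = selector.select(model, df, 'target')
print(selected_columns)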
Very often we can look at classification as a problem of finding differences between 2 groups. Before Machine Learning, statisticians were doing it quite a lot, mostly using such metrics as the mean, variance, and standard deviation. However, it was a time-consuming process and couldn’t be done with many groups. Then came the famous statistician Sir Ronald Aylmer Fisher, who proposed a method named Analysis of Variance (shortly, ANOVA). We at Sigmoid think that a simple thing can be made even simpler, so we added the ANOVASelector to kydavra.
For those that are here mostly just for the solution to their problem…
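For them, a minimal sketch, assuming the ANOVASelector follows the same select(df, target) interface as the other kydavra selectors, with illustrative names:

import pandas as pd
from kydavra import ANOVASelector

df = pd.read_csv('train.csv')  # hypothetical data set with a class 'target' column

selector = ANOVASelector()
selected_columns = selector.select(df, 'target')
print(selected_columns)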
Face ID is a technology that allows devices to grant access only to allowed identities. It does this by looking at the person’s face, and if the face is recognized as belonging to a person with access, the system grants it. For a long time it was implemented through different technologies, but nowadays, after the massive development of ANNs and the high performance achieved in computer vision by CNNs, neural networks have become a more attractive way to do that.
The most straightforward way to do that would be to build a simple CNN that would classify the user’s face as…
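A minimal sketch of that straightforward approach, assuming a binary owner / not-owner label and an arbitrary face-crop size (all shapes and names here are illustrative, not a specific production design):

from tensorflow.keras import layers, models

# Tiny binary face classifier: 1 = device owner, 0 = someone else
model = models.Sequential([
    layers.Input(shape=(96, 96, 3)),         # assumed size of the face crop
    layers.Conv2D(32, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])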
A young student, passionate about Data Science and Machine Learning, dreaming of one day becoming an AI Engineer.