
In the last article, we discussed the MUSESelector. This kydavra selector performs feature selection based on how the class distributions of a feature differ. Its biggest drawback is that it only works for binary classification problems. Here an extension of this method comes into play: M3U (Minimum Mean Minimum Uncertainty), implemented in kydavra as M3USelector for multiclass classification.

Using M3USelector from Kydavra library.

If you still haven’t installed Kydavra, just type the following in the command line.

pip install kydavra

If you have already installed an earlier version of kydavra, please upgrade it by running the following command.

pip install --upgrade…
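With kydavra upgraded, applying the selector takes a few lines. The sketch below is a minimal example that assumes M3USelector follows the same select(data_frame, target_column) pattern as the other kydavra selectors shown in this series; the file and column names are illustrative.

import pandas as pd
from kydavra import M3USelector

# Illustrative multiclass data set
df = pd.read_csv('iris.csv')

# Assumption: like other kydavra selectors, M3USelector exposes a
# select(data_frame, target_column) method returning column names
selector = M3USelector()
selected_columns = selector.select(df, 'species')
print(selected_columns)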



One of the most intuitive ways to select features is to measure how much the distributions of the classes differ from each other. However, on some intervals the distribution of a feature across the classes can differ, while on other intervals it can be practically the same. So we can deduce that the features with the most intervals where the class distributions differ are the best features. This logic is implemented in Minimum Uncertainty and Sample Elimination (or shortly, MUSE), implemented in kydavra as MUSESelector.

Using MUSESelector from Kydavra library.

If you still haven’t installed Kydavra just type the following…
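To make the intuition concrete, here is a toy sketch (not kydavra’s actual implementation) that counts, for one feature, the histogram bins in which the per-class distributions visibly differ; the threshold is an arbitrary illustration.

import numpy as np
import pandas as pd

def interval_difference_score(df, feature, target, n_bins=10):
    # Toy illustration of the MUSE intuition, not kydavra's algorithm:
    # count the bins in which the per-class feature distributions differ
    bins = np.histogram_bin_edges(df[feature], bins=n_bins)
    classes = df[target].unique()
    # One normalized histogram of the feature per class
    hists = np.array([
        np.histogram(df.loc[df[target] == c, feature],
                     bins=bins, density=True)[0]
        for c in classes
    ])
    # A bin "differs" when the spread between the classes is large
    spread = hists.max(axis=0) - hists.min(axis=0)
    return int((spread > spread.mean()).sum())

Features with a higher score would be preferred; MUSE’s real elimination procedure is more involved than this sketch.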


PCA: more than just dimensionality reduction.


Principal Component Analysis is known as one of the most popular dimensionality reduction techniques. However, few know that it has a very interesting property: the reduced data can be brought back to the original dimension. Even more, the data brought back to its original size is cleaner, because some of the noise is lost in the reduction. So, at Sigmoid we decided to create a module to easily apply this property to pandas data frames.

Using PCAFilter from Kydavra library.

Principal Component Analysis is a dimensionality reduction technique that reduces your data frame into n predefined columns; however, unlike LDA, it doesn’t take into account the…
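The preview cuts off before showing PCAFilter itself, so here is a minimal scikit-learn sketch of the round-trip property the filter builds on; the data is synthetic.

import numpy as np
from sklearn.decomposition import PCA

# Synthetic data standing in for a numeric data frame
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))

pca = PCA(n_components=5)                      # 10 columns down to 5
X_reduced = pca.fit_transform(X)
X_restored = pca.inverse_transform(X_reduced)  # back to 10 columns

# X_restored has the original shape, but keeps only the variance
# captured by the 5 components; what is lost acts as removed noise
print(X_restored.shape)  # (200, 10)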



Many times, we have features that are strongly correlated with the target column. However, sometimes they are also correlated with each other, creating the problem of multicollinearity. One way out is to remove one of these columns. But we at Sigmoid want to propose a new solution to this problem, implemented in kydavra.

Using LDAReducer from Kydavra library.

Linear Discriminant Analysis is a dimensionality reduction technique that reduces your data frame into n predefined columns; however, unlike PCA, it takes into account the target vector.

At Sigmoid, we wondered what would happen if, instead of reducing the whole data frame…
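To see the difference from PCA, here is a minimal scikit-learn sketch; note that fit_transform receives the target vector y, which PCA never sees. LDAReducer’s own interface is not shown in this excerpt.

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# Unlike PCA, LDA uses the target vector y while fitting
lda = LinearDiscriminantAnalysis(n_components=2)
X_reduced = lda.fit_transform(X, y)
print(X_reduced.shape)  # (150, 2)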



We all know Occam’s Razor:

From a set of solutions, take the one that is the simplest.

This principle is applied in the regularization of linear models in Machine Learning. L1 regularization (also known as LASSO) tends to shrink the weights of the linear model to 0, while L2 regularization (known as Ridge) tends to keep the overall complexity as low as possible by minimizing the norm of the model’s weight vector. One of Kydavra’s selectors uses Lasso for selecting the best features. So let’s see how to apply it.

Using Kydavra LassoSelector.

If you still haven’t installed Kydavra just type the following in…
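Before the selector itself, here is a quick scikit-learn sketch of the property LassoSelector relies on: on synthetic data where only the first two features matter, L1 regularization drives the useless weights to zero. The alpha value is an arbitrary illustration.

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only the first two features actually drive the target
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

lasso = Lasso(alpha=0.1).fit(X, y)
# Weights of the three irrelevant features shrink to (almost) zero,
# which is exactly what a Lasso-based selector exploits
print(lasso.coef_)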



As we said in previous articles about the Kydavra library, feature selection is a very important part of Machine Learning model development. Unfortunately, there is no single way to get the ideal model, mostly because data comes in different forms almost every time, which implies different approaches. In this article, I would like to share a way to select categorical features using Kydavra’s ChiSquaredSelector, created by Sigmoid.

Using ChiSquaredSelector from Kydavra library.

As always, for those that are here mostly just for the solution to their problem, here are the commands and the code:

To install kydavra…
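Since the preview cuts the code off, here is a minimal sketch that assumes ChiSquaredSelector follows the usual kydavra select(data_frame, target_column) pattern; the file name, column name, and constructor defaults are illustrative.

import pandas as pd
from kydavra import ChiSquaredSelector

df = pd.read_csv('heart.csv')  # illustrative data set

# Assumption: the selector exposes select(data_frame, target_column)
# and returns the names of the categorical features worth keeping
selector = ChiSquaredSelector()
selected = selector.select(df, 'target')
print(selected)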



Almost every person in Data Science or Machine Learning knows that one of the easiest ways to find features relevant to a predicted value y is to find the features that are most correlated with y. However, few (unless they are mathematicians) know that there are many types of correlation. In this article, I will briefly tell you about the three most popular types of correlation and how you can easily apply them with Kydavra for feature selection.

Pearson correlation.

Pearson’s correlation coefficient is the covariance of two variables divided by the product of their standard deviations.
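The definition translates directly into code; the sketch below computes the coefficient by hand on synthetic data and checks it against scipy.

import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 2 * x + rng.normal(size=500)

# Covariance divided by the product of standard deviations
manual = np.cov(x, y)[0, 1] / (x.std(ddof=1) * y.std(ddof=1))
print(manual, pearsonr(x, y)[0])  # the two values match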



Maths almost always has a good answer to questions related to feature selection. However, sometimes good old brute-force algorithms can bring a better and more practical answer into the game.

Genetic algorithms are a family of algorithms inspired by biological evolution. They basically repeat the cycle of crossover, mutation, and evaluation, evolving the best combination of states according to the scoring metric. So, let’s get to the code.

Using GeneticAlgorithmSelector from Kydavra library.

To install kydavra, just write the following command in the terminal:

pip install kydavra

Now you can import the Selector and apply it on your data set as follows:

from kydavra import GeneticAlgorithmSelector

selector…
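The preview truncates the snippet, so the continuation below is a guess at the full pattern: the constructor arguments and the select signature are assumptions, modeled on how the other kydavra selectors in this series are applied, with an sklearn estimator supplied to score feature subsets.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from kydavra import GeneticAlgorithmSelector

df = pd.read_csv('train.csv')  # illustrative file name

# Assumption: the selector evolves feature subsets, scoring each one
# with the supplied estimator, and returns the winning column names
selector = GeneticAlgorithmSelector()
selected = selector.select(RandomForestClassifier(), df, 'target')
print(selected)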


Classification and regression in one shot.

Image created by “Sigmoid” public association.

Very often we can look at classification as a problem of finding differences between 2 groups. Before Machine Learning, statisticians were doing this quite a lot. Mostly they used such metrics as the mean, variance, and standard deviation. However, it was a time-consuming process, and it couldn’t be done with many groups. Then came the famous statistician Sir Ronald Aylmer Fisher, who proposed a method named Analysis of Variance (shortly, ANOVA). However, we at Sigmoid think that a simple thing can be made even simpler, so we added the ANOVASelector to kydavra.

Using ANOVASelector from Kydavra library.

For those that are here mostly just for the solution to their problem…
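ANOVASelector wraps the one-way ANOVA test; the sketch below shows the underlying statistic with scipy (the data set and column names are illustrative). A feature whose per-class means differ strongly gets a small p-value and is worth keeping.

import pandas as pd
from scipy.stats import f_oneway

df = pd.read_csv('iris.csv')  # illustrative data set

# One-way ANOVA: do the means of a feature differ between classes?
groups = [g['sepal_length'] for _, g in df.groupby('species')]
f_stat, p_value = f_oneway(*groups)
print(f_stat, p_value)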



Understanding more weird networks.

Icon from Flaticon by Nikita Golubev

Face ID is a technology that allows devices to grant access to themselves only to allowed identities. It does this by looking at the person’s face, and if the face is recognized as belonging to a person with access, the system grants it. For a long time, it was implemented through different technologies, but nowadays, after the massive development of ANNs and the high performance achieved in computer vision by CNNs, neural networks have become a more attractive way to do that.

First way: a simple CNN.

The most straightforward way to do that would be to build a simple CNN that would classify the user’s face as…
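As a rough illustration of that first approach, here is a minimal Keras sketch of a binary classifier ("the user" vs "not the user"); the input size and layer widths are arbitrary choices, not taken from the original article.

from tensorflow import keras
from tensorflow.keras import layers

# A tiny binary face classifier: does this face belong to the user?
model = keras.Sequential([
    layers.Conv2D(32, 3, activation='relu', input_shape=(128, 128, 3)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation='sigmoid'),  # probability the face is the user's
])
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
model.summary()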

Vasile Păpăluță

A young student, passionate about Data Science and Machine Learning, dreaming of one day becoming an AI Engineer.
