Edge-ML benefits explained to...
Data Scientists, Data Engineers and Computer Scientists
A generic tool, useful for all your projects
Edge-ML is a generic tool which makes all your Machine Learning projects more efficient: banking scoring, marketing targeting, text / e-mail classification, online ad targeting, sentiment analysis, etc. (see Use Case)
What benefits do you gain?
- Go / No-go decisions: Using Edge-ML from the beginning of a Machine Learning project has several advantages. A first baseline model is obtained very quickly and with little effort (no data cleaning, no recoding of the categorical variables, no grid search over the model's hyper-parameters ...). This baseline model is accurate and very robust, so it gives a valuable indication of the project's feasibility. If Edge-ML fails to learn a good model, the input dataset is most likely not informative enough. Most of the time, the Go/No-go decision is based on a non-optimized model (with default values for its hyper-parameters). In that case, an overfitted model can lead to the wrong decision. Edge-ML is designed to avoid this risk, because its models are regularized against overfitting.
- Reduction in the project's duration: There are two ways to use Edge-ML: (i) in automatic mode; (ii) jointly with regular Machine Learning algorithms.
In automatic mode, your projects move extremely fast thanks to the MODL approach, which automatically prepares the data (discretization and grouping models) and directly trains an ensemble classifier without optimizing any hyper-parameters. Most of the time, the trained model has an accuracy comparable to that of a random forest, and it is remarkably robust. The automatic mode is therefore very helpful when the robustness of the model must be guaranteed and its deployment to the production environment must be secured.
Used jointly with regular Machine Learning approaches, Edge-ML brings valuable pieces of information and allows you to accelerate your projects. For instance, Edge-ML can filter out uninformative variables (those not correlated with the target) without making any assumption about the distribution of the data, unlike many classical statistical tests. Edge-ML also allows you to evaluate the drift of variables between the train and deploy sets. This step is particularly important for detecting variables which are not stable over time and which represent a risk when the model is applied to new data. In addition, Edge-ML allows you to calibrate any binary classifier, in order to correct output probabilities that the classifier estimates in a distorted manner. Finally, the agility of the automatic mode allows you to imagine and evaluate a large number of features without time constraints. Used with regular Machine Learning approaches, Edge-ML accelerates and secures your work :-)
- Reduction in hardware resources: Edge-ML is very hardware-efficient because it is based on a disruptive mathematical approach. MODL is a Bayesian model selection approach at the crossroads of Machine Learning and Information Theory (for more information, here is a series of 4 videos on the MODL approach). The MODL approach is regularized: it avoids overfitting by design, without optimizing any hyper-parameters. The learning algorithm of Edge-ML is executed only once: there is no grid search, which considerably reduces the duration of the learning phase. A deeply optimized C++ implementation reduces processing time and memory usage. As a result, tens of millions of rows can be processed on a standard server (e.g. an 8-core Xeon with 64 GB of RAM). Edge-ML saves you heavy investment in computing infrastructure (Hadoop clusters, HPC ...).
- Robust and almost optimal model: The bias-variance tradeoff is a very important notion in Machine Learning. According to this principle, there is a compromise between the accuracy of models (low bias) and their robustness (low variance). In other words, when a model is heavily optimized by tuning its hyper-parameters very finely, the observed gain in accuracy generally comes with a loss of reliability. In this case, the model's performance may degrade when it is applied to new data, especially if small fluctuations occur in the data (e.g. variance, noise, drift ...). The MODL approach favors robustness while still providing accurate models: the accuracy is generally comparable to that of a random forest. The MODL approach is exploited at every stage of the automated Machine Learning pipeline implemented by Edge-ML: data preparation (discretization and grouping models); extraction of sequential rules; selection of variables; learning of an ensemble classifier.
- Interpretable models: Edge-ML provides highly interpretable models. There are two rankings of variable importance: (i) the univariate importance, which indicates how well each variable explains the target on its own; (ii) the multivariate importance, which indicates how well each variable explains the target jointly with the other variables. Comparing these two rankings makes it possible to identify at a glance the variables which benefit from interactions with the other variables.
During the data preparation stage, the numerical variables are discretized in a supervised way and the levels of the categorical variables are grouped together. These discretization / grouping models can be visualized and intuitively illustrate the distribution of the class values. Your interactions with the business and marketing entities are facilitated :-) Finally, the extraction of sequential rules provides simple and interpretable patterns which constitute robust and informative new features.
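To give a feel for supervised discretization driven by Bayesian model selection, here is a minimal Python sketch. It is not Edge-ML's code. The cost function follows the discretization criterion as it is usually stated for MODL in the literature, and it is an assumption that Edge-ML computes something equivalent internally. The function names (`modl_cost`, `best_single_cut`) are hypothetical, and searching for at most one cut point is a simplification made for brevity. The idea is that each candidate discretization gets a cost (prior plus likelihood, in nats) and the cheapest one wins. No hyper-parameter is tuned, and a variable whose cheapest model is a single interval can be treated as uninformative.

```python
from math import lgamma, log


def log_binomial(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def log_multinomial(counts):
    """Natural log of n! / (n_1! * ... * n_J!) for class counts n_1..n_J."""
    return lgamma(sum(counts) + 1) - sum(lgamma(c + 1) for c in counts)


def modl_cost(interval_counts):
    """Cost of a discretization given per-interval class counts.

    interval_counts holds one list of class counts per interval.
    Assumed form: log N + log C(N+I-1, I-1)
                  + sum_i log C(N_i+J-1, J-1)
                  + sum_i log(N_i! / prod_j N_ij!)
    """
    n = sum(sum(counts) for counts in interval_counts)
    n_intervals = len(interval_counts)
    n_classes = len(interval_counts[0])
    # Prior on the number of intervals and on the interval bounds.
    cost = log(n) + log_binomial(n + n_intervals - 1, n_intervals - 1)
    for counts in interval_counts:
        n_i = sum(counts)
        # Prior on the class distribution within the interval.
        cost += log_binomial(n_i + n_classes - 1, n_classes - 1)
        # Likelihood of the observed labels given that distribution.
        cost += log_multinomial(counts)
    return cost


def best_single_cut(values, labels):
    """Compare the one-interval model with every two-interval model.

    Returns (null_cost, best_cost, cut); cut is None when no split
    is cheaper than keeping a single interval. Inputs must be non-empty.
    """
    classes = sorted(set(labels))
    pairs = sorted(zip(values, labels))
    total = [sum(1 for _, y in pairs if y == c) for c in classes]
    null_cost = modl_cost([total])
    best_cost, best_cut = null_cost, None
    left = [0] * len(classes)
    for (x, y), (next_x, _) in zip(pairs, pairs[1:]):
        left[classes.index(y)] += 1
        if next_x == x:
            continue  # no boundary between identical values
        right = [t - l for t, l in zip(total, left)]
        cost = modl_cost([left, right])
        if cost < best_cost:
            best_cost, best_cut = cost, (x + next_x) / 2
    return null_cost, best_cost, best_cut


# Toy data (made up for illustration): 20 values, two label patterns.
values = list(range(20))
informative = [0] * 10 + [1] * 10      # class changes once, at 9.5
noisy = [i % 2 for i in range(20)]     # class alternates, no structure
print(best_single_cut(values, informative))
print(best_single_cut(values, noisy))
```

On this toy data the first variable gets a cut at 9.5, while the second one keeps the single-interval model, which is the signal that it carries no information about the target. The per-interval class counts are also what a visualization of a discretization model would display.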