We provide example datasets and scripts so that you can reproduce some experiments and easily assess the usability and performance of Edge-ML in real-life conditions.
If you do not have time to reproduce these experiments, each use case comes with a short description of the dataset, the hardware used and the performance achieved.
The aim of the project is to predict which customers are interested in taking out a caravan insurance policy (without knowing which customers actually own a caravan). Three types of variables describe the customers: (i) their use of other insurance products; (ii) socio-demographic information; (iii) local information derived from the zip code area. The ROI of a telemarketing campaign can be optimized by using a model to target the most interested customers.
The learning stage is almost instantaneous for this dataset size. The learned model is both very accurate (AUC close to 1) and very robust (there is no significant difference between the AUC measured on the training and test sets).
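The robustness criterion above (train AUC versus test AUC) can be checked with a few lines of code. The following is a minimal sketch in plain Python, not Edge ML's own evaluation code: the function names `auc` and `robustness_gap` are hypothetical, and the labels and scores are toy values made up for illustration, not results from the caravan dataset.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive example is scored
    above a random negative one (ties count for one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def robustness_gap(train_auc, test_auc):
    """Difference between train and test AUC; a value near 0 suggests
    the model is not overfitting the training set."""
    return train_auc - test_auc


if __name__ == "__main__":
    # Toy values for illustration only (not Edge ML output).
    train_labels, train_scores = [1, 1, 0, 0], [0.9, 0.8, 0.3, 0.2]
    test_labels, test_scores = [1, 0, 1, 0], [0.8, 0.6, 0.5, 0.1]
    gap = robustness_gap(auc(train_labels, train_scores),
                         auc(test_labels, test_scores))
    print(f"train/test AUC gap: {gap:.2f}")
```

This pairwise formulation is quadratic in the number of examples, which is fine for a quick check on a sample; a rank-based computation would be preferable on large test sets.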
The goal of the project is to build a network intrusion detector by using Machine Learning techniques. 22 types of known attacks have been simulated in a military network environment. The input variables give a technical description of the connections to the network (e.g. protocol used, connection time, etc.). This use case shows that the security of a network can be improved continuously thanks to Auto ML techniques.
Edge ML is optimized to process large amounts of data and requires minimal hardware resources. Here, the model is learned on 3 million rows in 2 hours and 35 minutes on an ordinary laptop.
The aim of the project is to optimize the choice of tree species for reforestation, based on cartographic data. This dataset contains forest cells described by altitude, slope, exposure, distance to the nearest water point, soil type, etc. The variable to be predicted is one of the 7 types of cover present on the cells. Edge ML solves this problem with very high accuracy.
The model must predict one of the 7 forest cover types: this is a multi-class learning problem. Edge ML natively learns a multi-class classifier (based on the MODL approach) without resorting to the usual heuristic of learning several binary classifiers (one versus all). As a result, the learning stage is faster and the model is easier to use.
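For contrast, here is a minimal sketch of the one-versus-all heuristic that a native multi-class learner avoids. It is not Edge ML code and does not implement the MODL approach: the binary scorer is a toy nearest-centroid rule chosen only to keep the example self-contained, and all function names and data are hypothetical.

```python
import math


def centroid(rows):
    """Component-wise mean of a list of numeric feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def train_binary(rows, targets):
    """Toy binary learner: store the centroid of positives and of negatives."""
    pos = [r for r, t in zip(rows, targets) if t]
    neg = [r for r, t in zip(rows, targets) if not t]
    return centroid(pos), centroid(neg)


def score_binary(model, row):
    """Higher score means 'closer to the positive class than to the rest'."""
    pos_c, neg_c = model
    return math.dist(row, neg_c) - math.dist(row, pos_c)


def train_one_vs_all(rows, labels):
    """Train one binary model per class (class c versus all other classes)."""
    return {c: train_binary(rows, [y == c for y in labels])
            for c in sorted(set(labels))}


def predict_one_vs_all(models, row):
    """Predict the class whose binary model gives the highest score."""
    return max(models, key=lambda c: score_binary(models[c], row))


if __name__ == "__main__":
    # Toy two-feature data with three classes (illustrative values only).
    rows = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]]
    labels = ["a", "a", "b", "b", "c", "c"]
    models = train_one_vs_all(rows, labels)
    print(len(models), "binary models trained")
    print(predict_one_vs_all(models, [5, 5.2]))
```

With 7 cover types, this scheme trains and stores 7 separate binary models and must reconcile their scores at prediction time, which is the overhead the paragraph above refers to.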
The aim of the project is to detect the type of activity (running or walking) from sensor data. This dataset contains measurements collected every 10 seconds by the gyroscope and the accelerometer of an iPhone 5S. The model has been learned on minimal hardware: a Raspberry Pi 2 (which has fewer resources than a current smartphone). The learned model is accurate and very robust. Edge-ML paves the way for learning secure models directly on devices!
The model is learned in 30 seconds on a Raspberry Pi 2. It is therefore possible to learn models directly on the devices, without sending the data collected by smartphones and IoT devices to an external system. Edge ML paves the way for new, 100% privacy-friendly uses of Machine Learning!
The goal is to estimate the click rate of an online ad when it is presented to a particular user. The volume of data collected for online ad retargeting is very large, so it is necessary to use scalable Machine Learning algorithms. Edge ML pushes the limits by processing several tens of millions of rows on a standard server (i.e. Xeon 8 core / 64 GB RAM).
This use case lets you evaluate how Edge ML scales by using several training sets of different sizes. On a standard server, the model is learned in 13 minutes on 1 million examples, 44 minutes on 2 million examples, 1 hour 30 minutes on 3 million examples and 14 hours on 10 million examples. Edge ML pushes the limits of your hardware :-)
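The timings quoted above can be converted into throughput figures to see how learning time grows with the size of the training set. The short sketch below only performs arithmetic on those published numbers; the names `TIMINGS`, `rows_per_minute` and `slowdown_vs_linear` are introduced here for illustration.

```python
# (number of examples, learning time in minutes), as quoted in the text:
# 13 min, 44 min, 1 h 30 min and 14 h.
TIMINGS = [
    (1_000_000, 13),
    (2_000_000, 44),
    (3_000_000, 90),
    (10_000_000, 14 * 60),
]


def rows_per_minute(timings):
    """Average number of examples processed per minute for each run."""
    return [rows / minutes for rows, minutes in timings]


def slowdown_vs_linear(timings):
    """Ratio of the observed time to the time expected if learning time
    grew linearly from the smallest run (1.0 means perfectly linear)."""
    base_rows, base_minutes = timings[0]
    return [minutes / (base_minutes * rows / base_rows)
            for rows, minutes in timings]


if __name__ == "__main__":
    for (rows, minutes), rpm, ratio in zip(
            TIMINGS, rows_per_minute(TIMINGS), slowdown_vs_linear(TIMINGS)):
        print(f"{rows:>10,} rows  {minutes:>4} min  "
              f"{rpm:>8,.0f} rows/min  x{ratio:.2f} vs linear")
```

On these figures, throughput drops as the training set grows, so learning time increases faster than linearly with the number of examples on this server.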
The goal of the project is to automatically classify e-mails into 10 categories. In this case, the e-mails are characterized by several "sequential" variables. The subject and body of each e-mail are treated as sequences of words. The sender and recipient information (organizations, countries, etc.) is encoded as sets. Edge ML automatically prepares these complex kinds of data by extracting relevant and robust sub-sequences and sub-sets.
Edge ML processes sequential variables seamlessly: a single command line is enough to extract the relevant sequential rules and learn an ensemble classifier. The slight decrease in robustness is due to the fact that the rules extracted from textual data are generally not independent (in this case, the option '-lessRule' is recommended).
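To make the idea of a sequential rule concrete, the sketch below shows how a word sub-sequence can be turned into a binary feature of an e-mail. It is only an illustration: it assumes that a rule matches when its words appear in the same order, not necessarily contiguously, which may differ from how Edge ML defines its rules. The function names and the example patterns are hypothetical.

```python
def contains_subsequence(words, pattern):
    """Return True if the items of `pattern` occur in `words` in the same
    order (gaps allowed). Assumption: ordered, non-contiguous matching."""
    it = iter(words)
    for target in pattern:
        # Consume the iterator until the next pattern word is found.
        for word in it:
            if word == target:
                break
        else:
            return False  # ran out of words before finding `target`
    return True


def rule_features(text, patterns):
    """One boolean feature per pattern, computed on lower-cased tokens."""
    tokens = text.lower().split()
    return [contains_subsequence(tokens, p) for p in patterns]


if __name__ == "__main__":
    # Hypothetical patterns, for illustration only.
    patterns = [["invoice", "attached"], ["meeting", "tomorrow"]]
    print(rule_features("Please find the invoice attached", patterns))
```

Each extracted sub-sequence then behaves like an ordinary binary variable that a classifier can use, which is also why overlapping rules on the same text tend to be correlated.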
This project aims to predict the ratings of products sold on an e-commerce web site, based on the written opinions of customers. This textual data is processed in its raw state, as sequences of words (no pre-processing, e.g. lemmatization, is performed). Edge ML automatically extracts sub-sequences that are both relevant and robust, and then automatically learns a predictive model. Edge ML solves this problem with high accuracy, while providing an easily interpretable model and rules.
The extracted rules are easily interpretable and are a valuable help for the feature engineering step. For instance, Edge ML extracts the following rules: