Statistical Machine Learning deals with data analysis. It mainly addresses three kinds of problems:
- Unsupervised classification or clustering, where the objective is to find the natural groups present in a dataset. It is an exploratory task.
- Supervised classification, which builds predictive models from a dataset of classified objects: given such a dataset, the objective is to construct a probabilistic model able to predict the class of unseen cases.
- Feature subset selection, where the objective is to select the most relevant characteristics for constructing a classification model.
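As a minimal sketch of the supervised classification task described above, the following toy naive Bayes classifier (a standard probabilistic model; the dataset and function names are hypothetical) estimates a model from classified objects and predicts the class of an unseen case, using only the standard library:

```python
# Toy naive Bayes classifier over binary features (illustrative only).
import math

def train_naive_bayes(X, y):
    """Estimate class priors and per-feature conditionals
    with Laplace smoothing."""
    classes = set(y)
    priors = {c: sum(1 for label in y if label == c) / len(y) for c in classes}
    cond = {}  # cond[c][j] = P(feature j = 1 | class c)
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        cond[c] = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                   for j in range(len(X[0]))]
    return priors, cond

def predict(priors, cond, x):
    """Return the class maximizing log P(c) + sum_j log P(x_j | c)."""
    def log_posterior(c):
        s = math.log(priors[c])
        for j, v in enumerate(x):
            p = cond[c][j]
            s += math.log(p if v else 1 - p)
        return s
    return max(priors, key=log_posterior)

# Hypothetical dataset: two binary features, two classes.
X = [[1, 1], [1, 0], [0, 1], [0, 0]]
y = ["a", "a", "b", "b"]
priors, cond = train_naive_bayes(X, y)
print(predict(priors, cond, [1, 1]))  # predicts "a"
```

The estimation step (counting with smoothing) and the prediction step (maximizing the posterior) correspond to the two phases of building a predictive model from classified data.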
Probabilistic graphical models are the most commonly used formalism for probabilistic modeling in the field of Artificial Intelligence in general and Machine Learning in particular. They are mathematical tools that allow the main operations with probability distributions (estimation and inference) to be carried out efficiently.
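A minimal illustration of the two operations mentioned above, on the smallest possible Bayesian network (two nodes, A -> B, with made-up parameters standing in for estimated ones), where inference is done by exact enumeration:

```python
# Two-node Bayesian network A -> B with illustrative parameters.
P_A = {True: 0.3, False: 0.7}                    # prior P(A)
P_B_given_A = {True: {True: 0.9, False: 0.1},    # P(B | A=True)
               False: {True: 0.2, False: 0.8}}   # P(B | A=False)

def joint(a, b):
    """Factorized joint: P(A, B) = P(A) * P(B | A)."""
    return P_A[a] * P_B_given_A[a][b]

# Inference by enumeration: marginal P(B=True), summing out A.
p_b = sum(joint(a, True) for a in (True, False))

# Posterior P(A=True | B=True) via Bayes' rule.
p_a_given_b = joint(True, True) / p_b
print(round(p_b, 3), round(p_a_given_b, 3))
```

In a real network, the parameter tables would be estimated from data, and enumeration would be replaced by efficient inference algorithms that exploit the graph structure.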
There are several challenges in this research line. The first concerns the nature of the learning problem: it is NP-hard, which means that there is no known polynomial-time algorithm that can solve all of its instances. We therefore have to strike a balance between the efficiency of the developed algorithms and the range of scenarios in which they apply. A second challenge is the design of parallel algorithms with theoretical guarantees. To handle large amounts of data, one possibility is to parallelize the sequential algorithms in a straightforward way; however, this is usually not the most efficient approach. To excel in this field, one has to design specific algorithms that run much faster than parallelized sequential versions but, at the same time, lose some of their theoretical properties. Designing parallel learning algorithms that maximize speed while minimizing the loss of theoretical guarantees is a challenging research line.
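The "straightforward parallelization" contrasted above can be sketched as follows: when candidate models are scored independently, the scoring loop parallelizes trivially. The scoring function and candidates here are hypothetical stand-ins:

```python
# Straightforward parallelization: score candidate models independently.
from concurrent.futures import ThreadPoolExecutor

def score(candidate):
    """Stand-in for an expensive model-scoring routine
    (e.g. a penalized log-likelihood)."""
    return -sum((x - 2) ** 2 for x in candidate)

candidates = [[1, 2, 3], [2, 2, 2], [0, 4, 2]]

# Each candidate is scored in parallel; specialized parallel
# algorithms would instead restructure the search itself.
with ThreadPoolExecutor() as pool:
    scores = list(pool.map(score, candidates))

best = candidates[scores.index(max(scores))]
print(best)
```

This illustrates why the approach is rarely the most efficient: only the independent scoring step speeds up, while the sequential structure of the search is left untouched.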