VASS

Semi-Supervised Algorithm Visualizer

A little bit of theory...

Semi-supervised learning is the branch of machine learning that uses labeled and unlabeled data together to perform a learning task. It sits between supervised and unsupervised learning.

Among semi-supervised algorithms, this project focuses on inductive methods. Their idea is simple: build a classifier that can predict labels for new data. The algorithms presented here belong to a more specific family of inductive methods: wrapper methods. Wrapper methods rely on pseudo-labeling, a process in which classifiers trained on the labeled data generate labels for the unlabeled data. Once this process is completed, the classifier is retrained with the newly labeled instances included.
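The pseudo-labeling loop described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used by this tool: the toy nearest-centroid classifier, the distance-based confidence score, and the names `fit_centroids`, `predict_with_confidence`, `self_training`, `threshold` and `max_iter` are all assumptions made for the example.

```python
from statistics import mean


def fit_centroids(X, y):
    """Train a toy nearest-centroid classifier: one mean point per class."""
    centroids = {}
    for c in sorted(set(y)):
        points = [x for x, label in zip(X, y) if label == c]
        centroids[c] = [mean(column) for column in zip(*points)]
    return centroids


def predict_with_confidence(centroids, x):
    """Return (label, confidence) for one instance.

    Confidence is an assumed heuristic: 1 minus the share of the total
    centroid distance taken by the nearest centroid.
    """
    dists = {
        c: sum((a - b) ** 2 for a, b in zip(x, centre)) ** 0.5
        for c, centre in centroids.items()
    }
    label = min(dists, key=dists.get)
    total = sum(dists.values())
    confidence = 1.0 - dists[label] / total if total > 0 else 1.0
    return label, confidence


def self_training(X_lab, y_lab, X_unlab, threshold=0.8, max_iter=10):
    """Pseudo-label confident unlabeled instances, then retrain."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    model = fit_centroids(X_lab, y_lab)
    for _ in range(max_iter):
        confident, rest = [], []
        for x in pool:
            label, confidence = predict_with_confidence(model, x)
            (confident if confidence >= threshold else rest).append((x, label))
        if not confident:
            break  # nothing new to learn from, stop early
        for x, label in confident:
            X_lab.append(x)
            y_lab.append(label)
        pool = [x for x, _ in rest]
        model = fit_centroids(X_lab, y_lab)  # retrain with the pseudo-labels
    return model, X_lab, y_lab
```

Instances whose confidence stays below the threshold are never pseudo-labeled, which is one common way to limit the propagation of wrong labels.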

The cards below present four of the most representative semi-supervised learning algorithms: Self-Training, Co-Training, Tri-Training and Democratic Co-Learning.

Internally, each algorithm uses one or more classifiers, as is characteristic of wrapper methods. The algorithms also differ in the number of views of the data they use, where a view is the subset of the dataset's attributes that is used to learn a model. Unlike a single-view algorithm, a multi-view algorithm treats the dataset's attributes as several subsets. For example, in Co-Training the first classifier might only "see" half of the attributes while the second sees the other half; each classifier works only with its own subset.
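As a rough illustration of the multi-view idea, the sketch below splits the attribute indices of a dataset into two views and projects each row onto a view. Splitting into contiguous halves is an assumption made for simplicity (the split used by a real Co-Training setup may be random or follow a natural grouping of the attributes), and `split_views` and `project` are hypothetical helper names.

```python
def split_views(n_attributes, n_views=2):
    """Partition attribute indices into contiguous, near-equal views."""
    base, extra = divmod(n_attributes, n_views)
    views, start = [], 0
    for i in range(n_views):
        size = base + (1 if i < extra else 0)  # earlier views absorb the remainder
        views.append(list(range(start, start + size)))
        start += size
    return views


def project(X, view):
    """Keep only the columns listed in `view` for every row of X."""
    return [[row[j] for j in view] for row in X]


# Two instances with four attributes each (made-up values).
X = [[5.1, 3.5, 1.4, 0.2],
     [6.2, 2.9, 4.3, 1.3]]

view_a, view_b = split_views(4)
X_a = project(X, view_a)  # what the first classifier "sees"
X_b = project(X, view_b)  # what the second classifier "sees"
```

Each classifier would then be trained only on its own projection, so the two models learn from disjoint attribute subsets.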

Objective

The aim of this tool is to make it easier to understand, through visualizations paired with the underlying theory, how the main semi-supervised algorithms actually work.

Selecting any of the algorithms takes you to a page where you load the dataset. You can then configure the algorithm with the desired parameters and, finally, obtain a visualization of the training process.

Self-Training

One classifier
Single-view

Select

Co-Training

Two classifiers
Multi-view

Select

Tri-Training

Three classifiers
Single-view

Select

Democratic Co-Learning

Three or more classifiers
Single-view

Select

The research carried out for the development of this software has been partially funded by the Junta de Castilla y León (project BU055P20) and by the Spanish Ministry of Science and Innovation (projects PID2020-119894GB-I00 and TED 2021-129485B-C43).