Semantic competence matching
Semantic matching of competences and skills.
- artificial intelligence
- machine learning
Description of the problem
VDAB has a database of over 11,000 competences that describe an employee within his or her job. This database keeps growing. One of the biggest challenges is avoiding duplicate competences and grouping competences that resemble each other. Given the size of the database, this can no longer be done manually. The aim of this project was to build an AI-based solution that automatically links similar competences.
Our matching algorithm works in several steps. First, we use the Google Translate API to translate everything from Dutch and French into English. This gives us four different "languages": Dutch, French, English (translated from Dutch) and English (translated from French). Next, we clean up and normalise all text. Among other things, this means converting punctuation marks to their standard form and converting all upper-case letters to lower case.
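A minimal sketch of the normalisation step, assuming Python and only the standard library. The punctuation table `PUNCTUATION_MAP` and the function `normalise` are hypothetical names; the project's exact cleaning rules are not described, and the translation step is left out because it requires Google Cloud credentials.

```python
import re
import unicodedata

# Hypothetical mapping of typographic punctuation to plain ASCII equivalents;
# the project's actual normalisation rules are not documented.
PUNCTUATION_MAP = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en dash and em dash
    "\u2026": "...",                # horizontal ellipsis
}

def normalise(text: str) -> str:
    """Normalise a competence description before it is embedded."""
    # Compose accented characters into single code points (e + combining acute -> é).
    text = unicodedata.normalize("NFC", text)
    # Replace typographic punctuation with its plain equivalent.
    for src, dst in PUNCTUATION_MAP.items():
        text = text.replace(src, dst)
    # Lowercase, then collapse runs of whitespace and trim the ends.
    text = text.lower()
    return re.sub(r"\s+", " ", text).strip()
```

For example, `normalise("  Klantgericht   WERKEN ")` returns `"klantgericht werken"`. Applying the same function to all four "languages" keeps their texts comparable.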
After this phase, we compute the word embeddings. We chose Facebook's fastText for this because it supports subword matching: each word is represented through its character n-grams, so morphological variants and misspellings still receive similar vectors. We use fastText models pretrained on Common Crawl data as well as models trained on Wikipedia. Next, we build sentence embeddings using both classical methods and machine-learning-based methods.
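The idea behind subword matching can be illustrated without downloading a model. The sketch below is a toy under stated assumptions: it follows fastText's default character n-grams (lengths 3 to 6, with the word wrapped in `<` and `>`), but the learned n-gram vectors are replaced by deterministic pseudo-random ones, so words only end up similar when they share n-grams. Unlike this toy, fastText also learns a vector for each in-vocabulary word and hashes n-grams into a fixed number of buckets. The unweighted average of word vectors stands in for one common classical sentence embedding; the project's actual classical and machine-learning methods are not specified. All function names here are hypothetical.

```python
import hashlib
import math
import random

DIM = 300  # same dimensionality as the pretrained fastText vectors

def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> list[str]:
    """Character n-grams of the word wrapped in '<' and '>' (fastText defaults: 3 to 6)."""
    wrapped = f"<{word}>"
    return [wrapped[i:i + n]
            for n in range(minn, maxn + 1)
            for i in range(len(wrapped) - n + 1)]

def ngram_vector(gram: str) -> list[float]:
    """Stand-in for a learned n-gram embedding: a pseudo-random vector seeded by the n-gram."""
    seed = int.from_bytes(hashlib.sha256(gram.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]

def mean(vectors: list[list[float]]) -> list[float]:
    """Component-wise average of equally long vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def word_vector(word: str) -> list[float]:
    # Built only from subwords, so unseen or misspelled words still get a vector.
    return mean([ngram_vector(g) for g in char_ngrams(word)])

def sentence_vector(sentence: str) -> list[float]:
    # Classical baseline: unweighted average of the word vectors.
    return mean([word_vector(w) for w in sentence.split()])

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Words that share many n-grams end up close; unrelated words do not.
print(round(cosine(word_vector("planning"), word_vector("planningen")), 2))
print(round(cosine(word_vector("planning"), word_vector("budget")), 2))
```

With an actual pretrained model, the `fasttext` Python package's `get_word_vector` and `get_sentence_vector` methods play similar roles to `word_vector` and `sentence_vector` here.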
Finally, we rank the matches produced by each of these sentence embeddings and, based on continuous feedback from VDAB, retrain our models to keep the matching accurate.
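The ranking step can be sketched as ordering candidate competences by cosine similarity to a query embedding. This assumes cosine similarity as the score and introduces a hypothetical `threshold` cut-off and hypothetical competence ids; how the project scores candidates and how VDAB's feedback feeds into retraining are not described, so the sketch covers ranking only.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_matches(query_vec: list[float],
                 candidates: dict[str, list[float]],
                 top_k: int = 5,
                 threshold: float = 0.0) -> list[tuple[str, float]]:
    """Return (competence_id, score) pairs sorted by descending similarity.

    candidates: hypothetical mapping from competence id to sentence embedding.
    threshold: hypothetical cut-off below which candidates are dropped.
    """
    scored = [(cid, cosine(query_vec, vec)) for cid, vec in candidates.items()]
    scored = [pair for pair in scored if pair[1] >= threshold]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

candidates = {"c1": [1.0, 0.0], "c2": [0.0, 1.0], "c3": [0.7, 0.7]}
print([cid for cid, _ in rank_matches([1.0, 0.0], candidates, top_k=2)])  # ['c1', 'c3']
```

Pairs that score above the threshold are candidates for linking as duplicates or near-duplicates, which is where reviewer feedback would come in.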