Semantic competence matching

VDAB

Semantic matching of competences and skills.

  • artificial intelligence
  • machine learning
  • Brainjar

Description of the problem

VDAB has a database of over 11,000 competences, which describe an employee within his or her job function. This database keeps growing. One of the biggest challenges is avoiding duplicate competences and grouping competences that resemble each other. Given the size of the database, it is no longer possible to do this manually. The aim of this project was to build an AI-based solution that automatically links similar competences.

Our solution

Our matching algorithm works in several steps. Using the Google Translate API, we first translate everything from Dutch and French to English. This gives us four different "languages": Dutch, French, English (translated from Dutch) and English (translated from French). Next, we clean up and normalise all text. This includes converting all punctuation marks to a consistent form and converting all upper-case letters to lower case.
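The clean-up step could look roughly like the sketch below. It is a minimal illustration, not the project's actual code: the function name `normalise` and the punctuation mapping are assumptions, since the original does not list which punctuation marks are converted.

```python
import re
import unicodedata

# Assumed mapping of typographic punctuation to plain equivalents.
# The real project may normalise a different set of characters.
PUNCTUATION_MAP = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash
})


def normalise(text: str) -> str:
    """Return a cleaned, lower-cased version of a competence description."""
    # Fold compatibility characters (e.g. non-breaking spaces) to a standard form.
    text = unicodedata.normalize("NFKC", text)
    # Replace typographic punctuation with plain equivalents.
    text = text.translate(PUNCTUATION_MAP)
    # Convert upper-case letters to lower case.
    text = text.lower()
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(normalise("  Klanten  \u201cAdviseren\u201d \u2013 Verkoop "))
```

Running the same function over all four text variants keeps them comparable before any embeddings are computed.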

After this phase we can start with the word embeddings. We chose fastText from Facebook for this because it allows us to perform subword matching. The pretrained fastText models are trained on Common Crawl and Wikipedia data. Next, we use both classical and machine-learning-based methods for the sentence embeddings.
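To make the two ideas in this step concrete, here is a small sketch under stated assumptions. It does not call fastText itself: `TOY_VECTORS` is a made-up three-dimensional vocabulary standing in for real word vectors, and averaging word vectors is only one example of a "classical" sentence embedding, not necessarily the one used in the project. The `char_ngrams` helper shows the kind of character n-grams that make subword matching possible.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
TOY_VECTORS = {
    "advise": [0.9, 0.1, 0.0],
    "counsel": [0.8, 0.2, 0.1],
    "customers": [0.1, 0.9, 0.0],
    "clients": [0.2, 0.8, 0.1],
    "weld": [0.0, 0.1, 0.9],
    "steel": [0.1, 0.0, 0.8],
}


def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word, with '<' and '>' as boundary markers."""
    padded = f"<{word}>"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def sentence_embedding(sentence, vectors=TOY_VECTORS):
    """Average the vectors of all known words in a whitespace-split sentence."""
    known = [vectors[word] for word in sentence.split() if word in vectors]
    if not known:
        raise ValueError("sentence contains no known words")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Related words share n-grams, which is what subword matching exploits.
    shared = set(char_ngrams("welder")) & set(char_ngrams("welding"))
    print(sorted(shared))

    query = sentence_embedding("advise customers")
    print(cosine_similarity(query, sentence_embedding("counsel clients")))
    print(cosine_similarity(query, sentence_embedding("weld steel")))
```

In this toy setup the two advisory competences score much closer to each other than to the welding one, which is the behaviour the matching relies on.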

Finally, we rank each of these sentence embeddings, and based on continuous feedback from VDAB we retrain our models to keep the matching correct.
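One possible reading of the ranking step is sketched below: for a given competence, score every candidate by cosine similarity and keep the best matches. The function name `rank_candidates`, the `top_k` parameter and the example vectors are all hypothetical, and the feedback-driven retraining is not shown.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_candidates(query_vec, candidates, top_k=3):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Made-up sentence embeddings for three candidate competences.
    candidates = {
        "counsel clients": [0.5, 0.5, 0.1],
        "weld steel": [0.05, 0.05, 0.85],
        "advise buyers": [0.6, 0.4, 0.0],
    }
    for name, score in rank_candidates([0.5, 0.5, 0.0], candidates):
        print(f"{name}: {score:.3f}")
```

A ranked list like this is also a convenient thing to show to a reviewer, whose accept or reject decisions can then serve as feedback.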
