Master's thesis project in Weak Supervision

The Wholesale Banking Advanced Analytics (WBAA) team is a large team of data scientists, data engineers, software developers and many more, who are focused on bringing data, machine learning and statistical modeling into the products that we build for our clients and internal users. The data scientists in WBAA also have a strong desire to keep up with, and be part of, the latest developments in AI, tooling and statistics. They do this by working closely with master’s students on a variety of topics to solve academic yet practical problems.

Supervised machine learning relies on large quantities of labeled data, which are often unavailable in practice. Manual annotation is time-consuming and costly, especially when domain experts are required.

The Data Programming paradigm combines multiple noisy labels to learn probabilistic labels that can be used to supervise machine learning models. Instead of manually labeling each example, the annotator provides labeling functions (LFs) that noisily label the data, for instance using domain heuristics or external knowledge bases.
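To make the idea concrete, here is a minimal sketch of labeling functions in plain Python. It is only an illustration: the function names, the toy heuristics and the example texts are all hypothetical, and the combination step is a simple unweighted vote rather than the learned label model that Snorkel or End-to-End Weak Supervision would use.

```python
from collections import Counter

# Label conventions assumed for this sketch (not taken from any library).
ABSTAIN = -1
NEGATIVE = 0
POSITIVE = 1


# Hypothetical labeling functions: each encodes one noisy heuristic
# and abstains when the heuristic does not apply.
def lf_contains_refund(text):
    return POSITIVE if "refund" in text.lower() else ABSTAIN


def lf_many_digits(text):
    return POSITIVE if sum(c.isdigit() for c in text) >= 8 else ABSTAIN


def lf_greeting(text):
    return NEGATIVE if text.lower().startswith(("hi", "hello")) else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_refund, lf_many_digits, lf_greeting]


def apply_lfs(texts, lfs=LABELING_FUNCTIONS):
    """Build the label matrix: one row per example, one column per LF."""
    return [[lf(text) for lf in lfs] for text in texts]


def vote_probabilities(votes, n_classes=2):
    """Turn one row of LF votes into a probabilistic label.

    Unweighted vote over the non-abstaining LFs; falls back to a
    uniform distribution when every LF abstains.
    """
    counts = Counter(v for v in votes if v != ABSTAIN)
    total = sum(counts.values())
    if total == 0:
        return [1.0 / n_classes] * n_classes
    return [counts.get(k, 0) / total for k in range(n_classes)]


texts = [
    "Hello, just checking in about lunch",
    "Urgent refund to account 12345678 required",
    "Meeting notes attached",
]
label_matrix = apply_lfs(texts)
soft_labels = [vote_probabilities(row) for row in label_matrix]
```

The resulting soft labels could then be used as training targets for a downstream classifier. An unweighted vote treats every LF as equally accurate and independent, which is exactly the assumption that label models try to relax.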

Data Programming, through the popular Snorkel framework, has been widely used in industry. However, the Snorkel approach depends on two unknown quantities (the dependency structure of the weak labels and the class balance) that are important and not trivial to estimate, especially for unbalanced problems such as fraud detection, or when highly correlated LFs are provided. The recently proposed End-to-End Weak Supervision method claims to be robust in both situations; however, more research is needed to understand how well it can be applied in practice. During your thesis work you will experiment with, validate and improve on these methods.

Our team has extensive experience with weak supervision. A previous thesis internship at ING from the UvA led to the publication of a paper on the topic.

Are you a master’s student looking for a thesis project, and are you interested in this one?

Do you furthermore

  • Have solid experience with Python?
  • Have machine learning experience?
  • Have solid skills in statistics and linear algebra (matrix rank, singular values, matrix decomposition, …)?
  • Have at least six months available for your thesis project?
  • Aim to go for a publication?
  • Bring good vibes to your fellow data scientists?

Then we offer a master’s thesis project, a compensation of 600 euros per month, close supervision, and a close-knit community of data scientists to interact with and learn from.

