Internship - Master Thesis Project in: Efficient Neural Models for Large-Scale Entity Matching

at ING
EUR 8,400 per year
INTERN
✅ Hybrid
✅ Visa Sponsorship

🕙 36 hours per week

Used Tools & Technologies

Not specified

Required Skills & Competences

Python @ 3, Statistics @ 3, GitHub @ 3, Machine Learning @ 3, Data Science @ 3, Communication @ 3, BI @ 3

Details

As the largest bank in the Netherlands, ING sees the majority of payments made by Dutch entities. Through billions of payments, millions of entities form a large network. For ING clients (accounts with ING) we have information, but for accounts at other banks that send or receive payments to/from ING accounts we generally do not. This project aims to use payment sequences to distill information about account holders.

Project description

This master thesis is a research project to investigate whether compact neural architectures can meaningfully outperform ING's current TF-IDF-based entity-matching system (the strongest production baseline) under realistic deployment constraints. The work covers the end-to-end research cycle: literature exploration, hypothesis formulation, and designing, implementing, and empirically evaluating lightweight model families such as cross-encoders, bi-encoders, and other efficient variants for short-text entity normalization. The student will analyze trade-offs between predictive performance, inference speed, memory footprint, and large-scale catalogue feasibility. The expected outcome is a scientifically grounded comparative analysis and a research-driven recommendation for a production-ready architecture balancing performance and efficiency.
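To make the baseline concrete, here is a minimal sketch of TF-IDF matching on character n-grams, written with the Python standard library only. It is an illustration under stated assumptions, not ING's production system and not the API of the EntityMatchingModel package: the function names (char_ngrams, build_index, match), the trigram size, the smoothed IDF formula, and the example catalogue are all hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Split a lowercased, space-padded name into character n-grams."""
    padded = f" {text.lower().strip()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def normalise(vec):
    """Scale a sparse vector (dict) to unit L2 norm; empty vectors pass through."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {g: w / norm for g, w in vec.items()} if norm else vec


def build_index(names, n=3):
    """Return (idf, vectors): IDF weights and unit-length TF-IDF vectors per name."""
    docs = [Counter(char_ngrams(name, n)) for name in names]
    # Document frequency: in how many names each n-gram occurs at least once.
    doc_freq = Counter(gram for doc in docs for gram in doc)
    # Smoothed IDF (an assumption of this sketch) keeps every weight positive.
    idf = {g: math.log((1 + len(names)) / (1 + df)) + 1.0
           for g, df in doc_freq.items()}
    vectors = [normalise({g: tf * idf[g] for g, tf in doc.items()})
               for doc in docs]
    return idf, vectors


def match(query, names, idf, vectors, n=3):
    """Return (best_name, cosine_similarity) for a query string."""
    counts = Counter(char_ngrams(query, n))
    # N-grams never seen in the catalogue are dropped before normalising.
    q = normalise({g: tf * idf[g] for g, tf in counts.items() if g in idf})
    # Both sides are unit length, so the dot product is the cosine similarity.
    scores = [sum(w * vec.get(g, 0.0) for g, w in q.items()) for vec in vectors]
    best = max(range(len(names)), key=scores.__getitem__)
    return names[best], scores[best]


# Hypothetical three-name catalogue, for illustration only.
catalogue = ["ING Bank N.V.", "Rabobank", "ABN AMRO Bank"]
idf, vectors = build_index(catalogue)
print(match("ing bank nv", catalogue, idf, vectors))
```

This version scores a query against every catalogue entry, so its cost grows linearly with catalogue size. That is the kind of trade-off between accuracy, latency, and memory that the thesis would measure when comparing such a baseline with bi-encoders and cross-encoders.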

Relevant link: https://github.com/ing-bank/EntityMatchingModel

The team

You will work within Wholesale Banking Advanced Analytics (WBAA) at ING, a team of data scientists, data engineers, and software developers that brings data, machine learning and statistical modeling into products. WBAA has experience supervising master students and has open-sourced solutions for this problem domain.

Responsibilities

  • Conduct in-depth literature review on entity matching and efficient neural architectures.
  • Formulate research hypotheses and design experiments.
  • Implement lightweight model families (cross-encoders, bi-encoders, and other efficient variants).
  • Empirically evaluate models on predictive performance, inference latency, memory footprint, and scalability.
  • Produce a comparative analysis and a production-oriented recommendation.
  • Optionally aim to produce a scientific publication.

Requirements

  • Be a master's student. Enrollment at a Dutch university (or, for EU passport holders, at an EU university) is mandatory during the internship.
  • Available for at least six months to perform the thesis project.
  • Solid experience with Python.
  • Machine learning experience.
  • Solid skills in statistics and linear algebra (matrix rank, singular values, matrix decomposition, …).
  • Interest in research and potential publication.
  • Good collaboration and communication within a data science team.

Benefits / Rewards

  • Compensation: internship allowance of 700 EUR per month, based on a 36-hour work week.
  • An internship position with close supervision.
  • Your own work laptop.
  • Hybrid working (blend of home working and office working).
  • Personal growth opportunities and informal working environment.
  • Many former interns move into permanent roles or trainee programs, though this is not guaranteed.

Practical information

  • Location: CDR (Amsterdam - Cedar), hybrid working.
  • Working hours: 36 hours per week (internship allowance is based on 36 hours/week).
  • Duration: at least six months (master's thesis project).
  • Mandatory: must be enrolled at a Dutch university (or EU-university for EU passport holders) during the internship.

How to apply / Contact

Contact the recruiter attached to the advertisement or apply directly by uploading your CV and motivation letter using the Apply button.