Internship - Master Thesis Project in: Efficient Neural Models for Large-Scale Entity Matching
36 hours per week
Required Skills & Competences
Python, Statistics, GitHub, Machine Learning, Data Science, Communication, BI
Details
As the largest bank in the Netherlands, ING sees the majority of payments made by Dutch entities. Through billions of payments, millions of entities form a large network. For ING clients (accounts with ING) we have information, but for accounts at other banks that send or receive payments to/from ING accounts we generally do not. This project aims to use payment sequences to distill information about account holders.
Project description
This master thesis is a research project to investigate whether compact neural architectures can meaningfully outperform ING's current TF-IDF-based entity-matching system (the strongest production baseline) under realistic deployment constraints. The work covers the end-to-end research cycle: literature exploration, hypothesis formulation, and designing, implementing, and empirically evaluating lightweight model families such as cross-encoders, bi-encoders, and other efficient variants for short-text entity normalization. The student will analyze trade-offs between predictive performance, inference speed, memory footprint, and large-scale catalogue feasibility. The expected outcome is a scientifically grounded comparative analysis and a research-driven recommendation for a production-ready architecture balancing performance and efficiency.
Relevant link: https://github.com/ing-bank/EntityMatchingModel
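To make the kind of baseline named in the project description concrete, here is a minimal sketch of TF-IDF matching over character n-grams with cosine similarity. It is an illustration only, not ING's production system or the code in the linked repository: the use of character trigrams, the smoothing formula, the function names (`char_ngrams`, `build_idf`, `tfidf_vector`, `best_match`) and the example catalogue are all assumptions made for this sketch.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Split a lowercased, space-padded string into overlapping character n-grams."""
    padded = f" {text.lower().strip()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_idf(names, n=3):
    """Compute smoothed inverse document frequencies over a catalogue of names."""
    doc_freq = Counter()
    for name in names:
        doc_freq.update(set(char_ngrams(name, n)))
    total = len(names)
    return {g: math.log((1 + total) / (1 + df)) + 1.0 for g, df in doc_freq.items()}


def tfidf_vector(text, idf, n=3):
    """Return an L2-normalised sparse TF-IDF vector (as a dict) for one string."""
    counts = Counter(char_ngrams(text, n))
    # n-grams never seen in the catalogue carry no weight and are dropped.
    vec = {g: c * idf[g] for g, c in counts.items() if g in idf}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {g: w / norm for g, w in vec.items()} if norm else {}


def cosine(a, b):
    """Dot product of two L2-normalised sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(g, 0.0) for g, w in a.items())


def best_match(query, catalogue, idf, n=3):
    """Return (score, name) for the catalogue entry most similar to the query."""
    q = tfidf_vector(query, idf, n)
    scored = [(cosine(q, tfidf_vector(name, idf, n)), name) for name in catalogue]
    return max(scored)


# Hypothetical catalogue, for illustration only.
catalogue = ["ING Bank N.V.", "Rabobank Nederland", "ABN AMRO Bank"]
idf = build_idf(catalogue)
score, name = best_match("ing bank nv", catalogue, idf)
print(f"{name} ({score:.2f})")
```

This brute-force version scores the query against every catalogue entry, which is exactly the step that becomes costly at large catalogue sizes and that the efficiency part of the thesis would need to address.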
The team
You will work within Wholesale Banking Advanced Analytics (WBAA) at ING, a team of data scientists, data engineers, and software developers that brings data, machine learning and statistical modeling into products. WBAA has experience supervising master students and has open-sourced solutions for this problem domain.
Responsibilities
- Conduct in-depth literature review on entity matching and efficient neural architectures.
- Formulate research hypotheses and design experiments.
- Implement lightweight model families (cross-encoders, bi-encoders, and other efficient variants).
- Empirically evaluate models on predictive performance, inference latency, memory footprint, and scalability.
- Produce a comparative analysis and a production-oriented recommendation.
- Optionally aim to produce a scientific publication.
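As an illustration of the evaluation responsibility above, inference latency and memory footprint can be measured with a small harness like the one below. This is a sketch under stated assumptions, not a prescribed evaluation protocol: the `benchmark` function, its parameters, and the choice of median and 95th-percentile latency are hypothetical, and `tracemalloc` only tracks memory allocated by Python itself, so memory used inside native libraries would need a different tool.

```python
import statistics
import time
import tracemalloc


def benchmark(predict, queries, repeats=5):
    """Measure per-call latency and peak Python heap usage of `predict`."""
    # Timing pass: run without memory tracing, since tracing slows execution.
    latencies_ms = []
    for _ in range(repeats):
        for q in queries:
            start = time.perf_counter()
            predict(q)
            latencies_ms.append((time.perf_counter() - start) * 1000.0)

    # Memory pass: a single sweep over the queries with tracing enabled.
    tracemalloc.start()
    for q in queries:
        predict(q)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    latencies_ms.sort()
    p95_index = min(len(latencies_ms) - 1, int(0.95 * len(latencies_ms)))
    return {
        "n_calls": len(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "p95_ms": latencies_ms[p95_index],
        "peak_mib": peak_bytes / (1024 * 1024),
    }


# Usage with a stand-in model: any callable taking one query string works.
result = benchmark(lambda q: q.lower().split(), ["ING Bank N.V.", "ABN AMRO Bank"], repeats=3)
print(result)
```

Running the same harness over each candidate model with an identical query set gives comparable numbers for the trade-off analysis.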
Requirements
- Be a master's student. Enrolment at a Dutch university (or at an EU university for EU passport holders) is mandatory during the internship.
- Available for at least six months to perform the thesis project.
- Solid experience with Python.
- Machine learning experience.
- Solid skills in statistics and linear algebra (matrix rank, singular values, matrix decomposition, ...).
- Interest in research and potential publication.
- Good collaboration and communication within a data science team.
Benefits / Rewards
- Compensation: 700 EUR per month (internship allowance) based on a 36 hours work week.
- An internship position with close supervision.
- Your own work laptop.
- Hybrid working (blend of home working and office working).
- Personal growth opportunities and informal working environment.
- Many former interns move into permanent roles or trainee programs, though this is not guaranteed.
Practical information
- Location: CDR (Amsterdam - Cedar), hybrid working.
- Working hours: 36 hours per week (internship allowance is based on 36 hours/week).
- Duration: at least six months (master's thesis project).
- Mandatory: must be enrolled at a Dutch university (or EU-university for EU passport holders) during the internship.
How to apply / Contact
Contact the recruiter attached to the advertisement or apply directly by uploading your CV and motivation letter using the Apply button.