Master Thesis Project On Research On Confidence Scoring For LLM Answers

at ING

📍 Amsterdam, Netherlands

EUR 8,400 per year

INTERN

✅ Hybrid

✅ Visa Sponsorship

🕙 36 hours per week

Used Tools & Technologies

Not specified

Required Skills & Competences

Python @ 3, Statistics @ 3, Machine Learning @ 3, QA @ 3, LLM @ 3, ChatGPT @ 3

Details

We propose a Master thesis project on obtaining confidence scores for answers generated by Large Language Models (LLMs), in particular for answers generated with Retrieval Augmented Generation (RAG) pipelines.

Generative AI models like LLMs generate answers with varying levels of accuracy and can produce factually inaccurate answers. Validation and evaluation of LLM results is therefore an important research area. At ING Wholesale Banking Advanced Analytics (WBAA) we will research and develop techniques to calculate confidence scores for the answers LLMs give to question prompts, and apply these techniques to real-world RAG use-cases in our department.

Project context

We use RAG pipelines that couple a search engine to an LLM to enable generative question-answering (QA) grounded in retrieved document passages.
Our RAG projects aim to automate the extraction of information from unstructured documents: a fixed set of questions is answered across many documents to produce structured summary datasets (e.g., automated form-filling).
The core problem: are generated answers reliable? LLMs typically do not return confidence scores and are not designed to do so; answers may be incorrect.
The proposal is to research and develop a reliable confidence score applicable in ING's data extraction projects using ground-truth datasets manually labelled by expert analysts.

Reference blog from the team: https://medium.com/p/c668844d52c8

Responsibilities

Study the latest research on confidence scoring for LLM outputs and on evaluation methodologies for generative QA.
Adapt and extend relevant methods for application in ING's RAG pipelines and document extraction use-cases.
Define experiments, build prototypes, and evaluate confidence scoring approaches against ground-truth labelled datasets.
Collaborate with data scientists, data engineers and subject matter experts in WBAA to integrate findings into practical pipelines.
Aim to produce publishable research results and a hands-on prototype.
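
To illustrate what evaluating a confidence score against a ground-truth labelled dataset could look like, here is a minimal sketch of one common check, the expected calibration error (ECE). It bins answers by their confidence score and compares the average confidence in each bin with the fraction of answers the analysts marked correct. The function name, the bin count and the example data are assumptions made for illustration; they are not taken from ING's pipelines, and the thesis may well settle on different metrics.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean gap between confidence and accuracy over equal-width bins.

    confidences: scores in [0, 1], one per generated answer (hypothetical input)
    correct:     booleans from the labelled ground truth, same order
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(c < 0.0 or c > 1.0 for c in confidences):
        raise ValueError("confidence scores must lie in [0, 1]")

    # Assign each answer to a bin; a score of exactly 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece


# Toy example: slightly under-confident on correct answers,
# slightly over-confident on incorrect ones.
example_ece = expected_calibration_error(
    [0.9, 0.9, 0.1, 0.1], [True, True, False, False]
)
```

A value close to 0 means the scores can be read as probabilities of being correct; larger values indicate over- or under-confidence.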

Research considerations

The availability (or lack) of model internals such as network weights and next-token probabilities in commercial models (e.g., OpenAI/ChatGPT).
How to account for randomness in generated answers.
Multi-class answers: multiple answers may be correct for the same question (e.g., extracted from different document pages).
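
As a rough illustration of the first two considerations, the sketch below shows two simple candidate scores. The first needs no model internals: the same question is sampled several times and the share of samples that agree with the most frequent answer is used as the confidence. The second assumes per-token log-probabilities are available and turns them into a length-normalised probability. All function names and the normalisation rule are hypothetical, and neither score is presented as the method the project will adopt.

```python
import math
from collections import Counter


def normalize(answer: str) -> str:
    """Crude normalisation (an assumption): lower-case and collapse whitespace."""
    return " ".join(answer.lower().split())


def agreement_confidence(samples):
    """Return (majority answer, fraction of sampled answers that match it).

    samples: answers generated for the same prompt with sampling enabled.
    """
    if not samples:
        raise ValueError("need at least one sampled answer")
    counts = Counter(normalize(s) for s in samples)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(samples)


def sequence_confidence(token_logprobs):
    """Geometric mean of token probabilities, given per-token log-probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


# Four hypothetical samples for one question; three agree after normalisation.
majority, agreement = agreement_confidence(
    ["Amsterdam", "amsterdam ", "Rotterdam", "Amsterdam"]
)
```

Exact string matching is a deliberate simplification here. When several answers can be correct for the same question, agreement would need to be measured differently, for example per answer or with a semantic similarity measure.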

Requirements

Be a Master student (thesis candidate) enrolled at a Dutch university (or EU university for EU passport holders) for the internship duration.
Solid experience with Python.
Machine learning experience.
Solid skills in statistics and linear algebra (matrix rank, singular values, matrix decomposition, ...).
Availability for at least six months for the thesis project.
Aim to pursue a publication based on the work.
Ability to collaborate and contribute positively to a team of data scientists.

Team

The Wholesale Banking Advanced Analytics (WBAA) team includes data scientists, data engineers, software developers and others focused on bringing ML and statistical modeling into production products.
The team has experience supervising Master students and works closely with students on academic yet practical problems.

Benefits

Internship allowance of 700 EUR per month (based on a 36-hour work week).
Your own work laptop.
Hybrid working (blend of home and office working).
Personal growth, challenging work and an informal innovative environment.
Opportunity to work with highly skilled experts and potentially move into further opportunities at ING.

How to succeed / Application

ING values curiosity, continuous learning and responsibility.
To apply, upload your CV and motivation letter via the apply button or contact the recruiter attached to the advertisement.