CS MSc Thesis Presentation 14 March 2024

Lecture

From: 2024-03-14 13:15 to 14:00
Place: E:2405 (Glasburen)
Contact: birger [dot] swahn [at] cs [dot] lth [dot] se


One Computer Science MSc thesis to be presented on 14 March

On Thursday, 14 March, there will be a master's thesis presentation in Computer Science at Lund University, Faculty of Engineering.

The presentation will take place in E:2405 (Glasburen).

Note to potential opponents: N.B. No more opponents for this presentation! Register as an opponent to the presentation of your choice by sending an email to the examiner for that presentation (firstname.lastname@cs.lth.se). Do not forget to specify the presentation you register for! Note that the number of opponents may be limited (often to two), so you might be forced to choose another presentation if you register too late. Registrations are individual, just as the oppositions are! More instructions are found on this page.


13:15-14:00 in E:2405 (Glasburen)

Presenters: Fabian Sundholm, Adla Jebara
Title: Management of Training Data for Deep Learning Applications: Requirements and Solutions
Examiner: Emelie Engström
Supervisors: Lars Bendix (LTH), Fredrik Stål (Precise Biometrics)

In the realm of software development, extensive research has been conducted on source code management, but little to no attention has been given to managing associated data, such as the large volume of training data needed for the development of deep learning applications.

This thesis aims to investigate whether there is a scalable solution for storing and managing training data used in different variants of machine learning models. The research includes identifying and formulating requirements for a training data management system, proposing design solutions to address these requirements, and finally implementing a proof of concept.

The requirement specification was formulated through literature reviews and developer interviews. Design solutions were developed in alignment with the identified requirements and by exploring available tools. Thereafter, one of the two design solutions was chosen for implementation in a proof of concept.

The research findings include a comprehensive list of requirements, with key requirements such as versioning, scalability, traceability, and data lifecycle management. The proof of concept demonstrated that the proposed design solution did not fully meet the requirements, indicating that the problem is more complex than initially expected.

Due to time and resource constraints, a satisfactory full implementation of a proof of concept was not achieved. Moreover, a ready-made solution that meets all the requirements to a satisfactory degree likely does not exist. Nevertheless, our research indicates that, given additional time and resources, it is feasible to address the problem. Consequently, an interesting direction for future work would be the development and implementation of such a solution.

Link to popular science summary: To be added