EDAN20 – Language Technology
This page is provisional and is constantly being updated.
The first lecture will be held on September 2nd, 2024, 13-15.
The course has three pages:
- The official course page, https://cs.lth.se/edan20/;
- Canvas, used mainly to hand in assignments and manage communication with the participants;
- GitHub, where I store the lab descriptions https://github.com/pnugues/edan20 and the programs used in the course https://github.com/pnugues/pnlp.
The complete schedule for September/October 2024 is available here: https://cloud.timeedit.net/lu/web/lth1/ri1Q5006.html. To view it, use the course schedule (lässchema) links and enter EDAN20.
Course Delivery
This year, all lectures and labs will be held in person. We will not normally use Zoom.
Course Registration
You should formally register in the LADOK system, preferably before the course starts. Instructions are available in Swedish and in English. If you fail to register, the department may still register you, provided you attended the first lecture and meet all the requirements.
Lab Registration
You must register for a lab group here: https://sam.cs.lth.se/LabsSelectSession?occasionId=884
We will also use Discord for the lab sessions.
Objectives
The course introduces theories and techniques of natural language processing and language technology. It focuses on industrial or laboratory applications, such as document retrieval on the Internet, information extraction, or conversational agents. Fundamental algorithms will be described using Python.
Course contents
- An Overview of Language Processing
- Corpus Processing Tools
- Encoding and Annotation Schemes
- Topics in Information Theory and Machine Learning
- Logistic Regression and Neural Networks
- Counting and Indexing Words
- Word Sequences
- Dense Vector Representations
- Words, Parts of Speech, and Morphology
- Subword Segmentation
- Part-of-Speech and Sequence Annotation
- Self-Attention and Transformers
- Pretraining an Encoder: The BERT Language Model
- Sequence-to-Sequence Architectures: Encoder-Decoders and Decoders
Textbook
The course textbook is Python for Natural Language Processing, Springer, 2024. It is available from Springer Link: https://link.springer.com/book/10.1007/978-3-031-57549-5.
You can access the book from the catalogue of Lund University Libraries: https://lubcat.lub.lu.se/. Once connected, you can download the PDF or read the chapters online.
Administrative details
Offered: Study period HT1 (autumn term, period 1)
Contact person: Pierre Nugues
Prerequisites: See the course syllabus.
Note: The course is taught in English.
Course website: http://cs.lth.se/edan20