lunduniversity.lu.se

Computer Science

Faculty of Engineering, LTH

EDAN20 – Language Technology

This page is provisional and is updated continuously.

The first lecture will be held on September 2nd, 2024, 13-15.

The course has three pages:

  1. The official course page, https://cs.lth.se/edan20/;
  2. Canvas, used mainly to hand in assignments and manage communication with the participants;
  3. GitHub, where I store the lab descriptions (https://github.com/pnugues/edan20) and the programs used in the course (https://github.com/pnugues/pnlp).

The complete schedule for September/October 2024 is available here: https://cloud.timeedit.net/lu/web/lth1/ri1Q5006.html. Use the course schedule or lässchema links and enter EDAN20 to view the course schedule.

Course Delivery

This year, all lectures and labs will be held in person. We do not normally plan to use Zoom.

Course Registration

You should formally register in the LADOK system, preferably before the course starts. Instructions are available in Swedish and in English. If you fail to register, the department might still register you, provided that you attended the first lecture and meet all the requirements.

Lab Registration

You have to register for a laboratory group. Please do so here: https://sam.cs.lth.se/LabsSelectSession?occasionId=884

We will also use Discord for the lab sessions.

Objectives

The course introduces theories and techniques of natural language processing and language technology. It focuses on industrial or laboratory applications, such as document retrieval on the Internet, information extraction, or conversational agents. Fundamental algorithms will be described using Python.

Course contents

  • An Overview of Language Processing
  • Corpus Processing Tools
  • Encoding and Annotation Schemes
  • Topics in Information Theory and Machine Learning
  • Logistic Regression and Neural Networks
  • Counting and Indexing Words
  • Word Sequences
  • Dense Vector Representations
  • Words, Parts of Speech, and Morphology
  • Subword Segmentation
  • Part-of-Speech and Sequence Annotation
  • Self-Attention and Transformers
  • Pretraining an Encoder: The BERT Language Model
  • Sequence-to-Sequence Architectures: Encoder-Decoders and Decoders

Textbook

As the textbook, I will use Python for Natural Language Processing (Springer, 2024). It is available from Springer Link: https://link.springer.com/book/10.1007/978-3-031-57549-5.

You can access the book from the catalogue of Lund University Libraries: https://lubcat.lub.lu.se/. Once connected, you can download the PDF or read the chapters online.

Administrative details

Given in: Study period HT1 (autumn term, first period)

Contact person: Pierre Nugues

Prerequisites: See the course syllabus.

Note: The course is given in English

Course website: http://cs.lth.se/edan20

Facts about the course

EDAN20: Language Technology

Higher education credits: 7.5

Grading scale: TH (U, 3, 4, 5)

Level: A

Language of instruction: The course might be given in English

Course coordinator: Pierre Nugues

E-mail: Pierre.Nugues@cs.lth.se

Prerequisites: EDAA01 Programming, Second Course or EDA027 Algorithms and Data Structures

Admission specifics:

Assessment: Compulsory course items: Six assignments. Optional examination.

Home page: cs.lth.se/edan20

Further information/Transitional rules: Limited number of participants