EDAN20 – Language Technology
This page is provisional and is constantly being updated.
The first lecture will be held on August 28, 2023, 13-15.
The course has three pages:
- The official course page, https://cs.lth.se/edan20/;
- Canvas, used mainly to hand in assignments and manage communication with the participants;
- GitHub, where I store the lab descriptions https://github.com/pnugues/edan20 and the programs used in the course https://github.com/pnugues/ilppp.
The complete schedule for August/September/October 2023 is available here: https://cloud.timeedit.net/lu/web/lth1/ri1Q5006.html. Use the course schedule or lässchema links and enter EDAN20 to view the course schedule.
Course Delivery
This year, all the lectures and labs will be held in person. We will normally not use Zoom.
Course Registration
To be sure to keep your seat, you need to attend the first lecture (or email the course leader before the course starts). You should formally register in the LADOK system, preferably before the course starts. The instructions are available here in Swedish and in English. If you fail to register, the department might register you anyway, provided that you attended the first lecture and meet all the requirements.
Lab Registration
You have to register for a laboratory group. Please do it here: https://sam.cs.lth.se/LabsSelectSession?occasionId=821
We will also use Discord for the lab sessions.
Objectives
The course introduces theories and techniques of natural language processing and language technology. It attempts to cover the whole field from speech recognition and synthesis to semantics and dialogue.
It focuses on industrial or laboratory applications, such as document retrieval on the Internet, information extraction, conversational agents, and verbal interaction in virtual worlds. Fundamental algorithms will be described using Python.
Course contents
- An overview of language processing: presentation of language processing, applications, disciplines of linguistics, examples
- Corpus and word processing: regular expressions, automata, an introduction to Python, concordances, tokenization, counting words, collocations
- Morphology and part-of-speech tagging: morphology, transducers, part-of-speech tagging
- Prolog to write phrase-structure grammars: constituents, trees, using Prolog to do natural language analysis, DCG rules, variables, getting the syntactic structure, compositional analysis to get the semantic structure.
- Syntactic formalisms: constituency and dependency, chart parsing, statistical parsing, functions, dependency parsing.
- Semantics: formal semantics, lambda-calculus, compositionality: nouns, verbs, determiners, words and meaning, lexical semantics, case grammars, semantic grammars
- Discourse and dialogue: discourse and rhetoric, anaphora, structure, RST, dialogue: automata, pairs, speech acts, multimodality.
- Overview of speech synthesis and speech recognition
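To give a flavor of the corpus and word processing topics listed above (regular expressions, tokenization, counting words, collocations), here is a minimal Python sketch. The function names and the sample sentence are made up for this illustration; they are not taken from the course programs, and the tokenizer is deliberately naive.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its runs of letters as tokens."""
    # [^\W\d_]+ matches one or more Unicode letters (no digits, no underscore).
    return re.findall(r"[^\W\d_]+", text.lower())


def count_words(tokens):
    """Map each word to its number of occurrences."""
    return Counter(tokens)


def count_bigrams(tokens):
    """Count adjacent word pairs, a first step toward finding collocations."""
    return Counter(zip(tokens, tokens[1:]))


# Hypothetical sample text, used only for this illustration.
sample = "The cat sat on the mat. The cat slept."
tokens = tokenize(sample)
word_counts = count_words(tokens)
bigram_counts = count_bigrams(tokens)

print(word_counts.most_common(2))    # the two most frequent words
print(bigram_counts.most_common(1))  # the most frequent word pair
```

Raw bigram counts are only a starting point: collocation measures normally compare the pair frequency with the frequencies of the individual words.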
Textbook
As a textbook, I will use: Language Processing with Perl and Prolog, 2nd edition, 2014, Springer. It is available from Springer link: [html] [pdf], or in a paper version [html].
I started to write a 3rd edition with Python instead of Perl. Unfortunately, on August 15, 2016, I had a work accident at LTH: Workers demolished the window of my office while I was working and without warning me. Since then, I have had a very debilitating tinnitus (ringing ears). This new edition will be considerably delayed (if I can ever publish it). I will nonetheless hand out a draft of the chapters I have written.
Students can also use the first edition from 2006, An Introduction to Language Processing with Perl and Prolog. The electronic version is available for free: [pdf]. You need to be logged in with a Lund University account to get a free copy. The paper version of the first edition costs 25 euros: [html].
Given: Study period HT1
Contact person: Pierre Nugues
Prerequisites: See the course syllabus.
Note: The course is given in English.
Course website: http://cs.lth.se/edan20