Lund University
 

Lectures

Contents

Chapter 1: An overview of language processing (04/09/2012)

Chapter 2: Corpus processing tools (04/09/2012 and 06/09/2012)

Chapter 3: Encoding, entropy, and annotation schemes (06/09/2012)

Chapter 4: Counting words (11/09/2012)

Chapter 5: Words, parts of speech, and morphology (13/09/2012 and 20/09/2012)

Chapter 6: Part-of-speech tagging using rules (13/09/2012)

  • Contents:
    • Part-of-speech tagging with symbolic rules
    • Annotation standards for parts of speech (tagsets)
  • The slides of this lecture [pdf].
  • Annotation manuals and corpora:
    • Manuals used by the Penn treebanks
    • BNC, the British National Corpus, an annotated corpus in English following the Text Encoding Initiative (TEI).
    • SUC, the Stockholm-Umeå corpus, an annotated corpus in Swedish
    • Negra, an annotated corpus in German
    • An inventory of available corpora compiled by a group at Stanford.
  • Software:
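Part-of-speech tagging with symbolic rules can be illustrated with a transformation-based scheme in the style of Brill's tagger: tag each word with its most frequent tag, then correct the tags with contextual rules. The sketch below is in Python rather than the course's Prolog; the lexicon, the tags, and the single rule are hypothetical examples (in Brill's tagger, the rules are learned from an annotated corpus).

```python
# Toy lexicon mapping each word to its most frequent tag (hypothetical values).
LEXICON = {"the": "DT", "can": "MD", "rusted": "VBD"}


def initial_tags(words):
    """Assign each word its lexicon tag; unknown words default to NN."""
    return [LEXICON.get(w.lower(), "NN") for w in words]


def apply_rule(tags, from_tag, to_tag, prev_tag):
    """Contextual rule: change from_tag to to_tag when the previous tag is prev_tag.

    Contexts are read from the input tags, so all changes are made
    simultaneously (a simplifying choice for this sketch).
    """
    return [
        to_tag if i > 0 and tag == from_tag and tags[i - 1] == prev_tag else tag
        for i, tag in enumerate(tags)
    ]


words = ["The", "can", "rusted"]
tags = initial_tags(words)                  # ['DT', 'MD', 'VBD']
tags = apply_rule(tags, "MD", "NN", "DT")   # a modal right after a determiner becomes a noun
print(list(zip(words, tags)))               # [('The', 'DT'), ('can', 'NN'), ('rusted', 'VBD')]
```

A full rule-based tagger applies an ordered list of such rules, each one rewriting the output of the previous one.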

Chapter 7: Part-of-speech tagging using stochastic techniques (20/09/2012)

  • Contents:
    • Stochastic tagging
    • Markov models
    • Tagging with decision trees
    • Application: Language models for machine translation
  • The slides of this lecture [pdf].
  • Demonstrations:
  • Software:
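The Markov-model approach to tagging can be sketched as a bigram hidden Markov model decoded with the Viterbi algorithm, which finds the tag sequence maximizing the product of the transition probabilities P(t_i | t_(i-1)) and the emission probabilities P(w_i | t_i). In this minimal Python sketch, the tag set and all the probabilities are made-up values; a real tagger estimates them from a tagged corpus and smooths them.

```python
import math

# Toy bigram HMM: every probability below is hypothetical.
TAGS = ["DT", "NN", "VB"]
START = {"DT": 0.8, "NN": 0.1, "VB": 0.1}          # P(t_1 | <s>)
TRANS = {                                          # P(t_i | t_(i-1))
    "DT": {"DT": 0.01, "NN": 0.9, "VB": 0.09},
    "NN": {"DT": 0.1, "NN": 0.3, "VB": 0.6},
    "VB": {"DT": 0.6, "NN": 0.3, "VB": 0.1},
}
EMIT = {                                           # P(w_i | t_i); missing words get 0
    "DT": {"the": 0.7},
    "NN": {"dog": 0.4, "barks": 0.01},
    "VB": {"barks": 0.3, "dog": 0.001},
}


def log(p):
    """Log-probability, with log(0) mapped to minus infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(words):
    """Return the most probable tag sequence for words under the toy HMM."""
    # delta[t]: best log-probability of a tag path for the words so far ending in t
    delta = {t: log(START[t]) + log(EMIT[t].get(words[0], 0)) for t in TAGS}
    backptrs = []
    for w in words[1:]:
        new_delta, ptr = {}, {}
        for t in TAGS:
            # Best predecessor tag for t at this position.
            prev = max(TAGS, key=lambda s: delta[s] + log(TRANS[s][t]))
            new_delta[t] = delta[prev] + log(TRANS[prev][t]) + log(EMIT[t].get(w, 0))
            ptr[t] = prev
        delta = new_delta
        backptrs.append(ptr)
    # Start from the best final tag and follow the back-pointers.
    path = [max(TAGS, key=lambda t: delta[t])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return list(reversed(path))


print(viterbi(["the", "dog", "barks"]))  # ['DT', 'NN', 'VB']
```

Working with log-probabilities avoids numerical underflow on long sentences; the back-pointers are what let the algorithm recover the best sequence in time linear in the sentence length.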

Chapter 8: Phrase-structure grammars in Prolog (13/09/2012)

  • Contents:
    • Constituents, trees
    • Using Prolog to do natural language analysis, DCG rules, variables
    • Getting the syntactic structure
    • Compositional analysis to get the semantic structure
  • The slides of this lecture [pdf]
  • Prolog programs:
    • Two small DCG grammars [1] [2]
    • A tokenizer using Prolog clauses [3] and another one using DCG rules [4].
    • A small interpreter of regular expressions in Prolog by Robert Cameron [5].
  • Application examples:
    • The grammar checker in Microsoft Word, whose parser uses phrase-structure rules.
    • The natural language group at Microsoft Research.
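DCG rules such as s --> np, vp. describe a top-down, left-to-right analysis with backtracking over alternative rules. The Python sketch below imitates that behavior with generators for a hypothetical toy grammar and lexicon; it only recognizes sentences (it builds no tree) and, like a plain DCG, it would loop on left-recursive rules.

```python
# Toy grammar mirroring DCG rules such as
#   s --> np, vp.   np --> det, noun.   vp --> verb, np.   vp --> verb.
# The grammar and lexicon are hypothetical examples.
GRAMMAR = {
    "s": [["np", "vp"]],
    "np": [["det", "noun"]],
    "vp": [["verb", "np"], ["verb"]],
}
LEXICON = {
    "det": {"the", "a"},
    "noun": {"waiter", "meal"},
    "verb": {"brought", "slept"},
}


def parse(symbol, words, i):
    """Yield every position j such that symbol derives words[i:j].

    Generators play the role of Prolog's backtracking: each alternative
    rule is tried in order, top-down and left to right.
    """
    if symbol in LEXICON:
        if i < len(words) and words[i] in LEXICON[symbol]:
            yield i + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from parse_seq(rhs, words, i)


def parse_seq(symbols, words, i):
    """Yield the end positions after parsing the sequence symbols from position i."""
    if not symbols:
        yield i
        return
    for j in parse(symbols[0], words, i):
        yield from parse_seq(symbols[1:], words, j)


def recognize(sentence):
    """True if the whole sentence is derived from s."""
    words = sentence.split()
    return any(j == len(words) for j in parse("s", words, 0))


print(recognize("the waiter brought the meal"))  # True
print(recognize("the waiter the meal"))          # False
```

Passing the input position i in and the end position j out is the same idea as the two hidden list arguments that Prolog adds to each DCG nonterminal.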

Chapter 9: Partial parsing (18/09/2012)

  • Contents:
    • ELIZA: word spotting and pattern matching
    • Multiwords and named entities
    • Noun groups and verb groups
    • Partial parsing: multiword and group detection in Prolog
    • Partial parsing: statistical techniques
    • Information extraction
    • Precision, recall, and F-measure (harmonic mean)
  • The slides of this lecture [pdf]
  • Prolog programs:
    • Prolog predicates to write local DCG grammars with simple noun group and verb group rules [1].
  • Documents:
    • Many interesting papers on partial parsing by Steven Abney.
    • An application example of information extraction: the FASTUS system from SRI.
    • Carsim, a system to generate animated 3D scenes from text that uses information extraction techniques.
  • Annotated corpora and evaluation resources:
  • Demonstrations:
  • Software:
  • Annotation resources:
    • The MUC site.
    • PEAS, a group annotation scheme for French
    • TüPP-D/Z, Tübingen Partially Parsed Corpus of Written German
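Precision is the fraction of detected items that are correct, recall is the fraction of reference items that were detected, and the F-measure combines them; with β = 1, it is their harmonic mean, F = 2PR / (P + R). A minimal Python sketch, assuming that detected and reference groups are represented as hypothetical (start, end, type) tuples and that a detection counts as correct only on an exact match:

```python
def precision_recall_f(predicted, gold, beta=1.0):
    """Return (precision, recall, F-beta) for two sets of items.

    predicted, gold: sets of hashable items, here (start, end, type) tuples.
    With beta = 1, F is the harmonic mean of precision and recall.
    """
    correct = len(predicted & gold)
    p = correct / len(predicted) if predicted else 0.0
    r = correct / len(gold) if gold else 0.0
    if p + r == 0:
        return p, r, 0.0
    f = (1 + beta**2) * p * r / (beta**2 * p + r)
    return p, r, f


gold = {(0, 2, "NG"), (3, 5, "NG"), (6, 7, "VG")}
predicted = {(0, 2, "NG"), (3, 4, "NG"), (6, 7, "VG"), (8, 9, "NG")}
p, r, f = precision_recall_f(predicted, gold)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.5 0.667 0.571
```

Here two of the four detected groups match the reference exactly, so P = 2/4 and R = 2/3; the group (3, 4) overlaps a reference group but still counts as an error under exact matching.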

Chapter 10: Syntactic formalisms (25/09/2012)

Chapter 11: Parsing techniques (25/09/2012 and 02/10/2012)

  • Contents:
    • Top-down and bottom-up strategies
    • The shift-reduce algorithm
    • Earley's algorithm
    • Statistical parsing and PCFG
    • Dependency parsing
    • Nivre's parser
  • The slides of this lecture [pdf]
  • Prolog programs:
    • A shift-reduce parser [1]
    • Earley's parser [2]
    • Joakim Nivre's dependency parser [3].
    • Updates to the book:
      • Nivre's parser to parse an annotated corpus (gold standard parsing) [4] and an improved version of Nivre's parser [5].
      • Utilities to parse a CoNLL 2006 or 2007 corpus [6] [7] [8].
      • The Swedish corpus used in CoNLL 2006 and formatted as a Prolog clause. Training set [9] and test set [10].
  • Corpus resources:
    • Four freely available annotated dependency corpora (Danish, Dutch, Portuguese, and Swedish) and links to seven others from the CoNLL-X shared task. Seven other corpora with the same annotation (Basque, Catalan, Chinese, Greek, Hungarian, Italian, and Turkish) from the CoNLL 2007 shared task.
    • The Susanne corpus, a free treebank for English
    • A French treebank from Université Paris VII (available under license)
  • Parser resources:
  • On-line parsers:
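Gold-standard parsing with Nivre's parser replays an annotated dependency tree to recover the sequence of parser actions that builds it. The Python sketch below assumes the arc-eager transition set (left-arc, right-arc, reduce, shift), a projective tree, and a head list where heads[i] is the head of word i and 0 stands for the root; it is an illustration, not a port of the Prolog programs linked above.

```python
def arc_eager_oracle(heads):
    """Derive the arc-eager transitions that rebuild a gold dependency tree.

    heads[i] is the gold head of word i (words are numbered 1..n, 0 is the
    root, heads[0] is unused). The tree is assumed to be projective.
    Returns (transitions, arcs), where arcs maps each dependent to its head.
    """
    n = len(heads) - 1
    stack, buffer = [0], list(range(1, n + 1))
    arcs, transitions = {}, []
    while buffer:
        s, b = stack[-1], buffer[0]
        if s != 0 and s not in arcs and heads[s] == b:
            arcs[s] = b                     # left-arc: b is the head of s; pop s
            stack.pop()
            transitions.append("la")
        elif heads[b] == s:
            arcs[b] = s                     # right-arc: s is the head of b; push b
            stack.append(buffer.pop(0))
            transitions.append("ra")
        elif s in arcs and all(heads[w] != s for w in buffer):
            stack.pop()                     # reduce: s has its head and no dependents left
            transitions.append("re")
        else:
            stack.append(buffer.pop(0))     # shift
            transitions.append("sh")
    return transitions, arcs


# "John saw Mary yesterday": John, Mary, and yesterday depend on saw; saw on the root.
transitions, arcs = arc_eager_oracle([-1, 2, 0, 2, 2])
print(transitions)  # ['sh', 'la', 'ra', 'ra', 're', 'ra']
print(arcs)         # {1: 2, 2: 0, 3: 2, 4: 2}
```

The resulting (parser state, transition) pairs are the kind of training examples a classifier can learn from, so that the parser can later predict the transitions on unannotated sentences.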

Chapter 12: Semantics and predicate logic (02/10/2012)

Chapter 13: Lexical semantics (09/10/2012)

  • Contents:
    • Words and meaning
    • Lexical semantics
    • Lexical networks
    • Word sense disambiguation
    • Case grammars
    • Frame semantics and semantic roles
    • Semantic grammars
  • The slides of this lecture [pdf]. Anders Björkelund's presentation of his thesis on semantic role labeling [pdf].
  • Resources:
    • Lexical databases:
    • Sense identification:
      • Freely available sense-tagged corpora from the Senseval-3 evaluation task (tasks 01 to 11)
      • SemCor, the Brown corpus tagged with WordNet senses. The tagging was originally done at Princeton with WordNet 1.6. Since then, the WordNet developers have reorganized the sense nomenclature, and the different corpora are mappings to the successive WordNet versions.
    • Semantic role labeling:
    • Semantic role labeling software:
      • A demonstration of the LTH semantic parser and its source code (CoNLL 2009 version).
      • The LTH semantic parser code with Propbank and Nombank predicates from Richard Johansson (CoNLL 2008 version).
      • The LTH semantic parser with the Framenet paradigm from Richard Johansson.
      • The ASSERT Automatic Statistical SEmantic Role Tagger from Sameer Pradhan.
      • Semantic role labeling by the University of Illinois at Urbana-Champaign.
      • TextRunner, a system to extract predicate-argument structures from web pages.
      • The Senna semantic role-labeling tool from the NEC Laboratories America.
  • Application examples: EVAR, Nautilus.
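Word sense disambiguation can be illustrated with the simplified Lesk algorithm, which picks the sense whose dictionary definition shares the most words with the context. In this Python sketch, the sense inventory, the glosses, and the stop list are hypothetical; a real system would take the senses and glosses from a lexical database such as WordNet.

```python
# Hypothetical sense inventory: sense name -> gloss (definition).
SENSES = {
    "bank": {
        "bank.financial": "an institution that accepts deposits of money and makes loans",
        "bank.river": "sloping land beside a body of water such as a river",
    }
}
STOP_WORDS = {"a", "an", "the", "of", "and", "that", "such", "as", "on", "we", "to", "in", "at"}


def simplified_lesk(word, context):
    """Return the sense of word whose gloss overlaps most with the context."""
    context_words = {w.lower() for w in context.split()} - STOP_WORDS
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context_words & (set(gloss.split()) - STOP_WORDS))
        if overlap > best_overlap:  # ties keep the first sense listed
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("bank", "we had a picnic on the bank of the river"))
# bank.river
print(simplified_lesk("bank", "the bank refused the loans and kept my money"))
# bank.financial
```

Since exact word overlap is sparse, practical variants add stemming, extend the glosses with example sentences or related senses, and fall back to the most frequent sense on ties.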

Chapter 14: Discourse (09/10/2012)

  • Contents:
    • Discourse definition
    • Discourse entities
    • Reference and anaphora
    • Rhetorical structure theory (RST)
    • Parsing a text
    • Machine learning to discover RST relations
    • TimeML
  • The slides of this lecture [pdf]
  • Annotation and evaluation resources:
    • The coreference annotation manual used in MUC-7 by Hirschman and Chinchor.
    • A paper on coreference evaluation by Vilain et al. (1995).
    • An annotation manual for Rhetorical structure theory from the University of Southern California's Information Sciences Institute.
    • Another annotation manual for the Penn Discourse Treebank.
    • TimeML, a markup language for temporal and event expressions.
  • Corpus resources:
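The coreference evaluation of Vilain et al. (1995), used in MUC, counts links rather than mentions. For each key chain S, let p(S) be the partition of S induced by the response chains; recall is the sum of |S| - |p(S)| divided by the sum of |S| - 1 over all key chains, and precision is the same formula with key and response swapped. A minimal Python sketch, assuming chains are given as sets of hypothetical mention identifiers:

```python
def muc_recall(key, response):
    """MUC link-based recall (Vilain et al., 1995).

    key, response: lists of sets of mention ids (coreference chains).
    For each key chain S, p(S) is the partition of S induced by the response
    chains; mentions absent from every response chain count as singletons.
    """
    numerator = denominator = 0
    for chain in key:
        parts = set()
        for mention in chain:
            # Index of the response chain containing this mention, if any.
            owner = next((i for i, r in enumerate(response) if mention in r), None)
            parts.add(owner if owner is not None else ("singleton", mention))
        numerator += len(chain) - len(parts)
        denominator += len(chain) - 1
    return numerator / denominator if denominator else 0.0


def muc_precision(key, response):
    """Precision is recall with the roles of key and response swapped."""
    return muc_recall(response, key)


key = [{"A", "B", "C", "D"}]
response = [{"A", "B"}, {"C", "D"}]
print(round(muc_recall(key, response), 3), muc_precision(key, response))  # 0.667 1.0
```

In this example, the response splits one key chain into two, so it misses one of the three links needed to connect the four mentions (recall 2/3), while every link it does propose is correct (precision 1).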

Chapter 15: Dialogue (16/10/2012)

  • Contents:
    • Dialogue automata
    • Pairs
    • Speech acts
    • Speech act recognition
  • The slides of this lecture [pdf]
  • Resources:
    • DAMSL (Dialog Act Markup in Several Layers), a dialogue markup scheme from the University of Rochester.
    • Dialogue acts in Verbmobil and Verbmobil-2 [1] [2].
    • The TRAINS corpus and annotated files from the University of Rochester.
  • VoiceXML, a markup framework to develop dialogue applications:
  • Application examples:
    • TRAINS, TRIPS.
    • A train information system in Swedish from SJ. Call 0046 771-75-75-75.
    • A paper by Johan Boye, Mats Wirén, Manny Rayner, Ian Lewin, David Carter, and Ralph Becket, "Language-Processing Strategies and Mixed-Initiative Dialogues", IJCAI-99 Workshop on Knowledge and Reasoning in Practical Dialogue Systems, July 1999.
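A dialogue automaton can be sketched as a finite-state machine in which each state asks a question and the type of the user's answer selects the next state. In this Python sketch of a hypothetical train-information dialogue, the states, prompts, city list, and keyword-spotting classifier are all assumptions; a deployed system would use speech recognition and a richer interpretation of the user's turn.

```python
import re

# Hypothetical automaton: state -> (system prompt, {answer type: next state}).
AUTOMATON = {
    "ask_departure": ("Where do you leave from?", {"city": "ask_destination"}),
    "ask_destination": ("Where are you going?", {"city": "ask_time"}),
    "ask_time": ("When do you want to leave?", {"time": "confirm"}),
    "confirm": ("Shall I book this train?", {"yes": "done", "no": "ask_departure"}),
}
CITIES = {"lund", "stockholm", "malmö", "göteborg"}


def answer_type(utterance):
    """Crude word-spotting classifier for the user's answer (an assumption)."""
    words = set(re.findall(r"\w+", utterance.lower()))
    if words & CITIES:
        return "city"
    if re.search(r"\b\d{1,2}[:.]\d{2}\b", utterance):
        return "time"
    if "yes" in words:
        return "yes"
    if "no" in words:
        return "no"
    return None


def run_dialogue(user_turns, state="ask_departure"):
    """Walk the automaton; an unrecognized answer keeps the current state."""
    trace = []
    for utterance in user_turns:
        prompt, transitions = AUTOMATON[state]
        trace.append(prompt)
        state = transitions.get(answer_type(utterance), state)
        if state == "done":
            break
    return state, trace


state, trace = run_dialogue(["From Lund", "To Stockholm please", "At 8.15", "yes"])
print(state)  # done
```

Because an unrecognized answer leaves the state unchanged, the system simply repeats its question, a common fallback in system-initiative dialogues; mixed-initiative systems, such as those discussed in the paper above, need a less rigid control structure.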

Complement: Speech synthesis (16/10/2012)

Complement: Speech recognition (16/10/2012)

Page Manager: Peter Möller
Webmaster: webmaster@lth.se
Last updated: 2012-10-02