My research area is natural language processing, and I am currently concerned mainly with semantic processing and its applications. Semantic parsing is one of our recent achievements. Together with Richard Johansson, I participated in the CoNLL 2008 shared task on the joint parsing of syntactic and semantic dependencies and in the SemEval-2007 task on Frame-semantic Structure Extraction. In both evaluations, our systems obtained the best results. We also participated in CoNLL 2009, where we were ranked second on seven languages (first in Chinese and German). We have a home page here that lists our activities and links to the software we have developed.
I am the author of a textbook on natural language processing. It was published by Springer and is now in its second printing. I maintain a page that provides free chapters, slides, and many links on various aspects of the field. See here.
I am also involved in the ROSETTA project, where I participate in the construction of a knowledge integration framework to represent knowledge and skills in robotized production. Before that, I worked in the SMErobot project, building ontologies to generate user interfaces. More generally, I am interested in the use of NLP techniques in advanced interfaces. This includes the design and implementation of conversational agents within a multimodal framework, as well as text visualization. The Carsim system is a former project that exemplifies this interest.
Carsim's objective is to analyze car accident reports and to recreate them visually in a three-dimensional space. Carsim tries to reconstruct -- to imagine -- the written text through a symbolic animation. We started by working with the results of a deep semantic analysis of the events described in the report. Here are some examples of what we obtained: collision and overtaking. We then used information extraction techniques that enabled us to visualize nearly 35% of the reports in a corpus of French texts. The Swedish version of Carsim is available online. You can try it here.
Before Carsim, with a small group of students, I designed and implemented a conversational agent -- Ulysse -- that enables a user to navigate in a virtual reality environment using language. Ulysse parses the word stream and builds a logical form from it. It features a chart parser and a semantic module. Ulysse also has a reference solver, which maps words to entities of the virtual world, and a geometric reasoner. Ulysse is embedded in the user's virtual representation in the world, which it animates using planning rules, and it then navigates the user through the world. Ulysse was designed with the DIVE environment from SICS. It can be interfaced with a speech recognition system such as IBM ViaVoice. Originally, the whole environment was intended to be a teleconferencing tool, with the agent as a supporting device.
Finally, I am also interested in the cognitive links between language, visualization, and psychology. I took part in two European Community projects, VREPAR-II and VEPSY, on the use of virtual reality in clinical psychology, where I led the French group on social phobias. VEPSY won an honorable mention from the eEurope award program.
Documents and Source Code
- You can download two semantic parsers that handle both the PropBank and FrameNet formats from this page.
- See the project description and online demonstration available from the Institute of Romance Languages at Lund University.
- The Carsim demonstration online.
- Carsim was featured in an article in the New Scientist and another in Technology and Research News.
Carsim Old Stuff
- The MAIF corpus of car accident reports in French;
- Our first car and Fabrice Tabordet's report;
- The formal description of some accidents: [A1], [A4], [A14], [A15], using the first, now outdated, grammar formalism;
- The visualizer written in Java 3D and Arjan Egges' technical report. To run this program, you must have the Java 1.4 SDK, Java 3D, and the vrml97.jar loader.
- Bastian Schulz extended the visualizer, which now uses XML templates, and described it in a technical report. The new version of the visualizer is not compatible with the old template formalism. Both Arjan's and Bastian's versions are out of date. We are looking to make the new version available.
- The first information extraction system in Prolog and the report by Sylvain Dupuy and Vincent Legendre. To run this program, you must have SWI Prolog (or any compatible Prolog).
- An improvement of the information extraction system and the report by Simon Le Gloannec, Pierre Aubeuf, and Cédric Métais. To run this program, you must have SWI Prolog (or any compatible Prolog).
- An old Carsim information extraction prototype
- A draft in English that gives the rationale behind the development of Ulysse is available. Some other documents in French and in English describe the current implementation of our prototype in more detail.
- The Ulysse conversational agent is written in C and C++. Its source code is available in two sets, both in French: the parser, and the rest (semantics, motion, etc.).
- The University of Caen shot a video of Ulysse. It uses the MPEG-4 standard. You may need a codec to view it, for instance DivX. Transcripts of the dialogue and parsing are available here.
- We have ported parts of Ulysse to VRML, Prolog, and Java. VRML Ulysse is far from complete: compared to the first version, it is only a skeleton. A prototype of the port is available to download.
- With VRML Ulysse, Frédéric Hubert, a Master's student at the University of Caen, implemented a kinematic definition of the French verb sauter 'jump'.
Two Three-Dimensional Models
Click on the images to get the VRML worlds.
The Ithaque world of the Ulysse project
The car model of the Carsim project
PhD and Master's Theses
- Peter Exner
- Håkan Jonsson
- Dennis Medved
- Karl Ekerot and Fredrik Appelros
- Richard Johansson, Lunds tekniska högskola, Lunds universitet, November 2003--December 2008. Lic. thesis [pdf] PhD thesis [pdf].
- Dominique Dutoit, Quelques opérations texte-->sens et texte-->sens-->texte utilisant une sémantique universaliste apriorique, defended on 30 November 2000 at the Université de Caen. [doc].
- Christophe Godéreaux, Un modèle d'agent conversationnel pour naviguer dans les mondes virtuels, defended on 7 January 1997 at the Université de Caen. [doc] [pdf] [zip].
- Pierre-Olivier El Guedj, Analyse syntaxique par charts combinant règles de dépendance et règles syntagmatiques, defended on 10 September 1996 at the Université de Caen. [doc] [pdf] [zip].
- Ola Åkerberg and Hans Svensson, Examensarbete, Lunds tekniska högskola, 2002. [doc] [pdf].
- Per Andersson, Examensarbete, Lunds tekniska högskola, 2003. [pdf].
- Christofer Bach and Johan Gunnarsson, Extraction of Trends in SMS text, MSc. Thesis, Lunds universitet, LTH, June 2010. [pdf]
- Loïc Baumard, ISMRA student project, 1998.
- Olivier Bersot, Diplôme d'ingénieur and DEA, ISMRA--Université de Caen, 1996.
- Anders Berglund, Examensarbete, Lunds tekniska högskola, 2004. [pdf].
- Raphaël Berthelon, ISMRA student project, 1998.
- Marc Bittar, ISMRA student project, 1998.
- Anders Björkelund and Love Hafdell, High-performance multilingual semantic role labeling, MSc. Thesis, Lunds universitet, NatFak, May 2009. [pdf] [slides]
- Lilian Blot, Maîtrise d'informatique, Université de Caen.
- Magnus Danielsson, Maskininlärningsbaserad koreferensbestämning för nominalfraser applicerat på svenska texter, Lunds universitet, NatFak, MSc. Thesis, January 2005. [pdf].
- Korinna Diebel, Diplomarbeit, FH Konstanz, 1994, Erasmus scholarship.
- Sylvain Dupuy, Stage de Diplôme d'ingénieur, ISMRA, 2001.
- Arjan Egges, Computer science studies, University of Twente, the Netherlands, 2000. (Arjan has a new page).
- Tobias Ek and Camilla Kirkegaard, Named Entity Extraction from Text Messages, MSc. Thesis, Lunds universitet, 2010. [pdf]
- Jonas Ekedahl, Examensarbete, Lunds universitet, Lunds tekniska högskola, 2008. [pdf].
- José Esteve, Universidad Politécnica de Valencia, Spain, 2000.
- Peter Exner, Constructing Large Proposition Databases, MSc. Thesis, Lunds universitet, LTH, 2011. [pdf]
- Sebastian Ganslandt and Jakob Jörwall, Context-aware predictive text entry for Swedish using semantics and syntax, MSc. Thesis, Lunds universitet, January 2009. [pdf]
- Manuel García, Universidad Politécnica de Valencia, Escuela Técnica Superior de Ingenieros de Telecomunicación, Spain, 1999.
- Patrik Hansson, Lunds universitet, Naturvetenskapliga fakulteten, MSc. Thesis, March 2008. [pdf].
- Magnus Höjer, Examensarbete, Lunds universitet, December 2008. [pdf]
- Frédéric Hubert, DEA project, Université de Caen, 1999. [html] [doc].
- Yannick Jullien, post-doc from the Université de Caen.
- Fabian Kostadinov, Direkt Profil -- Implementation of a text critiquing system for non-native students of French, Diplomarbeit, Universität Zürich, April 2005. [pdf]
- Simon Lindholm, A speech recognition system for Swedish running on Android, MSc. Thesis, Lunds universitet, LTH, June 2010. [pdf] [slides]
- Matthias Ludwig, Diplomarbeit, FH Weingarten, Office Franco-Allemand de la Jeunesse scholarship, 1995.
- Alejandro Machado, Recognizing artist ambiguity with machine learning techniques, MSc. Thesis, Lunds universitet, LTH, 2012. [pdf]
- Dennis Medved, Combining Text Semantics and Image Geometry to Identify Relations, MSc. Thesis, Lunds universitet, LTH, 2012. [pdf]
- Peter Nilsson, An experimental study of Nivre's parser, MSc. Thesis, Lunds universitet, NatFak, May 2009. [pdf]
- Jacob Persson, Examensarbete, Lunds universitet, NatFak, September 2008. [pdf].
- Lisa Persson, Lunds universitet, NatFak, MSc. Thesis, August 2004. [pdf].
- Jimmy Pettersson, Contextual text classification for parametric mappings in a distributed search engine, MSc. Thesis, Lunds universitet, LTH, 2011. [pdf]
- Niclas Reisnert, Taligenkänning i en flygledningssimulator-miljö, MSc. Thesis, Lunds universitet, NatFak, May 2009. [pdf]
- Frédéric Revolta, Diplôme d'ingénieur and DEA, Université de Caen--ISMRA, 1995. (Frédéric has a new page).
- Andreas Salomonsson, Entity-based information retrieval, MSc. Thesis, Lunds universitet, LTH, 2012. [pdf]
- Bastian Schulz, Studienarbeit, TU Hamburg-Harburg, 2002. [doc] [pdf].
- Christopher Scott, BSc University of Nottingham, Erasmus exchange student, 1999.
- Mitch Selander and Erik Svensson, Predictive text input engine for Indic scripts, MSc. Thesis, Lunds universitet, LTH, March 2009. [pdf]
- Marcus Stamborg, Statistical coreference resolving in a multi-language domain, MSc. Thesis, Lunds universitet, LTH, 2012. [pdf]
- Maj Stenmark, Integration of semantic knowledge to enable re-use of robot programs, MSc. Thesis, Lunds universitet, LTH, 2011. [pdf]
- Rasmus Sundberg and Anders Eriksson, Visualizing Sentiment Analysis on a User Forum, MSc. Thesis, Lunds universitet, LTH, 2011. [pdf]
- Fabrice Tabordet, DEA, Université de Caen, 1997. (Fabrice has a new page).
- Dan Thorin, Examensarbete, Lunds universitet, 2008. [pdf].
- Jonas Thulin, Machine learning-based classifiers in the Direkt Profil grammatical profiling system, Lunds universitet, LTH, MSc. Thesis, January 2007. [pdf]
- David Williams, Examensarbete, Lunds tekniska högskola, 2004. [pdf].