Master's project proposals
If you are interested in Master's thesis (ex-jobb) proposals in the area of natural language processing or semantic processing, please contact Pierre Nugues.
November 2011
Subject: Coreference solving.
Description:
We propose a Master's project open to LTH students in computer science to improve a coreference solver and make it multilingual.
Coreference solving consists of finding sets of mentions in a text, where each set refers to a single entity. Large-scale coreference solving has become an integral part of the semantic analysis of text and of the identification of entities across documents. Recent evaluations, such as CoNLL 2011 for English, have made it possible to assess the current state of the art impartially and to spur innovation through the competition they entail. For details, see http://conll.bbn.com/. At LTH, we have developed a high-performance solver in the context of CoNLL 2011.
The project will start from the existing code and will have two parts:
- The first task will be to assemble components to build an end-to-end analysis from text to entity resolution. The CoNLL task is designed so that a part of the data is given in a preprocessed form: the corpus is tokenized and parsed. In its current form, our analyzer uses the preprocessed data. The project will assemble components to parse a raw text and submit it as input to the coreference solver. The system will be accessible through a web interface.
- The second task will be to improve the performance of the coreference solver and make it multilingual. A solver can consist of a variety of modules that carry out the entity search and check the consistency of the entity chains. We believe the current version of our solver is a good module for searching entities. However, it has no chain consistency checking. A part of the project will be dedicated to implementing algorithms that complement the solver and improve the quality of the resolution.
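To make the two notions above concrete, here is a minimal sketch, not the LTH solver. It assumes that the entity search produces pairwise coreference links between mentions, groups them into chains with a union-find structure, and then applies one possible consistency check (conflicting gender within a chain). The function names, the toy sentence, and the gender table are hypothetical and serve only as an illustration; a real system would represent mentions as text spans, not plain strings.

```python
from collections import defaultdict


def build_chains(mentions, links):
    """Group mentions into chains from pairwise links (union-find)."""
    parent = {m: m for m in mentions}

    def find(m):
        # Follow parent pointers to the representative, halving the path.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in links:
        parent[find(a)] = find(b)

    chains = defaultdict(list)
    for m in mentions:
        chains[find(m)].append(m)
    return sorted(chains.values())


def inconsistent_chains(chains, gender):
    """Return the chains whose mentions carry more than one known gender."""
    bad = []
    for chain in chains:
        known = {gender[m] for m in chain if gender.get(m) is not None}
        if len(known) > 1:
            bad.append(chain)
    return bad


# Toy input: "Mary met John. She greeted him, and he smiled."
mentions = ["Mary", "John", "She", "him", "he"]
gender = {"Mary": "f", "She": "f", "John": "m", "him": "m", "he": "m"}

good_links = [("Mary", "She"), ("John", "him"), ("him", "he")]
chains = build_chains(mentions, good_links)
print(chains)
print(inconsistent_chains(chains, gender))

# One wrong link merges both entities; the check flags the merged chain.
merged = build_chains(mentions, good_links + [("She", "him")])
print(inconsistent_chains(merged, gender))
```

In this sketch the consistency check only reports suspicious chains. Deciding which link to remove in order to repair a chain is the kind of algorithm the second task would have to design.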
The Master's thesis will possibly be tied to the upcoming CoNLL 2012 shared task, which will deal with coreference solving for English, Chinese, and Arabic.
To carry out the project successfully, the candidate will need very good programming skills as well as a good understanding of natural language processing and some familiarity with machine learning techniques.
December 2010: We have two proposals from Oribi: