Human Language Technology:

Applications to Information Access

 

EPFL Doctoral Course EE-724, Electrical Engineering Doctoral School (EDEE), autumn 2016

 

Lecturer: Andrei Popescu-Belis, Idiap Research Institute

 

Time and place

Thursdays 10:15-12:00 and 13:15-15:00, from September 22, 2016 to December 22, 2016, in room ELE 111 (both for the morning course and the afternoon practical session).


Overview

This course introduces recent applications of human language technology (HLT). For each application, it presents the basic knowledge required to implement it, an overview of possible alternatives, evaluation methods (including notions about evaluation campaigns), and the challenges or limits of the state of the art. The technologies covered address the problem of accessing text-based information across three main types of barriers: the quantity barrier (accessing information in very large repositories), the crosslingual barrier (accessing information across languages through machine translation), and the subjective barrier (accessing information that is embedded in complex human interactions). The following technologies will be studied for each barrier to information access.

  1. The quantity barrier: document classification, information retrieval, learning to rank, recommender systems, query-free retrieval, question answering.
  2. The crosslingual barrier: machine translation (history of the field, rule-based and statistical systems, including phrase-based and tree-based ones, domain adaptation, the use of syntax and semantics), language models, methods for text alignment, issues and metrics for MT evaluation.
  3. The subjective barrier: sentiment analysis, subjectivity detection, analysis of human exchanges (spoken or written) for information access, search within multimedia archives.
  4. Conclusion on the bases of HLT research: defining a problem, building reference data, finding features for machine learning algorithms, training the algorithms, evaluating and analyzing the performance.
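The steps listed in item 4 can be illustrated with a minimal sketch, given here only as an assumption-laden example and not as course material. The problem (two-class sentiment classification), the classifier (multinomial Naive Bayes with add-one smoothing), the function names, and the six toy sentences are all invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Step 1: define the problem: label a sentence as "pos" or "neg".
# Step 2: build reference data. These sentences are invented toy examples.
TRAIN = [
    ("the translation is fluent and accurate", "pos"),
    ("a clear and useful answer", "pos"),
    ("the output is wrong and confusing", "neg"),
    ("a useless and inaccurate result", "neg"),
]
TEST = [
    ("a fluent and useful translation", "pos"),
    ("a confusing and wrong answer", "neg"),
]

def features(text):
    """Step 3: bag-of-words features (lowercased whitespace tokens)."""
    return text.lower().split()

def train(examples):
    """Step 4: collect the counts needed by multinomial Naive Bayes."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in features(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab

def predict(model, text):
    """Return the label with the highest log-probability under the model."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in features(text):
            if word in vocab:  # words unseen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, examples):
    """Step 5: evaluate as the fraction of correctly labeled examples."""
    correct = sum(predict(model, text) == gold for text, gold in examples)
    return correct / len(examples)

model = train(TRAIN)
print("accuracy on the toy test set:", accuracy(model, TEST))
```

With data this small the score says nothing about real performance; the point is only the separation between reference data, feature extraction, training, and evaluation.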

The course includes lectures (2h) followed by laboratory exercises (2h) using freely available software and language resources (on each student's personal computer) to perform some of the tasks introduced in the course and to illustrate the properties of one or several of the algorithms presented. The exercises will serve as starting points for the individual projects, which are graded based on a report and an oral defense at the end of the semester in January 2017, on a topic to be chosen in agreement with the lecturer. Once during the semester, each student will present a scientific article, and one laboratory exercise will be graded.

Keywords: human language technology, language engineering, information retrieval, machine translation.

Required prior knowledge: at least one prior course in statistics, machine learning, computational linguistics, or artificial intelligence. Programming proficiency in a language such as Perl or Java.

Language: English

Form of examination: project report with oral presentation in the exam session of January 2017 (in addition, one paper presentation and one practical work will contribute to the grade)


Last modified: August 31, 2014