Andrei Popescu-Belis - Idiap Research Institute
|N.||Date||Morning (10:15-12:00, room ELE 111)||Afternoon (13:15-15:00, same room)|
|1||Sep. 22, 2016||Introduction
|I. OVERCOMING THE QUANTITY BARRIER|
||Run a classifier on newswire data (Reuters) – mainly as a test for getting working methods into place: extract features (possibly with tokenization, stemming, etc.), run Weka, and evaluate the results.|
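The feature extraction step above can be sketched in a few lines of plain Java before handing the vectors to Weka. This is a minimal bag-of-words illustration (class and corpus text are hypothetical, not from the TP material): lowercase the text, split on non-letters, and count term frequencies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal bag-of-words feature extraction: lowercase, split on
// non-letter characters, and count term frequencies.
public class BagOfWords {
    public static Map<String, Integer> features(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String tok : text.toLowerCase().split("[^a-z]+")) {
            if (tok.isEmpty()) continue;          // skip empty splits
            counts.merge(tok, 1, Integer::sum);   // increment term frequency
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(features("Oil prices rose; oil stocks followed."));
        // {oil=2, prices=1, rose=1, stocks=1, followed=1}
    }
}
```

In the actual TP, such counts would be written out as attribute values (e.g. in ARFF format) so Weka's classifiers can consume them.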
|--||Sep. 29, 2016||No course.||No practical work.|
|2||Oct. 6, 2016||
||Experiments with Lucene: pre-process Reuters data, index it, search it with various queries.|
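To make the Lucene experiments less opaque, here is a toy sketch of what indexing and searching amount to underneath Lucene's `IndexWriter`/`IndexSearcher` API, using plain Java maps (the class and the sample documents are illustrative, not part of the TP):

```java
import java.util.*;

// Toy inverted index: map each term to the set of document ids containing
// it, then answer a multi-word query by intersecting posting lists (AND).
public class ToyIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final List<String> docs = new ArrayList<>();

    public void add(String text) {
        int id = docs.size();
        docs.add(text);
        for (String term : text.toLowerCase().split("[^a-z]+"))
            if (!term.isEmpty())
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
    }

    public Set<Integer> search(String query) {
        Set<Integer> hits = null;
        for (String term : query.toLowerCase().split("[^a-z]+")) {
            if (term.isEmpty()) continue;
            Set<Integer> p = postings.getOrDefault(term, Collections.emptySet());
            if (hits == null) hits = new TreeSet<>(p);
            else hits.retainAll(p);               // AND semantics
        }
        return hits == null ? Collections.emptySet() : hits;
    }

    public static void main(String[] args) {
        ToyIndex idx = new ToyIndex();
        idx.add("Reuters reports oil prices rising");   // doc 0
        idx.add("Grain exports fall");                  // doc 1
        idx.add("Oil exports from the Gulf");           // doc 2
        System.out.println(idx.search("oil exports"));  // [2]
    }
}
```

Lucene adds tokenization/analysis pipelines, ranked (not just Boolean) retrieval, and on-disk index structures on top of this basic idea.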
|3||Oct. 13, 2016||Beyond information retrieval (part 2)
Design a simple text-based just-in-time retrieval system (over Reuters or Wikipedia) for a text editing framework (using the Java document listener model provided) and Lucene (or even Google). The system suggests useful documents while the user types a text, such as an article or an email.
Note: practical work on text-based just-in-time document retrieval will be graded. A brief report (around one page) is due by email before Friday, October 28, 2016, 23:59 Lausanne time.
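The core loop of the just-in-time system can be sketched with the Java document listener model mentioned above: a `DocumentListener` watches the text being typed and, on each insertion, turns the most recent words into a query string. The class name, the context-window size, and the helper method are illustrative assumptions; in the TP the query would be sent to Lucene (or a web search engine) instead of being stored.

```java
import javax.swing.event.DocumentEvent;
import javax.swing.event.DocumentListener;
import javax.swing.text.BadLocationException;
import javax.swing.text.PlainDocument;

// Sketch of the just-in-time retrieval loop: on every insertion, build a
// query from the last few words the user typed.
public class JitListener implements DocumentListener {
    static final int CONTEXT_WORDS = 5;   // how many recent words form the query
    String lastQuery = "";

    // Extract the last CONTEXT_WORDS words of the text as a query string.
    static String queryFrom(String text) {
        String[] words = text.trim().split("\\s+");
        int from = Math.max(0, words.length - CONTEXT_WORDS);
        return String.join(" ",
                java.util.Arrays.copyOfRange(words, from, words.length));
    }

    @Override public void insertUpdate(DocumentEvent e) {
        try {
            lastQuery = queryFrom(
                    e.getDocument().getText(0, e.getDocument().getLength()));
            // in the real TP: send lastQuery to the Lucene searcher here
        } catch (BadLocationException ex) {
            throw new RuntimeException(ex);
        }
    }
    @Override public void removeUpdate(DocumentEvent e) { }
    @Override public void changedUpdate(DocumentEvent e) { }

    public static void main(String[] args) throws BadLocationException {
        PlainDocument doc = new PlainDocument();   // headless document model
        JitListener jit = new JitListener();
        doc.addDocumentListener(jit);
        doc.insertString(0, "The situation of oil prices in the Gulf region", null);
        System.out.println(jit.lastQuery);  // prices in the Gulf region
    }
}
```

A real system would also debounce the queries (e.g. fire only at word or sentence boundaries) so as not to search on every keystroke.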
|4||Oct. 20, 2016||
Deep learning for NLP: Word representation learning (by Nikolaos Pappas)
||Individual work on the optionally-graded practical exercise (TP), due Friday, October 28, 2016. This session will not be supervised.
Note: to send the query to Lucene (over the index created in Lesson 2), copy/adapt into DocumentEventDemo.java the code from SearchFiles.java, especially: declarations 020-038, initializations 090-092, then 100, 117, 152, 153, 177. Don't forget slides 44-47 of Lesson 3.
|II. OVERCOMING THE CROSS-LINGUAL BARRIER|
|5||Oct. 27, 2016||Introduction to machine translation
Generalities, history of MT, typology of rule-based systems, introduction to statistical systems and to MT evaluation.
|End of the TP on just-in-time retrieval, final questions and debugging. Reports are due Friday, October 28. Optionally-graded means you can choose whether your TP grade (20%) comes from this TP or from the upcoming one on MT.|
|6||Nov. 3, 2016||
Paper presentation by Trung PHAN: "Automatically building a stopword list for an information retrieval system", by Rachel Tsz-Wai Lo, Ben He and Iadh Ounis, Proceedings of the 5th Dutch-Belgian Information Retrieval Workshop (DIR'05), Utrecht, 2005.
Practical work: build your own SMT system.
The general goal of this series of practical sessions is to create a simple statistical MT system (e.g. EN/FR). See TP-7 on Nov. 10 for instructions.
|7||Nov. 10, 2016||
Paper presentation by Lesly MICULICICH on neural machine translation, including: "Neural Machine Translation by Jointly Learning to Align and Translate", by D. Bahdanau, K. Cho, Y. Bengio, Proceedings of the International Conference on Learning Representations (ICLR), 2015 (originally on Arxiv in 2014); and also: "Learning phrase representations using RNN encoder-decoder for statistical machine translation", by Cho, K., Van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y., Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
Language modeling: a major component of statistical MT
Decoding with phrase-based translation models
|Practical work: phrase-based statistical MT with Moses. Following the instructions on "Machine Translation practical work", train Moses to create a translation model on a small parallel corpus. The Moses system is pre-installed on a VirtualBox image which will be distributed.|
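To make the language-modeling lecture concrete, here is a toy bigram language model with add-one smoothing, the kind of score a phrase-based decoder combines with translation-model scores. It is trained on two sentences here purely for illustration; Moses relies on dedicated LM toolkits trained on millions of sentences, and the class and corpus below are assumptions of this sketch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy bigram language model with add-one (Laplace) smoothing.
public class BigramLM {
    Map<String, Integer> unigrams = new HashMap<>();
    Map<String, Integer> bigrams = new HashMap<>();
    Set<String> vocab = new HashSet<>();

    void train(String sentence) {
        String[] w = ("<s> " + sentence.toLowerCase() + " </s>").split("\\s+");
        for (String t : w) vocab.add(t);
        for (int i = 0; i + 1 < w.length; i++) {
            unigrams.merge(w[i], 1, Integer::sum);                // history count
            bigrams.merge(w[i] + " " + w[i + 1], 1, Integer::sum); // pair count
        }
    }

    // P(next | prev) with add-one smoothing over the vocabulary
    double prob(String prev, String next) {
        int big = bigrams.getOrDefault(prev + " " + next, 0);
        int uni = unigrams.getOrDefault(prev, 0);
        return (big + 1.0) / (uni + vocab.size());
    }

    // Sum of log bigram probabilities over the whole sentence
    double logProb(String sentence) {
        String[] w = ("<s> " + sentence.toLowerCase() + " </s>").split("\\s+");
        double lp = 0.0;
        for (int i = 0; i + 1 < w.length; i++)
            lp += Math.log(prob(w[i], w[i + 1]));
        return lp;
    }

    public static void main(String[] args) {
        BigramLM lm = new BigramLM();
        lm.train("the house is small");
        lm.train("the house is big");
        // a fluent word order should score higher than a scrambled one
        System.out.println(lm.logProb("the house is small")
                         > lm.logProb("house the small is"));  // true
    }
}
```

This preference for fluent word orders is exactly why the decoder needs the LM: the translation model alone says little about how to order the target words.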
|8||Nov. 17, 2016||
Presentation by Dhananjay RAM on Neural Network Language Models, based on three papers: "A Neural Probabilistic Language Model" by Y. Bengio et al. (JMLR 2003); "Recurrent neural network based language model" by Mikolov et al. (Interspeech 2010); and "LSTM Neural Networks for Language Modeling" by M. Sundermeyer et al. (Interspeech 2012).
Parameter tuning in phrase-based SMT
MT evaluation and applications
Texts for exercise: intuitive vs. analytic MT evaluation.
|Continuation of the practical work on statistical machine translation: build an operational SMT system, train and test it in several conditions, and evaluate the results comparatively.|
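For the comparative evaluation, a sketch of the core of BLEU-style automatic MT evaluation may help: modified n-gram precision, where each candidate n-gram count is clipped by its count in the reference. Real BLEU combines precisions for n = 1..4 with a brevity penalty over a whole test set; the class name and example sentences below are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Modified n-gram precision: candidate n-gram counts clipped by their
// reference counts, as in the core of the BLEU metric.
public class NgramPrecision {
    static Map<String, Integer> ngrams(String[] w, int n) {
        Map<String, Integer> m = new HashMap<>();
        for (int i = 0; i + n <= w.length; i++) {
            StringBuilder g = new StringBuilder(w[i]);
            for (int j = 1; j < n; j++) g.append(' ').append(w[i + j]);
            m.merge(g.toString(), 1, Integer::sum);
        }
        return m;
    }

    static double precision(String candidate, String reference, int n) {
        Map<String, Integer> cand = ngrams(candidate.toLowerCase().split("\\s+"), n);
        Map<String, Integer> ref  = ngrams(reference.toLowerCase().split("\\s+"), n);
        int clipped = 0, total = 0;
        for (Map.Entry<String, Integer> e : cand.entrySet()) {
            clipped += Math.min(e.getValue(), ref.getOrDefault(e.getKey(), 0));
            total   += e.getValue();
        }
        return total == 0 ? 0.0 : (double) clipped / total;
    }

    public static void main(String[] args) {
        String ref = "the cat is on the mat";
        System.out.println(precision("the cat sat on the mat", ref, 1));  // 5/6
        // clipping caps "the" at its reference count of 2:
        System.out.println(precision("the the the the the the", ref, 1)); // 2/6
    }
}
```

The second example shows why clipping matters: without it, repeating a common reference word would yield a perfect unigram precision.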
|III. OVERCOMING THE SUBJECTIVE BARRIER|
|--||Nov. 24, 2016||No course. This will be replaced by a lecture session on December 15 by Nikolaos Pappas.||Personal work (not in classroom): complete the optionally-graded practical work on statistical machine translation. Reports are due Friday, November 25, 2016. Optionally-graded means you can choose whether your TP grade (20%) comes from this TP or from the previous one on just-in-time recommendation.|
|9||Dec. 1st, 2016||Paper presentation by Skanda Muralidhar on sentiment analysis of job interviews.
Introduction to sentiment analysis
|Exercise on classifying positive vs. negative reviews using lexical features (see slide 25).|
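The review-classification exercise with lexical features can be sketched as a lexicon-based polarity counter. The two word lists below are tiny illustrative stand-ins (the exercise itself would use a real opinion lexicon, as indicated on slide 25), and the class name is an assumption of this sketch.

```java
import java.util.Set;

// Minimal lexicon-based polarity classifier: count positive vs. negative
// words and classify by the sign of the difference.
public class LexiconSentiment {
    static final Set<String> POSITIVE =
            Set.of("good", "great", "excellent", "enjoyable", "love");
    static final Set<String> NEGATIVE =
            Set.of("bad", "boring", "poor", "terrible", "hate");

    static String classify(String review) {
        int score = 0;
        for (String tok : review.toLowerCase().split("[^a-z]+")) {
            if (POSITIVE.contains(tok)) score++;
            if (NEGATIVE.contains(tok)) score--;
        }
        return score > 0 ? "positive" : score < 0 ? "negative" : "neutral";
    }

    public static void main(String[] args) {
        System.out.println(classify(
                "A great film with an excellent cast, despite a boring start."));
        // positive (2 positive words vs. 1 negative)
    }
}
```

Such a baseline ignores negation and context ("not good" counts as positive), which is precisely what the supervised, feature-based approaches in the lecture try to improve on.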
|--||Dec. 8, 2016||No course (holiday in Valais and at Idiap).||No practical work.|
|10 / 11||Dec. 15, 2016||
Deep learning for NLP: Multilingual Word Sequence Modeling by Nikolaos Pappas
Last part of the presentation by Nikolaos Pappas.
Advising on individual projects (two groups).
Details about final project and oral presentation
|12||Dec. 22, 2016||
Paper presentation by Wissam Halimi on human-computer dialogue in a block world (SHRDLU), based on
"A procedural model of language understanding"
by Terry Winograd. In Computer Models of Thought and Language, edited by R. C. Schank and K. M. Colby, San Francisco, CA: W. H. Freeman, p. 114-151, 1973.
Reprinted in Readings in Natural Language Processing, edited by Barbara J. Grosz, Karen Sparck Jones, and Bonnie Lynn Webber, San Francisco, CA: Morgan Kaufmann Publishers, p. 249-266, 1986.
Analysis of human interactions
Conclusion and synthesis on HLT research: defining a problem, building reference data, finding features for machine learning algorithms, training the algorithms, evaluating and analyzing the performance.
|Advising on individual projects (one group). See again the details about final examination.|