Wednesday, 26 September 2007

Summary of Elsnet Summer School 2007: Part III - Student Presentations

This is over two months late, but what the hell: I still think it is interesting, and all of us are finally back from our summer vacations, Interspeech, etc., so here goes Part 3 of the ESS07 review: Student Presentations.

I regret not having done this earlier, as some memories are now starting to fade (and not having the presentation slides makes it even harder to remember. Ian, help!? ;)). If you are one of my colleagues and I fail to give due detail to your presentation, I assure you that this is not necessarily a reflection of a lack of interest or consideration on my part, but only of a faulty memory. I invite you to add to or expand the bit on your presentation. Also, please let me know if I misunderstood anything.

On most days, the last session of our workday was filled with student presentations. Each student had about 15 minutes to present, followed by 5 minutes of questions. Most of us were first-year PhD students, but there were also senior researchers such as Dietmar and Tanel.

David Del Vale Agudo presented a multi-agent architecture for dialogue systems that used FIPA ACL for communication.

Alvaro and Beatriz are the people who built the basis of the interface currently used by Telefonica in the Companions project. Small world, huh? Alvaro presented on biometrics and voice recognition and Beatriz on the influence of ECAs on users' perceptions. Or was it the other way around? (Yes, they are a couple and always together, so I end up indexing all my tech discussions to the couple instead of to its individual members.)

Trung presented on POMDPs and so did Chandramohan. I found Trung’s work somewhat similar to Steve Young’s.

Pietra Maria presented on multi-party dialogue history.

Francesco Nesta (University of Trento, Italy) presented his work on identifying and separating the user's voice in a dialogue system under everyday ambient noise, such as television, radio, etc. To be more precise: a system in a smart-home environment that identifies the audio source of commands amid all the background noise commonly found in a house. Even at this initial stage his results were impressive. I will definitely keep an eye on it and try to bring some of it into both my thesis (after all, games are noisy by nature: just consider the kids yelling!) and my work (a Companion should be with the user at all times and not become clueless if the user turns on the radio).

Timo Balman gave a presentation on speech processing and prosody. He seemed to be heading towards an alternative, integrative approach to the field. Quite an ambitious undertaking, but it is refreshing to see someone trying something bold now and then.

Weiwei did not present his PhD research; instead, he gave an introduction to the Companions project, which is part of his ongoing work.

I presented my preliminary results from investigating the intersection of dialogue systems and computer games. Basically, in this first iteration I am trying to answer to what extent computer games constitute a particular domain, what opportunities and challenges games present to the advancement of research on dialogue systems, and what new features, game modes and benefits dialogue systems might bring to computer games. You can take a peek at the presentation and then check the articles in AI Wisdom 4 if you got excited about it!

The three Portuguese, Ana, Pedro and Filipe, jointly presented a dialogue system (L2F Butler?) that used Communicator and had a multi-agent realization.

Tanel presented an Estonian dialogue system that was deployed in several contexts; I particularly remember the parts on ticket reservation. Especially interesting were the techniques for dealing with the challenges of the Estonian language itself (which seemed to me to have strong commonalities with Finnish, Swedish and Russian).

Sheyla McCarthy presented on *Companions for the Elderly*. She did exactly the background and focus-group studies that, AFAIK, no one has done in Companions yet: interviews with the elderly, design constraints based on their cognitive performance, social aspects to take into account in the dialogue system and, which I found most important from a pragmatic point of view, the acceptance of such a system by the elderly. She used a PDA-like device for the first test and found that most people found it difficult to operate, though most were quite interested in the services her demonstrator provided. HCI and ease of use were pointed out as major factors for the success of her research, with which I fully agree. I guess she will soon receive an invitation to give a talk at the Computer Science Department of the University of Sheffield!

Dietmar showed his ongoing work on using the questions students ask an automatic dialogue system as a measure of the students' own performance and understanding.

Daniel Schulman also showed research related to Companions: how to make people feel that a dialogue system empathizes with them over long-term interaction.

Yun Jin (South Korea) showed a telephone-based dialogue system from the telecom sector. It was nice to see the focus on the whole system and on the crucial roles the non-dialogue parts (persistence, query processing, load balancing) played in the deployed solution.

Theodora Kolouri[1] (aka Lela) showed her preliminary investigation of dialogue by/with robots. I found it interesting that her research seemed to be heading towards grounded dialogue: using sensory-motor primitives to inform, drive or “reference-resolve” dialogue elements. Perhaps situated cognition finally meets dialogue systems?

Vladimir Popescu presented a fairly complex and complete model for language generation that took pragmatics into account. At the time I found it almost too complex, but, well, pragmatics is intrinsically complex. As it was for French and exploited many syntactic features, I could not quite follow all the examples, as mon français, c'est terrible!

Cristine (always remember to say the final “e” of her name) presented the evaluation of a dialogue system in a domotic scenario.

Student presentations were a very important part of the Summer School, as we were able to get feedback from the professors and senior researchers in the field, as well as from our fellow students.

I should point out that I found particularly fruitful the questions that Jim Larson himself called “Jim Larson’s questions 1 and 2”. They were (my paraphrase) “How will you know that your system is working as expected?” and “How will this result contribute to the advancement of the field?”. These standard and seemingly easy questions caught some of us off guard, and I see the tendency to avoid them as a major threat to our field. If we do not know beforehand how we will verify our results, how can we possibly make any claim that is not, well... just a wild guess? The second question is, in a sense, what separates relevant research from academic mumbo-jumbo. If we forget to think about how our work contributes to the advancement of the field, or to solving the problems our peers are trying to address, we run the risk of doing research that is of little use beyond letting us go to conferences and be addressed as Dr…

Student presentation sessions were chaired by Mike McTear, Ian O’Neil, Ramon Lopez-Cozar, Sebastian Moeller and Wolfgang Minker, all of whom did a great job of steering the presentations, providing their comments and coordinating the participation of the attendees. Many thanks to all of them, as well as to our fellow presenters!



[1] Lela, a friend of mine is about to name her soon-to-be-born daughter Theodora. Considering that this is in Brazil, which had very little Greek immigration and has few ties with Greece, that is remarkable! No wonder I would rather use the full Theodora instead of Lela, unless you ask me otherwise!

Thursday, 2 August 2007

Summary of Elsnet Summer School 2007: Part II - The Lectures

The lecture program was well-balanced between theoretical and hands-on courses.

Minker’s lectures on the current themes in spoken dialogue technology research also provided a review of the state of the art in each area. His grad students' presentations, by Pietra-Maria and Alex, stood out as perhaps the best overall lectures of the whole school. Pietra-Maria presented on multi-party dialogue and Alex on speech/dialogue processing in mobile environments. No wonder I will keep a close eye on those two, as we might venture into multi-party dialogue at a later phase in Companions, and the distributed solutions to speech processing that Alex presented are of immediate relevance to the future prospect of embedded Companions.

James Larson was an excellent addition to the program, providing an interesting departure from academic speculation and prototyping towards a practical approach to building commercial speech applications. His course focused on VoiceXML and related technologies for building simple task-oriented systems. As I took the practical lab course too, I had the theoretical underpinnings in the morning and the opportunity to apply them in practice a few hours later in the afternoon. Most colleagues who took this combination enjoyed it.

To me, VoiceXML and related technologies seemed too primitive and constraining to be of direct, immediate applicability to advanced research on dialogue systems (for instance, only VoiceXML 3.0 supports multiple hypotheses for the ASR result). However, they are a very good solution for rapid prototyping of speech applications and for creating speech-enabled web interfaces. If we are to develop a simple directed FSM-based system, why bother with Communicator? One of Larson’s suggestions that I believe is most important for us as researchers is to use as many of the current W3C standards as possible and, if needed, extend them, instead of reinventing the wheel. This would not only cut research time but also give our work greater impact, as extensions to the standards could potentially be adopted. Also, by conforming to a known standard, reuse and application to non-research situations would be made easier.
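To make “simple directed FSM-based system” a bit more concrete, here is a minimal sketch in Python, assuming a hypothetical ticket-booking task: each state asks one question, stores the answer in one slot, and moves to a fixed next state. This is roughly the behaviour of a VoiceXML form whose fields are filled in order, and it needs no agent infrastructure such as Communicator. The state names, prompts and slots are illustrative only.

```python
from dataclasses import dataclass, field

# Hypothetical states for a directed (system-initiative) booking dialogue.
# Each entry: state -> (prompt, slot to fill, next state); None marks the end.
STATES = {
    "ask_origin": ("Where are you leaving from?", "origin", "ask_destination"),
    "ask_destination": ("Where are you going?", "destination", "ask_date"),
    "ask_date": ("On what date?", "date", None),
}


@dataclass
class DirectedDialogue:
    state: str = "ask_origin"
    slots: dict = field(default_factory=dict)

    def prompt(self) -> str:
        """Return the system prompt for the current state."""
        return STATES[self.state][0]

    def step(self, user_input: str) -> bool:
        """Fill the current slot and advance; return True when the dialogue is done."""
        _, slot, next_state = STATES[self.state]
        text = user_input.strip()
        if not text:
            # Empty input: stay in the same state so the prompt is repeated.
            return False
        self.slots[slot] = text
        if next_state is None:
            return True
        self.state = next_state
        return False


if __name__ == "__main__":
    dialogue = DirectedDialogue()
    for answer in ["Sheffield", "", "Lisbon", "tomorrow"]:
        print("SYSTEM:", dialogue.prompt())
        print("USER:", answer)
        if dialogue.step(answer):
            break
    print(dialogue.slots)
```

The whole dialogue strategy lives in the transition table, which is precisely why such systems are quick to build and also why they cannot handle mixed initiative or reasoning over several recognition hypotheses.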

Jokinen’s course ran concurrently with Larson’s, so I only managed to catch 10 minutes of her lectures and then go over the slides. It was quite reassuring to learn that agents and multi-agent systems are one of the main approaches in Finland (at least in Helsinki and Tampere) and quite popular in Germany. Her slides centered on the issue of adaptation from different viewpoints (interaction, modularization, communication), which is one of the main strengths of multi-agent systems, if well designed.

In the second week, Lopez-Cozar’s course focused not so much on general themes in dialogue research as on the practical issues one faces when building multimodal and multilingual dialogue systems. Special attention was paid to architectures and techniques, which suited me quite well. I had skimmed his book before the course, but I still benefited from many new insights, the colleagues’ questions and some interesting videos during the lectures. Minker’s and Lopez-Cozar’s lectures together provided a crash course on multimodal dialogue systems, from the current research challenges to the techniques at our disposal to face them. Many congratulations to Ian, McTear et al. for the carefully thought-out combination!

Baggia’s course focused mostly on advanced TTS and speech grammars, building upon what Larson had done. We also covered some ECMAScript, AJAX and CCXML, with a focus on putting it all together in a working system. We used Loquendo’s VoiceNauta platform to experiment with the advanced TTS features and revisited some other aspects of the W3C speech interface languages, such as SRGS, SISR and VoiceXML. One interesting bit was a TTS system that would use the closest available phoneme to speak a foreign word, as in the phrase “Le presento Timothy Wilcock”. The net result was very similar to a genuine strong Italian accent pronouncing the name. This could be a terrific asset for building characters for entertainment and interactive storytelling applications. We got lots of practical tips, such as decreasing the confidence threshold to make applications work with non-native speakers. The only setback of the course was that we saw too many advanced features at once, and sometimes I got lost in the flood of new information. Anyway, it was a productive finish to our VoiceXML marathon.
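The tip about lowering the confidence threshold can be illustrated with a small Python sketch. It assumes the recognizer returns an N-best list of (text, confidence) pairs with scores between 0 and 1 (in VoiceXML 2.0 the corresponding setting is the `confidencelevel` property); the hypotheses and scores below are invented for illustration. Non-native speech typically receives lower scores, so a lower threshold lets the application accept an answer instead of rejecting it as a no-match, at the price of accepting more misrecognitions.

```python
from typing import List, Optional, Tuple

# An N-best list as a recognizer might return it: (utterance, confidence in [0, 1]).
Hypothesis = Tuple[str, float]


def accept(nbest: List[Hypothesis], threshold: float) -> Optional[str]:
    """Return the highest-scoring hypothesis that reaches the threshold,
    or None (treated as a "no match") if every hypothesis falls below it."""
    candidates = [h for h in nbest if h[1] >= threshold]
    if not candidates:
        return None
    return max(candidates, key=lambda h: h[1])[0]


if __name__ == "__main__":
    # Made-up result for a non-native speaker saying "Lisbon".
    nbest = [("lisbon", 0.42), ("lisburn", 0.31), ("listen", 0.12)]
    print(accept(nbest, threshold=0.5))  # None: rejected at a stricter threshold
    print(accept(nbest, threshold=0.3))  # lisbon: accepted at a lowered threshold
```

In a real application the lowered threshold would usually be paired with an explicit confirmation prompt, so that the extra misrecognitions it lets through can still be caught.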

During the afternoons of the second week I divided my time between Moeller’s, Pelachaud’s and Baggia’s extra class on language modeling, but mostly continued to delve into the VoiceXML material. Thus, my account of these lectures will be somewhat shorter and less well informed.

Pelachaud’s course combined lectures and videos, but I only took the practical class, where we used a toolkit to modify the GRETA ECA. Having a background in ECAs from my master’s days and using the past slides, I could keep up with my colleagues. Most people were quite amused and impressed by the capabilities of the toolkit: we could control each muscle of the talking head, blend emotions, etc. It was good to get a glimpse of the issues involved in tying a dialogue system to an ECA, especially considering my PhD research.

Moeller’s course provided some theoretical background, evaluation models and practical exercises. I took only the two lectures on theoretical background, so I will only comment briefly on them. Moeller provided background on psychophysics, usability and quality assessment, and often contrasted the developer's and the user's view of a system property in order to illustrate the quality models. For instance, he presented a model to investigate the relation between quality of service and quality of experience. A point worth stressing was taking into account the user's expectations, and what the user actually got, as factors in the quality of experience.

If you have read this far, I believe you might agree that, no matter one's level of expertise and particular interest in adaptive multimodal dialogue systems, one could learn a lot from the lecture program. For a pragmatic researcher interested in architecture, it could hardly be better!


Summary of Elsnet Summer School 2007: Part I

If I had to summarize my impressions of the Elsnet Summer School 2007 (ESS07) in a single sentence, I would say that it was a great experience, with interesting lectures, good organization, excellent alliances and, last but not least, lots of fun!

In the following posts I will present a biased summary of the lectures, the student presentations, the social aspects of the school, the emerging issues I could identify, their relation to my work on Companions and my PhD research, and some miscellaneous bits that caught my attention.

Welcome!

This blog is intended to be a mix of a mind dump and a way to keep other researchers I am in contact with aware of my research pursuits.
Typical posts will be conference, article and tool reviews, rambles on embryonic research ideas and calls for participation in things I am involved in.
Of course, my professional page still hosts my articles, pursuits and full bio.

Enjoy!