AI Engineer: August 2007

The lecture program was well-balanced between theoretical and hands-on courses.

Minker’s lectures on the current themes in spoken dialogue technology research also provided a review of the state of the art in each area. The presentations by his grad students, Pietra-Maria and Alex, stood out as perhaps the best lectures overall during the whole school. Pietra-Maria presented on multi-party dialogue and Alex on speech/dialogue processing in mobile environments. I will certainly keep a close eye on those two, as we might venture into multi-party dialogue at a later phase in Companions, and the distributed solutions to speech processing that Alex presented are of immediate relevance to the future prospect of embedded companions.

James Larson was an excellent addition to the program, providing an interesting departure from academic speculation and prototyping to a practical approach to the construction of commercial speech applications. His course focused on VoiceXML and related technologies for the construction of simple task-oriented systems. As I took the practical lab course too, I had the theoretical underpinnings in the morning and the opportunity to apply them in practice a few hours later in the afternoon. Most colleagues who took this combination of courses enjoyed it.

To me, VoiceXML and related technologies seemed too primitive and constraining to be of direct, immediate applicability to advanced research on dialogue systems (for instance, only VoiceXML 3.0 supports multiple hypotheses for the result of the ASR). However, they are a very good solution for rapid prototyping of speech applications and the creation of speech-enabled web interfaces. If we are to develop a simple directed FSM-based system, why bother with Communicator? A suggestion of Larson’s that I believe is one of the most important for us as researchers is to use as much of the current W3C standards as possible and, if needed, to extend them instead of re-inventing the wheel. This would allow us not only to cut research time, but also to give our work greater impact, as extensions of the standards could potentially be adopted. Also, by conforming to a known standard, reuse and applicability to non-research situations would be made easier.
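To make the "simple directed FSM-based system" idea concrete, here is a minimal sketch in Python. It is not taken from Larson's course or from any VoiceXML platform: the state names, the `classify` helper and the `run_dialogue` function are all hypothetical, and typed strings stand in for ASR results.

```python
# Hypothetical directed dialogue modelled as a finite-state machine.
# States, events and helper names are illustrative assumptions only.

# (current state, recognised event) -> next state
TRANSITIONS = {
    ("ask_city", "filled"): "ask_date",
    ("ask_date", "filled"): "confirm",
    ("confirm", "yes"): "done",
    ("confirm", "no"): "ask_city",
}


def classify(state, utterance):
    """Map a raw utterance to an event; a stand-in for ASR plus grammar matching."""
    text = utterance.strip().lower()
    if not text:
        return "nomatch"
    if state == "confirm":
        return text if text in ("yes", "no") else "nomatch"
    return "filled"


def run_dialogue(utterances, start="ask_city"):
    """Feed utterances through the FSM and return the list of states visited."""
    state = start
    visited = [state]
    for utterance in utterances:
        if state == "done":
            break
        event = classify(state, utterance)
        # Any event without a transition keeps the same state (a reprompt).
        state = TRANSITIONS.get((state, event), state)
        visited.append(state)
    return visited
```

The system always leads the conversation and the user can only fill the current slot or be reprompted, which is roughly the kind of dialogue that a form-based approach handles well.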

Jokinen’s course was concurrent with Larson’s, so I only managed to catch 10 minutes of her lectures and then go over the slides. It was quite reassuring to know that agents and multi-agent systems are one of the main approaches in Finland (at least in Helsinki and Tampere) and quite popular in Germany. Her slides centered on the issue of adaptation from different viewpoints (interaction, modularization, communication), which is one of the main strengths of multi-agent systems, if well designed.

In the second week, Lopez-Cozar’s course focused not so much on general themes in dialogue research as on the practical issues one faces when building multimodal and multilingual dialogue systems. Special attention was paid to architectures and techniques, which suited me quite well. I had skimmed his book before the course, but I still benefited from many new insights, the colleagues’ questions and some interesting videos during the lectures. Minker’s and Lopez-Cozar’s lectures together provided a crash course on multimodal dialogue systems, from the current research challenges to the techniques at our disposal to face them. Many congratulations to Ian, McTear et al. for the carefully thought-out combination!

Baggia’s course focused mostly on advanced TTS and speech grammars, building upon what Larson had done. We also covered some ECMAScript, AJAX and CCXML, with a focus on putting it all together in a working system. We used Loquendo’s VoiceNauta platform to experiment with the advanced TTS features and revisited some other aspects of the W3C speech interface languages, such as SRGS, SISR and VoiceXML. One interesting bit was a TTS system that would use the closest available phoneme to speak a foreign word, as in the phrase “Le presento Timothy Wilcock”. The net result was very similar to a genuine strong Italian accent while saying the name. This could be a terrific asset for constructing characters for entertainment and interactive storytelling applications. We got lots of practical tips, such as decreasing the confidence threshold to enable the apps to function with non-native speakers. The only setback in the course was that we saw too many advanced features at once, and sometimes I got lost in the flood of new info. Anyway, it was a productive finish to our VoiceXML marathon.
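The confidence threshold tip can be illustrated with a small Python sketch. This is my own illustration, not code from the course or from the VoiceNauta platform: the `accept_result` function and the example scores are hypothetical. In VoiceXML 2.0 the corresponding setting is, as far as I know, the `confidencelevel` property, whose default is 0.5.

```python
# Hypothetical sketch of confidence-based rejection of ASR results.
# accept_result and the example scores are illustrative assumptions.

def accept_result(hypotheses, threshold=0.5):
    """Return the text of the best hypothesis if its confidence reaches
    the threshold, otherwise None (which would trigger a reprompt)."""
    if not hypotheses:
        return None
    text, confidence = max(hypotheses, key=lambda h: h[1])
    return text if confidence >= threshold else None


# A non-native speaker may be recognised correctly but with a low score.
result = [("Timothy Wilcock", 0.42), ("Timothy Wilcox", 0.31)]
strict = accept_result(result, threshold=0.5)   # rejected
lenient = accept_result(result, threshold=0.3)  # accepted
```

The trade-off is that a lower threshold also lets more misrecognitions through, so it usually goes together with explicit confirmation prompts.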

During the afternoons of the second week I divided my time between Moeller’s and Pelachaud’s courses and Baggia’s extra class on language modeling, but mostly continued to delve into the VoiceXML material. Thus, my account of these lectures will be somewhat shorter and less well-informed.

Pelachaud’s course combined lectures and videos, but I only took the practical class, where we used a toolkit to modify the GRETA ECA. Having a background in ECAs from my master’s days and using the slides from the earlier lectures, I could follow along well with my colleagues. Most people were quite amused and impressed by the capabilities of the toolkit: we could control each muscle of the talking head, blend emotions, etc. It was good to get a glimpse of the issues involved in tying a dialogue system to an ECA, especially considering my PhD research.

Moeller’s course provided some theoretical background, evaluation models and practical exercises. I attended only two lectures on the theoretical background, so I will only briefly comment on them. Moeller provided background on psychophysics, usability and quality assessment, and often contrasted the developer's and the user's view of a system property in order to illustrate the quality models. For instance, he presented a model to investigate the relation between quality of service and quality of experience. A point worth stressing was taking into account both the expectations of the user and what the user actually got as factors in the quality of experience.

If you have read this far, I believe you might agree that, no matter one's level of expertise and particular interest in adaptive multimodal dialogue systems, one could learn a lot from the lecture program. For a pragmatic researcher interested in architecture, it could hardly be better!

Thursday, 2 August 2007

Summary of Elsnet Summer School 2007: Part II - The Lectures
