

Einführung – Lecture SS 2012
Multilinguale Mensch-Maschine Kommunikation
Prof. Dr. Tanja Schultz
Dipl.-Inform. Tim Schlippe
Tuesday, April 17, 2012

Overview

Lecture 1: Overview and Introduction

• General information about the lecture
• Introduction of the research group
• interACT
• Lead-in to the topic
• Application examples

General Information: Lecture

Advanced lecture in the Hauptdiplom program
– No prior knowledge required

Examination:
– Yes, in Kognitive Systeme and Anthropomatik

Schedule:
– Annually in the summer semester, 4+0
– Exams only during the lecture period (register early!)

Dates:
– Tue 14:00 – 15:30 (HS -101) and Thu 14:00 – 15:30 (SR 131)
– Start 19.04.2012, end 19.07.2012

Lecturers:
– Prof. Dr. Tanja Schultz
– Dipl.-Inform. Tim Schlippe
– Other members of the research group

General Information: Lecture

All lecture materials are available at
http://csl.anthropomatik.kit.edu > Studium und Lehre
> SS2012 > Multilinguale Mensch-Maschine Kommunikation
– All slides as PDF (no password protection)
– Current changes, announcements, syllabus
– Additional material where applicable (papers)

Basis for exams:
– Lecture content, slides, additional material

Questions, problems, and comments are welcome at any time during the lecture, or in person: CSL, Laborgebäude Kinderklinik, Geb. 50.21, Adenauerring 4
– Tanja Schultz ([email protected]), room 113
– Tim Schlippe ([email protected]), room 117

Office hours with Tanja Schultz by appointment

General Information: CSL

Cognitive Systems Lab (Lehrstuhl für Kognitive Systeme) since June 1, 2007
– Karlsruher Institut für Technologie, Fakultät für Informatik
– Institut für Anthropomatik (new since 2009)
– Homepage: http://csl.anthropomatik.kit.edu
– Address: Adenauerring 4, 76131 Karlsruhe

Contact:
– Prof. Dr.-Ing. Tanja Schultz
  • [email protected]
  • +49 721 608 46300
– Secretariat: Ms. Helga Scherer
  • [email protected]
  • +49 721 608 46312

Research: Human-Centered Technologies

Technologies and methods:
Recognizing, understanding, identifying
Statistical modeling, classification, ...

Communication of humans with their environment in the broadest sense:
speech, motion, biosignals

Application field human-machine interaction:
challenges and tasks are productivity and usability

Application field human-human communication:
challenges and tasks are the diversity of languages, cultural barriers, effort and costs

Teaching at CSL – Winter

Winter semester
• Biosignale und Benutzerschnittstellen (Biosignals and User Interfaces)
– 4+0, examinable in Kognitive Systeme and Anthropomatik
– Introduction to the acquisition and interpretation of biosignals
– Application examples
• Analyse und Modellierung menschlicher Bewegungen (Analysis and Modeling of Human Motion)
– Introduction to the analysis, modeling, and recognition of human motion sequences (together with Dr. Annika Wörner)
– 2+0, examinable in Kognitive Systeme and Anthropomatik
• Design und Evaluation Innovativer Benutzerschnittstellen (Design and Evaluation of Innovative User Interfaces)
– 2+0, examinable in Kognitive Systeme and Anthropomatik
• Multilingual Speech Processing
– 2+0, lab course
– Development of speech recognition systems using Rapid Language Adaptation Tools

Teaching at CSL – Winter

Winter semester
• Praktikum Biosignale 2: Emotion und Kognition (lab course)
– 2+0
– Recording and analysis of biosignals (e.g. pulse, skin conductance, respiration) to capture human emotional and cognitive processes

Teaching at CSL – Summer

Summer semester
• Multilinguale Mensch-Maschine Kommunikation
– 4+0, examinable in Kognitive Systeme and Anthropomatik
– Introduction to automatic speech recognition and processing
– Signal processing, statistical modeling, practical approaches and methods, multilinguality
– Applications in human-human communication and human-machine interaction
– Application examples
• Praktikum: Biosignale (lab course)
– Practical development
  • Recording of motion data (in cooperation with the sports institute)
  • Various biosensors (Vicon, accelerometers, EMG)
  • Automatic motion recognition

Teaching at CSL – Summer

Summer semester
• Kognitive Modellierung (Cognitive Modeling)
– 2+0, examinable in Kognitive Systeme and Anthropomatik
– Modeling of human cognition and affect in the context of human-machine interaction
– Models of human behavior, human learning (relation to and differences from machine learning methods), knowledge representation, emotion models, and cognitive architectures
• Methoden der Biosignalverarbeitung (Methods of Biosignal Processing)
– 2+0, examinable in Kognitive Systeme and Anthropomatik
– Algorithmic methods of modern biosignal processing

Theses and Jobs at CSL

• Bachelor's theses
• Master's theses
• Studienarbeiten
• Diplomarbeiten
• Student assistant (Hiwi) jobs

Development of an Adaptive Dialog System

• CSL has developed a successful EEG-based workload recognition system
– Is the user fully attentive or distracted?
• Integrated into a speech dialog system to adapt its behavior
– Simple example: under high workload, use shorter, simpler utterances
• Your task for a BA/MA/SA/DA thesis: implement a workload-adaptive speech dialog system for more complex tasks (in Java)
– Explore possibilities for intelligent, “cognitive” system strategies to react to high workload
– Creativity is encouraged and rewarded!
• Learn about…
– application of speech recognition
– design of intelligent speech dialog systems
– usability and user-centered design
• Contact: [email protected]

SA/BA/DA/MA Thesis: Web-derived Pronunciations

Tasks:
• Finding and extracting pronunciations on the WWW
• Quality assurance
• Evaluating the influence on speech recognition systems

Required skills:
• Basic knowledge of speech recognition
• Programming skills, e.g. in Perl or PHP
• Enjoyment of computer science and linguistics

Available now from: Tim Schlippe ([email protected])

Attendance List

• Please fill in!

N   Last name, first name   Subject, semester   Matr. no.   Email
1   SCHULTZ, Tanja          Informatik, 36                  [email protected]
2

Literature

Xuedong Huang, Alex Acero, and Hsiao-wuen Hon, Spoken Language Processing, Prentice Hall PTR, NJ, 2001 ($81.90 internet price)

Rabiner and Juang, Fundamentals of Speech Recognition, Prentice Hall Signal Processing Series, Englewood Cliffs, NJ, 1993

Jelinek, Statistical Methods for Speech Recognition, MIT Press, Cambridge, MA, 1997 ($35)

Schultz and Kirchhoff, Multilingual Speech Processing, Elsevier, Academic Press, 2006 (ask the authors for discounts!)

+ various articles (PDF) that we provide on the web (do read them!)

Useful Links, Additional Material

• All slides are posted on the web as PDF
http://csl.anthropomatik.kit.edu > Studium und Lehre > SS2012 > Multilinguale Mensch-Maschine Kommunikation

• Electronic archive of many proceedings and reports of the most important conferences on “Speech and Language”:
ICASSP (International Conference on Acoustics, Speech, and Signal Processing)
Interspeech (merger of Eurospeech and ICSLP)
ASRU (Automatic Speech Recognition and Understanding)
ACL (Association for Computational Linguistics), NA-ACL (North American ACL)
HLT (Human Language Technologies) …

Useful Links, Additional Material

• Biosignale und Benutzerschnittstellen (Schultz)
– Speech as one biosignal in a more general framework
• Maschinelle Übersetzung (Waibel)
– Connection: speech translation, statistical methods, language modeling
• Mustererkennung (Beyerer)
– Fundamentals of pattern recognition
• Automatische Spracherkennung (Waibel/Stüker)
– Fundamentals of speech recognition (winter semester)
• Praktikum: Multilingual Speech Processing (Schultz)
• Praktikum: Automatische Spracherkennung (Waibel)
• Seminar: Sprach-zu-Sprach-Übersetzung (Waibel)

General Information: Goals of the Course

Goals of the lecture:
• Speech in human-machine communication
– Advantages and disadvantages of speech as an input signal
– Aspects of multilinguality in speech recognition
• Fundamentals of speech recognition
– Basic concepts
– Speech production and perception
– Digital signal processing, feature extraction
– Statistical modeling, classification
– Acoustic modeling, HMMs
– Language modeling
• Further topics in speech processing
– Dialog modeling, synthesis (translation: see Prof. Waibel)
• Application examples from research

Today: Application Examples

• Speech recognition: from speech input signal to text
• Speech synthesis: from text to speech output signal
• Speech translation (across language boundaries):
from a speech signal in language L1 to a speech signal in L2
= speech recognition + MT + speech synthesis
• Speech understanding, summarization
= from speech input signal to meaning
• But speech activity is not only about what is being said:
Who is speaking? → speaker identification
Which language is spoken? → language ID
What is being talked about? → topic ID
How is it spoken? → emotion ID
Who is being addressed? → focus of attention
• Translation (across species boundaries): the dolphin example
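The composition "speech translation = speech recognition + MT + speech synthesis" can be sketched as a chain of three stages. This is only a structural illustration: the stub functions, the sample German phrase, and its translation are invented placeholders, not any actual system from the lecture.

```python
def recognize(audio: bytes) -> str:
    # Hypothetical ASR stub: audio in source language L1 -> text in L1.
    return "wo ist der bahnhof"

def translate(text_l1: str) -> str:
    # Hypothetical MT stub: text in L1 -> text in L2 (tiny lookup table).
    return {"wo ist der bahnhof": "where is the train station"}[text_l1]

def synthesize(text_l2: str) -> bytes:
    # Hypothetical TTS stub: text in L2 -> audio in L2 (here: just bytes).
    return text_l2.encode("utf-8")

def speech_to_speech(audio: bytes) -> bytes:
    # Speech translation = recognition + machine translation + synthesis.
    return synthesize(translate(recognize(audio)))

print(speech_to_speech(b"").decode("utf-8"))
```

The point of the sketch is the composition: each stage consumes exactly what the previous one produces, so the three components can be developed and swapped independently.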

Introduction

• Each of the lessons covers one topic from “speech recognition and understanding”
• It covers the most important areas of today’s research and also discusses some historic issues
• The goal of the course is to introduce you to the science of automatic speech recognition and understanding
• Today’s topic:
– Why are we doing speech recognition?
  • What are the advantages and disadvantages?
– Where is it useful?
  • Examples of applications, demos

Why Automatic Speech Recognition?

ADVANTAGES:
• Natural way of communication for human beings
– No practicing necessary for users, i.e. speech does not require any teaching, as opposed to reading/writing
– High bandwidth (speaking is faster than typing)
• Additional communication channel (multimodality)
• Hands and eyes are free for other tasks
→ works in the car / on the run / in the dark
• Mobility (microphones are smaller than keyboards)
• Some communication channels (e.g. the phone) are designed for speech
• ...

Why Automatic Speech Recognition?

DISADVANTAGES:
• Unusable where silence/confidentiality is required (meetings, library, spoken access codes)
… we are working on solutions (see later)
• Still unsatisfactory recognition rates when:
– the environment is very noisy (party, restaurant, train)
– domains are unknown or unlimited
– speakers are uncooperative (whisper, mumble, …)
• Problems with accents, dialects, code-switching
• Cultural factors (e.g. collectivism, uncertainty avoidance)
• Speech input is still more expensive than a keyboard

Input Speeds (Characters per Minute)

Mode          Standard   Best
Handwriting   200        500
Typewriter    200        1000
Stenography   500        2000
Speech        1000       4000
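The table's claim that speaking is faster than typing follows directly from its numbers; a minimal sketch that encodes the table and computes the speed-up (the dictionary layout is just one possible representation):

```python
# Input speeds from the slide's table, in characters per minute.
rates = {
    "handwriting": {"standard": 200, "best": 500},
    "typewriter":  {"standard": 200, "best": 1000},
    "stenography": {"standard": 500, "best": 2000},
    "speech":      {"standard": 1000, "best": 4000},
}

# Relative advantage of speech over typing, for standard users.
speedup = rates["speech"]["standard"] / rates["typewriter"]["standard"]
print(f"speech is {speedup:.0f}x faster than standard typing")
```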

Where is Speech Recognition and Understanding Useful?

Human-machine interaction:
1. Remote control applications
• Operating machines over the phone
2. Hands/eyes busy or not usable
• Speech recognition in cars
• Help for the physically challenged, nurse bots
3. Authentication
• Speaker identification/verification/segmentation
• Language/accent identification
4. Entertainment / convenience
• Speech recognition for entertainment
• Gaming
5. Indexing and transcribing acoustic documents
• Archive, summarize, search, and retrieve

Where is Speech Recognition and Understanding Useful?

Human-human interaction:
1. Mediate communication across language boundaries
• Speech translation
• Language learning
• Synchronization / sign language
2. Support human interaction
• Meeting and lecture systems
• Non-verbal cue identification
• Multimodal applications
• Speech therapy support

Operating Machines over the Phone

• Remote-controlled home: operate heating / air conditioning, turn lights on/off, check email
• Voice-operated answering machine: call the answering machine from anywhere and review recent calls
• Access databases:
– Pittsburgh bus information with CMU’s Let’s Go at 412-268-3526
– Check the weather with MIT’s Jupiter at 1-888-573-8255
– Train information (Erlangen), directory assistance, airlines, cinema
• Call centers: route or dispatch calls, 911 emergency line
– AT&T: “How may I help you?” The HMIHY system was deployed in 2001 and, according to AT&T, was handling more than 2 million calls per month by the end of 2001.
• Use interactive services worldwide: plan your next trip with an artificial travel agent

Hands-Free / Eyes-Free Tasks

• Hands and/or eyes are busy with tools
– Radio repair
– Construction site
• Hands and/or eyes are needed to operate machines/cars
– Hold the steering wheel
– Pull levers, turn knobs, operate switches
– Watch the street while driving
– Monitor a production line
• Hands are working on other people
– Hair stylist cutting hair
– Surgeon working on a patient
• Hands and/or eyes are not helpful in the environment
– Dark rooms (photography)
– Outer space (remote control)

Speech Recognition in Cars

• Use your cellular phone while keeping your hands on the wheel and eyes on the street, e.g. voice dialing
• Operate your audio device while driving
• Dictate messages (e-mails, SMS)
– TODAY several companies and services are emerging which do exactly this
• Talk to your personal digital assistant
• Navigation:
– Ask your way through a foreign city
– Find the nearest restaurant

Support in Everyday Life, Help for the Elderly and Physically Challenged

People who are immobile, e.g. confined to bed or hospital, or who cannot use their hands due to illness or accidents, can:
• operate parts of their environment/machines by voice
• ask a robot for help

Examples: Nursebot Pearl and Florence, CMU’s robotic assistants for the elderly; ISAC feeding a physically challenged individual, Center for Intelligent Systems, Vanderbilt University

Children with speaking disorders make significant improvements by trying to make a speech recognizer understand them.
Children with dyslexia and similar problems learn to read faster using automatic speech recognition.

Information in Speech

One speech signal carries many layers of information, each extracted by its own recognizer. Example utterance: “Onune baksana be adam!”

Speech recognition → words: Onune baksana be adam!
Language recognition → language: Turkish
Speaker recognition → speaker: Umut
Accent recognition → accent: Istanbul
Emotion recognition → emotion: angry
Topic ID: chemicals
Entity tracking: Istanbul
Acoustic scene: bus station
Discourse analysis: negotiation

Tanja Schultz, Speaker Characteristics, in: C. Müller (ed.), Speaker Classification, Lecture Notes in Computer Science / Artificial Intelligence, Vol. 4343, Springer, Heidelberg, Berlin, New York.

Speaker Recognition

Identification: Whose voice is it?
Verification/Detection: Is it Sally’s voice?
Segmentation and Clustering: Where are the speaker changes? Which segments are from the same speaker?

Speaker Identification / Verification / Recognition

Verification: verify someone’s claimed identity, i.e. is the person who s/he claims to be?
– Instead of a password: say something instead of typing

Identification: “who is speaking?”
– Identifies a speaker from an enrolled population by searching the database
– Personalized behavior: customize the machine’s reaction automatically to the current user

Recognition: often used to refer to all problems of verification, identification, and segmentation & clustering
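The difference between verification (one comparison against the claimed identity) and identification (a search over the whole enrolled population) can be shown in a minimal sketch. Everything here is invented for illustration: the speaker names, the tiny embedding vectors, and the threshold. Real systems derive embeddings from trained statistical models rather than hand-written lists.

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical enrolled speaker embeddings (made up for this sketch).
enrolled = {
    "sally": [0.9, 0.1, 0.0],
    "tim":   [0.1, 0.8, 0.2],
    "will":  [0.0, 0.2, 0.9],
}

def verify(claimed: str, utterance, threshold: float = 0.8) -> bool:
    # Verification: a single comparison against the claimed identity.
    return cosine(enrolled[claimed], utterance) >= threshold

def identify(utterance) -> str:
    # Identification: search the enrolled population for the best match.
    return max(enrolled, key=lambda name: cosine(enrolled[name], utterance))

probe = [0.85, 0.15, 0.05]
print(verify("sally", probe), identify(probe))
```

Note the asymmetry: verification cost is constant, while identification cost grows with the size of the enrolled population, which is why the two are treated as separate problems.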

Speaker Segmentation and Clustering

Challenges: overlapping speech, speech over noise, missed speaker turns

Segmentation: automatically segment incoming speech by speaker
Clustering: cluster segments of the same speaker
Adaptation: use recognition parameters that are optimized for a specific speaker

Example data: Mandarin Broadcast News
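As a toy illustration of the two steps above, one can detect speaker changes by thresholding the distance between embeddings of consecutive segments, then group segments by their nearest existing cluster representative. The embeddings, the threshold, and the greedy scheme are all invented for this sketch; real diarization systems use trained models and more careful clustering.

```python
def dist(a, b):
    # Euclidean distance between two segment embeddings.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# Made-up per-segment embeddings: segments 0, 1, 4 sound like one speaker,
# segments 2, 3 like another.
segments = [[0.9, 0.1], [0.88, 0.12], [0.1, 0.9], [0.12, 0.88], [0.9, 0.11]]

# Segmentation: mark a speaker change where consecutive segments differ a lot.
changes = [i for i in range(1, len(segments))
           if dist(segments[i - 1], segments[i]) > 0.5]

# Clustering: greedily assign each segment to the nearest cluster
# representative (the first segment seen for that cluster).
reps = []     # one representative embedding per cluster
labels = []   # cluster label per segment
for emb in segments:
    best = min(range(len(reps)), key=lambda c: dist(reps[c], emb), default=None)
    if best is None or dist(reps[best], emb) > 0.5:
        reps.append(emb)
        labels.append(len(reps) - 1)
    else:
        labels.append(best)

print(changes, labels)
```

The sketch finds speaker changes before segments 2 and 4 and assigns segments 0, 1, 4 to one cluster and 2, 3 to another, which is exactly the segmentation-then-clustering decomposition named on the slide.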

Language Identification

Example: detecting that an utterance is Japanese

• Selecting a recognizer (in multilingual speech recognition)
• Call routing (e.g. 911 emergency line)
• Data analysis and selection
• Special case: accent identification
– Optimizing all system parameters for the speaker’s accent
– E-language learning

Tanja Schultz, Identifizierung von Sprachen - Exemplarisch aufgezeigt am Beispiel der Sprachen Deutsch, Englisch und Spanisch, Diplomarbeit, Institut für Logik, Komplexität und Deduktionssysteme, Universität Karlsruhe, April 1995

FarSID: Far-Field Speaker Recognition

Demo conditions (audio examples):
• Original signal
• Effect of echo
• Effect of distance
• Effect of room size (1 m distance, 0.5 s echo): small room

Q. Jin, Y. Pan, T. Schultz, Far-Field Speaker Recognition, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP, Toulouse, France, 2006

Global Communication

The dream (?) of communicating across language boundaries: a babelfish for everybody

• Fun, everyday life:
– Chat in your mother tongue worldwide
– Travel without communication problems
• Business:
– Negotiate and be sure that your partner is getting it right
– The computer has no stakes, e.g. neutral, not lopsided translation
• Face-to-face communication
• Over the phone or internet
• Text-to-text vs. speech-to-speech

Image: “The Building of the Tower of Babel”, 1563, by Pieter Brueghel, Kunsthistorisches Museum, Vienna. The building of the Tower of Babel and the Confusion of Tongues (languages) in ancient Babylon is mentioned in Genesis. “Babel” is composed of two words, “bab” meaning “gate” and “el”, “god”: hence “the gate of god”. A related word in Hebrew, “balal”, means “confusion”.

GALE

GALE = Global Autonomous Language Exploitation:
Process huge volumes of speech and text data in multiple languages (Arabic, Chinese, English)
• Broadcast news, shows, telephone conversations

Apply automatic technology to spoken and written language:
• Absorb, analyze, and interpret

Deliver pertinent information in easy-to-understand forms to monolingual analysts

Three engines:
– Transcription
– Translation
– Distillation

Demonstration GALE – Chinese TV

Mandarin Broadcast News (CCTV), recorded in the US over satellite:
• Transforming the Mandarin speech into Chinese text using automatic speech recognition (ASR)
• Translating from Chinese text into English text using statistical machine translation (SMT)

H. Yu, Y.C. Tam, T. Schaaf, S. Stüker, Q. Jin, M. Noamany, T. Schultz, The ISL RT04 Mandarin Broadcast News Evaluation System, EARS Rich Transcription Workshop, Palisades, NY, November 2004

PDA Speech Translation in Mobile Scenarios

• Tourism
– Needs in a foreign country
– International events
  • Conferences
  • Business
  • Olympics
• Humanitarian needs
– Humanitarian organizations, government
– Emergency line 911
– USA: multicultural population
– Army, peace corps

A. Waibel, A. Badran, A. Black, R. Frederking, D. Gates, A. Lavie, L. Levin, K. Lenzo, L. Mayfield Tomokiyo, J. Reichert, T. Schultz, D. Wallace, M. Woszczyna, J. Zhang, Speechalator: Two-way Speech-to-Speech Translation in your Hand, HLT-NAACL 2003, Edmonton, Alberta, Canada, 2003

Verbmobil

Talk to people (face-to-face) from/in other countries in your own language.
A step towards Star Trek’s “Universal Translator”.

Mobility: Personal Digital Assistants

Use your PDA or cellular phone to get help with:
• Navigation
• Translation
• Information (travel, transportation, medical, ...)

Demo

RLAT: Rapid Language Adaptation Tools

Major problem: tremendous cost and time of development
– Very few languages (about 50 out of 6900) with many resources
– Lack of conventions (e.g. languages without a writing system)
– Gap between technology and language expertise

SPICE: intelligent system that learns a language from the user
– Speech Processing: Interactive Creation and Evaluation toolkit
– Develop web-based toolkits for speech processing: ASR, MT, TTS
– http://cmuspice.org
– http://csl.ira.uka.de/rlat-dev

Interactive learning:
– Solicit knowledge from the user in the loop
– Rapid adaptation of language-independent models

Efficiency:
– Reduce time and costs by a factor of 10

T. Schultz, A. Black, S. Badaskar, M. Hornyak, J. Kominek, SPICE: Web-based Tools for Rapid Language Adaptation in Speech Processing Systems, Proceedings of Interspeech, Antwerp, Belgium, August 2007

Demo

Meeting Room

The Meeting Browser is a powerful tool that allows us to record a new meeting, review or summarize an existing meeting, or search a set of existing meetings for a particular speaker, topic, or idea.

http://www.is.cs.cmu.edu/meeting_room/

Indexing Acoustic Documents

The world is flooded with information, and more and more of it arrives through audio-visual channels. Finding information in acoustic documents requires an intelligent acoustic search engine.

View4You / Informedia

Automatically records broadcast news and allows the user to retrieve video segments of news items on different topics using spoken language input.

Kemp/Waibel 1999

Education, Learning Languages

• LISTEN: automated reading tutor that listens to a child reading a displayed text aloud, and helps where needed
• CHENGO: web-based language learning in a gaming environment for English and Chinese
• CALL program at CMU on Computer-Assisted Language Learning

Robust and Confidential Speech Recognition

Traditional speech recognition:
• Capture the acoustic sound wave with a microphone
• Transform the signal into electrical energy

Requirements and challenges:
• Audibility: speech needs to be perceivable by the microphone (no low voice or whispering, no silent speech)
• Interference: speech disturbs others (no speaking in libraries, theaters, meetings)
• Privacy: the speech signal can be captured by others (no confidential phone calls in public places)
• Robustness: the signal is corrupted by noisy environments (difficult to recognize in restaurants, bars, cars)

Bone Conduction

• When we speak normally, our body is a resonance box: skin and bones vibrate when we speak (try this!)
• This vibration can be captured by so-called bone-conducting or skin-conducting microphones
• Whispered speech is defined as:
– the articulated production of respiratory sound
– with little or no vibration of the vocal folds
– produced by the motion of the articulatory apparatus
– transmitted through the soft tissue or bones of the head

Examples: stethoscopic microphone (Nakajima; Zheng et al.; Jou et al. / Intecs)

Electromyography – Silent Speech

Approach:
– Surface electromyography (EMG)
– Surface = no needles
– Electro = electrical activity
– Myo = muscle
– Graphy = recording

The recorded EMG channel is the difference s1 - s2 of two electrode signals.
– Measure the electrical activity of facial muscles by capturing the electrical potential differences
– MOTION is recorded, not the acoustic signal: silently moving the lips / articulators is good enough

SILENT SPEECH Demo
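The s1, s2, s1 - s2 arrangement on the slide (a bipolar derivation) can be illustrated numerically: subtracting the two electrode signals cancels interference that both electrodes pick up in common, leaving the muscle activity. All sample values below are invented for illustration; rectification is shown as one common EMG processing step, not as the lecture's specific pipeline.

```python
# Interference seen identically at both electrodes (e.g. mains hum).
noise = [0.5, -0.25, 0.25, 0.5]
# Muscle activity, assumed visible mainly at electrode 1.
muscle = [0.0, 0.75, -0.5, 0.125]

s1 = [m + n for m, n in zip(muscle, noise)]   # electrode 1: muscle + noise
s2 = noise[:]                                 # electrode 2: mostly noise

# Bipolar derivation: s1 - s2 cancels the common-mode noise.
emg = [a - b for a, b in zip(s1, s2)]
# Full-wave rectification, a common step before EMG feature extraction.
rectified = [abs(x) for x in emg]

print(emg, rectified)
```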

Demo: Silent Communication

• http://csl.ira.uka.de/SilentSpeech

Delphinisch (Dolphin Language)

Communication across language boundaries, across species boundaries:
• Collaboration with the Wild Dolphin Project: free-living Atlantic spotted dolphins
• Identification, behavior, communication
• Communication with dolphins
– Dolphins try to make contact
– Information from a 20-million-year-old species
• “Dolphone” and “Delphinisch”
– Sound production, perception, frequency, medium
– Pattern recognition, extraction, clustering, statistical modeling
– Audio and video indexing, archiving, retrieval
– Audio recording, analysis, synthesis, translation

http://wilddolphinproject.com

Even Beyond Human Speech … Towards Communication with Dolphins

CMU: www.cs.cmu.edu/~tanja
Wild Dolphin Project (http://wilddolphinproject.com)

Why do we want to talk to dolphins?
• They might have a lot to say (a 20-million-year-old species)
• It is a challenging scientific problem
– Cross language boundaries, cross species boundaries
– Different sound production, perception, …
– Different medium (water), transmission, omni-directional
• Nothing is known about dolphins’ language
• It involves spending a lot of time in the Bahamas

Why do dolphins want to talk to us? We don’t know …
… but there is evidence that they try hard

Quaero

• Collaborative research and development program
• Developing multimedia and multilingual indexing and management tools, e.g. automatic analysis, classification, extraction, and exploration of information
• Facilitates extraction of information from unlimited quantities of multimedia and multilingual documents, including written texts, speech and music audio files, and images and videos
• Available to everyone via personal computers, television, and handheld terminals

Conclusions

Speech:
• Is the most natural way of communication for human beings
• Does not require any teaching or practicing
• Has high bandwidth (speaking is faster than typing)
• Supplements other communication channels (multimodality)

Speech recognition is useful:
• In hands-busy and eyes-busy environments
• For mobile / small devices
• As support in everyday life and help for the physically challenged

Speech recognition and understanding:
• Allows us to (remotely) operate machines
• Supports global communication between humans
• Breaks language (and sometimes maybe cultural) barriers