Slides for the lecture course LV-64-422, summer semester (SoSe) 2014

Bernd Neumann

Bildverarbeitung 2: Höhere Bilddeutung (Image Processing 2: High-Level Image Interpretation)

Intended Audience

• The slides are intended for a graduate course of roughly 20 hours (14 lectures of 90 min each).

• Students are expected to possess basic knowledge in Computer Vision and Artificial Intelligence.

Website

The website for this course can be reached via
http://kogs-www.informatik.uni-hamburg.de/~neumann/HBD-SoSe-2014-Website/

You will find PDF copies of the slides and possibly other useful information related to the course. The website will be updated before each lecture.

Contents (1)

Lecture 1: Introduction
  Contents overview, motivation, aims, problem areas

Lecture 2: Early work on scene interpretation
  Badler, Tsotsos, Hogg, Nagel, Neumann

Lecture 3: Basic knowledge representation formalisms
  Semantic Networks, Frames, Constraints, Relational Structures

Lecture 4: Conceptual structures for scene interpretation
  Aggregates, situation trees, scenarios

Lecture 5: Interface to low-level vision
  Primitive symbols, grounding

Lecture 6: Modelling spatial and temporal relations
  Fuzzy predicates, Allen, RCC8, constraints

Lecture 7: Interpretation procedures

Contents (2)

Lecture 8: Logical framework
  Model construction, Abduction, Description Logics

Lecture 9: Probabilistic Guidance
  Hierarchical Bayesian Networks

Lecture 10: Case study 1
  Monitoring of aircraft service activities (Co-Friend)

Lecture 11: Case study 2
  Activity recognition in a restaurant (RACE)

Lecture 12: Robot Localisation
  Simultaneous Localisation and Mapping (SLAM)

Lecture 13: Summary and outlook

Literature

S. Russell & P. Norvig, Artificial Intelligence – A Modern Approach, 3rd edition, Pearson 2010
Section "Perception", pp. 928-970
• Image Formation
• Early Image-Processing Operations
• Object Recognition by Appearance
• Reconstructing the 3D World
• Object Recognition from Structural Information
• Using Vision
• Summary, Bibliographical and Historical Notes, Exercises

H.I. Christensen, H.-H. Nagel (eds.), Cognitive Vision Systems – Sampling the Spectrum of Approaches, Springer, LNCS 3948, 2006
(18 contributions on different aspects of Cognitive Vision)

M. Sonka, V. Hlavac, R. Boyle, Image Processing, Analysis, and Machine Vision, 3rd edition, Thomson 2008
Section 10: Image Understanding, pp. 450-545

What is Computer Vision?

Computer Vision is the academic discipline dealing with task-oriented reconstruction and interpretation of a scene by means of images.

scene: section of the real world, stationary (3D) or moving (4D)

image: view of a scene: projection, density image (2D), depth image (2 1/2D), image sequence (3D)

reconstruction and interpretation: computer-internal scene description, quantitative + qualitative + symbolic

task-oriented: for a purpose, to fulfill a particular task; context-dependent, supporting actions of an agent

What Is Scene Interpretation?

Scene Interpretation is the task of "understanding" or interpreting a scene beyond single-object recognition. Typical examples are traffic scene interpretation for driver assistance, inferring user intentions in smart-room scenarios, recognizing team behavior in RoboCup games, and discovering criminal acts in monitoring tasks.

Typical characteristics:
• Interpretations involve several objects and occurrences.
• Interpretations depend on temporal and spatial relations between parts of a scene.
• Interpretations describe the scene in qualitative terms, omitting geometric details.
• Interpretations include inferred facts, unobservable in the scene.
• Interpretations are based on conceptual knowledge and experience about the world.

"Scene interpretation" means roughly the same as "high-level vision".

Examples of Scene Interpretation (1)

Garbage collection in Hamburg (1 frame of a sequence)

We want to recognize parts, activities, intentions, spatial & temporal relations

scene interpretation means understanding every-day occurrences

Examples of Scene Interpretation (2)

Buster Keaton in "The Navigator"

We want to recognize episodes, the "story", emotions, funniness

Scene interpretation is silent movie understanding

Some Application Scenarios for Scene Interpretation

• Understanding street traffic (long history)

•  Cameras monitoring parking lots, railway platforms, supermarkets, nuclear power plants, ...

•  Video archiving and retrieval

•  Soccer game analysis

•  Smart room cameras

•  Monitoring elderly people in nursing homes

•  Autonomous robot applications (e.g. robot watchmen, robot waiters, assistance for elderly)

Technological Challenges of Scene Interpretation Tasks

• The problem area combines Computer Vision (CV) and Artificial Intelligence (AI) and is not well covered by mainstream CV and AI research

• Interpretations may build on common-sense knowledge, but common-sense knowledge representation is an unsolved issue

• Application scenarios may be large and highly diverse, so knowledge engineering is a challenge

• Learning and adaptation may be required

• Reliability and complexity management may become important issues

• Economical application development requires a generic approach

Cognitive Computer Vision

Scene interpretation is strongly related to "cognitive vision", a term created for vision comparable to human vision:

Cognitive computer vision is concerned with integration and control of vision systems using explicit but not necessarily symbolic models of context, situation and goal-directed behaviour. Cognitive vision implies functionalities for knowledge representation, learning, reasoning about events & structures, recognition and categorization, and goal specification, all of which are concerned with the semantics of the relationship between the visual agent and its environment.

Topics of cognitive vision:
• integration and control
• explicit models
• not necessarily symbolic
• context
• situation
• goal-directed behaviour
• knowledge representation
• learning
• reasoning
• recognition
• categorization
• goal specification
• visual agent

Multidisciplinary Contributions to Cognitive Vision

Cognitive Vision research requires multidisciplinary efforts and an escape from traditional research community boundaries.

Computer Vision • object recognition, tracking • bottom-up image analysis • geometry and shape • hypothesize-and-test control • probabilistic methods

Knowledge Representation & Reasoning • KR languages • logic-based reasoning services • default theories • reasoning about actions & change • Description Logics • spatial and temporal calculi

Robotics • planning, goal-directed behaviour • manipulation • sensor integration • navigation • localization, mapping, SLAM • integrative architectures

Learning & Data Mining • concept learning • inductive generalization • clustering • knowledge discovery

Cognitive Science • psychophysical models • neural models • conceptual spaces • qualitative representations • naive physics

Uncertain Reasoning • Bayesian nets, belief nets • decision & estimation • causality • probabilistic learning

Natural Language • high-level concepts • qualitative descriptions • NL scene descriptions • communication

(Figure: the areas listed above are arranged around the central node "Cognitive Vision".)

Basic Structure of Knowledge-based Scene Interpretation

Figure: block diagram relating image sequences of dynamic scenes, a geometrical scene description and high-level scene interpretations, with scene models and task context as additional elements.

Hybrid Representations for High-level Scene Interpretation

Figure: representation levels (raw image sequence, map-type feature representations, pictorial scene representations, metric feature representations, symbolic scene descriptions) spanning the signal domain, the metric domain and the symbolic domain.

Context and Task Dependence

Interpretations may depend on
- domain context
- spatial context
- temporal context
- intentional context
- task context
- communicative context
- focus of attention
- a priori probabilities

Constructing an interpretation is not a mapping from image data into interpretation space.

Signal-Symbol Problems (1)

Mapping from quantitative into qualitative representations

Example: "along", as in "Citytour" (Wahlster 87)

Mapping from qualitative into quantitative representations

Example: "abbiegen" (German: "to turn off"), using "typicality fields" (Mohnhaupt & Neumann 91, Herzog 94)
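
A minimal sketch of the first direction, from quantitative to qualitative, assuming a straight road segment and a sampled object trajectory. The function names, the distance tolerance of 2.0 and the threshold of 0.8 are illustrative assumptions and are not taken from Citytour or the cited papers.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment a-b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def along_degree(path, road_start, road_end, max_dist=2.0):
    """Fraction of trajectory points within max_dist of the road segment."""
    if not path:
        return 0.0
    near = sum(
        1 for p in path
        if point_segment_distance(p, road_start, road_end) <= max_dist
    )
    return near / len(path)


def is_along(path, road_start, road_end, threshold=0.8):
    """Qualitative predicate: does the trajectory run "along" the road?"""
    return along_degree(path, road_start, road_end) >= threshold
```

The sketch ignores the direction of motion, so a real predicate would need further conditions; it only illustrates how metric data can be condensed into a graded and then a binary qualitative statement.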

Signal-Symbol Problems (2)

Symbol grounding

To what entity in the real world does a symbol refer?

Example symbols (from the figure): road1, gully1, cover1, hole1, danger1

Common-Sense Problems

Common-sense reasoning

Deductions from symbolic knowledge about a scene should be correct not only w.r.t. domain-related definitions but also w.r.t. common sense.

Examples:

(implies (and house (some near lake)) mosquito-house)
(instance house1 house)
(instance lake1 lake)
(related house1 lake1 near)
(instance house1 (not (mosquito-house)))
=> inconsistent by domain-related definitions

(instance house1 house)
(instance cup1 cup)
(related house1 cup1 inside)
=> inconsistent by common sense
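
The difference between the two examples can be sketched with a toy consistency check. This is not a Description Logic reasoner: it hard-codes the single mosquito-house rule, and the tuple encoding of assertions and the function names are assumptions made for illustration.

```python
def derive_mosquito_houses(concepts, relations):
    """Apply the one rule: house AND (some near lake) -> mosquito-house.

    concepts:  set of (individual, concept) pairs
    relations: set of (subject, object, role) triples
    """
    derived = set()
    for subj, obj, role in relations:
        if (role == "near"
                and (subj, "house") in concepts
                and (obj, "lake") in concepts):
            derived.add((subj, "mosquito-house"))
    return derived


def is_consistent(concepts, negated, relations):
    """True if no entailed concept assertion is also explicitly negated.

    negated: set of (individual, concept) pairs asserted NOT to hold
    """
    entailed = concepts | derive_mosquito_houses(concepts, relations)
    return entailed.isdisjoint(negated)
```

With the first example's assertions the check returns False, because house1 is derived to be a mosquito-house and is also asserted not to be one. With the second example's assertions it returns True: nothing in the domain definitions forbids a house inside a cup, which is exactly the gap that common-sense knowledge would have to fill.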

Uncertainty Problems (1)

Fuzziness of concepts

Many high-level concepts have unsharp boundaries.

"behind" "overtake" "meet"

=> mapping into logical propositions may be problematic

• Fuzzy set theory offers "degree of applicability"
• Probability theory offers statistical measures for language use

Figure: fuzzy definition of "behind" relative to the rear of a reference object, with degrees of applicability 1.0, 0.7, 0.7, 0.3, 0.3
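
A fuzzy "behind" predicate can be sketched as a step function over the angular offset from the rear axis of the reference object. The degrees 1.0, 0.7 and 0.3 are the ones shown on the slide; the sector widths (20, 50 and 80 degrees), the function name and the heading convention are assumptions made for illustration.

```python
import math


def behind_degree(ref_pos, ref_heading_deg, other_pos):
    """Degree of applicability of "other is behind ref", in [0, 1].

    ref_heading_deg is the direction the reference object faces,
    measured counter-clockwise from the positive x axis.
    """
    dx = other_pos[0] - ref_pos[0]
    dy = other_pos[1] - ref_pos[1]
    if dx == 0 and dy == 0:
        return 0.0  # same position: "behind" is not applicable
    # Direction pointing straight out of the rear of the reference object.
    rear_deg = (ref_heading_deg + 180.0) % 360.0
    bearing_deg = math.degrees(math.atan2(dy, dx)) % 360.0
    # Smallest absolute angle between the bearing and the rear direction.
    offset = abs((bearing_deg - rear_deg + 180.0) % 360.0 - 180.0)
    if offset <= 20.0:   # assumed sector width
        return 1.0
    if offset <= 50.0:   # assumed sector width
        return 0.7
    if offset <= 80.0:   # assumed sector width
        return 0.3
    return 0.0
```

For a reference object at the origin facing along the positive x axis, a point at (-5, 0) is fully behind, while a point at (5, 0) in front of it gets degree 0.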

Uncertainty Problems (2)

Uncertainty of data

Example: Object boundaries (figure: house boundary is not discernible)

Strict bottom-up image interpretation is fundamentally ill-defined

Uncertainty Problems (3)

Exploring multiple hypotheses

Answers from several disciplines:
• graph matching
• heuristic search
• optimization theory
• logic theories
• probability & utility theory
• case-based reasoning
• neural networks
• particle physics
(and others)

Mixed bottom-up and top-down interpretation strategies have rarely been explored

Uncertainty Problems (4)

Cultural clash between logical and probabilistic reasoning

Probabilistic methods are not yet seamlessly integrated with logical calculi

Interesting recent developments:
• First-order probabilistic inference (Poole 03)
• Probabilistic relational models (http://dags.stanford.edu/PRMs/)

Example of reasoning in image interpretation (from Kanade's invited lecture at IJCAI-03: "Computer Vision: AI or Non-AI Problem?"):

car on left side of street (uncertain orientation of car)

Japanese signs => left-hand traffic => orientation of car resolved
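
One possible probabilistic reading of this example, given as a sketch only: the signs update the belief in left-hand traffic by Bayes' rule, and that belief in turn fixes the likely orientation of a car seen on the viewer's left. All numbers, names and the two-step model itself are assumptions for illustration and are not taken from Kanade's lecture.

```python
def posterior_left_hand(prior_left, p_signs_given_left, p_signs_given_right):
    """P(left-hand traffic | Japanese signs observed), by Bayes' rule."""
    numerator = p_signs_given_left * prior_left
    evidence = numerator + p_signs_given_right * (1.0 - prior_left)
    return numerator / evidence


def p_car_faces_away(p_left, p_rule=0.95):
    """P(car on the viewer's left side faces away from the viewer).

    Assumes that a car obeying the traffic rules (probability p_rule)
    faces away from the viewer under left-hand traffic and towards
    the viewer under right-hand traffic.
    """
    return p_left * p_rule + (1.0 - p_left) * (1.0 - p_rule)


# Hypothetical numbers: no idea about the traffic regime beforehand.
before = p_car_faces_away(0.5)
p_left = posterior_left_hand(0.5, p_signs_given_left=0.9, p_signs_given_right=0.05)
after = p_car_faces_away(p_left)
```

With these made-up values the orientation is undecided before the signs are taken into account (before equals 0.5) and strongly biased towards "facing away" afterwards.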

Example of Dynamic Scene Interpretation

S. Hongeng, R. Nevatia and F. Bremond. Video-Based Event Recognition: Activity Representation and Probabilistic Recognition Methods. Computer Vision and Image Understanding, Vol. 96 (2004), 129 - 162.

Recognising "Stealing by Blocking": "A" approaches a reference object (a person standing in the middle with his belongings on the ground). "B" and "C" then approach and block the view of "A" and the reference person from their belongings. In the meantime, "D" comes and takes the belongings.
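
The temporal structure of this scenario can be sketched as a check over time intervals of already detected sub-events. The cited paper uses probabilistic recognition methods; the deterministic check below is a simplified stand-in, and the event names, the Interval class and the exact ordering constraints are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Interval:
    """Time interval during which a sub-event was observed."""
    start: float
    end: float

    def covers_start_of(self, other: "Interval") -> bool:
        """True if the other interval starts while this one is ongoing."""
        return self.start <= other.start <= self.end


REQUIRED = ("A_approaches", "blockers_approach", "view_blocked", "D_takes")


def is_stealing_by_blocking(events: Dict[str, Interval]) -> bool:
    """Check the assumed ordering of the four sub-events."""
    if any(name not in events for name in REQUIRED):
        return False
    a = events["A_approaches"]
    bc = events["blockers_approach"]
    blocked = events["view_blocked"]
    take = events["D_takes"]
    return (
        a.start < bc.start                 # A approaches first
        and bc.start <= blocked.start      # blocking follows the approach of B and C
        and blocked.covers_start_of(take)  # D takes the belongings while the view is blocked
    )
```

A scene in which D takes the belongings before the view is blocked would not match, which reflects the "in the meantime" condition of the description.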

Learning and Recognising Structures in Buildings (1)

Rectangular objects recognised by low-level vision

Window-arrays recognised by high-level vision using a learnt model

EU-funded project eTRIMS* at the Cognitive Systems Laboratory of Hamburg University

*) E-Training for the Interpretation of Man-made Scenes

Learning and Recognising Structures in Buildings (2)

Interpretation of a facade, entrance is not recognised

Entrance is recognised after learning from the example

Monitoring Airport Activities in the EU-Project Co-Friend

Application scenario
• Aircraft servicing operations at Toulouse-Blagnac Airport are observed by eight cameras
• Moving objects are tracked by a low-level vision system
• Activities such as refueling or baggage unloading are recognised by a high-level vision system

Project goals
• Reliable on-line interpretation of extended multi-camera video sequences
• Learning new activities from examples
• Robust recognition performance based on a rich domain ontology

EU-funded project Co-Friend* at the Cognitive Systems Laboratory of Hamburg University

*) Cognitive & Flexible learning system operating Robust Interpretation of Extended real scenes by multi-sensors Datafusion