Neural Information Processing Group and
Bernstein Center for Computational Neuroscience,
Eberhard Karls Universität Tübingen
Max Planck Institute for Intelligent Systems, Tübingen
Felix Wichmann
Machine Learning:
Methods, Algorithms, Potentials and
Societal Challenges
http://www.appblogger.de/wp-content/uploads/2013/03/pb-130314-pope-2005.photoblog900.jpg
http://msnbcmedia.msn.com/j/MSNBC/Components/Photo/_new/pb-130314-pope-2013.photoblog900.jpg
❶
One way to think about vision: inverse optics
Laws of physics “generate” 2D images on
our retinae from 3D scenes
(forward optics / rendering)
Starting point to think about visual
perception: we want to infer the 3D scene
from the 2D retinal images:
inverse optics!
But: Inverse optics is mathematically
impossible.
[Diagram: the amount of light entering the eye is the product of light-source intensity (e.g. sunlight) and object reflectance]
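The product relation on the slide also makes the ill-posedness easy to state concretely. The following minimal sketch (my own illustration in Python with made-up numbers, not part of the slides) shows that a single measured luminance cannot be factored uniquely into illumination and reflectance:

# The retinal measurement is a product: luminance = illumination x reflectance.
# Many different (illumination, reflectance) pairs explain the same measurement,
# which is why "inverse optics" has no unique solution.
luminance = 50.0
candidate_explanations = [(100.0, 0.5), (500.0, 0.1), (1000.0, 0.05)]
for illumination, reflectance in candidate_explanations:
    assert abs(illumination * reflectance - luminance) < 1e-9  # all equally consistent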
[Rendered images of the same scene with increasing numbers of light bounces: N = 0, 1, 2, 5, 9, and N = 24 (considered fully rendered)]
modified from Matthias Bethge
[Diagram: illumination ("light field") and objects & surfaces (geometry, materials) are combined by (non-linear) "information entanglement" into the resulting image; visual inference means "untangling" this information again]
modified from Matthias Bethge
❷
Machine learning (ML) and statistics
Statistics is the science of learning from data. … [ML] is the science of
learning from data. These fields are identical in intent although they differ in
their history, conventions, emphasis and culture. (Wasserman, 2014)
ML is a comparatively new sub-branch of computational statistics jointly
developed in computer science and statistics.
ML is inference performed by computers based on past observations and
learning algorithms: ML algorithms are mainly concerned with discovering
hidden structure in data in order to predict novel data—exploratory
methods, to get things done!
“Classical” statistics is typically concerned with making precise probabilistic
statements about known data coming from known distributions, i.e. the interest is
in accurate models of the data!
What is the difference between statistics and machine learning?
Machine Learning is AI people doing data analysis.
Data Mining is database people doing data analysis.
Applied Statistics is statisticians doing data analysis.
Infographics is Graphic Designers doing data analysis.
Data Journalism is Journalists doing data analysis.
Econometrics is Economists doing data analysis
(and here you can win a Nobel Prize).
Psychometrics is Psychologists doing data analysis.
Chemometrics and Cheminformatics are Chemists doing data analysis.
Bioinformatics is Biologists doing data analysis.
Aleks Jakulin, https://www.quora.com/What-is-the-difference-between-statistics-and-machine-learning
What is the difference between statistics and machine learning? (cont’d)
… if you look at the goals both fields are trying to achieve, you see that
there is actually quite a big difference:
Statistics is interested in learning something about data, for example, which
have been measured as part of some biological experiment. … . But the
overall goal is to arrive at new scientific insight based on the data.
In Machine Learning, the goal is to solve some complex computational task by
“letting the machine learn”. Instead of trying to understand the problem well
enough to be able to write a program which is able to perform the task (for
example, handwritten character recognition), you instead collect a huge
amount of examples of what the program should do, and then run an
algorithm which is able to perform the task by learning from the examples.
Often, the learning algorithms are statistical in nature. But as long as the
prediction works well, any kind of statistical insight into the data is not
necessary.
Mikio Braun, https://www.quora.com/What-is-the-difference-between-statistics-and-machine-learning
What is the difference between statistics and machine learning? (cont’d)
The primary differences are perhaps the types of the problems attacked, and
the goal of learning.
At the risk of oversimplification, one could say that in statistics a prime
focus is often in understanding the data and relationships in terms of models
giving approximate summaries such as linear relations or independencies. In
contrast, the goals in machine learning are primarily to make predictions as
accurately as possible and to understand the behaviour of learning algorithms.
These differing objectives have led to different developments in the two
fields: for example, neural network algorithms have been used extensively as
black-box function approximators in machine learning, but to many
statisticians they are less than satisfactory, because of the difficulties in
interpreting such models.
Franck Dernoncourt, https://www.quora.com/What-is-the-difference-between-statistics-and-machine-learning
Terminology: types of learning
Supervised learning is the ML task of inferring a function from labeled training data. In supervised
learning, each example is a pair consisting of an input object (typically a vector) and a desired
output value (also called the supervisory signal). A supervised learning algorithm analyzes the
training data and produces an inferred function, which can be used for prediction.
Reinforcement learning is an area of ML inspired by behaviorist psychology, concerned with how
software agents ought to take actions in an environment so as to maximize some notion of
cumulative reward. Unlike in supervised ML, correct input/output pairs are never presented, nor are
sub-optimal actions explicitly corrected; the learner only receives a global reward for an action.
Unsupervised learning is the ML task of inferring a function to describe hidden structure from
unlabeled data. Since the examples given to the learner are unlabeled, there is no error or reward
signal to evaluate a potential solution. This distinguishes unsupervised learning from supervised
learning and reinforcement learning. A good example is identifying close-knit groups of friends in
social network data (clustering algorithms such as k-means; see the sketch below).
Semi-supervised learning is a class of algorithms making use of unlabeled data for training—
typically a small amount of labeled data with a large amount of unlabeled data. Semi-supervised
learning falls between unsupervised learning (without any labeled training data) and supervised
learning (with completely labeled training data).
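To make the unsupervised case concrete, here is a minimal clustering sketch in Python. It uses scikit-learn's k-means on made-up 2D data; the library choice and the toy data are my own assumptions, not part of the slides:

# k-means finds "close-knit groups" in unlabeled data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two unlabeled groups of points, centred at (0, 0) and (5, 5).
data = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
                  rng.normal(5.0, 1.0, size=(50, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)
print(kmeans.cluster_centers_)  # approximately the two group centres
print(kmeans.labels_[:5])       # cluster index assigned to the first few points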
Terminology: types of problems in supervised ML
Classification: Problems where we seek a yes-or-no prediction, such as “Is
this tumour cancerous?”, “Does this cookie meet our quality standards?”, and
so on.
Regression: Problems where the value being predicted falls somewhere on a
continuous spectrum. These systems help us with questions of “How much?”
or “How many?”
The support vector machine (SVM) is a supervised classification algorithm.
Neural networks, including the now so popular convolutional deep neural
networks (DNNs), are typically also supervised algorithms, usually used for
multi-class classification.
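A minimal Python sketch of the two problem types, using scikit-learn on made-up data (both the library and the toy data are my assumptions, not part of the slides):

# Classification ("yes or no?") versus regression ("how much?").
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LinearRegression

X = np.arange(10, dtype=float).reshape(-1, 1)

# Classification: the label is 1 if the feature exceeds 5, else 0.
y_class = (X.ravel() > 5).astype(int)
classifier = SVC(kernel="linear").fit(X, y_class)
print(classifier.predict([[2.0], [8.0]]))  # -> [0 1]

# Regression: predict a continuous value, here y = 3x + 1.
y_reg = 3.0 * X.ravel() + 1.0
regressor = LinearRegression().fit(X, y_reg)
print(regressor.predict([[4.0]]))          # -> [13.]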
Success of supervised classification in ML
ML—and in particular kernel methods as well as, very recently, so-called deep
neural networks (DNNs)—has proven successful whenever there is an
abundance of empirical data but a lack of explicit knowledge of how the data
were generated:
• Predict credit card fraud from patterns of money withdrawals.
• Predict toxicity of novel substances (biomedical research).
• Predict engine failure in airplanes.
• Predict what people will google next.
• Predict what people want to buy next at Amazon.
The Function Learning Problem
[Figure: a scatter of training points (each marked "x") and a new location at which the unknown function value "y" must be predicted]
Learning Problem in General
Training examples (x1,y1),…,(xm,ym)
Task: given a new x, find the new y
strong emphasis on prediction, that is, generalization!
Idea: (x,y) should look “similar” to the training examples
Required: similarity measure for (x,y)
Much of the creativity and difficulty in kernel-based ML: find suitable similarity
measures for all the practical problems discussed before, e.g. credit card
fraud, toxicity of novel molecules, gene sequences, … .
When are two molecules, with different atoms, structure, configuration etc.
the same? When are two strings of letters or sentences similar? What would
be the mean, or the variance of strings? Of molecules?
Very recent deep neural network success:
The network learns the right similarity measure from the data!
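As one concrete example of a similarity measure for ordinary vectors, here is the Gaussian (RBF) kernel in Python; the slides do not single out this particular kernel, so treat it purely as my own illustration:

# A kernel is a similarity measure: k(x, x') is large for similar inputs
# and decays towards zero as the inputs move apart.
import numpy as np

def rbf_kernel(x, x_prime, gamma=0.5):
    diff = np.asarray(x, dtype=float) - np.asarray(x_prime, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))  # 1.0   (identical inputs)
print(rbf_kernel([1.0, 2.0], [1.5, 2.5]))  # ~0.78 (nearby inputs)
print(rbf_kernel([1.0, 2.0], [5.0, 6.0]))  # ~1e-7 (distant inputs)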
The Support Vector Machine
Computer algorithm that learns by example to assign labels to objects
Successful in handwritten digit recognition, credit card fraud detection,
classification of gene expression profiles etc.
Essence of the SVM algorithm requires understanding of:
i. the separating hyperplane
ii. the maximum-margin hyperplane
iii. the soft margin
iv. the kernel function
For SVMs and machine learning in general:
i. regularisation
ii. cross-validation
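A minimal sketch of points i.–iii. in Python, assuming scikit-learn and toy data of my own: a (soft-margin) linear SVM finds the maximum-margin hyperplane, and the training points that determine it are the support vectors:

# Maximum-margin linear SVM on toy 2D data; C controls the soft margin.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],   # class 0
              [5.0, 5.0], [6.0, 5.5], [5.5, 6.0]])  # class 1
y = np.array([0, 0, 0, 1, 1, 1])

svm = SVC(kernel="linear", C=1.0).fit(X, y)
print(svm.support_vectors_)                   # the points that define the margin
print(svm.coef_, svm.intercept_)              # normal vector and offset of the hyperplane
print(svm.predict([[2.0, 2.0], [6.0, 6.0]]))  # -> [0 1]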
Two Genes and Two Forms of Leukemia
(microarrays deliver thousands of genes, but hard to draw ...)
[Figure: expression levels of two genes (MARCKSL1 vs. HOXA9) for two forms of leukemia; ©2006 Nature Publishing Group, http://www.nature.com/naturebiotechnology]
Separating Hyperplane
[Figure: the same two-gene expression data (MARCKSL1 vs. HOXA9) with a straight line separating the two classes; ©2006 Nature Publishing Group]
Separating Hyperplane in 1D — a Point
[Figure: one-dimensional expression data (MARCKSL1) separated by a single point; ©2006 Nature Publishing Group]
... and in 3D: a plane
[Figure: three-dimensional expression data separated by a two-dimensional plane; ©2006 Nature Publishing Group]
Many Potential Separating Hyperplanes ...
(all “optimal” w.r.t. some loss function)
[Figure: several different straight lines, each of which separates the two classes; ©2006 Nature Publishing Group]
The Maximum-Margin Hyperplane
[Figure: the separating line that maximises the distance (margin) to the nearest data points of both classes; ©2006 Nature Publishing Group]
What to Do With Outliers?
[Figure: the same data with an outlier that no sensible separating line can classify correctly; ©2006 Nature Publishing Group]
The Soft-Margin Hyperplane
[Figure: a separating line that tolerates a few misclassified training points in exchange for a wider margin; ©2006 Nature Publishing Group]
The Kernel Function in 1D
[Figure: one-dimensional expression data in which the two classes cannot be separated by a single point; ©2006 Nature Publishing Group]
Mapping the 1D data to 2D (here: squaring)
[Figure: the same 1D expression data plotted against their squared values (expression × expression); in this 2D space the two classes become linearly separable; ©2006 Nature Publishing Group]
Not linearly separable in input space ...
[Embedded page from a paper on kernel methods and categorisation; the text is garbled in this extraction. Recoverable: Figure 3 caption: "The crosses and the circles cannot be separated by a linear perceptron in the plane."]
Map from 2D to 3D ...
[Embedded paper text, garbled in this extraction. Recoverable: the linearisation (feature) map Φ: R² → R³,
Φ(x) = (φ₁(x), φ₂(x), φ₃(x))ᵀ = (x₁², √2·x₁x₂, x₂²)ᵀ,
which maps the 2D examples onto a two-dimensional manifold in 3D space, where the two classes become separable by a plane.]
... linear separability in 3D
(actually: the data are still 2D, they “live” on a manifold of the original dimensionality!)
[Embedded paper text, garbled in this extraction. Recoverable: Figure 4 caption: "The crosses and circles from Figure 3 can be mapped to a three-dimensional space in which they can be separated by a linear perceptron."]
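The feature map above is exactly the map induced by the degree-2 polynomial kernel: the inner product of two mapped points in 3D equals the squared inner product of the original 2D points. A minimal numerical check in Python (my own sketch, not from the embedded paper):

# Verify phi(w) . phi(v) == (w . v)**2 for phi(x) = (x1^2, sqrt(2) x1 x2, x2^2).
import numpy as np

def phi(x):
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2.0) * x1 * x2, x2**2])

w = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])

lhs = phi(w) @ phi(v)   # inner product computed in the 3D feature space
rhs = (w @ v) ** 2      # the same number computed directly in the 2D input space
print(lhs, rhs)         # both 1.0
assert np.isclose(lhs, rhs)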
Projecting the 4D Hyperplane Back into 2D Input Space
[Figure: the linear decision boundary found in the higher-dimensional feature space corresponds to a curved decision boundary in the original 2D expression space; ©2006 Nature Publishing Group]
SVM magic?
For any consistent dataset there is a kernel that allows perfect
separation of the data
Why bother with soft-margins?
The so-called curse of dimensionality: as the number of variables
considered increases, the number of possible solutions increases
exponentially … overfitting looms large!
Overfitting
[Figure: a highly wiggly decision boundary that separates the training data perfectly but is unlikely to generalise to new data; ©2006 Nature Publishing Group]
Regularisation & Cross-validation
Find a compromise between complexity and classification performance,
i.e. kernel function and soft-margin
Penalise complex functions via a regularisation term or regulariser
Cross-validate the results (leave-one-out or 10-fold typically used)
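A minimal sketch of this compromise in Python, assuming scikit-learn (the slides do not prescribe any software): the soft-margin parameter C and the RBF kernel width gamma are chosen by 10-fold cross-validation:

# 10-fold cross-validated grid search over the soft margin C and kernel width gamma.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

param_grid = {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1, 1.0]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=10)
search.fit(X, y)

print(search.best_params_)  # the (C, gamma) pair with the best cross-validated accuracy
print(search.best_score_)   # its mean accuracy across the 10 folds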
SVM Summary
Kernel essential—best kernel typically found by trial-and-error and
experience with similar problems etc.
Inverting is not always easy; approximations etc. are needed (i.e. science is hard,
engineering is easy as they don’t care as long as it works!)
Theoretically sound and a convex optimisation (no local minima)
Choose between:
• complicated decision functions and training (neural networks)
• clear theoretical foundation (best possible generalisation) and convex
optimisation, but the need to trade off complexity versus soft margin and the skilful
selection of the “right” kernel
(= the “correct” non-linear similarity measure for the data!)
Regularisation, Cross-Validation and Kernels
Much of the success of modern machine learning methods can be attributed to three ideas:
1. Regularisation. Given are N “datapoints” (xi, yi) with x = x1, ..., xN and y = y1, ..., yN,
and a model f. Then the “error” between data and model is E(y, f(x)).
In machine learning we not only take the “error” between model and data into account but
in addition a measure of the complexity of the model f:
E(y, f(x)) + λR(f)
2. Cross-Validation. Regularisation is related to the prior in Bayesian statistics. Unlike
in Bayesian statistics, the trade-off between small error and low complexity of the
model is controlled by a parameter λ—this is optimized using cross-validation.
3. Non-linear mapping with linear separation.
True for kernels as well as DNNs.
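Ridge regression is perhaps the simplest concrete instance of the recipe E(y, f(x)) + λR(f): squared error plus λ times the squared norm of the weights, minimised in closed form. A minimal Python sketch (my own illustration with made-up data):

# Regularised least squares: minimise ||y - X w||^2 + lambda * ||w||^2,
# with closed-form solution w = (X^T X + lambda I)^(-1) X^T y.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=50)

lam = 1.0  # the trade-off parameter lambda; in practice chosen by cross-validation
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
print(w_ridge)  # close to true_w, shrunk slightly towards zero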
❸
What changed vision research in 2012?
ImageNet challenge: 1000 categories, 1.2 million training images.
AlexNet by Krizhevsky, Sutskever & Hinton (2012) appears on the stage, and
basically reduces the prediction error by nearly 50%:
[Figure: "Vision: Deep CNN" feeding into "Language: Generating RNN", with automatically generated image captions such as "A group of people shopping at an outdoor market. There are many vegetables at the fruit stand.", "A woman is throwing a frisbee in a park.", "A little girl sitting on a bed with a teddy bear.", "A group of people sitting on a boat in the water.", "A giraffe standing in a forest with trees in the background.", "A dog is standing on a hardwood floor.", "A stop sign is on a road with a mountain in the background."]
Problem of finding a sharp image from a blurry photo:
Blind Image Deconvolution
[Figure: blurry photograph → ? → sharp photograph]
modified from Michael Hirsch
Sequence of Blurry Photos (Image Burst)
Result of Proposed Image Burst Deblurring Method
from Michael Hirsch
EnhanceNet: Photo-realistic Super-resolution
from Michael Hirsch
Autonomous cars
Fundamentals of Neural Networks
Interest in shallow, 2-layer artificial neural networks (ANNs)—so-called
perceptrons—began in the late 1950s and early 60s (Frank Rosenblatt),
based on Warren McCulloch and Walter Pitts’s as well as Donald Hebb’s ideas
of computation by neurons from the 1940s.
https://kimschmidtsbrain.files.wordpress.com/2015/10/perceptron.jpg
http://cambridgemedicine.org/sites/default/files/styles/large/public/field/image/DonaldOldingHebb.jpg?itok=py9Uh4D5
Second wave of ANN research and interest in psychology—often termed
connectionism—came after the publication of the parallel distributed processing
(PDP) books by David Rumelhart and James McClelland (1986), using the
backpropagation algorithm as a learning rule for multi-layer networks.
A three-layer network with (potentially infinitely many) hidden units in the
intermediate layer is a universal function approximator (Kurt Hornik, 1991).
Non-convex optimization problems during backpropagation training, and a lack
of data and computing power, limited the usefulness of ANNs:
a universal function approximator in theory, but in practice three-layer ANNs
could often not successfully solve complex problems.
Fundamentals of Neural Networks (cont’d)
Breakthrough again with so-called deep neural networks or DNNs, widely
known since the 2012 NIPS paper by Alex Krizhevsky, Ilya Sutskever &
Geoffrey Hinton.
https://www.wired.com/wp-content/uploads/blogs/wiredenterprise/wp-content/uploads/2013/03/hinton1.jpg
DNN: loose terminology for networks with at least two hidden or
intermediate layers, typically at least five to ten (or up to dozens):
1. massive increase in labelled training data (“the internet”),
2. computing power (GPUs),
3. a simple non-linearity (ReLU) instead of the sigmoid,
4. convolutional rather than fully connected layers, and
5. weight sharing across deep layers
appear to be the critical ingredients for the current success of DNNs, and
make them the current method of choice in ML, particularly in applications
(a minimal sketch of ingredients 3 and 4 follows below).
At least superficially DNNs appear to be similar to the human object
recognition system: convolutions (“filters”, “receptive fields”) followed by
non-linearities and pooling is thought to be the canonical computation of
cortex, at least within sensory areas.
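A minimal pure-NumPy sketch of ingredients 3 and 4 (my own illustration, not from the slides): the ReLU non-linearity and a naive "valid" convolution of an image with a small filter:

# The two building blocks of a convolutional layer: ReLU and 2D convolution.
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv2d_valid(image, kernel):
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
edge_filter = np.array([[-1.0, 1.0]])            # responds to rightward increases
feature_map = relu(conv2d_valid(image, edge_filter))
print(feature_map.shape)                         # (5, 4)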
Fundamentals of Neural Networks
[Figure: a model neuron computes z = b + Σi xi·wi and passes z through an activation function: linear, threshold, sigmoid, or rectified linear; Kriegeskorte (2015), Annu. Rev. Vis. Sci. 1:417–446]
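A minimal Python sketch of the model neuron in the figure (my own illustration): a weighted sum plus bias, passed through one of the activation functions shown:

# A single model neuron: z = b + sum_i x_i * w_i, followed by an activation function.
import numpy as np

def neuron(x, w, b, activation):
    z = b + np.dot(x, w)
    return activation(z)

linear    = lambda z: z
threshold = lambda z: 1.0 if z > 0 else 0.0
sigmoid   = lambda z: 1.0 / (1.0 + np.exp(-z))
rectified = lambda z: max(z, 0.0)

x, w, b = np.array([0.5, -1.0]), np.array([2.0, 1.0]), 0.5
for f in (linear, threshold, sigmoid, rectified):
    print(neuron(x, w, b, f))  # same z = 0.5, different outputs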
Fundamentals of Neural Networks
[Figure: two-layer networks; with a non-linearity f between the layers the output is y2 = f(f(x·W1)·W2), a universal function approximator; without it the two layers collapse into a single linear map, y2 = x·W1·W2 = x·W'; Kriegeskorte (2015), Annu. Rev. Vis. Sci. 1:417–446]
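A minimal numerical check of the figure's point (my own sketch): without a non-linearity, two weight matrices collapse into a single linear map; with a non-linearity between the layers they generally do not:

# Two linear layers collapse to one: (x W1) W2 == x (W1 W2) = x W'.
import numpy as np

rng = np.random.default_rng(0)
x  = rng.normal(size=(1, 4))
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(3, 2))

y_linear   = (x @ W1) @ W2
y_combined = x @ (W1 @ W2)                   # one equivalent linear layer W'
print(np.allclose(y_linear, y_combined))     # True

relu = lambda z: np.maximum(z, 0.0)
y_nonlinear = relu(relu(x @ W1) @ W2)        # not reproducible by any single linear layer
print(np.allclose(y_nonlinear, y_combined))  # False (in general)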
Example: VGG-16
VGG16 by Simonyan & Zisserman (2014); 92.7% top-5 test accuracy on ImageNet
https://www.cs.toronto.edu/~frossard/post/vgg16/#architecture
http://scs.ryerson.ca/~aharley/vis/conv/flat.html
Deep Neural Networks (DNNs)
[Figure: a minimal network with 2 input units, 2 sigmoid hidden units and 1 sigmoid output unit; the Δ terms illustrate how error derivatives are passed backwards through the layers via the chain rule (backpropagation)]
❹
Adversarial attacks?
Szegedy et al. (2014)
Adversarial examples? (cont’d)
[Figure: with adversarially patterned eyeglass frames, a face-recognition DNN misidentifies Reese Witherspoon as Russell Crowe]
Sharif et al. (2016)
DARPA Challenge 2015
Boston Dynamics 2017
Human versus artificial intelligence
We learn unsupervised or semi-supervised, sometimes by reinforcement, and very
rarely supervised (school, university) – all currently successful AI is
supervised, i.e. it only works when the correct answer is already known!
We can do lots of things using the same network (or a set of closely coupled
networks) — DNNs are typically good at only one or a few tasks.
❺
Societal Challenges
Working conditions and the labour market:
The use of technology makes work “simpler” – typically, the need for an
apprenticeship or formal training disappears.
The consequence is falling wages … after all, “anyone” can do the work.
Unemployment?
Autonomous vehicles – possibly soon after such vehicles are permitted on public roads there may be an obligation to drive only with them.
540,000 professional drivers in Germany (as of 2013), 250,000 taxi licences (as of 2017), 25,000 train drivers (as of 2017): 815,000 jobs at risk (unemployment rate rising from 5.8% to 8.1%).
Robots in postal services? Waste management? Logistics? Deutsche Post DHL has 211,000 employees in Germany (as of 2016); in 2014 roughly 155,000 people worked in supply and waste disposal and officially almost 760,000 as cleaning staff; Amazon alone employs 23,000 people in its German logistics centres: 1,150,000 jobs!
Humanoid robots in care? In 2014 more than 900,000 people worked in elderly and nursing care in Germany … .
Societal Challenges
Politics and society:
Do we all still live in the same reality? Personalised information in social media
and the loss of sources that inform broadly and controversially – widespread
consumption of propaganda.
Propaganda
Propaganda is the attempt to deliberately influence people’s thinking,
acting and feeling. Whoever engages in propaganda always pursues
a particular interest. … It is characteristic of propaganda
that it does not present the different sides of a topic and that it
mixes opinion and information. Whoever engages in propaganda does
not want to discuss and convince with arguments, but to influence
people’s emotions and behaviour with every trick available, for example
by frightening them, making them angry or holding out promises to them.
Propaganda takes the thinking off people’s hands and instead gives them
the feeling that the adopted opinion is the right one.
Source: Bundeszentrale für politische Bildung, www.bpb.de
Societal Challenges
Privacy? Changes in (interpersonal) communication?
Weapons of Mass Destruction (WMDs)
https://www.wired.com/images_blogs/dangerroom/2011/03/powell_un_anthrax.jpg
Societal Challenges
Naïve belief in the objectivity of algorithms
… and in rankings, the measurement and quantification of life:
China, for example, plans to introduce the Social Credit System.
https://de.wikipedia.org/wiki/Nick_Bostrom
Societal Challenges
Doomsday scenarios
Is the singularity coming? If so: Garden of Eden or hell?
Doomsday videos to watch
Google's Geoffrey Hinton - "There's no reason to think computers won't get much
smarter than us” (10 mins): https://www.youtube.com/watch?v=p6lM3bh-npg
Demis Hassabis, CEO, DeepMind Technologies - The Theory of Everything
(16 mins): https://www.youtube.com/watch?v=rbsqaJwpu6A
Nick Bostrom, What happens when our computers get smarter than we are?
(17 mins): https://www.ted.com/talks/nick_bostrom_what_happens_when_our_computers_get_smarter_than_we_are
Why Elon Musk is worried about artificial intelligence (3 mins): https://www.youtube.com/watch?v=US95slMMQis
Neural Information Processing Group and
Bernstein Center for Computational Neuroscience,
Eberhard Karls Universität Tübingen
Max Planck Institute for Intelligent Systems, Tübingen
Felix Wichmann
Thanks