

Page 1: Maschinelles Lernen: Methoden, Algorithmen, Potentiale und gesellschaftliche Herausforderungen
(Machine Learning: Methods, Algorithms, Potentials and Societal Challenges)

Felix Wichmann

Neural Information Processing Group and Bernstein Center for Computational Neuroscience, Eberhard Karls Universität Tübingen
Max Planck Institute for Intelligent Systems, Tübingen

Page 2: [image] http://www.appblogger.de/wp-content/uploads/2013/03/pb-130314-pope-2005.photoblog900.jpg

Page 3: [image] http://msnbcmedia.msn.com/j/MSNBC/Components/Photo/_new/pb-130314-pope-2013.photoblog900.jpg

Page 4: [image]


Page 7: One way to think about vision: inverse optics

The laws of physics "generate" 2D images on our retinae from 3D scenes (forward optics / rendering).

Starting point to think about visual perception: we want to infer the 3D scene from the 2D retinal images: inverse optics!

But: inverse optics is mathematically impossible.

[Diagram: a light source (e.g. sunlight) illuminates an object; the amount of light entering the eye is the product of light-source intensity and object reflectance.]
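The diagram's multiplicative relationship makes the impossibility concrete. A minimal sketch (plain Python, made-up numbers): since retinal intensity is the product of illumination and reflectance, a single measured intensity is consistent with arbitrarily many (light, reflectance) pairs, so inverting the optics from the image alone is ill-posed.

```python
# Retinal intensity I = L * R (light-source intensity times surface
# reflectance). Given only I, every factorisation of it is an equally
# valid "explanation" of the image -- the inverse problem is ill-posed.

def candidate_explanations(intensity, n=5):
    """Return n (light, reflectance) pairs that all produce the same
    retinal intensity, with reflectance constrained to (0, 1]."""
    pairs = []
    for k in range(1, n + 1):
        reflectance = k / n                  # sample reflectances in (0, 1]
        light = intensity / reflectance
        pairs.append((light, reflectance))
    return pairs

for light, refl in candidate_explanations(60.0):
    assert abs(light * refl - 60.0) < 1e-9   # all reproduce the same image
```

The constraint that reflectance lies in (0, 1] is physical; everything else (the intensity value, the sampling of candidates) is invented for illustration.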

Pages 8–13: [image sequence: a scene progressively rendered]
N = 0, N = 1, N = 2, N = 5, N = 9, N = 24 (considered fully rendered)


Page 17: [Diagram] Illumination ("light field") and objects & surfaces (geometry, materials) combine via (non-linear) "information entanglement" into the resulting image; visual inference is the "untangling". (modified from Matthias Bethge)

Pages 18–30: [image-only slides]


Page 35: Machine learning (ML) and statistics

Statistics is the science of learning from data. … [ML] is the science of learning from data. These fields are identical in intent although they differ in their history, conventions, emphasis and culture. (Wasserman, 2014)

ML is a comparatively new sub-branch of computational statistics, developed jointly in computer science and statistics.

ML is inference performed by computers based on past observations and learning algorithms: ML algorithms are mainly concerned with discovering hidden structure in data in order to predict novel data—exploratory methods, to get things done!

"Classical" statistics is typically concerned with making precise probabilistic statements about known data coming from known distributions, i.e. the interest is in accurate models of the data.

Page 36: What is the difference between statistics and machine learning?

Machine Learning is AI people doing data analysis.
Data Mining is database people doing data analysis.
Applied Statistics is statisticians doing data analysis.
Infographics is Graphic Designers doing data analysis.
Data Journalism is Journalists doing data analysis.
Econometrics is Economists doing data analysis (and here you can win a Nobel Prize).
Psychometrics is Psychologists doing data analysis.
Chemometrics and Cheminformatics are Chemists doing data analysis.
Bioinformatics is Biologists doing data analysis.

Aleks Jakulin, https://www.quora.com/What-is-the-difference-between-statistics-and-machine-learning

Page 37: What is the difference between statistics and machine learning? (cont'd)

… if you look at the goals both fields are trying to achieve, you see that there is actually quite a big difference:

Statistics is interested in learning something about data, for example, which have been measured as part of some biological experiment. … But the overall goal is to arrive at new scientific insight based on the data.

In Machine Learning, the goal is to solve some complex computational task by "letting the machine learn". Instead of trying to understand the problem well enough to be able to write a program which is able to perform the task (for example, handwritten character recognition), you instead collect a huge amount of examples of what the program should do, and then run an algorithm which is able to perform the task by learning from the examples.

Often, the learning algorithms are statistical in nature. But as long as the prediction works well, any kind of statistical insight into the data is not necessary.

Mikio Braun, https://www.quora.com/What-is-the-difference-between-statistics-and-machine-learning

Page 38: What is the difference between statistics and machine learning? (cont'd)

The primary differences are perhaps the types of the problems attacked, and the goal of learning.

At the risk of oversimplification, one could say that in statistics a prime focus is often in understanding the data and relationships in terms of models giving approximate summaries such as linear relations or independencies. In contrast, the goals in machine learning are primarily to make predictions as accurately as possible and to understand the behaviour of learning algorithms.

These differing objectives have led to different developments in the two fields: for example, neural network algorithms have been used extensively as black-box function approximators in machine learning, but to many statisticians they are less than satisfactory, because of the difficulties in interpreting such models.

Franck Dernoncourt, https://www.quora.com/What-is-the-difference-between-statistics-and-machine-learning


Page 43: Terminology: types of learning

Supervised learning is the ML task of inferring a function from labeled training data. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal). A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for prediction.

Reinforcement learning is an area of ML, inspired by behaviorist psychology, concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. Unlike in supervised ML, correct input/output pairs are never presented, nor are sub-optimal actions explicitly corrected; there is only a global reward for an action.

Unsupervised learning is the ML task of inferring a function to describe hidden structure from unlabeled data. Since the examples given to the learner are unlabeled, there is no error or reward signal to evaluate a potential solution. This distinguishes unsupervised learning from supervised learning and reinforcement learning. A good example is identifying close-knit groups of friends in social network data (clustering algorithms, like k-means).

Semi-supervised learning is a class of algorithms making use of unlabeled data for training—typically a small amount of labeled data with a large amount of unlabeled data. Semi-supervised learning falls between unsupervised learning (without any labeled training data) and supervised learning (with completely labeled training data).
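The k-means clustering mentioned above can be sketched in a few lines of plain Python. This is a deliberately minimal 1-D version with made-up data and hand-picked initial centers, not a production implementation; note that no labels appear anywhere, which is what makes it unsupervised.

```python
# Minimal 1-D k-means: alternate between assigning points to their
# nearest center and moving each center to the mean of its cluster.

def kmeans_1d(points, centers, iters=20):
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster
        # (empty clusters keep their old position).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.7]        # two obvious groups
print(kmeans_1d(data, centers=[0.0, 5.0]))    # converges to roughly [1.0, 9.07]
```

The structure (two groups) is "discovered" by the algorithm; only the number of clusters, k = 2, was supplied.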


Page 48: Terminology: types of problems in supervised ML

Classification: problems where we seek a yes-or-no prediction, such as "Is this tumour cancerous?", "Does this cookie meet our quality standards?", and so on.

Regression: problems where the value being predicted falls somewhere on a continuous spectrum. These systems help us with questions of "How much?" or "How many?"

The support vector machine (SVM) is a supervised classification algorithm.

Neural networks, including the currently popular convolutional deep neural networks (DNNs), are supervised algorithms too, though typically used for multi-class classification.
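The regression/classification distinction can be shown with a toy sketch in plain Python. The cookie "quality score" model and its threshold below are invented for illustration: the same underlying model answers "how much?" as regression and, once thresholded, "yes or no?" as classification.

```python
# Regression predicts a number; classification predicts a label.
# Hypothetical linear model of cookie quality as a function of weight.

def quality_score(weight_g):
    """Regression: a continuous quality score (made-up coefficients)."""
    return 0.1 * weight_g - 1.0

def passes_quality(weight_g, threshold=0.5):
    """Classification: threshold the score into a yes/no decision."""
    return quality_score(weight_g) >= threshold

print(round(quality_score(18.0), 2))   # regression answer: 0.8
print(passes_quality(18.0))            # classification answer: True
print(passes_quality(12.0))            # score 0.2 is below threshold: False
```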


Page 55: Success of supervised classification in ML

ML, in particular kernel methods and, very recently, so-called deep neural networks (DNNs), has proven successful whenever there is an abundance of empirical data but a lack of explicit knowledge of how the data were generated:

• Predict credit card fraud from patterns of money withdrawals.
• Predict the toxicity of novel substances (biomedical research).
• Predict engine failure in airplanes.
• Predict what people will google next.
• Predict what people want to buy next at Amazon.

Pages 56–58: The Function Learning Problem

[Figure: a scatter of data points in the x–y plane, shown across three slides.]
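The function learning problem in the figure, recovering a function from example (x, y) pairs and predicting y at a new x, can be illustrated with an ordinary least-squares line fit. Plain Python, made-up data; a straight line is just one possible hypothesis class.

```python
# Fit y = a*x + b to training pairs by least squares, then predict
# at an unseen x. Closed-form solution for the 1-D case.

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
            / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return slope, intercept

xs, ys = [0.0, 1.0, 2.0, 3.0], [0.1, 0.9, 2.1, 2.9]   # training examples
a, b = fit_line(xs, ys)
print(round(a * 4.0 + b, 2))   # prediction at the new point x = 4.0
```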


Page 66: Learning Problem in General

Training examples (x1, y1), …, (xm, ym).

Task: given a new x, find the new y. Strong emphasis on prediction, that is, generalization!

Idea: (x, y) should look "similar" to the training examples.

Required: a similarity measure for (x, y).

Much of the creativity and difficulty in kernel-based ML lies in finding suitable similarity measures for the practical problems discussed before, e.g. credit card fraud, toxicity of novel molecules, gene sequences, … .

When are two molecules, with different atoms, structure, configuration etc., the same? When are two strings of letters or sentences similar? What would be the mean, or the variance, of strings? Of molecules?

Very recent deep neural network success: the network learns the right similarity measure from the data!
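Prediction via a similarity measure can be sketched as a 1-nearest-neighbour rule. Plain Python; the scalar inputs and the toy similarity function below are invented for illustration. For molecules or strings, choosing this similarity function is exactly the hard part the slide describes.

```python
# Label a new x with the label of the most similar training example.
# Toy similarity for scalar inputs: negative absolute distance.

def similarity(a, b):
    return -abs(a - b)

def predict(train, x_new):
    """1-nearest-neighbour: return the label of the most similar example."""
    x_best, y_best = max(train, key=lambda xy: similarity(xy[0], x_new))
    return y_best

train = [(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")]
print(predict(train, 1.4))   # nearest example is (1.0, "low")
print(predict(train, 7.5))   # nearest example is (8.0, "high")
```

Swapping in a different `similarity` (a kernel on strings, graphs, molecules) changes the domain the predictor works on without changing the prediction logic.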


Page 71: Maschinelles Lernen: Methoden, Algorithmen, Potentiale und ...Neural Information Processing Group and Bernstein Center for Computational Neuroscience, Eberhard Karls Universität Tübingen

The Support Vector Machine

Computer algorithm that learns by example to assign labels to objects

Successful in handwritten digit recognition, credit card fraud detection,

classification of gene expression profiles etc.

Essence of the SVM algorithm requires understanding of:

i. the separating hyperplane

ii. the maximum-margin hyperplane

iii. the soft margin

iv. the kernel function

For SVMs and machine learning in general:

i. regularisation

ii. cross-validation
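Ingredients i–iii can be sketched in a few lines (a toy sketch of ours, not the quadratic-programming solver used in practice): a soft-margin linear SVM trained by gradient descent on the regularised hinge loss. The kernel function (iv) would replace the plain dot products.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, steps=2000):
    """Soft-margin linear SVM via gradient descent on
       lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b))).
    lam trades margin width (regularisation) against margin violations."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        margins = y * (X @ w + b)
        active = margins < 1                      # points inside the margin
        gw = lam * w - (y[active] @ X[active]) / n
        gb = -y[active].sum() / n
        w -= lr * gw
        b -= lr * gb
    return w, b

def predict(w, b, X):
    return np.sign(X @ w + b)

X = np.array([[0., 0.], [0., 1.], [1., 0.], [3., 3.], [3., 4.], [4., 3.]])
y = np.array([-1, -1, -1, 1, 1, 1])
w, b = train_linear_svm(X, y)
print(predict(w, b, X))
```

The maximum-margin solution is the one this objective converges to when the data are separable and lam is small; larger lam softens the margin.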

Page 72:

[Figure: scatter plot of expression levels of two genes, HOXA9 vs. MARCKSL1, for two forms of leukemia. ©2006 Nature Publishing Group, http://www.nature.com/naturebiotechnology]

Two Genes and Two Forms of Leukemia

(microarrays deliver thousands of genes, but hard to draw ...)

Page 73:

[Figure: the HOXA9 vs. MARCKSL1 scatter plot with a line separating the two classes (©2006 Nature Publishing Group)]

Separating Hyperplane

Page 74:

[Figure: the same data projected onto one dimension; a single point separates the two classes (©2006 Nature Publishing Group)]

Separating Hyperplane in 1D — a Point

Page 75:

[Figure: the data in three dimensions, separated by a plane (©2006 Nature Publishing Group)]

... and in 3D: a plane
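In general, a separating hyperplane in d dimensions is the (d-1)-dimensional set of points x satisfying a single linear equation, and the classifier labels a new point by which side of it the point falls on:

```latex
\mathcal{H} \;=\; \{\, \mathbf{x} \in \mathbb{R}^{d} \;:\; \mathbf{w}^{\top}\mathbf{x} + b = 0 \,\},
\qquad
\hat{y}(\mathbf{x}) \;=\; \operatorname{sign}\!\bigl(\mathbf{w}^{\top}\mathbf{x} + b\bigr)
```

So in 1D the hyperplane is a point, in 2D a line, and in 3D a plane; w is the normal vector of the hyperplane and b shifts it away from the origin.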

Page 76:

[Figure: several different lines, each of which separates the two classes (©2006 Nature Publishing Group)]

Many Potential Separating Hyperplanes ...

(all “optimal” w.r.t. some loss function)

Page 77:

[Figure: the separating line that maximises the distance to the nearest data points of either class (©2006 Nature Publishing Group)]

The Maximum-Margin Hyperplane

Page 78:

[Figure: the same data with an outlier lying on the wrong side of the margin (©2006 Nature Publishing Group)]

What to Do With Outliers?

Page 79:

[Figure: a hyperplane whose margin a few data points are allowed to violate (©2006 Nature Publishing Group)]

The Soft-Margin Hyperplane

Page 80:

[Figure: one-dimensional expression data that no single threshold can separate (©2006 Nature Publishing Group)]

The Kernel Function in 1D

Page 81:

[Figure: the 1D data mapped to 2D via (expression, expression²); the classes become linearly separable (©2006 Nature Publishing Group)]

Mapping the 1D data to 2D (here: squaring)
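The trick on this slide can be reproduced in a few lines (a toy sketch with made-up data): the inner points belong to one class and the outer points to the other, so no single threshold on x separates them, but after the map x ↦ (x, x²) a threshold on the x² coordinate does.

```python
import numpy as np

x = np.array([-3.0, -2.5, -1.0, -0.5, 0.5, 1.0, 2.5, 3.0])
y = np.array([1, 1, -1, -1, -1, -1, 1, 1])   # outer points vs. inner points

# In 1D no threshold works: class +1 sits on BOTH sides of class -1.
# Map each point to 2D: phi(x) = (x, x**2); the second coordinate
# separates the classes, so a threshold on x**2 classifies perfectly.
phi2 = x ** 2
pred = np.where(phi2 > 2.0, 1, -1)
print(pred)
```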

Page 82:

Not linearly separable in input space ...

w^T K v = w^T (DR)^T (DR) v = (DRw)^T (DRv)

Remember that for any two matrices A and B that can be multiplied, (AB)^T = B^T A^T. By R the vectors v and w are rotated such that they coincide with the principal components, and after that they are rescaled using the diagonal matrix D. In this coordinate system the inner product w^T K v amounts to a standard inner product. In order for K to implement an inner product, all eigenvalues have to be positive. If they were not, there could be vectors with a squared length smaller than zero, clearly in contradiction with Euclidean intuitions. All this illustrates the close connections between the standard inner product, Euclidean space, and positive definite matrices. Positive definite matrices can define an inner product. If the coordinate axes are rotated and rescaled appropriately, this inner product becomes a standard inner product and can therefore induce a norm, a metric and angles that behave like the familiar Euclidean ones.

Prototypes and Orthogonality

As an example, consider the following classification problem. The left panel of Figure 2 shows two categories drawn from two Gaussian distributions. Each point is a stimulus that is described by two dimensions. The dimensions are highly correlated for both stimulus classes. On a first glance a prototype learner will not find a decision boundary that separates the two classes. The two means of the two classes are connected with a solid line, and the decision boundary resulting from a prototype classifier is also shown as a solid line. As the decision boundary is orthogonal to the shortest connection between the two means, it cannot take the correlations in the classes into account. The problem does not actually lie in the prototype classifier as such; it lies in the inner product that is used to define orthogonality. A prototype learner does not have to fail for even the simplest category structures, so long as it takes the covariance of the stimulus dimensions into account (Reed; Fried & Holyoak; Ashby & Gott). If we take the inner product w^T K v given by a positive definite matrix K that is the inverse of the covariance matrix of the classes, we get the "right" definition of orthogonality. The resulting decision boundary is depicted as a dashed line. K is the inverse of a positive definite matrix (covariance matrices are always positive definite) and is therefore also a positive definite matrix. Hence it corresponds to the standard inner product after rotating and scaling the space with the matrices R and D. The middle panel in Figure 2 shows the rotated space and the right panel shows the space after scaling. In the transformed space the two classes are no longer correlated, and therefore the prototype classifier with the standard inner product in this transformed space can classify all stimuli correctly.

Figure 3. The crosses and the circles cannot be separated by a linear perceptron in the plane.

Non-linear Classification Problems

A linear classifier like the perceptron is a very attractive method for classification because it builds on strong geometric intuitions and the extremely well-developed mathematics of linear algebra. However, there are problems that a linear classifier cannot solve, at least not directly. As several psychological theories of categorization are based on linear classifiers, this issue has also attracted some attention in the psychological literature (Medin & Schwanenflugel; Smith, Murray & Minda). One example of a problem that cannot be solved with a linear classifier can be seen in Figure 3. For a long time the most popular approach to solve non-linear problems like this one was to use a multi-layer perceptron. Multi-layer perceptrons are known to be able to approximate any function (Hornik, Stinchcombe & White) and can be trained efficiently using the backpropagation algorithm (Rumelhart, Hinton & Williams). The approach that will be presented here is fundamentally different. Instead of trying to learn a complicated decision function, the strategy is to use a non-linear function to map the input patterns into a space where the problem can be solved by a linear classifier. The following toy example illustrates this approach.

Figure 3 shows examples from two classes (crosses and circles) that cannot be separated by a hyperplane in the input space (i.e. by a straight line in two dimensions). Instead of trying to classify the examples in the input space, that is, given the values x1 and x2, the data are transformed in a non-linear way, and linear classification of the data is then attempted in the transformed space. In machine learning such a non-linear transform is called a feature map and the resulting space a feature space. The term "feature" is already heavily overloaded in psychology; therefore we will use the more neutral terms linearization function and linearization space.

Page 83:

Map from 2D to 3D ...

Figure 2. Whether a prototype classifier can separate two classes depends also on the inner product that is chosen. The left panel shows two classes (circles and crosses) with highly correlated dimensions; in this case the standard inner product is not appropriate. The short solid line connects the means of the category distributions, and the long solid line is the corresponding decision boundary when the standard inner product is used. The dashed line depicts a decision boundary with a different inner product. This inner product corresponds to the standard inner product in the space depicted on the right, which can be obtained by rotating and rescaling the original space.

Figure 4. The crosses and circles from Figure 3 can be mapped to a three-dimensional space in which they can be separated by a linear perceptron.

For example, consider the following linearization function Φ: R² → R³:

Φ(x) = (φ1(x), φ2(x), φ3(x))^T = (x1^2, √2·x1·x2, x2^2)^T

This transformation maps the example patterns to a three-dimensional space that is depicted in Figure 4. The examples lie on a two-dimensional manifold of this three-dimensional space. In this space the two classes become linearly separable, that is, it is possible to find a two-dimensional plane such that the circles fall on one side and the crosses on the other side. This shows that with an appropriate non-linear transformation of the input, a simple linear classifier can solve the problem. The linearization approach is akin to rescaling the data in data analysis before fitting a linear model. In the current example each hyperplane in the linearization space defines a quadratic equation in the input space. Hence it is possible to deal with quadratic (i.e. non-linear) functions by using only linear methods. In general, the strategy is to preprocess the data with the help of a function Φ such that a linear perceptron model is likely to be applicable. Formally this can be expressed as

w^T Φ(x) = Σ_{i=1}^{d} w_i φ_i(x)

where d is now the dimension of the linearization space and w is a weight vector in the linearization space. It is clear that there is a wide variety of non-linear functions that can be used to preprocess the input. In fact, this approach was very popular in the early days of machine learning (Nilsson). The problem is, of course, that the function Φ has to be chosen before learning can proceed. In our toy example we have only shown how one can use linear methods to deal with quadratic functions, but usually we will not know in advance whether it is possible to separate the data with a quadratic function. However, if Φ is chosen to be sufficiently flexible (e.g. instead of a quadratic function with only three coefficients one could choose a higher-order polynomial with many coefficients), then it may be possible to approximate even very complicated decision functions. This comes at the cost of increasing the dimensionality of the linearization space, which early machine learning research therefore tried to avoid.

The term linearization space was used in an early paper on kernel methods (Aizerman, Braverman & Rozonoer).
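The payoff of this particular Φ is that the inner product in the 3D space can be computed without ever constructing Φ(x): Φ(u)^T Φ(v) = (u^T v)². This identity, the kernel trick, is easy to check numerically (a small sketch; function names are ours):

```python
import numpy as np

def phi(x):
    """Explicit linearization map R^2 -> R^3: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])

def quad_kernel(u, v):
    """The same inner product, computed directly in the 2D input space."""
    return float(np.dot(u, v)) ** 2

u, v = np.array([1.5, -0.5]), np.array([2.0, 1.0])
print(np.dot(phi(u), phi(v)), quad_kernel(u, v))  # both equal (u.v)^2
```

This is why an SVM never needs the possibly huge linearization space explicitly; it only ever evaluates the kernel on pairs of inputs.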

Page 84:

... linear separability in 3D

(actually: data still 2D, "live" on a manifold of original D!)

Figure 4. The crosses and circles from Figure 3 can be mapped to a three-dimensional space in which they can be separated by a linear perceptron.

Page 85:

[Figure: the resulting non-linear (curved) decision boundary drawn in the original 2D input space (©2006 Nature Publishing Group)]

Projecting the 4D Hyperplane Back into 2D Input Space

Page 89:

SVM magic?

For any consistent dataset there is a kernel that allows perfect

separation of the data

Why bother with soft-margins?

The so-called curse of dimensionality: as the number of variables

considered increases, the number of possible solutions increases

exponentially … overfitting looms large!
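One way to see why perfect separation is not the goal: a classifier flexible enough to separate anything can simply memorise the training set. The sketch below (made-up noise data, ours) uses a 1-nearest-neighbour rule, which, like an SVM with a sufficiently narrow kernel, reaches 100% training accuracy even when the labels are pure noise, and therefore cannot generalise.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 20))        # 50 points in 20 dimensions
y_train = rng.choice([-1, 1], size=50)     # labels are pure noise
X_test = rng.normal(size=(50, 20))
y_test = rng.choice([-1, 1], size=50)

def one_nn(X_ref, y_ref, X):
    """Classify each row of X by the label of its nearest reference point."""
    d2 = ((X[:, None, :] - X_ref[None, :, :]) ** 2).sum(axis=-1)
    return y_ref[d2.argmin(axis=1)]

train_acc = (one_nn(X_train, y_train, X_train) == y_train).mean()
test_acc = (one_nn(X_train, y_train, X_test) == y_test).mean()
print(train_acc, test_acc)   # perfect on the training set; fresh noise is at chance in expectation
```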

Page 90:

[Figure: a contorted decision boundary that separates the training data perfectly but will generalise poorly (©2006 Nature Publishing Group)]

Overfitting

Page 94:

Regularisation & Cross-validation

Find a compromise between complexity and classification performance,

i.e. kernel function and soft-margin

Penalise complex functions via a regularisation term or regulariser

Cross-validate the results (leave-one-out or 10-fold typically used)
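The k-fold scheme from the last bullet can be sketched as follows (a minimal version of ours; library implementations add stratification etc.): shuffle the data once, split it into k folds, and let each fold serve as the validation set exactly once; k = N gives leave-one-out.

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, val

# Typical use: average a model's validation error over the k splits
for train, val in kfold_indices(10, 5):
    print(len(train), len(val))        # each split: 8 training, 2 validation
```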

Page 99:

SVM Summary

Kernel essential—best kernel typically found by trial-and-error and

experience with similar problems etc.

Inverting not always easy; need approximations etc. (i.e. science hard,

engineering easy as they don’t care as long as it works!)

Theoretically sound and a convex optimisation (no local minima)

Choose between:

• complicated decision functions and training (neural networks)

• clear theoretical foundation (best possible generalisation), convex

optimisation but need to trade-off complexity versus soft-margin and skilful

selection of the “right” kernel.

(= “correct” non-linear similarity measure for the data!)

Page 103:

Regularisation, Cross-Validation and Kernels

Much of the success of modern machine learning methods can be attributed to three ideas:

1. Regularisation. Given are N "datapoints" (xi, yi) with x = x1, ..., xN and y = y1, ..., yN, and a model f. Then the "error" between data and model is E(y, f(x)). In machine learning we not only take the "error" between model and data into account but in addition a measure of the complexity of the model f: E(y, f(x)) + λR(f)

2. Cross-Validation. Regularisation is related to the prior in Bayesian statistics. Unlike in Bayesian statistics, the trade-off between small error and low complexity of the model is controlled by a parameter λ; this is optimized using cross-validation.

3. Non-linear mapping with linear separation. True for kernels as well as DNNs.
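For a linear model f(x) = Xw with squared error and the regulariser R(f) = ||w||², the penalised objective has a closed-form minimiser, ridge regression. The sketch below (made-up data) shows the effect of λ: stronger regularisation shrinks the weights, i.e. penalises complexity.

```python
import numpy as np

def ridge(X, y, lam):
    """Minimise ||y - X w||^2 + lam * ||w||^2 (closed-form solution)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=30)

w_weak = ridge(X, y, lam=0.1)
w_strong = ridge(X, y, lam=100.0)
print(np.linalg.norm(w_weak), np.linalg.norm(w_strong))  # larger lam => smaller ||w||
```

In practice λ itself is not fitted on the training error (that would always favour λ = 0) but chosen by cross-validation, as in point 2.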

Page 108:

What changed vision research in 2012?

ImageNet challenge: 1000 categories, 1.2 million training images.

AlexNet by Krizhevsky, Sutskever & Hinton (2012) appears on the stage and reduces the prediction error by nearly 50%.

Page 120:

[Figure: image-captioning pipeline: Vision (deep CNN) → Language (generating RNN). Example output: "A group of people shopping at an outdoor market. There are many vegetables at the fruit stand."]

Page 121:

[Figure: sample machine-generated captions: "A woman is throwing a frisbee in a park." "A little girl sitting on a bed with a teddy bear." "A group of people sitting on a boat in the water." "A giraffe standing in a forest with trees in the background." "A dog is standing on a hardwood floor." "A stop sign is on a road with a mountain in the background."]


Problem of finding a sharp image from a blurry photo:

Blind Image Deconvolution

modified from Michael Hirsch
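The inverse problem can be sketched in a few lines. Below is a minimal, illustrative non-blind Richardson-Lucy deconvolution in 1-D with NumPy: it assumes the blur kernel is known, whereas blind deconvolution, as in the work shown here, must additionally estimate the kernel itself. All function and variable names are mine, not from the cited work.

```python
import numpy as np

def convolve(signal, kernel):
    # Centered circular convolution; real deblurring code treats
    # image borders more carefully.
    out = np.zeros_like(signal)
    center = len(kernel) // 2
    for shift, weight in enumerate(kernel):
        out += weight * np.roll(signal, shift - center)
    return out

def richardson_lucy(blurred, kernel, n_iter=100):
    # Iteratively refines an estimate of the sharp signal so that
    # (kernel * estimate) reproduces the blurry observation.
    estimate = np.full_like(blurred, blurred.mean())
    flipped = kernel[::-1]  # correlation = convolution with flipped kernel
    for _ in range(n_iter):
        reblurred = convolve(estimate, kernel)
        ratio = blurred / np.maximum(reblurred, 1e-12)
        estimate = estimate * convolve(ratio, flipped)
    return estimate
```

The multiplicative update keeps the estimate non-negative, which is one reason this scheme is popular for photographic and astronomical deblurring.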


Sequence of Blurry Photos (Image Burst)

Result of Proposed Image Burst Deblurring Method

from Michael Hirsch

EnhanceNet: Photo-realistic Super-resolution

from Michael Hirsch

Autonomous cars

Fundamentals of Neural Networks

Interest in shallow, 2-layer artificial neural networks (ANNs), the so-called perceptrons, began in the late 1950s and early 60s (Frank Rosenblatt), based on Warren McCulloch and Walter Pitts's as well as Donald Hebb's ideas of computation by neurons from the 1940s.

https://kimschmidtsbrain.files.wordpress.com/2015/10/perceptron.jpg

http://cambridgemedicine.org/sites/default/files/styles/large/public/field/image/DonaldOldingHebb.jpg?itok=py9Uh4D5

A second wave of ANN research and interest in psychology, often termed connectionism, followed the publication of the parallel distributed processing (PDP) books by David Rumelhart and James McClelland (1986), using the backpropagation algorithm as a learning rule for multi-layer networks.

A three-layer network with (potentially infinitely many) hidden units in the intermediate layer is a universal function approximator (Kurt Hornik, 1991).

Non-convex optimization problems during backpropagation training, and a lack of data and computing power, limited the usefulness of ANNs: universal function approximators in theory, but in practice three-layer ANNs could often not successfully solve complex problems.
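To make the historical starting point concrete, here is a small sketch of Rosenblatt's perceptron learning rule in NumPy; the function and variable names are mine, chosen for illustration.

```python
import numpy as np

def train_perceptron(inputs, labels, epochs=20, lr=1.0):
    # Rosenblatt's rule: nudge the weights only when the thresholded
    # output disagrees with the target label.
    weights = np.zeros(inputs.shape[1])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(inputs, labels):
            prediction = 1.0 if x @ weights + bias > 0 else 0.0
            error = target - prediction
            weights += lr * error * x
            bias += lr * error
    return weights, bias
```

On a linearly separable problem such as logical OR this loop converges; for XOR no such weights exist at all, which is one reason hidden layers (and the second wave's backpropagation) matter.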


Fundamentals of Neural Networks (cont’d)

The breakthrough came again with so-called deep neural networks or DNNs, widely known since the 2012 NIPS paper by Alex Krizhevsky, Ilya Sutskever & Geoffrey Hinton.

https://www.wired.com/wp-content/uploads/blogs/wiredenterprise/wp-content/uploads/2013/03/hinton1.jpg

DNN: loose terminology for networks with at least two hidden or intermediate layers, typically at least five to ten (or up to dozens):

1. Massive increase in labelled training data (“the internet”),
2. computing power (GPUs),
3. simple non-linearity (ReLU) instead of sigmoid,
4. convolutional rather than fully connected layers, and
5. weight sharing across deep layers

appear to be the critical ingredients for the current success of DNNs, and make them the current method of choice in ML, particularly in applications.

At least superficially, DNNs appear to be similar to the human object recognition system: convolutions (“filters”, “receptive fields”) followed by non-linearities and pooling is thought to be the canonical computation of cortex, at least within sensory areas.
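Ingredients 3-5 can be illustrated together in a few lines of NumPy: one small kernel whose handful of weights is shared across every image position, followed by a ReLU. This is a toy sketch of the idea, not the implementation used by Krizhevsky et al.

```python
import numpy as np

def conv2d_relu(image, kernel):
    # 'Valid' 2-D convolution with a single shared kernel: the same
    # few weights slide over every position (weight sharing), instead
    # of one weight per input-output pair as in a fully connected layer.
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return np.maximum(out, 0.0)  # ReLU: the simple non-linearity
```

The parameter saving is dramatic: a fully connected layer mapping a 32×32 image to a 32×32 map needs 1024² (over a million) weights, while the shared 3×3 kernel above needs nine.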


Fundamentals of Neural Networks

[Figure: a model neuron computes z = b + Σᵢ xᵢwᵢ from inputs x₁, x₂ with weights w₁, w₂ and bias b; the output y is produced by one of four activation functions: linear, threshold, sigmoid, or rectified linear. From Kriegeskorte (2015), Annu. Rev. Vis. Sci. 1:417-446.]
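The figure's model neuron is two lines of code; the four activation functions shown differ only in how the weighted sum z is squashed. A sketch with illustrative names:

```python
import numpy as np

def model_neuron(x, w, b, activation):
    # z = b + sum_i x_i * w_i, then an elementwise non-linearity.
    z = b + np.dot(x, w)
    return activation(z)

linear    = lambda z: z
threshold = lambda z: np.where(z > 0, 1.0, 0.0)
sigmoid   = lambda z: 1.0 / (1.0 + np.exp(-z))
rectified = lambda z: np.maximum(z, 0.0)
```

For example, with x = (1, 2), w = (0.5, -1) and b = 0.5, the weighted sum is z = -1: the linear unit passes it through, threshold and rectified units output 0, and the sigmoid outputs roughly 0.27.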


Fundamentals of Neural Networks

[Figure: two stacked linear layers collapse into a single linear map, y₂ = x W₁ W₂ = x W′; only with a non-linearity f between the layers, y₂ = f(f(x W₁) · W₂), does depth add expressive power and make the network a universal function approximator. From Kriegeskorte (2015), Annu. Rev. Vis. Sci. 1:417-446.]
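The figure's point is easy to verify numerically: without a non-linearity between them, two weight matrices collapse into one, so the extra layer adds nothing. A minimal sketch (the matrix sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 3))
x = rng.normal(size=(5, 4))

# Two purely linear layers equal one layer with W' = W1 @ W2 ...
two_linear = (x @ W1) @ W2
one_linear = x @ (W1 @ W2)

# ... whereas a non-linearity f between the layers breaks the collapse.
nonlinear = np.tanh(x @ W1) @ W2
```

This is why every hidden layer in practice is followed by a non-linearity such as the sigmoid or ReLU.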


Example: VGG-16

VGG16 by Simonyan & Zisserman (2014); 92.7% top-5 test accuracy on ImageNet

https://www.cs.toronto.edu/~frossard/post/vgg16/#architecture


http://scs.ryerson.ca/~aharley/vis/conv/flat.html


Deep Neural Networks (DNNs)

[Figure: a small network with 2 input units, 2 sigmoid hidden units, and 1 sigmoid output unit; backpropagation applies the chain rule, Δz/Δx = Δz/Δy · Δy/Δx, layer by layer to obtain the weight updates.]
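The pictured 2-2-1 network is just large enough for backpropagation's classic demonstration problem, XOR, which no single-layer perceptron can solve. A minimal sketch using plain batch gradient descent on squared error; all names and hyperparameters are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_xor(epochs=5000, lr=1.0, seed=0):
    # 2 inputs -> 2 sigmoid hidden units -> 1 sigmoid output.
    rng = np.random.default_rng(seed)
    W1 = rng.normal(size=(2, 2)); b1 = np.zeros(2)
    W2 = rng.normal(size=(2, 1)); b2 = np.zeros(1)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    t = np.array([[0], [1], [1], [0]], dtype=float)
    for _ in range(epochs):
        h = sigmoid(X @ W1 + b1)                   # forward pass
        y = sigmoid(h @ W2 + b2)
        delta_y = (y - t) * y * (1 - y)            # backward pass: output layer
        delta_h = (delta_y @ W2.T) * h * (1 - h)   # chain rule: hidden layer
        W2 -= lr * h.T @ delta_y; b2 -= lr * delta_y.sum(axis=0)
        W1 -= lr * X.T @ delta_h; b1 -= lr * delta_h.sum(axis=0)
    return sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
```

The two delta lines are exactly the chain rule from the figure, applied once per layer; deeper networks simply repeat the hidden-layer line.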


Adversarial examples?

Szegedy et al. (2014)


Adversarial examples? (cont’d)

Reese Witherspoon

Russell Crowe

Sharif et al. (2016)
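The attacks above can be understood with a toy linear “classifier”: nudging every input dimension slightly, in the direction that hurts the score most (the sign of the gradient), moves the decision far more than the input changes visibly. This is a fast-gradient-sign-style sketch in the spirit of Goodfellow et al. (2015), not the optimization used by Szegedy et al. or Sharif et al.; all numbers are made up for illustration:

```python
import numpy as np

def adversarial_perturb(x, w, epsilon):
    # For a linear score w.x + b the gradient w.r.t. x is just w, so
    # stepping each dimension by -epsilon * sign(w) lowers the score
    # by epsilon * sum(|w|), a drop that grows with input dimension.
    return x - epsilon * np.sign(w)

w = np.array([0.8, -0.5, 0.3, 0.9])
b = 0.1
x = np.array([1.0, -1.0, 0.5, 1.0])

score_clean = x @ w + b                         # 2.45: confidently positive
x_adv = adversarial_perturb(x, w, epsilon=1.2)
score_adv = x_adv @ w + b                       # -0.55: decision flipped
```

Epsilon looks large only because this toy has four dimensions; with tens of thousands of pixels the same total score change is bought with an imperceptibly small per-pixel epsilon, which is why the perturbed images in Szegedy et al. (2014) look unchanged to humans.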


DARPA Challenge 2015


Boston Dynamics 2017


Human versus artificial intelligence

We learn unsupervised or semi-supervised, sometimes by reinforcement, and only very rarely supervised (school, university); all currently successful AI, by contrast, is supervised, i.e. it learns only when the correct answer is known!

We can do lots of things using the same network (or a set of closely coupled networks); DNNs are typically good at only one or a few tasks.


Societal Challenges

Working conditions and the labour market:

The use of technology makes work “simpler”; typically, the need for an apprenticeship or formal training disappears.

The consequence is falling wages … after all, “anyone” can do the job.


Unemployment?

Autonomous vehicles: possibly soon after such vehicles are permitted on the roads, there may follow the obligation to drive only with them.

540,000 professional drivers in Germany (as of 2013)
250,000 taxi licences (as of 2017)
25,000 train drivers (as of 2017)
815,000 jobs at risk (unemployment rate rising from 5.8% to 8.1%)

Robots in postal services? Waste management? Logistics?
Deutsche Post DHL has 211,000 employees in Germany (as of 2016); around 155,000 people worked in supply and waste disposal in 2014; officially, almost 760,000 worked as cleaners in 2014; Amazon alone employs 23,000 people in logistics centres in Germany: 1,150,000 jobs!

Humanoid robots in care work?
In 2014, more than 900,000 people worked in elderly and nursing care in Germany … .


Politics and society:

Living in the same reality? Personalised information in social media and the loss of sources that inform broadly and controversially; widespread consumption of propaganda.


Propaganda

Propaganda is the attempt to deliberately influence people's thinking, actions, and feelings. Whoever engages in propaganda always pursues a particular interest. … It is characteristic of propaganda that it does not present the different sides of an issue and that it mixes opinion with information. Whoever engages in propaganda does not want to discuss and convince with arguments, but to influence people's emotions and behaviour by any trick available, for example by frightening them, making them angry, or making them promises. Propaganda relieves people of thinking and instead gives them the feeling of being right with the opinion they have adopted.

Source: Bundeszentrale für politische Bildung, www.bpb.de


Privacy? Changes in (interpersonal) communication?


Weapons of Mass Destruction (WMDs)

https://www.wired.com/images_blogs/dangerroom/2011/03/powell_un_anthrax.jpg


A naïve belief in the objectivity of algorithms

… and in rankings, the measuring and quantification of life: China, for example, plans to introduce the Social Credit System.


https://de.wikipedia.org/wiki/Nick_Bostrom


Doomsday scenarios

Is the singularity coming? If so: Garden of Eden or hell?


Doomsday videos to watch

Google's Geoffrey Hinton - "There's no reason to think computers won't get much smarter than us" (10 mins): https://www.youtube.com/watch?v=p6lM3bh-npg

Demis Hassabis, CEO, DeepMind Technologies - The Theory of Everything (16 mins): https://www.youtube.com/watch?v=rbsqaJwpu6A

Nick Bostrom, What happens when our computers get smarter than we are? (17 mins): https://www.ted.com/talks/nick_bostrom_what_happens_when_our_computers_get_smarter_than_we_are

Why Elon Musk is worried about artificial intelligence (3 mins): https://www.youtube.com/watch?v=US95slMMQis


Neural Information Processing Group and

Bernstein Center for Computational Neuroscience,

Eberhard Karls Universität Tübingen

Max Planck Institute for Intelligent Systems, Tübingen

Felix Wichmann

Thanks