
Wissenschaftliches Arbeiten - Auswertung

Slide 1

Artificial Neural Networks

Uwe Lämmel

Wismar Business School

www.wi.hs-wismar.de/~laemmel

[email protected]

Slide 2

Slide 3

Literature & Software

– Robert Callan: The Essence of Neural Networks, Pearson Education, 2002.

– JavaNNS, based on SNNS (Stuttgart Neural Network Simulator): http://www.ra.cs.uni-tuebingen.de/software/JavaNNS/

Slide 4

Prerequisites

– no algorithmic solution available, or the algorithmic solution is too time-consuming
– no knowledge-based solution
– lots of experience (data) available

→ try a neural network

Slide 5

Content

– Idea
– An artificial Neuron – Neural Network
– Supervised Learning – feed-forward networks
– Competitive Learning – Self-Organising Map
– Applications

Slide 6

Two different types of knowledge processing

Logical conclusion: sequential, conscious; symbol processing, rule processing; precise; engineered; "traditional" AI

Perception, recognition: parallel, subconscious; fuzzy; cognitively oriented; Neural Networks, Connectionism

Slide 7

Idea

A human being learns by example, "learning by doing":
– seeing (perception), walking, speaking, …

Can a machine do the same?

A human being uses his brain. A brain consists of millions of single cells, and each cell is connected to tens of thousands of other cells.

Is it possible to simulate a similar structure on a computer?

Slide 8

Idea

Artificial Neural Network:
– information processing similar to the processes in a mammalian brain
– highly parallel systems
– able to learn
– a great number of simple cells

Is it useful to copy nature?
– wheel, aeroplane, ...

Slide 9

Idea

We need: software neurons, software connections between neurons, software learning algorithms.

An artificial neural network functions in a similar way to a natural neural network.

Slide 10

A biological Neuron

(Figure: cell body with nucleus, axon (neurite), dendrites, synapses)

• Dendrites (input): receive the activations of other cells
• Axon (output): forwards the activation (from 1 mm up to 1 m long)
• Synapse: transfers the activation to other cells, e.g. to the dendrites of other neurons; a cell has about 1,000 to 10,000 connections to other cells
• Cell nucleus (processing): evaluation of the activation

Slide 11

Abstraction

Dendrites: weighted connections; weight: a real number
Axon: output: a real number
Synapse: (identity: the output is forwarded directly)
Cell nucleus: a unit containing simple functions
  input = (many) real numbers
  processing = evaluation of the activation
  output = a real number (~ activation)

Slide 12

An artificial Neuron

net : input from the network
w : weight of a connection
act : activation
f_act : activation function
θ : bias/threshold
f_out : output function (mostly the identity)
o : output

net_i = Σ_j o_j·w_ji
act_i = f_act(net_i, θ_i)
o_i = f_out(act_i)
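The neuron model above (net input, threshold activation, identity output) can be sketched in a few lines of Python; the function names are illustrative, not part of any slide tool:

```python
# Minimal sketch of an artificial neuron with a threshold activation,
# following net_i, act_i and o_i as defined above.

def net_input(outputs, weights):
    """Propagation: net = sum_j o_j * w_j."""
    return sum(o * w for o, w in zip(outputs, weights))

def activation(net, theta):
    """Threshold activation: 1 if net >= theta, else 0."""
    return 1.0 if net >= theta else 0.0

def neuron_output(outputs, weights, theta):
    """f_out is the identity, so the output equals the activation."""
    return activation(net_input(outputs, weights), theta)

print(neuron_output([1, 1], [0.5, 0.5], 1.0))  # 1.0
```

With weights 0.5 and threshold 1.0 the neuron fires only when both inputs are active, which already hints at the logic functions on the next slides.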

Slide 13

A simple switch

Set the parameters according to the function:
– input neurons 1, 2: a1, a2 input pattern; here o_i = a_i
– weights of the edges: w1, w2
– bias θ

Given values for w1, w2 and θ, we can evaluate the output o:

net = o1·w1 + o2·w2
a = 1 if net ≥ θ, 0 otherwise
o = a

Slide 14

Questions

Find values for the parameters so that a logic function is simulated:
– logical AND
– logical OR
– logical exclusive OR (XOR)
– identity

We want to process more than 2 inputs. Find appropriate parameter values for:
– logical AND with 3 (4) inputs
– OR, XOR, and "1 iff 2 out of 4 inputs are 1"
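For AND and OR one possible answer is sketched below; this is an illustrative parameter choice (w1 = w2 = 1, with θ = 2 for AND and θ = 1 for OR), not the only solution:

```python
# One illustrative parameter choice for AND and OR with a single
# threshold neuron: o = 1 if w1*a1 + w2*a2 >= theta, else 0.

def threshold_neuron(inputs, weights, theta):
    net = sum(a * w for a, w in zip(inputs, weights))
    return 1 if net >= theta else 0

def logical_and(a1, a2):
    return threshold_neuron([a1, a2], [1, 1], 2)  # fires only for (1, 1)

def logical_or(a1, a2):
    return threshold_neuron([a1, a2], [1, 1], 1)  # fires if any input is 1

for a in (0, 1):
    for b in (0, 1):
        print(a, b, logical_and(a, b), logical_or(a, b))
```

XOR, in contrast, cannot be realised by a single threshold neuron; it needs a hidden neuron, as in the XOR network on a later slide.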

Slide 15

Mathematics in a Cell

Propagation function: net_i(t) = Σ_j o_j·w_ji = w1i·o1 + w2i·o2 + ...

Activation: a_i(t) – the activation at time t

Activation function f_act: a_i(t+1) = f_act(a_i(t), net_i(t), θ_i), where θ_i is the bias

Output function f_out: o_i = f_out(a_i)

Slide 16

Activation Functions

(Plots: the bias (threshold) function and the identity function)

Activation functions are sigmoid functions.

Slide 17

Activation Functions

(Plots: y = tanh(c·x) for c = 1, 2, 3, and the logistic function y = 1/(1 + exp(–c·x)) for c = 1, 3, 10)

Activation functions are sigmoid functions.
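Both sigmoid families can be computed directly; the steepness parameter c corresponds to the curves in the plots:

```python
import math

# The two sigmoid activation functions from the slide; c controls the
# steepness of the curve.

def logistic(x, c=1.0):
    """y = 1 / (1 + exp(-c*x)); values in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-c * x))

def tanh_act(x, c=1.0):
    """y = tanh(c*x); values in (-1, 1)."""
    return math.tanh(c * x)

print(logistic(0.0))   # 0.5
print(tanh_act(0.0))   # 0.0
```

Both functions saturate for large |x|; the choice between them mainly decides whether the outputs live in (0, 1) or in (–1, 1), which matters again on the slide about zero-valued inputs.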

Slide 18

Structure of a network

Layers:
– input layer – contains the input neurons
– output layer – contains the output neurons
– hidden layers – contain the hidden neurons

An n-layer network has:
– n layers of connections which can be trained
– n+1 layers of neurons
– n–1 hidden layers

Slide 19

Neural Network - Definition

A neural network is characterised by many connections between many simple units (neurons), with the units exchanging signals via these connections.

A neural network is a connected, directed graph with weighted edges in which each node (neuron, unit) holds a value (the activation).

Slide 20

Elements of a NN

Connections/links – a directed, weighted graph
– weight: w_ij (from cell i to cell j)
– weight matrix

Propagation function – the network input of a neuron is calculated as net_i = Σ_j o_j·w_ji

Learning algorithm

Slide 21

Example: XOR Network

(Figure: input neurons 1 and 2; hidden neuron 3 with bias 1.5; output neuron 4 with bias 0.5; the connection 3→4 has weight –2, all other drawn connections have weight 1.)

Weight matrix w_ij:

i\j  1  2  3  4
 1   0  0  1  1
 2   0  0  1  1
 3   0  0  0 –2
 4   0  0  0  0
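A forward pass through this network with a binary threshold activation reproduces XOR; the weights below follow the weight matrix (hidden neuron 3 with bias 1.5, output neuron 4 with bias 0.5, connection 3→4 with weight –2):

```python
# Forward pass through the XOR network with binary threshold neurons.

def step(net, theta):
    return 1 if net >= theta else 0

def xor_net(a1, a2):
    o3 = step(1 * a1 + 1 * a2, 1.5)            # hidden: fires only for (1, 1)
    o4 = step(1 * a1 + 1 * a2 - 2 * o3, 0.5)   # output neuron
    return o4

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```

The hidden neuron detects the case "both inputs are 1" and its strongly negative weight switches the output off again, which is exactly why a single neuron could not realise XOR.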

Slide 22

Supervised Learning – feed-forward networks

– Idea
– An artificial Neuron – Neural Network
– Supervised Learning – feed-forward networks
  – Architecture
  – Backpropagation Learning
– Competitive Learning – Self-Organising Map
– Applications

Slide 23

Multi-layer feed-forward network

Slide 24

Feed-Forward Network

Slide 25

Evaluation of the net output

(Figure: a training pattern p enters the input layer, o_i = p_i; hidden neurons j compute net_j and o_j = act_j; output neurons k compute net_k and o_k = act_k.)

Input layer – hidden layer(s) – output layer

Slide 26

Backpropagation Learning Algorithm

– supervised learning
– the error is a function of the weights: E(W) = E(w1, w2, ..., wn)
– we are looking for a minimal error
– a minimal error is a hollow in the error surface
– backpropagation uses the gradient for the weight adaptation

Slide 27

error curve

Slide 28

Problem

error in the output layer: the difference between the output and the teaching output

error in a hidden layer?

(Figure: input layer, hidden layer, output and teaching output)

Slide 29

Mathematics

Modify the weights according to the gradient of the error function:

ΔW = –η·∇E(W)

∇E(W) is the gradient; η is a factor, called the learning parameter.

Slide 30

Mathematics

Here: modification of the weights:

ΔW = –η·∇E(W)

– ∇E(W): the gradient
– η: the proportionality factor (learning factor) for the weight vector W

E(W_j) = E(w1j, w2j, ..., wnj)

Slide 31

Error Function

Error function: the quadratic distance between the real output and the teaching output over all patterns p:

E = Σ_p E_p

E_p = ½ Σ_j (t_j – o_j)²    (1)

– t_j – teaching output
– o_j – real output

Now: the error for one pattern only (omitting the pattern index p).

Modification of a weight:

Δw_ij = –η · ∂E/∂w_ij    (2)

Slide 32

Backpropagation rule

– multi-layer networks
– semi-linear activation function (monotone, differentiable, e.g. the logistic function)
– problem: there are no teaching outputs for the hidden neurons

Slide 33

Backpropagation Learning Rule

Start (6.1):

Δw_ij = –η · ∂E/∂w_ij

(6.1) in more detail, using the chain rule (6.2):

∂E/∂w_ij = (∂E/∂o_j) · (∂o_j/∂net_j) · (∂net_j/∂w_ij)

Dependencies (6.3): o_j = f_out(f_act(net_j)) with f_out = Id, and net_j = Σ_i o_i·w_ij

Slide 34

The 3rd and 2nd Factor

∂E/∂w_ij = (∂E/∂o_j) · (∂o_j/∂net_j) · (∂net_j/∂w_ij)

3rd factor – dependency of the net input on the weights (6.4):

∂net_j/∂w_ij = ∂/∂w_ij Σ_k o_k·w_kj = o_i

2nd factor – the derivative of the activation function (6.5):

∂o_j/∂net_j = f'_act(net_j)

For the logistic function (6.7):

f'_Logistic(net_j) = f_Logistic(net_j)·(1 – f_Logistic(net_j)) = o_j·(1 – o_j)

Slide 35

The 1st Factor

∂E/∂w_ij = (∂E/∂o_j) · (∂o_j/∂net_j) · (∂net_j/∂w_ij)

1st factor – dependency of the error on the output (6.8).

Error signal of an output neuron j (6.9): with E = ½ Σ_k (t_k – o_k)²,

∂E/∂o_j = ∂/∂o_j ½ Σ_k (t_k – o_k)² = –(t_j – o_j)

Error signal of a hidden neuron j (6.10):

∂E/∂o_j = Σ_k (∂E/∂net_k)·(∂net_k/∂o_j) = Σ_k (∂E/∂net_k) · ∂/∂o_j Σ_i o_i·w_ik = –Σ_k δ_k·w_jk

δ_j: the error signal

Slide 36

Error Signal

δ_j = –∂E/∂net_j = –(∂E/∂o_j)·(∂o_j/∂net_j)

Output neuron j (6.11): δ_j = f'_act(net_j)·(t_j – o_j)

Hidden neuron j (6.12): δ_j = f'_act(net_j)·Σ_k δ_k·w_jk

Slide 37

Standard Backpropagation Rule

For the logistic activation function:

f'_act(net_j) = f_act(net_j)·(1 – f_act(net_j)) = o_j·(1 – o_j)

Therefore:

δ_j = o_j·(1 – o_j)·Σ_k δ_k·w_jk   if j is a hidden neuron
δ_j = o_j·(1 – o_j)·(t_j – o_j)    if j is an output neuron

and: Δw_ij = η·δ_j·o_i, i.e. w'_ij = w_ij + η·δ_j·o_i
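The standard rule can be sketched as one training step of a small 2-2-1 network with logistic activations; the network size, the learning rate η = 0.5 and the bias handling (a constant extra input of 1.0) are illustrative assumptions, not values from the slides:

```python
import math

# Sketch of one backpropagation step for a 2-2-1 network with logistic
# activations; the bias is simulated by a constant input of 1.0.

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_hid, w_out):
    inp = x + [1.0]                                   # input + bias
    hid = [logistic(sum(i * w for i, w in zip(inp, ws))) for ws in w_hid]
    hid = hid + [1.0]                                 # hidden + bias
    out = logistic(sum(h * w for h, w in zip(hid, w_out)))
    return inp, hid, out

def train_step(x, t, w_hid, w_out, eta=0.5):
    inp, hid, out = forward(x, w_hid, w_out)
    delta_out = out * (1 - out) * (t - out)           # output neuron
    delta_hid = [h * (1 - h) * delta_out * w          # hidden neurons
                 for h, w in zip(hid, w_out)]
    for j, ws in enumerate(w_hid):                    # w' = w + eta*delta*o
        for i in range(len(ws)):
            ws[i] += eta * delta_hid[j] * inp[i]
    for j in range(len(w_out)):
        w_out[j] += eta * delta_out * hid[j]

w_hid = [[0.1, 0.2, 0.3], [0.4, -0.2, 0.1]]
w_out = [0.3, -0.4, 0.2]
train_step([1, 0], 1, w_hid, w_out)   # one step towards teaching output 1
```

Repeating such steps over all patterns implements the gradient descent described above; with an unlucky initialisation plain backpropagation can still get stuck, which motivates the problems and variants on the following slides.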

Slide 38

Error signal for f_act = tanh

For the activation function tanh:

f'_act(net_j) = 1 – f²_act(net_j) = 1 – tanh²(net_j) = 1 – o_j²

Therefore:

δ_j = (1 – o_j²)·Σ_k δ_k·w_jk   if j is a hidden neuron
δ_j = (1 – o_j²)·(t_j – o_j)    if j is an output neuron

Slide 39

Backpropagation - Problems

(Figure: an error curve with problem regions A, B and C; see the next slide.)

Slide 40

Backpropagation - Problems

– A: flat plateau – backpropagation proceeds very slowly; finding a minimum takes a lot of time
– B: oscillation in a narrow gorge – the step jumps from one side to the other and back
– C: leaving a minimum – if the modification in one training step is too high, the minimum can be lost

Slide 41

Solutions: looking at the values

– change the parameters of the logistic function in order to get other values
– the modification of a weight depends on the output: if o_i = 0, no modification takes place
– with binary input we probably have a lot of zero values, so change [0, 1] into [–½, ½] or [–1, 1]
– use another activation function, e.g. tanh, and use values in [–1, 1]

Slide 42

Solution: Quickprop

– assumption: the error curve is a square (parabolic) function
– calculate the vertex of the parabola

slope of the error curve: S(t) = ∂E/∂w_ij (t)

Δw_ij(t) = S(t) / (S(t–1) – S(t)) · Δw_ij(t–1)

Slide 43

Resilient Propagation (RPROP)

– the sign and the size of the weight modification are calculated separately; b_ij(t) is the size of the modification:

b_ij(t) = η+·b_ij(t–1)   if S(t–1)·S(t) > 0
b_ij(t) = η-·b_ij(t–1)   if S(t–1)·S(t) < 0
b_ij(t) = b_ij(t–1)      otherwise

η+ > 1: both slopes have the same sign → a "big" step
0 < η- < 1: the slopes differ → a "smaller" step

Δw_ij(t) = –b_ij(t)            if S(t–1) > 0 and S(t) > 0
Δw_ij(t) = +b_ij(t)            if S(t–1) < 0 and S(t) < 0
Δw_ij(t) = –Δw_ij(t–1)         if S(t–1)·S(t) < 0 (*)
Δw_ij(t) = –sgn(S(t))·b_ij(t)  otherwise

(*) S(t) is then set to 0, S(t) := 0, so at time t+1 the 4th case applies.
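The update for a single weight can be sketched as follows; η+ = 1.2, η- = 0.5 and the step-size bounds are common defaults from the literature, not values from the slide:

```python
# Sketch of the RPROP update for one weight. S_prev and S are the slopes
# of the error curve at t-1 and t, b_prev is the previous step size and
# dw_prev the previous weight change. eta_plus/eta_minus and the bounds
# are common defaults, not taken from the slide.

def rprop_step(S_prev, S, b_prev, dw_prev,
               eta_plus=1.2, eta_minus=0.5, b_min=1e-6, b_max=50.0):
    """Return (new step size b, weight change dw, slope to store)."""
    if S_prev * S > 0:                      # same sign: bigger step
        b = min(b_prev * eta_plus, b_max)
        dw = -b if S > 0 else b
        return b, dw, S
    if S_prev * S < 0:                      # sign change: smaller step,
        b = max(b_prev * eta_minus, b_min)  # revert the previous change
        return b, -dw_prev, 0.0             # and store S(t) := 0
    b = b_prev                              # otherwise: step against S
    dw = -b if S > 0 else (b if S < 0 else 0.0)
    return b, dw, S
```

Because only the sign of the slope is used, RPROP is insensitive to the size of the gradient, which addresses the flat-plateau problem of standard backpropagation.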

Slide 44

Limits of the Learning Algorithm

– it is not a model of biological learning
– we have no teaching output in a natural learning process
– in a natural neural network there are no such error feedbacks (at least none has been discovered yet)
– training an artificial neural network is rather time-consuming

Slide 45

Development of an NN application

(Flow chart:)
1. build a network architecture
2. input of a training pattern
3. calculate the network output, compare it to the teaching output
   – error too high → modify the weights, continue training
4. use the test-set data: evaluate the output, compare it to the teaching output
   – error too high → change the parameters, train again
   – quality good enough → done

Slide 46

Possible Changes

– architecture of the NN
  – size of the network
  – shortcut connections
  – partially connected layers
  – remove/add links
  – receptive areas
– find the right parameter values
  – learning parameter
  – sizes of the layers
  – using genetic algorithms

Slide 47

Memory Capacity - Experiment

– the output layer is a copy of the input layer
– a training set consisting of n random patterns
– error = 0: the network can store more than n patterns
– error >> 0: the network cannot store n patterns
– memory capacity: the n with error > 0, while the error is 0 for n–1 patterns and much greater than 0 for n+1 patterns

Slide 48

Summary

– backpropagation is a backpropagation-of-error algorithm
  – works like gradient descent
  – activation functions: logistic, tanh
  – meaning of the learning parameter
– modifications
  – RPROP
  – backprop with momentum
  – QuickProp
– finding an appropriate architecture
  – memory size of a network
  – modifications of the layer connections
– applications

Slide 49

Binary Coding of nominal values I

– no order relation, n values
– n neurons; each neuron represents one and only one value
– example: red, blue, yellow, white, black
  1,0,0,0,0  0,1,0,0,0  0,0,1,0,0  ...
– disadvantage: n neurons are necessary, but only one of them is activated → lots of zeros in the input

Slide 50

Binary Coding of nominal values II

– no order relation, n values
– m neurons, of which k neurons are switched on for one single value
– requirement: (m choose k) ≥ n
– example: red, blue, yellow, white, black
  1,1,0,0  1,0,1,0  1,0,0,1  0,1,1,0  0,1,0,1
  4 neurons, 2 of them switched on; (4 choose 2) = 6 > 5
– advantages: fewer neurons, a balanced ratio of 0s and 1s
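Both codings can be generated mechanically; the function names below are illustrative:

```python
from itertools import combinations

# Coding I: one neuron per value (1-of-n).
# Coding II: k of m neurons active, requiring C(m, k) >= n.

def one_of_n(values):
    n = len(values)
    return {v: tuple(1 if j == i else 0 for j in range(n))
            for i, v in enumerate(values)}

def k_of_m(values, m, k):
    codes = [tuple(1 if j in pos else 0 for j in range(m))
             for pos in combinations(range(m), k)]
    if len(codes) < len(values):
        raise ValueError("C(m, k) must be >= n")
    return dict(zip(values, codes))

colours = ["red", "blue", "yellow", "white", "black"]
print(one_of_n(colours)["blue"])     # (0, 1, 0, 0, 0)
print(k_of_m(colours, 4, 2)["red"])  # (1, 1, 0, 0)
```

The k-of-m coding needs only 4 instead of 5 neurons for the colour example, and every code word contains exactly two 1s, giving the balanced ratio of 0s and 1s mentioned above.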

Slide 51

Example: Credit Scoring

A1: credit history
A2: debt
A3: collateral
A4: income

• the network architecture depends on the coding of the input and output
• how can we code values like good, bad, 1, 2, 3, ...?

Slide 52

Example: Credit Scoring

(Table: training data with the columns A1, A2, A3, A4 and class)

Slide 53

Competitive Learning – Self-Organising Map

– Idea
– An artificial Neuron – Neural Network
– Supervised Learning – feed-forward networks
– Competitive Learning – Self-Organising Map
  – Architecture
  – Learning
  – Visualisation
– Applications

Slide 54

Self-Organising Maps (SOM)

A natural brain can organise itself. Now we also consider the position of a neuron and its neighbourhood.

Kohonen Feature Map: a two-layer pattern associator
– the input layer is fully connected to the map layer
– the neurons of the map layer are fully connected to each other (virtually)

Slide 55

Clustering

– objective: all inputs of one class are mapped onto one and the same neuron

(Figure: a mapping f from the input set A to the output set B)

– problem: the classification of the input space is unknown
– the network performs a clustering

Slide 56

Winner Neuron

(Figure: the input layer is fully connected to the Kohonen layer; the winner neuron is highlighted.)

Slide 57

Learning in an SOM

1. Choose an input k at random.
2. Detect the neuron z which has the maximal activity.
3. Adapt the weights in the neighbourhood of z: every neuron i within a radius r of z.
4. Stop if a certain number of learning steps has been performed; otherwise decrease the learning rate and the radius and go on with step 1.
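The four steps can be sketched for a one-dimensional chain of map neurons; the concrete learning-rate and radius schedules below are illustrative assumptions, not values from the slides:

```python
import math
import random

# Sketch of SOM training for a 1-D chain of neurons; the schedules for
# the learning rate and the radius are illustrative assumptions.

def train_som(patterns, n_neurons, steps=2000, seed=0):
    rnd = random.Random(seed)
    dim = len(patterns[0])
    w = [[rnd.random() for _ in range(dim)] for _ in range(n_neurons)]
    for t in range(steps):
        x = rnd.choice(patterns)                      # 1. random input
        z = min(range(n_neurons), key=lambda j: sum(  # 2. winner neuron:
            (xi - wi) ** 2 for xi, wi in zip(x, w[j])))   # min. distance
        eta = 0.5 * (1.0 - t / steps)                 # 4. decreasing rate
        radius = max(1.0, (n_neurons / 2) * (1.0 - t / steps))
        for j in range(n_neurons):                    # 3. adapt neighbours
            d = abs(j - z)
            if d <= radius:
                h = math.exp(-d * d / (2.0 * radius * radius))
                for i in range(dim):
                    w[j][i] += eta * h * (x[i] - w[j][i])
    return w
```

After training, inputs from the same cluster tend to win at neighbouring neurons on the chain; the same scheme with a 2-D grid distance gives the usual Kohonen map.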

Slide 58

A Map Neuron

– consider a single map neuron (without feedback)

– activation: net_j = Σ_i o_i·w_ij,  a_j = f_act(net_j) = 1 / (1 + e^(–net_j))

– output: f_out = Id

Slide 59

Centre of Activation

– idea: highly activated neurons push down the activation of the neurons in their neighbourhood
– problem: finding the centre of activation, either as
  – the neuron z with the maximal net input: Σ_i o_i·w_iz = max_j Σ_i o_i·w_ij
  – the neuron z whose weight vector w_z is most similar to the input vector (Euclidean distance): ||x – w_z|| = min_j ||x – w_j||

Slide 60

Changing Weights

– the weights to the neurons within a radius of z are increased:
  w_j(t+1) = w_j(t) + η·h_jz·(x(t) – w_j(t))   for j in the neighbourhood of z (x: the input)
  w_j(t+1) = w_j(t)   otherwise

– the amount of influence (the amount of change of w_j) depends on the distance to the centre of activation

– Kohonen uses the function
  h_jz = exp(–(j – z)² / (2σ_z²))

  σ_z determines the shape of the curve:
  σ_z small → high and sharp; σ_z large → wide and flat

Slide 61

Changing weights

– simulation by a Gaussian curve (Mexican-hat approach)
– the weights are changed with a learning rate η(t) which goes down to zero
– weight change:
  w_j(t+1) = w_j(t) + η(t)·h_jz·(x(t) – w_j(t))   for j in the neighbourhood of z
  w_j(t+1) = w_j(t)   otherwise
– requirements: the patterns are input in random order; η(t) and σ_z(t) are monotonically decreasing functions of t

(Figure: the Mexican-hat neighbourhood function)

Slide 62

SOM Training

• find the winner neuron z for an input pattern m_p (minimal Euclidean distance):
  ||m_p – W_z|| = min_j ||m_p – W_j||

• adapt the weights of the connections
  – winner neuron – input neurons
  – neighbours – input neurons

(Figure: input pattern m_p, Kohonen layer, weight vector W_j)

w'_ij = w_ij + η·h_jz·(m_i – w_ij)   if dist(j, z) ≤ r
w'_ij = w_ij   otherwise

h_jz = exp(–dist(j, z)² / (2r²))

Slide 63

Example: Credit Scoring

A1: credit history
A2: debts
A3: collateral
A4: income

• we do not look at the classification
• the SOM performs a clustering

Slide 64

Credit Scoring

– good = {5, 6, 9, 10, 12}
– average = {3, 8, 13}
– bad = {1, 2, 4, 7, 11, 14}

Slide 65

Credit Scoring

– Pascal tool box (1991)
– 10×10 neurons
– 32,000 training steps

Slide 66

Visualisation of a SOM

• the colour reflects the Euclidean distance to the input (NetDemo)
• the weights are used as the coordinates of a neuron (TSPDemo)
• the colour reflects the cluster (ColorDemo)

Slide 67

Example TSP

– Travelling Salesman Problem: a salesman has to visit certain cities and then return home. Find an optimal route!
– the problem has exponential complexity: (n–1)! routes

Experiment: Pascal program, 1998; the 31/32 states of Mexico?

Slide 68

Nearest Neighbour: Example

– some cities in Northern Germany (map: Kiel, Rostock, Hamburg, Schwerin, Berlin, Hannover, Essen, Frankfurt)
– the initial city is Hamburg

Exercise:
• put in the coordinates of the capitals of all the 31 Mexican states + Mexico City
• find a solution for the TSP using a SOM!
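The nearest-neighbour heuristic named in the slide title can be sketched as follows; the coordinates are rough illustrative positions, not real map data:

```python
import math

# Nearest-neighbour heuristic for the TSP: always travel to the closest
# unvisited city, then return to the start. Coordinates are illustrative.

def nearest_neighbour_tour(cities, start):
    def dist(a, b):
        (x1, y1), (x2, y2) = cities[a], cities[b]
        return math.hypot(x1 - x2, y1 - y2)
    tour = [start]
    remaining = set(cities) - {start}
    while remaining:
        nxt = min(remaining, key=lambda c: dist(tour[-1], c))
        tour.append(nxt)
        remaining.remove(nxt)
    return tour + [start]          # return to the initial city

cities = {"Hamburg": (0, 0), "Kiel": (-1, 2), "Rostock": (3, 2),
          "Schwerin": (2, 0), "Berlin": (5, -2), "Hannover": (0, -3)}
print(nearest_neighbour_tour(cities, "Hamburg"))
```

The heuristic is fast but greedy; like the SOM approach on the next slides, it does not guarantee an optimal round trip.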

Slide 69

SOM solves TSP

(Figure: the two input neurons X and Y feed a ring-shaped Kohonen layer; with w_1i = s_ix and w_2i = s_iy, neuron i is drawn at the position (x, y) = (w_1i, w_2i).)

Slide 70

SOM solves TSP

– initialisation of the weights: the weights to the inputs (x, y) are calculated so that all neurons form a circle
– the initial circle is expanded into a round trip
– solutions for problems with several hundreds of towns are possible
– the solution may not be optimal!

Slide 71

Applications

– Data Mining – Clustering
  – customer data
  – weblogs
  – ...
– you have a lot of data, but no teaching data available → unsupervised learning
– you have at least an idea about the result
– can be applied as a first approach, to obtain training data for supervised learning

Slide 72

Applications

– pattern recognition (text, numbers, faces): number plates, access control at cash machines
– similarities between molecules
– checking the quality of a surface
– control of autonomous vehicles
– monitoring of credit card accounts
– Data Mining

Slide 73

Applications

– speech recognition
– control of artificial limbs
– classification of galaxies
– product orders (supermarket)
– forecast of energy consumption
– stock value forecast

Slide 74

Application - Summary

– classification, clustering, forecasting, pattern recognition
– learning from examples, generalisation
– recognition of unknown structures in large data sets

Slide 75

Application

– Data Mining
  – customer data
  – weblog
– control of ...
– pattern recognition
  – quality of surfaces
– possible if you have training data ...

Slide 76

The End