Algorithm Engineering Parallele Algorithmen Stefan Edelkamp

Preview:

Citation preview

Algorithm Engineering „Parallele Algorithmen“

Stefan Edelkamp

Übersicht

Parallele Externe Suche Parallele Verspätete Duplikatselimination

Parallele ExpansionVerteilte Sortierung

Parallele Strukturierte DuplikatseliminationDisjunkte Duplikatserkennungsbereiche ”Schlöser”

Parallele AlgorithmenMatrix-MultiplikationList RankingEuler Tour

VerteilteSuche

Distributed setting provides more space. Experiments show that internal time dominates

I/O.

Exploiting Independence

Since each state in a Bucket is independent of the other –

they can be expanded in parallel.

Duplicates removal can be distributed on different processors.

Bulk (Streamed) transfers much better than single ones.

Distributed Queue for Parallel Best-First Search

P0

P1

P2

<15,34, 0, 100>

<g, h, start byte, size>

<15,34, 20, 100>TOP

<15,34, 40, 100>

<15,34, 60, 100>

Beware of t

he

Mutual E

xclusio

n

Problem!!!

Multiple Processors - Multiple Disks Variant

Sorted buffers w.r.t the hash val

Sorted Files

P1 P2 P3 P4

Divide w.r.t the hash ranges

Sorted buffers from every processor

Sorted File

h0 ….. hk-1 hk ….. hl-1

ParallelExternal A*

Parallel External A*

Distributed Heuristic Evaluation Assume one child processor for each tile one master processor

B3B1 B2

B8

B4 B5 B6 B7

B9 B10 B11

B12 B13 B14 B15

B0B3B1 B2

B8

B4 B5 B6 B7

B9 B10 B11

B12 B13 B14 B15

B0

Distributed Pattern Database Search

Only pattern databases that include the client tile need to be loaded on the client

Because multiple tiles in pattern, from birds eye PDB loaded multiple times

In 15-Puzzle with corner and fringe PDB this saves RAM in the order of factor 2 on each machine, compared to loading all

In 36-Puzzle with 6-tile pattern databases this saves RAM in the order of factor 6 on each machine, compared to loading all

Extends to additive pattern databases

Distributed Heuristic Evaluation

Same bottleneck in external-memory search

Bottleneck: Duplicate detection Duplicate paths cause parallelization overhead

A

C D

BB

C DDDD

Internal memory External memory

vs.

fast

slow

A

Disjoint duplicate-detection scopes

B1B0 B4

B0 B3B1 B2

B8

B4 B5 B6 B7

B9 B10 B11

B12 B13 B14 B15

B0 B1

B4

B3B2

B7

B2B3 B7

B12

B8

B13 B15B14

B11

B8B12 B13B11B15 B14

Finding disjoint duplicate-detection scopes

B1B0 B4

0 00 0

0

0 0 0 0

0 0

1

0 0 0 0

0 1

1

02

1

B2B3 B7

0 1 0

B8B12 B13B11B15 B14

1

2 2

01

2

2

2

2

1

2

2

2

2

2

0 1

1

1

0

1

0

2

3

3

2

B1B5B6

B4 B9

2

3

3

4

3

3

Implementation of Parallel SDD

Hierarchical organization of hash tablesOne hash table for each abstract nodeTop-level hash func. = state-space projection func.

Shared-memory managementMinimum memory-allocation size mMemory wasted is bounded by O(m#processors)

External-memory version I/O-efficient order of node expansions I/O-efficient replacement strategy

Benötigt nur ein Mutex “Schloss”

B3B1 B2

B8

B4 B5 B6 B7

B9 B10 B11

B12 B13 B14 B15

B0

ParallelleMatrix- Multiplication

ParalleleMatrixMultiplication

Exklusives Schreiben

ParalleleKopien

FazitMatrix Multiplication

Paralleles List Ranking

List Ranking

Erster Algorithmus

Prinzip

Komplexität

Verbesserungen

Strategie

Unabhängige Mengen

2-Färbung

Reduktion

Restauration

Beispiel

Variablen

Beispiel(ctd.)

PseudoCode

NächsterSchritt

Analyse

Backup

Algo

Algo

Speicher

Analyse

Ausblick:Randomisiertin O(n) whp?

Problememit DFS

IdeeEulerTour

ParallelDFS

DFSNummern

Allgemein

Allgemein

Allgemein

Beispiel

Ein Zyklusoder mehrere?

Korrektheit

Korrektheit

Beispiel

KonstruktionEulerTour

Fazit Euler Touren

GPU Architektur

Effektivität

Hierarchischer Speicher

Hash-based Partitioning

BFS

Kernel Functions

Recommended