
The Vampir Performance Analysis Tool

Hans–Christian Hoppe

Gesellschaft für Parallele Anwendungen und Systeme mbH

Pallas GmbH, Hermülheimer Straße 10, D-50321 Brühl, Germany

info@pallas.com
http://www.pallas.com

SCICOMP 2000 Tutorial, San Diego

© Pallas GmbH

Outline

Performance tools for parallel programming

Performance analysis for MPI

The Vampir tool

The Vampir roadmap


Why performance tools?

CPUs and interconnects are getting faster all the time

Compilers are improving

“Abundance of computing power”

Shouldn’t it be sufficient to just write an application and let the system do the rest?


Why performance tools?

In reality, severe performance bottlenecks remain:
– Slow memory access (instructions and data)
– Cache consistency effects
– Starvation of instruction units
– Contention of interconnection systems
– Adverse interaction with schedulers


Why performance tools?

The application programmer does the rest:
– Excessive sequential sections
– Bad load balance
– Non-optimized communication patterns
– Excessive synchronization

Performance analysis tools can:
– Help to diagnose system-level performance problems
– Help to identify user-level performance bottlenecks
– Assist users in improving their applications


Achieved performance vs. effort

[Chart: code performance vs. effort – debuggers and tools like KAP get a non-working code running; OpenMP and then MPI raise performance further, with performance tools supporting each stage.]


Performance tools – goals?

Holy grail:
– Automatic parallelisation and optimization
– One code version for sequential and parallel
– One code version for all platforms
– Automatic code verification
– Automatic performance verification
– Automatic detection of performance problems
– Integration of performance analysis and parallelisation


Event–based MPI Analysis

Record a trace of the application execution:
– Calls to MPI and user routines
– MPI communication events
– Source locations
– Values of performance registers or program variables

From a trace, a performance analysis tool can show:
– Protocol of execution over time
– Statistics for MPI routine execution
– Statistics for communication
– Dynamic calling tree

Important advantage: focus on any phase of the execution
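Recording these events is possible without modifying the MPI library itself: the MPI profiling interface lets a tracer intercept every MPI call and forward it to the real implementation through its PMPI_ alias. A minimal sketch of such a wrapper in C (the fprintf stands in for a real trace-record writer):

    /* Profiling-interface wrapper: intercept MPI_Send, timestamp it,
       then delegate to the real implementation via PMPI_Send. */
    #include <mpi.h>
    #include <stdio.h>

    int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm)
    {
        double t0 = MPI_Wtime();              /* event entry time */
        int rc = PMPI_Send(buf, count, datatype, dest, tag, comm);
        double t1 = MPI_Wtime();              /* event exit time  */

        /* A real tracer writes a compact binary record instead. */
        fprintf(stderr, "MPI_Send dest=%d tag=%d count=%d [%f,%f]\n",
                dest, tag, count, t0, t1);
        return rc;
    }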


Vampirtrace details

Vampirtrace™:
– Instrumentation library producing traces for Vampir and Dimemas
– Supports MPI-1 (incl. collective operations) and MPI-I/O
– Exploits the MPI profiling interface
– Works with vendor MPI implementations
– API for user-level instrumentation
– Capability to filter for event subsets

Developed, productized and marketed by Pallas

Available for IBM SP, PE 3.x
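The user-level instrumentation API lets an application mark its own routines so they appear as states in the trace. A minimal sketch, assuming the classic VT.h interface with VT_symdef/VT_begin/VT_end; exact names and signatures may differ between Vampirtrace releases:

    /* User-level instrumentation: define a symbol within an activity,
       then bracket the code region with begin/end events. */
    #include <mpi.h>
    #include <VT.h>

    #define SYM_SOLVE 1   /* arbitrary user-chosen symbol code */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);   /* Vampirtrace initializes here */

        /* Symbol "solve" grouped under activity "Application".   */
        VT_symdef(SYM_SOLVE, "solve", "Application");

        VT_begin(SYM_SOLVE);      /* enter the user-defined state */
        /* ... solver kernel shown as "solve" in Vampir ...       */
        VT_end(SYM_SOLVE);        /* leave the user-defined state */

        MPI_Finalize();           /* trace file is written here   */
        return 0;
    }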


Vampir details

Vampir™:
– Event-trace visualization tool
– Analyzes MPI and user routines
– Analyzes point-to-point, collective and MPI-I/O operations
– Focus on arbitrary execution phases
– Execution and communication statistics
– Filter processes, messages, and user/MPI routines

Jointly developed by TU Dresden and Pallas
Productized and marketed by Pallas

Available for IBM RS6000, AIX 4.2/AIX 4.3


Dimemas details

Dimemas:
– Event-based performance prediction tool
– Parameterized machine model
  • CPU performance
  • Communication and network performance
– Predicts performance on the modeled platform
– What-if analysis determines the influence of parameters

Jointly developed by UPC Barcelona and Pallas

Productized and marketed by Pallas

Available for IBM RS6000, AIX 4.2/AIX 4.3
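The effect of a parameterized machine model can be illustrated with the standard linear communication-cost model. This is only a conceptual sketch of what-if analysis, not Dimemas's actual model; the parameter values below are invented:

    /* Toy machine model: predicted time for an n-byte message is
       T(n) = latency + n / bandwidth.  What-if analysis replays the
       trace with different parameter sets. */
    typedef struct {
        double latency;      /* seconds per message  */
        double bandwidth;    /* bytes per second     */
        double cpu_factor;   /* CPU speed relative to the traced run */
    } machine_model;

    double predict_msg_time(const machine_model *m, double nbytes)
    {
        return m->latency + nbytes / m->bandwidth;
    }

    /* Example parameter sets (invented values):
       machine_model base = { 40e-6, 100e6, 1.0 };   current network
       machine_model fast = {  5e-6,   1e9, 1.0 };   what-if network */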


Vampir main window

Vampir 2.5 main window

– Tracefile loading can be interrupted at any time
– Tracefile loading can be resumed
– Tracefile can be loaded starting at a specified time offset
– Tracefile can be re-written


Summary chart

Aggregated profiling information:
– Execution time
– Number of calls

Inclusive or exclusive of called routines


Vampir state model

User specifies activities and symbol grouping
Look at all/any activities or all symbols

[Diagram: state model example for the summary chart – activities (Calculation, Tracing, MPI) grouping symbols such as MPI_Send, MPI_Recv, MPI_Wait, ssor, exchange]


Timeline display

To zoom, mark a region with the mouse


Timeline display – message details

Click on a message line

[Screenshot callouts: message send op, message receive op, message information]


Communication statistics

Message statistics for each process/node pair:
– Byte and message count
– Min/max/avg message length and bandwidth
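Such a matrix can be built in a single pass over the trace. A sketch with invented data structures, one accumulator per sender/receiver pair:

    /* Per-pair message statistics, accumulated from send events.
       Data structure and limits are invented for illustration. */
    #include <float.h>

    #define MAXPROC 512

    typedef struct {
        long   messages;           /* message count            */
        double bytes;              /* total volume             */
        double min_len, max_len;   /* message length extremes  */
    } pair_stats;

    static pair_stats stats[MAXPROC][MAXPROC];

    void account_message(int from, int to, double len)
    {
        pair_stats *s = &stats[from][to];
        if (s->messages == 0) { s->min_len = DBL_MAX; s->max_len = 0.0; }
        s->messages += 1;
        s->bytes    += len;
        if (len < s->min_len) s->min_len = len;
        if (len > s->max_len) s->max_len = len;
        /* avg length = bytes / messages, computed at display time */
    }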


Message histograms

Message statistics by length, tag or communicator:
– Byte and message count
– Min/max/avg bandwidth


Collective operations

For each process: mark operation locally

Connect start/stop points by lines

[Diagram callouts: start of op, data being sent, data being received, stop of op, connection lines]


Collective operations

Click on the collective operation display:
– See global timing info
– See local timing info


MPI-I/O operations

I/O transfers are shown as lines

Click on an I/O line to see detailed I/O information


Activity chart

Profiling information for all processes


Global calling tree

Display for each symbol:
– Number of calls
– Min/max execution time

Fold/unfold or restrict to subtrees


Process–local displays

– Timeline (showing calling levels)
– Activity chart
– Calling tree (showing number of calls)


Effects of zooming

Select one iteration

Updated summary

Updated message statistics


Compare traces

Compare profiling information:
– To check load balance (between processes)
– To evaluate scalability (different runs)
– To look at optimization effects (different code versions)

Compare processes 6 and 19

Comparison by routine


Coupling Vampir and Dimemas

Actual program run vs. ideal communication


Vampir/Vampirtrace roadmap

Ongoing developments:
– Scalability enhancements
– Functionality enhancements
– Instrumentation enhancements

Will first be available commercially on NEC and Compaq platforms:
– Earth Simulator
– ASCI machines

PathForward developments for ASCI machines


Scalability challenges

Scalability in processor count:
– ASCI-class machines have 1000s of processors
– High-end systems have 100s of processors
– Applications use most of them

Scalability in time:
– Need to analyze actual production runs (hours/days)

Scalability in detail:
– Record and analyze system-specific performance data
– Support for threaded and hybrid models


Scalability problems

Counter-based profiling tools scale reasonably well, but:
– Severely limited in the level of detail
– Can't focus on parts of the application run

Event-based tools have problems:
– Event traces get really large
– Display tools use huge amounts of memory
– Many displays do not scale

Example: Vampir tracefiles for NAS NPB-LU
– 128 processes: 3,000,000 records (120 Mbyte)
– 256 processes: 15,000,000 records (600 Mbyte)
– 512 processes: 150,000,000 records (6 Gbyte)
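All three runs come out at roughly 40 bytes per record (e.g. 120 Mbyte / 3,000,000 records), so the volume is driven entirely by the record count, and that count grows much faster than the process count: 4x the processes (128 to 512) yields 50x the trace data.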


Threaded programming models

Enhance Vampir to display:
– Thread fork/join
– Thread synchronization
– A timeline per thread / threads aggregated into a single timeline
– Subroutine/code block execution for each thread

Create an instrumentation library for thread packages (see the sketch below)

Integrate instrumentation capability into OpenMP systems
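One common way to build such an instrumentation library (a sketch of the general interposition technique, not the Pallas implementation) is to intercept the thread package's entry points through the dynamic linker:

    /* Interpose on pthread_create: record a fork event, then delegate
       to the real function looked up via RTLD_NEXT. */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <dlfcn.h>
    #include <stdio.h>

    int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                       void *(*start)(void *), void *arg)
    {
        static int (*real_create)(pthread_t *, const pthread_attr_t *,
                                  void *(*)(void *), void *);
        if (!real_create)
            real_create = (int (*)(pthread_t *, const pthread_attr_t *,
                                   void *(*)(void *), void *))
                          dlsym(RTLD_NEXT, "pthread_create");

        fprintf(stderr, "trace: thread fork\n");  /* record fork event */
        return real_create(thread, attr, start, arg);
    }

Built as a shared object and activated with LD_PRELOAD, this records an event at every thread fork without recompiling the application.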


Cluster node display

Cluster information is already recorded

Enhance Vampir to:
– Show aggregate execution information per node
– Show communication volume per node


Cluster timeline display

– Display node-level information
– Show communication volume within nodes
– Show communication between nodes as usual
– Allow expanding nodes into processes

There may be more than two hierarchy levels ...



Structured tracefile format

Subdivide the tracefile into frames:
– Time intervals, thread/process/node subsets

Put frame data:
– All in one file (as today)
– In multiple files (one per frame ...)
– On a parallel filesystem (exploit parallelism)

Frame index file holds:
– Location of frame start/end
– Frame statistic data for immediate display
– "Frame thumbnail"
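A hypothetical layout for one frame-index entry (all field names invented to illustrate the concept):

    /* One entry of the frame index file: where the frame lives and
       the precomputed statistics shown before the frame is loaded. */
    typedef struct {
        double start_time, end_time;  /* interval covered by the frame   */
        long   file_offset;           /* where the frame data starts     */
        long   file_length;           /* ... and its length in bytes     */
        long   records;               /* number of events in the frame   */
        double time_in_mpi;           /* aggregate MPI time (statistic)  */
        double bytes_sent;            /* communication volume (statistic) */
    } frame_index_entry;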


Structured tracefile format

Vampir loads the frame index

Displays immediately available:
– Global profiling/communication statistics
– By-frame profiling/communication statistics
– Thumbnail timeline

User gets an overview of the application run:
– Can load particular frame data
– Can navigate between frames

User can refine instrumentation/tracing:
– Get detailed traces of interesting frames


Dynamic tracing control

What can be controlled:
– Definition of frames
– Data to be recorded per frame

Control methods (see the sketch below for the API approach):
– Instrumentation with the Vampirtrace API
– Binary instrumentation (atom) or use of a debugger
– Configuration file
– Interactive control agent (debugger)

Tracing the right data is an iterative process!
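For the API-based method, a sketch of per-phase control, assuming the VT_traceoff/VT_traceon calls of the classic Vampirtrace API (names may vary between releases):

    /* Trace only the iterations of interest: tracing is switched off
       during warm-up and re-enabled around the interesting frames. */
    #include <VT.h>

    void timestep_loop(int nsteps)
    {
        int i;
        VT_traceoff();                    /* skip the warm-up phase   */
        for (i = 0; i < nsteps; i++) {
            if (i == 100) VT_traceon();   /* start recording a frame  */
            if (i == 110) VT_traceoff();  /* ... and stop again       */
            /* ... compute and communicate ... */
        }
    }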


Cluster timeline display

For very large systems, one still can't look at the complete system (too many nodes)

Display "interesting" nodes only:
– Regarding communication volume/delays
– Regarding load imbalance
– Regarding execution times of particular code modules


Scalable Vampir structure

Scalable user interface, scalable internals

[Architecture diagram: Vampir DC (user interaction, trace data analysis, display handling) runs on a workstation; Vampir SC (trace data processing, trace data I/O) runs on the parallel system; the two exchange data and control, and the structured trace data may exploit a parallel filesystem.]


Access to Pallas tools

Download free evaluation copies from http://www.pallas.com
