CArcMOOC 06.05 - Micro Benchmarks

Carc 06.05

alessandro.bogliolo@uniurb.it

06. Performance optimization06.05. Micro-benchmarks

• Incremental micro-benchmarks

• CPI characterization

• Latency characterization

• Repetition time characterization

• Branch characterization

• Dealing with multiple issue processors

• Dealing with OOO execution

Computer Architecture

Carc 06.05

Problem statement

• Objective: • Empirical characterization of the performance model of a CPU (CPI, LAT

and RT of each instruction)

• Issues:• Single contributions not directly accessible

• The overall CPUT of a benchmark can be measured with a limited resolution

• Approach• Isolation of the contribution under characterization by means of

differential measures of the CPUT of incremental synthetic benchmarks

• Amplification of the contribution under characterization by means of loops that repeat the execution of the same code segment a large number of times

Carc 06.05

Incremental benchmarks

Decoupling

Loop control

code segment A;

for (i = 0; i < N; i++) {

code segment 0;

code segment 1;

code segment B;

• CUT:– code under test

• Incremental benchmark– Benchmark B0 doesn’t contain CUT

– Benchamrk B1 contains CUT

• Differential measure:

• Implementation issues:– CUT must be independent from

code segments 0 and 1

• Property:

BCPUTBCPUTCUTCPUT

)0()1()(

BCPUTerrCUTCPUTerr

))(())((

Carc 06.05

Characterization of CPI

• CUT:• The CUT contains only the instruction under measurement

(instr)

• Differential measurement• Two measures are sufficient to compute the CPI from

equation (1):

CPI(intsr) = CPUT(CUT)

• Implementation issue:• Instruction instr must be independent from code segments

0 and 1

Carc 06.05

Characterization of LAT (1)

• CUT:

– To measure the LAT between instr1 and instr2

– Use a first CUT (called CUT0) that contains only

instr1

– Use a second CUT (called CUT1) that contains both

instr1 and instr2

• Differential measurement

– Use equation (1) twice to compute CPUT(CUT0) and

CPUT(CUT1)

– Compute LAT by means of:

)0()1()2,1( CUTCPUTCUTCPUTinstrinstrLAT (3)

Carc 06.05

Characterization of LAT (2)• CUT:

– To measure the LAT between instr1 and instr2

– Use a parametric CUT (called CUT<n>) that contains

instruction 1, followed by n NOPs, followed by instr2

• Differential measurement

– Use equation (1) twice to compute CPUT(CUT<n>)

for all benchmarks

– If CPUT(CUT<n>)=CPUT(CUT0), than the n NOPs

are masked by latency (i.e., they replace the CPU

stalls caused by the latency in CUT0)

– If N is the maximum number of masked NOPs

)()1()2,1( NOPCPINinstrCPIinstrinstrLAT (4)

Carc 06.05

Characterization of RT

• Objective:• Characterization of the repetition time between two

instructions instr1 and instr2

• Approach:• The repetition time can be measured by using the same

techniques outlined for characterizing the latency

• Implementation issue:• Instructions instr1 and instr2 must be data-independent

Carc 06.05

Characterization of branches

• Untaken branches:• The CUT contains only a branch conditioned to a never-true

condition

• Taken branches:• The CUT contains a conditional branch followed by one or

more dummy instructions

• The conditional branch has an always-true condition and leads to the first instruction of code segment 1

Carc 06.05

Dealing with multiple issue

• Issues:• A single instruction could provide no effect on the CPU

time, being issued together with instructions outside the CUT

• In superscalar processors the programmer has no control on the actual issuing rate

• Implementation requirements:• In order to appreciate the effects of the CPI of the

instruction under test, several incremental benchmarks must be used with the target instruction repeated an increasing number of times. The plot of the differential CPU time versus the number of repetitions should provide all the information required.

Carc 06.05

Dealing with OOO execution (1)

• Issues:• Dynamic scheduling makes it more difficult to build

effective benchmarks to characterize instruction-specific performance parameters

• The programmer has no control on the actual issuing order

• Implementation requirements:• The instructions in the CUT are executed in the order they

appear in the code if and only if they are in the most convenient position (otherwise dynamic reordering is performed, possibly interleaving the CUT with code segments 1 and 2)

Carc 06.05

Dealing with OOO execution (2)

• Micro-benchmarks with repeated CUT:

• B<m> contains m independent replicas of the CUT

• The CPUT of the benchmarks is plotted as a function of m

• For m larger than m’, the CPUT grows linearly with m

• The CPUT(CUT) is computed as

with m > m’• Code segments 0 and 1 are not strictly

required

Decoupling

Loop control

code segment A;

for (i = 0; i < N; i++) {

code segment 0;

code segment 1;

code segment B;

BCPUTBCPUTCUTCPUT mm )()(

)( 1 (5)

Carc 06.05

Inline Assembler

• Issues:• The code segments under test must be directly specified in

Assembler

• Solutions:• Several compilers (including GCC) support inline Assembler,

code segments written in Assembly embedded within the C source code capable of accessing C variables and labels to perform low-level tasks

• Refs:• https://gcc.gnu.org/onlinedocs/gcc/Using-Assembly-Language-with-

C.html#Using-Assembly-Language-with-C

• https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html

• http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html

Carc 06.05

Example

CArcMOOC 06.05 - Micro Benchmarks

Education

Micro-Messtechnik - wuerth.ro · 678 Micro Messtechnik Micro-Messarmaturen von Indunorm sind speziell für Fluid-Systeme entwickelt worden. Sie werden über Messschläuche kleinster

Testing Trends und Benchmarks 2013

Agile Trends und Benchmarks 2013

Entwicklung eines Benchmarks für die Performance von In-Memory

JAHRESREPORT 2018 JOBMANAGEMENT...BENCHMARKS 2015-2018 Seite 7 von 44 3. BENCHMARKS 2015-2018 Arbeitsmarkt in Wien 2015 – 2018 Die beiden nachstehenden Abbildungen bieten einen Einblick

Micro Whitepaper_Umfassende_Abwehr

micro services

13 Tage Luxuskreuzfahrt vom 06.05. bis 18.05.2018| … · MS ALBATROS 13 Tage Luxuskreuzfahrt vom 06.05. bis 18.05.2018| MS Albatros RUND UM DIE BRITISCHEN INSELN - LUXUSKREUZFAHRT

12.02.20141 Die ESF- Bildungsmesse auf dem Messegelände Bremen Halle 4 06.05. – 07.05.2011

2015 AHSC Benchmarks

Micro blogging antunovic

244 PGConf12 OLTP Benchmarks

Micro Influencer

Testing Trends und Benchmarks 2013 De

Javaland 2016 Concurrency-Modelle auf der JVM - doag.org · Achtung bei Benchmarks! • Ungeeignet: CPU-lastige Benchmarks • Ungeeignet: Alles, was synchrone I/O verwendet • Es

Mirage Micro™ - ResMed€¦ · Mirage Micro™ nasenmaske Vielen Dank, dass Sie sich für die Mirage Micro™ entschieden haben. Verwendungszweck Die Mirage Micro führt dem Patienten

MIKRO Zerspanungswerkzeuge MICRO Cutting tools MICRO … · 2019-06-04 · MIKRO Zerspanungswerkzeuge MICRO Cutting tools MICRO Outils de coupe. ... ping customised tools for particular

Benchmarks 2015

Mit Benchmarks Bankenkosten senken

BREEDS: BENCHMARKS FOR SUBPOPULATION SHIFT