
PyPy benchmarks

in Principal Component Analysis (PCA) (DRAFT)

Progress is always about changes

Valery A. Khamenya

May 19, 2012

This report is a quick attempt to apply Principal Component Analysis (PCA) to the benchmarking measurements of PyPy, the fastest (as of 2012) implementation of the quite nice language Python. The target audience is anyone who loves PyPy and perhaps hates statistics ;)

The factors that influence the variance in the benchmarking data are (the more powerful ones tend to go first):

1. progress with time (huge!)

2. 32-vs-64

3. battle between sympy_expand/sympy_str and twisted_iteration

4. opposition ai vs spitfire-related tests

5. crypto_pyaes vs the rest

6. spectral-norm vs the rest

7. meteor-contest vs the rest

8. fannkuch vs the rest

9. nbody_modified vs the rest

http://en.wikipedia.org/wiki/Principal_component_analysis

1 What is it good for?

In short, PCA helps to reveal the major factors causing variation in the data. This way one could do the following:

1. guess what has been going on in PyPy over the last year from a benchmarking perspective

2. figure out the biggest hidden games behind the PyPy efforts: what is primarily addressed, what the priorities are

3. find benchmarking tests that are too similar to each other and probably just increase redundancy

4. estimate how a test increases the representativity/coverage of benchmarking in relation to the others

5. decrease redundancy in the set of benchmarking tests that are used to state the final speed-up of PyPy over CPython, and therefore allow people outside the PyPy world to rely more on the PyPy speed-up factor http://speed.pypy.org

6. see what the influence of a particular source control revision on the factors was

7. guess what kind of operations are mostly stressed in a new test (a group of simplistic benchmarking unit-tests would be needed for this though: float arithmetic, integer arithmetic, strings, dictionaries, loops, recursion, flow control, exception handling, OOP, garbage collection, multiprocessing, multithreading, I/O, arrays, etc.)

However, one should clearly understand that all conclusions about a single test are based on an analysis of the variation of its benchmarking measurements, i.e. the changes across multiple measurements. That is, PCA is like a snake that could miss a cold motionless prey. And of course this snake will be effective only if one has a vector of measurements, e.g. multiple measurements for a single benchmarking test from different git-revisions or, alternatively, multiple measurements from different benchmarking tests for a given single git-revision.

2 Details to skip during 1st reading

The measurements we analyze represent PyPy progress from Jan 2010 (svn epoch) to May 2012 (git epoch). First of all, let's read the data into a matrix. To make things simple, let's respect and consider only those benchmarking tests that have at least 300 measurements. Then let's heartlessly kick those benchmarking rounds where even one of these respected benchmark tests failed to produce a time measurement.

Oops, the svn epoch seems to be kicked out, but we don't worry for the moment. Each column of the benchmarking data matrix represents the measurements for one benchmarking test. Here is, e.g., the top-left part of the matrix:

> m[1:5, 1:4]
                             ai bm_chameleon   bm_mako     chaos
48277:39882f1dfd15-64 0.7149890    0.1392867 0.2094372 0.4828594
48277:39882f1dfd15    0.7056351    0.1474193 0.2078679 0.4821176
48354:10f7167b3e98-64 0.7221774    0.1460866 0.2096604 0.4682653
48354:10f7167b3e98    0.7279082    0.1362366 0.1952811 0.4751145
48400:adab424acda7-64 0.7506032    0.1451451 0.2010512 0.4759733
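The two filtering steps described above (keep only tests with enough measurements, then drop every round with a missing value) can be sketched as follows. This is only an illustration: the report does not show how its raw data is stored, so the long-format data frame `raw`, its column names and its values are assumptions, and the threshold is lowered from 300 to 3 so that the toy data stays small.

```r
# Hypothetical long-format input: one row per (round, test) measurement.
# Column names and values are made up for illustration.
raw <- data.frame(
  round   = c("r1", "r2", "r3", "r4", "r1", "r2", "r3", "r4", "r1", "r2"),
  test    = c(rep("ai", 4), rep("chaos", 4), rep("rare", 2)),
  seconds = c(0.71, 0.70, NA, 0.72, 0.48, 0.48, 0.47, 0.47, 1.00, 1.10)
)

min_measurements <- 3  # the report uses 300

# Rounds x tests matrix; combinations that were never measured become NA.
m_toy <- tapply(raw$seconds, list(raw$round, raw$test), mean)

# Keep only tests with enough successful measurements.
enough <- colSums(!is.na(m_toy)) >= min_measurements
m_toy <- m_toy[, enough, drop = FALSE]

# Drop every round in which at least one remaining test has no measurement.
m_toy <- m_toy[complete.cases(m_toy), , drop = FALSE]
```

In this toy case the test `rare` is dropped for having too few measurements, and round `r3` is dropped because `ai` failed there.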


The fun with PCA is that we could analyse the data matrix and then ... transpose it and get an alternative point of view. We'll discuss it later.

Currently we get 31 benchmarking tests and 332 benchmarking rounds. This means that, in terms of PCA, we could dream of 31 factors that we might manage to discover.

3 Just to feed your interest before we go...

Statistical analysis is often underestimated. Just to feed the interest of those who are too far away from all the PCA, ICA and whatever-else approaches, let's apply PCA to the data matrix with benchmarking measurements as-is.

3.1 32 vs 64: an easy Factor Nr. 2

And here are the 2 strongest factors that influence the variation in benchmarking measurements.

[Figure: scatter plot "PC1 x PC2". X-axis: PC1; Y-axis: PC2. Each point is a benchmarking round, labeled 64 or 32.]

Each point represents a round of benchmarking measurements for the corresponding PyPy source control revision and CPU platform. By the way, factors in PCA are often named Principal Components (PC), hence PC1, PC2, etc.

To give an idea about the second factor (Y-axis), the benchmarking rounds are marked with 64 for the 64-bit target platform and, similarly, with 32 for the 32-bit one. Of course, PCA is not able to interpret factors completely or even magically name them right ;) However, it does help us to sort them by the influence they have on the variance in benchmarking measurements.
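A minimal sketch of this kind of plot, on synthetic data rather than the real measurements. It assumes base R's `prcomp` with its default settings (columns centered, not rescaled); the report does not show the exact call it used, and the sizes, effect strengths and row names below are made up.

```r
set.seed(1)
n_rev   <- 50   # hypothetical number of revisions
n_tests <- 6    # hypothetical number of benchmarking tests

# Two rounds per revision: one 64-bit, one 32-bit.
progress <- rep(seq(1, 0, length.out = n_rev), each = 2)  # timings shrink over time
is64     <- rep(c(TRUE, FALSE), times = n_rev)

# Synthetic timings: strong time trend, weaker platform offset, small noise.
toy <- sapply(seq_len(n_tests), function(j) {
  1 + 2 * progress + 0.3 * is64 * (-1)^j + rnorm(2 * n_rev, sd = 0.02)
})
rownames(toy) <- paste0(rep(seq_len(n_rev), each = 2), ifelse(is64, "-64", ""))

p <- prcomp(toy)        # centers the columns by default
scores <- p$x[, 1:2]    # coordinates of every round on PC1 and PC2

# Draw the labels instead of points, as in the figure above.
plot(scores, type = "n", main = "PC1 x PC2")
text(scores, labels = ifelse(is64, "64", "32"))
```

In this toy setup PC1 follows the time trend and PC2 separates the two platforms, but only because the data was constructed that way; PCA itself just orders the directions by variance.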


The 32-vs-64 was just an easy puzzle about the second-strongest factor in the measurement variance. How strong is it? This graph shows the contribution of each factor to the variance of the data:

[Figure: bar plot "Ordered factors, major factors are well-separated". Y-axis: Variances.]

An important thing to know about this graph: the larger the difference between the factors, the better the algorithmic separation of the factors (and the more reliable the interpretation).
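The bars of such a graph are the squared standard deviations of the principal components. The following sketch, on made-up data with one dominant and one weaker direction, shows how the variances, their shares and the gaps between neighbouring factors could be computed with base R; the data and the variable names are assumptions for illustration only.

```r
set.seed(2)
toy <- matrix(rnorm(200 * 5), ncol = 5)
toy[, 1] <- toy[, 1] * 5   # one dominant direction
toy[, 2] <- toy[, 2] * 2   # one weaker but still distinct direction

p <- prcomp(toy)
variances <- p$sdev^2                 # the bar heights that screeplot(p) draws
share     <- variances / sum(variances)  # fraction of total variance per factor
gaps      <- -diff(variances)         # drop from each factor to the next one

screeplot(p, main = "Ordered factors")
print(round(share, 3))
```

A large gap after a factor suggests that its direction is well determined; factors with nearly equal variances can swap or mix under small changes in the data.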

Zoom in on the first 7:

[Figure: bar plot "Ordered factors, major factors are well-separated (zoom)". Y-axis: Variances.]

3.2 No 64-bit anymore and Factor Nr. 1

OK, let's kick the 64-bit rounds to simplify the next steps. The graphs will be sparser, and it will be easier to find interesting things during our first approach to the data.

> m
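A minimal sketch of how the 64-bit rounds could be dropped, assuming (as in the matrix excerpt earlier) that 64-bit rounds are exactly the rows whose names end in "-64". The names `m_toy`, `is64` and `m32` are illustrative, not taken from the report.

```r
# Toy matrix with the same row-name convention and values as the excerpt.
m_toy <- matrix(
  c(0.7149890, 0.7056351, 0.7221774, 0.7279082,
    0.1392867, 0.1474193, 0.1460866, 0.1362366),
  ncol = 2,
  dimnames = list(
    c("48277:39882f1dfd15-64", "48277:39882f1dfd15",
      "48354:10f7167b3e98-64", "48354:10f7167b3e98"),
    c("ai", "bm_chameleon")
  )
)

is64 <- grepl("-64$", rownames(m_toy))   # TRUE for 64-bit rounds
m32  <- m_toy[!is64, , drop = FALSE]     # keep the 32-bit rounds only
```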


[Figure: scatter plot "PC1 x PC2 (no 32-64 anymore)". Each point is a 32-bit benchmarking round labeled with its revision, from 48277:39882f1dfd15 to 55013:8021ff42995c.]

What is special about this stand-alone cluster of git-revisions, where e.g. git-revisions 48761:98bf21b80fc5 or, say, 48354:10f7167b3e98 are the most extreme? The answer will explain the benchmarking measurement variance Factor Nr. 1 in terms of git-revisions.

It simply looks like there was a considerable qualitative speed-up after the revisions in the range near 48277-49500, i.e. after Nov/Dec 2011. Well, the first 2 factors were not that interesting, but for those who had never seen PCA it was probably fun.

4 No old data

Let's kick the old data and focus on the recent changes after git-revision 49600.

> isRecent

+ 49600

> m
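A minimal sketch of such a filter, assuming the numeric revision is the part of the row name before the colon. Only the name `isRecent` and the threshold 49600 come from the text; the toy matrix and the other names are illustrative.

```r
# Row names follow the "number:hash" convention seen in the plots.
rn <- c("48277:39882f1dfd15", "49500:f46e309f89bd",
        "49647:ddbc82ef4d8f", "55013:8021ff42995c")
m_toy <- matrix(seq_along(rn), ncol = 1, dimnames = list(rn, "ai"))  # dummy values

# Numeric revision = everything before the first colon.
rev_number <- as.integer(sub(":.*$", "", rownames(m_toy)))

isRecent <- rev_number > 49600
m_recent <- m_toy[isRecent, , drop = FALSE]
```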


[Figure: bar plot "no 64-bit data, no old data". Y-axis: Variances.]

[Figure: scatter plot "PC1 x PC2 (no 32-64 anymore, no old data)". Each point is a benchmarking round labeled with its revision, from 49647:ddbc82ef4d8f to 55013:8021ff42995c.]

PC1 (X-axis) is mostly about progress in time, but what about the Y-axis? What are its poles 52382:00b830d7bd6a and 54509:b58494d41466?


5 Flip-flop!

As mentioned, we could transpose the data matrix to see things from a different point of view.

[Figure: scatter plot "point is a benchmarking test". Points are labeled with the 31 test names: ai, bm_chameleon, bm_mako, chaos, crypto_pyaes, django, fannkuch, float, go, html5lib, json_bench, meteor-contest, nbody_modified, pyflate-fast, raytrace-simple, richards, rietveld, slowspitfire, spambayes, spectral-norm, spitfire, spitfire_cstringio, sympy_expand, sympy_integrate, sympy_str, sympy_sum, telco, twisted_iteration, twisted_names, twisted_pb, twisted_tcp.]

What is special about json_bench or html5lib? Nothing very interesting: they always show a higher avg_changed than the others. Let's normalize the avg_changed range for each test.
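The exact normalization is not shown in the report. As one possible reading of "normalize the range", the sketch below applies min-max scaling to every test and then transposes the matrix so that each point in the PCA is a test. The toy data and the helper `normalize_range` are assumptions; `tm` follows the name used for the transposed matrix in the commands further below.

```r
set.seed(3)
# Toy rounds x tests matrix; the third test runs on a much larger time scale.
m_toy <- cbind(
  ai         = 0.7 + rnorm(20, sd = 0.02),
  chaos      = 0.5 + rnorm(20, sd = 0.01),
  json_bench = 5.0 + rnorm(20, sd = 0.50)
)

# Min-max scaling: every test (column) is mapped onto [0, 1].
normalize_range <- function(x) (x - min(x)) / (max(x) - min(x))
m_norm <- apply(m_toy, 2, normalize_range)

# Transpose: rows are now tests, so each point in the PCA plot is a test.
tm <- t(m_norm)
p  <- prcomp(tm)
```

After this scaling no test dominates the result merely because its timings are numerically larger.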

[Figure: bar plot "first factors are well-separated, hooray! But others... :(". Y-axis: Variances.]

[Figure: scatter plot "PC1 x PC2, point is a benchmarking test (Normalized)". Points are labeled with the test names.]

So the main battle of the last PyPy year seems to be between sympy_expand/sympy_str and twisted_iteration.

The second big opposition is ai vs spitfire-related tests.

Well, the problem is that these first two battles are the only well-recognizable factors. OK, let's kick these measurements to hear the rest of the chorus:

> bigSolo toTake tm
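A minimal sketch of excluding a set of tests from the transposed matrix before re-running PCA. The names `bigSolo` and `toTake` appear in the command above, but their definitions are not recoverable from this copy, so the contents below are assumptions, and the matrix is a random toy with one row per test.

```r
set.seed(5)
# Toy tests x rounds matrix.
tm_toy <- matrix(
  runif(6 * 4), nrow = 6,
  dimnames = list(
    c("ai", "chaos", "django", "spitfire", "sympy_str", "telco"),
    NULL
  )
)

# Tests to exclude (an illustrative subset of the "soloists" named in the text).
bigSolo <- c("ai", "spitfire", "sympy_str")

toTake  <- !(rownames(tm_toy) %in% bigSolo)   # TRUE for tests we keep
tm_rest <- tm_toy[toTake, , drop = FALSE]

p_rest <- prcomp(tm_rest)                     # PCA on the remaining tests
```

The same pattern applies to the later rounds of exclusion, with a different vector of test names each time.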


[Figure: bar plot "no big solo - poles changed a bit, but not much". Y-axis: Variances.]

[Figure: scatter plot "PC1 x PC2, no big solo, point is a benchm. (Normalized)". Remaining tests: bm_chameleon, bm_mako, chaos, crypto_pyaes, django, fannkuch, float, go, html5lib, json_bench, meteor-contest, nbody_modified, pyflate-fast, raytrace-simple, richards, rietveld, spambayes, spectral-norm, sympy_integrate, sympy_sum, telco, twisted_names, twisted_pb, twisted_tcp.]

Let's kick 3 more tests:

> backSolo toTake tm


[Figure: bar plot "no back solo too - poles changed a bit, but not much". Y-axis: Variances.]

[Figure: scatter plot "PC1 x PC2, no back solo too, point is a benchm. (Normalized)". Remaining tests: bm_chameleon, bm_mako, chaos, django, fannkuch, float, go, html5lib, json_bench, nbody_modified, pyflate-fast, raytrace-simple, richards, rietveld, spambayes, sympy_integrate, sympy_sum, telco, twisted_names, twisted_pb, twisted_tcp.]

Let's kick 2 more tests:

> backSolo2 toTake tm


[Figure: bar plot "no back solo2 too - poles changed a bit". Y-axis: Variances.]

[Figure: scatter plot "PC1 x PC2, no back solo2 too, point is a benchm. (Normalized)". Remaining tests: bm_chameleon, bm_mako, chaos, django, float, go, html5lib, json_bench, pyflate-fast, raytrace-simple, richards, rietveld, spambayes, sympy_integrate, sympy_sum, telco, twisted_names, twisted_pb, twisted_tcp.]

6 Appendix

An example of non-recognizable PCA factors from a 31*100 matrix of normally distributed random data.

> p plot(p, n = 31, "really a bad case, factors can' t be separated")

[Figure: bar plot "really a bad case, factors can't be separated". Y-axis: Variances.]
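A minimal sketch of such a "bad case", assuming the random matrix is laid out with 100 rounds as rows and 31 tests as columns, so that PCA yields 31 factors. The seed and the variable names are arbitrary choices for illustration.

```r
set.seed(4)
noise <- matrix(rnorm(100 * 31), nrow = 100, ncol = 31)  # 100 rounds x 31 tests

p <- prcomp(noise)
variances <- p$sdev^2

# All 31 bars decay smoothly: no factor is clearly separated from its neighbour.
screeplot(p, npcs = 31, main = "random data: no factor stands out")

top_share <- variances[1] / sum(variances)  # share of the strongest factor
```

With pure noise the strongest factor explains only a small fraction of the total variance, in contrast to the real benchmarking data, where the first factors tower over the rest.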

This report is generated using LaTeX, Sweave and R.