Wissenschaftliches Rechnen im Paderborner Umfeld (Scientific Computing in and around Paderborn)
ZKI-Arbeitskreis Supercomputing, 14 March 2013
Prof. Dr. Holger Karl, Dr. Jens Simon
holger.karl@uni-paderborn.de, simon@uni-paderborn.de
Paderborn University (Status 2012)
• Students total: 17,566
  – BA: 7,390; MA: 1,746; teacher training (Lehramt): 6,767; PhD candidates: 815
• Staff total: 2,010
  – Professors: 196; scientific staff: 919; non-scientific staff: 634
• Budget: €180 million
  – Third-party funding (2011): €37.8 million
Paderborn University (Status 2012)
• Faculties
  – Faculty of Arts and Humanities
  – Faculty of Business Administration, Economics and Business Computing
  – Faculty of Science
  – Faculty of Mechanical Engineering
  – Faculty of Computer Science, Electrical Engineering and Mathematics
• Research
  – 9 central research institutes
  – 3 DFG collaborative research centers (SFB)
  – 1 DFG research group
  – 6 graduate colleges / schools
  – 2 Fraunhofer institutes
Computer Science @ UPB (Informatik, 2012)
• Students total: 1,307
• First-year students (BA only): 164
• Bachelor/Master graduates: 146
• Professors: 21
• Scientific staff: 100
• Third-party funding: ca. €7 million
• Rankings of recent years
  – DFG (national science foundation): 7th of 129 in approved research funds
  – CHE ranking: leading position among 77 universities regarding third-party funds, teaching, infrastructure, and research reputation
  – ACM: best German institution for software engineering
IT Organization Structure
• IMT
  – General infrastructure
    • Networks (WAN, LAN, WLAN, VoIP, …)
    • Servers (email, web, hosts, VMs, …)
    • Storage (backup, archive, …)
  – IT security
  – Digital media (recording and live-streaming of lectures)
• PC²
  – Scientific, high-performance, high-throughput computing
  – "Path finding" (together with Informatik)
• ITD
  – IT services for university administration
Paderborn Center for Parallel Computing (PC²)
• Regional competence center for parallel and distributed data processing in research and application
  – Founded in 1991
  – Scientific institute of Paderborn University
  – Goal: foster and explore the efficient use of parallel and distributed systems
  – Funding from regional and federal government, industry, and EU projects
PC² – Service & Research
• An infrastructure provider and service center for high-performance computing
  – Main customers: UPB members and surrounding universities
  – Engineering and natural sciences communities
• A research institute working on high-performance computing and cloud computing topics
  – In collaboration with several chairs of computer science
Compute Center
• Located in building O, 2nd floor, main campus
• Facility available since 2011, shared by IMT and PC²
• Ca. 400 m² floor space (200 m² dedicated to HPC)
• Electrical power
  – Up to 550 kW for HPC
  – Up to 200 kW for basic IT services and server hosting
  – 2 × 200 kW UPS (batteries) and 1.6 MW diesel generator
• Cooling
  – Air cooling (1/3): 21–28 °C room temperatures, cold-aisle containments
  – Water cooling (2/3): 20/26 °C, in-row cooling/backdoors (new HPC system)
Compute Center
• Physical protection
  – Security gate (two doors) with key-card access
  – Early fire detection system
  – Nitrogen-induction fire suppression system
  – Certification possible (not done yet)
• Network security
  – Hierarchical firewall structure for PC²
• Availability
  – Redundancy of servers for important services, data backup
Resources of PC²
• Cluster
  – OCuLUS [612 compute nodes, >200 TFLOPS] (03/2013)
  – Arminius+ [60 compute nodes]
  – PLING2 [57 compute nodes]
  – BIS-Grid [8 compute nodes]
  – High throughput [varying compute nodes]
• Cloud
  – Green-PAD [54 nodes Paderborn, 38 nodes Mainz]
  – Scientific Cloud [38 nodes]
• Misc
  – 32 × NVIDIA K20, 8 × Intel Xeon Phi
  – 2 × Maxeler FPGA compute nodes, 1 × Convey HC FPGA
• Storage
  – ISILON IQ9000x NAS, 50 TB
  – ISILON X400 NAS, 800 TB (05/2013)
  – FhGFS parallel filesystem, 450 TB
• SMP
  – RX600 with 4 processor sockets, 24 cores, 128 GB
  – 2 × Dell R820 with 4 processor sockets, 32 cores, 1 TB
OWL Cluster – OCuLUS
9,856 processor cores, 200 TFLOPS, 45 TByte main memory
• 612 compute nodes, each with 2 × Intel E5-2670 (2.6 GHz)
  – 552 nodes with 64 GByte main memory
  – 20 nodes with 256 GByte
  – 32 nodes with 64 GByte and an NVIDIA Tesla GK110 accelerator
  – 8 nodes with 64 GByte and an Intel Xeon Phi 5110P accelerator
• 2 SMP nodes, each with 4 × Intel E5-4650 (2.7 GHz) and 1 TByte main memory
• Parallel storage system
  – 500 TByte disk capacity
  – Fraunhofer FhGFS
  – 7 storage nodes and 2 metadata nodes
• InfiniBand interconnect
  – 40 Gbit/s Mellanox HCAs and switches
  – Half bisectional bandwidth
• Redundant login nodes for Linux and Windows
(Image: Wikimedia Commons)
Research Topics
• Ongoing
  – Resource management: own, planning-based system (OpenCCS)
  – Integrating hardware accelerators into HPC (e.g., FPGAs)
  – Energy-efficient HPC (e.g., project GreenPAD)
• Near future
  – Support for big-data workloads in HPC clusters (e.g., Hadoop, Map/Reduce jobs), supported by resource management
  – Support for software-defined networking (SDN) in clusters
• Mid-term
  – SDN and big-data workloads jointly integrated in and supported by a cluster resource management system
    • Leverages the new Isilon-based storage infrastructure
Research Projects
[Diagram: two research areas – Data Center, Custom Computing]
• Computer Architecture
  – EPiCS: Engineering Proprioception in Computing Systems (EU)
  – Enhance: Enabling heterogeneous hardware acceleration using novel programming and scheduling models (BMWF)
  – Gomputer: Accelerating computer Go (industry)
• Middleware & System Software
  – Infrastructure for On-the-Fly Computing (SFB 901)
  – GreenPAD: Energy-optimized ICT for regional business and knowledge clusters (BMWi)
  – Simba: Simulation platform for automotive development (BMWi-ZIM)
  – Data center management: virtualization technology (industry)
  – OpenCCS: Resource management system (long-term project)
  – SCALUS: Scalable, Ubiquitous Storage (EU)
© Heinz Nixdorf Institut, Universität Paderborn – GIBU 2012 (Schloss Dagstuhl), 02.04.2012
Friedhelm Meyer auf der Heide
Collaborative Research Centre On-The-Fly Computing: Individualized IT Services in Dynamic Markets
The Vision of “On-The-Fly Computing”
• Nearly automatic configuration and execution of individual IT services, which are constructed out of base services traded in worldwide available markets (a toy configuration sketch follows below)
• Organization of these markets of services
[Diagram: configuration (SoA), execution (Cloud Computing), market support (Distributed Computing), market organization (Mechanism Design)]
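To make the configuration idea concrete, here is a minimal toy sketch in Python, assuming invented base services and prices. It is not SFB 901 software; it only illustrates composing base services traded in a market into an individual, configured service.

# Toy sketch (not SFB 901 software): base services traded in a "market"
# are composed, nearly automatically, into an individual IT service.
# All service names and prices are invented.

base_services = {  # market: service name -> (implementation, price)
    "ocr":       (lambda doc: "text(%s)" % doc, 2.0),
    "translate": (lambda txt: "en(%s)" % txt, 1.5),
    "summarize": (lambda txt: "sum(%s)" % txt, 1.0),
}

def configure(requested):
    """Configure an individual service as a pipeline of base services."""
    stages = [base_services[name][0] for name in requested]
    price = sum(base_services[name][1] for name in requested)
    def service(data):
        for stage in stages:  # execute the configured composition in order
            data = stage(data)
        return data
    return service, price

service, price = configure(["ocr", "translate", "summarize"])
print(service("scan.pdf"), price)  # -> sum(en(text(scan.pdf))) 4.5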
The actors in the “On-The-Fly Computing” markets
• User/Client
• OTF Provider
• Supplier: OTF Compute Center, OTF Software Provider
[Diagram: provision of IT services / market organization]
Challenges for the provision of IT services
[Diagram: User/Client, OTF Provider, Supplier (OTF Compute Center, OTF Software Provider)]
■ Project Area B (Modeling, Composition and Quality Analysis of Services): How to configure IT services out of base services?
■ Project Area C (Reliable Runtime Environments & Application Scenarios): How to ensure the correct, reliable, and efficient execution of configured services?
Challenges with the market organization
■ How can interactions in large dynamic markets be supported and protected?
■ How can the efficient functioning of the market be guaranteed under strategic behavior of the participants?
■ How can the usefulness of our concept be demonstrated?
[Diagram: Project Area A (Algorithmic and Economic Foundations), Project Area C (Reliable Runtime Environments & Application Scenarios)]
SFB 901: Structural Overview
• Project Area A: Algorithmic and Economic Foundations
  – A1: Local Strategies in Dynamic Networks (Meyer auf der Heide, Scheideler)
  – A2: Overlays over Physical Networks (Frey, Karl)
  – A3: Market of Services (Briest, Frick, Haake)
• Project Area B: Modeling, Composition & Quality Analysis of Services
  – B1: Parametric Service Specifications (Engels, Schäfer)
  – B2: Configuration & Rating (Kleine Büning, Kleinjohann)
  – B3: Composition Analysis in Uncertain Contexts (Becker, Wehrheim)
  – B4: Proof-Carrying Services (Platzner, Wehrheim)
• Project Area C: Reliable Runtime Environments & Application Scenarios
  – C1: Robustness & Security (Blömer, Scheideler, Sorge)
  – C2: On-The-Fly Compute Centers (Meyer auf der Heide, Platzner, Plessl)
  – C3: Modeling of Optimization Problems (Koberstein, Suhl)
Development, testing, and transfer of an energy-optimal ICT infrastructure model for a regional business and science cluster
Duration: 01.06.2012 to 31.05.2014
www.green-pad.de
GreenPAD Project Consortium
GreenPAD – Idea: a regional business and science cluster (WWC)
• Consolidating the ICT infrastructure at one site (the university's data center): economies of scale enable energy optimization
• Developing attractive service models for the regional economy (not only, but also, cloud services: IaaS, PaaS, SaaS)
• Target group during the project phase: start-ups, SMEs, and pilot customers, primarily in the Technologiepark and Zukunftsmeile
• Developing transition models from a regional to a supra-regional cluster
• Load-distribution methods that depend on the currently available, regionally generated renewable energy (see the sketch below)
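As a concrete reading of the last point, the following minimal sketch assigns deferrable jobs to the site with the largest current surplus of regionally generated renewable energy. It is only an illustration: the site names, surplus figures, and greedy rule are assumptions, not GreenPAD results.

# Illustrative sketch (not GreenPAD code): assign deferrable jobs to the
# site with the largest current surplus of regionally generated renewable
# energy. Site names and all numbers are invented.

def place_jobs(jobs, surplus_kw):
    """Greedy placement: largest jobs first, each onto the site with the
    most remaining renewable surplus; defer a job if no site can take it."""
    placement = {}
    for job, power_kw in sorted(jobs.items(), key=lambda item: -item[1]):
        site = max(surplus_kw, key=surplus_kw.get)
        if surplus_kw[site] >= power_kw:
            surplus_kw[site] -= power_kw
            placement[job] = site
        else:
            placement[job] = None  # defer until green capacity is available
    return placement

surplus_kw = {"Paderborn": 150.0, "Mainz": 60.0}  # invented surplus per site
jobs = {"simulation": 80.0, "rendering": 50.0, "analytics": 20.0}
print(place_jobs(jobs, surplus_kw))
# -> {'simulation': 'Paderborn', 'rendering': 'Paderborn', 'analytics': 'Mainz'}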
GreenPAD Project Consortium
GreenPAD – Energy-Saving Potential
[Chart: cumulative energy-saving potential of the measures – smart-grid concept with energy-efficient components, an energy-efficient data center (cloud, resource management), and green-office thin/zero clients; partners: IMT, E.ON, PC², Unilab, FUJITSU; customers: university, HPC customers, Technologiepark/SMEs; axes: energy of the regional data center vs. technology and solutions; value chain: procurement, production, sales]
Multi-Site Cloud Computing
• Scenario
  – Thousands of cloud center locations
  – Distributed applications (multi-tier, P2P, …)
• Problems
  – Selecting a cloud center
  – Matching the applications' requirements
  – Maximizing service-level quality
  – Minimizing expenses
• Research
  – Deployment of applications with respect to their geographical distribution (see the sketch below)
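A minimal sketch of the selection problem, assuming an invented latency/price model: among the cloud centers that satisfy an application's latency requirement, pick the cheapest. This is a toy formulation, not the research prototype; all center names and numbers are made up.

# Toy sketch of multi-site deployment (not the PC2 research prototype):
# choose the cheapest cloud center that satisfies an application's
# latency requirement toward its users. All data below is invented.

from dataclasses import dataclass

@dataclass
class Center:
    name: str
    latency_ms: float   # measured latency to the application's user base
    price: float        # relative price per VM-hour

def select_center(centers, max_latency_ms):
    """Among centers meeting the latency bound, pick the cheapest one."""
    feasible = [c for c in centers if c.latency_ms <= max_latency_ms]
    if not feasible:
        raise ValueError("no center satisfies the latency requirement")
    return min(feasible, key=lambda c: c.price)

centers = [
    Center("eu-central", 18.0, 1.2),
    Center("eu-west", 35.0, 0.9),
    Center("us-east", 95.0, 0.7),
]

# A latency-sensitive tier tolerates 40 ms; a batch tier tolerates 200 ms.
print(select_center(centers, 40.0).name)    # eu-west (cheapest under 40 ms)
print(select_center(centers, 200.0).name)   # us-east (cheapest overall)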
OpenFlow in Data Centers
• OpenFlow: centrally computed routing/switching tables
  – Distributed to switches on demand, on a per-flow basis
• Possible benefits
  – Fine-grained control, e.g., load balancing
  – Support for the data patterns of complex applications
• Goal: integrate network configuration with resource management and application knowledge (a control-model sketch follows below)
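The control model can be sketched without a real controller framework: a central controller computes a route per flow and installs one forwarding entry per switch on demand. The topology, flow identifier, and data structures below are invented for illustration; a real deployment would use an actual OpenFlow controller.

# Schematic model of OpenFlow-style control (no real controller framework):
# a central controller computes a path per flow and installs one forwarding
# entry per switch on demand. Topology and flows are invented examples.

from collections import deque

topology = {  # switch -> directly connected switches
    "s1": ["s2", "s3"],
    "s2": ["s1", "s4"],
    "s3": ["s1", "s4"],
    "s4": ["s2", "s3"],
}
flow_tables = {sw: {} for sw in topology}  # per-switch flow tables

def shortest_path(src, dst):
    """BFS over the switch graph, standing in for the controller's
    centrally computed routing."""
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in topology[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

def install_flow(flow_id, src_switch, dst_switch):
    """On the first packet of a flow, push a table entry to every switch
    on the path (the 'distributed on demand, per flow' step)."""
    path = shortest_path(src_switch, dst_switch)
    for here, nxt in zip(path, path[1:]):
        flow_tables[here][flow_id] = nxt  # match flow_id -> forward toward nxt

install_flow("tcp:10.0.0.1->10.0.0.2", "s1", "s4")
print(flow_tables)  # the switches on the path now hold an entry for this flow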
OpenCCS
• Planning-based
  – Shows planned start times
  – Deadline scheduling
  – Advance reservations usable by multiple groups / users
• Custom resources
• Time-varying limitations and validities for users / groups
  – E.g., group X gets no more than 4 GPUs on Monday and Friday between 14:00 and 17:30
• PBS Pro-like resource description
  – 248:ncpus=7:vmem=20g:phi=true+128:ncpus=8:mem=48g, place=scatter:excl
• Submission rate up to 100 Hz
• Planned features
  – Management of virtual machines
    • "Give me 5 Linux and 2 Windows VMs in a separate VLAN using these images"
  – Hierarchical resources
    • Users can specify resource (sub-)classes: "Give me an accelerator, whatever is available"
(A simplified scheduling sketch follows below.)
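A minimal sketch of the planning-based idea, assuming a single node-count resource and a first-fit rule: every job is placed into a node/time plan at submission, so its planned start time is known immediately, and an advance reservation is simply a booking with a fixed earliest start. This is an illustration, not OpenCCS code.

# Simplified illustration of planning-based scheduling (not OpenCCS code):
# each job is placed into a node/time plan at submission, so its planned
# start time is known immediately. Capacity and jobs are invented.

TOTAL_NODES = 612  # OCuLUS-sized example

def plan_job(plan, nodes, duration, earliest=0):
    """First-fit: find the earliest time step from `earliest` on at which
    `nodes` nodes are free for `duration` consecutive steps, and book them.
    `plan` maps time step -> nodes already booked."""
    t = earliest
    while True:
        if all(plan.get(t + d, 0) + nodes <= TOTAL_NODES for d in range(duration)):
            for d in range(duration):
                plan[t + d] = plan.get(t + d, 0) + nodes
            return t  # planned start time, known at submission
        t += 1

plan = {}
print(plan_job(plan, nodes=400, duration=3))               # 0
print(plan_job(plan, nodes=300, duration=2))               # 3 (no earlier fit)
# An advance reservation is just a booking with a fixed earliest start:
print(plan_job(plan, nodes=100, duration=1, earliest=10))  # 10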
OpenCCS on OCuLUS
$ ccsinfo --allocatable --classes
Name    Class    #Hosts Avail/Max
=================================
ncpus   16       611/612
        32       2/2
mem     64g      592/592
        256g     19/20
        1t       2/2
vmem    80g      592/592
        264g     19/20
        1t       2/2
arch    SL 6.3   613/614
ib_qdr  true     613/614
phi     false    605/606
        true     8/8
tesla   false    581/582
        true     32/32

$ ccsinfo --allocatable
Name    Amount (Used/Avail/Max)  Description
============================================
ncpus   0/9840/9856              Cores
mem     0/44.8t/45t              Phys. memory
vmem    0/54.4t/54.6t            Virt. memory
arch    -                        Architecture
ib_qdr  -                        InfiniBand QDR
phi     -                        Intel Xeon Phi
tesla   -                        Tesla GK110
And after OCuLUS?
• Traditional clusters: performance was the only relevant metric
• Today: performance efficiency
  – E.g., power consumption
• Tomorrow: usage efficiency?
  – How much performance should be sacrificed to increase usability, and by how much? How far should the user base be broadened?
  – "Number of relevant computations" as a metric?
[Chart: convenience vs. performance trade-off]
http://www.pc2.de