73
Ahmed Hemani KTH – Dept of Electronics and Embedded Systems, School of ICT, KTH. Acknowledgement: Nasim Farahini, Muhammad Asad, Li Shuo, Hassan Sohofi, Muhammad Ali Shami, Adeel Tajammul, Omer Malik, Anders Lansnser, Christer Svensson The SiLago Method: Next Generation VLSI Architectures and Design Automation 1

The SiLago Method: Next Generation VLSI Design Automation

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: The SiLago Method: Next Generation VLSI Design Automation

Ahmed HemaniKTH – Dept of Electronics and Embedded Systems, School of ICT, KTH.

Acknowledgement:Nasim Farahini, Muhammad Asad, Li Shuo, Hassan Sohofi, Muhammad Ali Shami, Adeel Tajammul, Omer Malik, 

Anders Lansnser, Christer Svensson

The SiLago Method: Next Generation 

VLSI Architectures and Design Automation

1

Page 2: The SiLago Method: Next Generation VLSI Design Automation

The Core Ideas behind the SiLago Method

Manufacturing Cost

5 MUSDs

Engineering Cost45 MUSDs

< 5 MUSDs<< 45 MUSDs

The SiLago Method

1. Higher abstraction of Physical Design Platform

2. A structured grid based physical design scheme

Page 3: The SiLago Method: Next Generation VLSI Design Automation

3

The Large Engineering and Manufacturing Cost

Software Centric Accelerator rich Platform Based Design

Loss in Silicon and Computational Efficiencies

Blocks Application CategoriesStifles Innovation

Page 4: The SiLago Method: Next Generation VLSI Design Automation

Generality comes at a huge cost ofSilicon, Computational and Engineering efficiencies

Custom solutions areOrders of magnitude 

more efficient

The SiLago Method

4

Generality vs. Customisation

Accelerator Rich Software Centric 

Platform Based SOC DesignHardware Centric Custom Design

Page 5: The SiLago Method: Next Generation VLSI Design Automation

5

Energy Breakdown in a GPP

Data Supply 28 %

Instruction Supply 42%

Clock & Control24%

Arithmetic 6%

William J. Dally, James Balfour, David Black‐Shaffer, James Chen, R. Curtis Harting,Vishal Parikh, Jongsoo Park, and David Sheffield, Stanford University, ”Efficient Embedded Computing”. IEEE Computer July 2008

Page 6: The SiLago Method: Next Generation VLSI Design Automation

6

The Impact of Customization

CPUCore i7

GPUGTX255

FPGALX760

100

101

102

SiLago ASIC

GFlops /w

FFT 2048Matrix Matrix Multiplication

Page 7: The SiLago Method: Next Generation VLSI Design Automation

A Brief History VLSI Design Automation

To Explain Why the Path of Customization has been abandoned

7

Page 8: The SiLago Method: Next Generation VLSI Design Automation

Abstraction Level

# of Solutions increases exponentially with abstraction gap

RTL/LogicSynthesis

Gates

Physical 

Physical Synthesis

High‐level Synthesis

RTL /‐architecture

Algoritims

Application‐levelSynthesis

The Design SpaceManual: Stick Diagram

, Mead 

Conway, Silicon Com

pilerSystem level SynthesisSystem

Application

The Mead Conway Era

Page 9: The SiLago Method: Next Generation VLSI Design Automation

The Mead Conway Era Survived As long as the complexity was of the 

order of O(10K gates)

9

Page 10: The SiLago Method: Next Generation VLSI Design Automation

One

time

Abstraction Level

# of Solutions increases exponentially with abstraction gap

RTL/LogicSynthesis

Standard‐Cell

Physical 

Physical Synthesis

High‐level Synthesis

RTL /‐architecture

Algoritims

Application‐levelSynthesis

System level SynthesisSystem

Application

The Standard Cell EraThe Design Space

Manual

Automated

Standard Cells

Page 11: The SiLago Method: Next Generation VLSI Design Automation

What Standard Cells Did 

Physical Design DisciplineStandard pitch and Row based layoutEnabled physical design automation

Improves efficiency of1. Synthesis from RTL to GDSII2. Verification at RTL3. System Design

AbstractionBoolean level abstraction

Hides circuit and physical design detailsEnabled logic synthesis

Page 12: The SiLago Method: Next Generation VLSI Design Automation

Standard Cells as building blocks are not scalable for 10‐100 million gate designs

Standard Cell~10‐100 K gates

~10‐100 Million gates

Page 13: The SiLago Method: Next Generation VLSI Design Automation

An Analogy

Page 14: The SiLago Method: Next Generation VLSI Design Automation

So what happens when you try to build skyscapers with bricks

14

Page 15: The SiLago Method: Next Generation VLSI Design Automation

Commercial HLS achieves local optimisation

ADC FDEC DEC RRC ↓2

EQFilter

CRComp SLICER

CarrierAdaptation

EQAdaptation

ClockAdaptation

System Control

Global constraints are manually partitioned to local partitionsThe synthesis tool does the local optimisation

Commercial HLS

Global Area, Energy and latency constraints are specified for the application

15

Local Optimisation  min (L); L is the # of algorithms in the applicationApplication

Algorithms

Page 16: The SiLago Method: Next Generation VLSI Design Automation

Commercial HLS: No synthesis of inter‐algorithm interfaces in an Application

ADC FDEC DEC RRC ↓2

EQFilter

CRComp SLICER

CarrierAdaptation

EQAdaptation

ClockAdaptation

System Control

16

The user has to manually refine the interface between the synthesized algorithms

This manual refinement induces a functional verification step because the correct by construction contract assured by machine translation is now 

violated

Commercial HLS

Page 17: The SiLago Method: Next Generation VLSI Design Automation

The 45 MUSD State of the Art SOC Design Flow

Functio

nal  Ve

rification

Constraints V

erificatio

n: 

Timing/Energy/Pow

er/Area

Chip

Automatic:High‐level, 

RTL / Logic & Physical Synthesis

Logic: Algorithm + RTL + Boolean

System: Multiple applications

Software Design

Architecture Definitionin terms of pre‐designed IPs

Stitch Architecture: Buy and Assemble

System Architecting1. HW/SW Partitioning2. Interface Design3. Memory & Interconnect 

Hierarchy4. I/O Design

17

Page 18: The SiLago Method: Next Generation VLSI Design Automation

Solution:

The SiLago Method

SiLago = Silicon Large Grain Object

18

Inspired by Lego

Page 19: The SiLago Method: Next Generation VLSI Design Automation

We shifted to pre‐fabricated wall segments

Page 20: The SiLago Method: Next Generation VLSI Design Automation

The First Proposition –Raise Abstraction to Arch level

SiLago Block(Register Files, DPUs, Switch boxes, 

Processors, SRAM banks etc.)Standard Cell

4‐5 orders larger than Sandard Cell

Characterised boolean operations

Characterised Micro‐architectural operations

SiLago BlocksAre NOT IPs – Soft or Hard

Page 21: The SiLago Method: Next Generation VLSI Design Automation

21

Solutions to VLSI Design Complexity:

1. Abstraction2. Physical Design Discipline / Regularity

The VLSI community has largely forgotten the second component

Page 22: The SiLago Method: Next Generation VLSI Design Automation

London

Manhattan 

Page 23: The SiLago Method: Next Generation VLSI Design Automation

A grid based structured layout scheme

1 2 3 4

6 5

7

8

Traditional SOC

9

SiLago Fabric based SOC

InnerModem

Protocol Processing Streaming Storage

DataStorag

e

System Ctrl

Program

Storage

DRAM CTRL

Flash CTRL

Ethernet

PLL/CGUPMC

InnerModem Outer

Modem

OuterModem Flexilators

Physical Design Regularity is the sword that can slay the demons of VLSI Design Complexity

Page 24: The SiLago Method: Next Generation VLSI Design Automation

The SiLago Method

24

Ahmed Hemani, Nasim Farahini, Syed M.A.H. Jafri, Hassan Sohofi, Shuo Li and Kolin Paul, ”TheSiLago Solution: Architecture and Design Methods for a Heterogeneous Dark Silicon Aware CoarseGrain Reconfigurable Fabric”, Chapter 3 in the book “The Dark Side of the Silicon” Springer, DOI10.1007/978-3-319-31596-6

Page 25: The SiLago Method: Next Generation VLSI Design Automation

The SiLago Concepts

A Virtual GRID

All SiLago design objects are alligned with grid lines

And occupy multiples of contiguous grid cells

Grid has not pre‐determined size, it is as big as the synthesis tool decides or the designer decides

Protocol Processing Streaming Storage

DataStorage

System Ctrl

ProgramStorage

InnerModem

OuterModem Flexilators

DRAM CTRLFlash CTRL

Ethernet

PLL/CGUPMC

REGIONS

A grid is divided into regions

Each region is specialized in a type of functionality

Some regions are infrastructural while others are functional

Regions are separated by corridors to accomodate NOCs to connect the regions

Each region has its own internal interconnect scheme.

SiLago Blocks

Each region is occupied by SiLago blocks that are region specific

These SiLago blocks occupy one or more contiguous grid cells

SiLago blocks are hardened and characterized with post layout data

SiLago blocks absorb, global nets including power grids, clock grid and connect to the neighbouring SiLago blocks by abutment

25

Page 26: The SiLago Method: Next Generation VLSI Design Automation

InnerModem

The SiLago Concepts

Protocol Processing Streaming Storage

DataStorage

System Ctrl

ProgramStorage

DRAM CTRLFlash CTRL

Ethernet

PLL/CGUPMC

InnerModem

OuterModem

26

OuterModem FlexilatorsN

OCs

NOCs

This is a SiLago Design Instance

It is automatically generated by the SiLago Syntheses tool chain

Number, size and position of regions vary from one instance to another

Page 27: The SiLago Method: Next Generation VLSI Design Automation

SiLago Interconnects are also hardened

The SiLago interconnects are not just logical interconnect, i.e., soft.

They are physical and electrical objects in a templatized or parametric manner

27

Page 28: The SiLago Method: Next Generation VLSI Design Automation

SiLago fabrics are composed by abutment

1. SiLago blocks absorbsa) Clock Tree & Power Ringb) Absorbs regional and global interconnectc) Pins on the periphery at right positions

2. Fabric Composition by abutment

28

Block 1

Block 2

Page 29: The SiLago Method: Next Generation VLSI Design Automation

SiLago Platform Cost Metricsare Space Invariant

Power RingSiLago BlockPower Ring Power StripesSiLago Block

29

1. 16 global wires in each cell varies by about 70% from cell to cell

2. This variation is a proof that even if it is hierarchical design, the cost metrics would vary

Power Stripes

1. The SiLago physical design discipline ensures that all wires are of exact same length

Page 30: The SiLago Method: Next Generation VLSI Design Automation

Clocking & STA• Clock

– Three levels of clocking: local, regional and global– Local

Each SiLago block is hardened to be timing clean and synthesized with a certain margin for skew and latencyThe Local Clock is synthesized using standard EDA flow

– RegionalEach Region is a synchronous region and the regional clock is manually synthesized to have sufficient buffers to maintain good edge and the delays balanced to keep the skew and latency within the margins of the local clock

– GlobalRegions communicate with each other on latency insensitive basis using a previously developed GRLS scheme. For more details seehttp://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6507330&tag=1

• STA – Static Timing Analysis– ILMs are created for each SiLago blocks– Once the regional clocks are synthesized and inserted back into the data base, a 

hierarchcial STA script is run to ensure that the entire design is timing clean.

30

Page 31: The SiLago Method: Next Generation VLSI Design Automation

Characterization1. Each SiLago block is hardened2. Sufficiently exhaustive simulation is performed for 

molecules of SiLago blocks at gate level with post layout data back annotated

– The SiLago blocks cannot be too large and complex– The same pipeline cannot be used for multiple 

operations3. Concurrent operations within and neighbouring 

SiLago blocks weakly couple and we model this coupling

4. The NOCs are parameterically hardened

31

Page 32: The SiLago Method: Next Generation VLSI Design Automation

The SiLago Proof of Concept

How are the ‐architectural design decisions made ?

32

Page 33: The SiLago Method: Next Generation VLSI Design Automation

Target Application Domain: Modems & Codecs

ADC FDEC DEC RRC ↓2

EQFilter

CRComp SLICER

CarrierAdaptatin

EQAdaptation

ClockAdaptation

Streaming Functons

Adaptive Functions

System Control System Control Functions

33

1. Streaming DSP functions2. Nearest Neighbour Connectivity3. Rich in address generation functionality

1. Adaptive Functions2. Spatial locality but not nearest neighbour3. Control intensive and non‐deterministic

Outer ModemBit Level Operations absorbed in AGUs

Page 34: The SiLago Method: Next Generation VLSI Design Automation

Proof of Concept SiLago Platform

Data Storage

Program Storage

DiMArch:Streaming Data 

Storage

System Control

Flexilators

Sensors

MemoryControl

PowerMngmt

PLL + CGU

RF/Analog

RF/Analog

DRRA:Streaming DSP

34

Adaptive Functions

Page 35: The SiLago Method: Next Generation VLSI Design Automation

DRRA – Computational FabricDynamically Reconfigurable Resource Array

DPU    

Register File

Sequencer

DPU & Register File Outputs 3 Columns to the Left and and to the Right

And this 3 column window slides

This is only a fragment 22 nm, 100 mm2

10 000 DRRA Cells

Page 36: The SiLago Method: Next Generation VLSI Design Automation

Distributed Memory Fabric – DiMARCH

StreamingRegister Files

ALU

Sequencer

Interconnect fabric

Memory banks

Instruction NOCPacket swtiched

Data NOCCircuit Switched

Page 37: The SiLago Method: Next Generation VLSI Design Automation

Private Execution Partitions

Memory Banks can be clustered to serve as one large bank

Programmed to stream data

Can be connected to clusters in computational fabric

Time Division to Space Division Multiplexing of Resources

Fine Grain Power ManagementComposable and Predictable Systems

Parallelism in computation is matched with parallelism in access to scratchpad memory

Page 38: The SiLago Method: Next Generation VLSI Design Automation

38

1. M.A. Shami, A. Hemani, Address generation scheme for a coarse grain reconfigurable architecture, in 2011 IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP) (2011), pp. 17–24

2. N. Farahini, A. Hemani, K. Paul, Distributed runtime computation of constraints for multiple inner loops, in 2013 Euromicro Conference on Digital System Design (DSD) (2013)

3. M.A. Shami, A. Hemani, Classification of massively parallel computer architectures, in 2012IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW) (2012), pp. 344–351

4. N. Farahini, A. Hemani, Atomic stream computation unit based on micro-thread level parallelism, in 2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP) (2015), pp. 25–29

5. N. Farahini, A. Hemani, H. Sohofi, S.M.A.H. Jafri, M.A. Tajammul, K. Paul, Parallel distributed scalable runtime address generation scheme for a coarse grain reconfigurable computation and storage fabric. Microprocess. Microsyst. 38, 788–802 (2014)

6. M.A. Shami, A. Hemani, Morphable DPU: smart and efficient data path for signal processing applications, in IEEE Workshop on Signal Processing Systems, 2009 (SiPS 2009) (2009), pp. 167–172

7. M.A. Shami, A. Hemani, Control scheme for a CGRA, in 2010 22nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) (2010), pp. 17–24

8. M.A. Shami, A. Hemani, An improved self-reconfigurable interconnection scheme for a coarse grain reconfigurable architecture, in NORCHIP, 2010 (2010), pp. 1–6

9. M. A. Shami, A. Hemani, Partially reconfigurable interconnection network for dynamically reprogrammable resource array, in IEEE 8th International Conference on ASIC, 2009. ASICON’09 (2009), pp. 122–125

10.M.A. Tajammul, M.A. Shami, A. Hemani, S. Moorthi, NoC based distributed partitionable memory system for a coarse grain reconfigurable architecture, in 2011 24th International Conference on VLSI Design (VLSI Design) (2011), pp. 232–237

Page 39: The SiLago Method: Next Generation VLSI Design Automation

+

*‐x0

x1 W1

*+

+

ci

xi xn‐i

+

*‐x0

x1 W1

+

*‐x0

x1 W1

+

Datapath can be clustered to create arbitrary DFG

Register files/SRAMs can be also clustered to create larger and/or more parallel data storage

Sequencers can be organized to create hierarchical FSMs

Datapath

Register File

Sequencer

Switchbox

39

Clustering SiLago blocks is clustering Standard Cells or LUTs in FPGAs

Variations in Function, Capacity and Parallelism created by clustering micro‐architectural operations in SiLago blocks

Page 40: The SiLago Method: Next Generation VLSI Design Automation

One TimeEngineering

Effort

SiLago Design Flow

1. Select Optimal Solution from ML solutions2. Global Interconnect, buffers and control3. Floorplanning

c

Application Model SimulinkL Algorithms

Sampling Rate, Total Latency

Number and types of SiLago blocks + Mapping

SiLago Platform

GDSII Macro Reports

Compose GDSII Macro

FSMD Library

M FSMDs

40

Page 41: The SiLago Method: Next Generation VLSI Design Automation

Abstraction Level

# of Solutions increases exponentially with abstraction gap

RTL/LogicSynthesis

Gates

Physical 

Physical Synthesis

High‐level Synthesis

RTL /‐architecture

Algoritims

Application‐levelSynthesis

The Design SpaceManual: Stick Diagram

, Mead 

Conway, Silicon Com

pilerSystem level SynthesisSystem

Application

The Mead Conway Era

Page 42: The SiLago Method: Next Generation VLSI Design Automation

One

time

Abstraction Level

# of Solutions increases exponentially with abstraction gap

RTL/LogicSynthesis

Standard‐Cell

Physical 

Physical Synthesis

High‐level Synthesis

RTL /‐architecture

Algoritims

Application‐levelSynthesis

System level SynthesisSystem

Application

The Standard Cell EraThe Design Space

Manual

Automated

Standard Cells

Page 43: The SiLago Method: Next Generation VLSI Design Automation

Onetim

eAbstraction Level

# of Solutions increases exponentially with abstraction gap

RTL/LogicSynthesis

Standard‐Cell

Physical 

Physical Synthesis

High‐level Synthesis

RTL /‐architecture

Algoritims

Application‐levelSynthesis

The Design SpaceAutom

aticSystem level SynthesisSystem

Application

The SiLago EraOne

time

Page 44: The SiLago Method: Next Generation VLSI Design Automation

SiLago achieves Global Optimisation

ADC FDEC DEC RRC ↓2

EQFilter

CRComp SLICER

CarrierAdaptation

EQAdaptation

ClockAdaptation

System Control

L:  Algorithms in ApplicationM: Number of ways of implementing each algorithm

Global Area, Energy and latency constraints are specified for the application

44

SiLago: Global Optimisation ‐Min (ML)

Commercial HLS : Local Optimization ‐ min(L)  

Page 45: The SiLago Method: Next Generation VLSI Design Automation

SiLago also automates Interface Synthesis

ADC FDEC DEC RRC ↓2

EQFilter

CRComp SLICER

CarrierAdaptation

EQAdaptation

ClockAdaptation

System Control

45

The interfaces are automatically synthesized depending on the chosen degree of parallelism of algorithms.

Machine translation ensures correct by construction guarantee

SiLago Application Level Synthesis

Page 46: The SiLago Method: Next Generation VLSI Design Automation

What the SiLago Method promises to achieve ?Functio

nal  Ve

rification

Constraints V

erificatio

n: 

Timing/Energy/Pow

er/Area

Chip

Automatic:High‐level, 

RTL / Logic & Physical Synthesis

Logic: Algorithm + RTL + Boolean

System: Multiple applications

Software Design

Architecture Definitionin terms of pre‐designed IPs

Stitch Architecture: Buy and Assemble

System Architecting1. HW/SW Partitioning2. Interface Design3. Memory & Interconnect 

Hierarchy4. I/O Design

46

Page 47: The SiLago Method: Next Generation VLSI Design Automation

Experimental Proofthat the proposed Solution Works

47

Page 48: The SiLago Method: Next Generation VLSI Design Automation

SiLago FSMD Library Development Efficiency

Energy Estimation ErrorSynthesis Runtime (Seconds)100‐1000X Better

0

4e4

8e4

12e4

20e4

16e4

0

200

400

600

1000

800

Physical SynthesisLogic SynthesisHigh‐level Synthesis

SiLago Standard Cell based Synthesis

0

100%

200%

300%

SiLago Standard Cell based Synthesis

1.69 %

48

Page 49: The SiLago Method: Next Generation VLSI Design Automation

Area Overhead

SiLago Standard Cell based Synthesis

And what do we pay for it ?

0

0.2

0.4

0.6

1.0

0.8

1.2

Energy Overhead

SiLago Standard Cell based Synthesis

0

0.2

0.4

0.6

1.0

0.8

1.2

49

Page 50: The SiLago Method: Next Generation VLSI Design Automation

50

Normalized Energy and Area overhead of the Systems generated by the SiLago Design Flow 

0

0.2

0.4

0.6

0.8

1.01.2

1.4

1.6

1.8

2.0

AreaEnergy

Page 51: The SiLago Method: Next Generation VLSI Design Automation

SiLago provides significant improvement in Predictability

51

Energy Estimation Error (%)

100

101

102

103

369 %

223 %

282 %8.3 %

5.9 %

8.1 %

5.5 %185 %

Std. cell based flowSiLago flow

46 %

73 %6.2 %

3.9 %

Page 52: The SiLago Method: Next Generation VLSI Design Automation

Design Space Exploration in SiLago Application‐level Synthesis

0

10

40

20

30

50

60

70

80

JPEGEncoder

WLANTx

LTEUplink

Num

ber o

f Solutions evaluated

 by SiLago

 SLS 90

Seconds

0

25

100

50

75

125

150

175

200

Time requ

red for D

SE

225

52

Page 53: The SiLago Method: Next Generation VLSI Design Automation

00.511.522.53X 1073.5

34

4.5X 106

050

100

150

Sample Interval Number of FSMDs

300

200

100

SiLego Design Space Exploration

53

Page 54: The SiLago Method: Next Generation VLSI Design Automation

Application of SiLago to Neuromorphic Computing

54

Page 55: The SiLago Method: Next Generation VLSI Design Automation

The Evolution of Embedded Systems

Interaction between machine and environment

Static and Finite

Dynamic and Infinite

Page 56: The SiLago Method: Next Generation VLSI Design Automation

Neuromorphic Machines are the answer

Reference: http://www.21stcentech.com/heard‐synapse/

Page 57: The SiLago Method: Next Generation VLSI Design Automation

Implementing Brain in Electronics is non‐trivial 

20 Watts

Riken ‐World’s most efficient supercomputer7 GFlops/watt. BCPNN  140 kWs

Abstract model of CortexBCPNN  1 PetaFlops

Realistically 1 Mega Watts

Page 58: The SiLago Method: Next Generation VLSI Design Automation

What can eBrain achieve ?

2 Kilo Watts20 Watts

~30 MilliWatts 2 Watts

eBrain@KTH

~1 Mega Watts

~1000 Watts

The most efficient Supercomputer

58

Page 59: The SiLago Method: Next Generation VLSI Design Automation

Functional Requirements

BCPNN Requirements

1. Realtime simulation2. 2 Million HCUs3. 1 Petaflops/sec – BCPNN Computation4. 40 TBs – HCU State Storage5. 130 TBs / s  ‐ Bandwidth6. 20 billion spikes / s

Infrastructural Requirements

59

Page 60: The SiLago Method: Next Generation VLSI Design Automation

100 MCUs

10 000

 Con

nections

HCUState 

Memory(20 MB)

MCU State Vector

MCU Row

HCU = MCUs

The BCPNN Computation Model

Input Spike Computation

10 000 Spikes/s

100  × 100 Spikes/s

Support Computation

100 / s

Output SpikeComputation

DelayBuffer

60

Page 61: The SiLago Method: Next Generation VLSI Design Automation

BCU 1 BCU 2

System Controllerto boot, initialize and save/restore the HCU 

State

(a) eBrain: Multi‐chip fabric of BCUs connected by inter‐BCU spike propagation network

Inter‐BCU Spike PropagationNetwork (SPN)

L: Number of BCU ChipsM: Number of H‐Tiles in each BCUN: Number of HCUs in each H‐TileL × M × N = 2 million HCUs

(b) BCU: The Brain Computation Unit is a regular fabric of 1000s of H‐Tiles

Intra‐BCU Spike Propagation 

Interconnect

Clock, Reset, Boot/Configuration, Power Management

H‐Tile

The eBrain System Concept

61

Page 62: The SiLago Method: Next Generation VLSI Design Automation

BCU Logic Chip Organization

BCU Controller Network Interface

Switch for the Inter‐BCU Spike Propagation Network

iSDIN – Incoming Spike Distribution Interconnect

oSDIN – Outgoing Spike Distribution InterconnectOutgoing Spike Dispatcher

Incoming Spike Dispatcher    

BCU Logic Chip ‐ Organisation

H‐Tile 1  

H‐Tile M

H‐Tile 2  

HMWI: H‐Tiles  to HCU

‐State Mem

ory Write InterconnectHM

RI: H

CU‐State M

emory to 

H‐Tiles Re

ad Intercon

nect

......

62

Page 63: The SiLago Method: Next Generation VLSI Design Automation

H‐Tile Organisation

Incoming Spikes Q

ueue & Controller

iSDIN

incoming Spike D

istribution Interconnect

Input Computation Controller

Output Com

putation Controller

Outgoing Spikes Q

ueue & Controller

oSDIN

outgoing Spike Distribution Interconnect

Delay Buffers & Controller for fanout spikes

Scratchpad MemoriesInput Computation

Input ComputationFSM

Input Computation Unit R1 SP FPUs

HCU State Storage Memory Interface

ms Timer

Scratchpad Memories Output Computation

Output ComputationFSM

Output Computation Unit R2 SP FPUs

1 Petaflops

Infrastructural Operations

63

Page 64: The SiLago Method: Next Generation VLSI Design Automation

The SiLago MethodA Structured Physical Design Scheme to enable System‐level synthesis

64

H‐ Tile

H‐ Tile

H‐ Tile H‐ Tile

H‐ Tile

H‐ Tile

H‐ Tile

H‐ TileH‐ Tile

TSVs + Controller

FPUs

SRAMs

H‐Tile Controller

NOC Interface

Ques + Controller

NOC Corridor

NOC Co

rridor

Page 65: The SiLago Method: Next Generation VLSI Design Automation

The SiLago MethodA Structured Physical Design Scheme to enable System‐level synthesis

BCU Ctrl

BCU SRAM

PMC

PLLCGU

NOC

NOC

NOC

NOCBCU 

NI+SW

65

Page 66: The SiLago Method: Next Generation VLSI Design Automation

The Basis for dimensioning

Technology22 nm node3D integrated custom DRAM16 X 82 mm2 die integrated on an interposer

Mouse31 250 HCUs71 MCUs and 1225 connections

ResultsPost layout data for Logic40 nm results conservatively scaled to 22 nm nodeQualified circuit level models of 3D DRAM from TU Kaiserslautern

66

Page 67: The SiLago Method: Next Generation VLSI Design Automation

Mouse eBrainPackage Level Organisation

67

InterposerComputation +Infrastructure Interposer

Computation +Infrastructure

InterposerComputation +Infrastructure Interposer

Computation +Infrastructure

Interposer based package level inegration

16 X 82 mm2 chip with 8 layers of 3D DRAM and32 channels per chip

32 H Tiles of 2.52 mm2

16 HCUs / H Tiles

Page 68: The SiLago Method: Next Generation VLSI Design Automation

68

Organisation and dimensions of H‐Tile

Layer 0

Layer 7Bank 0: 64 Mb

Bank 1: 64 Mb

TSV AreaTSV Area

RIB

Column

Column

RIB

1200m

584 m

200 m

500 m

4 HCUs per bank, 2*8 Banks per layer 64 HCUs per H‐Tile

Page 69: The SiLago Method: Next Generation VLSI Design Automation

Energy Consumption

Computation4.032 Joules

Infrastructure1.814 Joules

DRAM6.912 Joules

+9.878 Joules

Sparse activity, temporal locality, low resolution

~2 Joules

Page 70: The SiLago Method: Next Generation VLSI Design Automation

The SiLago Method also has the potential 

to lower the Manufacturing cost

70

Page 71: The SiLago Method: Next Generation VLSI Design Automation

InnerModem

SiLago can reduce the mask development cost

Protocol Processing Streaming Storage

DataStorage

System Ctrl

ProgramStorage

DRAM CTRLFlash CTRL

Ethernet

PLL/CGUPMC

InnerModem Outer

Modem

71

OuterModem Flexilators

All SiLago designs are composed of a finite number of SiLago block Types

All SiLago blocks can only have a finite types of neighbors

Each SiLago blocks’s mask depending on the neighbor types can be saved as a component mask

The entire design mask can be composed from such component masks

Page 72: The SiLago Method: Next Generation VLSI Design Automation

72

Future & Ongoing Work

SiLago Regions are being expanded to cover the 13 dwarfs of the Berkeley report on parallel computing

Extending Application Level Synthesis to System Level SynthesisAbility to deal with non‐determinism

Using SiLago Method to design1. Complex Radio Systems – project with Catena2. Custom Supercomputer for brain simulation and 

bioinformatics3. Resilient autonomous systems based on neural networks

Extending SiLago to 3D SiLago to achieve end‐to‐end parallelisms

Page 73: The SiLago Method: Next Generation VLSI Design Automation

Thanks for your attention !Questions ?