Institut für Integrierte Systeme (Integrated Systems Laboratory)

VLSI II: Design of Very Large Scale Integration Circuits, 227-0147-00

Dept. of Information Technology and Electrical Engineering, 7th Semester

Prof. Dr. H. Kaeslin, Dr. N. Felber

Training 1: SoC Encounter for Designers II

Beat Muheim, Frank K. Gürkaynak

Issued: 25 October 2011
Supervised exercise sessions: 1 November 2011, ETZ D61.1
8 November 2011, ETZ D61.1

Reminder:

By working on this exercise you declare that you know and observe the rules for the use of CAD software at ETH Zürich. You can read these rules at any time at http://www.dz.ee.ethz.ch/en/our-range/regulations.html.


1 Overview

Unlike other exercises in the VLSI lectures, the back-end design flow requires you to learn how to use a commercial Electronic Design Automation (EDA) tool, in our case SoC-Encounter from Cadence Design Systems. These exercises are therefore called ’Trainings’ and will teach you the basics of SoC-Encounter so that you can use it for your semester projects.

There will be three trainings:

• Training 1
Floorplanning, placement, clock tree synthesis, optimization, routing and timing analysis with SoC-Encounter.

• Training 2
Determining power consumption, IR drop analysis.

• Training 3
Tape-out preparation, performing Design Rule Check (DRC) and Layout Versus Schematic (LVS) on your final database.

Students who plan to work on an ASIC semester project should make sure to visit all three trainings.

Parts of the text that have a gray background, like the current paragraph, indicate steps required to complete the exercise.

2 Introduction

In this training we will start with a structural Verilog design netlist (from synthesis) and create, step by step, a physical layout that can be manufactured. To keep runtimes reasonably low, we will use an example design with a (slightly) lower complexity than most student design projects.

2.1 Example Design

The example design is based on the FIR filter that we have been using in the past exercises. The filter has been changed to include several pipelined filter stages as shown in the block diagram below [1].

[1] The filter is basically useless and has only been engineered as an example circuit suitable for the exercise.

[Figure: block diagram of the example design. It shows the hierarchy levels ’chip’, ’top’ and ’filter’, the pipelined filter stages (labelled filter_stage1 to filter_stage4 and filter_stage8, each with a LUT and data paths of 16, 32 and 48 bits, the first-stage input tied to ’0’), the RAM macro-cell SY180_2048X16X1CM8 with the signals RamRDxD, RamWDxD and RamAddrxD, and the chip pins DataInxDI, DataInReqxSI, DataInAckxSO, ResetxRBI, ClkxCI, ScanEnxTI, RamTestxTI, DataOutxDO, DataOutAckxSI and DataOutReqxSO.]

Each filter stage contains a large multiplier, a look-up table and an accumulator. Note that the input of the first stage is tied to constants and therefore greatly simplified. The following is a short description of all pins of the circuit:

Pin Descriptions

Name           Bits  Dir  Description
ClkxCI         1     In   Clock input
ResetxRBI      1     In   Reset input, active low signal, 0: Reset
ScanEnxTI      1     In   Scan Enable for testing, 1: Scan
RamTestxTI     1     In   Ram bypass control, 1: Test (RAM bypassed)
DataInxDI      16    In   16-bit data input
DataInReqxSI   1     In   Request signal for data input
DataInAckxSO   1     Out  Acknowledge signal for data input
DataOutxDO     16    Out  16-bit data output
DataOutReqxSO  1     Out  Request signal for data output
DataOutAckxSI  1     In   Acknowledge signal for data output

3 Getting Started

You will need a terminal program to type in commands throughout this exercise. On the computers in ETZ D61.1 you can get a terminal by accessing the menu in the top left corner and selecting "Applications > Accessories > Terminal".

Change to your home directory and install the training files with the script provided:

cd ~
/home/vlsi2/t1/install_t1

Change to the design directory

cd training_1

The copied files and folders are arranged in a certain structure which is described in the next section.

3.1 Directory Structure

The following figure shows the directory structure for a design directory that was created by the cockpit tool developed by the Design Zentrum (DZ) of ETH Zurich.

design
  .cockpitrc      Configuration for the cockpit
  calibre         Final layout, DRC and LVS
  encounter
    out           Final output files: netlist, layout, timing (Verilog, GDSII, SDF)
    save          Save files for Encounter (Encounter native format)
    scripts       Example scripts, run scripts (TCL)
    src           Input source files: netlist, constraints, io placement
      sample      Sample input files
    tech          Links to technology files, etc.
      lef         Links to abstracts and technology
      lib         Links to timing libraries
      docs        Links to documents
  modelsim        Simulation tool
  simvectors      Stimuli and expected responses
  sourcecode      VHDL sourcecode
  synopsys        Synthesis environment
  tetramax        Test vector generation, test coverage

In this structure, there are five subdirectories for SoC-Encounter. It is strongly recommended to use them in the following way:

out Place all final data to be exported from SoC-Encounter in this directory. This includes the final netlist (the initial netlist gets modified by clock tree insertion, optimization etc.), layout and delay files that will be used for postlayout simulation and/or physical verification and chip finishing. A sample script that generates all these files is provided (scripts/exportall.tcl).

save Put all SoC-Encounter save files, i.e. files in native SoC-Encounter format, in this directory.

scripts Contains TCL scripts. By default several example scripts for common tasks are provided. It is highly recommended to develop a run script that contains all the commands used for your design.

src All user input files should be placed here. These include the initial Verilog netlist, the I/O placement file, timing constraints file and clock tree definition file (all will be explained later in section 3.2).

tech Holds links to technology specific files. Cockpit manages this directory automatically.

3.2 Input Files

The input files required for back-end design with SoC-Encounter can be divided into two categories:

• Design files that describe (or are closely related to) the circuit, first of all the Verilog netlist of our synthesized design.

• Technology files that describe the technology itself as well as libraries of standard building blocks implemented in this technology.

Let’s start with the first category.

3.2.1 Verilog Netlist

The Verilog netlist we obtain from synthesis contains standard cells, functional I/O pads and their interconnection information. While the functionality including scan circuitry is already complete, some special cells are still missing:

• Supply pads to provide power and ground to the core (pads ’VCCKD’ and ’GNDKD’) and to the padframe (pads ’VCC3IOD’ and ’GNDIOD’).

• Corner pads that need to be placed in the corners of the padframe to complete the power lines running inside the padframe (pad CORNERD).

Due to the arrangement we have with our ASIC manufacturer, student designs are strictly limited in size. As a consequence, at most 48 pads (not including the 4 corner pads) can be placed in the padframe. Furthermore, to ease chip testing on the ASIC tester, two predefined power schemes have been established:

1. 40 signal pads, 8 supply pads (recommended for normal designs)

2. 32 signal pads, 16 supply pads (extra power pads for fast designs)
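As a quick check of the pad budget, the following sketch adds up the pin widths from the pin description table in section 2.1 and compares the result with the two power schemes. It assumes one pad per signal bit, which is an assumption of this sketch; the helper name fitting_schemes is hypothetical.

```python
# Pin widths taken from the pin description table of the example design.
pin_bits = {
    "ClkxCI": 1, "ResetxRBI": 1, "ScanEnxTI": 1, "RamTestxTI": 1,
    "DataInxDI": 16, "DataInReqxSI": 1, "DataInAckxSO": 1,
    "DataOutxDO": 16, "DataOutReqxSO": 1, "DataOutAckxSI": 1,
}

# The two predefined power schemes: (signal pads, supply pads).
power_schemes = {1: (40, 8), 2: (32, 16)}
MAX_PADS = 48  # corner pads are not counted

# Assumption of this sketch: every signal bit needs exactly one pad.
signal_pads_needed = sum(pin_bits.values())


def fitting_schemes(needed):
    """Return the scheme numbers that offer at least `needed` signal pads."""
    return [
        number
        for number, (signal, supply) in sorted(power_schemes.items())
        if signal >= needed and signal + supply <= MAX_PADS
    ]


print(signal_pads_needed, fitting_schemes(signal_pads_needed))
```

Under this assumption the example design needs 40 signal pads, so only power scheme 1 offers enough of them.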

Take a look at the following web page for an illustration of the power schemes and to obtain further information on constraints for the semester design projects.

http://www.dz.ee.ethz.ch/en/information/ic-technologies/umc/180/mini-asic-setup.html

With all this information we are now ready to add the missing corner and supply pads to our Verilog netlist.

A typical Verilog netlist that you will obtain from synopsys will contain many levels of hierarchy. Each level of hierarchy is enclosed between the

module name ( pin names separated by comma )
...
endmodule

statements, where ’name’ refers to the name of the module (module is the Verilog equivalent of an entity in VHDL). In our case we need to add the pads to the top-level module which contains the rest of the I/O pads. The top-level design is almost always the last module definition in a Verilog file [2].

Copy the Verilog netlist to ’encounter/src/’ in order to have a clean copy of the initial netlist even if synthesis is rerun.

cd encounter/src/
cp -p ../../synopsys/netlists/chip.v chip.v.initial

The file ’specialpads.v’ contains four corner pads and 8 supply pads corresponding to power scheme 1. As our design uses power scheme 1, no changes are required to this file. For power scheme 2 we would have to uncomment the eight additional supply pads (comments in Verilog start with //).

What remains to be done is to add the contents of ’specialpads.v’ at the right point, i.e. where the other pads are, to the initial netlist.

Using a text editor [a], open ’chip.v.initial’ and find the definition of the top-level module ’chip’ by searching for:

module chip

Below this declaration you should see lines that instantiate the pads. Insert the contents of ’specialpads.v’ at this point. As long as you are in the module body, it does not matter where exactly you insert them.

Save the file as ’chip.v’ and exit the text editor.

[a] There are many text editors you can use. There are terminal based editors (vi, vim, nvi, joe, jed, pico, nano etc.), editors that are mainly terminal based but have a simple GUI (emacs, xemacs, gvim etc.), and GUI based editors (mousepad, gedit, nedit, kate etc.). Out of these emacs, vi (and derivatives), and nedit are the most advanced editors.

Remark: In the future you can use a small Perl script to add the special pads to the initial netlist, i.e.

./insert_specialpads ../../synopsys/netlists/chip.v ./specialpads.v > chip.v

inserts the contents of ’specialpads.v’ into the last module defined in ’../../synopsys/netlists/chip.v’ and writes the modified netlist to ’chip.v’.
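The insertion step can be illustrated with a minimal Python sketch. It is not the Perl script itself, whose internals are not shown here; it only assumes that the top-level module is the last one in the file and places the extra pad instances just before the final ’endmodule’. The function name, the two-module example netlist and the instance name pad_corner_ul are hypothetical.

```python
def insert_into_last_module(netlist: str, pads: str) -> str:
    """Insert `pads` just before the final 'endmodule' of `netlist`.

    Simplification of this sketch: the last occurrence of the text
    'endmodule' is taken as the end of the top-level module, so a
    trailing comment containing that word would confuse it.
    """
    idx = netlist.rfind("endmodule")
    if idx < 0:
        raise ValueError("no 'endmodule' found in netlist")
    return netlist[:idx] + pads.rstrip("\n") + "\n" + netlist[idx:]


# Tiny made-up netlist with two modules; 'chip' is the last (top-level) one.
example_netlist = (
    "module top ( a, b );\n"
    "endmodule\n"
    "module chip ( ClkxCI );\n"
    "  XMD pad_clock ( .I(ClkxCI) );\n"
    "endmodule\n"
)
# Hypothetical stand-in for the contents of specialpads.v.
example_pads = "  CORNERD pad_corner_ul ();\n"

merged = insert_into_last_module(example_netlist, example_pads)
print(merged)
```

Because the position inside the module body does not matter, appending the pads at the end of the module is as good as inserting them next to the other pad instances.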

[2] The content of the module needs to be defined before it can be instantiated by a different module. Consequently the top-level module is the last to be defined. However, not all Verilog files need to be hierarchical, and a design can also be spread over multiple files.

3.2.2 I/O File

After the last step our Verilog netlist contains all pads. However, there is no information that actually tells the tool where each pad should be placed. The pad placement is very important as it directly determines the PCB layout [3]. In our case, we want all designs to share common power and ground pad locations so that a single test board can be used on our ASIC tester. For practical reasons we have decided to use a 56-pin package for all designs. So even though the chip has only 48 physical pins, it will be placed in a package that contains 56 pins [4]. Depending on the power configuration, a different bonding scheme will be used. These two configurations can be seen on the following webpage:

http://www.dz.ee.ethz.ch/en/information/ic-technologies/umc/180/mini-asic-setup.html

The cockpit will copy sample I/O files automatically to the src/sample directory [5]. All lines starting with ‘#‘ are comments. The file consists of two main sections: ’globals’ and ’iopad’.

(globals
    [global definitions]
)
(iopad
    (topleft
        [pads that are on the top left]
    )
    (left
        [pads that are on the left side]
    )
    [definitions for other sides]
)

For us the relevant part is the ’iopad’ section. This part contains eight subsections that define the names of the pad instances and their locations on the four sides and in the four corners. We do not have to touch the corner specifications [6] as they will be the same for all designs. We have to distribute the pads among the four sides of the chip: top, right, bottom, left. If you look at the sample file you will see that for each pad there is a single-line entry in the following form

(inst name="NAME_OF_PAD" offset=OFFSET_VALUE ) # pin no: PIN_NUMBER

The last part following # is a comment; it is there just for your information. Regardless of the power scheme you are using, we will use the same 56-pin package as illustrated on the webpage above. The PIN_NUMBER is just a reminder to show which particular location is being defined. The location is specified using the OFFSET_VALUE. SoC-Encounter uses a coordinate system that places the coordinate (0,0) at the bottom-left corner as shown in the figure below:

[Figure: padframe coordinate system with the origin (0,0) at the bottom-left corner, the four sides ’top’, ’bottom’, ’left’ and ’right’, the four corners ’topleft’, ’topright’, ’bottomleft’ and ’bottomright’, and the offset of pads 1, 2, 3 along a side.]

[3] A good pinout could simplify the routing on the PCB, allow you to use fewer layers and result in fewer parasitics.

[4] 8 pins will be left unconnected.

[5] For this technology there will be four files. There will be two template files, chip.io-template and chip-ep.io-template, for the normal and extended power configuration respectively. These files have all the required power connections in place, and the data sections are commented out. There are also two example files that have fictional I/O placement where all pins are defined.

[6] topleft, topright, bottomleft, bottomright

On the ’left’ and ’right’ side the pads will be ordered from bottom to top, and on the ’top’ and ’bottom’ side the pads will be ordered from left to right. This ordering can be quite confusing, as it is neither clockwise nor counterclockwise. Therefore the aforementioned comments showing the actual pin numbers will be very useful.

The ’OFFSET_VALUE’s given in the template represent fixed locations for the given pad. It is very important that you do not change these values, as the chip-finishing part will rely on the pads being located exactly at these locations.

You can assign your pads by writing the name of each pad into the corresponding ’NAME_OF_PAD’. The name of the pad will be the name of the instance in the Verilog file. For example, assume that you are using the standard power scheme and your clock signal is assigned to a pad named ’pad_clock’. In your Verilog file you would have the following entry for this pad:

XMD pad_clock ( .I(ClkxCI) [other pin definitions] )

If you now want to place this pad on pin number 54 of your package, you will find the subsection ’top’ in the I/O file and edit the line for pin 54:

...
(iopad
    ...
    (top
        ...
        (inst name="pad_clock" offset=328.6 ) # pin no: 54
        ...
    )
    ...
)

Be careful: do not modify the offset value while you are editing the I/O file [7]. Since we use a fixed bonding scheme for the power and ground pins, all we need to do is extract the instance names of all our signal pads and place them by inserting each name into the ’inst name=""’ statement whose ’OFFSET_VALUE’ corresponds to the desired location.

[7] Please note that if you used the extended power scheme, pin number 54 would have a different offset (234.36), since in the extended power scheme the pin assignment is slightly different.
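Since an accidentally changed offset is easy to overlook, a small script can compare an edited I/O file against a reference. The sketch below is only an illustration: it assumes every pad entry sits on one line in the form shown above, including the trailing pin number comment, and the helper names are hypothetical. The two offsets used in the example are the values for pin 54 in the standard and the extended power scheme.

```python
import re

# One pad entry per line: (inst name="..." offset=... ) # pin no: N
INST_RE = re.compile(
    r'\(inst\s+name="(?P<name>[^"]*)"\s+offset=(?P<offset>[0-9.]+)\s*\)'
    r'\s*#\s*pin no:\s*(?P<pin>\d+)'
)


def parse_io_lines(text):
    """Map pin number -> (pad name, offset as written) for active entries."""
    pads = {}
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # whole line is commented out
        match = INST_RE.search(stripped)
        if match:
            pads[int(match.group("pin"))] = (
                match.group("name"),
                match.group("offset"),
            )
    return pads


def changed_offsets(reference, edited):
    """Return the pin numbers whose offset differs from the reference."""
    return sorted(
        pin
        for pin, (_, offset) in edited.items()
        if pin in reference and reference[pin][1] != offset
    )


standard = '(inst name="pad_clock" offset=328.6 ) # pin no: 54'
extended = '(inst name="pad_clock" offset=234.36 ) # pin no: 54'

reference_pads = parse_io_lines(standard)
print(reference_pads[54])
print(changed_offsets(reference_pads, parse_io_lines(extended)))
```

The offsets are compared as text, exactly as written in the files, so no floating-point rounding can hide or invent a difference.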

Preparing the I/O file from scratch can be a lengthy and tedious task. To avoid unnecessary work during this exercise we will start with an almost complete I/O file, but before doing so we will describe the full procedure recommended when starting from scratch:

1. Start SoC-Encounter and proceed to design import [8] by selecting "Design > Design Import". In this form make sure that the "IO Assignment File" is empty.

2. If everything works well, the design will be loaded. Now we can write out a template file that will contain all the names of the pads. Use "Design > Save > I/O File ..." to save an I/O file src/chip-sequence.io. You can select the ’sequence’ checkbox; however, it is not imperative. What we need is only the names of the pads.

3. Copy the template I/O file ’src/sample/chip.io-template’ to ’src/chip.io’. As noted earlier, this file includes all ’offset=’ statements, and all statements for corner and supply pads.

4. Using a text editor open the files ’src/chip.io’ and ’src/chip-sequence.io’. You need to move the ’PAD_NAME’s from the file ’src/chip-sequence.io’ to the correct positions in the file ’src/chip.io’.

5. All entries for data pins in the template file are by default commented out using the ‘#‘ character. Do not forget to remove the comment character for the pads you are using.

Now, for this exercise you can start with the almost complete I/O file ’src/chip.io-incomplete’ instead of the template file. This file has all the pads placed properly with the exception of the 16 pads of the input bus ’DataInxDI’ which are still missing.

Furthermore, the file ’src/chip-sequence.io’ mentioned above has already been generated for you.

The desired I/O assignment is depicted in the figure below and can also be found in the file ’src/chip_io.ps’ [a].

Create the complete I/O file and save it as ’src/chip.io’.

[a] Postscript viewers were very common in the earlier days; you can use gv, kghostview, or evince to view this file.

You can use the utility ’src/io2ps.pl’ to generate a postscript file from your I/O file. This utility will also verify whether you have used the correct offset locations in your I/O file, and will report errors. For best results, you should also provide the Verilog netlist file, which will enable the script to make even more checks.

./io2ps.pl chip.io chip.v > chip_pin_diagram.ps

The ’src/io2ps.pl’ utility uses a configuration file with the extension ’.pads’. By default the file ’src/io2ps.pads’ will be used. If you are planning to use the extended power scheme, you will have to add the configuration file ’src/io2ps-ep.pads’ to the command as well.

[8] Importing the design will be covered in detail in Chapter 4.

[Figure: desired I/O assignment on the 56-pin package (pins 1 to 56). The pad names appear in the figure in four groups of 14, in this order:

Group 1: NO_CONNECTION, DataInxDI_PAD_9, DataInxDI_PAD_8, DataInxDI_PAD_7, DataInxDI_PAD_6, DataInxDI_PAD_5, pad_gnd_c1, pad_vcc_c1, DataInxDI_PAD_4, DataInxDI_PAD_3, DataInxDI_PAD_2, DataInxDI_PAD_1, DataInxDI_PAD_0, NO_CONNECTION

Group 2: pad_vcc_p1, DataInxDI_PAD_10, DataInxDI_PAD_11, DataInxDI_PAD_12, DataInxDI_PAD_13, DataInxDI_PAD_14, NO_CONNECTION, NO_CONNECTION, DataInxDI_PAD_15, DataInReqxSI_PAD, DataOutAckxSI_PAD, RamTestxTI_PAD, ScanEnxTI_PAD, pad_gnd_p1

Group 3: NO_CONNECTION, ResetxRBI_PAD, ClkxCI_PAD, DataOutxDO_PAD_0, DataOutxDO_PAD_1, DataOutxDO_PAD_2, pad_gnd_c2, pad_vcc_c2, DataOutxDO_PAD_3, DataOutxDO_PAD_4, DataOutxDO_PAD_5, DataOutxDO_PAD_6, DataOutxDO_PAD_7, NO_CONNECTION

Group 4: pad_vcc_p2, DataOutReqxSO_PAD, DataInAckxSO_PAD, DataOutxDO_PAD_15, DataOutxDO_PAD_14, DataOutxDO_PAD_13, NO_CONNECTION, NO_CONNECTION, DataOutxDO_PAD_12, DataOutxDO_PAD_11, DataOutxDO_PAD_10, DataOutxDO_PAD_9, DataOutxDO_PAD_8, pad_gnd_p2

The assignment of these names to pin numbers is shown in ’src/chip_io.ps’.]

3.2.3 Timing Constraints

Just as for synthesis, we need to specify timing constraints for the backend design with SoC-Encounter.

With decreasing process geometries the impact of placement and routing on timing, power, etc. is steadily increasing. Therefore, timing analysis and optimization have become very important in order to arrive at a layout that (still) satisfies all requirements.

As SoC-Encounter supports most of the more common Synopsys commands/constraints it should be rather straightforward to create an appropriate timing constraints file based on the constraints used for synthesis.

There is an example constraint file ’src/sample/chip.sdc-sample’ that contains the most commonly used commands along with many useful and important comments.

Copy this file to ’src/chip.sdc’ and modify it so that the following constraints get set (and nothing else!):

• Define a 125 MHz clock

• Specify 3.5 ns input delay for all inputs

• Specify 5.0 ns output delay for all outputs

• Specify an input transition time of 0.8 ns at all inputs

• Specify a 15 pF output load for all outputs
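To get a feeling for what these numbers mean, the following sketch converts the clock frequency into a period and derives the time that remains on the chip for paths starting at an input or ending at an output. It is a rough budget only: setup times, clock skew and clock uncertainty are ignored, and the variable names are chosen for this illustration.

```python
# Values from the list of constraints above.
clock_mhz = 125.0
input_delay_ns = 3.5
output_delay_ns = 5.0

# A clock of f MHz has a period of 1000 / f nanoseconds.
period_ns = 1000.0 / clock_mhz

# Time left inside the chip for input-to-register and
# register-to-output paths (setup time and skew ignored).
input_path_budget_ns = period_ns - input_delay_ns
output_path_budget_ns = period_ns - output_delay_ns

print(f"clock period:       {period_ns:.1f} ns")
print(f"input path budget:  {input_path_budget_ns:.1f} ns")
print(f"output path budget: {output_path_budget_ns:.1f} ns")
```

With a period of 8 ns, the input pads and the logic behind them have about 4.5 ns, while the logic driving the outputs, including the output pads with their 15 pF load, has about 3 ns.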


3.2.4 Technology Files

The ’tech’ directory and its two subdirectories contain technology files that describe the technology itself as well as libraries of standard building blocks implemented in this technology, i.e. standard cells, pads, RAM/ROM.

• Technology files (UMCL180)

lef/header6_V55.lef Base technology description, defines metal layers, vias, spacing rules, routing

umcL180.capTbl Table used to extract parasitic capacitances and resistances for signal and power wires.

streamout.map Layer mapping table used when exporting the final layout in GDSII format.

• Library files (standard cells, pads, macro-cells)

lef/*.lef Physical description, shape and allowed orientation of cells, layer and shape of pins, blockages, antenna information, ...

lib/*.lib Functional description, timing and power information, maximum load/fanout or transition time allowed, ...

3.2.5 Macro-cells

The macro-cells for the umcL180 process are created using dedicated memory compilers. The specific memory compiler we have access to is able to create five different types of macro-cells with various capacities:

• SU180_ : single-port static RAM

• SJ180_ : dual-port [9] static RAM

• SY180_ : single-port register-file [10]

• SZ180_ : two-port [11] register-file

• SP180_ : via programmable ROM

The following parameters are used for the macro-cells:

• words
Number of words in the memory.

• sub-word size
Number of bits within a sub-word of the memory. The sub-word is the smallest unit used for data access in the macro-cell [12].

• number of sub-words per data word
This parameter allows creating multiple sub-words. Each sub-word can be written to separately. For example, a 32-bit RAM can be configured as having a single 32-bit sub-word, or two 16-bit sub-words, four 8-bit sub-words and so on.

• column or block multiplexer
This parameter affects the geometry of the macro-block. This can have significant influence on the performance of the macro-block. There is no general rule to determine this parameter. Once the memory requirements are known, all possible geometries will be considered and the most suitable one will be determined.

[9] Dual-port memories have two completely independent access ports. At the same time two separate memory addresses can be accessed for both read and write.

[10] Although the name suggests that the memory is made out of individual registers, it is very similar in design to SRAM.

[11] In two-port memories, the read and write ports are separate, so you can simultaneously read and write. There are timing constraints for reads and writes to the same address; please refer to the memory compiler manual for details.

[12] In many places this sub-word is referred to as ’byte’. This might be slightly confusing, since a byte is commonly accepted to be an information unit consisting of 8 bits.

There are several available macro cells, their datasheets can be found under:

/usr/pack/designkits-1.0-ma/umc_L180/faraday/gen/memaker/200901.1.1/datasheet.dz

If none of the available macro-cells suit your needs, more can be easily generated on demand. Please contact the Microelectronics Design Center for this purpose.

Our example design uses a single-port RAM named SY180_2048X16X1CM8. This RAM has 2048 words of 16 bits each (single sub-word) and a block multiplexer of 8. All necessary preparations to work with this macro-cell have already been done, so you do not need to do anything additional for this exercise.
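The name of the example macro-cell appears to encode the parameters listed above. The sketch below splits such a name into its fields and computes the capacity. The naming pattern (type prefix, words, sub-word size, number of sub-words, multiplexer factor after ’CM’) is inferred from this single example and is an assumption, not a documented rule of the memory compiler; the function name is hypothetical.

```python
import re

# Assumed pattern, inferred from 'SY180_2048X16X1CM8' only:
# <type>180_<words>X<sub-word bits>X<sub-words>CM<multiplexer>
NAME_RE = re.compile(
    r"^(?P<family>S[UJYZP]180)_(?P<words>\d+)X(?P<bits>\d+)"
    r"X(?P<subwords>\d+)CM(?P<mux>\d+)$"
)


def parse_macro_name(name):
    """Split a macro-cell name into its fields (under the assumed pattern)."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"name does not follow the assumed pattern: {name}")
    fields = {"family": match.group("family")}
    for key in ("words", "bits", "subwords", "mux"):
        fields[key] = int(match.group(key))
    # Capacity = words x sub-word size x sub-words per word.
    fields["total_bits"] = fields["words"] * fields["bits"] * fields["subwords"]
    return fields


ram = parse_macro_name("SY180_2048X16X1CM8")
print(ram)
print(ram["total_bits"] // 8, "bytes")
```

For the example macro-cell this gives 2048 words of 16 bits, i.e. 32768 bits or 4096 bytes.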

4 Importing the Design

Start SoC-Encounter [a] either from your design directory by using cockpit

cd ~/training_1
icdesign umcL180 &

or from the encounter directory by issuing the command

cd ~/training_1/encounter
cds_soc81 encounter

[a] This exercise uses version 8.1 of SoC Encounter. There are newer versions of this software (9.1.x and 10.1.x); however, the main principles have not changed much, so we will continue to use this version for this exercise. Newer versions have slightly changed GUI elements and improved capabilities for some functions.

We will now import our design.

SoC-Encounter uses a large configuration file that defines the design and technology files to be loaded as well as some global settings to be applied.

Cockpit automatically generates an appropriate sample configuration file src/sample/chip.conf that should be used to start with.

Copy the sample file into the ’src’ directory.

cp src/sample/chip.conf src/

Select "Design > Import Design ..." to open the design import form. This form contains fields for all configuration options. At the bottom of this window, there are buttons to load and save the configuration from/to a file. Use the ’Load ...’ button to load the configuration file we have just copied to the ’src’ directory.

On the “Basic” tab make sure that ’Verilog Netlist:’, ’Timing Constraint File:’ and ’IO Assignment File:’ match your design. ’Common Timing Libraries:’ and ’LEF Files:’ should already be correct.

On the “Advanced” tab the only setting you might want to adapt for your design is the “Default Delay Pin Limit:” in the category “Delay Calculation”. We will explain this a bit later.

Once you are happy with the configuration don’t forget to save your changes to the configuration file.

Click ’Ok’ to import your design. Monitor the messages on the console for errors [a]. Pay attention to the messages where the timing constraint file is loaded (“Reading timing constraint file”) to see if everything was accepted! If there are errors, you need to fix them!

[a] You can ignore warnings (SOCLF-58), (SOCLF-200), (TECHLIB-436), (SOCSYC-2), (EMS-27).

We are now in the floorplan view of SoC-Encounter which displays an empty floorplan with only the pads placed. All top level module(s) of the netlist are shown as a pink/purple square to the left and all macro-cells to the right. Note that all standard cells are inside the module(s).

5 Floorplanning

Now we will have to decide how cells and macro-cells will be placed on our chip. This process is called floorplanning. For a standard design, our main concern would be to find a floorplan that will result in the smallest possible area, while fulfilling all performance and reliability requirements. This is purely driven by economic reasons, since chip costs are mainly determined by the area. In some cases there are additional geometrical constraints. The manufacturing company may impose certain limits on the aspect ratio of the final layout [13], or even dictate the maximum height or width of the layout.

[13] Especially in MPW runs, a lot of silicon area is wasted if all designs have wildly different dimensions.

Back-end design is not only used for complete chips. Macro-cells that will be part of a larger system-on-chip design can also be designed in this way. In such cases there might be even more restrictions. For example, certain metal layers might be reserved for the system level.

So the question is, “How small can my layout be so that I am still able to fulfill all specifications?”. As a lower bound, you will need enough area to place all your I/O pads and standard cells. Ideally, in terms of area (and assuming your design is not pad limited, see exercise 2), you will want to place standard cells without leaving extra space in between, completely filling out the core area. This is hardly ever possible because:

• The number of interconnections that can pass through a certain area is limited by the number of metal layers available [14], wire width and minimum spacing requirements. Depending on the interconnection overhead, the area above the cells [15] may not be sufficient for routing.

• Timing is greatly affected by the placement of your cells. Placing them next to each other with no space in between does not leave the tool any flexibility in placing cells. This in turn reduces the optimization options of the tool, like the ability to cluster cells that are closely interconnected.

• All designs require power routing for operation. Some wires of the power connection limit where the cells can be placed, or restrict signal routing, which in turn increases the area requirement.

• The majority of designs require a clock tree to function. This clock tree is added during the back-end design. This requires additional area for the buffers used in the clock tree. Furthermore, the clock tree synthesis algorithm can produce better results if it has more freedom to place its buffers.

• Macro-cells, like the RAM in our example, usually require some extra space along the edges so that they can properly be connected to power and signal lines.

• Designs that have a high switching activity require a lot of current for a short time, which is called a surge. The power distribution network may need additional decoupling capacitors to store some charge that can provide some of the current of the standard cells during such a surge. Additional space for these decoupling cells may be required during placement.

As a consequence, the standard cell rows (which form the core area) cannot be filled completely with standard cells; in other words, some free space needs to remain in between cells.

Utilization indicates to what extent the standard cell rows are filled. 100% utilization is the upper bound where all cells are abutted and there is no extra space, while a utilization of 50% means that half of the core area is empty.
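The definition of utilization can be written down in a few lines. The sketch below treats utilization as total standard cell area divided by core area, which is a simplification of this illustration (macro-cells and halos are ignored), and all area values are made-up numbers, not data from the example design.

```python
def utilization(cell_area, core_area):
    """Fraction of the core area that is occupied by standard cells."""
    if core_area <= 0:
        raise ValueError("core area must be positive")
    return cell_area / core_area


def required_core_area(cell_area, target_utilization):
    """Core area needed so that the cells reach the target utilization."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target utilization must be in (0, 1]")
    return cell_area / target_utilization


# Made-up example: 0.6 mm^2 of standard cells.
cells_mm2 = 0.6
print(f"{utilization(cells_mm2, 1.2):.0%} utilization in a 1.2 mm^2 core")
print(f"{required_core_area(cells_mm2, 0.75):.2f} mm^2 core for 75% utilization")
```

The second helper shows the reverse question that comes up when the core area is constrained: how large the core has to be for a chosen target utilization.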

Usually, it is not possible to predict whether or not it is possible to fulfill all requirements with a certain utilization [16]. You will have to try and find out. This is the main reason why back-end design is an iterative process [17].

5.1 Semester Projects

The MPW provider used for the semester projects offers modules called Mini Asic (mini@sic) with a size of 1379.5 µm × 1379.5 µm. Therefore, the chip size for the semester project ASICs is fixed.

Please refer to the following web page to learn the details.

http://www.dz.ee.ethz.ch/en/information/ic-technologies/umc/180/mini-asic-setup.html

As a consequence, we only have to make sure that our design fits on this area, and there is no need to find the smallest possible layout. We may however need to constrain the core area to make it smaller if the utilization is too low, since a spread-out design has longer interconnections that may adversely affect timing.

[14] For our technology there are 6 metal layers.

[15] Cells in our technology use mostly the lowest metal layer Metal-1 and very rarely Metal-2 for internal connections; all other layers are free for routing.

[16] Both placement and routing are separately NP-complete problems; without completing the routing and placement you will not know if it is possible to fulfill the requirements.

[17] Obviously, technology plays an important role, and it is possible to give certain guidelines for a technology. However, backend design is always highly dependent on the design itself. You will usually see in a few iterations what is possible and what is not.

5.2 Sketching a Floorplan

Before we go on with SoC-Encounter we need to do some planning and understand some key concepts. The figure on the following page is an example floorplan (not an ideal one) that shows the important concepts.

In SoC-Encounter die area corresponds to the total silicon area available to place pads (excluding bonding area for this technology) and core cells. For the semester projects this is strictly limited to 1379.5 µm × 1379.5 µm. All pads (I/O, power and corner) are placed in what is known as the padframe. The remaining area can be used for the core of the chip. For semester projects the theoretical maximum for core area is 1099.26 µm × 1099.26 µm = 1.21 mm².

As can be seen from the figure, the core area is surrounded by a core power ring. In its simplest formthis consists of two (one for VCC, one for GND) wide18 metal lines that evenly distribute the power allaround the chip. In order to leave room for the power ring, we need to leave a certain I/O to corespacing.

The standard cells are designed in such a way that, when placed next to each other their VCC and GNDpins can be connected with a horizontal power line. These horizontal lines are then extended to the corepower ring. These power connections are relatively narrow (0.76 µm in the technology that we use) andrun over the entire width of the core area. This could be a problem for designs that consume much power,since the cells towards the middle would not have a good power connection19. To improve this, verticalpower stripes that connect to the horizontal power lines can be added, thereby forming sort of a mesh.
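To get a feeling for why the stripes help, the following rough sketch models one standard cell power rail as a uniform resistor with evenly distributed current draw, fed from both ends. For such a rail the worst-case drop occurs in the middle and equals I·R/8. Only the 0.76 µm rail width is taken from the text; the sheet resistance, rail length and row current are made-up illustration values, and the stripes are idealized as perfect supply points.

```python
def rail_resistance(length_um, width_um, sheet_res_ohm_sq):
    """Resistance of a rectangular metal rail: number of squares times sheet resistance."""
    return sheet_res_ohm_sq * length_um / width_um


def worst_ir_drop(length_um, width_um, sheet_res_ohm_sq, current_a, segments=1):
    """Worst-case IR drop (V) on a uniformly loaded rail fed from both ends.

    The rail is split into `segments` equal pieces, each fed from both ends
    (by the core ring or by an idealized stripe). A uniformly loaded piece
    with total resistance R and total current I drops I*R/8 at its midpoint.
    """
    seg_r = rail_resistance(length_um / segments, width_um, sheet_res_ohm_sq)
    seg_i = current_a / segments
    return seg_i * seg_r / 8.0


# Assumed values for illustration only (not taken from the design kit):
RAIL_LENGTH_UM = 1000.0   # roughly the width of the core
RAIL_WIDTH_UM = 0.76      # rail width quoted in the text
SHEET_RES = 0.08          # ohm/square, hypothetical Metal-1 value
ROW_CURRENT_A = 2e-3      # hypothetical current drawn by one row

no_stripes = worst_ir_drop(RAIL_LENGTH_UM, RAIL_WIDTH_UM, SHEET_RES, ROW_CURRENT_A)
two_stripes = worst_ir_drop(RAIL_LENGTH_UM, RAIL_WIDTH_UM, SHEET_RES, ROW_CURRENT_A, segments=3)
print(f"no stripes : {no_stripes * 1e3:.2f} mV")
print(f"two stripes: {two_stripes * 1e3:.2f} mV")
```

In this simplified model, two stripe sets that cut the row into three equal pieces reduce the worst-case drop by a factor of nine, because both the resistance and the current of each piece shrink to one third.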

The core area is filled with standard cell rows on which later all standard cells will be placed. In the same area we will usually also need to make room for our macro-cells. Most macro-cells need some ’free space’ around themselves. This free space is required to make signal connections, add a block power ring around the macro-cell or simply to prevent standard cells from being placed too close to the macro-cell. We will define a block halo to specify this free space.

When placing a macro-cell, you should also take into account where the power and signal pins of the block are located and what metal layer they are on. Often signal connections are only on two edges and you want them to face the core and not the I/O pads.

Now, when we consider all the above, the core area that remains free to place core cells on is much smaller than the 1.21 mm² that we started with. Our example design has a total cell area (including RAM) of 0.82 mm² and should therefore comfortably fit into the designated area.
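The numbers quoted so far can be cross-checked with a few lines of arithmetic. The sketch below only uses figures from the text (die side, maximum core side, total cell area); the padframe depth is simply inferred as half the difference between the two side lengths, and the utilization is a best-case value that ignores the power ring, stripes and block halo.

```python
DIE_SIDE_UM = 1379.5        # fixed mini@sic die size
MAX_CORE_SIDE_UM = 1099.26  # theoretical maximum core side length
CELL_AREA_MM2 = 0.82        # total cell area of the example design, RAM included


def square_area_mm2(side_um):
    """Area of a square with the given side length, converted from um^2 to mm^2."""
    return side_um * side_um / 1e6


def utilization(cell_area_mm2, core_area_mm2):
    """Fraction of the core area occupied by cells."""
    return cell_area_mm2 / core_area_mm2


max_core_area = square_area_mm2(MAX_CORE_SIDE_UM)
padframe_depth = (DIE_SIDE_UM - MAX_CORE_SIDE_UM) / 2  # strip taken by the pads on each side
best_case_util = utilization(CELL_AREA_MM2, max_core_area)

print(f"maximum core area  : {max_core_area:.2f} mm^2")
print(f"padframe depth     : {padframe_depth:.2f} um per side")
print(f"utilization (best) : {best_case_util:.1%}")
```

Every micrometre reserved for core margins, rings and halos shrinks the usable core area, so the utilization seen later in the tool will be higher than this best-case figure.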

[18] The width of the metal line depends on the amount of current drawn from the line; you will be able to judge this better after exercise 3, which is dedicated to estimating the power consumption. We will mostly use a width of 20 µm, since this is the widest metal that can be manufactured without slotting (wider metal lines require slots/holes which break up the metal shape).

[19] The problem is that if much current is drawn, there will be a significant IR drop along the power lines. The cells in the middle will be supplied with a lower VCC than the ones on the sides. This could dramatically affect the performance of the system.


[Figure: example floorplan. Labels: die (1379.5 µm wide), core area (1099.26 µm wide), I/O and corner pads placed on the padframe, I/O to core spacing, power pad connections, core power ring (VDD, GND), power stripe, standard cell rows with standard cells and standard cell power connections, macro cell (RAM) with block power ring, block power connection and block halo.]

5.3 Initialize Floorplan

We are now ready to proceed with SoC-Encounter.

From the menu select "Floorplan → Specify Floorplan...". A large window will open.

Select the ’Die Size by: Width and Height’ option and make sure that both values are 1379.5.

Now we need to specify the I/O to core spacing by filling in the four values under the ’Core Margins by:’ entry. There must be sufficient room for the power ring around the core area. Larger values will reduce the area available to place the core cells, thereby increasing core utilization.

As noted earlier, some iterations are usually required to find optimal values for a particular design.

In this exercise we will assume that we will use one VCC and one GND line of maximum width 20 µm. We need some extra space between the lines and, for the moment, we can start with a distance of 45 µm for all sides and click on ’OK’.


The floorplan should now look as shown in the screen-shot below. Note that the pads are all placed at their proper locations, as the I/O file used during design import specifies absolute locations and we made sure that the die size stays fixed to the proper size during the initialize floorplan step.


Next we need to place the RAM macro-cell. Change the cursor mode to ’Move/Resize/Reshape’ by selecting the appropriate icon (next to the ruler icon) or use the keyboard shortcut ’SHIFT-R’. Now you can select the RAM macro-cell and drag it to any location you like. The blue lines displayed are so-called flightlines that show where the signal connections to the block are.

You can change the orientation of the RAM by either using "Floorplan → Edit Floorplan → Flip/Rotate Instances ..." (or press ’r’), or with the attribute editor (press ’q’). Note that the RAM macro will completely block Metal-1, Metal-2, Metal-3 and Metal-4. Only Metal-5 and Metal-6 will be available for routing over the RAM macro-cell[20].

5.4 Power Planning

The next step is to create the power distribution network.

The Verilog netlist that we started with does not contain any power connections, therefore we need to create this connectivity now. We have to connect the power/ground pins of all instances to the respective global power/ground net that was specified on the "Design Import" form (category ’Power’ on the ’Advanced’ tab)[21].

[20] By default, the internal structures within a cell or block are not displayed. You need to make “Cell Blkg” visible to see the so-called blockages within a cell.

[21] There is also a special rule required if there are logic one/zero values 1’b1/1’b0 instead of TIE1/TIE0 cells in your netlist. You should however not have such logic values in your netlist.


This can be done using the "Floorplan → Connect Global Nets ..." form or you can use the ’globalnet.tcl’ script provided.

Execute the script provided by typing on the command line of SoC-Encounter (not GUI):

source scripts/globalnet.tcl

Next we will add the core power rings that distribute power all around the core.

Select the menu "Power → Power Planning → Add Rings...". A large window will appear. The ’Net(s)’ field on the top defines for which nets rings will be created. The default is to create power ’VCC’ as well as ground ’GND’ rings.

In the ’Ring Configuration’ section you can specify on what layers the ring segments will be created. Select metal5 H for Top and Bottom and metal6 V for Left and Right. Specify Width as 20 µm, Spacing as 1.5 µm and Offset as 2 µm and click Ok.

There are many alternative power distribution schemes that can be used. The one that we have chosen here is a very simple one. We have selected the upper metal layers Metal-5 and Metal-6 for the ring, because in this technology Metal-6 is thicker and consequently has less parasitic resistance, which is desirable for power distribution.

For your own designs, you should perform a power analysis (topic of exercise 3) to find out the best power distribution approach that matches your design.


The width has been chosen as 20 µm for convenience reasons. Basically the wider the power connection, the better. But as already mentioned earlier, in this technology, metal lines wider than 20 µm need to be slotted (’stress relief slots’) which requires extra effort. As an alternative to slotting it is also possible to create several smaller parallel rings, e.g. two ’VCC’ and two ’GND’ rings.

’Spacing’ determines the distance between the two nets and ’Offset’ determines the distance between the core area and the innermost ring.
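A quick sanity check shows whether the chosen ring geometry fits into the 45 µm I/O to core spacing set during floorplan initialization. The sketch assumes a simple stacking model (offset, then the ring wires separated by the spacing, measured outward from the core edge); the function name is only for illustration.

```python
def ring_stack_width(ring_width_um, spacing_um, offset_um, num_rings=2):
    """Distance from the core edge to the outer edge of the outermost ring.

    Simple model: the offset, then `num_rings` wires of equal width
    separated by `spacing_um`.
    """
    return offset_um + num_rings * ring_width_um + (num_rings - 1) * spacing_um


CORE_MARGIN_UM = 45.0  # I/O to core spacing chosen in the floorplan
needed = ring_stack_width(ring_width_um=20.0, spacing_um=1.5, offset_um=2.0)
slack = CORE_MARGIN_UM - needed
print(f"rings need {needed} um, margin leaves {slack} um to spare")
assert slack >= 0, "core margin too small for the power rings"
```

With the values used in this exercise the two rings occupy 43.5 µm, so the 45 µm margin is just sufficient under this model. Adding a second pair of rings would require a larger margin.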

We also need a (partial) ring around the macro-cell; you will see later why this is necessary.

Select the menu "Power → Power Planning → Add Rings..." just like before. This time in the ’Ring Type’ box, select ’Block ring(s) around’. You can leave the selection at ’Each block’ since we have only one block anyway.

SoC-Encounter is usually smart enough to create wires only on the edges where no power lines are yet, i.e. to not create new wires on top of the core ring.

If this fails you can specify the segments and connections you want on the ’Advanced’ tab.

Fill in values/settings similar to those used for the core rings and click on ’Ok’.

At any point if you wish to delete part of the floorplan you can:

• use the ’Undo’ feature by simply pressing ’u’

• select and remove objects of a specific class (press ’d’)

• use the menu option "Floorplan → Edit Floorplan → Clear Floorplan..."

• select an object and hit the “Del” key on the keyboard

Also, you can save or load (restore) your floorplan at any time using the menu "Design → Save → Floorplan ..." and "Design → Load → Floorplan ..." respectively.

Save your floorplan to the ’save’ directory.

At this point, power reaches the standard cells only from the sides. Especially in fast designs, the standard cells in the middle of a standard cell row may not receive sufficient power, so it is important to add vertical stripes to improve the power distribution.

Select "Power → Power Planning → Add Stripes ...".

The ’Set Configuration’ part of the window defines the properties of one stripe set.

The ’Set Pattern’ part defines how many stripes will be added. We can either choose to insert a fixed number of sets or only specify the distance between two sets (’Set-to-set distance:’).

In the ’First/Last Stripe’ part, we select Relative from core or selected area. Enter values for ’X from left’ and ’X from right’ such that the stripe sets divide the standard cell rows into three equally long pieces. See the screen shot for width, spacing and layer. Note: You can fine-tune this later by moving the stripe sets.

By default stripes will continue over macro cells. To prevent this, select the ’Omit stripes inside block rings’ option in the ’Stripe Breaking’ section of the ’Advanced’ tab.


It is rather easy to move wires in SoC-Encounter. Click on the move wires button (or press ’m’), select the wires you want to move, and drag them to their new location. SoC-Encounter will make sure that electrical connections remain intact. If you want, you can use this to fine-tune the stripe placement.

We still need to define a block halo for the RAM macro-cell. This is necessary to keep standard cells from being placed too close to the RAM and also to avoid problems when routing the power lines of the standard cell rows.

The figure below illustrates one common problem with the block halo.

[Figure: a macro-block with its block halo next to two standard cell rows and their power rails, contrasting a terminated power line (good) with a dangling power line (bad).]


In this figure, only two standard cell rows are shown. The block halo around the first row extends far enough to cover the two power lines[22]. This is as it should be.

For the second row, the block halo does not cover the power rails, and when making the power connections SoC-Encounter will try to extend the power connection past the power rails as shown in the figure. This leaves a dangling power line[23]. While this will not render your chip useless, it should be avoided.

From the menu select "Floorplan → Edit Floorplan → Edit Halo...". A window will appear, where you can specify a keep-out zone for routing and/or placement around the macro-cell.

Usually we only need a “Placement Halo”. The size will depend on your power routing/floorplan.

Create an appropriate “Placement Halo”.

Notice that the I/O pads are placed with some distance between them[24]. At some point in the design flow we need to close the gaps between the I/O pads in order to complete the supply rings that run around the core (within the pad cells) and are required to supply the circuitry within the pad cells.

Instead of using wires, we will place so-called ’filler cells’ that completely fill the gaps and establish the required connectivity.

There is a script that will automatically insert matching filler cells. Type the following in the SoC-Encounter console window:

source scripts/fillperi.tcl

Now we need to finalize the power connections of the chip. The following connections still need to be made:

• The core ring needs to be connected to the core supply pads (VCC3IOD and GNDIOD).

• All standard cells need to be connected to VCC and GND lines.

• All macro-cells need to be connected to VCC and GND lines.

[22] This is just for illustration. It is not possible to draw a block halo that has this (L) shape.

[23] Dangling wires of this sort are known as geometry antennas in SoC-Encounter.

[24] This is due to the constraints set by the company that bonds the chips. They specify that the minimum distance between two adjacent pads is 90 µm. Since even a core-limited pad in this technology is roughly 60 µm wide, we need to place them with gaps in between.


Select "Route → Special Route ..." from the menu. SRoute is the special net router, and is only used to make power connections.

The ’Route:’ part contains the different connection types we have listed above. ’Block pins’ are macro-cell power connections, ’Pad pins’ are the connections from the core supply pads to the core ring. We will not need ’Pad rings’ since we have already used filler cells to complete these rings. ’Standard cell pins’ will add power lines to the standard cell rows. Finally, if you still have stripes that are not connected to power (not very likely) you can use the ’Stripes (unconnected)’ option.

While it is possible to route all connections at the same time, it is strongly recommended to do it one by one:

1. Start with ’Pad pins’. If nothing happens you have most likely forgotten to source the globalnet.tcl script.

2. Route ’Block pins’. Check the result: did the router connect the macro-cell the way you wanted? If not, you may need to study the ’Advanced’ tab of the SRoute window. If all fails you can edit the connections manually.

3. Route the ’Standard cell pins’. This should create many horizontal Metal-1 lines that connect to the rings and stripes. Look for dangling wires around the block halo (adjust the block halo if necessary).

We are now finished with floorplanning. Your floorplan should look similar to the following screen shot.


6 Placement

We will now start with the placement of the standard cells in the core area. Placement is a very computation-intensive problem, and mostly heuristic algorithms are used for this purpose.

Select "Place → Standard Cells...".

We want to run a full placement and not an incremental or just the quick prototyping one.

"Include Pre-Place Optimization", however, is very useful as it removes all buffer/inverter trees from the netlist, which will help us with timing analysis as you will see later.

To set advanced options click ’Mode’. Set "Congestion Effort" to "Low" and deselect "Run Timing Driven Placement", as timing-driven placement takes much longer and might not help that much to improve timing. There are several other options that you can set, but at this time we will leave them as they are. Apply the changes by pressing "OK".

You will come back to the placement window seen below; click "OK" to start placement. This may take some time.

We have to warn you about the various performance related options such as "Congestion Effort" and "Run Timing Driven Placement" above. In the exercises sometimes we will advise you to use certain settings for these options in order to reduce runtime, or because for this particular design we have found out that a particular option gives better results. When you do your own designs, you should consider evaluating which options are better suited rather than copying all options from this exercise.

For each standard cell, the placement algorithm will try to find the optimum location so that there is a feasible routing solution and the total length of the connections is minimized.

Examine the placement by using the design browser (switch to the physical view). You will notice that standard cells within the same entity are mostly placed next to each other.

The available space and the placement of macro-cells and I/O pads can have a great influence on the placement of standard cells. Even though more space seems to be a good idea, too much space sometimes results in placements where the average distance between standard cells and consequently the delays caused by wire capacitance/resistance become larger. Only experience and several iterations will allow you to find a placement for your circuit that is close to optimal.

Note: Visibility of ’Special Net’ is turned off in the next screen shot.


The results for placement (and later routing) are strongly design dependent. For example, structures with many interconnections such as look-up tables will usually need much more space than synthesis predicted, as the cells need to be spread out in order to have enough space to route all the interconnections. This is why generalizations for back-end design, such as "During back-end design, your circuit area will increase by 10%" don’t work very well.

Let us save the entire design with "Design → Save Design As → SoCE". This will save the configuration file, netlist, floorplan, special route, placement and routing files as well as the current mode, options and preferences. A design saved in this way can be restored using "Design → Restore Design ... → SoCE".

The space required is surprisingly small as most files are compressed and the library files do not get saved along with the design.

Remember to save under the save directory.

Alternatively you could also just save the placement. Select "Design → Save → Place ...".

During synthesis, Synopsys Design Compiler assigns constant logic values to two special standard cells named TIE0x and TIE1x, where x is a drive strength modifier. This creates a small inconvenience, as often one of these cells is assigned to drive many inputs at the same time, creating relatively long interconnections.

There is sufficient space on the chip to place several of these cells. We will use a script that first removes all these cells. Then we will set the rules for placing these cells. The example script scripts/tiehilo.tcl sets the maximum number of connections driven by a single cell to 10, and the maximum distance between the pin and the tie cell to 100 µm. And finally we insert the tie cells according to the rules we have defined.


At the command line type:

source scripts/tiehilo.tcl
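The two rules set by the script (at most 10 connections per tie cell, at most 100 µm between pin and tie cell) can be pictured with a small greedy assignment. This is not the algorithm used by SoC-Encounter or by scripts/tiehilo.tcl; it is only an illustration of the constraints, with made-up pin names and coordinates and with Manhattan distance assumed as the distance measure.

```python
from dataclasses import dataclass, field


@dataclass
class TieCell:
    x: float
    y: float
    loads: list = field(default_factory=list)


def assign_tie_cells(pins, max_fanout=10, max_distance_um=100.0):
    """Greedily assign constant pins to tie cells.

    `pins` is a list of (name, x_um, y_um). A pin joins the first tie cell
    that still has a free slot and lies within `max_distance_um` (Manhattan
    distance); otherwise a new tie cell is created at the pin location.
    """
    cells = []
    for name, x, y in pins:
        for cell in cells:
            close = abs(cell.x - x) + abs(cell.y - y) <= max_distance_um
            if close and len(cell.loads) < max_fanout:
                cell.loads.append(name)
                break
        else:
            # No existing tie cell satisfies both rules: add a new one.
            cells.append(TieCell(x, y, [name]))
    return cells


# Hypothetical pins: 12 clustered near the origin, one far away.
pins = [(f"u{i}/A", 5.0 * i, 0.0) for i in range(12)] + [("u_far/A", 500.0, 500.0)]
cells = assign_tie_cells(pins)
print([len(c.loads) for c in cells])
```

For this hypothetical input three tie cells result: the cluster needs two cells because of the fanout limit, and the distant pin gets its own cell because of the distance limit.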

7 Timing

The synthesis tools we currently use for HDL synthesis (Synopsys DC Shell/Design Vision) are not aware of any instance placement information. Therefore the interconnects can only be estimated based on a statistical model, i.e. the fanout of a net determines its length, capacitance, resistance and area. Now that the placement and even trial-routing is available, the timing might differ considerably from the numbers obtained from Synopsys.

7.1 Analysis

SoC-Encounter has a very practical timing analysis function, where you usually only have to specify the state of the design (see below) and the "Analysis Type" (Setup or Hold) you want to run.

Pre-Place: design is not placed

Pre-CTS: design is placed but clock tree is not yet inserted

Post-CTS: design is placed and the clock tree is inserted

Post-Route: design is placed and routed

Sign-Off: will use extra tools for even more precise analysis. We will not use this as these tools are not installed/set up.

Depending on this state, trial route (a very simple, but fast routing) and/or parasitic extraction might be run automatically prior to the timing analysis. This will improve the accuracy and help to avoid unnecessary iterations.

Open "Timing → Analyze Timing" and make sure "Pre-CTS" and "Setup" are selected.

Start the timing analysis by clicking "Ok".

Note: You could also do this from the command line with

timeDesign -preCTS

As the design is not routed, SoC-Encounter will perform trial route and parasitic extraction before doing the timing analysis. A short summary will be displayed on the console (the actual numbers may differ slightly):


------------------------------------------------------------
          timeDesign Summary
------------------------------------------------------------

+--------------------+---------+---------+---------+---------+---------+---------+
|     Setup mode     |   all   | reg2reg | in2reg  | reg2out | in2out  | clkgate |
+--------------------+---------+---------+---------+---------+---------+---------+
|           WNS (ns):| -7.815  | -5.368  | -7.815  | -0.582  | -7.110  |   N/A   |
|           TNS (ns):| -2113.3 | -1239.7 | -1969.2 | -1.269  | -38.582 |   N/A   |
|    Violating Paths:|   757   |   708   |   375   |    8    |    6    |   N/A   |
|          All Paths:|  1811   |  1344   |   819   |   18    |    6    |   N/A   |
+--------------------+---------+---------+---------+---------+---------+---------+

+----------------+---------------------------+--------------+
|                |            Real           |    Total     |
|    DRVs        +--------------+------------+--------------|
|                |Nr nets(terms)| Worst Vio  |Nr nets(terms)|
+----------------+--------------+------------+--------------+
|   max_cap      |  135 (135)   |   -3.518   |  136 (136)   |
|   max_tran     | 370 (14467)  |   -7.767   | 388 (14485)  |
|   max_fanout   |    0 (0)     |     0      |    0 (0)     |
+----------------+--------------+------------+--------------+

Density: 78.864%
Routing Overflow: 0.00% H and 0.23% V
------------------------------------------------------------

The summary gives a very good overview of the current design timing. Some explanations:

• The analysis was run in setup mode, i.e. setup time checks were performed but no hold time checks.

• The columns contain numbers for all paths in the design ("all") or for specific path groups, e.g. reg2reg for all register to register paths.

• Worst negative slack (WNS) reports the slack for the most critical path. Negative numbers mean that the constraints are violated by this value.

• Total negative slack (TNS) is the sum of the negative slacks of all violating paths. Together with the number of violating paths this figure helps to see how severe the violations are.

• Real/Total DRV show (electrical) design rule violations. Some libraries have a maximum transition time for all nets. The report above shows that 370 nets have a transition violation (the signal takes too long to change from logic-1 to logic-0 or vice versa). In addition 135 nets have a maximum capacitance violation (the total amount of capacitance driven by a net exceeds the limit set by the design library). These violations are mostly related to excessive parasitic capacitance due to interconnections, and generally cause timing violations as well. However, even if a DRV does not cause a timing violation it needs to be fixed.

• "Density" and "Routing Overflow" show the placement utilization and routing resources, i.e. are a measure for the feasibility of the current floorplan/placement.
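The relationship between WNS, TNS and the number of violating paths can be made concrete with a short sketch. The slack values below are hypothetical and the function name is only for illustration; the tool computes these figures itself.

```python
def summarize_slacks(slacks_ns):
    """Return (WNS, TNS, violating count, total count) for a list of path slacks."""
    violating = [s for s in slacks_ns if s < 0.0]
    wns = min(slacks_ns)   # most critical path, negative if violated
    tns = sum(violating)   # adds up only the negative slacks
    return wns, tns, len(violating), len(slacks_ns)


# Hypothetical slacks in ns, one entry per path.
slacks = [-5.5, -0.5, 0.25, 1.0, -1.0]
wns, tns, n_viol, n_all = summarize_slacks(slacks)
print(f"WNS {wns} ns, TNS {tns} ns, {n_viol} of {n_all} paths violating")
```

Dividing TNS by the number of violating paths gives the average violation. In the summary above this is about -0.16 ns for reg2out (-1.269 ns over 8 paths) but about -6.4 ns for in2out (-38.582 ns over 6 paths), which shows that practically every in2out path is badly violated.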

Remark: Refer to exercise 4 of VLSI I[25] if you have problems with timing concepts.

[25] You can access the exercise descriptions, files, and solutions under /home/vlsi1/u4.

The summary looks really terrible. Obviously we have many timing violations that we need to have a closer look at, before we try to optimize the timing with SoC-Encounter.

Here are some important points to consider when doing so:


• The timing depends entirely on the constraints you have specified in the file src/chip.sdc. The most common mistake is to have errors in this file. Before you go any further make sure that your timing constraints are correct.

• Make sure to not accidentally use constraints that were written for the core level (chip without pads) at the chip level (with pads) and vice versa. The pads affect the I/O timing quite a bit and the drive capabilities of a standard cell and an output pad are entirely different, i.e. ’set_load’ needs to be very different.

• Inputs and outputs used for test and debugging may cause timing violations. Most of these signals are not dynamic (they are not toggled during normal operation) and the timing paths originating from these inputs or ending at these outputs should be ignored, i.e. left unconstrained or explicitly disabled.

• To speed up delay calculation SoC-Encounter does not compute the timing of nets with a fanout above a certain limit but rather swaps in predefined values for delay, capacitance and transition time. All these numbers are specified on the "Design Import" form on the "Advanced" tab in the "Delay Calculation" category. As a result you will not see the real timing[26] of these nets in timing analysis and furthermore optimization will not see (and therefore not fix) violations[27] on these nets. However, this is usually the desired behavior as we give these nets a special treatment anyway (with CTS).

Let’s now examine the detailed reports that were generated by timing analysis and can be found in the ’timingReports’ folder. Each analysis produces multiple files. Among these there are three files dedicated to design rule violations (max capacitance: *.cap, max fanout: *.fanout, max transition time: *.tran), and separate *.tarpt timing analysis report files for the different path groups (in2out, in2reg, reg2reg, reg2out).

Where do the violating paths in the "in2out" path category start?

Where do the violating paths in the "in2reg" path category start?

Do the paths in "reg2out" and "reg2reg" look like normal paths that should be optimized to meet timing or is there something wrong?

Why are the "reg2reg" paths too slow? Look for large numbers in the "Delay" column and check the drive strength of the corresponding cell.

There are several different problems in the .sdc file that we have used. First of all, two of our inputs should not be considered for timing analysis[28]. We also have several nets (clock, reset and scan enable) that we will take care of separately (using the clock tree synthesizer, which we will see later). These nets will show up in the DRV reports. We do not want to solve timing related problems for these nets (since they will be solved later anyway); the time and effort required to optimize these nets could prevent other parts of the design from being optimized.

We can use the ’Default Pin Limit’ feature of SoC-Encounter to stop SoC-Encounter from extracting timing information (and reporting timing violations) for the nets that we will be optimizing later on. By default the pin limit of SoC-Encounter is set to 1000. In our case this number is too high (we have slightly more than 400 flip flops in our design).

[26] To see the real timing you can change the limit on-the-fly from 1000 to a very high value in the console with ’setUseDefaultDelayLimit 100000’. More on this topic later.

[27] DRV violations will be fixed but no setup/hold violations. Clock nets are even more special; no DRV fixing will be done there either.

[28] SoC-Encounter provides a special timing calculation mode that is called Multi-Mode Multi-Corner Analysis (MMMC). In this mode it is possible to define several scenarios (i.e. separate test and functional modes). The setup for MMMC is slightly involved and will not be covered as part of this exercise.

Let us see the nets which have a large fanout. Report all nets with e.g. more than 400 pins. Use the console command:

report_net -min_fanout 400

Now set a suitable limit with the command

setUseDefaultDelayLimit <number>

so that the high fanout nets will not be considered for timing. Also make the necessary changes to the timing constraints file ’src/chip.sdc’ to disable the offending input-ports. Reload the timing constraints by selecting the menu "Timing → Load Timing Constraint ...".

Then rerun timing analysis.

If you have done everything correctly, the only setup violations should be in the path groups register-to-register and register-to-out. There should no longer be pins that belong to the scan enable or reset network in the transition time violation report.
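The effect of the fanout limit can be pictured as a simple filter over the nets of the design. The sketch below assumes that a net is treated with default values when its fanout is strictly above the limit; apart from ClkxCI the net names, and all fanout numbers, are hypothetical placeholders for what report_net prints for your design.

```python
def nets_using_default_delay(fanout_by_net, limit):
    """Nets whose fanout exceeds `limit` and would therefore get default delays."""
    return sorted(net for net, fanout in fanout_by_net.items() if fanout > limit)


# Hypothetical fanout numbers; the real ones come from report_net.
fanouts = {"ClkxCI": 430, "RstxRBI": 425, "ScanEnxTI": 425, "n_1234": 37}

for limit in (1000, 400):
    print(limit, nets_using_default_delay(fanouts, limit))
```

With the default limit of 1000 none of these nets is excluded, which is why clock, reset and scan enable show up in the violation reports; a limit just below the number of flip-flops removes them from timing analysis while ordinary nets remain fully timed.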

7.2 Optimization

In order to (better) meet the constraints, SoC-Encounter can try to optimize the design at every stage of the design process. In our case, the worst setup time violation is about 5.8 ns (for an 8 ns period), although the netlist delivered by the synthesis tool had no timing violations. This is due to differences in interconnect parasitics between the two tools. While the synthesis tool relies on an estimate (based on a statistical model), SoC-Encounter can use the real placement and (trial-)routing at hand. Consider the following excerpt from a timing report (broken down over many lines for readability):

Path 1: VIOLATED Setup Check with Pin i_top/u_filter/u_filter_stage_4/RegxDP_reg_47_/CK
Endpoint:   i_top/u_filter/u_filter_stage_4/RegxDP_reg_47_/D (v) checked with
            leading edge of ’ClkxCI’
Beginpoint: i_top/u_ram_wrapper/i_ram/DO7 (^) triggered by
            leading edge of ’ClkxCI’
Path Groups: {reg2reg}
Other End Arrival Time          0.000
- Setup                         0.127
+ Phase Shift                   8.000
= Required Time                 7.873
- Arrival Time                 13.685
= Slack Time                   -5.812

     Clock Rise Edge                 0.000
     = Beginpoint Arrival Time       0.000
     Timing Path:
+-------------------------------------------------------------------------------------------------------------+
|               Instance               |      Arc      |        Cell        | Slew  | Load  | Delay | Arrival |
|                                      |               |                    |       |       |       |  Time   |
|--------------------------------------+---------------+--------------------+-------+-------+-------+---------|
|                                      | ClkxCI ^      |                    | 0.000 | 1.828 |       |   0.000 |
|ClkxCI_PAD                            | I ^ -> O ^    | XMD                | 0.000 | 0.000 | 0.000 |   0.000 |
|i_top/u_ram_wrapper/i_ram             | CK ^ -> DO7 ^ | SY180_2048X16X1CM8 | 0.115 | 0.026 | 1.739 |   1.739 |
|i_top/u_ram_wrapper/i_test_bypass_mux7| A ^ -> O ^    | MUX2               | 8.451 | 1.876 | 3.975 |   5.715 |


The last line reports a standard cell instance MUX2 with low driving capability (2) that has to drive a big load on its output (1.876 pF). The propagation delay is therefore huge (3.975 ns).

The timing values for the same cell as reported by synthesis are: Delay: 0.15 ns, Slew: 0.09, Load: 0.01.

While this is an extreme case, you see how synthesis can be wrong without knowing the actual placement and wire loads.
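The slack arithmetic in the header of the report above can be retraced by hand. The following sketch simply repeats the calculation with the numbers printed in the report; the function name is chosen for illustration only.

```python
def setup_slack(other_end_arrival, setup, phase_shift, arrival):
    """Reproduce the required-time and slack lines of a setup check (all in ns)."""
    required = other_end_arrival - setup + phase_shift
    return required, required - arrival


required, slack = setup_slack(other_end_arrival=0.000, setup=0.127,
                              phase_shift=8.000, arrival=13.685)
print(f"required {required:.3f} ns, slack {slack:.3f} ns")
```

The data has to arrive 0.127 ns before the next clock edge at 8 ns, i.e. by 7.873 ns, but actually arrives at 13.685 ns, which gives the reported slack of -5.812 ns. The MUX2 instance alone contributes 3.975 ns to that arrival time.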

Open the optimization form by selecting "Timing → Optimize ...".

"Design Stage" needs to be set to the current design stage. Some options are only available for certain stages, e.g. hold time optimization cannot be performed during "pre-CTS" as it doesn’t make much sense.

Timing is not the only thing that can be optimized. Most technologies specify design rules like maximum transition time, maximum capacitance driven by a certain cell or maximum fanout.

After pressing the "Mode" button, within the "Thresholds" section you can find options that can be used to tighten the constraints in order to get some margin[a].

Set the options as shown in the figure below and hit "OK". Watch the progress of the optimization in the console window. SoC-Encounter is very verbose with its actions.

[a] SoC-Encounter will already automatically add a small margin on its own (internally).

During optimization SoC-Encounter can select different drive strengths for cells, add/remove buffers and inverters, move instances or even restructure part of the logic (just like synthesis does).

Optimization is done using iterations of timing analysis, optimization, trial-route and parasitic extraction.

As a last step SoC-Encounter performs a timing analysis on the optimized design, prints the summary to the console and writes the detailed reports to the ’timingReports’ directory.

Take a look at the summary and the final reports generated. There should be no violations left.


But what happens if we can not fix the violations with optimization? Again, first make sure to understandwhat your constraints are and why they are violated. Often there are errors in converting the designspecifications to constraints (is the input delay really 3.5 ns? Also for this pin?) and describing themproperly with the commands available. If you still have problems, there are three levels where you canreach a solution:

• Optimization during backend design (SoC-Encounter)

SoC-Encounter can optimize the design at every stage of the design process. In general, the earlier the stage, the more changes can be made, e.g. "Pre-CTS" optimization has much more flexibility than "Post-Route" optimization. At the "Pre-CTS" stage registers can be moved and resized; this is no longer possible after clock tree insertion. On the other hand, the parasitic interconnect information is much more accurate in later stages of the design, so the timing information (and hence the optimization goals) will be more accurate.

We can (re)run the optimization at various stages, try a new placement or even start with a new floorplan. It is impossible to give general guidelines; you will have to see what works best for your design. If you are far from meeting your target (e.g. for a 10 ns clock, if after all optimizations you still have a timing violation of 2 ns), you may need to go back to synthesis.

• Optimization during synthesis

Once you have tried to place and route a netlist you will get a better idea about the relationship between synthesis results and back-end results (area and timing wise). You may use this information to adjust the timing constraints and re-synthesize the circuit.

• Architectural optimizations

If nothing else helps, you will have to modify your architecture. During this iteration you will have a much better idea about what is critical for your circuit.

If all of the above fails, you will have to see if the specifications could be changed.

Your design has changed considerably as the optimization algorithms have modified the netlist and placement. Save it by using "Design . Save Design As".

8 Clock Tree Insertion

The fan-out of a net refers to the number of inputs driven by a particular output. High fan-out nets (that drive hundreds or even thousands of inputs) need to be handled differently from standard interconnections. Note: For timing analysis we adjusted the pin limit (setUseDefaultDelayLimit) in order to treat them differently.

Every synchronous circuit has at least one high fan-out net, namely the clock net. For most circuits reset and scan-enable signals have to be distributed to each and every flip-flop as well.

The main problem with high fan-out nets is the large load capacitance that needs to be driven. Each driven input adds its own input capacitance to the total load capacitance and, in addition, the interconnection required to distribute the signal to all these inputs increases the load capacitance further. There are three important parameters for such nets:

Transition time: This is the time it takes to change the logic level of a node (e.g. 0 → 1). Basically, the more load an output has to drive, the more time is required to charge this load. CMOS drivers consume additional short circuit current during the transition, therefore long transition times are not very welcome. Furthermore, noise on signals with long transition times can result in glitching. Most libraries set an upper limit for the transition time (for the technology we are using this is 1.79 ns for typical libraries). To lower the transition time, a tree of buffers can be inserted so that the total load is shared between the buffers. The lower the desired transition time, the more buffers are required.

Insertion delay: The time required for the signal to travel from the driver to the end-points. This delay is usually different for each end-point. Each level of buffers in the buffer tree will add a delay to the signal.

Skew: The difference between insertion delays of different end-points. To minimize skew, a balanced buffer tree has to be built. Generally, the lower the desired skew the more buffers are required.
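The relationship between insertion delay and skew can be illustrated with a few lines of Python. This is only a sketch: the delay values are made up for illustration and the function name is not part of any tool.

```python
def insertion_delay_summary(delays_ns):
    """Return (min delay, max delay, skew) for a list of end-point insertion delays."""
    if not delays_ns:
        raise ValueError("need at least one end-point")
    shortest = min(delays_ns)
    longest = max(delays_ns)
    # Skew is the difference between the largest and smallest insertion delay.
    return shortest, longest, longest - shortest


# Hypothetical insertion delays (ns) at four flip-flop clock pins.
example_delays = [1.25, 1.5, 1.375, 1.75]
min_delay, max_delay, skew = insertion_delay_summary(example_delays)
print(f"min delay {min_delay} ns, max delay {max_delay} ns, skew {skew} ns")
```

Note that adding the same delay to every end-point changes the insertion delay but leaves the skew untouched, which is why the two can be constrained separately.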

Which parameters are most important depends on the type of net:

Clock: Our main concern is to reduce the skew, since it will affect our timing. The maximum acceptable skew depends on the clock period. As an example, for a 20 MHz clock a clock skew of 0.5 ns is acceptable. But for a 200 MHz clock, the same skew equals 10% of the clock period and would be too high.

If you over-constrain your skew, you will need a deep (and large) clock tree and your insertion delay will rise, which will affect your input and output timing. Therefore you will want to balance the skew against insertion delay and the number of buffers. Constraining the maximum insertion delay too tightly will usually degrade results.

Usually, a tree that gives you an acceptable skew will also give you a decent transition time, so you don’t have to worry about that.

Reset: We are interested in propagating the reset within one clock cycle to all flip-flops in our design. For designs with on-chip reset synchronization this is strictly required. The insertion delay should therefore be less than the clock period and the transition times within the bounds imposed by the technology; skew doesn’t matter at all.

Scan Enable: Very similar to the reset signal. Usually a slower clock is used for scan testing, therefore we can allow an even larger insertion delay. For transition time and skew the same holds true as for the reset.
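The clock example above is simple arithmetic: the period is the inverse of the frequency, and the skew is judged relative to it. A minimal Python sketch (the function name is chosen for illustration only):

```python
def skew_fraction(clock_mhz, skew_ns):
    """Return the skew as a fraction of the clock period."""
    period_ns = 1000.0 / clock_mhz  # 1 / f, expressed in ns
    return skew_ns / period_ns


for f_mhz in (20, 200):
    period_ns = 1000.0 / f_mhz
    percent = 100.0 * skew_fraction(f_mhz, 0.5)
    print(f"{f_mhz} MHz: period {period_ns:.0f} ns, 0.5 ns skew is {percent:.0f}% of the period")
```

For the 20 MHz clock the same 0.5 ns is only 1% of the 50 ns period, which is why it is acceptable there.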

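[Figure: clock tree constraints. Starting from the AutoCTS root pin, the tree is annotated with the buffer transition time (Buf Tran), the sink transition time (Sink Tran), the minimum and maximum insertion delay (Min Delay, Max Delay) and the maximum skew (Max Skew).]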


In SoC-Encounter, clock tree synthesis (CTS) is used to generate optimized buffer trees to drive high fan-out nets. It can be configured to satisfy a variety of constraints.
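To get a feeling for the size of such buffer trees, the following Python sketch estimates the number of levels and buffers of an idealized, perfectly balanced tree. It is a strong simplification and not how CTS works internally: it assumes that every buffer may drive at most a fixed number of pins, whereas the real tool sizes the tree based on capacitance, transition time, delay and skew targets. The fan-out limit of 16 is an arbitrary assumption.

```python
def buffer_tree_estimate(num_sinks, max_fanout):
    """Estimate (levels, buffers) of an idealized balanced buffer tree."""
    if num_sinks < 1 or max_fanout < 2:
        raise ValueError("need at least one sink and a fan-out of 2 or more")
    levels = 0
    buffers = 0
    loads = num_sinks  # pins that still need a driver at the current level
    while loads > 1:
        loads = -(-loads // max_fanout)  # ceiling division: buffers at this level
        buffers += loads
        levels += 1
    return levels, buffers


# 443 clock end-points as in the example design, hypothetical fan-out limit of 16.
levels, buffers = buffer_tree_estimate(443, 16)
print(f"{levels} levels, {buffers} buffers")
```

Lowering the assumed fan-out limit (which corresponds to asking for shorter transition times) increases both numbers, in line with the statement that tighter targets require more buffers.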

A sample clock tree synthesis configuration file can be found under ’src/sample/chip.ctstch-sample’. The sample file contains three different configurations for a clock, a reset and a scan enable signal.

Copy this file to the ’src’ directory and adapt the ’AutoCTSRootPin’ statements to match your design.

For educational purposes, change the clock tree specifications as follows: max. skew 0.2 ns, max. insertion delay 4 ns, max. transition time at buffers 0.6 ns and at clock pins 0.4 ns[a].

Take a closer look at the other two trees too.

[a] It is usually not a good idea to specify a small max. insertion time such that this becomes a limiting factor for CTS. Results may degrade significantly and for most designs the insertion delay is not very important anyway.

If the design employs a reset synchronization register (the example design has one) the source of the reset tree must be the output of the synchronization register. Note that there is a special option named "SetASyncSRPinAsSync YES" for the reset tree definition. This allows set and reset pins to be considered as targets for the clock tree optimization.

The scan-enable signal is also a special case. Normally the clock tree synthesis algorithm starts at the AutoCTSRootPin and traces through the netlist in order to find valid endpoints. By default, the tracer passes through combinational gates and stops at the clock and asynchronous input pins of sequential elements (flip-flops).

By specifying the "NoGating rising" option, we can make the tracer stop at the first gate encountered. This is necessary since the scan enable signal is often connected to multiplexers and we want their input pins to be endpoints. Once this option is in effect you need to specify the internal pin of the pad driving the scan-enable signal, otherwise tracing will stop prematurely at the pad cell.

Read in the clock tree specification by selecting Clock . Design Clock ... from the menu. Using the browser, select the clock tree specification file you have just modified. Press ’Load Spec’. DON’T PRESS OK yet[a]. You should now see a summary for all three clock specifications on the console; check it.

Our netlist may have some buffers on the high fan-out nets we want to build trees on. We need to remove them prior to CTS with the following command:

deleteClockTree -all

[a] Pressing OK will start the clock tree insertion. We need to make sure that the clock tree specification is correct before we go ahead with this step. If you accidentally pressed OK here, it is advised to restart from the last saved point.

A large number of errors can be discovered by analyzing the pins connected to these nets, even before building a clock tree.


Select Clock . Tracer Pre-CTS Clock Tree .... To start the trace, click on the icon on the top left and accept the default trace file name. A summary will be displayed on the console and the content of the trace file will be visualized in the GUI.

We can see what the trees currently look like and what pins are connected to them. Also look at the trace file directly. Things to look for include:

• Clock, reset, or scan-enable connecting to unexpected input pins, e.g. the reset signal should not connect to pins other than asynchronous set/reset pins of sequential elements.

• Unexpected latches on the clock tree can be discovered this way (G or GB pin).

• Discrepancy between the number of endpoints of clock, reset and scan trees. For our example the numbers are as follows:

– clock tree: 443 with 442 flip-flop CK pins + 1 RAM CK pin

– reset tree: 441 flip-flop RB pins

– scan tree: 447 with 441 flip-flop SEL pins + 6 mux S pins, to choose between the functional and test (scan chain) output signal.

As we see, 442 flip-flops are clocked but only 441 receive a reset signal. This is because the reset synchronization register is connected to the external reset signal rather than to the internal reset tree. The reset synchronization flip-flop is also not on the scan chain, and since we use full scan otherwise, the 441 flip-flops on the scan tree match perfectly. You get the idea...
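This kind of bookkeeping is easy to write down explicitly. The following Python sketch only restates the numbers listed above for the example design and checks that they are consistent with one reset synchronization flip-flop; it does not parse the trace file, whose format is not described here.

```python
# End-point counts of the example design as listed above.
clock_endpoints = {"flip-flop CK": 442, "RAM CK": 1}
reset_endpoints = {"flip-flop RB": 441}
scan_endpoints = {"flip-flop SEL": 441, "mux S": 6}

# The reset synchronization flip-flop is neither on the reset tree nor on the scan chain.
SYNC_REGISTERS = 1


def total(endpoints):
    """Sum the end-points of one tree."""
    return sum(endpoints.values())


clocked_ffs = clock_endpoints["flip-flop CK"]
checks = {
    "clock tree has 443 end-points": total(clock_endpoints) == 443,
    "scan tree has 447 end-points": total(scan_endpoints) == 447,
    "reset reaches all flip-flops but the sync register":
        reset_endpoints["flip-flop RB"] == clocked_ffs - SYNC_REGISTERS,
    "scan enable reaches all flip-flops but the sync register":
        scan_endpoints["flip-flop SEL"] == clocked_ffs - SYNC_REGISTERS,
}
for name, ok in checks.items():
    print(("OK   " if ok else "FAIL ") + name)
```

For your own design, replace the numbers with those from your trace and make sure you can explain every difference between the trees.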

Open the file ’chip.cts_trace’ and search for "Clock Tree" to examine the leaf pins.

If everything looks OK we can proceed with clock synthesis. In the "Synthesize Clock Tree" form press "OK".


After a few minutes clock tree synthesis will be completed. Detailed reports will be generated under the directory specified on the form (most likely ’clock_report’). This directory includes a simple report file (’clock.report’).

A summary report is also displayed on the SoC-Encounter console. The first column shows the achieved performance while the second column reports the target specified in the configuration file.

Check your results (summary and detailed reports). How many buffers were added? How many levels were created? What is the insertion delay? Are all constraints met?

Note 1: You will get a max transition time violation on ClkxCI_PAD/I which can safely be ignored. As we have specified an input transition time of 800 ps on all primary inputs there is no way CTS could fulfill the 600 ps requirement at this point.

Note 2: Unless the “RouteClkNet YES” option was used (more on this later), the timing figures reported are only estimates and might change quite a bit with detailed routing.
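Checking a report against its targets boils down to comparing two columns. The Python sketch below uses the targets suggested earlier for the clock tree (skew 0.2 ns, insertion delay 4 ns, buffer transition 0.6 ns, sink transition 0.4 ns). The achieved values are invented for illustration and are not results from the example design; the buffer transition entry mimics the expected violation from Note 1.

```python
# Targets (ns) as suggested for the clock tree in this training.
targets_ns = {
    "max skew": 0.2,
    "max delay": 4.0,
    "buffer transition": 0.6,
    "sink transition": 0.4,
}

# Hypothetical achieved values (ns); 0.8 ns models the 800 ps input transition at the pad.
achieved_ns = {
    "max skew": 0.15,
    "max delay": 1.9,
    "buffer transition": 0.8,
    "sink transition": 0.3,
}


def violations(achieved, targets):
    """Return {constraint: (achieved, target)} for every exceeded target."""
    return {
        name: (achieved[name], limit)
        for name, limit in targets.items()
        if achieved[name] > limit
    }


for name, (value, limit) in violations(achieved_ns, targets_ns).items():
    print(f"VIOLATION {name}: {value} ns > {limit} ns")
```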

9 Timing Revisited

At this point we will have to go into some more detail about timing. During different stages of the design flow, we have slightly different timing constraints (refer to the following figure for the differences in the three stages).

a) synthesis: Initially the design does not contain any pads. The input delay t_idel and the output delay t_odel should include the contributions of the input pad (t_inpad) and the output pad (t_outpad).

b) pre-CTS: During the placement and routing phase, all required I/O pads and drivers are present. At this stage there is no clock tree yet. The timing should be adjusted, as the input delay t_idel and the output delay t_odel no longer include the pad delays.

c) post-CTS: Once the clock tree is inserted, the timing changes slightly again. Due to the clock insertion delay t_di, the internal clock is slightly offset with respect to the external clock. At the input, data travelling towards the first flip-flop inside the chip has more time, since this flip-flop is triggered by a clock signal that has been delayed by t_di. At the output, however, data leaving the chip is launched by the internal clock but has to be sampled by the external clock. Consequently there is less time for this signal.

It should now be clear why it might be desirable to set constraints on the clock insertion delay by specifying minimum and maximum values in the chip.ctstch file with the MinDelay and MaxDelay parameters. The clock insertion delay can play an important part in the I/O delay. You may want to keep the insertion delay within certain limits to ensure proper I/O timing.
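The effect of the insertion delay on the I/O timing can be put into numbers. The Python sketch below is a simplified model under stated assumptions: setup times and pad delays are ignored, the clock period of 10 ns and the delays of 3.5 ns, 3.0 ns and 1.5 ns are example values, and the function name is hypothetical.

```python
def io_time_budget(t_clk, t_idel, t_odel, t_di=0.0):
    """Return (time available for the input path, time available for the output path) in ns.

    t_clk  clock period
    t_idel input delay (data arrives this long after the external clock edge)
    t_odel output delay (data must be stable this long before the next external edge)
    t_di   clock insertion delay (0 before the clock tree is inserted)
    """
    time_for_input = t_clk - t_idel + t_di   # capturing flip-flop is clocked later
    time_for_output = t_clk - t_odel - t_di  # launching flip-flop is clocked later
    return time_for_input, time_for_output


print("pre-CTS :", io_time_budget(10.0, 3.5, 3.0))
print("post-CTS:", io_time_budget(10.0, 3.5, 3.0, t_di=1.5))
```

With these numbers the input path gains 1.5 ns while the output path loses 1.5 ns, which is exactly the shift that may have to be compensated in the I/O constraints.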

Design tools have different mechanisms to deal with these three different cases. The simple solution is to use multiple constraint files for different stages. However, both Synopsys Design Compiler and Cadence SoC-Encounter accept several parameters to deal with this problem automatically. In the following we will discuss how SoC-Encounter calculates delays in the presence and absence of a clock tree. The following table summarizes the most important settings:


timing analysis mode     clock propagation mode     clock latency
(setAnalysisMode)        (set_propagated_clock)     (set_clock_latency)
-noSkew                  forced ideal               no effect
-skew -noClockTree       forced ideal               SDCs in effect
-skew -clockTree         SDCs in effect[a]          SDCs in effect[b]

[a] Still ideal mode unless set_propagated_clock is set.
[b] The set_clock_latency command is overridden by overlapping set_propagated_clock constraints.

The timing analysis mode is automatically updated by SoC-Encounter to match the design stage, i.e. before clock tree insertion it is set to ’-skew -noClockTree’ and afterwards to ’-skew -clockTree’. The analysis mode can also be changed manually with the setAnalysisMode command.

The two Synopsys Design Constraints (SDC) set_propagated_clock and set_clock_latency are usually specified by the designer in the ’chip.sdc’ file. Furthermore, CTS tries to add a set_propagated_clock constraint on-the-fly (in memory), which can cause a number of problems:

• This constraint will only be added if the AutoCTSRootPin pin/port in ’chip.ctstch’ and the clock waveform source pin/port (from the create_clock command in ’chip.sdc’) are perfectly identical, i.e. not port vs. instance pin etc.

• This constraint is never written to your ’chip.sdc’ file, so if you reload that file the constraint is lost.

• Before CTS, only a pointer to your constraints file is saved along with the database. Now, if a constraint was added by CTS, all loaded constraints (including the new one) will be saved along with the database to a new file (*.pt). Restoring this database will then load this new constraints file instead of the one in ’encounter/src/’ that you might have expected. Note: As soon as you manually (re-)load a constraints file, the behavior is reverted to the normal one.

Now, as can be seen from the table above, to get the actual timing of the buffers/inverters on the clock tree instead of ideal mode, setting both ’-skew -clockTree’ and set_propagated_clock is required. Also note that set_propagated_clock gets overridden for all pre-CTS design stages and could therefore be set right from the start (as already mentioned earlier).
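The table can also be read as a small decision function. The following Python sketch is merely one reading of the table and its footnotes, not a description of the tool's implementation; the function name and the returned strings are chosen for illustration.

```python
def clock_modes(analysis_mode, propagated_clock_set):
    """Return (clock propagation, clock latency handling) according to the table above."""
    if analysis_mode == "-noSkew":
        return "ideal", "set_clock_latency has no effect"
    if analysis_mode == "-skew -noClockTree":
        # set_propagated_clock is overridden before the clock tree exists.
        return "ideal", "set_clock_latency in effect"
    if analysis_mode == "-skew -clockTree":
        if propagated_clock_set:
            return "propagated", "set_clock_latency overridden where set_propagated_clock applies"
        return "ideal", "set_clock_latency in effect"
    raise ValueError(f"unknown analysis mode: {analysis_mode}")


for mode in ("-noSkew", "-skew -noClockTree", "-skew -clockTree"):
    for propagated in (False, True):
        print(f"{mode:20s} set_propagated_clock={propagated!s:5s} ->", clock_modes(mode, propagated))
```

Only one of the six combinations yields real clock tree timing, namely ’-skew -clockTree’ together with set_propagated_clock.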

In ideal mode, the clock tree insertion delay is zero unless the set_clock_latency command is used to specify a different number, preferably close to the delay of the real tree (that is still to be inserted). While this "placeholder" delay has the advantage that the I/O timing doesn’t change between pre-CTS and post-CTS phases, it makes timing reports less transparent and is not handled exactly the same across different tools. Therefore, do not use this command unless you know what you are doing.

In conclusion, it is recommended to include set_propagated_clock right from the start, not to use set_clock_latency, and to load modified timing constraints after CTS only if required, i.e. when the I/O timing numbers (set_input_delay, set_output_delay) need to be adjusted to account for the actual clock tree[29]. For this training we will modify and reload the constraints[30].

[29] For slower clock speeds and/or uncritical I/O timing this is often not required.

[30] It might be more convenient to keep a separate post-CTS constraint file rather than changing the numbers back and forth when redoing the flow.


The following figure illustrates all three stages in some detail. Wherever possible the same naming conventions as in the textbook have been used[31].

[Figure: timing in the three stages a) synthesis, b) pre-CTS and c) post-CTS. Each panel shows the clock period T_clk, the paths t_in2reg, t_reg2reg and t_reg2out through the logic blocks a to e (delays t_pd,a to t_pd,e) between flip-flops (t_pd,ff, t_su,ff), and the delays t_idel, t_odel, t_inpad and t_outpad. Panel c) additionally shows the clock insertion delay t_di between the external and the internal clock, resulting in more time for the input and less time for the output.]

[31] Refer to page 235 “How to formulate timing constraints”, and page 346 “How to achieve friendly input/output timing” for more on this topic.


Modify the I/O timing constraints to account for the insertion delay of the actual clock tree, make sure that the clock is set to propagated mode and load the constraints ("Timing . Load Timing Constraint ..."[a]).

Run timing analysis (make sure to select "Post-CTS" as design stage).

Examine the reports timingReports/chip_postCTS*. You should now see the real timing on the clock network.

If you have violations, run a "Post-CTS" (!) optimization with default settings. This should fix all violations.

Save the entire design.

[a] Currently loaded constraints will be purged before the new ones get loaded.

10 Signal Routing

We will now route the signal nets. What you have seen so far are only trial-route nets that are not DRC clean and can therefore not be manufactured.

There are two routing engines in SoC-Encounter. WRoute is the older one and NanoRoute is supposed to be the latest and greatest. Start NanoRoute by selecting "Route . NanoRoute . Route...". A large window will open. Enable the "Insert Diodes" option (you can leave the "Diode Cell Name" field blank) and leave all other settings at their defaults[a]. Click ’OK’ to start routing. You can observe the progress in the console window.

[a] On multi-CPU or multi-core machines you can increase the number of CPUs used by selecting "Set Multiple CPU". This gives an almost linear speedup.


The "Fix Antenna" and "Insert Diodes" options cause the router to change layers and/or insert special protection diodes in order to avoid damage that can occur during manufacturing when charges accumulate on the wires and stress the gate oxide of input pins. Note that these are usually referred to as "process antennas", which are entirely different from geometrical antennas (which are related to dangling wires).

Our example design should route without problems. This is not always the case and we might get geometry violations. Geometry violations include shorts between nets and design rule violations (for example metal lines drawn too close to be manufactured as separate wires). Needless to say, we must resolve all these violations.

You should always closely examine the violations in order to find out what causes them. Sometimes an unfortunate placement of macro-cells or power lines is to blame and sometimes there is just not enough space to route all connections. Solutions range from re-running routing to completely reworking the floorplan.

Now that we have the real signal wiring we need to perform a postroute timing analysis to see if we still meet all constraints. At this point not only a setup time analysis, but also a hold time analysis needs to be run. Usually it is not necessary to deal with hold time until this point. Note that you have to do two separate runs, one for setup and one for hold, as it is not possible to do this in one single step. Use the GUI (make sure to select "Post-Route") or type the commands below to perform the two analyses.

timeDesign -postroute
timeDesign -postroute -hold

Inspect the two summaries and the report files written to the ’timingReports’ directory. You will most likely have setup violations.

To fix violations or increase the hold margin we can now perform a postroute optimization. Internal hold time violations need to be fixed in any case as, unlike internal setup violations, they cannot be avoided later on (i.e. on the real chip) by lowering the clock speed[32].
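Why lowering the clock speed helps against setup violations but not against hold violations can be seen from the textbook slack equations for a single-clock register-to-register path. The Python sketch below uses a simplified model: t_ho_ff (flip-flop hold time) is a name introduced here, skew is only considered in the hold check, and all numbers are invented for illustration.

```python
def setup_slack(t_clk, t_pd_ff, t_comb_max, t_su_ff):
    """Setup slack (ns): the slowest path must settle within one clock period."""
    return t_clk - (t_pd_ff + t_comb_max + t_su_ff)


def hold_slack(t_pd_ff, t_comb_min, t_ho_ff, skew=0.0):
    """Hold slack (ns): the fastest path must not overtake the capturing clock edge.

    Note that the clock period does not appear in this expression.
    """
    return t_pd_ff + t_comb_min - t_ho_ff - skew


# Setup: violated at a 10 ns period, fixed by slowing the clock down to 12 ns.
print("setup slack @ 10 ns:", setup_slack(10.0, 0.25, 9.75, 0.25))
print("setup slack @ 12 ns:", setup_slack(12.0, 0.25, 9.75, 0.25))

# Hold: a direct flip-flop to flip-flop path with 0.25 ns skew, independent of the period.
print("hold slack         :", hold_slack(0.25, 0.0, 0.125, skew=0.25))
```

A negative hold slack stays negative at any clock frequency, so it has to be fixed in the layout, typically by inserting delay on the short path.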

Further possibilities to improve timing include over-constraining the "Post-CTS" optimization and enabling the "Timing Driven" option of NanoRoute. Earlier in the flow, "Timing Driven Placement" might be worth a try. Please note that the biggest improvements are possible with ’Pre-CTS’ optimization as the registers can be moved and resized at that stage. By default, clock tree insertion will "fix" the registers to preserve the clock tree, i.e. they can no longer be moved or resized.

If your "reg2reg" setup violations are larger than 0.2 ns, this step will take rather long, i.e. 30 minutes or even longer. Therefore we will change the clock period (only for this exercise!!!) in order to have only a small violation of about 0.1 ns.

Modify and reload the constraint file, then perform a postroute optimization "Timing . Optimize ...". Make sure to select hold time fixing and specify a small extra margin for hold slack by selecting the "Mode" button, e.g. 0.2 ns.

Optimization will delete and re-route all nets that are affected by the changes and run setup and hold mode timing analyses at the very end. Once again, inspect the reports.

[32] This does not necessarily hold true for multi-clock designs.


Now let us have a look at the postroute timing of our clock tree(s):

reportClockTree -postRoute

This will print a summary on the console and write a couple of report files chip.ctsrpt* to the encounter directory. There should be no (or only minor) violations of our clock tree constraints.

Please note that the previous postCTS and postRoute setup (and hold) analyses already consider clock skew as they time every single path from the clock root to the leaf pins separately. Therefore, even a rather big skew reported here doesn’t really matter as long as the former analyses passed.

So far, the clock tree has been routed as any other signal net. This is usually good enough, but if you want, for whatever reason, to further improve clock net timings, you can do the following (in CTS):

• In the clock tree constraint file, set "RouteClkNet YES". This is a per-tree setting that instructs CTS to call NanoRoute in order to route this clock net during clock tree insertion. The wires get a status of "FIXED" and will therefore not be changed later during signal routing. While this improves timing on the clock tree, overall routability gets worse.

• To further improve timing, you can tell NanoRoute to route this net not like an ordinary signal net, but to create a balanced routing (by following the so-called "RouteGuide" computed by CTS). To do so, set "UseCTSRouteGuide YES" in the clock constraint file[33].

[33] This will persistently(!) alter the global CTS Mode to “setCTSMode -useCTSRouteGuide”.


11 Timing Debug

To analyze timing violations, SoC-Encounter also offers a graphical interface ("Timing . Debug Timing") that visualizes paths and allows cross-probing with the layout. We will not explain the tool in detail here, but rather make some important notes:

• This functionality is more or less standalone: it does not use results from the "timeDesign" command but runs a new analysis that generates the file top.mtarpt. These paths are then visualized.

• If the above file already exists, it will usually simply be loaded. This means that whenever your design has changed you have to regenerate this file in order to get up-to-date data. This can be done with the "Generate" switch on the form that opens when you click the folder icon.

• When generating top.mtarpt, the current timing mode is relevant, i.e. to analyze hold paths the timing mode has to be set to hold mode.


12 Finishing

We are almost done with backend design. Only a few steps are required to finish the layout and verify that everything is correct.

12.1 Insert Filler Cells

Now that we don’t need the additional space within the standard cell rows anymore, we have to fill these gaps with filler cells. This is required for fabrication. In addition, some of them contain capacitors between VCC and GND that filter spikes on the power lines.

source scripts/fillcore.tcl

Note that your row utilization will be 100% after this step. This means that you will have no room for further optimizations. Make sure to insert filler cells only after all optimizations have been completed. Note: It is also possible to remove the filler cells with "Place . Filler . Delete..." or by using the script fillcore.tcl.

12.2 Checking Connectivity and Geometry Violations

Now that we are completely finished with the layout, we should make sure that we have no connection errors, i.e. all logic connections from the netlist are also present in the physical layout.

Select "Verify . Verify Connectivity ..." from the menu. A window will appear. Run the analysis and check the console for the report summary. There should be no violations.

In a similar way let us verify all geometrical shapes. Select "Verify . Verify Geometry ..." from the menu. Run the analysis and check the report on the console. You should get no violations.


There is a script that will perform the last verification steps for you automatically. You can set a variable DESIGNNAME to assign the base name for all the files generated by this script.

set DESIGNNAME MyBeautifulChip
source scripts/checkdesign.tcl

12.3 Evaluate the Physical Design

Take the time to examine the routing. This is the main feedback you need for a second back-end iteration. Try to view all metal layers separately to see how congested your routing is. If you see a lot of Metal-6 (orange) you are probably close to the density limit. In our design you should not notice any congestion and Metal-6 will barely be used. If your design routed without problems and the routing was rather sparse, then the next time you could assign a smaller core area and increase the row utilization. On the other hand, if the design barely routed you have found the limits. In a second iteration you might consider assigning a little more core area, as timing degrades with congestion.
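When choosing the core area for the next iteration, a rough calculation helps. The Python sketch below assumes that row utilization is simply the total standard cell area divided by the core area; it ignores macro-cells, power rings and placement blockages, and the cell area of 500000 µm² is a made-up number.

```python
def core_area_for_utilization(cell_area_um2, target_utilization):
    """Return the core area (um^2) needed to reach the given row utilization."""
    if not 0.0 < target_utilization <= 1.0:
        raise ValueError("utilization must be in the range (0, 1]")
    return cell_area_um2 / target_utilization


cell_area = 500000.0  # hypothetical total standard cell area in um^2
for utilization in (0.5, 0.625):
    area = core_area_for_utilization(cell_area, utilization)
    print(f"utilization {utilization:.3f}: core area {area:.0f} um^2")
```

In this example, raising the utilization from 50% to 62.5% shrinks the core by one fifth, at the price of less room for routing and optimization.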

Check the connections of your macro-cells and pads; this may give you an idea how to place the macro-cells the next time around. You need to get used to evaluating the result of different back-end design runs.

12.4 Generate Output Files

Congratulations, you have completed the back-end design. That was not so hard now, was it?

Save your design using "Design . Save Design As ... . SoCE" to the ’save’ directory and make sure that you use a name that shows this is a finished design (e.g. ’chip_final.enc’).

Finally we need to export all data needed for post layout simulation and physical verification (DRC/LVS). There is a script that will write out all relevant files to the ’out/’ directory[a].

source scripts/exportall.tcl

[a] To get complete supply net connectivity in the verilog netlist for LVS, the missing connections for the power and ground pins (GNDIO/VCC3IO) of the pads are added and removed on-the-fly. We could also define and handle these two nets in the same way as VCC/GND, but there are more drawbacks than benefits.

Similar to the checkdesign.tcl file, the variable DESIGNNAME will be used to assign the base name of the files. If you do not specify a name, ’final’ will be used. After you complete this step you will have the following files:

*.v: This is the final netlist. Make sure to use this netlist for post layout simulations.

*.gds.gz: The layout in GDSII (Graphic Design System II) format. This is the standard format for exchanging layout data.

*.sdf.gz: The SDF (Standard Delay Format) file to be used for post layout simulation.

*.spef.gz: Standard Parasitic Exchange Format. Includes all parasitics, can be used for timing and/or power analysis.
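Before handing the data over to simulation or verification, it is worth checking that all four files were actually written. The Python sketch below assumes that the files are named base name plus extension (e.g. ’final.v’) and located directly in the ’out’ directory; this naming is an assumption and should be compared with what the export script really produces.

```python
from pathlib import Path

# File types listed above; the exact file naming is an assumption.
EXPECTED_SUFFIXES = (".v", ".gds.gz", ".sdf.gz", ".spef.gz")


def missing_outputs(out_dir, design_name="final"):
    """Return the names of expected output files that do not exist in out_dir."""
    out_dir = Path(out_dir)
    return [
        design_name + suffix
        for suffix in EXPECTED_SUFFIXES
        if not (out_dir / (design_name + suffix)).is_file()
    ]


missing = missing_outputs("out")
if missing:
    print("missing files:", ", ".join(missing))
else:
    print("all output files present")
```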
