IC power distribution systems are designed to provide needed
voltages and currents to the transistors that perform the logic
functions of a chip. The supply voltages are assumed to be constant
across a chip, and the power system is expected to operate reliably
over the chip's lifetime. However, with the advent of ultra-deep
submicron (UDSM) technology, the voltages on the VDD and VSS grids
fluctuate during chip operation due to increased resistance of metal
lines, high current levels, and package pin inductances. Furthermore,
the use of narrow line widths reduces the long-term reliability of a chip.
As a result, power systems have become so complex that they can no
longer be designed using intuition and "back-of-the-envelope"
calculations.
The conditions contributing to the complexity of power
distribution systems have a significant impact on chip performance.
Voltage (IR) drops on VDD lines due to resistance affect noise
margins, which impact overall timing and functionality. Ground
bounce, a similar effect, occurs on VSS lines. These effects are
made worse by the presence of Ldi/dt voltage variations at package
pins due to the increased rate of change of current in deep
submicron designs. High currents also induce electromigration effects in
which metal lines begin to wear out during a chip's lifetime.
If IC designers do not design power systems with these
conditions in mind, they have a difficult time producing a reliable
design on the first pass. Worse yet, a chip may fail in the field
after it is embedded in a system and in a customer's hands.
A complete picture of power grid integrity can only be obtained
when effects such as IR drop, ground bounce, Ldi/dt, and
electromigration are considered together. Because of the enormous
complexity of power distribution systems, voltage fluctuations and
reliability issues are difficult to predict without some form of
detailed design analysis. These are full-chip issues that must be
addressed by interconnect verification tools that have the capacity
and performance required to analyze detailed representations of the
chip in a reasonable amount of time.
If a power system design does not work properly the first time,
it can lead to multiple respins of silicon, which cost time and money
and may cost a business opportunity. By analyzing and
verifying the power system at the full-chip level, designers can
tape out with a high degree of confidence and greatly shorten the
overall time required to get a chip to market.
In this article, the key issues of power system design—IR
drop, ground bounce, Ldi/dt and electromigration—are
described, along with the related analysis issues. Methodologies
used to identify IR drop violations and potential electromigration
violations in the power distribution system are also described,
along with approaches that reduce the severity of these problems.
Designing with these issues in mind and performing full-chip
interconnect verification enables designers to address what would
otherwise be an intractable problem.
Power Grid IR Drop
In practice, each line should be fractured
into polygons and discrete RC values extracted for each polygon.
This produces a tremendous number of resistors and capacitors for
an ultra-deep submicron design. For example, in a 0.35 micron
design with five layers of metal, the VDD grid of a five million
transistor circuit may contain 30 million resistors and 20 million
capacitors. A similar number of resistors and capacitors may exist
in the VSS grid.
As device geometries decrease in UDSM technology, interconnect line
width decreases, increasing the resistance along the line.
Designers typically compute resistance along a conductor by
counting the number of squares along the line and multiplying by
the sheet resistivity, which is usually provided in ohms/square. As
line width shrinks, the number of squares increases, causing an
increase in the total resistance along the line. Similarly,
designers compute the approximate capacitance along the line by
multiplying the area of the line times the capacitance per unit
area. Of course, both the capacitance and resistance are
distributed along the metal line, so the lumped RC values derived
this way are inherently approximate.
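The square-counting estimate described above can be sketched in a few lines of Python. The function names and the sheet resistivity and capacitance values are hypothetical, chosen only for illustration; the capacitance term covers area capacitance only and ignores fringing.

```python
def wire_resistance(length_um, width_um, sheet_res_ohm_per_sq):
    """Lumped resistance: number of squares times sheet resistivity."""
    squares = length_um / width_um
    return squares * sheet_res_ohm_per_sq


def wire_capacitance(length_um, width_um, cap_ff_per_um2):
    """Lumped area capacitance: wire area times capacitance per unit area."""
    return length_um * width_um * cap_ff_per_um2


# Hypothetical process values, not taken from any real technology.
SHEET_RES = 0.05   # ohms/square
CAP_AREA = 0.03    # fF per square micron

# Halving the width of a 1000 um line doubles the squares and the resistance.
for width in (2.0, 1.0, 0.5):
    r = wire_resistance(1000.0, width, SHEET_RES)
    c = wire_capacitance(1000.0, width, CAP_AREA)
    print(f"width={width} um  R={r:.1f} ohm  C={c:.1f} fF")
```

Note that the same shrink that raises the resistance lowers the area capacitance, so the two estimates move in opposite directions as the line narrows.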
The effect of increased resistance on a power distribution
system is that supply voltage is no longer an ideal reference.
Instead, the supply voltage varies during normal circuit operation.
The current flowing through the resistance in the power grid causes
IR drops that depend on the placement of blocks, their interaction,
current levels, and resistance levels. In the past, low resistance
in a power system and relatively low current levels made IR drop a
second-order effect that could safely be ignored. But in ultra-deep
submicron, with lower supply voltages yielding smaller noise
margins, IR drop is a first-order effect and can no longer be
ignored during the design process.
Effect of Pin Inductance
Another source of voltage drop in the power supply is package pin
inductance, typically around 10nH to 20nH. Ldi/dt
creates a voltage drop across an inductor, and while L has not
changed significantly over the years, the value of di/dt has
continued to increase. In the meantime, the supply voltage has been
decreasing from 5V to 3.3V to 2.5V and recently as low as 2.0V.
These effects have combined to a point where the Ldi/dt drop can
contribute significantly to an overall voltage drop in the power
grid, especially in peak demand situations. The overall voltage
drop is due to both Ldi/dt and IR, which are both dynamic phenomena
and therefore cannot be analyzed quantitatively in a static
context.
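As a rough illustration of the magnitudes involved, the sketch below evaluates L times di/dt for a linear current ramp. The current step, ramp time, and 10% budget are hypothetical; the parallel-pin model assumes identical pins with no mutual coupling, which is a simplification.

```python
def inductive_drop(l_pin_h, delta_i_a, delta_t_s, n_pins=1):
    """Approximate L*di/dt drop across n_pins identical pins in parallel.

    Assumes a linear current ramp and no mutual coupling between pins.
    """
    return (l_pin_h / n_pins) * (delta_i_a / delta_t_s)


VDD = 2.5            # volts
BUDGET = 0.10 * VDD  # hypothetical 10% noise budget
L_PIN = 10e-9        # henries, low end of the 10nH to 20nH range
DELTA_I = 0.05       # amps, hypothetical current step
DELTA_T = 1e-9       # seconds, hypothetical ramp time

for pins in (1, 4):
    drop = inductive_drop(L_PIN, DELTA_I, DELTA_T, pins)
    print(f"{pins} pin(s): {drop:.3f} V, over budget: {drop > BUDGET}")
```

Under these assumptions a single pin exceeds the budget on its own, before any IR drop is added, while four pins in parallel bring the inductive term back inside it.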
Ground Bounce
Validating that the ground voltage does not rise above a 10%
noise budget is as important as ensuring that VDD does not drop
below a 10% budget. Measuring ground bounce requires that the
substrate be modeled as a distributed RC network in parallel with
the metal routing for the ground grid. This significantly increases
the complexity of the network, especially when pin inductance or a
more complicated pin model is included. A limited form of ground
bounce analysis is possible by modeling substrate contacts as
individual ideal capacitances, but these values are difficult to
obtain. An even more conservative approach to ground bounce
analysis ignores the substrate entirely; with this approach, the
ground bounce observed during analysis is worse than the actual
ground bounce on the chip.
Electromigration
If a chip fails in the field long before its
expected lifetime, the consequences may be severe. For example,
Intel's Pentium® was recalled in 1994 due to a PLA programming
error. The total cost was estimated to be half a billion dollars,
not including the marketing and PR required to rebuild the image of
the company. Clearly, the cost of a recall may affect more than the
bottom line.
Electromigration (EM) is another important issue in the design
of deep submicron power distribution systems. High current
densities and narrow line widths cause EM. In the past, low current
densities and wide metal lines, combined with special processing,
helped to avoid the effects of EM. But now, speeds of 100MHz and
higher and geometries of 0.35 micron and smaller have increased the
potential for EM problems.
Failures due to EM can be catastrophic because they occur in the
field when the chip is in a system and in a customer's hands.
Depending on the location and number of failures, the chip may
begin to operate incorrectly or shut down completely, which can
lead to catastrophic consequences for a chip design company.
Issues in the Design of Power Distribution Systems
Designing a power distribution system requires the consideration
of both EM and IR drop in a full-chip context. For example,
consider the two blocks in Figure 1. If power distribution
for Block A is examined in isolation, the additional loading due to
the presence of Block B is not taken into account. If power is
routed through Block A to Block B, a larger IR drop will occur in
Block B since power is also being consumed by Block A before it
reaches Block B. As more and more blocks are added, the complex
interactions between the blocks determine the actual voltage
drops.
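The Block A and Block B interaction can be sketched as a simple series chain. The resistances and currents below are hypothetical, and each block is treated as a constant current sink, which is a simplification of real switching behavior.

```python
def chain_voltages(vdd, segments):
    """Supply voltage seen by each block on a daisy-chained power route.

    segments: list of (segment_resistance_ohm, block_current_amp), ordered
    from the pin outward. Each segment carries the current of its own block
    plus that of every block further down the chain.
    """
    voltages = []
    v = vdd
    downstream = sum(current for _, current in segments)
    for resistance, block_current in segments:
        v -= resistance * downstream
        voltages.append(v)
        downstream -= block_current
    return voltages


VDD = 2.5
BLOCK_A = (0.5, 0.1)  # hypothetical: 0.5 ohm feed, 100 mA load
BLOCK_B = (0.5, 0.1)  # hypothetical: 0.5 ohm from A to B, 100 mA load

print("A analyzed alone:", chain_voltages(VDD, [BLOCK_A]))
print("A and B together:", chain_voltages(VDD, [BLOCK_A, BLOCK_B]))
```

With these values, Block A analyzed in isolation appears to lose 50 mV, but it loses 100 mV once Block B draws its current through the same feed, and Block B sees a larger drop still.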
The placement of these blocks is typically based on the timing
requirements of a system rather than on IR drop, or else placement
is based on the size and shape of blocks at the floorplanning
stage. Therefore, sizing the buses properly to minimize IR drop
while satisfying the required timing and area constraints is a
design challenge that can only be met using full-chip analysis.
Figure 1: Routing through a block
Since the total IR drop is based on the resistance seen from the
pin to the block, one could route around the block and feed power
to each block separately, as shown in Figure 2. Ideally, the
main trunks should be large enough to handle all the current
flowing through separate branches. In this case, the T-junctions
have a high current density and may be prone to EM problems. It is
important in this type of grid to examine the current density at
all junctions, especially the corner providing large amounts of
current to each block, to ensure that EM problems do not exist. The
same argument holds for every block routed in this manner. Again,
power grid design is truly full-chip when voltage drop and EM
issues are considered.
Figure 2: Routing around the blocks
Although routing power this way is easier to control and
maintain, it also requires more area to implement. The large metal
trunks of power have to be sized to handle all the current for each
block. This requirement forces designers to set aside area for
power busing that takes away from the available routing area.
Another approach to minimizing IR drop, depicted in Figure
3, is to have a solid grid of Metal 4 and Metal 5 and use a via
array to connect the two layers, effectively tying the whole grid
to VDD. While this solves the problem at higher levels, it simply
shifts the problem down to the lower levels of metal. What about
Metal 3 and Metal 2? Are they wide enough to handle the current
levels they will sustain in terms of IR drop and EM?
Depending on the methodology, lower levels are often left
floating until final assembly. Low resistance, high current paths
can often be created by random placement of lower blocks. In fact,
when you design the logic circuitry in the block, it is not clear
where Metal 3 will tap to Metal 4, so you cannot predict the
current flow. And if you cannot predict it, you must analyze it.
The example in Figure 3 illustrates that you cannot avoid
such a problem by solving it locally; you just shift it elsewhere
in the design. Visibility into the global consequences of local
changes is required to truly analyze overall power integrity.
Figure 3: Vias in a mesh array methodology
Voltage drop on a power grid primarily
affects timing. IR drop compromises the drive capability of the
gates and increases the overall delay. Typically, a 5% drop in
supply voltage can affect delay by 15% or more. Delay in a clock
buffer has been known to increase by more than 100% due to IR drop.
Such an increase in delay is critical when you are managing clock
skews in the range of 100 picoseconds. Imagine the effect of this
type of unexpected delay along centrally located critical paths.
Then path delay is no longer predictable and, in fact, the critical
path may be somewhere else in the design due to IR drop. This means
that the performance or functionality of the design is
unpredictable. Ideally, timing calculations should take worst-case
IR drop into account to improve accuracy.
Part of the grid may have to be removed to route some signals,
as shown in Figure 3. Which straps can be removed without
introducing problems? If you arbitrarily pick one that is
conducting a large amount of current, the excess current must flow
in adjacent straps, which may push the current density in them
beyond acceptable levels. Clearly, such decisions cannot be made
without determining the current levels in the straps and then
picking ones that have lower current levels. The complexity of the
problem requires a set of power grid analysis tools. These examples
illustrate that design decisions must be made with a global
perspective in mind.
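The strap-selection reasoning can be illustrated with a deliberately crude model. It assumes the per-strap currents are already known from analysis and that a removed strap's current splits equally between its immediate neighbors; a real grid redistributes current according to the full resistive network, so this is only a sketch. All names and values are hypothetical.

```python
def currents_after_removal(currents_ma, index):
    """Per-strap currents after deleting one strap, keyed by original index.

    Crude assumption: the removed strap's current is shared equally by its
    immediate neighbours (an edge strap hands everything to its one neighbour).
    """
    if len(currents_ma) < 2:
        raise ValueError("need at least two straps")
    neighbours = [j for j in (index - 1, index + 1) if 0 <= j < len(currents_ma)]
    share = currents_ma[index] / len(neighbours)
    return {
        j: current + (share if j in neighbours else 0.0)
        for j, current in enumerate(currents_ma)
        if j != index
    }


def removable_straps(currents_ma, limit_ma):
    """Indices whose removal keeps every remaining strap within the limit."""
    return [
        i for i in range(len(currents_ma))
        if all(c <= limit_ma for c in currents_after_removal(currents_ma, i).values())
    ]


strap_currents = [5.0, 7.0, 3.0, 2.0, 5.0]  # mA, hypothetical analysis results
print(removable_straps(strap_currents, limit_ma=8.0))
```

Even in this toy model, whether a strap can be removed depends on what its neighbors already carry and not only on its own current, so the straps must be evaluated together.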
IR drop is a dynamic phenomenon due primarily to simultaneous
switching events in a chip such as clocks, bus drivers, and memory
decoder drivers. As large drivers begin to switch, the simultaneous
demand for current from the power grid stresses the grid. In a
static context, voltage drops are highest near the center of a
design and lowest near VDD connections to the power supply.
However, during dynamic operation, these simultaneous switching
events can cause severe voltage drops anywhere on the chip, and
these are the ones that must be identified. These events are usually
well known and can typically be triggered with fewer than 100
vectors.
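The static observation, that drops are largest near the center and smallest near the supply connections, can be reproduced with a one-dimensional rail. This is a minimal sketch: the rail is tied to VDD at both ends, every tap draws the same constant current, and all segment resistances are equal. The values are hypothetical, and the tridiagonal nodal equations are solved with the standard Thomas algorithm.

```python
def rail_voltages(vdd, n_taps, seg_res, tap_current):
    """Static node voltages on a rail fed from VDD at both ends.

    Nodal equation at each tap: -V[i-1] + 2*V[i] - V[i+1] = -I*R,
    with both end nodes held at vdd. Solved with the Thomas algorithm.
    """
    if n_taps < 1:
        raise ValueError("need at least one tap")
    rhs = [-tap_current * seg_res] * n_taps
    rhs[0] += vdd    # left boundary node at VDD
    rhs[-1] += vdd   # right boundary node at VDD

    # Forward sweep (sub- and super-diagonal are -1, diagonal is 2).
    c_prime = [0.0] * n_taps
    d_prime = [0.0] * n_taps
    c_prime[0] = -0.5
    d_prime[0] = rhs[0] / 2.0
    for i in range(1, n_taps):
        denom = 2.0 + c_prime[i - 1]
        c_prime[i] = -1.0 / denom
        d_prime[i] = (rhs[i] + d_prime[i - 1]) / denom

    # Back substitution.
    volts = [0.0] * n_taps
    volts[-1] = d_prime[-1]
    for i in range(n_taps - 2, -1, -1):
        volts[i] = d_prime[i] - c_prime[i] * volts[i + 1]
    return volts


volts = rail_voltages(vdd=2.5, n_taps=5, seg_res=0.1, tap_current=0.1)
for tap, v in enumerate(volts, start=1):
    print(f"tap {tap}: {v:.3f} V (drop {2.5 - v:.3f} V)")
```

The dynamic case differs because the tap currents are neither equal nor constant; a large driver switching near one end can move the worst-case drop away from the center.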
The effect of IR drop on chip performance is significant. IR
drop compromises the voltage noise margins of logic gates, due not
only to voltage drops in the power grid during the rising edge of a
signal, but also to the increase in voltage in the ground grid
because of the same phenomenon during the falling edge. Once the
noise margins drop below the budgeted amount, typically 10%, the
design is not guaranteed to operate properly.
Over the years, supply voltage has been shrinking as device
dimensions are scaled to avoid transistor punch-through conditions,
hot-electron effects, and device breakdown. This has resulted in
smaller and smaller noise margins. With IR drop, the margins are
reduced even further which makes it even more difficult to manage a
multi-million-transistor design.
In Figure 4, a portion of a design is shown with two
metal lines connected by a narrow strap of metal. The metal lines
must be wide enough to carry the average current needed to feed the
circuitry connected to them. If the lines are too narrow, EM failures
or excessive IR drop may occur.
Figure 4: Electromigration in the power grid
Since large currents flow in the periphery of a design, EM
problems are usually observed in the outer regions of a chip.
However, vias scattered all over the design may also be prone to EM
problems. Furthermore, the lower levels of metal connected to
devices are usually narrower and may cause EM problems depending on
the current levels. Therefore, it is important to look for EM
across the entire chip rather than just specific regions.
Finding all areas susceptible to EM prohibits any use of data
reduction. You must include all the detailed extracted resistance
data—otherwise, you may lose useful information. For example,
a via cluster that has been reduced to one via resistor may mask a
potential EM failure, and an EM analysis tool would miss the
problem.
In Figure 5, current flows from Metal 5 to Metal 4
through a via array. Crowding occurs as the current "hugs the
curve" going from one level to the other. Some of the vias in the
center of the layout have been tagged as ones that may suffer from
EM. If the 16 vias in the array were collapsed into one via, this
region would not be flagged as having a problem. In reality, the
nine indicated vias in the cluster may fail due to the high current
density in the narrow dimension of the cluster. Any extraction and
analysis for EM must have unreduced data to provide useful
feedback.
Figure 5: Electromigration in via arrays
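The masking effect of data reduction on a via array can be shown with a small sketch. The per-via currents and the per-via limit are hypothetical; the reduced model is assumed to be a single equivalent via rated at the sum of the individual limits.

```python
def flagged_vias(via_currents_ma, limit_ma):
    """Unreduced check: indices of vias whose own current exceeds the limit."""
    return [i for i, current in enumerate(via_currents_ma) if current > limit_ma]


def collapsed_is_flagged(via_currents_ma, limit_ma):
    """Reduced check: one equivalent via carrying the total current,
    rated at the per-via limit times the number of vias."""
    return sum(via_currents_ma) > limit_ma * len(via_currents_ma)


# Hypothetical 16-via array: current crowding loads nine vias heavily
# while the other seven carry very little.
array = [1.5] * 9 + [0.2] * 7   # mA per via
LIMIT = 1.0                     # mA per via, hypothetical EM limit

print("collapsed model flags a problem:", collapsed_is_flagged(array, LIMIT))
print("vias over the limit when unreduced:", len(flagged_vias(array, LIMIT)))
```

In this example the total current is below the combined rating, so the collapsed model reports nothing, while the unreduced data shows nine vias over their individual limit.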
Electromigration in the power grid is a DC phenomenon due to the
average current flow in metal lines and vias. Design guidelines for
EM are based on average current levels which, in turn, depend on
signal line capacitance. Therefore, obtaining an accurate EM
prediction requires the use of accurate capacitance information.
Furthermore, since metal lines vary in height and material
properties at different levels in the design, each metal layer has
different failure criteria. To identify all potential areas of EM
problems across a chip, the only solution is to perform full-chip
analysis.
Black's law is used to predict the mean-time-to-failure (MTTF)
of a metal line using the average current density, J, seen
by the line. The more accurate the average information, the better
the estimate of the MTTF. To obtain this information, you need to
use a large number of vectors to exercise the design. The average
current in every metal line must be measured and then divided by
the width and thickness of the line. This is clearly impossible to
do on a fabricated chip, and prohibitive to do using circuit
simulation.
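For reference, Black's equation is commonly written as MTTF = A * J^(-n) * exp(Ea / (k * T)). The sketch below evaluates it; the constant A, the exponent n, the activation energy, and the temperature are process-dependent, and the defaults used here are illustrative assumptions, not values from this article.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def average_current_density(i_avg_amp, width_cm, thickness_cm):
    """Average current divided by the line cross-section (A/cm^2)."""
    return i_avg_amp / (width_cm * thickness_cm)


def black_mttf(j_a_per_cm2, temp_k, a_const=1.0, n=2.0, ea_ev=0.7):
    """Black's equation. a_const, n, and ea_ev are assumed placeholder values."""
    return a_const * j_a_per_cm2 ** (-n) * math.exp(
        ea_ev / (BOLTZMANN_EV_PER_K * temp_k)
    )


# With a_const left at 1.0 the absolute result is meaningless, so compare ratios.
j_wide = average_current_density(1e-3, 1.0e-4, 0.5e-4)    # 1 mA in a 1.0 um line
j_narrow = average_current_density(1e-3, 0.5e-4, 0.5e-4)  # 1 mA in a 0.5 um line
ratio = black_mttf(j_narrow, 378.0) / black_mttf(j_wide, 378.0)
print(f"halving the width changes MTTF by a factor of {ratio:.2f}")
```

Because the current density enters with exponent n, errors in the average current estimate are amplified in the lifetime estimate, which is why accurate averages matter.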
An alternative to expensive transistor-level simulation is to
obtain average currents from activity information, in the form of
toggle data, using a gate-level or higher-level tool. Toggle data
is simply the number of times a gate switches high or low during a
simulation of thousands of clock cycles. If the toggle data is
divided by the number of clock cycles, the activity information is
obtained. For example, the core of a memory circuit may have an
activity of 0.02% while a data path may be closer to 5%. These
factors can be converted into average current information for the
transistors connected to the power grid.
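One way the conversion from toggle data to average current might look is sketched below. The formula is an assumption, not taken from this article: toggles are taken to count both rising and falling transitions, and only the rising half draws a charge of C times VDD from the supply. The load capacitance, supply voltage, and clock frequency are hypothetical.

```python
def activity(toggles, clock_cycles):
    """Toggle count divided by the number of simulated clock cycles."""
    return toggles / clock_cycles


def average_current(act, c_load_f, vdd_v, f_clk_hz):
    """Average supply current for one switching node.

    Assumes half of the counted toggles are rising edges, each drawing
    a charge of C*VDD from the supply.
    """
    return 0.5 * act * c_load_f * vdd_v * f_clk_hz


# Hypothetical node: 100 fF load, 2.5 V supply, 100 MHz clock.
for name, toggles in (("memory core", 2), ("data path", 500)):
    act = activity(toggles, 10_000)
    i_avg = average_current(act, 100e-15, 2.5, 100e6)
    print(f"{name}: activity {act:.2%}, average current {i_avg * 1e6:.4f} uA")
```

The two toggle counts reproduce the 0.02% and 5% activity figures quoted above; the resulting per-node currents would then be attached to the grid at the points where the transistors connect.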
You must also determine the average flow of current in the
entire power grid to assess reliability risks of a given design. It
is not sufficient to determine the average behavior of a block
taken in isolation, because the block may only be exercised
periodically in a full-chip context. Furthermore, changes to the
power grid in one section tend to have a global impact. Data
reduction cannot be used either since some of the real EM problems
may be masked by the reduction itself. Therefore, an accurate
picture of EM risk cannot be obtained unless the entire chip is
verified as a single entity. Any tool used for this purpose must
have the capacity to analyze multi-million resistor grids.
Improving Full-chip Power Integrity and Reliability
The problems described above must be identified and fixed before
going to silicon since they are very expensive to debug after
fabrication. Verification tools exist for this purpose. Clearly,
iterations through a verification loop are preferable to more
expensive iterations through a fab-find-and-fix loop.
When you look at methods of performing full-chip verification,
it is clear that an engineering solution must be developed. A
single-run simulation of multi-million transistor circuits, with
power grids and ground grids each containing more than 30 million
resistors and a similar number of capacitors, is prohibitively
expensive. Any simulation approach that attempts to solve the
transistors and grids together will suffer from severe capacity
limits. As mentioned above, block-based methods by themselves do
not suffice since power distribution planning is a full-chip issue.
Given the scope, trying to find the IR drops and EM risks is
certainly a daunting problem. But if verification is the goal
(rather than simulation), excellent approaches exist to address the
problem.
Reducing Voltage Drop
As described earlier, voltage drops in the power grid come from
two sources: IR and Ldi/dt. Reducing the impact of IR drop in a
power distribution system can be accomplished in several ways. The
simplest approach is to widen the lines that experience the largest
voltage drops since increasing the width decreases the resistance
(and the IR drop). However, this may not always be possible due to
constraints in the routing area. Since IR drop is due primarily to
simultaneous switching events, another approach is to stagger the
gates that are switching together, by introducing delays on the
signals driving them, so that they switch at slightly different
times, at least enough to keep the peak current demand within the
noise budget. Alternatively, you could reduce the buffer size, but
this may not be possible if the design fails to meet performance
requirements with smaller devices.
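The benefit of staggering can be illustrated by summing idealized current pulses. This sketch assumes each driver draws a triangular current pulse of identical width and peak, which is a hypothetical waveform chosen for simplicity; the peak of the sum is found by sampling.

```python
def triangle(t, start, width, peak):
    """Triangular current pulse: ramps from 0 to peak and back over `width`."""
    x = (t - start) / width
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return peak * (1.0 - abs(2.0 * x - 1.0))


def peak_demand(starts, width, peak, step=0.01):
    """Largest total current over time for pulses beginning at `starts`."""
    samples = int(round((max(starts) + width) / step))
    return max(
        sum(triangle(k * step, s, width, peak) for s in starts)
        for k in range(samples + 1)
    )


WIDTH_NS = 1.0   # hypothetical pulse width
PEAK_MA = 10.0   # hypothetical per-driver peak current

together = peak_demand([0.0, 0.0, 0.0, 0.0], WIDTH_NS, PEAK_MA)
staggered = peak_demand([0.0, 0.5, 1.0, 1.5], WIDTH_NS, PEAK_MA)
print(f"simultaneous: {together:.1f} mA, staggered: {staggered:.1f} mA")
```

The total charge delivered is the same in both cases; staggering only spreads it over time, so the cost is the added delay on the later drivers.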
One effective approach is to use decoupling capacitors between
power and ground, which can deliver the additional current needed
by the power distribution system. These decoupling caps are usually
scattered throughout the design in any available space, using
transistors with their gates tied to VDD and their source-drains
tied to VSS. All empty regions of the chip are filled with
decoupling caps using the philosophy that you can never have
enough. Ldi/dt effects can be mitigated by placing large
capacitances near the pins.
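A first-order feel for how much decoupling capacitance a transient needs comes from the charge relation Q = C * V. The sketch below assumes the capacitor alone supplies a constant transient current for the whole interval, which overstates the requirement since the grid also delivers current; the numbers are hypothetical.

```python
def decap_needed(i_transient_a, duration_s, allowed_droop_v):
    """Capacitance that can source i_transient_a for duration_s while the
    voltage across it falls by no more than allowed_droop_v (C = I*dt/dV)."""
    return i_transient_a * duration_s / allowed_droop_v


# Hypothetical event: 1 A of extra demand for 1 ns, 0.25 V allowed droop.
cap_f = decap_needed(1.0, 1e-9, 0.25)
print(f"required decoupling capacitance: {cap_f * 1e9:.1f} nF")
```

An estimate of this kind only sets the total; where the capacitance sits relative to the switching drivers still matters, because the grid resistance between them limits how quickly the charge can be delivered.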
A more aggressive solution is to use a ball-grid array,
sometimes called solder bumps or C4™ bumps, where the power
supply connections can be at various points within the chip. This
expensive solution requires placing many C4 bumps across the chip
to minimize the worst-case IR drop in any location. This solution
tends to push EM problems to lower levels of metal that are usually
narrower. Also, this solution cannot be used in sensitive areas
such as memories and dynamic logic because C4 bumps generate alpha
particles that may cause logic value upsets in the sensitive nodes.
Nevertheless, when used appropriately, C4 bumps can reduce IR drop.
The key to design is proper placement of the C4 connections, which
can only be done effectively with full-chip analysis.
Reducing Electromigration Problems
Electromigration failures can be reduced in several ways. The
basic idea in all approaches is to reduce the average current
density seen by any metal segment. The simplest approach is to
widen the metal lines. However, increasing the width beyond a
certain point leads to over-design, which costs area and can reduce
yields. Another approach is to change the current flow in the power
grid itself by adding jumpers and straps between different points
in the grid. This would reroute current around the affected areas,
but such changes would require another verification pass to confirm
that the problem has not simply been moved to another area of the
design.
In Figure 6, note that the standard cell block on the
right would not show any EM risk if analyzed by itself. However,
in a full-chip context, current flowing to adjacent blocks
overloads the power connections in the block, and the analysis tool
identifies an EM risk. Recognizing these problems at the planning
stage is helpful, but difficult to do. EM requires a detailed grid
with unreduced data. Therefore, a complete picture of EM risk can
only be obtained at the verification stage.
Figure 6: The most difficult aspect of power grid design with respect to EM is that no one block can be isolated from another. This plot demonstrates EM risk at the full-chip level.
A key point made earlier is that IR drop and EM problems cannot
be solved separately; they must both be considered during design.
To illustrate this, consider how to solve an IR drop problem in the
chip in Figure 7a. The figure shows a power flow diagram of
the VDD grid in a multimedia chip. Different shading indicates
various levels of voltage drops. The darkest areas are the lowest
points (valleys) of the IR drop contours. A significant voltage
drop occurs in the center region of the chip because only the top
portion of the power grid feeds the large drivers in the top
section. The upper and lower regions of the power system are not
connected.
Figure 7a: Power grid before changes
If we strap the upper and lower regions together in two places,
the voltage drop problem is reduced significantly, as indicated in
Figure 7b. The depth of the IR drop valleys has been reduced
to acceptable levels, and the voltage drops have been spread over a
wider area of the grid. The lower region is now supplying more
current to the upper region and therefore a better power
distribution has been obtained by adding the two straps.
Figure 7b: Power grid after changes
However, when examined in the context of electromigration, the
results show that fixing the IR drop problem has caused an EM
problem in the lower portion of the design. A review of Figure
6 (before the straps were added) shows EM problems at the
periphery of the chip due to the high current levels in those
regions. The lower half of the chip shows no EM problems.
But in Figure 8 (after straps were added), new EM
problems are evident in the lower half as indicated by the small
horizontal white lines. It was clear that the lower portion would
supply additional current to the upper half of the design once a
bridge was built between the two; however, it was not clear exactly
how current would flow and exactly where EM problems might
occur.
Figure 8: New EM problems in unexpected regions after changes
Does increasing a line width always reduce
electromigration risk? No. Thin wires can have better EM
characteristics than wider wires due to the physics of
electromigration. Be aware that more is not necessarily better.
Proper EM analysis accounts for this width dependence.
Repairing all the areas with potential EM problems would be
labor-intensive, time-consuming and, frankly, unnecessary. Since
every chip has a lifetime associated with it, the MTTF factor can
be used to compute a probability of failure due to EM in a given
lifetime. The goal of any changes to the power grid would be to
decrease the probability of failure to an acceptable level. This
limits the actual number of repairs needed and makes the job
manageable.
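One possible way to turn a time-to-failure estimate into a repair list is sketched below. It assumes failure times follow a lognormal distribution, a common modeling choice for EM that this article does not specify, and it treats the MTTF as the median time to failure. The shape parameter, segment names, lifetimes, and target probability are all hypothetical.

```python
import math


def failure_probability(lifetime_h, t50_h, sigma=0.5):
    """Probability of failure by lifetime_h under an assumed lognormal
    distribution with median t50_h and shape parameter sigma."""
    z = math.log(lifetime_h / t50_h) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def segments_to_repair(t50_by_segment, lifetime_h, max_probability):
    """Names of segments whose failure probability exceeds the target."""
    return sorted(
        name for name, t50 in t50_by_segment.items()
        if failure_probability(lifetime_h, t50) > max_probability
    )


LIFETIME_H = 87_600.0  # ten years of continuous operation
segments = {           # hypothetical median times to failure, in hours
    "strap_a": 2.0e5,
    "strap_b": 1.0e6,
    "via_c": 1.2e5,
}
print(segments_to_repair(segments, LIFETIME_H, max_probability=0.01))
```

Only the segments above the target need attention; segments with a comfortable margin, such as strap_b in this example, can be left as they are.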
Summary
The design of power distribution systems for deep submicron ICs
is complicated by full-chip issues such as IR drop, ground bounce,
Ldi/dt, and electromigration. In the past, certain DRC and visual
checks were performed on the grid to ensure compliance with the
constraints imposed by these issues. Usually, over-designing was an
acceptable solution. But as technology moves deeper into UDSM, this
is not a viable approach. Too much performance is sacrificed or the
area penalty of over-designing leads to decreased yields. However,
pushing the edge of the envelope may lead to under-designing. Chips
that have been under-designed often fail on the test bench or later
in the field. Therefore, situations of over-design and under-design
must both be identified when evaluating the integrity of a power
distribution system.
In the end, the design tradeoffs that satisfy all the necessary
constraints are too complex to handle without tools that provide
visibility into specific problems and their locations on a chip.
Without these tools, today's designer has a formidable task in
designing a power grid that can handle the power demands over the
chip's lifetime. Designers are often required to tape out a design,
and are left hoping that nothing will go wrong when the chip comes
back from the fab. Murphy's Law is apropos for this situation: if
something can go wrong, it probably will.