Part of the Solid State Technology Network


Reliability for the Real (New) World

By Dina Medhat

There’s nothing more annoying than a device that doesn’t perform as expected. Nearly everyone has experienced the ultimate frustration of an “intermittent failure” in a laptop, or a cellphone that suddenly and inexplicably stops working. Now imagine that failure occurring in a two-ton vehicle traveling at highway speeds, or in a pacemaker implanted in someone you love. With electronics moving into virtually every facet of our lives, designers face unique challenges as they create (or re-engineer) designs for new high-reliability, environmentally demanding applications such as automotive and medical.

Significantly increased longevity requirements, coupled with new stresses, new circuits and topologies, increased analog content, higher voltages, and higher frequencies, make the task of ensuring performance and reliability harder than ever. The corollary to these new constraints and requirements is the need for verification technology and techniques that enable designers to find and eliminate potential electrical failure points and weaknesses.

Electrical overstress (EOS) is one of the leading causes of integrated circuit (IC) failures, regardless of where the chip is manufactured or the process used. EOS events can result in a wide spectrum of outcomes, covering varying degrees of performance degradation all the way up to catastrophic damage, where the IC is permanently non-functional. Identifying and removing EOS susceptibility from IC designs is essential to ensuring successful performance and reliability when the products reach the market.

When we discuss EOS, however, it’s important to understand that EOS is technically the result of a wide range of root-cause events and conditions. In its broadest definition, EOS includes electrostatic discharge (ESD) events, electromagnetic interference (EMI), latch-up (LUP) conditions, and other causes. In practice, ESD, EMI, and LUP are generally treated as categories distinct from the remaining EOS causes, as shown in Figure 1.

Figure 1. Typical root causes of EOS events. See Reference 1.

Any device will fail when subjected to stresses beyond its designed capacity, whether because of a device weakness or improper use. The absolute maximum rating (AMR) provides the criterion that distinguishes these cases:

  • Each user of an electronic device must have a criterion for the safe handling and application of the device
  • Each manufacturer of an electronic device must have a criterion to determine if a device failure was caused:
    • By device weakness (manufacturer fault)
    • By improper usage (user fault)
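The AMR criterion above amounts to a simple comparison. The following Python sketch assumes a single scalar stress value and a single AMR in the same unit; real datasheets list separate maximum ratings for voltage, current, temperature, and pulse duration, and `FailureReport` and `attribute_failure` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class FailureReport:
    # Hypothetical record of a failed device returned from the field
    applied_stress: float  # peak stress the device saw, e.g. in volts
    amr: float             # absolute maximum rating, same unit as applied_stress


def attribute_failure(report: FailureReport) -> str:
    """Apply the AMR criterion: a failure at or below the rating points to a
    device weakness, a failure above it points to improper usage."""
    if report.applied_stress <= report.amr:
        return "device weakness (manufacturer fault)"
    return "improper usage (user fault)"


# Hypothetical values: 3.6 V rating, failures observed at 3.0 V and 5.0 V
print(attribute_failure(FailureReport(applied_stress=3.0, amr=3.6)))
print(attribute_failure(FailureReport(applied_stress=5.0, amr=3.6)))
```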

Device robustness is represented by the typical failure threshold (FT) of a device. Because FTs are subject to the natural distribution of the manufacturing process, a product’s AMR is set to provide the necessary safety margin against this distribution (to avoid failures in properly constructed devices). The safe operating area (SOA) of a device consists of the parametric conditions (usually current and voltage) over which the device is expected to operate without damage or failure (Figure 2). Exceeding different SOA limits tends to cause different kinds of damage. For example:

  • Over-voltage tends to damage breakdown sites
  • Over-current tends to fuse interconnects
  • Over-power tends to melt larger areas
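To make the SOA idea concrete, here is a minimal Python sketch that checks one operating point against hypothetical voltage, current, and power limits and maps each exceeded limit to the damage mechanism listed above. Treating the SOA as three fixed numbers is a simplifying assumption; a real SOA is a boundary that depends on pulse duration and temperature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafeOperatingArea:
    # Hypothetical limits for one device type (simplified to fixed values)
    max_voltage: float  # V
    max_current: float  # A
    max_power: float    # W


def soa_violations(v: float, i: float, soa: SafeOperatingArea) -> list[str]:
    """Return the typical damage mechanism for each SOA limit that (v, i) exceeds."""
    issues = []
    if abs(v) > soa.max_voltage:
        issues.append("over-voltage: breakdown-site damage")
    if abs(i) > soa.max_current:
        issues.append("over-current: fused interconnect")
    if abs(v * i) > soa.max_power:
        issues.append("over-power: melting of larger areas")
    return issues


soa = SafeOperatingArea(max_voltage=5.0, max_current=0.1, max_power=0.3)
print(soa_violations(4.0, 0.05, soa))  # operating point inside the SOA
print(soa_violations(6.0, 0.08, soa))  # exceeds the voltage and power limits
```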

Figure 2. Graphical interpretation of an AMR. The yellow line represents the number of components experiencing immediate catastrophic EOS damage. See Reference 2.
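Figure 2 relates the AMR to the spread of failure thresholds. Assuming, purely for illustration, that FTs follow a normal distribution, the fraction of devices failing at a given stress is the cumulative distribution function evaluated at that stress, and an AMR can be placed k standard deviations below the mean FT. The mean, sigma, and k below are hypothetical values, not figures from the references.

```python
import math


def fraction_failed(stress: float, ft_mean: float, ft_sigma: float) -> float:
    """Fraction of devices whose failure threshold lies below `stress`,
    assuming normally distributed FTs (an illustrative assumption)."""
    z = (stress - ft_mean) / ft_sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def amr_with_margin(ft_mean: float, ft_sigma: float, k: float) -> float:
    """Place the AMR k standard deviations below the mean FT."""
    return ft_mean - k * ft_sigma


mean, sigma = 10.0, 0.5                     # hypothetical FT distribution, in volts
amr = amr_with_margin(mean, sigma, k=6.0)   # 7.0 V
for stress in (amr, 9.0, 10.0, 11.0):
    print(f"{stress:5.1f} V -> {fraction_failed(stress, mean, sigma):.2e} of devices fail")
```

Raising k widens the safety margin at the cost of a more conservative rating. Real FT distributions can have tails that a normal model underestimates, so this only illustrates the trade-off.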

Electrically induced physical damage (EIPD) is the term used to describe the thermal damage that may occur when an electronic device is subjected to a current or voltage beyond its specification limits. This damage results from the excessive heat generated during the EOS event, which in turn comes from resistive heating in the connections within the device. The high currents of an EOS event can generate very localized high temperatures, even in normally low-resistance paths. These temperatures destroy the materials used in the device’s construction [2].
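A rough calculation shows why an EOS pulse can overheat even a low-resistance metal line. The Python sketch below assumes adiabatic heating (no heat escapes during the pulse), a temperature-independent resistivity, textbook room-temperature properties of copper, and a hypothetical 1 µm × 0.5 µm line; it is an order-of-magnitude estimate, not a thermal model.

```python
def adiabatic_temp_rise(current_a, width_m, thickness_m, duration_s,
                        resistivity_ohm_m=1.68e-8,   # copper, near room temperature
                        density_kg_m3=8960.0,        # copper
                        specific_heat_j_kgk=385.0):  # copper
    """Estimate the temperature rise of a metal line carrying a short current pulse.

    Joule heat per unit volume is rho * J^2 * t; with no heat loss it all goes
    into raising the temperature, so dT = rho * J^2 * t / (density * c).
    """
    j = current_a / (width_m * thickness_m)                  # current density, A/m^2
    heat_per_volume = resistivity_ohm_m * j**2 * duration_s  # J/m^3
    return heat_per_volume / (density_kg_m3 * specific_heat_j_kgk)


# Hypothetical 1 um x 0.5 um copper line hit by 100 ns current pulses
for amps in (0.1, 1.0):
    dt = adiabatic_temp_rise(amps, 1e-6, 0.5e-6, 100e-9)
    print(f"{amps:4.1f} A -> temperature rise ~ {dt:,.0f} K")
```

Because the temperature rise scales with the square of the current, raising the pulse from 0.1 A to 1 A increases it a hundredfold, from roughly 19 K to roughly 1,950 K, which under these assumptions is far beyond copper’s melting point of about 1,085 °C.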

As shown in Figure 3, EOS damage can be external (visible to the naked eye or with a low-power microscope), or internal (visible with a high-power microscope after decapsulation). External damage can include visible bulges in the mold compound, physical holes in the mold compound, burnt/discolored mold compound, or a cracked package. Internal damage manifests itself in melted or burnt metal, carbonized mold compound, signs of heat damage to metal lines, and melted or vaporized bond wires.

Figure 3. External and internal EOS damage. See Reference 3.

So, if preventing EOS conditions in your design is a good idea, how do you do it? In the past, designers used a variety of methods to check for over-voltage conditions, relying mainly on the expertise and experience of their design team. Manual inspection is probably the most tedious and time-consuming approach, and it is hardly practical for today’s large, complex designs. Another conventional approach is design rule checking (DRC) combined with manually applied marker layers. Manual marker layers are inherently susceptible to human error and omission, and the approach also requires additional DRC runs, extending verification time. Finally, there is simulation, which can take a long time to run and depends on the quality of the extracted SPICE netlist, SPICE models, stress models, and input stimuli.

Voltage Propagation

Voltage propagation is an automated flow that propagates realistic voltage values to all points in the layout, replacing the more error-prone manual processes. The flow (Figure 4) generates this voltage information without requiring any changes to sign-off decks or any manually added physical layout markers.

Figure 4. Automated voltage propagation flow.
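The core idea of voltage propagation can be illustrated as a graph traversal: supply voltages are pushed from their nets through device connections that can conduct (for example, a MOS source-drain channel or a resistor) until every reachable net is labeled with the voltages it may see. The Python sketch below is a hypothetical simplification, not how Calibre PERC is implemented or configured: it treats every channel as conducting, models no voltage drop, and uses an invented netlist format.

```python
from collections import defaultdict, deque

# Hypothetical netlist format: (device name, device type, {pin: net}).
# Only channel-type pin pairs pass voltage; gate pins do not.
PASS_PINS = {"nmos": ("s", "d"), "pmos": ("s", "d"), "res": ("a", "b")}


def propagate_voltages(devices, supplies):
    """Flood supply voltages through pass-through device connections.

    Returns {net: set of voltages that can reach it}. Voltages pass through
    unchanged (no drop is modelled), giving a conservative worst case.
    """
    graph = defaultdict(set)
    for _name, dtype, pins in devices:
        if dtype in PASS_PINS:
            a, b = (pins[p] for p in PASS_PINS[dtype])
            graph[a].add(b)
            graph[b].add(a)

    reached = defaultdict(set)
    for net, volts in supplies.items():
        reached[net].add(volts)
        queue = deque([net])  # breadth-first flood from this supply
        while queue:
            cur = queue.popleft()
            for nxt in graph[cur]:
                if volts not in reached[nxt]:
                    reached[nxt].add(volts)
                    queue.append(nxt)
    return dict(reached)


devices = [
    ("M1", "pmos", {"g": "in", "s": "VDD33", "d": "n1"}),
    ("R1", "res", {"a": "n1", "b": "n2"}),
    ("M2", "nmos", {"g": "n2", "s": "VSS", "d": "out"}),
]
print(propagate_voltages(devices, {"VDD33": 3.3, "VSS": 0.0}))
```

Here the gate net `n2` of M2 is labeled with 3.3 V because it is reachable from VDD33 through M1 and R1, while the gate net `in` of M1 receives no voltage because gate pins do not pass voltage in this model.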

Example

Let’s debug a typical over-voltage (EOS) condition. We’re using the Calibre® PERC™ tool for the voltage propagation, and the Calibre RVE™ results debugging environment for viewing and debugging the results. The debugging steps are illustrated in Figure 5.

(1)   The Calibre PERC run identifies a device whose gate and source pins receive propagated voltages that differ by 3.3V, which exceeds the 1.8V breakdown limit allowed for this device type. To debug this violation, we first highlight the violating device in a schematic viewer.

(2)   Next, we need to understand how the gate receives a propagated voltage of 3.3V. To find out, we initiate a trace of the gate pin from the Calibre RVE interface.

(3)   The voltage trace window shows the details of the voltage propagation paths (where “start” is the gate pin and “break” is the 3.3V net).

(4)   We can then click on specific devices/nets from the voltage trace window to highlight them in our design data in the schematic viewer.

(5)   Step 4 provides us with the information we need to analyze and resolve the voltage overload condition.

Figure 5. Calibre PERC voltage propagation interactive debugging.
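The debugging steps above combine two operations: comparing the voltages propagated to a device’s gate and source against a per-type limit, and walking back along the propagation path to find where a voltage came from. The Python sketch below illustrates both on a hypothetical three-device netlist, recording for each net and voltage the neighbor it was reached from so that the path from the gate (“start”) back to the 3.3V supply net (“break”) can be rebuilt. The netlist format, the 1.8V limits, and all function names are assumptions for illustration; this is not the Calibre PERC or Calibre RVE interface.

```python
from collections import deque

# Hypothetical inputs; names and limits are illustrative only.
PASS_PINS = {"nmos": ("s", "d"), "pmos": ("s", "d"), "res": ("a", "b")}
GATE_SOURCE_LIMIT = {"nmos": 1.8, "pmos": 1.8}  # volts, per device type


def propagate_with_trace(devices, supplies):
    """Flood supply voltages through pass-through pins, remembering for each
    (net, voltage) the net and device it arrived from (None at a supply)."""
    edges = {}
    for name, dtype, pins in devices:
        if dtype in PASS_PINS:
            a, b = (pins[p] for p in PASS_PINS[dtype])
            edges.setdefault(a, []).append((b, name))
            edges.setdefault(b, []).append((a, name))
    came_from = {}
    for net, volts in supplies.items():
        came_from.setdefault((net, volts), None)
        queue = deque([net])
        while queue:
            cur = queue.popleft()
            for nxt, dev in edges.get(cur, []):
                if (nxt, volts) not in came_from:
                    came_from[(nxt, volts)] = (cur, dev)
                    queue.append(nxt)
    return came_from


def voltages_on(net, came_from):
    return {v for (n, v) in came_from if n == net}


def trace(net, volts, came_from):
    """Walk back from `net` ("start") to the supply net ("break")."""
    path = [net]
    step = came_from[(net, volts)]
    while step is not None:
        prev_net, dev = step
        path += [dev, prev_net]
        step = came_from[(prev_net, volts)]
    return path


def check_gate_source(devices, came_from):
    """Report MOS devices whose worst-case |Vg - Vs| exceeds the type limit."""
    hits = []
    for name, dtype, pins in devices:
        limit = GATE_SOURCE_LIMIT.get(dtype)
        if limit is None:
            continue
        for vg in voltages_on(pins["g"], came_from):
            for vs in voltages_on(pins["s"], came_from):
                if abs(vg - vs) > limit:
                    hits.append((name, vg, vs, trace(pins["g"], vg, came_from)))
    return hits


devices = [
    ("M1", "pmos", {"g": "in", "s": "VDD33", "d": "n1"}),
    ("R1", "res", {"a": "n1", "b": "n2"}),
    ("M2", "nmos", {"g": "n2", "s": "VSS", "d": "out"}),
]
came_from = propagate_with_trace(devices, {"VDD33": 3.3, "VSS": 0.0})
for name, vg, vs, path in check_gate_source(devices, came_from):
    print(f"{name}: Vg={vg} V, Vs={vs} V; gate path: {' -> '.join(path)}")
```

For this netlist, the check reports M2, whose gate sees 3.3 V while its source sits at 0 V, and the trace lists n2, R1, n1, M1, and VDD33, mirroring the start-to-break path shown in the voltage trace window.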

Summary

Designers at both advanced and legacy nodes are facing new and expanded reliability requirements. New solutions are emerging to ensure continuing manufacturability, performance, and reliability. Automated voltage propagation supports the fast, accurate identification of reliability conditions in a design, enabling designers to analyze and correct the design early in the verification flow. Finding and eliminating often-subtle EOS susceptibilities before tapeout helps ensure that designs will satisfy the performance and reliability expectations of the market.

References

[1] K. T. Kaschani and R. Gärtner, “The impact of electrical overstress on the design, handling and application of integrated circuits,” EOS/ESD Symposium Proceedings, Anaheim, CA, 2011, pp. 1-10. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6045593&isnumber=6045562

[2] Industry Council on ESD Target Levels, “White Paper 4: Understanding Electrical Overstress – EOS,” August 2016. https://www.esda.org/assets/Uploads/documents/White-Paper-4-Understanding-Electrical-Overstress.pdf

[3] Cypress Semiconductor Corp., “Electrical Overstress EOS.” http://www.cypress.com/file/97816/download

Author:

Dina Medhat is a Technical Lead for Calibre Design Solutions at Mentor Graphics. Before assuming her current responsibilities, she held a variety of product and technical marketing roles at Mentor Graphics. Dina holds a BS and an MS from Ain Shams University, Cairo, Egypt. She may be contacted at dina_medhat@mentor.com.
