Part of the Solid State Technology Network

How to Build CMP Models for Hotspot Detection

October 19th, 2017

By Ruben Ghulghazaryan, Jeff Wilson
Mentor, a Siemens Business

Over the last two decades, chemical mechanical polishing (CMP) has become a mainstay in the IC manufacturing process. Foundries employ it to remove excess materials from silicon wafers and to smooth wafer layers, such as front-end-of-line (FEOL) layers like shallow trench isolation (STI) and back-end-of-line (BEOL) layers like metal interconnect. As one might expect, with the introduction of each new process, CMP has become considerably more sophisticated and is employed with greater frequency. The process is not without risks, however: CMP can create new defects through over- and under-polishing.

What’s perhaps not so widely understood is that the root cause of many CMP issues actually originates in design-specific layout issues that can be corrected before manufacturing by the use of dummy fill, slotting, or simply the redesign of some cells. To address these CMP hotspot issues proactively, you first need an accurate CMP model and analysis method to find hotspots in your design that are likely to pose problems during CMP, and then you need to employ the appropriate methods to fix those problems.

Let’s take a look at how to build CMP models for CMP simulation, run analysis, and perform hotspot detection.

CMP Modeling

Due to the complex nature of today’s CMP process, creating an accurate CMP model that takes into account the complicated chemical and mechanical mechanisms of polishing is a multi-faceted challenge. Current CMP modeling includes modeling of the polishing processes and numerous deposition and etch processes, including copper (Cu) electrochemical deposition (ECD), chemical vapor deposition (CVD), high-density plasma (HDP) CVD, spin-on dielectric (SOD), and etchback.

The basic concept of CMP modeling is to extract geometrical properties of the patterns on the layout, and predict post-polishing thickness variation for each pattern depending on its position on the chip. To begin, a full chip is divided into windows of fixed size. For each window, geometric characteristics of the pattern in the window are extracted and used for simulation, as shown in Figure 1.

Figure 1: Geometry data extraction from a layout.

Pattern density is defined as the ratio of total area of polygons in a window divided by the area of the window. Numerous studies show that pattern density, combined with the width of polygons, the space between them, and the perimeter of polygons, plays a critical role in characterizing the results of the polishing process.
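The window-based extraction and the pattern density definition above can be sketched in a few lines of Python. This is only an illustration, not the extraction used by any particular tool: the function names are hypothetical, the layout is assumed to be a list of non-overlapping axis-aligned rectangles, and only pattern density is computed (width, space, and perimeter are omitted).

```python
import math

# A rectangle is (x_min, y_min, x_max, y_max). All names here are illustrative.

def overlap_area(a, b):
    """Area of the intersection of two axis-aligned rectangles."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0

def pattern_density_map(rects, chip_w, chip_h, window):
    """Divide the chip into fixed-size windows and return density per window.

    Density = (polygon area inside the window) / (window area).
    Assumes the rectangles do not overlap each other.
    """
    nx = math.ceil(chip_w / window)
    ny = math.ceil(chip_h / window)
    grid = []
    for j in range(ny):
        row = []
        for i in range(nx):
            # Clip the last window in each direction to the chip boundary.
            win = (i * window, j * window,
                   min((i + 1) * window, chip_w), min((j + 1) * window, chip_h))
            win_area = (win[2] - win[0]) * (win[3] - win[1])
            covered = sum(overlap_area(r, win) for r in rects)
            row.append(covered / win_area)
        grid.append(row)
    return grid

# Two windows of 10 x 10: the first is half covered, the second fully covered.
layout = [(0, 0, 5, 10), (10, 0, 20, 10)]
density = pattern_density_map(layout, chip_w=20, chip_h=10, window=10)
```

In this example the first window has a density of 0.5 and the second a density of 1.0.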

After Cu ECD, the surface profile of BEOL metal layers is very non-planar, with numerous bumps and valleys. Because large bumps have a strong impact on polishing results, and may delay barrier layer opening under the bumps and nearby areas, modeling of the post-ECD surface profile is critical to ensuring high-accuracy CMP modeling. Non-planar surface profiles are also typical for FEOL layers after isotropic oxide layer (TEOS) deposition, HDP-CVD, SOD, flowable CVD, and other processes.

The CMP model must not only simulate polishing results for multiple materials due to the patterns’ geometry specifics, but also capture long-range polishing effects specific to a given CMP process. For example, it is well-established that pad pressure and bending, which lead to pressure redistribution within die and wafer, are mainly responsible for long-range effects in CMP, so the model must account for these factors.

CMP Test Chips and Measurements

A key step in CMP model building is the calibration of deposition and CMP planarity models with measurement data collected from test chips. This allows you to select the model parameters that best reproduce process conditions at your targeted foundry. After the deposition and polishing steps, measurements are collected via line scans and cross-section images, and a measurement data table is filled with erosion, dishing, and thickness data. Layer stack information, deposited layer thicknesses, and CMP process conditions are used to fill the recipe file of a process. Using the recipe file and measurement data table, model parameters are calibrated. Figure 2 illustrates the CMP model building flow.

Figure 2: CMP model building flow.
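To give a feel for what "calibrating model parameters against a measurement table" means, here is a deliberately tiny sketch. It assumes a hypothetical one-parameter model in which dishing is proportional to trench width, and fits that parameter by least squares. Real CMP models have many coupled parameters and are not linear; the numbers below are synthetic values chosen for illustration, not measurements.

```python
def fit_scale(xs, ys):
    """Least-squares slope k for the one-parameter model y = k * x."""
    num = sum(x * y for x, y in zip(xs, ys))
    den = sum(x * x for x in xs)
    if den == 0:
        raise ValueError("need at least one non-zero x value")
    return num / den

def rms_error(xs, ys, k):
    """Root-mean-square mismatch between the model k * x and the data."""
    return (sum((k * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5

# Synthetic "measurement table": trench width vs. measured dishing (arbitrary units).
widths = [1.0, 2.0, 4.0]
dishing = [2.0, 4.0, 8.0]

k = fit_scale(widths, dishing)
residual = rms_error(widths, dishing, k)
```

The residual is the quantity a calibration flow tries to drive down; comparing it across candidate parameter sets is one simple way to pick the set that best reproduces the measured data.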

Because test chips play such a critical role in accurate CMP modeling, test chip design must take into account long-range effects of the CMP process, the ability to collect high-quality measurements, and multi-layer stacking of test pattern structures for FEOL and some BEOL layers in a way that minimizes multi-layer effects. The size of a test chip and the number of structures must be selected in a way that provides good coverage of width, space, perimeter, and pattern density values supported by the technology node, without violating design rule checking (DRC) constraints.

For BEOL metal layers, a CMP test chip usually consists of periodically-placed array blocks of parallel trenches of different widths with differing spaces between them. Spaces between test patterns should be large enough to avoid pattern interactions due to CMP long-range effects. Process conditions usually require dummy fill between array blocks. To get high-quality line scan data, dummy exclude areas should be reserved between array blocks. Figure 3 shows examples of two CMP test chip layouts.

Figure 3: CMP test chip design layouts.

For CMP model building, an atomic force microscope (AFM) scanner or other profiler tool is often used to collect erosion and dishing data from the line scans over these test patterns (Figure 4).

Figure 4: Erosion and dishing data from line scans.

Normally, transmission electron microscopy (TEM) or scanning electron microscopy (SEM) cross-section images are used to obtain oxide, nitride, or metal thickness values. To avoid multi-layer stacking effects, either part of a layer or all of the layer may be covered by dummy fill to prevent the effect of surface profile variation of the underlying layer on test patterns at higher layers.

FEOL CMP Modeling

Design of CMP test chips is more challenging for FEOL layers. The restrictive design rules of advanced technology nodes don’t support long parallel trenches. Instead, short array lines of similarly-oriented rectangles separated by a variety of spacing values in both the horizontal and vertical directions, known as islands, are used in test pattern blocks. This layout poses a challenge for scanner positioning and data collection, because the scanner may pass between the rectangles and fail to collect the oxide-to-nitride transition height difference. To minimize this possibility, the space between rectangles that is orthogonal to the scanner direction is set to the minimal possible value, and the space in the scanner direction is varied (Figure 5). Also, a scanner will make two or three passes over test patterns, with each pass separated by some distance from the others, and the most appropriate scan line data can then be selected for modeling.

Figure 5: FEOL CMP test chip specifics and line scan directions.

High-K Metal Gate

The specifics of high-k metal gate (HKMG) and aluminum replacement metal gate (Al RMG) technology require that the test chip patterns for the poly open polish (POP) and Al RMG steps be the inverse of each other. At POP, the sacrificial polysilicon (poly) layer is removed, and the Al layer is deposited and polished. For POP step modeling, the inverse (or negative) of the poly layer is used for oxide deposition and polishing. For Al RMG modeling, the poly layer itself is used. Sufficient test wafers must be reserved and processed to collect the required measurements for the POP and Al RMG steps.

Etchback

Because the deposited oxide pattern depends on the underlying pattern, the surface profile after oxide deposition may be highly non-planar, with large variations in oxide thickness and density. In oxide polishing processes like shallow trench isolation (STI) CMP, inter-level dielectric (ILD) CMP, inter-metal dielectric (IMD) CMP, and others, a reverse etchback process is often used prior to the polishing step to prevent film pattern density mismatches over the design that lead to post-CMP film thickness variation.

In reverse etchback, a second mask is used to etch back raised areas in the deposited film by lowering the film density. An etchback mask is usually designed by shrinking the features of the layout by a fixed amount (etchback bias). For STI processes, the underlying nitride is used as an etch stop layer. For large features, this reverse etchback removes a majority of the raised material, resulting in lower oxide density.
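The mask construction described above (shrink each layout feature by a fixed etchback bias) can be sketched as follows. This is a minimal illustration that assumes axis-aligned rectangular features and a hypothetical function name; production mask generation works on arbitrary polygons and is done with layout tools, not with code like this.

```python
def etchback_mask(features, bias):
    """Shrink each rectangle (x_min, y_min, x_max, y_max) by `bias` on every side.

    Features too small to survive the shrink produce no mask opening,
    so only sufficiently large raised areas are etched back.
    """
    mask = []
    for x0, y0, x1, y1 in features:
        sx0, sy0, sx1, sy1 = x0 + bias, y0 + bias, x1 - bias, y1 - bias
        if sx1 > sx0 and sy1 > sy0:
            mask.append((sx0, sy0, sx1, sy1))
    return mask

# A large 10 x 10 feature and a narrow 1 x 10 line, with an etchback bias of 1.
features = [(0, 0, 10, 10), (20, 0, 21, 10)]
mask = etchback_mask(features, bias=1)
```

Here the large feature yields an 8 x 8 opening, while the narrow line disappears from the mask entirely, which matches the observation that reverse etchback mainly affects large features.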

Selective reverse etchback refers to the customization of the etchback mask that results in less material removal than the nominal etchback process. It is accomplished by replacing a large etchback feature with an array of selective etchback cells, or even the complete removal of etchback features in some regions. A selective reverse etchback mask actually consists of two masks: one mask selects the areas where the etchback is performed (or not performed), while the other mask defines the features for etchback. For example, selective reverse etchback may be used after the HDP-CVD deposition process over large space areas where large raised oxide islands appear after deposition, as shown in Figure 6.

Figure 6: Schematic view of HDP-CVD selective reverse etchback. (a) Initial pattern, (b) Deposited oxide pattern, (c) Selective reverse etchback mask, (d) Oxide pattern after etchback.

Modeling of the etchback process assumes modification of post-deposition profile geometry and height data due to oxide removal over large oxide areas, as defined by the selective and nominal etchback masks. The post-CMP profile trend may significantly change due to etchback (Figure 7).

Figure 7: Post-CMP surface profile change due to etchback.

Hotspot Detection Using CMP Simulation

To find hotspots in the design, electronic design automation (EDA) vendors offer CMP modeling tools and simulators/analyzers. For example, Mentor’s Calibre® CMP ModelBuilder tool supports models for the deposition processes mentioned above, and it is able to generate post-deposition profiles for polishing. The Calibre CMP ModelBuilder geometry extraction step calculates pattern density, weighted average width, space, perimeter, and other characteristics for each window, and passes them to the CMP model for simulation. The Calibre CMP ModelBuilder tool then calculates local pressure distribution due to surface profile height variation, and defines local removal rates depending on local pattern geometry and dishing. Time evolution of the polishing profile is modeled until the CMP stop condition is satisfied. Numerous CMP stop conditions used by CMP tools are supported by the simulator, and users may select the one appropriate for their process.
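The loop described above (compute local pressure from the surface profile, derive local removal rates, step the profile forward in time, stop when a stop condition is met) can be illustrated with a toy model. Everything in this sketch is an assumption made for illustration: the linear pressure law, the `stiffness` parameter, the mean-thickness stop condition, and all names. It is not the model implemented in Calibre CMP ModelBuilder or any other tool.

```python
def simulate_polish(heights, target, rate=1.0, stiffness=0.05, dt=0.1,
                    max_steps=100000):
    """Toy time-stepped polish of per-window surface heights.

    Assumed pressure law: windows above the mean height carry more load,
        pressure_i = max(0, 1 + stiffness * (h_i - mean_h))
    Assumed removal law: removal in one step = rate * pressure_i * dt.
    Assumed stop condition: mean height has reached `target`.
    Returns the final heights and the elapsed polish time.
    """
    h = list(heights)
    elapsed = 0.0
    for _ in range(max_steps):
        mean_h = sum(h) / len(h)
        if mean_h <= target:
            break
        pressure = [max(0.0, 1.0 + stiffness * (x - mean_h)) for x in h]
        h = [x - rate * p * dt for x, p in zip(h, pressure)]
        elapsed += dt
    return h, elapsed

# Three windows with a 40-unit initial step height, polished to a mean of 50.
start = [120.0, 100.0, 80.0]
final, polish_time = simulate_polish(start, target=50.0)
```

Even this crude model shows the expected qualitative behavior: high areas are removed faster than low areas, so the step height between windows shrinks as polishing proceeds.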

After the CMP model is built, designers can then use the Calibre CMPAnalyzer tool, which provides automated multi-layer CMP simulation, hotspot detection, and analysis. Designers input the GDS or OAS file of a design into the Calibre CMPAnalyzer tool, specify the layer numbers that must be simulated, and select the best recipe file of the process created by the Calibre CMP ModelBuilder tool.

With the Calibre CMPAnalyzer tool, designers can then perform numerous Boolean layer operations like OR, AND, and NOT to prepare the layers for CMP simulation. Moreover, they may use a large set of Calibre Standard Verification Rule Format (SVRF) commands to generate layers for CMP simulation. This is especially useful for constructing modern FEOL layers, since double and triple patterning means that numerous drawn layers are used to define STI and other layers.

The Calibre CMPAnalyzer multi-layer CMP simulation flow supports different layer stacking options for the simulation of each layer, so that users can study multi-layer stacking effects and the hotspots they may cause. The Calibre CMPAnalyzer tool also supports custom hotspot scripts, in which users define their own criteria for hotspot checking. Users can also generate color maps and histograms of simulated data to easily visualize post-polishing profiles and possible hotspot areas (Figure 8).

Figure 8: Color maps and histograms of different layer properties for comparative analysis.

Users can detect erosion, dishing, and depth-of-focus hotspots by running multi-layer simulation with defined hotspot threshold values. They can also generate simulated line scan and profile plots, both to compare against measured line scan data for CMP model validation and to analyze the simulated surface profile.
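Threshold-based hotspot detection of this kind reduces to comparing simulated values per window against user-defined limits. The sketch below assumes a hypothetical data layout (a dictionary keyed by window index, with made-up metric names and values); it does not represent the Calibre CMPAnalyzer hotspot script format.

```python
def find_hotspots(sim, limits):
    """Return (ix, iy, metric, value) for every window value above its limit.

    sim:    {(ix, iy): {"erosion": value, "dishing": value, ...}}
    limits: {"erosion": max_allowed, "dishing": max_allowed, ...}
    """
    hotspots = []
    for (ix, iy), values in sorted(sim.items()):
        for metric, limit in limits.items():
            value = values.get(metric)
            if value is not None and value > limit:
                hotspots.append((ix, iy, metric, value))
    return hotspots

# Illustrative simulated results for two windows (values are invented).
simulated = {
    (0, 0): {"erosion": 12.0, "dishing": 30.0},
    (1, 0): {"erosion": 45.0, "dishing": 20.0},
}
limits = {"erosion": 40.0, "dishing": 25.0}

hotspots = find_hotspots(simulated, limits)
```

With these inputs, window (0, 0) is flagged for dishing and window (1, 0) for erosion.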

Conclusion

CMP modeling has become a powerful tool for both process engineers and chip designers. It enables design teams to detect potential CMP hotspots prior to manufacturing by providing visualization and analysis of simulated CMP. CMP simulation also contributes to the improvement of the design process by enabling designers to tune dummy fill solutions and enhance RC extraction accuracy, among others.

Authors

Ruben Ghulghazaryan is a lead R&D engineer in the Design to Silicon division at Mentor Graphics. He has extensive experience in both theoretical and applied physics research, with numerous industry and academic publications. He received an M.Sc. in Theoretical Physics and Biophysics from Yerevan State University and a Ph.D. in Physics from Yerevan Physics Institute. He may be reached at ruben_ghulghazaryan@mentor.com.

Jeff Wilson is a DFM Product Marketing Manager in the Calibre organization at Mentor Graphics in Wilsonville, OR. He has responsibility for the development of products that analyze and modify the layout to improve the robustness and quality of the design. Jeff previously worked at Motorola and SCS. He holds a BS in Design Engineering from Brigham Young University and an MBA from the University of Oregon. Jeff may be reached at jeff_wilson@mentor.com.

Reliability for the Real (New) World

September 21st, 2017

By Dina Medhat

There’s nothing more annoying than a device that doesn’t perform as expected. Nearly everyone has experienced the ultimate frustration of the “intermittent failure” problem with their laptops, or a cellphone that suddenly and inexplicably stops working. Now imagine that failure occurring in a two-ton vehicle traveling at highway speeds, or in a pacemaker implanted in someone you love. With electronics moving into virtually every facet of our lives, designers are facing unique challenges as they create (or re-engineer) designs for new high-reliability, environmentally-demanding applications like automotive and medical.

Significantly increased longevity requirements, coupled with new stresses, new circuits and topologies, increased analog content, higher voltages, and higher frequencies, make the task of ensuring performance and reliability harder than ever. The corollary to these new constraints and requirements is the need for verification technology and techniques that enable designers to find and eliminate potential electrical failure points and weaknesses.

Electrical overstress (EOS) is one of the leading causes of integrated circuit (IC) failures, regardless of where the chip is manufactured or the process used. EOS events can result in a wide spectrum of outcomes, covering varying degrees of performance degradation all the way up to catastrophic damage, where the IC is permanently non-functional. Identifying and removing EOS susceptibility from IC designs is essential to ensuring successful performance and reliability when the products reach the market.

When we discuss EOS, however, it’s important to understand that EOS is technically the result of a wide range of root cause events and conditions. EOS in its broadest definition includes electrostatic discharge (ESD) events, electromagnetic interference (EMI), latch-up (LUP) conditions, and other EOS causes. However, ESD, EMI, and LUP causes are generally differentiated, as shown in Figure 1.

Figure 1. Typical root causes of EOS events. See Reference 1.

Any device will fail when subjected to stresses beyond its designed capacity, due either to device weakness or improper use. The absolute maximum rating (AMR) defines this criterion, as follows:

  • Each user of an electronic device must have a criterion for the safe handling and application of the device
  • Each manufacturer of an electronic device must have a criterion to determine if a device failure was caused:
    • By device weakness (manufacturer fault)
    • By improper usage (user fault)

Device robustness is represented by the typical failure threshold (FT) of a device. Because FTs are subject to the natural distribution of the manufacturing process, a product AMR is set to provide the necessary safety margin against this distribution (to avoid failures in properly constructed devices). The safe operating area (SOA) of a device consists of parametric conditions (usually current and voltage) over which a device is expected to operate without damage or failure (Figure 2). For example:

  • Over-voltage tends to damage breakdown sites
  • Over-current tends to fuse interconnects
  • Over-power tends to melt larger areas

Figure 2. Graphical interpretation of an AMR. The yellow line represents the number of components experiencing immediate catastrophic EOS damage. See Reference 2.
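The relationship between the failure-threshold distribution and the AMR can be made concrete with a small calculation. The sketch below assumes one common way of expressing a safety margin, namely placing the limit a chosen number of standard deviations below the mean failure threshold. The three-sigma default, the function name, and the sample values are all assumptions for illustration; manufacturers set AMRs using their own criteria.

```python
import statistics

def amr_from_failure_thresholds(thresholds, k_sigma=3.0):
    """Place a rating k_sigma standard deviations below the mean failure threshold.

    thresholds: failure thresholds observed on sample devices (e.g. in volts).
    k_sigma:    assumed margin, expressed in sample standard deviations.
    """
    mean_ft = statistics.mean(thresholds)
    sigma = statistics.stdev(thresholds)
    return mean_ft - k_sigma * sigma

# Invented failure thresholds for five sample devices, in volts.
failure_thresholds = [5.0, 5.2, 4.8, 5.1, 4.9]
amr = amr_from_failure_thresholds(failure_thresholds)
```

The resulting rating sits below every observed failure threshold, which is the point of the margin: a properly constructed device operated within the AMR should not fail.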

EOS events can result in a wide spectrum of outcomes. Electrically-induced physical damage (EIPD) is the term used to describe the thermal damage that may occur when an electronic device is subjected to a current or voltage that is beyond the specification limits of the device. This thermal damage is the result of the excessive heat generated during the EOS event, which in turn is a result of resistive heating in the connections within the device. The high currents experienced during an EOS event can generate very localized high temperatures, even in the normally low resistance paths. These high temperatures cause destructive damage to the materials used in the device’s construction [2].

As shown in Figure 3, EOS damage can be external (visible to the naked eye or with a low-power microscope), or internal (visible with a high-power microscope after decapsulation). External damage can include visible bulges in the mold compound, physical holes in the mold compound, burnt/discolored mold compound, or a cracked package. Internal damage manifests itself in melted or burnt metal, carbonized mold compound, signs of heat damage to metal lines, and melted or vaporized bond wires.

Figure 3. External and internal EOS damage. See Reference 3.

So, if preventing EOS conditions in your design is a good idea, just how do you do that? In the past, designers used a variety of methods to check for over-voltage conditions, relying mainly on the expertise and experience of their design team. Manual inspection is probably the most tedious and time-consuming approach, and hardly practical for today’s large, complex designs. Another conventional approach is the use of design rule checking (DRC) in combination with manually-applied marker layers. Manual marker layers are inherently susceptible to human mistakes and forgetfulness, and this approach also requires additional DRC runs, extending verification time. Lastly, there is simulation, which can take a long time to run, and is dependent on the quality of the extracted SPICE netlist, SPICE models, stress models, and input stimuli.

Voltage Propagation

Voltage propagation is an automated flow that propagates realistic voltage values to all points in the layout, eliminating the more fallible manual processes. An automated voltage propagation flow (Figure 4) generates the voltage information automatically, without requiring any changes to sign-off decks, or any manually added physical layout markers.

Figure 4. Automated voltage propagation flow.

Example

Let’s debug a typical over-voltage (EOS) condition. We’re using the Calibre® PERC™ tool for the voltage propagation, and the Calibre RVE™ results debugging environment for viewing and debugging the results. The debugging steps are illustrated in Figure 5.

(1) The Calibre PERC run identifies a device with a 3.3V difference between the voltages propagated to its gate pin and source pin, which is greater than the allowed breakdown limit of 1.8V for this device type. To debug this violation, we first highlight the violating device in a schematic viewer.

(2) Next, we must understand how the gate can receive a propagated voltage of 3.3V. To do that, we initiate a trace of the gate pin using the Calibre RVE interface.

(3) The trace results provide the details of the voltage propagation paths in the voltage trace window (where “start” is the gate pin and “break” is the 3.3V net).

(4) We can then click on specific devices/nets from the voltage trace window to highlight them in our design data in the schematic viewer.

(5) Step 4 provides us with the information we need to analyze and resolve the voltage overload condition.

Figure 5. Calibre PERC voltage propagation interactive debugging.
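The idea behind the flow, and the gate-to-source check in step 1, can be sketched with a small graph traversal. This is a conceptual illustration only: the netlist representation, net and device names, and function names are hypothetical, and the sketch says nothing about how Calibre PERC is implemented or scripted. It assumes voltages start at supply nets and spread across any connection treated as conducting (for example a resistor or a closed switch).

```python
from collections import deque

def propagate_voltages(sources, connections):
    """Spread supply voltages across conducting connections (breadth-first).

    sources:     {net_name: voltage} for nets with a defined supply voltage.
    connections: iterable of (net_a, net_b) pairs treated as conducting.
    Returns {net_name: set of voltages that can reach the net}.
    """
    adjacency = {}
    for a, b in connections:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    voltages = {net: {v} for net, v in sources.items()}
    queue = deque(sources)
    while queue:
        net = queue.popleft()
        for neighbor in adjacency.get(net, ()):
            if neighbor in sources:
                continue  # supply nets keep their defined voltage
            known = voltages.setdefault(neighbor, set())
            new_values = voltages[net] - known
            if new_values:
                known |= new_values
                queue.append(neighbor)
    return voltages

def check_gate_source(devices, voltages, limit):
    """Flag devices whose worst-case |V_gate - V_source| exceeds `limit`."""
    violations = []
    for name, gate_net, source_net in devices:
        gate_v = voltages.get(gate_net, set())
        source_v = voltages.get(source_net, set())
        worst = max((abs(g - s) for g in gate_v for s in source_v), default=0.0)
        if worst > limit:
            violations.append((name, worst))
    return violations

# Hypothetical circuit: the 3.3V supply reaches the gate of M1 through net n1.
supplies = {"VDD33": 3.3, "VSS": 0.0}
conducting = [("VDD33", "n1"), ("n1", "gate_a")]
mos_devices = [("M1", "gate_a", "VSS")]

net_voltages = propagate_voltages(supplies, conducting)
violations = check_gate_source(mos_devices, net_voltages, limit=1.8)
```

In this example M1 sees 3.3V between gate and source against a 1.8V limit, mirroring the violation debugged in the steps above.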

Summary

Designers at both advanced and legacy nodes are facing new and expanded reliability requirements. New solutions are emerging to ensure continuing manufacturability, performance, and reliability. Automated voltage propagation supports the fast, accurate identification of reliability conditions in a design, enabling designers to analyze and correct the design early in the verification flow. Finding and eliminating often-subtle EOS susceptibilities before tapeout helps ensure that designs will satisfy the performance and reliability expectations of the market.

References

[1] K. T. Kaschani and R. Gärtner, “The impact of electrical overstress on the design, handling and application of integrated circuits,” EOS/ESD Symposium Proceedings, Anaheim, CA, 2011, pp. 1-10. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6045593&isnumber=6045562

[2] Industry Council on ESD Target Levels, “White Paper 4: Understanding Electrical Overstress – EOS,” August 2016. https://www.esda.org/assets/Uploads/documents/White-Paper-4-Understanding-Electrical-Overstress.pdf

[3] Cypress Semiconductor Corp., “Electrical Overstress EOS.” http://www.cypress.com/file/97816/download

Author:

Dina Medhat is a Technical Lead for Calibre Design Solutions at Mentor Graphics. Prior to assuming her current responsibilities, she held a variety of product and technical marketing roles at Mentor Graphics. Dina holds a BS and an MS from Ain Shams University, Cairo, Egypt. She may be contacted at dina_medhat@mentor.com.

Faster Signoff and Lower Risk with Chip Polishing

September 17th, 2017

By Bill Graupp, Mentor, a Siemens Business

Designing integrated circuits (ICs) today is a complex and high-risk endeavor; design teams are large and often scattered around the world, tool flows are complex, and time-to-market pressures omnipresent. It’s no surprise that product releases are often delayed because design teams can’t get to signoff on schedule. Schedules certainly account for the time required for full verification, as well as design optimizations like DFM fill and via enhancements, but all the delays along the way accumulate. Engineers are then pressured to compensate for those delays to stay on schedule. Typically, the final days of signoff are the worst—the deadline is looming, and each iteration between finding and fixing layout issues increases the risk of being late.

Engineers are all about increasing efficiency and reducing risk. When considering how to get to signoff faster, there are many ways to do that. You could hire more designers, but that makes coordination harder. You could increase design margins, but that reduces your product’s value. You can make sure to plan plenty of time for final verification and signoff, yet delays earlier in the flow can still impinge on that allotted block of time.

The counterintuitive solution? Add another step to the process flow: more verification, performed at many levels throughout the design flow, to catch and fix problems earlier. The phenomenon of putting in more thought and effort to get “less” isn’t unique to IC design. A line often attributed to Mark Twain captures the idea: “I didn’t have time to write a short letter, so I wrote a long one instead.”

IC designers already do this to find design rule checking (DRC) violations, starting in early implementation, but how about the non-DRC layout issues, like nano-jogs, space ends, mushrooms, dog bone ends, and offset vias? None of these items is necessarily a design rule error, but all of them are likely to affect manufacturability and lower yield. Fixing these issues is referred to as chip polishing, and is one of the keys to improving a product’s manufacturability. Figure 1 illustrates some typical chip polishing activities.

Figure 1. Automated chip polishing modifies the layout to improve robustness of the design and yield. Modifications are inserted back into the design database.

Software tools that automate these chip polishing tasks can be easy to adopt and customize for any flow, reducing the risks associated with reaching signoff. A key to the usability of chip polishing software is the ability for engineers to combine a focused set of commands into macros that can be placed throughout a customized flow. Typical uses include engineering change order (ECO) filling, passive device insertion, custom fill to increase densities, jog removal, via enhancements, and programmable edge modification (PEM) commands to eliminate fragmented edges. If, for example, your power structures or capacitor placement rules cause system-level final verification issues, a solution can be implemented quickly and systematically across all blocks and top cells.

Categorizing issues by groups, based on the methodology needed to fix the issue, improves the efficiency of design closure. Correction of some issues requires the insertion of passive devices, while others require polygon shifts and edge movements. Some require the insertion of additional shapes for manufacturability. Each of these categories can best be handled by a custom electronic design automation (EDA) process designed to resolve that category of issue. When one process is used for each category, then all the processes can be combined into one final sign-off flow that can be customized for each design methodology, using a common programming language and database.

Many of the failures of today’s post-route sign-off flows can be solved by creating the conditions for an effective and timely solution to late-stage DRC errors and enabling engineers to insert and modify any shapes needed to achieve the final signoff. A well-designed automated sign-off flow can improve your product’s manufacturability, allow you to get to market faster, and enable you to create market differentiation.

For example, many issues that require or benefit from chip polishing arise from hierarchy conflicts, such as two lines from two cells being connected at the parent cell without the knowledge of the entire line shape or width. Other typical problematic layout features include:

  • Space Ends – Metal lines formed into a “J” due to the router passing a short adjacent track line and coming back to the far end. The connection bottom of a “J” can pinch if the loopback is too narrow.
  • T-Line Ends – Metal lines with a narrow cross “T” at the end can cause necking.
  • Mushrooms – A long metal line connected to the center of a short metal adjacent track line typically causes necking of the connection metal.
  • Nano-jogs – Two metal lines of slightly different widths connected end to end create breaks in long edges that cause unnecessary runtime in verification and mask generation.
  • Offset Vias – Manually-placed vias at an adjacent metal overlap that are not centered in the overlapping region create potential via coverage issues that can cause higher electrical resistance.
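As an illustration of feature-based edge identification, the sketch below finds nano-jogs in a rectilinear polygon by looking for a very short edge sitting between two longer edges. The function name, the vertex-list representation, and the `jog_max` threshold are assumptions for this example; they are not PEM commands or any tool's API.

```python
def find_jogs(vertices, jog_max):
    """Return short edges (as vertex pairs) that break up longer edges.

    vertices: corner points of a rectilinear polygon, in order.
    jog_max:  longest edge length still treated as a jog.
    An edge is reported when its length is at most jog_max and both of its
    neighboring edges are longer than jog_max.
    """
    n = len(vertices)

    def edge_length(i):
        (x0, y0), (x1, y1) = vertices[i % n], vertices[(i + 1) % n]
        # For axis-parallel edges the Manhattan distance is the edge length.
        return abs(x1 - x0) + abs(y1 - y0)

    jogs = []
    for i in range(n):
        if (0 < edge_length(i) <= jog_max
                and edge_length(i - 1) > jog_max
                and edge_length(i + 1) > jog_max):
            jogs.append((vertices[i], vertices[(i + 1) % n]))
    return jogs

# A wire 10 units wide for x in [0, 50] joined to one 11 units wide for x in [50, 100].
wire = [(0, 0), (100, 0), (100, 11), (50, 11), (50, 10), (0, 10)]
jogs = find_jogs(wire, jog_max=2)
```

The one-unit step at x = 50 is reported as the only jog. Removing it (for example by widening the narrower segment) would turn the two 50-unit top edges into a single 100-unit edge.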

Chip polishing software can execute programmable edge modification (PEM) commands to correct for these issues, including polygon shifting, polygon sizing, edge-based polygon creation, feature-based edge identification (jogs, space ends, etc.), and polygon growth with spacing considerations.

By reducing the number of edges in the design through chip polishing, many chip release tasks can be improved or eliminated. It’s only logical that mask generation can optimize long edges more quickly when they do not contain jogs or notches, so it’s no surprise that final verification runtimes for large blocks and chip layout can be reduced by eliminating any edges broken into fragments due to accidental jogging. Mask generation is also faster with optimized line ends, because there are fewer edges that will require optical proximity correction (OPC). By having a faster mask flow with fewer issues to manage, the manufacturing process can be optimized for the consistency of the manufacturing models used to control the process. A more robust design will also create a more reliable product, as well as reduce yield variability over the life cycle of the product.

Getting to signoff faster, with less risk, while generating a layout that is highly manufacturable can be accomplished with automation tools with the types of analysis and fixing capabilities described here. PEM commands can improve a layout by automatically analyzing a design, then smartly removing or altering the offending edges. A well-designed automated PEM flow can improve your product’s manufacturability, allow you to get to market faster, and enable you to create market differentiation.

Author

Bill Graupp is a DFM Application Technologist for Calibre in the Design to Silicon division of Mentor, a Siemens Business. He is responsible for product marketing and customer support for the DFM product line, focused on layout enhancement and fill. Bill received his BSEE from Drexel University, and an MBA from Portland State University. After hours, he currently serves as the mayor of Aurora, Oregon, and as a director on his local school board.

Latch-Up Detection: How to Find an Invisible Fault

July 14th, 2017

By Matthew Hogan

Way back when, in the olden days (which, in the semiconductor industry, usually means last week), designers used visual inspections and manual calculations to check their layouts. The scale and complexity of today’s designs mean that everything’s changed now. Design margins have been driven to near-extinction by the market demand for lower power, higher reliability electronics. It doesn’t really matter whether you’re implementing a new design start at your current process node, migrating to your “next” node, or adding new functionality to a well-trusted design, meeting those time-sensitive tapeout schedules and tight time-to-market windows means you need more than a good eye and a quick hand on the calculator.

Latch-Up

One of the biggest challenges for verification engineers today is identifying and eliminating unintentional failure mechanisms formed by inadvertent combinations of geometry and circuitry, known as latch-up (LUP). LUP is a design phenomenon that often leads to chip failure through the unplanned creation of parasitic PNP and NPN junctions that are then driven (turned on/forward-biased). Typically, an unintended thyristor or silicon-controlled rectifier (SCR) forms, and is then triggered to generate a low-resistance parasitic path. LUP usually occurs as a temporary condition that is often eliminated by power cycling, but when it strikes, it can cause permanent damage that impacts chip performance or contributes to fatal chip failure. Increases in design complexity, larger pin counts, and multiple power domains all contribute to the difficulty of finding these LUP configurations, as do the moving targets of what process node and foundry will host your next design.

LUP “injectors” fall into two primary categories [1]:

  • Externally connected diffusion devices that are
    • Directly connected to an I/O pad, or
    • Connected to an I/O pad through a high current conducting path (small resistors, large switches, etc.)
  • Diffusion devices formed in grounded Nwell or “hot” Pwell

Typical latch-up prevention techniques include [1]:

  • Surrounding devices that can form a latching path with guard rings
  • Surrounding devices that can form a latching path with well or substrate ties
  • Keeping p and n diffusions far apart from each other

Figure 1 shows how lateral separation can be used to protect against the formation of latch-up parasitic elements.

Figure 1: Silicon-controlled rectifier (SCR) cross-section showing parasitic coupling between diffusions connected to VDD and VSS (See References - 2). Inserting sufficient distance (D) between these parasitic elements protects against LUP formation.

What CMOS Technology Are You Using: Bulk, FD-SOI, Or Both?

While much of the literature on LUP prevention assumes that the implementation technology impacted by LUP is entirely bulk CMOS, and that fully-depleted silicon-on-insulator (FD-SOI) is immune, there are hybrid technologies that leverage characteristics of both FD-SOI and bulk CMOS. One such technology is the ultra-thin body and box (UTBB) FD-SOI process used by STMicroelectronics [3]. UTBB leverages the benefits of an FD-SOI process for the design logic, while taking advantage of a “hybrid” bulk CMOS for electrostatic discharge (ESD) and IO devices (Figure 2).

Figure 2: The UTBB FD-SOI process leverages characteristics of both FD-SOI and bulk CMOS (© STMicroelectronics. Used with permission).

For ESD protection, the ESD device in thin silicon film is two times less robust than the bulk CMOS device (due to the smaller thickness of the Si film for power dissipation). Leveraging an open box structure to access hybrid bulk CMOS configurations to build ESD power devices provides benefits for device robustness. In doing so, however, designers must consider possible sources of susceptibility to LUP in areas of the design with hybrid bulk CMOS IO devices and ESD structures.

Why is LUP So Hard to Detect?

When you’re trying to eliminate LUP in a layout, it’s essential to be able to recognize the unintentional devices within your design, and understand how the layout impacts critical distances of specific LUP-susceptible structures. For example, to adjust the layout to prevent LUP, designers must identify the unfavorable conditions that lead to the formation of unintended parasitic devices in the PNP or NPN junctions as current is injected. Many generations of geometric design rule checks (DRC) have been created to help with LUP detection and prevention. However, DRC lacks one critical component: context awareness.

While the distances and physical layout within the design are essential in LUP detection, designers must also be knowledgeable about the voltages used in the circuitry. Historically, designers manually added marker layers (either as text or polygons) to the layout with the expected voltage value. However, if the designer doesn’t add the correct marker, or forgets to add any marker, those mistakes can lead to substandard routing optimizations, false errors, or missed errors that result in device failure over time.

In addition, modern SoC designs often contain many voltage domains and voltage differentials, so designers can no longer apply just one spacing rule per metal layer. Moving to more complex designs and advanced process nodes greatly increases both the complexity of voltage-dependent spacings and the challenge of defining voltages in a layout. Voltage-aware spacing rules require different spacings based on either the operating voltage on the geometries being checked, or the difference in voltages between different geometries (wires/devices) that are next to each other.
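To make the idea of a voltage-dependent spacing rule concrete, here is a minimal Python sketch. The rule table, its thresholds, and the function names are invented for illustration and are not taken from any foundry rule deck or tool; the only point is that the required spacing is looked up from the voltage difference between two neighboring geometries, rather than from a single per-layer value.

```python
from bisect import bisect_left

# Hypothetical rule table: (maximum voltage difference in volts, minimum spacing in microns).
# The numbers are illustrative only, not taken from any foundry rule deck.
SPACING_RULES = [(1.0, 0.10), (1.8, 0.14), (3.3, 0.20), (5.0, 0.30)]


def required_spacing(v_a: float, v_b: float) -> float:
    """Return the minimum spacing for two neighboring nets, given their voltages."""
    delta = abs(v_a - v_b)
    limits = [limit for limit, _ in SPACING_RULES]
    # Pick the first rule whose voltage limit is >= the voltage difference.
    idx = bisect_left(limits, delta)
    if idx == len(SPACING_RULES):
        raise ValueError(f"voltage difference {delta} V exceeds the rule table")
    return SPACING_RULES[idx][1]


def check_pair(v_a: float, v_b: float, actual_spacing: float) -> bool:
    """True when the drawn spacing satisfies the voltage-dependent rule."""
    return actual_spacing >= required_spacing(v_a, v_b)
```

With this table, a 3.3 V net next to a grounded net needs more room than two nets that differ by less than 1 V, which is exactly the distinction a single spacing rule per metal layer cannot express.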

Just as with voltage-aware DRC [4,5], accurate LUP checks require both spatial and voltage knowledge [1], because voltages have a significant impact on the applicable spacing rules. The relationship between the holding voltage and emitter-to-emitter isolation and guard ring strategy, combined with a thorough, context-sensitive construction and application of LUP design rules, enables designers to achieve area savings in mixed-voltage designs where high and low supply voltages intermingle [2]. The distance necessary to separate the interaction of these voltages can greatly influence the location of susceptible regions in the design, as well as the location and degree of change necessary to avoid this susceptibility (Figure 3).

Figure 3: Separation (distance) between p and n emitters weakens parasitic bipolars by increasing their base width (See References - 1).

If the layout has just a few voltages, a single simple spacing rule may be all that is required, but the complexity of the required protection increases as more power domains are included. How these domains switch, with different parts of the design being active at different times, adds to this complexity. The ability to leverage the power intent of your design, particularly through descriptions created using the Unified Power Format (UPF), enables a state-driven approach to determine what voltages are present in any given state.

Finding and Eliminating LUP-Susceptible Design Regions

Integrated circuit (IC) reliability verification, including LUP detection, has long relied on a plethora of home-brewed scripts and utilities constructed with traditional electronic design automation (EDA) tools designed for design rule checking (DRC), layout vs. schematic (LVS) comparison, and electrical rule checking (ERC). Historically, there were no foundry reliability rule decks or reliability verification tools to provide a central focus on, or an automated process for, implementing reliability checks. Traditional LUP geometrical rule checks using DRC tools only provide limited detection and verification capabilities.

Automated LUP Detection

In the last few years, collaboration between EDA companies and the world’s leading IC design houses and foundries resulted in the creation and availability of reliability-focused rule decks that can consider design intent. While DRC, LVS, and design for manufacturing (DFM) decks have been established deliverables for years, these new reliability decks enabled the development of qualified automated reliability verification solutions that help designers specifically address more complex reliability design issues like LUP accurately and efficiently. Automated and context-aware LUP checking flags violations that would be missed using traditional DRC alone, such as indirectly-connected current injectors, resistive guard rings, and the like.

As outlined by Anirudh Oberoi, et al. [2], an advanced latch-up verification flow looks similar to that shown in Figure 4.

Figure 4: Advanced LUP verification flow. (See References - 2)

This flow includes the following steps [2]:

  1. Identify all external nodes.
  2. Check externally-connected diffusions to identify possible latching paths.
  3. Establish LUP electrical context by propagating voltage down to the diffusions to assess LUP risk.
  4. If diffusions are protected with guard rings, validate them for their efficiency to collect injected carriers. This step involves both guard ring continuity and resistance checks.
  5. Establish full LUP layout context for the path at risk. Checks at this step include verification of diffusion and well spacing, tie frequency rules, etc.
  6. Based on the information collected in steps 1 through 5, use an electronic design automation (EDA) checker to perform an analysis, and either validate the layout or report a LUP error.

As with any other automated verification solution, getting quick and accurate insight into problematic areas of the design that affect reliability earlier in the design process is extremely beneficial, reducing the extensive re-work and re-spins that destroy schedules and eat into profits when errors are discovered late in the flow. Many reliability rule decks have options to facilitate running reliability checks not only at the full-chip level, but also at the intellectual property (IP) block level. Using these capabilities in an incremental approach helps provide context for problematic areas, particularly for IPs that are being used in a different context from previous implementations, or whose geometries have been shrunk to accommodate a new process node.

Of course, reliability rule decks are only useful if there are EDA tools to implement reliability checks in a timely and accurate process. The Calibre® PERC™ reliability verification platform is one example of the new breed of reliability analysis tools that provide automated LUP analysis and detection. The Calibre PERC platform performs advanced net analysis in conjunction with layout topology awareness. This unique ability to consider both netlist and layout (GDS) information simultaneously enables the tool to perform complex electrical checks that require both layout-related parameters and circuitry-dependent checks, such as voltage-aware net checking. With this functionality, the Calibre PERC platform can detect net connectivity through current conduction devices, enabling it to identify LUP risks that would be missed with traditional DRC.

The Calibre PERC automated flow can propagate realistic voltage values to all points in the layout, eliminating the fallible manual process. It first identifies the supply voltages for the design, and then uses a voltage propagation algorithm to determine the voltages on internal layout nodes. The voltages are computed automatically based on static propagation rules, which can be user-defined for specific device types, or brought in from external simulation results. The algorithm is applied to the netlist to identify target nets and devices. Maintaining netlist information throughout the entire flow results in context-specific knowledge, improving the quality of the check, as well as providing enhanced debug opportunities. This integration between netlist, connectivity-based voltage analysis, and geometric analysis is what enables a comprehensive solution for both LUP and voltage-aware rules.
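The general idea of static voltage propagation can be sketched in a few lines of Python. This is not the Calibre PERC algorithm or its rule syntax; it is a simplified illustration that assumes the netlist has already been reduced to net pairs joined by devices, with a hypothetical `conducts` flag standing in for a per-device-type propagation rule. Each supply voltage is spread through conducting devices, and every internal net records the set of voltages that can reach it.

```python
from collections import defaultdict, deque


def propagate_voltages(supplies, devices):
    """Spread supply voltages across nets joined by conducting devices.

    supplies: dict mapping supply net name -> voltage (V)
    devices:  iterable of (net_a, net_b, conducts) tuples; 'conducts' is a
              hypothetical flag standing in for a per-device-type rule.
    Returns a dict mapping each reached net -> set of voltages seen on it.
    """
    graph = defaultdict(set)
    for net_a, net_b, conducts in devices:
        if conducts:
            graph[net_a].add(net_b)
            graph[net_b].add(net_a)

    seen = defaultdict(set)
    for source, volts in supplies.items():
        queue = deque([source])
        visited = {source}
        while queue:
            net = queue.popleft()
            seen[net].add(volts)
            for nxt in graph[net]:
                # Stop at other supply nets: their voltage is already defined.
                if nxt not in visited and nxt not in supplies:
                    visited.add(nxt)
                    queue.append(nxt)
    return dict(seen)
```

A net that ends up with more than one voltage in its set sits between power domains, so any spacing or LUP rule applied to it would need to use the worst-case value.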

In addition, the Calibre PERC platform:

  • Tailors LUP checks to specific voltages used in the design to enable layout area optimization, rather than employing conservative (worst case voltage) rules.
  • Dynamically generates accurate markers internally, minimizing the number of manual marker layers required while also improving their accuracy.
  • Provides detailed debugging information, such as net by layer output, net path, etc., in addition to standard DRC output.

Figure 5 illustrates the type of complex voltage-aware checks that can be validated using the Calibre PERC reliability platform, without the need for complex marker layers. In this example, spacing to/from each block is different. These context-aware checks are necessary for implementing competitive design optimizations and realizing the space savings in today’s advanced SoCs without compromising reliability.

Figure 5: Voltage-dependent spacing errors can be accurately and automatically detected by the Calibre PERC platform, regardless of the number of power domains. (See References - 1)

LUP Guard Ring Identification

While the automated identification capabilities of the Calibre PERC platform can identify complex layout structures, there may be times when a guard ring marker will improve the quality of results. Using a guard ring marker layer enables both DRC and Calibre PERC processes to identify intended LUP guard rings (Figure 6).

Figure 6: Guard rings are identified with a guard ring marker layer. (See References - 1)

Once identified, guard ring efficiency requirements can be checked [1]:

  • Proper bias (N guard ring tied to highest potential, P guard ring tied to lowest potential)
  • Low-resistance connection of guard ring to supply (VDD, VSS)
  • Minimum contact density
  • Minimum width
  • Exclusivity (guard ring active area does not contain other devices that could interfere with carrier collection)
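The five requirements above map naturally onto a simple checklist function. The Python sketch below is only an illustration: the `GuardRing` fields, the limit values, and the violation names are all hypothetical, and a real flow would take the limits from the foundry rule deck and extract the measurements from the layout.

```python
from dataclasses import dataclass


@dataclass
class GuardRing:
    ring_type: str            # "N" or "P"
    bias_voltage: float       # voltage of the net the ring is tied to (V)
    supply_resistance: float  # ring-to-supply resistance (ohms)
    contact_density: float    # contacts per unit ring area (arbitrary units)
    width: float              # ring width (microns)
    has_other_devices: bool   # True if other devices sit in the ring's active area


# Hypothetical limits for illustration only; real values come from the rule deck.
LIMITS = {"max_resistance": 5.0, "min_contact_density": 0.25, "min_width": 0.5}


def guard_ring_violations(ring, v_highest, v_lowest, limits=LIMITS):
    """Return the names of the efficiency requirements the ring fails."""
    failed = []
    # N rings should sit at the highest potential, P rings at the lowest.
    expected_bias = v_highest if ring.ring_type == "N" else v_lowest
    if ring.bias_voltage != expected_bias:
        failed.append("bias")
    if ring.supply_resistance > limits["max_resistance"]:
        failed.append("resistance")
    if ring.contact_density < limits["min_contact_density"]:
        failed.append("contact_density")
    if ring.width < limits["min_width"]:
        failed.append("width")
    if ring.has_other_devices:
        failed.append("exclusivity")
    return failed
```

An empty list means the ring passes every check; otherwise the returned names indicate which requirement to debug first.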

The identification of intentional rings with markers is of particular importance if there are bipolar junction transistor (BJT)-like structures in your design, which may look very much like a guard ring.

Conclusion

In the complex designs being implemented today across a wide variety of nodes, LUP has emerged as a critical issue affecting design reliability and lifecycle. Traditional DRC lacks the fidelity and context to fully identify LUP-susceptible regions in these dense, detailed designs. Learning and applying the latest reliability analysis techniques to solve these often intricate verification requirements for LUP detection, while also developing process improvements to avoid susceptible configurations in future designs, is critical from a best practices perspective and reliability perspective.

Having LUP checks available and implemented in your foundry’s reliability rule deck is a significant benefit during the verification process, and can provide market advantage in both time-to-market and product lifecycle performance. To assist designers looking to integrate this technology into their design and verification flows, the ESD Association (ESDA) has extended its educational offerings in the area of latch-up detection to include these types of complex verification. The ESDA Tutorial DD382: Electronic Design Automation (EDA) Solutions for Latch-up [6] reviews a typical latch-up prevention flow, and delves into details necessary for improvement.

Automated reliability analysis and verification tools help designers quickly and accurately implement and execute reliability checks, including LUP detection, across a broad range of designs. These tools ensure that designers can find and eliminate design issues that affect product reliability, performance, and expected lifecycle.

The continued evolution of your organization’s reliability verification checks and best practices, along with the evaluation and adoption of best practices from the industry as a whole, should not only be an aspiration, but a measurable goal to keep your design flows current. Incorporating new learnings into existing flows helps improve both their robustness and relevance for today’s complex designs, and leverages efficiencies learned for the development of new solutions. LUP, like many design flow challenges, provides significant opportunities for process improvement and flow automation in the ongoing effort to implement robust and repeatable verification solutions.

References

[1] Michael Khazhinsky, “Latch-up Verification/Rule Checking Throughout Circuit Design Flow,” Mentor Graphics User2User Conference, April, 2016. https://supportnet.mentor.com/files/u2u/2016 Mentor U2U – Latch-up_Verification_ Throughout_Design_Flow_v02.pdf

[2] A. Oberoi, M. Khazhinsky, J. Smith and B. Moore, “Latch-up characterization and checking of a 55 nm CMOS mixed voltage design,” Electrical Overstress/Electrostatic Discharge Symposium Proceedings 2012, Tucson, AZ, 2012, pp. 1-10. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6333300

[3] Philippe Galy, et al. “ESD design challenges in 28nm hybrid FDSOI/Bulk advanced CMOS process,” 2014 International Electrostatic Discharge Workshop. https://www.researchgate.net/publication/261160656_ESD_design_challenges_in_28nm_hybrid_FDSOIBulk_advanced_CMOS_process

[4] Dina Medhat, “Automated Solution for Voltage-Aware DRC,” EETimes SoC DesignLines, Dec. 23, 2015. http://www.eetimes.com/author.asp?section_id=36&doc_id=1328540

[5] Matthew Hogan, et al., “Using Static Voltage Analysis and Voltage-Aware DRC to Identify EOS and Oxide Breakdown Reliability Issues,” EOS/ESD Association Symposium, 2013. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6635948&tag=1

[6] “DD382: Electronic Design Automation (EDA) Solutions for Latch-up,” EOS/ESD Association Tutorials. https://www.esda.org/index.php/training-and-education/esda-tutorials/

How Critical Area Analysis Optimizes Memory Redundancy Design

March 8th, 2017

By Simon Favre, Mentor Graphics

Introduction

As any design engineer knows, the farther downstream a design goes, the less likely a manufacturing problem can be corrected without a costly and time-consuming redesign. Whether you are a fabless, fab-lite, or integrated device manufacturer (IDM) company, reducing a design’s sensitivity to manufacturing issues should ideally be handled by the design teams. By identifying and resolving design for manufacturing (DFM) problems while the design is still in its early stages, many manufacturing ramp-up issues can be avoided altogether.

For example, embedded memories often cover 40-60% of the chip area in a large system-on-chip (SoC) design. The densely packed structures in memory cores make them very susceptible to random defects, so redundant elements are often added to embedded memories to improve final yields. However, if redundancy is applied where it has no benefit, then die area and test time are wasted, which actually increases manufacturing cost. Unnecessary redundancy can be a crucial and costly mistake. Using critical area analysis (CAA) to perform a detailed analysis of your design redundancy can accurately quantify the yield improvement that can be achieved, while minimizing impact on chip area and test.

Critical Area Analysis

The basic CAA process calculates values for the average number of faults (ANF) and yield based on the probability of random defects that introduce an extra pattern (short) or missing pattern (open) into a layout, causing functional failures (Figure 1).

Figure 1. Definition of critical area based on extra pattern (short) and missing pattern (open).

In addition to classic shorts and opens calculations, CAA techniques also analyze potential via and contact failures. In fact, once CAA is applied, via and contact failures often prove to be the dominant failure mechanisms (Figure 2). Other failure mechanisms can also be incorporated into CAA, depending on the defect data provided by the foundry.

Figure 2. Pareto of ANF values for defect types in a large SoC. The dominant defect type in this analysis is contact to diffusion.

As shown in Figure 3, critical area increases with increasing defect size. In theory, the entire area of the chip could be a critical area for a large enough defect size. In practice, most foundries limit the range of defect sizes that can be simulated, based on the range of defect sizes they can detect and measure with test chips or metrology equipment.

Figure 3. Critical Area CA(x) in square microns as a function of defect size in nanometers for one defect type.

Defect Densities

Semiconductor foundries have various proprietary methods for collecting defect density data associated with their manufacturing processes. To be used for a CAA process, this defect density data is converted into a form compatible with the CAA tool. The most common format is a simple power equation, as shown in equation (1). In this equation, k is a constant derived from the density data, x is the defect size, and the exponent q is called the fall power. The foundry curve-fits the opens and shorts defect data for each layer to an equation of this form to support CAA. In general, a defect density must be available for every layer and defect type for which critical area will be extracted. However, in practice, layers that have the same process steps, layer thickness, and design rules typically use the same defect density values.

(1) $D(x) = \dfrac{k}{x^{q}}$

Defect density data may also be used in table form, where each specific defect size listed has a density value. One simplifying assumption typically used is that the defect density is 0 outside the range of defect sizes for which the fab has data.
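Both forms of defect density data are easy to express in code. The following Python sketch is illustrative only: the function names are hypothetical, and the linear interpolation between table entries is an assumption made here, since the article does not say how values between listed sizes are handled. Both functions return 0 outside the range of defect sizes covered by the data, as described above.

```python
from bisect import bisect_right


def power_law_density(x, k, q, d_min, d_max):
    """Equation (1): D(x) = k / x**q inside [d_min, d_max], 0 outside."""
    if x < d_min or x > d_max:
        return 0.0
    return k / x ** q


def table_density(x, sizes, densities):
    """Table form: density at defect size x, 0 outside the listed range.

    sizes must be sorted in increasing order. Linear interpolation between
    listed sizes is an assumption made for this sketch.
    """
    if x < sizes[0] or x > sizes[-1]:
        return 0.0
    i = bisect_right(sizes, x) - 1
    if i == len(sizes) - 1:
        return densities[-1]
    frac = (x - sizes[i]) / (sizes[i + 1] - sizes[i])
    return densities[i] + frac * (densities[i + 1] - densities[i])
```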

Calculation of ANF

Once the critical area CA(x) is extracted for each layer over the range of defect sizes, the defect density data D(x) is used to calculate ANF according to equation (2), using numerical integration. The dmin and dmax limits are the minimum and maximum defect sizes according to the defect data available for that layer.

(2) $ANF = \int_{d_{min}}^{d_{max}} CA(x) \cdot D(x) \, dx$

In most cases, the individual ANF values can simply be added to arrive at a total ANF for all layers and defect types. Designers take note: ANF is not strictly a probability of failure, as ANF is not constrained to be less than or equal to 1.
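As a rough illustration of the numerical integration in equation (2), the Python sketch below applies the trapezoidal rule to sampled critical area values. The choice of the trapezoidal rule, the function names, and the sample data layout are assumptions for this example; a production CAA tool may use a different integration scheme.

```python
def anf_trapezoid(sizes, critical_areas, density):
    """Approximate equation (2) with the trapezoidal rule.

    sizes:          sorted defect sizes between dmin and dmax
    critical_areas: CA(x) sampled at each size
    density:        callable returning D(x) for a defect size
    """
    total = 0.0
    for i in range(len(sizes) - 1):
        f0 = critical_areas[i] * density(sizes[i])
        f1 = critical_areas[i + 1] * density(sizes[i + 1])
        total += 0.5 * (f0 + f1) * (sizes[i + 1] - sizes[i])
    return total


def total_anf(anf_values):
    """Sum per-layer, per-defect-type ANF values into a total ANF."""
    return sum(anf_values)
```

Because the total is a sum of expected fault counts rather than a probability, nothing in this calculation prevents the result from exceeding 1.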

Calculation of Yield

Once the ANF is calculated, one or more yield models are applied to make a prediction of the defect-limited yield (DLY) of a design. One of the simplest, most widely-used yield models is the Poisson distribution, shown in equation (3). Of course, DLY cannot account for parametric yield issues, so care must be taken when attempting to correlate these results to actual die yields.

(3) $Y = e^{-ANF}$
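Equation (3) is a one-line calculation, shown here as a small Python sketch with hypothetical ANF values. It also shows why an ANF greater than 1 is not a problem: the exponential always maps it to a yield between 0 and 1.

```python
import math


def poisson_yield(anf_values):
    """Equation (3): defect-limited yield from the summed ANF values."""
    return math.exp(-sum(anf_values))


# Hypothetical per-layer ANF values that sum to 0.1.
example_yield = poisson_yield([0.05, 0.03, 0.02])  # roughly 0.905

# An ANF above 1 still produces a valid yield between 0 and 1.
low_yield = poisson_yield([1.5])
```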

ANF and Yield for Cut Layers

Calculation of ANF and yield for cut layers (contacts and vias) is generally simpler than for other layers. In fact, most foundries define a probabilistic failure rate for all single vias in the design, and assume that via arrays do not fail. While this simplifying assumption neglects the problem that a large enough particle will cause multiple failures, it greatly simplifies the calculation of ANF, in addition to reducing the amount of data needed from the foundry. All that is required is a sum of all the single cuts on a given layer, and the ANF is then simply calculated as the product of the count and the failure rate, shown in equation (4).

(4) $ANF(\text{via}) = \text{singleViaCount} \cdot \text{viaFailureRate}$

Once the ANF(via) is calculated, it can be added to the ANF values for all the other defect types, and used in the yield equation (3). Vias between metal layers may all use one failure rate, or use separate rates based on the design rules for each via layer. The contact layer can be separated into contacts to diffusion (N+ and P+ separately, or together), and contacts to poly, each with separate failure rates.
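When each cut layer has its own failure rate, equation (4) is simply applied per layer and the results summed. The Python sketch below uses hypothetical layer names, counts, and failure rates purely for illustration.

```python
def via_anf(single_via_counts, failure_rates):
    """Apply equation (4) per cut layer and sum the results.

    single_via_counts: dict mapping layer name -> number of single cuts
    failure_rates:     dict mapping layer name -> failure rate per single cut
    """
    return sum(
        count * failure_rates[layer]
        for layer, count in single_via_counts.items()
    )


# Hypothetical counts and rates, not foundry data.
counts = {"via1": 1_000_000, "via2": 500_000}
rates = {"via1": 1e-9, "via2": 2e-9}
anf_cuts = via_anf(counts, rates)  # about 0.002
```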

Memory Redundancy

As stated earlier, embedded memories can account for significant yield loss due to random defects. Typically, SRAM intellectual property (IP) providers make redundancy a design option, with the most common form of redundancy being redundant rows and columns. Redundant columns tend to be easier to apply, as the address decoding is not affected, only the muxing of bitlines and IO ports.

Memory Failure Modes

Every physical structure in a memory block is potentially subject to failures caused by random defects, classified according to the structures affected. The most common classifications are single-bit failures, row and column failures, and peripheral failures (which can be further subdivided into I/O, sense amplifier, address decoder, and logic failures). In terms of repair using memory redundancy, our primary interest is in single-bit, row, and column (SBRC) failures occurring in the core of the memory array.

To analyze SBRC failures with CAA, designers must define which layers and defect types are associated with which memory failure modes. By examining the layout of a typical 6-T or 8-T SRAM bit cell, some simple associations can be made. For example, by looking at the connections of the word lines and bit lines to the bit cell, we can associate poly and contact to poly on row lines with row failures, and associate diffusion and contact to diffusion on column lines with column failures. Because contacts to poly and contacts to diffusion both connect to Metal1, the Metal1 layer must be shared between row and column failures. Obviously, most of the layers in the memory design are used in multiple places, so not all defects on these layers will cause failures that can be repaired. There are also non-repairable fatal defects, such as shorts between power and ground. Given that a single-bit failure can be repaired with either row or column redundancy, we’ll ignore these differences for now.

Repair Resources

Embedded SRAM designs typically make use of either built-in self-repair (BISR) or fuse structures that allow designers to mux out the failed structures and replace them with redundant structures. BISR has greater complexity, with greater impact on die area. Muxing with fuses requires that the die be tested, typically at wafer sort, and the associated fuses blown to accomplish the repair. The fuse approach has the advantage of simplicity and reduced area impact, although at the expense of additional test time. Regardless of the repair method, placing redundant structures in the design adds area, which directly increases the cost of manufacturing the design. Additional test time also increases cost, and designers may not have a good basis for calculating that cost. The goal of analyzing memory redundancy with CAA is to ensure that DLY is maximized, while minimizing the impact on die area and test time.

Specification of Repair Resources

For a CAA tool to accurately analyze memory redundancy, it requires a specification of the repair resources available in each memory block. This specification must also include a breakdown of the failure modes by layer and defect type, and their associated repair resource. The layer and defect type together are typically called a CAA rule. Each rule with an associated repair resource must be in a list of all rules associated with that repair resource. Since some rules will be associated with both row and column failures, some means of specifying rule sharing is needed.

For each memory block, the count of total and redundant rows and columns is required. To specifically identify the areas of the memory that can be repaired, the designer must either specify the bitcell name used in each memory block, or use a marker layer in the layout database. This identification allows the CAA tool to identify the core areas of the memory.

Figure 4 shows a typical memory redundancy specification. The first line lists the CAA rules that have redundant resources for a particular family of memory blocks. The two lists are column rules, followed by row rules. The two lines at the bottom show SRAM block specifications and specify (in order) the block name, the rule configuration name, the total columns, redundant columns, total rows, redundant rows, dummy columns, dummy rows, and the name of the bitcell. In this example, both block specifications refer to the same rule configuration. Given these parameters, and the unrepaired yield calculated by the CAA tool, it is possible to calculate the repaired yield.

Figure 4. Memory configuration specification showing layers and defect types with redundant resources.
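The fields described for Figure 4 can be captured in a small data model. The Python sketch below is an assumption-laden illustration: the whitespace-separated line format, the class and function names, and the example rule names are all hypothetical, and the actual configuration syntax of any given CAA tool may differ. Only the field order follows the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleConfig:
    name: str
    column_rules: tuple  # CAA rules repaired by redundant columns
    row_rules: tuple     # CAA rules repaired by redundant rows

    def shared_rules(self):
        """Rules associated with both row and column failures."""
        return set(self.column_rules) & set(self.row_rules)


@dataclass(frozen=True)
class MemoryBlockSpec:
    block_name: str
    rule_config: str
    total_columns: int
    redundant_columns: int
    total_rows: int
    redundant_rows: int
    dummy_columns: int
    dummy_rows: int
    bitcell: str


def parse_block_spec(line):
    """Parse one block specification line (hypothetical whitespace format)."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}")
    return MemoryBlockSpec(fields[0], fields[1], *map(int, fields[2:8]), fields[8])
```

For example, `parse_block_spec("sram_a cfg0 36 4 520 8 2 2 bitcell6t")` would describe a block with 36 total columns, 4 of them redundant, and 520 total rows, 8 of them redundant.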

Yield Calculation with Redundancy

Once the CAA tool performs the initial analysis, it can calculate the yield with redundancy. The initial analysis must include the ANF(core) of the total core area of each memory block listed in the redundancy configuration file. Since the calculation method is the same, each row or column in a memory core can simply be referred to as a “unit,” and the calculation method only needs to be described once. If present, dummy units do not cause functional failures, and do not need to be repaired (in the initial analysis, dummy units do contribute to the total ANF, as the CAA tool has no knowledge of whether or not they are functional).

Calculation Method

The calculation method is based on the well-known principle of Bernoulli Trials. The goal is to get the required number of good units out of some total number of units. First, the tool calculates the number of active units in the core, as shown in equation (5).

(5) $N_A = N_T - N_R - N_D$

Where $N_A$ is the required number of active units, $N_T$ is the total number of units, $N_R$ is the number of redundant units, and $N_D$ is the number of dummy units. In equation (6), the tool derives the number of functional non-dummy units.

(6) $N_F = N_T - N_D$

Next, it calculates the unit ANF in equation (7).

(7) $ANF(\text{unit}) = \dfrac{ANF(\text{core})}{N_T}$

To be consistent with probability theory, the tool converts ANF(unit) back to a yield, using the Poisson equation in equation (8). This value becomes the p term in the Bernoulli equation, which denotes probability of success. The probability of failure, q, is defined in equation (9).

(8) $p = Y(\text{unit}) = e^{-ANF(\text{unit})}$

(9) $q = 1 - p$

Now the tool must add together the probabilities of all cases that satisfy the requirement of getting at least $N_A$ good units out of $N_F$ available units. The result, calculated in equation (10), is the repaired yield for that memory core for that specific rule. This process is repeated over all rules in the memory configuration specification, and all memory blocks listed with redundancy.

(10) $Y_R = \sum_{k=0}^{N_R} C(N_F, N_F - k) \cdot p^{N_F - k} \cdot q^{k}$

Note that the case where $k=0$ is necessary to account for the possibility that all units are good. The term $C(N_F, N_F - k)$ is the binomial coefficient, which evaluates to 1 if $k=0$. For any memory core or rule where no repair resources exist, the calculation in equation (10) is skipped, and the result is simply the original unrepaired yield.
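Equations (5) through (10) translate almost line for line into code. The Python sketch below is a minimal illustration with a hypothetical function name and signature; it handles one rule for one memory core, and it follows the text in returning the unrepaired Poisson yield when there are no redundant units.

```python
import math


def repaired_yield(anf_core, n_total, n_redundant, n_dummy=0):
    """Repaired yield for one memory core and one rule, per equations (5)-(10)."""
    n_active = n_total - n_redundant - n_dummy   # equation (5)
    if n_active <= 0:
        raise ValueError("no active units left after removing redundant and dummy units")
    if n_redundant == 0:
        # No repair resources: skip equation (10), use the unrepaired yield.
        return math.exp(-anf_core)
    n_functional = n_total - n_dummy             # equation (6)
    anf_unit = anf_core / n_total                # equation (7)
    p = math.exp(-anf_unit)                      # equation (8)
    q = 1.0 - p                                  # equation (9)
    # Equation (10): sum the cases with k = 0 .. N_R failed units.
    return sum(
        math.comb(n_functional, n_functional - k) * p ** (n_functional - k) * q ** k
        for k in range(n_redundant + 1)
    )
```

As a sanity check with made-up numbers, a core with two units, one of them redundant, and a per-unit yield of 0.5 gives 0.25 (both good) plus 0.5 (exactly one good), for a repaired yield of 0.75.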

Calculating the effective yield for memory blocks with no redundancy is still valuable if the CAA tool has the capability of post-processing the calculations with a different memory redundancy specification. This enables a “what-if” analysis, which can be crucial for determining whether or not applying redundancy adds more value than the inherent cost of adding it to the design. If the what-if analysis can be done without repeating the full CAA run, then iterating on a few memory redundancy configurations to find the optimum is quite reasonable. In addition, if the tool reports the intermediate calculations for each term in the Bernoulli Trials, the point of diminishing returns can easily be identified. This prevents costly overdesign of the memories with redundancy.
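To show how the intermediate terms expose the point of diminishing returns, the Python sketch below lists each term of equation (10) separately. The function names, the gain threshold, and the example numbers are hypothetical. Note one simplification: the terms are computed for a fixed total unit count, whereas a full what-if analysis that adds or removes redundant units would also change the core area and therefore ANF(core).

```python
import math


def bernoulli_terms(anf_core, n_total, n_redundant, n_dummy=0):
    """Return the individual terms of equation (10), for k = 0 .. N_R."""
    n_f = n_total - n_dummy
    p = math.exp(-anf_core / n_total)
    q = 1.0 - p
    return [
        math.comb(n_f, n_f - k) * p ** (n_f - k) * q ** k
        for k in range(n_redundant + 1)
    ]


def diminishing_point(terms, min_gain=1e-3):
    """Smallest k whose term adds less than min_gain; len(terms) if none does."""
    for k, term in enumerate(terms):
        if term < min_gain:
            return k
    return len(terms)


# Hypothetical core: ANF(core) = 0.5, 108 units, 8 of them redundant.
terms = bernoulli_terms(0.5, 108, 8)
useful_repairs = diminishing_point(terms)
```

Term k is the yield contributed by the case where exactly k units fail and are repaired, so once a term falls below the chosen threshold, further redundant units add area and test time for very little yield.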

Limitations

The technique presented has some limitations, but can still be applied with relative ease to determine optimal redundancy parameters. The obvious limitations are:

  • The test program must be able to distinguish the case where a failure on a redundant unit has occurred, but all the active units are good. This case requires no repair.
  • There is no accounting for fatal defects that cannot be repaired, such as power to ground shorts.
  • The redundancy calculation is applied only to the core bitcells, but redundant columns, for example, may include the sense amp and IO registers.
  • The CAA rules apply to specified layers and defect types anywhere within the memory core, not to specific structures in the layout. If a method existed for tagging specific structures in the layout and associating them with failure modes or rules, the calculation would be more accurate.
  • Algorithmic repair, such as data error correction, is beyond the scope of CAA analysis.

Conclusion

Memory redundancy is a design technique intended to reduce manufacturing cost by improving die yield. If no redundancy is applied, then alternative methods to improve die yield may include making the design smaller, or reducing defect rates. If redundancy is applied where it has no benefit, then die area and test time are wasted, which actually increases manufacturing cost. In between these two extremes, redundancy may or may not be applied depending on very broad guidelines. If defect rates are high, more redundancy may be needed. If defect rates are low, redundancy may be unnecessary. Analysis of memory redundancy using CAA and accurate foundry defect statistics is a valuable process that helps quantify the yield improvement that can be achieved, and determine the optimal configuration.

References

[1]   Stapper, C.H. “LSI Yield Modeling and Process Monitoring,” in IBM Journal of Research and Development, Vol. 44, p. 112, 2000. Originally published May 1976. http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5391123

[2]   Stapper, C.H. “Improved Yield Models for Fault-Tolerant Memory Chips,” in IEEE Transactions on Computers, vol. 42, no. 7, pp. 872-881, Jul 1993.
doi: 10.1109/12.237727
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=237727&isnumber=6095

Author

Simon Favre is a Technical Marketing Engineer in the Design to Silicon division at Mentor Graphics, supporting and directing improvements to the Calibre YieldAnalyzer and CMPAnalyzer products. Prior to joining Mentor Graphics, Simon worked with foundries, IDMs, and fabless semiconductor companies in the fields of library development, custom design, yield engineering, and process development. He has extensive technical knowledge in DFM, processing, custom design, ASIC design, and EDA. Simon holds BS and MS degrees from U.C. Berkeley in EECS. He can be reached at simon_favre@mentor.com.

Context-Aware Latch-up Checking

September 28th, 2016

By Matthew Hogan, Product Marketing Manager, Calibre Design Solutions, Mentor Graphics

Latch-up in CMOS circuits is a long-studied and troubling phenomenon that often leads to chip failure when inadvertently created parasitic PNP and NPN junctions are driven (turned on/forward-biased). Typically, an unintended thyristor or silicon-controlled rectifier (SCR) is formed and then triggered to generate a low-resistance parasitic path. Latch-up presents itself as a temporary condition that may be resolved by power cycling, but it may also cause fatal chip failure or permanent damage.

Recognizing unintentional failure mechanisms present in an integrated circuit (IC) is a constant and often difficult task for design teams. Increases in design complexity, larger pin counts, more power domains, and the ever-changing landscape of what process node and which foundry will host your next design all contribute to the challenge. Additionally, many of the geometric design rule checks (DRC) traditionally employed for latch-up detection lack the context awareness that modern reliability verification tools can provide. However, getting it right, particularly when you are trying to find and eliminate latch-up in your designs, is of critical importance.

Although no design team enjoys further complicating design and verification flows by adding additional checks, a fully automated latch-up rule check is highly desirable, particularly when multiple power domains are involved. Just as voltage-aware DRC checking [1][2] has provided a significant improvement in the accuracy and control of interconnect spacing for reliability and the avoidance of time-dependent dielectric breakdown (TDDB), context-aware latch-up verification offers similar advantages and opportunities to automate these challenging design interactions.

When considering the impact of latch-up on a layout, it is essential to understand both the unintentional devices within your design and how the layout affects the critical distances of specific latch-up-susceptible structures. For example, to adjust the layout to prevent latch-up, designers must recognize where unfavorable conditions may lead to the formation of unintended parasitic devices in the PNP or NPN junctions as current is injected. Figure 1 shows how lateral separation can be used to protect against latch-up formation.

Figure 1 - Latch-up prevention with lateral separation [3].

Impact of voltages and devices

While understanding the distances and physical layout within the design is essential, consideration must also be given to the voltages being used. As with voltage-aware DRC, the voltages being analyzed for potential latch-up conditions have a significant impact on the spacing rules that must be applied. The interaction of these voltages can greatly influence the location of susceptible regions in the design, as well as the location and degree of change necessary to avoid this susceptibility (Figure 2).

Figure 2 - Accurate latch-up checks require voltage awareness [3].

While a single simple spacing rule may be all that is required with just a few voltages, the complexity of the protection needed increases as more power domains are included. How these domains switch, with different parts of the design being active at different times, adds to this complexity. The ability to leverage the power intent of your design, particularly through descriptions created using the Unified Power Format (UPF), enables a state-driven approach to determine what voltages are present in any given state.
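A minimal sketch of the state-driven idea follows. It assumes the power intent has already been reduced to a table of supply voltages per state; the domain names, voltages, and spacing thresholds are all hypothetical, and real latch-up rules depend on more than voltage difference alone. The code does not read UPF; it only shows how a worst-case voltage difference across states could select a spacing requirement.

```python
from itertools import combinations

# Hypothetical power-state table: supply voltage (V) per domain in each state.
POWER_STATES = {
    "active":  {"VDD_CORE": 0.9, "VDD_IO": 3.3, "VDD_AON": 1.8},
    "sleep":   {"VDD_CORE": 0.0, "VDD_IO": 3.3, "VDD_AON": 1.8},
    "standby": {"VDD_CORE": 0.0, "VDD_IO": 0.0, "VDD_AON": 1.8},
}

# Hypothetical rule table: (maximum voltage difference in V, required spacing in um).
SPACING_RULES = [(1.0, 2.0), (2.0, 5.0), (3.6, 10.0)]

def worst_case_delta(states, domain_a, domain_b):
    """Largest voltage difference between two domains over all power states."""
    return max(abs(v[domain_a] - v[domain_b]) for v in states.values())

def required_spacing(delta_v, rules=SPACING_RULES):
    """Return the spacing for the first rule bracket that covers delta_v."""
    for max_delta, spacing in rules:
        if delta_v <= max_delta:
            return spacing
    raise ValueError(f"no spacing rule covers a difference of {delta_v} V")

domains = sorted(next(iter(POWER_STATES.values())))
for a, b in combinations(domains, 2):
    dv = worst_case_delta(POWER_STATES, a, b)
    print(f"{a} vs {b}: worst-case dV = {dv:.1f} V, spacing >= {required_spacing(dv)} um")
```

Note that the worst case for the core and IO domains occurs in the sleep state, not the active state, which is why a check based on a single state can miss the limiting condition.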

What CMOS technology are you using: Bulk, FD-SOI or Both?

While much of the literature on latch-up assumes that the implementation technology impacted by latch-up is entirely bulk CMOS, and that fully-depleted silicon-on-insulator (FD-SOI) is immune, there are hybrid technologies that leverage characteristics of both FD-SOI and bulk CMOS. One such technology is the ultra-thin body and box (UTBB) FD-SOI process used by STMicroelectronics [4]. UTBB leverages the benefits of an FD-SOI process for the design logic, while taking advantage of a “hybrid” bulk CMOS for electrostatic discharge (ESD) and IO devices. For ESD protection, the ESD device in thin silicon film is half as robust as the bulk CMOS device (because the thinner Si film has less volume for power dissipation). Leveraging an open box structure to access hybrid bulk CMOS configurations to build ESD power devices provides benefits for device robustness. In doing so, however, verification needs to consider possible sources of susceptibility to latch-up in areas of the design with hybrid bulk CMOS IO devices and ESD structures.

Conclusion

While traditional DRC has contributed to a valuable verification methodology for latch-up, it lacks the fidelity and context to fully identify the latch-up susceptible regions in your design. Learning and applying the latest reliability analysis techniques to solve these often intricate and complex verification requirements for latch-up detection, while also developing process improvements to avoid susceptible configurations in future designs, is critical from a best practices perspective.

To assist designers looking to integrate this technology into their design and verification flows, the ESD Association (full disclosure: I am a volunteer and serve on the Board of Directors) has extended its educational offerings in the area of latch-up detection to include these types of complex verification. A new course, DD382: Electronic Design Automation (EDA) Solutions for Latch-up [5], reviews a typical latch-up prevention flow, and delves into details necessary for improvement.

The continued evolution of your organization’s reliability verification checks and best practices, along with the evaluation and adoption of best practices from the industry as a whole, should not only be an aspiration, but a measurable goal to keep your design flows current. Incorporating new learnings into existing flows helps improve both their robustness and relevance for today’s complex designs, and leverages efficiencies learned in the development of new solutions. Latch-up, like many design flow challenges, provides significant opportunities for process improvement and flow automation in the ongoing effort to implement robust and repeatable verification solutions.

Further Reading:

How to Check for ESD Protection Using Calibre PERC High Level Checks

References

[1] Medhat, Dina. “Automated Solution for Voltage-Aware DRC,” EETimes SOC DesignLine, December 23, 2015.

[2] Hogan, Matthew, et al. “Using Static Voltage Analysis and Voltage-Aware DRC to Identify EOS and Oxide Breakdown Reliability Issues.” EOS/ESD Association Symposium, 2013.

[3] Khazinsky, Michael. “Latch-up Verification / Rule Checking Throughout Circuit Design Flow.” Mentor Graphics User2User, 2016.

[4] Galy, Philippe. “ESD challenges for FDSOI UTBB advanced CMOS technologies.” International Electrostatic Discharge Workshop, 2014.

[5] EOS/ESD Association Symposium Tutorials, EOS/ESD Association Symposium, 2016.

Author

Matthew Hogan is a Product Marketing Manager for Calibre Design Solutions at Mentor Graphics, with over 15 years of design and field experience. He is actively working with customers who have an interest in Calibre PERC. Matthew is an active member of the ESD Association—involved with the EDA working group, the Symposium technical program committee, and the IEW management committee. Matthew is also a Senior Member of IEEE, and a member of ACM. He holds a B. Eng. from the Royal Melbourne Institute of Technology, and an MBA from Marylhurst University. Matthew can be reached at matthew_hogan@mentor.com.

Established Technology Nodes: The Most Popular Kid at the Dance

August 24th, 2016

By Michael White, Mentor Graphics

I remember back in the day at high school dances, always wanting to dance with the most popular girl in school. I never could, because there was a constant stream of others queued up to dance with her. If you are trying to build an integrated circuit (IC) today, and trying to get fab capacity at 28 nm and above, you are faced with the very same situation. Lots of suitors jockeying for access. There are two interesting points to be explored here: 1) why are these nodes experiencing such a long life, and 2) how is this long life driving new challenges for designers?

Why Established Nodes Are Experiencing an Unexpectedly Long Life

The Internet of Things (IoT) means many things to many people, but the segment of IoT related to sensors and connectivity is the answer to the longevity question. The functionality we crave, such as smart power management for longer battery life, and Wi-Fi and Bluetooth for more connectivity, is more cost-effective when implemented at established nodes between 40 nm and 180 nm. Consequently, the high consumer demand for these capabilities is driving increased demand for ICs manufactured using these processes. In a nutshell, the nodes that best support radio frequency (RF) and mixed-signal IC designs with low power, low cost and high reliability are seeing a much higher demand than in the past.

The other dynamic driving a longer than expected life of established nodes—40/45 nm and 32/28 nm in particular—is the wafer cost trend at 20 nm and below. 20 nm and below are well-suited for advanced CPUs, application processors, etc., but from a price/performance perspective, they are generally a poor fit for sensors, connectivity, analog mixed-signal (AMS) applications, etc.

Although you wouldn’t necessarily know it from reading press releases each week, designs at 65 nm and larger still account for approximately 43% of all wafer production and 48% of wafer fab capacity. Even more significant, nodes 65 nm and larger account for approximately 85% of all design starts (Figure 1). Clearly, established nodes are not fading away any time soon.

Figure 1. Production data shows established nodes still comprise a significant portion of the IC market. (source: VLSI Research)

Today’s Established Nodes Have Evolved to Meet Market Requirements

Designs at these established nodes are certainly not static. Today’s established node designs are vastly different from the original designs developed when these nodes were new (Figure 2).

Figure 2. Design complexity at established nodes is increasing, measured here at 65 nm by the number and type of IP blocks in typical designs (Source: Semico Inc.).

Historically, when a node was brought on line, it was optimized for bulk CMOS digital logic. That is, the process design rules, supported device types, voltages, etc., were all tuned for this application. Today, established process lines such as 65 nm are being “retooled” for an assortment of product types (Figure 3). It’s common to see mixed-signal IC designs (e.g., Wi-Fi, Bluetooth, etc.) using process and design rules that never envisioned such products. They require more power, meaning more rails, domains and islands. They contain more analog and mixed-signal components, as well as high-speed interface solutions like silicon photonics. They require a variety of advanced design rule checks (DRC), and “smart” filling routines designed to maximize the use of fill. They often include large intellectual property (IP) blocks, either developed internally or purchased from third-party suppliers. They often have far more reliability constraints, due to new market requirements and standards. And lastly, they are more and more frequently incorporated into a 3D or 2.5D package. All of those changes impact the physical verification strategy and techniques for these designs.

Figure 3. Consumer electronics is one market that powers the relentless drive toward more functionality and sophistication.

Why is this important to you? As a reader of this periodical, you probably work within the IC ecosystem, developing these types of products using an “advanced” mature node. Of course, time to market for a new Wi-Fi or Bluetooth chip built on an advanced mature node is just as important as it is for a 16/14 nm application processor for a next-generation smartphone. And because the design you are building is far more complex than the first designs built on the target process, it may be that your team is struggling, because the EDA tools you used when that process was introduced 5 or 10 years ago cannot handle the new requirements and complexity. Fortunately, electronic design automation (EDA) tools built for later nodes with additional capabilities can be easily redeployed for advanced mature nodes to improve design team productivity and the quality of your designs.

Some of the capabilities commonly employed at advanced mature nodes include:

- Circuit reliability

  • Reliability checking to identify design flaws associated with electrostatic discharge protection, electrical overstress, electromigration and others in single- or multi-voltage-domain designs
  • Ability to handle voltage-dependent design rules, that is, spacing rules that depend on the voltage potential between devices and wires
  • Ability to check for accurate device symmetry in sensitive analog circuits and other reliability-related analog/mixed-signal issues
  • Ability to check for reliability conditions that are unique to a particular design methodology

- Pattern matching functionality to identify specific shapes and configurations.

  • Ability to define and locate patterns of interest that can affect performance or detract from yield
  • Specialty device checking
  • Multi-layer structure definition
  • SRAM cell, cell interactions, and interface checking
  • Ability to detect IP manipulation

- Automated DRC waiver management

  • Elimination of time spent debugging waived errors
  • Assurance of ISO standard compliance for consistent behavior and traceability

- Equation-based design rules, which allow designers to define rules as complex mathematical functions, greatly simplifying rule definition while increasing accuracy.

  • Precise tolerance determination on multi-dimensions (such as multi-faceted polygons)
  • Accurate antenna checking and property transfer

- Automated fill process that satisfies complex fill requirements

  • Maximization of fill shapes to minimize density variation
  • Critical-net-aware fill
  • Analog-structure-aware fill (symmetry requirements)
  • Alternating and symmetrical fill for diffusion and poly
  • Matched fill for sensitive devices, cells, nets
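To illustrate the equation-based rule idea from the list above, the following sketch expresses a spacing requirement as a continuous function of wire width and voltage difference instead of a table of discrete cases. This is not rule-deck syntax for any tool; the rule form, the coefficients, and the `WirePair` record are hypothetical and serve only to show the concept.

```python
from dataclasses import dataclass

@dataclass
class WirePair:
    name: str
    width_um: float    # width of the wider wire in the pair
    spacing_um: float  # drawn spacing between the two wires
    delta_v: float     # voltage difference between the two nets

# Hypothetical coefficients for an illustrative equation-based spacing rule.
BASE_SPACING_UM = 0.10
WIDTH_THRESHOLD_UM = 0.30
WIDTH_COEFF = 0.20              # extra spacing per um of width above threshold
VOLTAGE_COEFF_UM_PER_V = 0.05   # extra spacing per volt of difference

def required_spacing_um(pair: WirePair) -> float:
    """Spacing required by the illustrative equation for this wire pair."""
    extra_width = max(0.0, pair.width_um - WIDTH_THRESHOLD_UM)
    return (BASE_SPACING_UM
            + WIDTH_COEFF * extra_width
            + VOLTAGE_COEFF_UM_PER_V * pair.delta_v)

def find_violations(pairs):
    """Return (name, margin) for every pair whose drawn spacing is too small."""
    violations = []
    for pair in pairs:
        margin = pair.spacing_um - required_spacing_um(pair)
        if margin < 0:
            violations.append((pair.name, round(margin, 4)))
    return violations

pairs = [
    WirePair("narrow_same_net_voltage", 0.20, 0.12, 0.0),
    WirePair("wide_cross_domain", 0.80, 0.15, 1.8),
]
print(find_violations(pairs))
```

Because the requirement is computed, the result carries a margin for each violation, which indicates how far the layout is from compliance and not just that it failed.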

Naturally, EDA vendors are stepping up to the challenge, and working to ensure these capabilities are available to design companies working at established nodes. At Mentor, we see the tools in our integrated Calibre® nmPlatform being used extensively for verification across the circuit and physical layout domains. Designers and foundries see that leading-edge tools such as the Calibre PERC™ reliability solution, the Calibre eqDRC™ functionality of Calibre nmDRC™, the Calibre Pattern Matching tool, SmartFill™ functionality in Calibre YieldEnhancer™, and others can provide as much value to the established nodes as they have for the newest processes.

Summary

The latest IC design and verification challenges are not all at the latest and greatest process node. Competition and market demand continue to challenge designers working at established nodes as well. In addition, industry economics and specialized applications are creating a growing volume demand for designs based on established nodes. While the market potential of established nodes is growing, so is the complexity and difficulty of validating designs that push these nodes far beyond their original capabilities. The new reality is challenging designers using EDA tools that were not available when the nodes were brand new. We’re learning that EDA tools are not frozen to the node, but must advance at these advanced established nodes just as they do at the leading-edge nodes. Design teams working at advanced established nodes have the option to upgrade their tools and make their life much easier. They might even feel like dancing…

Further Reading: Is Complexity Increasing For Designs Done at Older Process Geometries?

Michael White is the Director of Product Marketing for Calibre Physical Verification products at Mentor Graphics in Wilsonville, OR. Prior to Mentor Graphics, he held various product marketing, strategic marketing, and program management roles for Applied Materials, Etec Systems, and the Lockheed Skunk Works. Michael received a BS in System Engineering from Harvey Mudd College, and an MS in Engineering Management from the University of Southern California.

Leveraging Reliability-Focused Foundry Rule Decks

July 27th, 2016

By Matthew Hogan, Product Marketing Manager, Calibre Design Solutions, Mentor Graphics

Not that long ago, all designers had for integrated circuit (IC) reliability verification was a plethora of home-brewed scripts and utilities they combined with traditional design rule checking (DRC), layout vs. schematic (LVS) comparison, and electrical rule checking (ERC) tools. There were no foundry reliability rule decks or qualified reliability verification tools to provide a central focus on, or an automated process for, implementing reliability checks. While SPICE simulation is still widely used for small blocks, the ease with which reliability issues can be overlooked at the circuit level (particularly for electrical overstress) is staggering. Missing an input vector or running too few simulation cycles to expose an issue are some typical concerns (and weaknesses) of the SPICE methodology. On the interconnect side, traditional reliability verification means using your favorite parasitic extraction tool, selecting the paths you know/care about for export, and running SPICE on your parasitic netlist to determine resistance. Quite the laborious and error-prone undertaking. Understanding the circuit structure (topology), interconnect, and physical layout of your design is critical when looking at reliability-focused issues, especially those involving electrostatic discharge (ESD) and latch-up (LUP). Despite the challenges of these approaches, the question from designers always seemed to be “How can I leverage this technology if I don’t write the rules myself?”

Reliability-Focused Foundry Rule Decks

The fabless ecosystem relies on the availability of comprehensive, well-qualified foundry rule decks for a broad range of process nodes. Over the last decade, collaboration between electronic design automation (EDA) companies and the world’s leading foundries has resulted in the creation and availability of reliability-focused IC verification rule decks that consider design intent. While DRC, LVS and design for manufacturing (DFM) have been well-ingrained deliverables for this ecosystem for years, these new decks have enabled the development of qualified automated reliability verification solutions, like the Calibre® PERC™ reliability verification platform from Mentor Graphics [1], to help designers specifically address more complex reliability design issues accurately and efficiently.

Because new node development allows for the introduction of new tools and design flows, and creates the opportunity to solve new problems, many recent press releases focus on emerging node technologies [2][3][4][5][6]. However, while established nodes like 28 nm and 40 nm may not get much press these days, reliability rules are also available for them, focusing primarily on ESD and LUP.

Many designers are now beginning to understand the value of using these foundry reliability rule decks and automated reliability verification to augment their internal reliability checking flows for a wide variety of complex reliability issues.

Early and often

As with other verification solutions, getting insight into problematic areas of the design that affect reliability earlier in the design process is extremely beneficial, reducing the extensive re-work and re-spins that destroy schedules and eat into profits when errors are discovered late in the flow. For example, ensuring that the interconnect at the intellectual property (IP) level of your design is robust is a check that can be run early in the design process, as can cross-domain and similar topology-based checks. Many rule decks have options to facilitate running reliability checks not only at the full-chip level, but also at the IP level. Utilizing these capabilities in an incremental approach helps provide context for problematic areas, particularly for IPs that are being used in a different context from previous implementations, or whose geometries have been shrunk to accommodate a new process node.

I often hear the statement that early reliability analysis cannot be done because the chip is not “LVS-clean.” False! While making sure you have no power or ground shorts when doing ESD or other power-related checks is critical, there is a whole slew of LVS errors that have no impact on ESD protection structures and evaluation. By understanding your design, and identifying the LVS errors that can impact the reliability verification results, you can achieve significant design closure benefits from employing early reliability verification. Of course, final sign-off verification can’t happen until your design is both DRC- and LVS-clean, ensuring accurate results, but the adoption of an “early and often” policy towards reliability verification will help you influence critical aspects of the design implementation while there are fewer barriers and lower cost to changes. Issues found by checks such as point-to-point (P2P) resistance, or current density (CD) problems due to inadequate metallization and/or insufficient vias, can be readily identified and rectified in the layout, as can topology issues for important protective structures like ESD or cross-domain circuits. Leveraging the foundry’s reliability checks with an automated reliability verification tool early in the design/verification cycle establishes an important baseline to identify potential issues without incurring significant costs in time and resources.
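As a simplified illustration of the kinds of checks mentioned above, the sketch below screens a few interconnect paths against a point-to-point resistance limit and simple current density limits for metal width and via count. The `Path` record, the limit values, and the sample data are hypothetical; real limits come from the foundry rule deck, and real checks operate on extracted layout data.

```python
from dataclasses import dataclass

@dataclass
class Path:
    name: str
    resistance_ohm: float  # extracted point-to-point resistance
    current_ma: float      # current the path is expected to carry
    min_width_um: float    # narrowest metal segment along the path
    via_count: int         # vias at the weakest layer transition

# Hypothetical limits, for illustration only.
MAX_P2P_OHM = 1.0
MAX_MA_PER_UM = 2.0
MAX_MA_PER_VIA = 0.5

def check_path(path: Path) -> list:
    """Return a list of human-readable issues found on one path."""
    issues = []
    if path.resistance_ohm > MAX_P2P_OHM:
        issues.append(f"P2P resistance {path.resistance_ohm} ohm exceeds {MAX_P2P_OHM}")
    if path.current_ma / path.min_width_um > MAX_MA_PER_UM:
        issues.append("metal current density too high: widen the narrowest segment")
    if path.via_count == 0 or path.current_ma / path.via_count > MAX_MA_PER_VIA:
        issues.append("via current density too high: add vias")
    return issues

paths = [
    Path("pad_to_clamp_A", 0.4, 1.0, 1.0, 4),
    Path("pad_to_clamp_B", 2.5, 6.0, 1.0, 2),
]
for p in paths:
    print(p.name, check_path(p) or "ok")
```

None of these checks depends on the whole chip being LVS-clean, only on the connectivity of the paths being examined, which is why they lend themselves to early, block-level runs.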

Conclusion

Foundry rule decks and qualified EDA tools have permitted the fabless ecosystem to flourish. Together, their trusted and well-qualified content and processes provide the foundation for your verification flows. With the proliferation of reliability-focused foundry rule decks, early verification of reliability issues and comprehensive full-chip runs can now leverage their guidance. As with more traditional DRC, LVS and DFM rule decks, augmenting your processes and flows with these foundry offerings and qualified tools provides you with the flexibility to implement reliability verification early in your design process, while ensuring confidence in the results.

Related resource: Improving Design Reliability by Avoiding Electrical Overstress

References

[1] Fabless/Foundry Ecosystem Solutions, https://www.mentor.com/solutions/foundry/?cmpid=10167

[2] Mentor Graphics Enhances Support for TSMC 7nm Design Starts and 10nm Production https://www.mentor.com/company/news/mentor-tsmc-7nm-design-starts-10nm-production?cmpid=10167

[3] Mentor Graphics Announces Collaboration with GLOBALFOUNDRIES on Reference Flow and Process Design Kit for 22FDX Platform, https://www.mentor.com/company/news/mentor-collaboration-globalfoundries-22fdx-platform?cmpid=10167

[4] Intel Custom Foundry Expands Offering with Reliability Checking Using Calibre PERC, https://www.mentor.com/company/news/mentor-intel-custom-foundry-calibre-perc?cmpid=10167

[5] UMC Adds Calibre Reliability Verification and Interactive Custom Design Verification to Design Enablement Offering, https://www.mentor.com/company/news/mentor-umc-calibre-reliabillity-verification?cmpid=10167

[6] SMIC Adds Reliability Checks to IP Certification Program Based on Mentor Graphics Calibre PERC Platform, https://www.mentor.com/company/news/smic-to-ip-cert-program-mentor-calibre-perc-platform?cmpid=10167

Matthew Hogan is a Product Marketing Manager for Calibre Design Solutions at Mentor Graphics, with over 15 years of design and field experience. He is actively working with customers who have an interest in Calibre PERC. Matthew is an active member of the ESD Association—involved with the EDA working group, the Symposium technical program committee, and the IEW management committee. Matthew is also a Senior Member of IEEE, and a member of ACM. He holds a B. Eng. from the Royal Melbourne Institute of Technology, and an MBA from Marylhurst University. Matthew can be reached at matthew_hogan@mentor.com.

Reliability Scoring for the Automotive Market

June 23rd, 2016

By Jeff Wilson, DFM Product Marketing Manager, Calibre, Mentor Graphics

Introduction

The annual growth for car sales is typically in the single digits, but the electronic content inside those cars is rapidly expanding as we enter the age of the digital car. Current estimates posit that up to 30% of the production cost of a new vehicle comes from the electronic systems. The typical new automobile now contains over 100 microprocessors, performing various tasks from safety (braking control and sensors) to comfort (heating, cooling, seat positions) to infotainment (navigation and communication systems), as well as one of the fastest-growing uses: advanced driver assistance systems (ADAS). This explosion of automotive electronics is one of the bright spots in the current semiconductor industry, making these devices an attractive market for semiconductor companies looking to expand their markets. The challenge for any company new to the automotive market is to understand the market requirements and performance standards, especially in the area of quality and reliability. Safety, efficiency, and connectivity are the primary drivers for automotive electronic components.

Expanding Automotive Market

As more companies expand into this market, a key element to their success is ensuring that designs properly account for the environmental variability associated with automotive use, the stringent quality and reliability requirements with which they must comply, and consumer expectations for performance and reliability. Design teams must understand these conditions and apply the appropriate technology to solve design issues and achieve compliance.

There are a number of factors driving the need for reliability. First, there is the physical environment in which these devices must operate, which includes extreme weather conditions and broad ranges of temperatures. In addition to the climate, other environmental conditions that these devices must endure include ambient heat, vibration, and both extended and start-stop operation. Designing to meet this extended set of requirements is typically a new experience to those who have recently made the decision to produce chips for the automotive market.

Another reliability requirement that is new to most designers is the expected lifespan for their designs.  While consumer products typically operate for a few years, an automotive device is expected to last at least 10-15 years. In addition, an automobile creates its own system, with a significant amount of connectivity between devices that compounds the criticality of device reliability because, in many cases, if one device fails, the entire system is compromised. This forces designers to consider previously trivial design stresses, such as time-dependent dielectric breakdown (TDDB), and learn how to analyze and account for these effects. This expected life also puts a strain on new technologies that don’t yet have a longevity track record.

In addition to environmental variability and cumulative system reliability, there is variability in the breadth of the complexity between designs. At the high end, there is the in-vehicle infotainment (IVI) market, which is simply defined as combining information and entertainment for the benefit of both the driver and the passenger. IVI brings together video display, audio, touchscreens, and connections to other devices such as smartphones and media players. The controlling systems or host processor in IVI typically utilize the latest semiconductor technology to deliver the required functionality.  Memory chips, especially NAND flash, are another important semiconductor component in navigation and IVI systems.

At the low end, established technologies are known and proven for such items as safety (e.g., air bags), braking systems, power train operations, and ignition system control. The need for these chips is a major driver (along with the Internet of Things) of capacity at established nodes. This market demand puts pressure on designers to ensure they consistently maximize both yield and reliability even in these long-established designs.

Reliability Drivers

There are two major areas of reliability that must be considered during the design and verification process—electrical performance and manufacturing optimization. These two reliability-related issues have both unique requirements and overlaps. One of the biggest overlaps is the eco-system required to deliver a complete design solution, which includes the foundry, design team, and electronic design automation (EDA) solution providers. The foundry has in-depth knowledge about the manufacturing process, and can link a layout configuration to yield/reliability/robustness by putting this knowledge into a rule deck.  The EDA providers supply automated functionality that allows designers to analyze their design against this rule deck to find out what and where changes can improve their design, either for electrical performance or manufacturing optimization. Now that designers have an automated solution that helps improve design reliability, they can put it to good use on their designs. In addition to improving the yield/reliability/robustness score for each design, they can use this capability to establish best practices across the company. By comparing scores from different design groups, they can determine what design techniques to use going forward. Standardizing on the best flows for their company helps improve the quality of all designs.

Designers have the responsibility of ensuring that their designs are reliable by verifying electrical performance before tapeout.  The AEC electrical component qualification requirements identify wearout reliability tests, which specify the testing of several failure mechanisms:

  • Electromigration
  • Time-dependent dielectric breakdown (or gate oxide integrity test) — for all MOS technologies
  • Hot carrier injection — for all MOS technologies below 1 micron
  • Negative bias temperature instability
  • Stress migration

Design verification against these failure factors ensures that the actual device electrical performance will meet reliability expectations. However, traditional IC verification flows leveraging design rule checking, layout vs. schematic, and electrical rule checking techniques may have trouble validating these requirements, because these tools each focus on one specific aspect of design verification. New EDA tools like Mentor’s Calibre® PERC™, which provides the ability to consider not only the devices in a design, but also the context in which they are used, as well as their physical implementation, can help designers understand weaknesses in their designs from a holistic approach. This “whole problem” view of a design provides visibility to interoperability issues of intellectual property (IP) used in the design.

Manufacturing reliability is driven by what is commonly referred to as design for manufacturing (DFM). DFM is about taking manufacturing data and presenting it to designers so they can improve the yield/reliability/robustness of their designs by eliminating known manufacturing issues. The most effective way to make this work is to have the same type of eco-system used to improve electrical reliability, where the participants include the foundry, designers, and EDA providers. Manufacturing reliability checks are an extension of the rule deck, such as the manufacturing analysis and scoring (MAS) deck developed by Samsung and GLOBALFOUNDRIES for use with Mentor’s Calibre YieldAnalyzer™ tool. A key element in creating a functional eco-system is to provide feedback from actual manufacturing results, so the designers understand why a particular layout structure is not suitable for complying with reliability requirements. This feedback is especially critical for those who are new to the automotive market and its additional reliability requirements. A productive solution is much more than just providing a DFM score for a layout. Designers need to recognize the most important and relevant geometries, and what changes will return the greatest improvements in reliability. The ability to prioritize design work is critical to producing designs that are both cost-efficient and successful.

Summary

There is no doubt that electronics are impacting the automotive market, and this trend is expected to continue. As companies move into the market to take advantage of the opportunities they see, they will need to understand how layout variabilities relate to design quality and reliability requirements. Foundries can provide the relationship between the layout and the reliability, while EDA providers supply the tools that present this data to designers in an easy-to-use automated system. As the final piece of the eco-system, designers must understand both the requirements and the solutions to ensure the design meets the stringent electronic reliability requirements while remaining profitable to manufacture.

Additional Resources

Understanding Automotive Reliability and ISO 26262 for Safety-Critical Systems

Migrating Consumer Electronics to the Automotive Market with Calibre PERC

Enhancing Automotive Electronics Reliability Checking

Author

Jeff Wilson is a DFM Product Marketing Manager in the Calibre organization at Mentor Graphics in Wilsonville, OR. He has responsibility for the development of products that analyze and modify the layout to improve the robustness and quality of the design. Jeff previously worked at Motorola and SCS. He holds a BS in Design Engineering from Brigham Young University and an MBA from the University of Oregon. Jeff may be reached at jeff_wilson@mentor.com.

Interconnect Robustness Depends on Scaling for Reliability Analysis

May 25th, 2016

By Matthew Hogan, Product Marketing Manager, Calibre Design Solutions, Mentor Graphics

The safety net of design margins that were once available to designers has disappeared. Whether you’re implementing a new design start at your “next” node or an established node, the desire for greater functionality has eroded what margins used to exist. This tightening of design margins is further exacerbated by an increasing industry-wide focus on reliability, driven by both consumer demand and an expanding array of standards for performance-critical electronics. This focus seems to be landing equally on both devices and interconnect. Gone are the days when (rough) hand calculations or visual inspection of designs were sufficient to provide the level of confidence needed to proceed against time-sensitive tapeout schedules and tight time-to-market windows.

Now present in both digital and analog designs is the need to validate interconnect robustness, or resistance to failure. The old technique of “counting squares,” where each “square” of a specific size was given a resistance value for each metal layer, and other manual methods seldom provide the necessary accuracy.
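For reference, the “counting squares” estimate amounts to multiplying a layer's sheet resistance by the number of squares in a wire (its length divided by its width). The short Python sketch below illustrates that arithmetic only; the function name, segment dimensions, and sheet resistance values are hypothetical and are not taken from any real process.

```python
def squares_resistance(length_um, width_um, sheet_res_ohm_per_sq):
    """Estimate wire resistance by counting squares: R = Rs * (L / W)."""
    if width_um <= 0:
        raise ValueError("width must be positive")
    n_squares = length_um / width_um
    return sheet_res_ohm_per_sq * n_squares


# Hypothetical route: (length in um, width in um, sheet resistance in ohm/sq).
segments = [
    (100.0, 0.5, 0.08),  # long, narrow wire: 200 squares
    (20.0, 2.0, 0.04),   # short, wide wire: 10 squares
]

# Segments in series simply add.
total_ohm = sum(squares_resistance(*seg) for seg in segments)
print(f"estimated resistance: {total_ohm:.2f} ohm")
```

An estimate like this says nothing about vias, layer transitions, or parallel paths, which is one reason it tends to fall short of the accuracy needed here.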

For a design to be “LVS clean,” all that’s required is a single connection. That is not a great way forward if you are expecting to shunt any reasonable current through those connections. The same is true for blocks that connect to wide power busses through slender metallization. Figure 1 shows several examples of LVS-clean layouts with very low robustness, and how they could be improved.

Figure 1. Inadequate via and interconnect connections within layers.

Parallel paths and unexpected layer transitions make point-to-point resistance (P2P) simulations an invaluable tool for validating that low resistance paths between design elements exist. Current density (CD) simulations provide more detail, and not only allow designers to consider the suitability of the metal width, but also provide an opportunity for detailed analysis of layer transitions.
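To make the effect of parallel paths concrete, here is a minimal sketch of a P2P calculation on a small resistor network. It is not how any particular EDA tool works. The function name `p2p_resistance`, the node names, and the resistance values are all assumptions for illustration. The sketch injects 1 A at the source node, grounds the destination node, and solves the nodal equations, so the voltage at the source equals the effective resistance.

```python
def p2p_resistance(edges, src, dst):
    """Effective resistance (ohm) between src and dst.

    edges: list of (node_a, node_b, resistance_ohm) tuples.
    Assumes a connected network with positive resistances.
    """
    if src == dst:
        return 0.0
    nodes = sorted({n for a, b, _ in edges for n in (a, b)})
    unknowns = [n for n in nodes if n != dst]  # dst is the 0 V reference
    idx = {n: k for k, n in enumerate(unknowns)}
    size = len(unknowns)

    # Build the conductance matrix G for the non-reference nodes.
    g = [[0.0] * size for _ in range(size)]
    for a, b, res in edges:
        cond = 1.0 / res
        for p, q in ((a, b), (b, a)):
            if p in idx:
                g[idx[p]][idx[p]] += cond
                if q in idx:
                    g[idx[p]][idx[q]] -= cond

    rhs = [0.0] * size
    rhs[idx[src]] = 1.0  # inject 1 A at the source node

    # Solve G * v = rhs by Gaussian elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda row: abs(g[row][col]))
        g[col], g[pivot] = g[pivot], g[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for row in range(col + 1, size):
            factor = g[row][col] / g[col][col]
            for k in range(col, size):
                g[row][k] -= factor * g[col][k]
            rhs[row] -= factor * rhs[col]

    v = [0.0] * size
    for row in reversed(range(size)):
        acc = sum(g[row][k] * v[k] for k in range(row + 1, size))
        v[row] = (rhs[row] - acc) / g[row][row]

    return v[idx[src]]  # volts per 1 A, i.e. ohms


# Hypothetical net: two parallel 2-ohm straps, then a 3-ohm segment.
network = [("pad", "mid", 2.0), ("pad", "mid", 2.0), ("mid", "dev", 3.0)]
r_p2p = p2p_resistance(network, "pad", "dev")
print(f"P2P resistance pad -> dev: {r_p2p:.3f} ohm")
```

In this toy network the two parallel straps combine to 1 ohm, so the path resistance is about 4 ohms. Removing one strap raises it to 5 ohms, the kind of difference that an LVS connectivity check alone would not reveal.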

Early interconnect evaluation

Validation of individual intellectual property (IP) blocks before final integration into the system-on-chip (SoC) provides an early look at possible robustness issues. Far too often, design teams feel the need to wait until final chip assembly to validate full path interconnects. While this is an important task that must be completed, validating each of the IP blocks early in the design process, when changes can more easily be made, provides important feedback on what to expect in the final design. In addition to focusing on each IP block, designers must also consider functional assemblies, even before they actually exist. Where will the electrostatic discharge (ESD) protection blocks be integrated? How will the lower levels of IP be validated for interoperability? These are all important design flow considerations.

At smaller process nodes, particularly those using FinFETs, ESD circuits require a larger number of (often interdigitated) devices to provide adequate protection. The ESD target levels that you design to can greatly impact the area and number of these devices. Verification of these structures, particularly the interconnect to clusters of these devices, is of critical importance. Validation at the lowest design level possible, as early in the design process as possible, enables efficient design flows for each technology node. Depending on the design style and its robustness/reliability requirements, it may be necessary to critically look at detailed combinations of input/output (IO) pads to power clamp devices. This type of analysis may require a significant number of individual simulations to capture all combinations of IO1 through each of the power clamps (Figure 2).

Figure 2. Multiple simulations are needed to capture all combinations of IO1 through each of the 3 power clamps.
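The number of simulations grows with the product of pads and clamps. As a rough sketch of how the list of required runs could be enumerated, the Python snippet below pairs each IO pad with each power clamp. The pad and clamp names are hypothetical placeholders, and the function is an illustration rather than part of any verification tool.

```python
from itertools import product


def simulation_pairs(io_pads, power_clamps):
    """Return every (IO pad, power clamp) pair that needs its own simulation."""
    return list(product(io_pads, power_clamps))


# Hypothetical names mirroring the single-pad, three-clamp case.
clamps = ["clamp_A", "clamp_B", "clamp_C"]
pairs = simulation_pairs(["IO1"], clamps)
for pad, clamp in pairs:
    print(f"simulate {pad} -> {clamp}")

# With more pads, the count scales as len(io_pads) * len(power_clamps).
print(len(simulation_pairs(["IO1", "IO2", "IO3", "IO4"], clamps)), "runs for 4 pads")
```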

Full chip evaluation

In addition to this focused analysis at the IP level, understanding the context of IP use in the full-chip SoC is also an important consideration. As with validating the IP-level interconnect, full-chip evaluation requires a strategy that matches your workflow. Do you only need to validate interconnect to the ports of your IP, or must you go all the way to the device level?

As is standard in LVS full-chip runs, designers performing interconnect robustness analysis may exercise their verification tools for P2P and CD simulations at the device level. Leaving nothing to chance, this evaluation looks at the entire full-chip path, often looking at different combinations of ESD protection paths. If you use a comprehensive verification toolset, the good news for all these simulation paths is that you can parasitically extract all of the pin-pairs that need to be evaluated at the same time. The combinations that must be simulated can re-use these parasitics to perform the next simulation (Figure 3). This re-use is critical for minimizing turnaround time (TAT) while scaling to the number of simulations required for detailed analysis.

Figure 3. Scaled simulations are essential to minimizing TAT while ensuring accurate and complete analysis.
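The re-use idea can be sketched as a simple cache: extract the parasitics for each net once, then let every simulation that touches that net read the stored result. The Python below is a toy illustration under stated assumptions. `ExtractionCache`, `fake_extract`, the net names, and the resistance values are all hypothetical, and the "simulation" is just a sum of series resistances standing in for a real solver.

```python
class ExtractionCache:
    """Run the (expensive) extraction once per net and re-use the result."""

    def __init__(self, extract_fn):
        self._extract_fn = extract_fn
        self._cache = {}
        self.extract_calls = 0

    def parasitics(self, net):
        if net not in self._cache:
            self._cache[net] = self._extract_fn(net)
            self.extract_calls += 1
        return self._cache[net]


def fake_extract(net):
    # Hypothetical stand-in for parasitic extraction:
    # returns a list of series segment resistances (ohm) for the net.
    table = {"IO1": [1.0, 0.5], "VDD": [0.5, 0.25], "VSS": [0.4]}
    return table[net]


def run_p2p(cache, pin_pairs):
    """Toy P2P 'simulation': sum the cached series resistances of both nets."""
    results = {}
    for net_a, net_b in pin_pairs:
        results[(net_a, net_b)] = (
            sum(cache.parasitics(net_a)) + sum(cache.parasitics(net_b))
        )
    return results


cache = ExtractionCache(fake_extract)
results = run_p2p(cache, [("IO1", "VDD"), ("IO1", "VSS")])
print(results)
print("extractions performed:", cache.extract_calls)
```

Two simulations touch four net references here, but only three extractions run because IO1 is shared. The saving grows as more pin-pair combinations share the same nets.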

Conclusion

The need to validate interconnect robustness is now a given at advanced nodes. However, accepting simulation runtimes of days, or even weeks, should be a thing of the past. With interconnect robustness a critical aspect of reliability, fast simulation and parasitic extraction are essential for both schedule and market success. Early analysis within the design flow helps alleviate last-minute discovery of critical errors, providing the opportunity for fixing without significant adverse impacts to product schedules. Detailed analysis of interconnect, particularly for P2P and CD in ESD environments, with a reliability verification tool capable of quickly performing complex simulations, provides both accuracy and the necessary coverage in an acceptable timeframe. The early discovery of interconnect robustness issues, combined with the ability of your verification tools to easily and efficiently scale from IPs to SoCs, can ensure timely design completion while enhancing design reliability, a combination that can provide a new safety net for your bottom line.
