How Critical Area Analysis Optimizes Memory Redundancy Design
By Simon Favre, Mentor Graphics
As any design engineer knows, the farther downstream a design goes, the less likely a manufacturing problem can be corrected without a costly and time-consuming redesign. And it doesn’t matter if you are a fabless, fab-lite, or independent device manufacturer (IDM) company—reducing a design’s sensitivity to manufacturing issues should ideally be handled by the design teams. By identifying and resolving design for manufacturing (DFM) problems while the design is still in its early stages, many manufacturing ramp-up issues can be avoided altogether.
For example, embedded memories often cover 40-60% of the chip area in a large system-on-chip (SoC) design. The densely packed structures in memory cores make them very susceptible to random defects, so redundant elements are often added to embedded memories to improve final yields. However, if redundancy is applied where it has no benefit, then die area and test time are wasted, which actually increases manufacturing cost. Unnecessary redundancy can be a crucial and costly mistake. Using critical area analysis (CAA) to perform a detailed analysis of your design redundancy can accurately quantify the yield improvement that can be achieved, while minimizing impact on chip area and test.
Critical Area Analysis
The basic CAA process calculates values for the average number of faults (ANF) and yield based on the probability of random defects that introduce an extra pattern (short) or missing pattern (open) into a layout, causing functional failures (Figure 1).
In addition to classic shorts and opens calculations, CAA techniques also analyze potential via and contact failures. In fact, once CAA is applied, via and contact failures often prove to be the dominant failure mechanisms (Figure 2). Other failure mechanisms can also be incorporated into CAA, depending on the defect data provided by the foundry.
As shown in Figure 3, critical area increases with increasing defect size. In theory, the entire area of the chip could be a critical area for a large enough defect size. In practice, most foundries limit the range of defect sizes that can be simulated, based on the range of defect sizes they can detect and measure with test chips or metrology equipment.
Semiconductor foundries have various proprietary methods for collecting defect density data associated with their manufacturing processes. To be used for a CAA process, this defect density data is converted into a form compatible with the CAA tool. The most common format is a simple power equation, as shown in equation (1). In this equation, k is a constant derived from the density data, x is the defect size, and the exponent q is called the fall power. The foundry curve-fits the opens and shorts defect data for each layer to an equation of this form to support CAA. In general, a defect density must be available for every layer and defect type for which critical area will be extracted. However, in practice, layers that have the same process steps, layer thickness, and design rules typically use the same defect density values.
Defect density data may also be used in table form, where each specific defect size listed has a density value. One simplifying assumption typically used is that the defect density is assumed to be 0 outside the range of defect sizes for which the fab has data.
Calculation of ANF
Once the critical area CA(x) is extracted for each layer over the range of defect sizes, the defect density data D(x) is used to calculate ANF according to equation (2), using numerical integration. The dmin and dmax limits are the minimum and maximum defect sizes according to the defect data available for that layer.
(2)ANF=∫_dmaxdmin CA(x)∙D(x) dx
In most cases, the individual ANF values can simply be added to arrive at a total ANF for all layers and defect types. Designers take note: ANF is not strictly a probability of failure, as ANF is not constrained to be less than or equal to 1.
Calculation of Yield
Once the ANF is calculated, one or more yield models are applied to make a prediction of the defect-limited yield (DLY) of a design. One of the simplest, most widely-used yield models is the Poisson distribution, shown in equation (3). Of course, DLY cannot account for parametric yield issues, so care must be taken when attempting to correlate these results to actual die yields.
(3) Y = e-ANF
ANF and Yield for Cut Layers
Calculation of ANF and yield for cut layers (contacts and vias) is generally simpler than for other layers. In fact, most foundries define a probabilistic failure rate for all single vias in the design, and assume that via arrays do not fail. While this simplifying assumption neglects the problem that a large enough particle will cause multiple failures, it greatly simplifies the calculation of ANF, in addition to reducing the amount of data needed from the foundry. All that is required is a sum of all the single cuts on a given layer, and the ANF is then simply calculated as the product of the count and the failure rate, shown in equation (4).
Once the ANF(via) is calculated, it can be added to the ANF values for all the other defect types, and used in the yield equation (3). Vias between metal layers may all use one failure rate, or use separate rates based on the design rules for each via layer. The contact layer can be separated into contacts to diffusion (N+ and P+ separately, or together), and contacts to poly, each with separate failure rates.
As stated earlier, embedded memories can account for significant yield loss due to random defects. Typically, SRAM intellectual property (IP) providers make redundancy a design option, with the most common form of redundancy being redundant rows and columns. Redundant columns tend to be easier to apply, as the address decoding is not affected, only the muxing of bitlines and IO ports.
Memory Failure Modes
Every physical structure in a memory block is potentially subject to failures caused by random defects, classified according to the structures affected. The most common classifications are single-bit failures, row and column failures, and peripheral failures (which can be further subdivided into I/O, sense amplifier, address decoder, and logic failures). In terms of repair using memory redundancy, our primary interest is in single-bit row and column (SBRC) failures occurring in the core of the memory array.
To analyze SBRC failures with CAA, designers must define which layers and defect types are associated with which memory failure modes. By examining the layout of a typical 6-T or 8-T SRAM bit cell, some simple associations can be made. For example, by looking at the connections of the word lines and bit lines to the bit cell, we can associate poly and contact to poly on row lines with row failures, and associate diffusion and contact to diffusion on column lines with column failures. Because contacts to poly and contacts to diffusion both connect to Metal1, the Metal1 layer must be shared between row and column failures. Obviously, most of the layers in the memory design are used in multiple places, so not all defects on these layers will cause failures that can be repaired. There are also non-repairable fatal defects, such as shorts between power and ground. Given that a single-bit failure can be repaired with either row or column redundancy, we’ll ignore these differences for now.
Embedded SRAM designs typically make use of either built-in self-repair (BISR) or fuse structures that allow designers to mux out the failed structures and replace them with redundant structures. BISR has greater complexity, with greater impact on die area. Muxing with fuses requires that the die be tested, typically at wafer sort, and the associated fuses blown to accomplish the repair. The fuse approach has the advantage of simplicity and reduced area impact, although at the expense of additional test time. Regardless of the repair method, placing redundant structures in the design adds area, which directly increases the cost of manufacturing the design. Additional test time also increases cost, and designers may not have a good basis for calculating that cost. The goal of analyzing memory redundancy with CAA is to ensure that DLY is maximized, while minimizing the impact on die area and test time.
Specification of Repair Resources
For a CAA tool to accurately analyze memory redundancy, it requires a specification of the repair resources available in each memory block. This specification must also include a breakdown of the failure modes by layer and defect type, and their associated repair resource. The layer and defect type together are typically called a CAA rule. Each rule with an associated repair resource must be in a list of all rules associated with that repair resource. Since some rules will be associated with both row and column failures, some means of specifying rule sharing is needed.
For each memory block, the count of total and redundant rows and columns is required. To specifically identify the areas of the memory that can be repaired, the designer must either specify the bitcell name used in each memory block, or use a marker layer in the layout database. This identification allows the CAA tool to identify the core areas of the memory.
Figure 4 shows a typical memory redundancy specification. The first line lists the CAA rules that have redundant resources for a particular family of memory blocks. The two lists are column rules, followed by row rules. The two lines at the bottom show SRAM block specifications and specify (in order) the block name, the rule configuration name, the total columns, redundant columns, total rows, redundant rows, dummy columns, dummy rows, and the name of the bitcell. In this example, both block specifications refer to the same rule configuration. Given these parameters, and the unrepaired yield calculated by the CAA tool, it is possible to calculate the repaired yield.
Yield Calculation with Redundancy
Once the CAA tool performs the initial analysis, it can calculate the yield with redundancy. The initial analysis must include the ANF(core) of the total core area of each memory block listed in the redundancy configuration file. Since the calculation method is the same, each row or column in a memory core can simply be referred to as a “unit,” and the calculation method only needs to be described once. If present, dummy units do not cause functional failures, and do not need to be repaired (in the initial analysis, dummy units do contribute to the total ANF, as the CAA tool has no knowledge of whether or not they are functional).
The calculation method is based on the well-known principle of Bernoulli Trials. The goal is to get the required number of good units out of some total number of units. First, the tool calculates the number of active units in the core, as shown in equation (5).
Where NA is the required number of active units, NT is the total units, NR is the redundant units, and ND is the dummy units. In equation (6), the tool derives the number of functional non-dummy units.
Next, it calculates the unit ANF in equation (7).
To be consistent with probability theory, the tool converts ANF(unit) back to a yield, using the Poisson equation in equation (8). This value becomes the p term in the Bernoulli equation, which denotes probability of success. The probability of failure, q, is defined in equation (9).
Now the tool must add together the probabilities of all cases that satisfy the requirement of getting at least NA good units out of NF available units. The result, calculated in equation (10), is the repaired yield for that memory core for that specific rule. This process is repeated over all rules in the memory configuration specification, and all memory blocks listed with redundancy.
(10) YR=∑k=0k=NR C(NF,(NF-k))∙p(N_F-k)∙qk
Note that the case where k=0 is necessary to account for the possibility that all units are good. The term C(NF,(NF-k)) is the binomial coefficient, which evaluates to 1 if k=0. For any memory core or rule where no repair resources exist, the calculation in equation (10) is skipped, and the result is simply the original unrepaired yield.
Calculating the effective yield for memory blocks with no redundancy is still valuable if the CAA tool has the capability of post-processing the calculations with a different memory redundancy specification. This enables a “what-if” analysis, which can be crucial for determining whether or not applying redundancy adds more value than the inherent cost of adding it to the design. If the what-if analysis can be done without repeating the full CAA run, then iterating on a few memory redundancy configurations to find the optimum is quite reasonable. In addition, if the tool reports the intermediate calculations for each term in the Bernoulli Trials, the point of diminishing returns can easily be identified. This prevents costly overdesign of the memories with redundancy.
The technique presented has some limitations, but can still be applied with relative ease to determine optimal redundancy parameters. The obvious limitations are:
- The test program must be able to distinguish the case where a failure on a redundant unit has occurred, but all the active units are good. This case requires no repair.
- There is no accounting for fatal defects that cannot be repaired, such as power to ground shorts.
- The redundancy calculation is applied only to the core bitcells, but redundant columns, for example, may include the sense amp and IO registers.
- The CAA rules apply to specified layers and defect types anywhere within the memory core, not to specific structures in the layout. If a method existed for tagging specific structures in the layout and associating them with failure modes or rules, the calculation would be more accurate.
- Algorithmic repair, such as data error correction, is beyond the scope of CAA analysis.
Memory redundancy is a design technique intended to reduce manufacturing cost by improving die yield. If no redundancy is applied, then alternative methods to improve die yield may include making the design smaller, or reducing defect rates. If redundancy is applied where it has no benefit, then die area and test time are wasted, which actually increases manufacturing cost. In between these two extremes, redundancy may or may not be applied depending on very broad guidelines. If defect rates are high, more redundancy may be needed. If defect rates are low, redundancy may be unnecessary. Analysis of memory redundancy using CAA and accurate foundry defect statistics is a valuable process that helps quantify the yield improvement that can be achieved, and determine the optimal configuration.
 Stapper, C.H. “LSI Yield Modeling and Process Monitoring,” in IBM Journal of Research and Development, Vol. 44, p. 112, 2000. Originally published May 1976. http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=5391123
 Stapper, C.H. “Improved Yield Models for Fault-Tolerant Memory Chips,” in IEEE Transactions on Computers, vol. 42, no. 7, pp. 872-881, Jul 1993.
Simon Favre is a Technical Marketing Engineer in the Design to Silicon division at Mentor Graphics, supporting and directing improvements to the Calibre YieldAnalyzer and CMPAnalyzer products. Prior to joining Mentor Graphics, Simon worked with foundries, IDMs, and fabless semiconductor companies in the fields of library development, custom design, yield engineering, and process development. He has extensive technical knowledge in DFM, processing, custom design, ASIC design, and EDA. Simon holds BS and MS degrees from U.C. Berkeley in EECS. He can be reached at firstname.lastname@example.org.