
Posts Tagged ‘SoC’


Embedded FPGAs Offer SoC Flexibility

Wednesday, October 4th, 2017


By Dave Lammers, Contributing Editor

It was back in 1985 that Ross Freeman invented the FPGA, gaining a fundamental patent (#4,870,302) that promised engineers the ability to use “open gates” that could be “programmed to add new functionality, adapt to changing standards or specifications, and make last-minute design changes.”

Freeman, a co-founder of Xilinx, died in 1989, too soon to see the emergence of embedded field-programmable gate arrays (eFPGAs). These IP cores offer system-on-chip (SoC) designers the ability to create hardware accelerators and to support changing algorithms. Proponents claim the approach provides advantages for artificial intelligence (AI) processors, automotive ICs, and the SoCs used in data centers, software-defined networks, 5G wireless, encryption, and other emerging applications.

With mask costs escalating rapidly, eFPGAs offer a way to customize SoCs without spinning new silicon. While eFPGAs cannot compete with custom silicon in terms of die area, the flexibility, speed, and power consumption are proving attractive.

Semico Research analyst Rich Wawrzyniak, who tracks the SoC market, said he considers eFPGAs to be “a very profound development in the industry, a capability that is going to get used in lots of places that we haven’t even imagined yet.”

While Altera (now owned by Intel) and Xilinx have not ventured publicly into the embedded space, Wawrzyniak noted that a lively group of competitors is moving to offer eFPGA intellectual property (IP) cores.

Multiple competitors enter eFPGA field

Achronix Semiconductor (Santa Clara, Calif.) has branched out from its early base in stand-alone FPGAs, built on Intel’s 22nm process, to an IP model. It is emphasizing its embeddable Speedcore eFPGAs, which can be added to SoCs using TSMC’s 16FF foundry process; 7nm IP cores are under development.

Efinix Inc. (Santa Clara, Calif.) recently rolled out its Efinix Programmable Accelerator (EPA) technology.

Efinix (efinixinc.com) claims that its programmable arrays can compete with established stand-alone FPGAs on performance at half the power, or can be added as IP cores to SoCs. The Efinix Programmable Accelerator technology can provide a lookup table (LUT)-based logic cell or a routing switch, among other functions, the company said.

Efinix was founded by several managers with engineering experience at Altera Corp. at various times in their careers — Sammy Cheung, Tony Ngai, Jay Schleicher, and Kar Keng Chua — and has financial backing from two Malaysia-based investment funds.

Flex Logix Technologies (Mountain View, Calif.; www.flex-logix.com), an eFPGA startup founded in 2014, recently gained formal admittance to TSMC’s IP Alliance program. It supports a wide array of foundry processes, providing embedded FPGA IP and software tools for TSMC’s 16FFC/FF+, 28HPM/HPC, and 40ULP/LP.

Flex Logix supports several process generations at foundry TSMC. The 16nm test chip is being evaluated. (Source: Flex Logix)

Menta (http://www.menta-efpga.com/) is another competitor in the eFPGA space. Based in Montpellier, France, Menta is a privately held company founded a decade ago that offers programmable logic IP targeted to both GLOBALFOUNDRIES (14LPP) and TSMC (28HPM and 28HPC+) processes.

Menta offers either pre-configured IP blocks or custom IP for SoCs or ASICs. The French company supports its IP with a tool set, called Origami, which generates a bitstream from RTL, including synthesis. Menta said it has fielded four generations of products that are in use by customers now “for meeting the sometimes conflicting requirements of changing standards, security updates and shrinking time-to-market windows of mobile and consumer products, IoT devices, networking and automotive ICs.”

QuickLogic adds SMIC to foundry roster

QuickLogic, a Silicon Valley stalwart founded in 1988, also is expanding its eFPGA capability. In mid-September, QuickLogic (Sunnyvale, Calif.; quicklogic.com) announced that its eFPGA IP can now be used with the 40nm low-leakage process at Shanghai-based Semiconductor Manufacturing International Corp. (SMIC). QuickLogic also offers its eFPGA technology on several of the mature GLOBALFOUNDRIES processes, and is participating in the foundry’s 22FDX IP program.

Wawrzyniak, who tracks the SoC market for Semico Research, said an important market is artificial intelligence, using eFPGA gates to add a flexible convolutional neural network (CNN) capability. Indeed, Flex Logix said one of its earliest adopters is an AI research group at Harvard University that is developing a programmable AI processor.

A seminal capability

The U.S. government’s Defense Advanced Research Projects Agency (DARPA) also has supported Flex Logix by taking a license, endorsing an eFPGA capability for defense and aerospace ICs used by the U.S. military.

With security being such a concern for the Internet of Things edge devices market, Wawrzyniak said eFPGA gates could be used to secure IoT devices against hackers, a potentially large market.

“The major use is in apps and instances where people need some programmability. This is a seminal, basic capability. How many times have you heard someone say, ‘I wish I could put a little bit of programmability into my SoC.’ People are going to take this and run with it in ways we can’t imagine,” he said.

Bob Wheeler, networking analyst at The Linley Group, said the intellectual property (IP) model makes sense for startups. Achronix, during the dozen years it developed and then fielded its standalone FPGAs, “was on a very ambitious road, competing with Altera and Xilinx. Achronix went down the road of developing parts, and that is a tall order.”

While the cost of running an IP company is less than fielding stand-alone parts, Wheeler said “People don’t appreciate the cost of developing the software tools, to program the FPGA and configure the IP.” The compiler, in particular, is a key challenge facing any FPGA vendor.

Wheeler said Achronix (https://www.achronix.com/) has gained credibility for its tools, including its compiler, after fielding its high-performance discrete FPGAs, made on Intel’s 22nm process, in 2016.

Achronix offers Speedcore eFPGAs, based on the same architecture as its standalone FPGAs. (Source: Achronix Semiconductor)

And Wheeler cautioned that IP companies face the business challenge of getting a fair return on their development efforts, especially for low-cost IoT solutions where companies maintain tight budgets for the IP that they license.

Achronix announced earlier this year that its 2017 revenues will exceed $100 million, based on a seven-fold increase in sales of its Speedster 22i FPGA family as well as licensing of its Speedcore embedded IP products, which target TSMC’s leading-edge 16nm node, with 7nm process technology planned for design starts beginning in the second half of this year. Achronix revenues “began to significantly ramp in 2016 and the company reached profitability in Q1 2017,” said CEO Robert Blake.

Escalating mask costs

Flex Logix CEO Geoff Tate

Geoff Tate, now the CEO of Flex Logix Technologies, earlier headed up Rambus for 15 years. Tate said Flex Logix (www.flex-logix.com) uses a hierarchical interconnect, developed by co-founder Cheng Wang and others while he earned his doctorate at UCLA. The innovative interconnect approach garnered the Lewis Outstanding Paper award for Wang and three co-authors at the 2014 International Solid-State Circuits Conference (ISSCC), and attracted attention from venture capitalists at Lux Ventures and Eclipse Ventures.

Tate said one of those VCs came to him one day and asked for an evaluation of Wang & Co.’s technology. Tate met with Wang, a native of Shanghai, and found him to be anything but a prima donna with a great idea. “He seemed very motivated, not just an R&D guy.”

While most FPGAs use a mesh interconnect in an X-Y grid of wires, Wang had come up with a hierarchical interconnect that provided high density without sacrificing performance, and proved its potential with prototype chips at UCLA.

“Chips need to be more flexible and adaptable. FPGAs give you another level of programmability,” Tate noted.

Meanwhile, potential customers in networking, data centers, and other markets were looking for ways to make their designs more flexible. An embedded FPGA block could help customers adapt a design to new wireless and networking protocols. Since mask costs were escalating, to an estimated $5 million for 16nm designs and more than double that for 7nm SoCs, customers had another reason to risk working with a startup.

TSMC has supported Flex Logix, in mid-September awarding the company the TSMC Open Innovation Platform’s Partner of the Year Award for 2017 in the category of New IP.

“Our lead customer has a working chip, with embedded FPGA on it. They are in the process of debugging the rest of their chip. Overall, we are still in the early stages of market development,” Tate said, explaining that semiconductor companies are understandably risk-averse when it comes to their IP choices.

Asked about the status of its 16nm test chip, Tate said “the silicon is out of the fab. The next step is packaging, then evaluation board assembly.  We should be doing validation testing starting in late September.”

Potential customers are in the process of sending engineers to Flex Logix to look at metrics of the largest 16nm arrays, such as IR drop, test vectors, switching simulations, and the like. “They are making sure we are testing in a thorough fashion. If we screw them over, they’ll tell everybody, so we have got to get it right the first time,” Tate said.

Deep Learning Could Boost Yields, Increase Revenues

Thursday, March 23rd, 2017


By Dave Lammers, Contributing Editor

While it is still early days for deep-learning techniques, the semiconductor industry may benefit from the advances in neural networks, according to analysts and industry executives.

First, the design and manufacturing of advanced ICs can become more efficient by deploying neural networks trained to analyze data, though labelling and classifying that data remains a major challenge. Also, demand will be spurred by the inference engines used in smartphones, autos, drones, robots and other systems, while the processors needed to train neural networks will re-energize demand for high-performance systems.

Abel Brown, senior systems architect at Nvidia, said until the 2010-2012 time frame, neural networks “didn’t have enough data.” Then, a “big bang” occurred when computing power multiplied and very large labelled data sets grew at Amazon, Google, and elsewhere. The trifecta was complete with advances in neural network techniques for image, video, and real-time voice recognition, among others.

During the training process, Brown noted, neural networks “figure out the important parts of the data” and then “converge to a set of significant features and parameters.”

Chris Rowen, who recently started Cognite Ventures to advise deep-learning startups, said he is “becoming aware of a lot more interest from the EDA industry” in deep learning techniques, adding that “problems in manufacturing also are very suitable” to the approach.

Chris Rowen, Cognite Ventures

For the semiconductor industry, Rowen said, deep-learning techniques are akin to “a shiny new hammer” that companies are still trying to figure out how to put to good use. But since yield questions are so important, and the causes of defects are often so hard to pinpoint, deep learning is an attractive approach to semiconductor companies.

“When you have masses of data, and you know what the outcome is but have no clear idea of what the causality is, (deep learning) can bring a complex model of causality that is very hard to do with manual methods,” said Rowen, an IEEE fellow who earlier was the CEO of Tensilica Inc.

The magic of deep learning, Rowen said, is that the learning process is highly automated and “doesn’t require a fab expert to look at the particular defect patterns.”

“It really is a rather brute force, naïve method. You don’t really know what the constituent patterns are that lead to these particular failures. But if you have enough examples that relate inputs to outputs, to defects or to failures, then you can use deep learning.”
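As a rough, self-contained illustration of the input-to-output learning Rowen describes, the sketch below trains a tiny convolutional network to classify synthetic “wafer map” images as pass or fail. Everything here is a placeholder: the random tensors stand in for real labeled inspection or yield data, the network is deliberately small, and PyTorch is assumed only as a convenient framework, not as anything tied to a particular fab’s flow.

import torch
import torch.nn as nn

# Synthetic stand-in data: 256 "wafer maps" of 32x32 pixels, labeled pass (0) or fail (1).
# In practice these would be real labeled inspection, bin, or yield maps.
maps = torch.rand(256, 1, 32, 32)
labels = torch.randint(0, 2, (256,))

# A deliberately small CNN: two conv layers followed by a linear classifier.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 8 * 8, 2),
)

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Brute-force training loop: the network learns whatever input patterns happen to
# correlate with the failure label, without any hand-built model of the defects.
for epoch in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(maps), labels)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")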

Juan Rey, senior director of engineering at Mentor Graphics, said Mentor engineers have started investigating deep-learning techniques which could improve models of the lithography process steps, a complex issue that Rey said “is an area where deep neural networks and machine learning seem to be able to help.”

Juan Rey, Mentor Graphics

In the lithography process “we need to create an approximate model of what needs to be analyzed. For example, for photolithography specifically, there is the transition between dark and clear areas, where the slope of intensity for that transition zone plays a very clear role in the physics of the problem being solved. The problem tends to be that the design, the exact formulation, cannot be used in every space, and we are limited by the computational resources. We need to rely on a few discrete measurements, perhaps a few tens of thousands, maybe more, but it still is a discrete data set, and we don’t know if that is enough to cover all the cases when we model the full chip,” he said.

“Where we see an opportunity for deep learning is to try to do an interpretation for that problem, given that an exhaustive analysis is impossible. Using these new types of algorithms, we may be able to move from a problem that is continuous to a problem with a discrete data set.”
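A toy version of the interpolation problem Rey describes might look like the sketch below: a small neural-network regressor is fit to a sparse set of sampled intensity-slope values and then queried at positions that were never measured. The “measurement” function, sample count, and network size are invented for illustration and merely stand in for whatever discrete data set a real lithography model would provide; scikit-learn is assumed only for convenience.

import numpy as np
from sklearn.neural_network import MLPRegressor

# Invented stand-in for a continuous lithography response: the intensity slope
# across a dark-to-clear transition, as a function of position (arbitrary units).
def intensity_slope(x):
    return np.exp(-x ** 2) * (1.0 + 0.1 * np.sin(8 * x))

# A sparse, discrete set of "measurements" -- the situation described above.
rng = np.random.default_rng(0)
x_train = rng.uniform(-2.0, 2.0, size=200).reshape(-1, 1)
y_train = intensity_slope(x_train).ravel()

# Fit a small MLP to interpolate between the discrete samples.
model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000, random_state=0)
model.fit(x_train, y_train)

# Query the learned model at points that were never measured.
x_test = np.linspace(-2.0, 2.0, 9).reshape(-1, 1)
for xv, pred in zip(x_test.ravel(), model.predict(x_test)):
    print(f"x={xv:+.2f}  predicted slope={pred:.3f}  true={intensity_slope(xv):.3f}")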

Mentor seeks to cooperate with academia and with research consortia such as IMEC. “We want to find the right research projects to sponsor between our research teams and academic teams. We hope that we can get better results with these new types of algorithms, and in the longer term with the new hardware that is being developed,” Rey said.

Many companies are developing specialized processors to run machine-learning algorithms, including non-Von Neumann, asynchronous architectures, which could offer several orders of magnitude less power consumption. “We are paying a lot of attention to the research, and would like to use some of these chips to solve some of the problems that the industry has, problems that are not very well served right now,” Rey said.

While power savings can still be gained with synchronous architectures, Rey said brain-inspired projects such as Qualcomm’s Zeroth processor, or the use of memristors being developed at HP Labs, may be able to deliver significant power savings. “These are all worth paying attention to. It is my feeling that different architectures may be needed to deal with unstructured data. Otherwise, total power consumption is going through the roof. For unstructured data, these types of problems can be dealt with much better with neuromorphic computers.”

The use of deep-learning techniques is moving beyond the biggest players, such as Google, Amazon, and the like. Just as various system integrators package the open-source modules of the Hadoop database technology into a more secure offering, several system integrators are offering workstations packaged with the appropriate deep-learning tools.

Deep learning has evolved to play a role in speech recognition used in Amazon’s Echo. Source: Amazon

Robert Stober, director of systems engineering at Bright Computing, bundles AI software and tools with hardware based on Nvidia or Intel processors. “Our mission statement is to deploy deep learning packages, infrastructure, and clusters, so there is no more digging around for weeks and weeks by your expensive data scientists,” Stober said.

Deep learning is driving the need for new types of processors as well as high-speed interconnects. Tim Miller, senior vice president at One Stop Systems, said that training the neural networks used in deep learning is an ideal task for GPUs because they can perform parallel calculations, sharply reducing the training time. However, GPUs often are large and require cooling, which most systems are not equipped to handle.

David Kanter, principal consultant at Real World Technologies, said “as I look at what’s driving the industry, it’s about convolutional neural networks, and using general-purpose hardware to do this is not the most efficient thing.”

However, research efforts focused on using new materials or futuristic architectures may over-complicate the situation for data scientists outside of the research arena. At the International Electron Devices Meeting (IEDM), several research managers discussed using spin-transfer torque magnetic RAM (STT-MRAM) technology, or resistive RAMs (ReRAM), to create dense, power-efficient networks of artificial neurons.

While those efforts are worthwhile from a research standpoint, Kanter said “when proving a new technology, you want to minimize the situation, and if you change the software architecture of neural networks, that is asking a lot of programmers, to adopt a different programming method.”

While Nvidia, Intel, and others battle it out at the high end for the processors used in training the neural network, the inference engines which use the results of that training must be less expensive and consume far less power.

“Today, most inference processing is done on general-purpose CPUs. It does not require a GPU. Most people I know at Google do not use a GPU. Since the (inference processing) workload looks like the processing of DSP algorithms, it can be done with special-purpose cores from Tensilica (now part of Cadence) or ARC (now part of Synopsys). That is way better than any GPU,” Kanter said.

Rowen was asked if the end-node inference engine will blossom into large volumes. “I would emphatically say, yes, powerful inference engines will be widely deployed” in markets such as imaging, voice processing, language recognition, and modeling.

“There will be some opportunity for stand-alone inference engines, but most IEs will be part of a larger system. Inference doesn’t necessarily need hundreds of square millimeters of silicon. But it will be a major sub-system, widely deployed in a range of SoC platforms,” Rowen said.

Kanter noted that Nvidia has a powerful inference engine processor that has gained traction in early self-driving cars, and that Google has developed an ASIC, the Tensor Processing Unit, to run its TensorFlow deep learning framework.

In many other markets, what is needed are inference engines with very low power consumption that can be used in security cameras, voice processors, drones, and many other applications. Nvidia CEO Jen-Hsun Huang, in a blog post early this year, said that deep learning will spur demand for billions of devices deployed in drones, portable instruments, intelligent cameras, and autonomous vehicles.

“Someday, billions of intelligent devices will take advantage of deep learning to perform seemingly intelligent tasks,” Huang wrote. He envisions a future in which drones will autonomously find an item in a warehouse, for example, while portable medical instruments will use artificial intelligence to diagnose blood samples on-site.

In the long run, that “billions” vision may be correct, Kanter said, adding that the Nvidia CEO, an adept promoter as well as an astute company leader, may be wearing his salesman hat a bit.

“Ten years from now, inference processing will be widespread, and many SoCs will have an inference accelerator on board,” Kanter said.

Mentor Graphics Veloce Emulation Platform Used by Starblaze for Verification of SSD Enterprise Storage Design

Wednesday, September 21st, 2016

Mentor Graphics Corporation (NASDAQ: MENT) today announced that the Veloce® emulation platform was successfully used by Starblaze Technology for a specialized high-speed, enterprise-based Solid State Drive (SSD) storage design.

Starblaze performed a detailed and lengthy analysis of the available solutions in the emulation market.  The Veloce emulation platform was selected and deployed because of its superior virtualization technology and memory protocol support, rich software debug capabilities and proven track record delivering innovative emulation technology.

“The enterprise SSD market is evolving rapidly, so the SoC (System on a Chip) verification technology we use has to be perfectly aligned with our needs, especially in terms of flexibility and high-performance protocol support,” said Sky Shen, CEO of Starblaze Technology. “After using the Veloce emulation platform on our latest high-performance, enterprise SSD controller project, we are convinced that a virtual solution with extensive software debug capability is the trend for the future of emulation technology.”

In the SSD storage space, it is extremely important for design teams to study the architecture and tune the performance while finding deep hardware bugs in the pre-silicon stage. Starblaze used VirtuaLAB PCIe to provide the host connection to their design on the Veloce emulation platform. The VirtuaLAB PCIe delivers very high debug productivity, and Starblaze was able to use its Software Design Kit “as is” without any modification or adaptation. In addition to using Veloce VirtuaLAB, Starblaze used Mentor’s Codelink® software debug capability to support the requirements of their embedded core software debug. On the flash interface side, the Veloce platform provides both HW and SW sparse memory solutions, which permit the necessary tradeoffs in the storage application.

ICE and Virtual:  Complementary Technologies

With the Veloce Emulation platform, verification teams have access to the best of both worlds, whether using an ICE-based or virtual emulation environment.  In-circuit emulation (ICE), a foundational emulation use model, remains a ‘must have’ for SoC designs that need to connect to real devices or custom hosts where physical hardware is required. The Veloce iSolve™ library offers a full complement of hardware components to build a robust ICE-based flow.

As more verification teams move from an ICE-based flow to a virtual flow, the Veloce emulation platform provides a smooth transition.  The Veloce Deterministic ICE App complements ICE by eliminating the non-deterministic nature of ICE and enabling advanced verification techniques: debug, power analysis, coverage closure, and software debug.

Full virtualization is achieved with the Veloce VirtuaLAB environment, which delivers virtual ICE-equivalent, high-speed host protocols and memory devices, allowing for greater flexibility for hardware/software system-level debug, power analysis, and system performance analysis.

“The Veloce emulation platform continues to deliver a comprehensive and robust emulation platform to a broad set of markets that all have unique challenges,” said Eric Selosse, vice president and general manager of the Mentor Emulation Division. “With Starblaze’s expertise in Flash Controller and SoC design, they quickly recognized the benefits of our VirtuaLAB solution.  Our success in working with them is attributed to our in-depth knowledge of the power of a virtual solution, and our timely support in deploying the Veloce emulation platform to meet their specific needs.”

About the Veloce Emulation platform

The Veloce emulation platform uses innovative software, running on powerful, qualified hardware and an extensible operating system, to target design risks faster than hardware-centric strategies. Now considered among the most versatile and powerful of verification tools, emulation greatly expands the ability of project teams to do hardware debugging, hardware/software co-verification or integration, system-level prototyping, low-power verification and power estimation and performance characterization.

The Veloce emulation platform is a core technology in the Mentor® Enterprise Verification Platform™ (EVP) – a platform that boosts productivity in ASIC and SoC functional verification by combining advanced verification technologies in a comprehensive platform. The Mentor EVP combines Questa® advanced verification solutions, the Veloce emulation platform, and the Visualizer™ debug environment into a globally accessible, high-performance datacenter resource. The Mentor EVP features global resource management that supports project teams around the world, maximizing both user productivity and total verification return on investment.


About Mentor Graphics

Mentor Graphics Corporation is a world leader in electronic hardware and software design solutions, providing products, consulting services and award-winning support for the world’s most successful electronic, semiconductor and systems companies. Established in 1981, the company reported revenues in the last fiscal year of approximately $1.18 billion. Corporate headquarters are located at 8005 S.W. Boeckman Road, Wilsonville, Oregon 97070-7777. http://www.mentor.com.

Mentor Graphics Tackles SoC Design Challenges

Tuesday, October 6th, 2015

By Jeff Dorsch, Contributing Editor

System-on-a-chip designs are complex endeavors, and they are growing more complicated by the day. Mentor Graphics is cognizant of the many challenges in SoC design and is working to ease the troubles of chip designers.

The electronic design automation software and services company was established in 1981, making it one of the oldest existing EDA suppliers in the industry. Walden C. (Wally) Rhines, Mentor’s chairman and chief executive officer, is being recognized next month with the annual Phil Kaufman Award, given by the EDA Consortium and the IEEE Council on EDA, for distinguished contributions to EDA.

Shankar Krishnamoorthy, Chief Scientist for Mentor’s IC Implementation Division, says there are three key elements to SoC design at present – register-transfer level design, implementation, and sign-off.

In RTL, “the biggest challenge is designs are getting so large and complex,” Krishnamoorthy says. Designers must take into account the performance, area, and power consumption of the full chip, he adds.

“SoCs are getting very large and complex – there are many decisions that are made at RTL stage that impact performance, power and area.”

Sudhakar Jilla, director of marketing for the Olympus-SoC and RealTime Designer products within the IC Implementation Division, says that today designers have to take a bottom-up approach: synthesizing each RTL module separately, implementing the physical partitions, and then stitching it all up at the chip level to get an early estimate of timing, power, and area. This process takes far too long, he notes.

Sudhakar Jilla

“Wait four months – it’s too late,” Jilla says.

Semiconductor intellectual property is “developed in relative isolation, in multiple places,” Krishnamoorthy observes. “The quality of IPs needs to be changed significantly.”

In the IP qualification process, designers must consider physical implementation in addition to functional verification, according to Krishnamoorthy. “RTL designers must be able to identify congestion hotspots and design feasibility during RTL design. They should also consider the chip-level context of the IP and explore the PPA tradeoffs early in the design cycle.”

There is a movement afoot to get away from channel-based floorplans, opting for channel-less or channel-light floorplans, Krishnamoorthy adds.

In sign-off, there is “a very interesting phenomenon,” Krishnamoorthy says. “The number of corners is spiking up a lot.”

Jilla says SoC designs have gone from a typical six to eight corners to up to 20 corners. “The old problems have gotten worse,” he adds.

Krishnamoorthy observes, “Just because you finish your place-and-route, you’re not done.”

With coloring and other considerations in designing advanced SoCs, “the last mile has become longer,” Jilla says. “Place-and-route was two weeks. It’s now four to six weeks.”

Mentor Graphics is keeping up with “the leading edge of the lithography world,” Krishnamoorthy says. “All the leading foundries use Mentor tools. It gives us an idea of what the foundries are doing with lithography.”

Modern designs can call for two or three colors, and there is double-patterning and triple-patterning involved in immersion lithography, according to Krishnamoorthy. Working with the top foundries “really helps us get an early understanding of these challenges,” he says. “Ten nanometer is already here.”

And chips with 7-nanometer features are on the horizon for EDA companies, silicon foundries, and others. Mentor is working with leading foundries on 7nm IC design and manufacturing, according to Jilla. “Support for 7nm is underway,” he says.

Breaking Down Power Management Verification

Monday, June 22nd, 2015


By Vijay Chobisa, Mentor Graphics Corporation

For system-level power management verification, it is important to understand how software applications running on the targeted SoC affect power use. During system-level verification, it is imperative to verify that the software power control applications properly initialize power states and power domains, that signals are stable during the transition from one application or test to another, and that level shifters and isolation cells are inserted correctly.

Companies that design complex SoCs implement several power domains in their designs to meet power budgets while maintaining the required operating performance. Several low power management techniques are employed including isolation cells, level shifters, state retention cells, power aware memories, and power control logic. As some memories are power aware, memory behavior must also be validated at the system level. The power control logic resides in hardware and the actual controls come from software, making verification too complex and lengthy for traditional digital simulators.

An advanced emulation platform supports complete power management verification at the system level, where software and hardware operate together and real-world stimulus is applied to the design under test. The speed of emulation allows designers and verification teams to boot the operating system and quickly stress test the design over an extremely large number of power sequences.

Power management verification flows

A power management structure allows designers to divide designs into several power domains. Each domain can be operated with a unique voltage level and can be powered on and off without interfering with the functionality of other domains. This requires isolation between power ON domains and power OFF domains. When an OFF domain needs to wake up, it requires some basic information to return to ON correctly.  Designers use retention techniques to preserve this information while the domain is switched off. The more information retained, the more real estate consumed; but the domain wakes up faster. Designers must be aware that states in the power OFF domain that lack a retention infrastructure can go to unknown values. Level shifters are used to operate domains at different voltages.
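The retention behavior described above can be captured in a small, purely conceptual model. The sketch below is illustrative Python, not UPF syntax or any vendor tool’s API, and the register names are invented: registers backed by retention cells keep their values across a power-off/power-on cycle, while the rest wake up with unknown (“X”) values.

class PowerDomain:
    """Toy model of a switchable power domain with optional retention registers."""

    UNKNOWN = "X"  # stand-in for an unknown logic value after power-up

    def __init__(self, all_regs, retained_regs):
        self.all_regs = set(all_regs)
        self.retained_regs = set(retained_regs) & self.all_regs
        self.regs = {r: self.UNKNOWN for r in self.all_regs}  # live values while ON
        self.shadow = {}                                      # retention latch contents
        self.is_on = True

    def write(self, name, value):
        assert self.is_on, "cannot write a powered-off domain"
        self.regs[name] = value

    def power_off(self):
        # Retention cells save their registers; all other state is lost.
        self.shadow = {r: self.regs[r] for r in self.retained_regs}
        self.regs = {}
        self.is_on = False

    def power_on(self):
        # Retained registers restore their saved values; the rest wake up unknown.
        self.regs = {r: self.shadow.get(r, self.UNKNOWN) for r in self.all_regs}
        self.is_on = True


dom = PowerDomain(all_regs={"config", "status"}, retained_regs={"config"})
dom.write("config", 0b1010)
dom.write("status", 0b0001)
dom.power_off()
dom.power_on()
print(dom.regs)  # "config" keeps its value (10); "status" wakes up as "X"

Running the example shows one register restored and the other unknown, which is the area-versus-wakeup tradeoff designers weigh when deciding how much state to retain.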

Memories can switch to three different states: power ON (allows normal memory operations), power OFF (memory operations are disabled and memory contents are corrupted), and standby (memory operations are disabled but memory contents are kept intact).

In this system, the always ON block implements the isolation interfaces and schemes. Features of the UPF standard are used to accomplish this functionality: an always ON supply and an ON/OFF supply. The Veloce operating system (OS3) supports the UPF supply functions — supply_off/supply_on — to natively handle this behavior.

Together, the four stages described below create a scalable, progressive flow that allows users to begin system-level low power verification early in the design and verification flow, using high-level models, adding detail and accuracy as the design matures. The stage used depends on how far along the design and its corresponding UPF description is in overall development. Each stage has specific goals and actions that build toward full verification of the final netlist.

Stage 1: Verifying UPF accuracy and implementation

Semiconductor companies want to start low power verification as early as possible to shorten production schedules. To make this happen, it is critical that the UPF file accurately captures the design intent as the design blocks are coming together at an early stage (Figure 1).

Figure 1. UPF file accurately captures the design intent.

At this stage, the design is in RTL, and the entire power intent is defined using a top-level UPF file. Typically it is carried over from a previous design and must be modified to suit the new one. The RTL is still in an early stage of development. There are no Liberty files and no gate-level components at this stage. The testbench is also in an early stage of development. Appropriately, verification takes place at a high level of abstraction, where speed is more important than accuracy.

First, engineers verify that the UPF file is correct, and then verify that the DUT and UPF file are working together—that the syntax matches. After that has been validated, the emulator reads the design RTL plus the UPF file, generates the power hierarchy along with the design netlist, and maps everything to emulation primitives. This step is used to verify the structure from the top level point of view to make sure that the emulator takes this UPF file and creates the proper power infrastructure in the DUT; including power switches, connectivity, and isolation cells. If anything is not implemented correctly or is missing in the UPF file, it is corrected both in the UPF file and the backend implementation.
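As a purely conceptual illustration of this kind of structural checking (it is not the Veloce flow and not UPF syntax), the sketch below takes a hand-written description of which blocks belong to which power domains and which domain crossings carry isolation cells, then flags unassigned blocks and unprotected crossings out of switchable domains. All of the names and the data format are invented for the example.

# Hypothetical, hand-written power-intent description for a toy design.
blocks = {"cpu", "gpu", "uart", "pmu"}
domains = {
    "PD_AON": {"blocks": {"pmu"},         "switchable": False},
    "PD_CPU": {"blocks": {"cpu"},         "switchable": True},
    "PD_GPU": {"blocks": {"gpu", "uart"}, "switchable": True},
}
# Domain crossings that have isolation cells declared (source -> destination).
isolated_crossings = {("PD_CPU", "PD_AON")}
# Crossings that actually exist in the design netlist.
netlist_crossings = {("PD_CPU", "PD_AON"), ("PD_GPU", "PD_AON")}

def check_power_structure(blocks, domains, isolated, crossings):
    errors = []
    # Every block must belong to exactly one power domain.
    assigned = [b for d in domains.values() for b in d["blocks"]]
    for b in blocks - set(assigned):
        errors.append(f"block '{b}' is not assigned to any power domain")
    for b in set(x for x in assigned if assigned.count(x) > 1):
        errors.append(f"block '{b}' is assigned to more than one power domain")
    # A crossing that leaves a switchable domain needs an isolation cell.
    for src, dst in crossings:
        if domains[src]["switchable"] and (src, dst) not in isolated:
            errors.append(f"crossing {src} -> {dst} has no isolation cell")
    return errors

for err in check_power_structure(blocks, domains, isolated_crossings, netlist_crossings):
    print("ERROR:", err)
# Expected: one error, for the unprotected PD_GPU -> PD_AON crossing.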

Stage 2: Adding multiple blocks and corresponding block-level UPF files

In this stage, the design has several RTL blocks — each having its own UPF file. As in Stage 1, there are no gate-level netlists or Liberty files at this point.

The chip is verified with the top-level (power intent) UPF file and the UPF files for each block, which are usually supplied by the IP/block developers. Because each block-level UPF file has been implemented, they are more accurate at representing the power control inside each block. The top-level UPF file verified in Stage 1 is used at this stage, so the block-level and top-level UPF files are used together to thoroughly verify the whole design (Figure 2).

Figure 2. The chip is verified with the top-level (power intent) UPF file and the UPF files for each block.

The verification runs are similar to those in Stage 1. The main difference is that the power control is more detailed and the main goal is to make sure that the internal block controls are working correctly in the whole chip environment. Because each power pin control is more complicated in this context, compared to Stage 1, a finer level of resolution is required to control different sequences and cover all the corner cases. It is essential to test the handshaking between these blocks in the system context, because although these blocks and their respective UPF files have been verified in isolation, it is important to verify that they interact correctly with other blocks at the system level.

Designs often have many power sequences coming from different voltage regulators, and these can be powered on and off at any time. Further, there is not a single source that is controlling all of this activity. These are the real-world behaviors that engineers want to mimic in emulation. The emulator generates random power sequences that can randomly power the different blocks on and off, which mimics the random nature of real-world scenarios.
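A minimal sketch of what constrained-random power sequencing can look like is shown below. It is an illustration only, not the emulator’s actual mechanism; the domain names and the single dependency rule (a child domain may only be powered while its parent is on) are invented for the example.

import random

random.seed(1)

# Invented example: PD_GPU depends on PD_CPU being powered.
domains = ["PD_CPU", "PD_GPU", "PD_MODEM"]
depends_on = {"PD_GPU": "PD_CPU"}
state = {d: True for d in domains}  # all domains start powered on

def legal_moves(state):
    """Return (domain, new_value) toggles that do not violate the dependency rule."""
    moves = []
    for d, on in state.items():
        new_value = not on
        parent = depends_on.get(d)
        child_on = any(p == d and state[c] for c, p in depends_on.items())
        if new_value and parent is not None and not state[parent]:
            continue  # cannot power up a child while its parent is off
        if not new_value and child_on:
            continue  # cannot power down a parent while its child is on
        moves.append((d, new_value))
    return moves

# Generate a random but legal power sequence to stress the design.
sequence = []
for _ in range(12):
    d, v = random.choice(legal_moves(state))
    state[d] = v
    sequence.append((d, "ON" if v else "OFF"))

for step, (d, v) in enumerate(sequence):
    print(f"step {step:2d}: {d} -> {v}")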

Stage 3: Mix of RTL and gate-level netlist

In stage 3, some components of the design are fully ready and are available as a gate-level netlist. For RTL blocks, power intent comes from a UPF file; for gate-level blocks, the power infrastructure is part of the netlist, which requires support for Liberty files. These gate-level blocks or IP could be reused from a previous SoC (Figure 3).

Figure 3. Designers test functionality in the hardware/software context.

At this point in the design cycle, the software is maturing and the CPU is used to control power down and power up functionality. So designers need to test this functionality in the hardware/software context.

The emulator must be capable of reading UPF files and Liberty files to enable this mix of RTL and gate-level netlist verification, which is critical at this stage.

As before, the accuracy of the UPF file matching the real chip is the primary goal. Toward that end, the emulator needs to provide power structure visual checking and the ability to report any mismatches. Again, only emulation can provide the required runtime performance to handle these complex operations on a very large chip, especially with gate level components.

Stage 4: Verifying gate-level netlists

In this stage, the power management infrastructure is part of the gate-level netlist; the flow includes the final power hierarchy and reads the power strategy from Liberty files. This enables the final SoC netlist to be verified before chip tape-out, ensuring that the final netlist has accurate low power behavior and avoiding translation issues from one design stage to another. Veloce identifies the power hierarchy and provides a debug flow in the event of incorrect or unexpected behavior (Figure 4).

Figure 4. Final SoC netlist is verified before chip tape out.

Advanced low power debug console

Power-aware bugs can be hard to debug, so a comprehensive GUI for debugging power-aware issues is needed. The Visualizer Debug Environment from Mentor offers a comprehensive power-aware debug environment that makes it possible to debug power-aware issues, connectivity, and sequences in an intuitive way.

Some examples:

  • PA domains
  • PA crossings, capturing missing or incorrect isolation cells and level shifters
  • PA SimChecks
  • Power hierarchy schematic

Conclusion

Companies start power-aware verification at a very early stage in the design flow and add detail and granularity as the design progresses. The successive refinement at each stage allows customers to break down a complex problem into smaller, targeted verification jobs and establish a feedback loop to and from the backend team.

The Veloce Emulation platform from Mentor allows users to approach power management verification at the system level, where both software and hardware operate together with real-world stimulus applied to the design under test. The speed of emulation lets designers and verification teams boot the operating system and run application stress tests on the design through a very large number of power sequences extremely quickly. The Veloce Emulation platform is fully aligned with the Questa® simulator from Mentor, enabling customers to use the same UPF files and UPF constructs for both simulation and emulation.

The emulation team at Mentor has worked with customers to create a system-level power management verification methodology that achieves thorough verification of the interactions between software and hardware and confirms that system resources are powered appropriately in each functional mode. This makes the Veloce Emulation Platform a logical choice for power management verification for companies that see advantages in using standards and avoiding non-standard methodologies. The Veloce Emulation Platform complies with the IEEE 1801 Unified Power Format (UPF 2.0 and 2.1) standards, including comprehensive construct support and debug capabilities.

Veloce Redefines Power Analysis Flow

Tuesday, June 9th, 2015


Mentor Graphics Corp. released the Veloce® Power Application software that enables accurate, timely and efficient power analysis at the system, RTL and gate level for complex SoC designs.

Power continues to be a primary concern for handheld and smart devices with high-resolution screens that require long battery life, and even wall-plugged equipment in a datacenter or in a network configuration needs to reduce operating costs. Using FinFET process technology reduces static leakage, yet dynamic power remains a challenge.

A new usage model for handheld and smart devices is driving a methodology shift in the way power is analyzed.  One primary driver in this shift is the fact that complex SoC designs are now verified using live applications that require booting the OS and running software applications on an emulator. It is more effective to use the power switching activity plot, generated during emulation, to pass real-time switching activity information to power analysis tools where potential power issues can be evaluated.

“The ITRS report, one of my many primary research projects, has emphasized the issues related to dynamic power for several years,” said Gary Smith, founder and chief analyst, Gary Smith EDA. “A new approach to the transfer of power switching activity data captured during emulation is the right direction for the industry.”

When designs with significant software content are run on an emulator, the current method of generating activity data creates files (like FSDB) that are too large for power analysis tools to handle practically.

The Veloce Power Application replaces the file-based power analysis flow with a Dynamic Read Waveform API integration to power analysis tools.  This Dynamic Read Waveform API approach captures the information from the power switching activity plot and transfers that data to power analysis tools. This enables accurate power calculation at the system level, better power exploration at RTL for power budgeting and tradeoffs as well as more accurate power analysis and sign-off at the gate level.

The result is a significant boost in runtime and performance. The typical approach of running the emulator, creating the file, reading the file into the power analysis tool and running the power analysis tool is now, with this new approach, reduced to the emulator and power analysis runtimes.
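The intuition behind replacing a file hand-off with a streaming interface can be shown with a deliberately simple sketch. This is not the Dynamic Read Waveform API, just a Python analogy with an invented record format: the same toy “power estimate” is computed once by writing activity records to a file and reading them back, and once by streaming the records directly to the consumer.

import time

# Toy stand-in for per-cycle switching-activity records produced by an emulation run.
def emulate(cycles):
    for cycle in range(cycles):
        yield cycle, (cycle * 37) % 100  # (cycle, toggle_count), invented for illustration

def estimate_power(records):
    # Trivial stand-in "power analysis": weight toggle counts and accumulate.
    return sum(toggles * 0.001 for _, toggles in records)

CYCLES = 1_000_000

# File-based flow: dump everything to disk, then read it back for analysis.
start = time.perf_counter()
with open("activity.txt", "w") as f:
    for cycle, toggles in emulate(CYCLES):
        f.write(f"{cycle} {toggles}\n")
with open("activity.txt") as f:
    records = ((int(c), int(t)) for c, t in (line.split() for line in f))
    file_based = estimate_power(records)
file_time = time.perf_counter() - start

# Streamed flow: hand records to the analysis step as they are produced.
start = time.perf_counter()
streamed = estimate_power(emulate(CYCLES))
stream_time = time.perf_counter() - start

print(f"file-based: {file_based:.1f} (arbitrary units) in {file_time:.2f}s")
print(f"streamed:   {streamed:.1f} (arbitrary units) in {stream_time:.2f}s")

Both flows produce the same number; the difference is the time and storage spent serializing and re-parsing the intermediate file, which is the overhead the streaming approach removes.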

Current early access partners and customers have seen up to a 4.5X runtime performance improvement.

“Today we have redefined the power analysis flow,” said Eric Selosse, vice president and general manager of the Mentor Emulation Division. “The Veloce Power Application is a proof point to show that a new methodology that captures real power consumption during emulation and effectively passes that information to power analysis tools is more efficient.”

Delivering this integration with an ecosystem of industry-recognized power analysis tool providers is essential to redefining the power analysis flow. The first Veloce Power Application ecosystem partner is ANSYS® with PowerArtist™.

“This collaboration addresses the challenges for designers of energy-efficient IP and SoC designs in various IoT verticals,” said Vic Kulkarni, Sr. vice president and general manager, RTL power business, at Apache division of ANSYS. “With our industry leading PowerArtist solution, we are delighted to be the premier partner in the Veloce Power Application ecosystem, and to work so closely with a technology leader in hardware emulation.”

The Veloce Power Application integration with ANSYS PowerArtist is available to mutual customers on a limited basis.  Full production release is scheduled for early Q4/CY 2015.

For a technical whitepaper visit: http://www.mentor.com/products/fv/techpubs/download/?id=90538

MicroWatt Chips shown at ISSCC

Thursday, March 5th, 2015


By Ed Korczynski, Sr. Technical Editor

With much of future demand for silicon ICs forecasted to be for mobile devices that must conserve battery power, it was natural for much of the focus at the just concluded 2015 International Solid State Circuits Conference (ISSCC) in San Francisco to be on ultra-low-power circuits that run on mere microWatts (µW). From analog to digital logic to radio-frequency (RF) chips and extending to complete system-on-chip (SoC) prototypes, silicon IC functionality is being designed with evolutionary and even revolutionary reductions in the operational power needed.

The figure shows a multi-standard 2.4 GHz radio that was co-developed by imec, Holst Centre, and Renesas using a 40nm node CMOS process. This was detailed in session 13.2 when Y.H. Liu presented “A 3.7mW-RX 4.4mW-TX Fully Integrated Bluetooth Low-Energy/IEEE802.15.4/Proprietary SoC with an ADPLL-Based Fast Frequency Offset Compensation in 40nm CMOS.” It uses a digital-intensive RF architecture tightly integrated with the digital baseband (DBB) and a microcontroller (MCU), and the digital-intensive RF design reduces the analog core area to 1.3mm², and the DBB/MCU/SRAM occupies an area of 1.1mm². This is an evolution of a previous 90nm RF front-end design that results in a reduced supply voltage (20 percent), power consumption (25 percent), and chip area (35 percent).

Ultra-low-power multi-standard 2.4 GHz radio compliant with Bluetooth Low Energy and ZigBee, co-developed by imec, Holst Centre, and Renesas. (Source: Renesas)

“From healthcare to smart buildings, ubiquitous wireless sensors connected through cellular devices are becoming widely used in everyday life,” said Harmke De Groot, Department Director at imec. “The radio consumes the majority of the power of the total system and is one of the most critical components to enable these emerging applications. Moreover, a low-cost area-efficient radio design is an important catalyst for developing small sensor applications, seamlessly integrated into the environment. Implementing an ultra-low power radio will increase the autonomy of the sensor device, increase its quality, functionality and performance and enable the reduction of the battery size, resulting in a smaller device, which in case of wearable systems, adds to user’s comfort.”

When most ICs were used in devices and systems that were powered by line current there was no advantage to minimizing power consumption, and so digital CMOS circuits could be designed with billions of transistors switching billions of times each second resulting in sufficient brute-force power to solve most problems. With power-consumption now a vital aspect of much of the demand for future chips, this year’s ISSCC offered the following tutorials on low-power chips:

  • “Ultra Low Power Wireless Systems” by Alison Burdett of Toumaz Group (UK),
  • “Low Power Near-threshold Design” by Dennis Sylvester of University of Michigan, and
  • “Analog Techniques for Low-Power Circuits” by Vadim Ivanov of Texas Instruments.

Then on Thursday the 26th, an entire short course was offered on “Circuit Design in Advanced CMOS Technologies: How to Design with Lower Supply Voltages,” with lectures on the following:

  • “A Roadmap to Lower Supply Voltages – A System Perspective” by Jan M. Rabaey of UC Berkeley,
  • “Designing Ultra-Low-Voltage Analog and Mixed-Signal Circuits” by Peter Kinget of Columbia University,
  • “ADC Design in Scaled Technologies” by Andrea Baschirotto of University of Milan-Bicocca, and
  • “Ultra-Low-Voltage RF Circuits and Transceivers” by Hyunchoi Shin of Kwangwoon University.

µW SoC Blocks

Session 5.10 covered “A 4.7MHz 53µW Fully Differential CMOS Reference Clock Oscillator with -22dB Worst-Case PSNR for Miniaturized SoCs” by J. Lee et al. of the Institute of Microelectronics (Singapore), along with researchers from KAIST and Daegu Gyeongbuk Institute of Science and Technology in Korea. While many SoCs for the IoT are intended for machine-to-machine networks, human interaction will still be needed for many applications, so session 6.7 covered “A 2.3mW 11cm-Range Bootstrapped and Correlated-Double-Sampling (BCDS) 3D Touch Sensor for Mobile Devices” by L. Du et al. of UCLA (California).

As indicated by the low MHz speed of the clock circuit referenced above, the only way these ICs can consume 1/1000th of the power of mainstream chips is to operate at 1/1000th of the speed. Also note that most of these chips will be made using 90nm- and 65nm-node fab processes, instead of today’s leading 22nm- and 14nm-node processes, as evidenced by session 8.3, which covered “A 10.6µA/MHz at 16MHz Single-Cycle Non-Volatile Memory-Access Microcontroller with Full State Retention at 108nA in a 90nm Process” by V.K. Singhal et al. from the Kilby Labs of Texas Instruments (Bangalore, India). Session 18.3 covered “A 0.5V 54µW Ultra-Low-Power Recognition Processor with 93.5% Accuracy Geometric Vocabulary Tree and 47.5 Database Compression” by Y. Kim et al. of KAIST (Daejeon, Korea).

In the Low Power Digital sessions it was natural that ARM Cortex chips were the basis for two different presentations on ultra-low power functionality, since ARM cores power most of the world’s mobile processors, and since the RISC architecture of ARM was deliberately evolved for mobile applications. Session 8.1 covered “An 80nW Retention 11.7pJ/Cycle Active Subthreshold ARM Cortex-M0+ Subsystem in 65nm CMOS for WSN Applications” by J. Myers et al. of ARM (Cambridge, UK). In the immediately succeeding session 8.2, W. Lim et al. of the University of Michigan (Ann Arbor) presented on the possibilities for “Batteryless Sub-nW Cortex-M0+ Processor with Dynamic Leakage-Suppression Logic.”

nW Beyond Batteries

Session 5.4 covered “A 32nW Bandgap Reference Voltage Operational from 0.5V Supply for Ultra-Low Power Systems” by A. Shrivastava et al. of PsiKick (Charlottesville, VA). PsiKick’s silicon-proven ultra-low-power wireless sensing devices are based on over 10 years of development of Sub-Threshold (Sub-Vt) devices. They are claimed to operate at 1/100th to 1/1000th of the power budget of other low-power IC sensor platforms, allowing them to be powered without a battery from a variety of harvested energy sources. These SoCs include full sensor analog front-ends, programmable processing and memory, integrated power management, programmable hardware accelerators, and full RF (wireless) communication capabilities across multiple frequencies, all of which can be built with standard CMOS processes using standard EDA tools.

Extremely efficient energy harvesting was also shown by S. Stanzione et al. of Holst Centre/ imec/KU Leuven working with OMRON (Kizugawa, Japan) in session 20.8 “A 500nW Battery-less Integrated Electrostatic Energy Harvester Interface Based on a DC-DC Converter with 60V Maximum Input Voltage and Operating From 1μW Available Power, Including MPPT and Cold Start.” Such energy harvesting chips will power ubiquitous “smarts” embedded into the literal fabric of our lives. Smart clothes, smart cars, and smart houses will all augment our lives in the near future.

—E.K.

An EDA view of semiconductor manufacturing

Thursday, July 24th, 2014

By Gabe Moretti, Contributing Editor

The concern that there is a significant break between the tools used by designers targeting leading-edge processes (those at 32nm and smaller, to be precise) and those used to target older processes was dispelled during the recent Design Automation Conference (DAC). In his address as a DAC keynote speaker in June at the Moscone Center in San Francisco, Dr. Antun Domic, Executive Vice President and General Manager, Synopsys Design Group, pointed out that advances in EDA tools in response to the challenges posed by the newer semiconductor process technologies also benefit designs targeting older processes.

Mary Ann White, Product Marketing Director for the Galaxy Implementation Platform at Synopsys, echoed Dr. Domic’s remarks and stated: “There seems to be a misconception that all advanced designs need to be fabricated on leading process geometries such as 28nm and below, including FinFET. We have seen designs with compute-intensive applications, such as processors or graphics processing, move to the most advanced process geometries for performance reasons. These products also tend to be highly digital. With more density, almost double for advanced geometries in many cases, more functionality can also be added. In this age of disposable mobile products where cellphones are quickly replaced with newer versions, this seems necessary to remain competitive.

However, even if designers are targeting larger, established process technologies (planar CMOS), it doesn’t necessarily mean that their designs are any less advanced in terms of application than those that target the advanced nodes. There are plenty of chips inside the mobile handset that are manufactured on established nodes, such as those with noise cancellation, touchscreen, and MEMS (micro-electromechanical systems) functionality. MEMS chips are currently manufactured at the 180nm node, and there are no foreseeable plans to move to smaller process geometries. Other chips at established nodes tend to also have some analog capability, which doesn’t make them any less complex.”

This is very important since the companies that can afford to use leading-edge processes are diminishing in number due to the very high ($100 million and more) non-recurring investment required. And of course the cost of each die is also greater than with previous processes. If the tools could only be used by those customers doing leading-edge designs, revenues would necessarily fall.

Design Complexity

Steve Carlson, Director of Marketing at Cadence, states that “when you think about design complexity there are a few axes that might be used to measure it. Certainly raw gate count or transistor count is one popular measure. From a recent article in Chip Design, a look at complexity on a log scale shows the billion mark has been eclipsed.” Figure 1, courtesy of Cadence, shows the increase of transistors per die through the last 22 years.

Fig 1

Steve continued: “Another way to look at complexity is looking at the number of functional IP units being integrated together.  The graph in figure 2, provided by Cadence, shows the steep curve of IP integration that SoCs have been following.  This is another indication of the complexity of the design, rather than of the complexity of designing for a particular node.  At the heart of the process complexity question are metrics such as number of parasitic elements needed to adequately model a like structure in one process versus another.”  It is important to notice that the percentage of IP blocks provided by third parties is getting close to 50%.

Fig 2

Steve concludes with: “Yet another way to look at complexity is through the lens of the design rules and the design rule decks.  The graphs below show the upward trajectory for these measures in a very significant way.” Figure 3, also courtesy of Cadence, shows the increased complexity of the Design Rules provided by each foundry.  This trend makes second sourcing a design impossible, since having a second source foundry would be similar to having a different design.

Fig 3

Another problem designers have to deal with is the increasing complexity due to the decreasing features sizes.  Anand Iyer, Calypto Director of Product Marketing, observed that: “Complexity of design is increasing across many categories such as Variability, Design for Manufacturability (DFM) and Design for Power (DFP). Advanced geometries are prone to variation due to double patterning technology. Some foundries are worst casing the variation, which can lead to reduced design performance. DFM complexity is causing design performance to be evaluated across multiple corners much more than they were used to. There are also additional design rules that the foundry wants to impose due to DFM issues. Finally, DFP is a major factor for adding design complexity because power, especially dynamic power is a major issue in these process nodes. Voltage cannot scale due to the noise margin and process variation considerations and the capacitance is relatively unchanged or increasing.”

Impact on Back End Tools.

I have been wondering if the increasing dependency on transistor geometries and the parasitic effects peculiar to each foundry would eventually mean that a foundry-specific place-and-route tool would be better than adapting a generic tool to a design rules file that is becoming very complex. In my mind, complexity means a greater probability of errors due to ambiguity among a large set of rules. Thus, building rules-specific place-and-route tools would directly lower the number of DR checks required.

Mary Ann White of Synopsys answered: “We do not believe so.  Double and multiple patterning are definitely newer techniques introduced to mitigate the lithographic effects required to handle the small multi-gate transistors. However, in the end, even if the FinFET process differs, it doesn’t mean that the tool has to be different.  The use of multi patterning, coloring and decomposition is the same process even if the design rules between foundries may differ.”

Steve Carlson of Cadence offers a more nuanced view: “There have been subtle differences between requirements at new process nodes for many generations. Customers do not want to have different tool strategies for a second source of foundry, so the implementation tools have to provide the union of capabilities needed to enable each node (or be excluded from consideration). In more recent generations of process nodes there has been a growing divergence of the requirements to support like-named nodes. This has led to added cost for EDA providers. It is doubtful that different tools will be spawned for different foundries. How the (overlapping) sets of capabilities get priced and packaged by the EDA vendors will be a business model decision. The use model users want is singular across all foundry options. How far things diverge and what the new requirements are at 7nm and 5nm may dictate a change in strategy. Time will tell.”

This is clear for now. But given the difficulty of second sourcing, I expect that a design company will choose one foundry and use it exclusively. Changing foundry will almost always be a business decision based on financial considerations.

New processes also change the requirements for TCAD tools. At the just-finished DAC conference I met with Dr. Asen Asenov, CEO of Gold Standard Simulations, an EDA company in Scotland that focuses on the simulation of statistical variability in nano-CMOS devices.

He is of the opinion that Design-Technology Co-Optimization (DTCO) has become mandatory in advanced technology nodes. Modeling and simulation play an increasingly important role in the DTCO process, with the benefits of speeding up and reducing the cost of technology, circuit, and system development and hence reducing the time-to-market. He said: “It is well understood that tailoring the transistor characteristics by tuning the technology is not sufficient any more. The transistor characteristics have to meet the requirements for design and optimization of particular circuits, systems and corresponding products. One of the main challenges is to factor accurately the device variability in the DTCO tools and practices. The focus at 28nm and 20nm bulk CMOS is the high statistical variability introduced by the high doping concentration in the channel needed to secure the required electrostatic integrity. However the introduction of FDSOI transistors and FinFETs, which tolerate low channel doping, has shifted the attention to the process-induced variability related predominantly to silicon channel thickness or shape variation.” He continued: “However, until now TCAD simulations, compact model extraction and circuit simulations are typically handled by different groups of experts and often by separate departments in the semiconductor industry, and this leads to significant delays in the simulation-based DTCO cycle. The fact that TCAD, compact model extraction and circuit simulation tools are typically developed and licensed by different EDA vendors does not help the DTCO practices.”

Ansys pointed out that in advanced FinFET process nodes, the operating voltage for the devices has been drastically reduced. This reduction in operating voltage has also led to a decrease in operating margins for the devices. With several transient modes of operation in low-power ICs, having an accurate representation of the package model is mandatory for accurate noise coupling simulations. Distributed package models with bump resolution are required for performing chip-package-system simulations for accurate noise coupling analysis.

Further Exploration

The topic of semiconductor manufacturing has generated a large number of responses. As a result, the next monthly article will continue to cover the topic, with particular focus on the impact of leading-edge processes on EDA tools and practices.

This article was originally published on Systems Design Engineering.

Solid State Watch: May 30-June 5, 2014

Friday, June 6th, 2014

The Week in Review: June 6, 2014

Friday, June 6th, 2014

After two years of decline, fab equipment spending for front-end facilities is expected to increase 24 percent in 2014 (to US$35.7 billion), according to the May 2014 SEMI World Fab Forecast Report released this week.

This week, the Society for Information Display (SID) unveiled the winners of its prestigious 19th annual Display Industry Awards.

The Semiconductor Industry Association (SIA) this week announced that worldwide sales of semiconductors reached $26.34 billion for the month of April 2014.

Imec announced this week that it is collaborating with Samsung Electronics to accelerate innovation and collaboration among technology companies and researchers working in the burgeoning mobile wearable field.

Synopsys, Inc. and Intel Corporation this week announced broad SoC design enablement for Intel’s 14nm Tri-Gate process technology for use by customers of Intel Custom Foundry.
