Wanted: New Memory Type for Supercomputing
By Mark LaPedus
The shift towards a new class of exascale computers will require new breakthroughs in power management and chip-level technologies like memories, according to a technologist at an event last week sponsored by the IEEE San Francisco Bay Area Nanotechnology Council.
A number of entities are racing each other to develop exascale systems, which are supercomputers designed to run roughly 1,000 times faster than a petaflop, or one quadrillion floating-point operations per second. Targeted for 2018 or so, exascale computers would operate at 10¹⁸ flops and would be used for climate modeling, defense, Internet searching, medicine, physics and other applications.
“The biggest challenge (for exascale computing) is energy and power,” said Matthew Marinella, a senior technical staff member of Sandia National Laboratories, during a presentation at the IEEE event, entitled “Emerging Non-Volatile Memory Technologies.” “The next challenge is memory and storage.”
Sandia is developing an exascale system as part of a DARPA program. In the short term, the lab has found a memory solution for an exascale system: Micron Technology Inc.’s Hybrid Memory Cube (HMC). Long term, Sandia is looking at a storage-class memory architecture based on resistive RAMs (ReRAMs).
One of Sandia’s ReRAM candidates is based on the memristor from Hewlett-Packard Co. Memristor-based ReRAMs could be out sooner rather than later. “We have a significant commercialization effort going on,” said R. Stanley Williams, senior HP fellow and director of quantum science research at HP Labs, during a presentation at the IEEE event, held in Santa Clara, Calif.
At the IEEE event, several other entities also presented details about various next-generation memory technologies, including IBM Corp. and its racetrack memory.
Supercomputer race
For years, there has been intense competition among nations in the supercomputer field. The U.S. and Japan dominated the field until 2010, when the National Supercomputing Center in Tianjin, China stunned the industry by rolling out the world’s fastest supercomputer: the Tianhe-1A. Built around graphics processing units (GPUs) from Nvidia Inc., the Tianhe-1A is one of the few petascale-class supercomputers in the world.

K computer installed in a room (Source: Fujitsu)
Then, in June of 2011, Japan’s Fujitsu Ltd. took the lead by rolling out the K computer. Based on 705,024 Sparc64 processor cores, the K system became the first computer to top 10 petaflops. At present, the United States is in third place with Jaguar, a supercomputer built by Cray Inc. Based on 224,162 Opteron cores, Jaguar has a peak performance of just over 1.75 petaflops.
Seeking to leapfrog one another, various entities are building exascale systems. In one effort in the United States, Sandia National Laboratories in 2010 was selected as one of four institutions to develop new exascale prototype systems for the Defense Advanced Research Projects Agency (DARPA). As part of a $100 million effort, DARPA also launched the Ubiquitous High Performance Computing (UHPC) program. Intel, Nvidia and the Massachusetts Institute of Technology’s Computer Science and Artificial Intelligence Laboratory are also part of the group.
Sandia, which expects its prototype to be completed by 2018, is a government-owned, contractor-operated (GOCO) facility. Sandia Corp., a Lockheed Martin company, manages Sandia for the U.S. Department of Energy’s National Nuclear Security Administration.
There are various roadblocks in developing an exascale system. Power management, software programming, processing power, memory bandwidth and storage are just a few of the challenges.
Power management is a central issue in supercomputing. For example, IBM’s Roadrunner supercomputer, which in 2008 became the first supercomputer to break one petaflop in performance, runs at 7 megawatts. That is enough to power 5,000 homes, Marinella said.
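To put those figures in perspective, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration built from the numbers quoted above (one petaflop, 7 megawatts, 5,000 homes). The variable and function names are made up for the example, and the straight-line scaling to an exaflop is an assumption, not a projection from Sandia or IBM.

```python
# Back-of-the-envelope power arithmetic using the figures quoted in the article.
# All names here are illustrative; the linear scaling is a deliberate simplification.

PETAFLOP = 10**15   # floating-point operations per second
EXAFLOP = 10**18    # 1,000 petaflops

ROADRUNNER_FLOPS = 1 * PETAFLOP   # "first to break one petaflop"
ROADRUNNER_WATTS = 7e6            # 7 megawatts
HOMES_POWERED = 5_000             # equivalent cited by Marinella


def joules_per_flop(watts: float, flops: float) -> float:
    """Energy spent per floating-point operation (watts are joules per second)."""
    return watts / flops


def naive_power_at(target_flops: float, watts: float, flops: float) -> float:
    """Power needed at target_flops if energy per flop did not improve at all."""
    return joules_per_flop(watts, flops) * target_flops


energy = joules_per_flop(ROADRUNNER_WATTS, ROADRUNNER_FLOPS)
per_home = ROADRUNNER_WATTS / HOMES_POWERED
exa_watts = naive_power_at(EXAFLOP, ROADRUNNER_WATTS, ROADRUNNER_FLOPS)

print(f"Energy per flop:        {energy * 1e9:.1f} nanojoules")
print(f"Implied draw per home:  {per_home / 1e3:.1f} kilowatts")
print(f"Naive exascale power:   {exa_watts / 1e9:.1f} gigawatts")
```

Under that no-improvement assumption, an exaflop machine would draw on the order of gigawatts, which is why energy per operation, rather than raw speed, is described as the biggest challenge.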
On the memory side, there is a crying need for a new technology in supercomputing. Social networking startup Facebook sees a similar need for its datacenters. DRAMs are cheap, but they are power-hungry devices that are becoming difficult to scale. “DRAM per flop is going down,” Marinella said during the presentation.

Hybrid Memory Cube (HMC). (Source: Micron Technology)
For its exascale project, Sandia is initially looking to use Micron’s HMC, he said. Rolled out last year, HMC is a 3D device that will incorporate DRAM arrays stacked on a logic chip. The device is connected with 2,000 to 3,000 through-silicon vias (TSVs).
To speed up the HMC, Sandia is looking to use optical interconnects. Sandia is also proposing to develop, by 2020, what it calls a “Universal Memory Logic Cube.” Based on optical interconnects, ReRAM and possibly a system-on-a-chip, the stacked device is basically an “entire supercomputer on a chip,” Marinella said.
One of the ReRAM candidates for this device is HP’s memristor. On a slide, both HP and Sandia showed a “4D address architecture” based on the memristor. The memristor, short for “memory resistor,” was postulated to be the fourth basic circuit element by Leon Chua of the University of California at Berkeley in 1971.
A memristor is a passive two-terminal electronic device. If the flow of charge is stopped by turning off the applied voltage, the component will “remember” the last resistance that it had, according to HP. HP hopes to commercialize the memristor in the form of a ReRAM. Besides supercomputing, HP and its co-development partner, South Korean memory maker SK Hynix, are looking at several current applications such as storage.
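The “remembering” behavior can be illustrated with a toy simulation. The sketch below is not HP’s device model: it assumes a simple state variable between 0 and 1 that drifts in proportion to the charge passed through the device, with resistance interpolated linearly between two limits. The class name and all parameter values (`r_on`, `r_off`, `k`) are hypothetical and chosen only to make the effect visible.

```python
from dataclasses import dataclass


@dataclass
class ToyMemristor:
    """Illustrative two-terminal element whose resistance depends on past current.

    All values are assumptions for the sketch, not measured device parameters.
    """
    r_on: float = 100.0       # ohms when the state x is 1 (assumed)
    r_off: float = 16_000.0   # ohms when the state x is 0 (assumed)
    k: float = 1.0e4          # state change per coulomb of charge (assumed)
    x: float = 0.1            # internal state, kept within [0, 1]

    def resistance(self) -> float:
        # Linear blend between the two limiting resistances.
        return self.r_on * self.x + self.r_off * (1.0 - self.x)

    def step(self, voltage: float, dt: float) -> float:
        """Apply a voltage for dt seconds; return the current that flowed."""
        current = voltage / self.resistance()
        # The state moves only while charge flows; with zero current it stays put.
        self.x = min(1.0, max(0.0, self.x + self.k * current * dt))
        return current


m = ToyMemristor()
r_initial = m.resistance()

# "Write": apply 1 V for one second in millisecond steps.
for _ in range(1000):
    m.step(voltage=1.0, dt=1e-3)
r_written = m.resistance()

# Power off: zero volts for another second. No charge flows, so nothing changes.
for _ in range(1000):
    m.step(voltage=0.0, dt=1e-3)
r_after_idle = m.resistance()

print(f"initial: {r_initial:.0f} ohms, written: {r_written:.0f} ohms, "
      f"after idle: {r_after_idle:.0f} ohms")
```

The point of the sketch is the last step: once the voltage is removed, the resistance stays where the write left it, which is the non-volatile property a ReRAM would rely on.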

Source: HP
During an interview after his presentation at the IEEE event, Williams said the memristor-based ReRAM is making progress. “We are clearly on a path towards viability,” he said, without elaborating on the details of the proposed ReRAM or 4D architecture.
Like all presenters at the event, Williams said there is a need for a new memory type. DRAMs are running out of gas and floating-gate flash could hit the wall at 14nm.
The new memory types include ReRAMs, MRAMs and phase-change memories, among others. The trouble is that the newfangled memory types are difficult to scale and make. And if or when these memories reach production, the devices may not create new markets; rather, they could cannibalize existing ones.
For this reason and others, vendors could potentially delay the introduction or slow the development of these new memory types. “The business issues are trumping the technical issues,” Williams said.
During a separate presentation, Yiming Huai, vice president of technology at Avalanche Technology Inc., said he believes the next-generation memory types will be required sooner rather than later. Current memory types like NAND suffer from endurance and reliability issues. “Existing memories face increasing challenges beyond 32nm,” he said.
He also disclosed more details about Avalanche’s technology, a spin-torque MRAM. Dubbed AvRAM, Avalanche’s technology is going after the embedded market. The startup has developed a 64-Mbit device based on 65nm technology.
In another presentation, IBM discussed its previously announced Racetrack memory, which moves data by sliding magnetic bits back and forth along nanowire “racetracks.” Last year, IBM claimed to have measured the time and distance of domain wall acceleration and deceleration in response to electric current pulses. More recently, the company has demonstrated a 256-cell, 8 x 32 nanowire device based on a 90nm process, said Stuart Parkin, an IBM fellow at IBM Research in Almaden, during the presentation.
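The idea of sliding bits past a fixed read point, rather than addressing each cell directly, can be sketched as a shift register. The Python below is a conceptual toy, not a description of IBM’s device: the class name, the single fixed read head, and the wrap-around track are all assumptions made for the example. A physical nanowire would not wrap; the circular track simply keeps the sketch short.

```python
from collections import deque


class ToyRacetrack:
    """Bits on a track that must be shifted under a fixed head to be read."""

    def __init__(self, bits, head=0):
        self.track = deque(bits)   # physical positions along the track
        self.head = head           # fixed position of the read head
        self.offset = 0            # how far the data has been shifted so far
        self.shifts = 0            # total shift steps, a rough proxy for access cost

    def read_at(self, index):
        """Slide the track until the bit stored at `index` sits under the head."""
        n = len(self.track)
        # Steps (to the right) needed to move the wanted bit to the head.
        steps = (self.head - index - self.offset) % n
        # Shift left instead when that is the shorter way around.
        if steps > n // 2:
            steps -= n
        self.track.rotate(steps)
        self.offset = (self.offset + steps) % n
        self.shifts += abs(steps)
        return self.track[self.head]


data = [1, 0, 1, 1, 0, 0, 1, 0]
tape = ToyRacetrack(data)
readback = [tape.read_at(i) for i in (3, 5, 0, 7)]
print(readback, "read using", tape.shifts, "shift steps")
```

In this picture, access time depends on how far the data has to travel to reach the head, which is one reason measurements of how quickly the domain walls speed up and slow down matter.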















