Design Article

Fulcrum Microsystems: is it about packet switching, or asynchronous design?

Ron Wilson

9/6/2011 12:32 PM EDT

At Hot Chips last month Fulcrum Microsystems, purveyor of high-speed packet-switching ICs, made what was probably its last appearance as an independent company, delivering one last paper before the silicon gates of Intel clanged shut behind it. The paper described Alta, a 1 Gpacket/s packet-processing switch. And, in describing Alta’s development process, the paper also raised a fascinating question: did Intel buy Fulcrum for the little company’s packet-switching skills, or for its skill in asynchronous circuit design?

The story starts with Bali, Fulcrum’s current chip. Bali includes a RAM-based crossbar switch, a packet scheduler, and a frame-processing pipeline that examines packets and decides what to do with them (figure 1). The crossbar is implemented in Fulcrum’s signature asynchronous logic: dual-rail domino logic with a third rail for flow control. The scheduler and frame processor are done in conventional synchronous RTL.
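For readers unfamiliar with dual-rail signaling, the following is a minimal sketch of the generic encoding idea, not Fulcrum's actual circuit. It assumes the common convention in which each logical bit travels on two wires, with (0, 0) serving as a neutral "no data yet" state, so a receiver can tell when a word has fully arrived without a clock. The function names are hypothetical, and the third flow-control rail is not modeled.

```python
# Dual-rail encoding sketch (generic convention, assumed for illustration).
# Each logical bit is carried as a (true_rail, false_rail) pair.
NEUTRAL = (0, 0)  # spacer state: no data present on this bit yet


def encode_bit(bit):
    """Map a logical bit to a (true_rail, false_rail) pair."""
    return (1, 0) if bit else (0, 1)


def encode_word(value, width):
    """Encode an integer as dual-rail pairs, least significant bit first."""
    return [encode_bit((value >> i) & 1) for i in range(width)]


def is_valid(word):
    """Completion detection: every bit has exactly one rail asserted."""
    return all(t ^ f for t, f in word)


def is_neutral(word):
    """True when every bit is in the spacer state."""
    return all(pair == NEUTRAL for pair in word)


def decode_word(word):
    """Recover the integer once the whole word is valid."""
    if not is_valid(word):
        raise ValueError("word is not fully valid yet")
    return sum(t << i for i, (t, _f) in enumerate(word))


encoded = encode_word(5, 4)
```

In a real self-timed circuit the receiver would signal back to the sender once `is_valid` holds, and the sender would return the wires to neutral before sending the next word; that handshake is the role a flow-control rail would play.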

Figure 1: Fulcrum's Bali architecture combined an asynchronous switch with synchronous logic for packet-handling.
Fulcrum director of IC development Mike Davies observed that while the switch had been the center of attention in the first-generation chip, the added speed and the increasing number of protocols required of Bali grew the frame processor into a huge design task. “It took 10 million gates and most of the development time,” Davies said.

After that experience, the design team faced Alta’s design requirements with trepidation. Alta had a far longer list of packet-processing protocols to support. The design team reckoned that the logic would have twice the complexity of Bali’s. Worse, many of the protocols were still fluid during the chip development process, so the only alternative was to make the frame processor programmable. Alta would also require three times the performance of Bali.

The original plan was to reuse many blocks from Bali and implement two synchronous pipelines. But it quickly became apparent that even using TSMC’s 65nm GP process (Bali uses 130nm), the plan wasn’t likely to meet the performance goals.

So Fulcrum called upon its superweapon: asynchronous logic. The company had actually been founded to commercialize asynchronous design techniques developed in the university environment. But they had found the best approach was to focus on applications and downplay the logic-design style. Now, however, like the spurned crazy uncle who turns out to have the family jewels, asynchronous design became the focus again.

“Asynchronous design has its value and its limitations,” Davies explained. “It is very suitable for building SRAM, TCAM, and multiplexers. But it has turned out to be less useful for random logic. You need to be going really fast—over 1 GHz—for asynchronous random logic to be area-competitive. And there are issues with timing closure: the handshaking process creates cyclic timing constraints.” That posed a challenge: was it possible to forget about hard-wired logic, forget about programmable state machines, and decompose packet-processing into functions that asynchronous design does well?

“If you look at it, most of the functions in packet handling are made up of just a few operations,” Davies observed. “You have pattern matching, guarded assignments, multiplexing, and table look-ups. It turns out that you can implement these operations quite well using TCAMs, SRAMs, multiplexers, and what we call action logic.”
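To make the decomposition concrete, here is a rough software sketch of how those four operations might compose into one processing step. It is an illustration under assumptions, not Fulcrum's design: the packet fields, the `run_stage` function, and the action format are all hypothetical. The TCAM is modeled as a priority-ordered list of value/mask pairs, the SRAM as a table indexed by the TCAM hit, and the "action logic" as a single guarded assignment.

```python
from dataclasses import dataclass


@dataclass
class TcamEntry:
    value: int  # bit pattern to match
    mask: int   # 1 = compare this bit, 0 = don't care


def tcam_lookup(entries, key):
    """Pattern matching: index of the first matching entry, or None."""
    for index, entry in enumerate(entries):
        if (key & entry.mask) == (entry.value & entry.mask):
            return index
    return None


def mux(select, inputs):
    """Multiplexing: pick one input by index."""
    return inputs[select]


def run_stage(packet, tcam, sram, key_fields):
    """One hypothetical stage: choose a key, match it, look up and apply an action."""
    key = mux(packet["key_select"], [packet[name] for name in key_fields])
    hit = tcam_lookup(tcam, key)
    if hit is None:
        return packet              # guard is false: nothing is assigned
    action = sram[hit]             # table look-up indexed by the TCAM hit
    updated = dict(packet)
    updated[action["field"]] = action["value"]  # guarded assignment
    return updated


# Example tables: match destination 10.x.x.x first, then a catch-all default.
tcam = [TcamEntry(0x0A000000, 0xFF000000), TcamEntry(0, 0)]
sram = [{"field": "out_port", "value": 3}, {"field": "out_port", "value": 0}]
packet = {"key_select": 1, "src_ip": 0xC0A80001, "dst_ip": 0x0A010203, "out_port": None}
result = run_stage(packet, tcam, sram, ["src_ip", "dst_ip"])
```

Here `key_select` chooses the destination address as the match key, the first TCAM entry hits, and the SRAM entry at that index supplies the assignment to `out_port`. Changing the tables changes the behavior without touching the code, which is the sense in which such a stage is programmable.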

Using this decomposition, Fulcrum’s team designed a 14-stage pipeline (figure 2). “Each stage contains some combination of all four modules,” Davies said. “But each stage is programmable, and tailored to a particular class of operations.”

Figure 2: Alta's 14-stage pipeline includes specialized combinations of the four basic modules.
The design met its performance target, handling 1.1 billion packets/s through a single microcoded pipeline, which Davies claimed could handle just about any realistic standard. The chip has 1.2 billion transistors, just under half of which were created by the asynchronous design flow Fulcrum invented for the project. “We exceeded our area and schedule targets, but the chip is at the fab,” Davies said.
So we are left with an interesting question. Was Intel attracted to Fulcrum by the possibility of dominating, and then integrating, packet switches in the data center? Or does Intel have problems of its own—problems that decompose into TCAMs, SRAMs, and multiplexers? For that matter, is Fulcrum’s process of decomposing complex functions into large macro-functions instead of into RTL text the beginning of a new design paradigm? It may be a while before we hear an answer to any of those questions.




iniewski

9/8/2011 2:08 PM EDT

Very interesting use of asynchronous logic. Are there any other examples of use of asynchronous design in commercial products? There was a lot of research on this topic in the last several years but I was not aware of commercial successes...Kris




Ron Wilson, Embedded.com

9/13/2011 6:19 PM EDT

Kris:
It may not be obvious, but much of the SmartCard technology in Europe depends on low-power MCUs designed with asynchronous logic. Also, many of the critical circuits in large CPUs are in fact self-timed in one way or another. But you are absolutely right that no one makes a big deal about their use of asynchronous design in commercial products.
ron




iniewski

9/13/2011 6:58 PM EDT

thank you Ron, good to know that Smart Cards are designed with asynchronous logic...another field that uses that design style is medical imaging where I am quite active (certain designs only)...Kris



