Architecting Faster Computers

To create faster computers, the industry must take a major step back and re-examine choices that were made half a century ago. One of the most likely approaches involves dropping demands for determinism, and this is being attempted in several different forms.

Since the establishment of the von Neumann architecture for computers, small, incremental improvements have been made to architectures, micro-architectures, implementation, and fabrication technologies. This has led to an amazing increase in computational power. But some of these techniques already have reached the end of the line, while others are getting close to their limit.

Traditional computers will continue to play a role, and significant progress still can be made. But it’s getting more difficult and more expensive. “Performance within CPUs is still expected to rise through standard techniques of increasing prediction capabilities alongside increasing execution width and depth,” says Peter Greenhalgh, vice president of technology and fellow at Arm. “New instructions and features can provide more performance on the latest applications. Software optimization in libraries and compilers remains important, especially with new workloads. Within the system, more cache, more cache bandwidth, and how that cache will be managed will increase overall chip performance.”

There are more gains to be found in implementation and fabrication. “For 30 years, EDA algorithms and heuristics have been driving design quality within pre-determined constraints and assumptions,” says James Chuang, product marketing manager for Fusion Compiler at Synopsys. “Some assumptions were made to simplify optimization complexity, but with modern software, hardware, and AI compute power, some assumptions are now limiting the potential of design PPA. One example is voltage level. Voltage levels have long been kept constant during optimization and explored only post-silicon, where findings can only indirectly influence the next revisions. Introducing voltage level as a vast new exploration space during EDA optimization unlocks significant gains, especially in total power, where voltage has a square impact on switching power (CV²). We expect the industry to continue breaking the molds and find new dimensions for optimization.”
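The square-law relationship Chuang mentions can be made concrete with a small worked example. This is a minimal sketch that assumes the textbook switching-power model, P = activity × C × V² × f. The function names and operating points are hypothetical, chosen only for illustration.

```python
def switching_power(c_farads, v_volts, f_hertz, activity=1.0):
    """Dynamic switching power in watts: activity * C * V^2 * f."""
    return activity * c_farads * v_volts ** 2 * f_hertz


def relative_power(v_new, v_old):
    """Ratio of switching power at two voltages, everything else held equal."""
    return (v_new / v_old) ** 2


# Hypothetical operating points, not taken from any real design.
baseline = switching_power(1e-9, 0.80, 1e9)  # 1 nF switched at 0.80 V, 1 GHz
reduced = switching_power(1e-9, 0.72, 1e9)   # same load at 10% lower voltage
saving = 1.0 - relative_power(0.72, 0.80)

print(f"baseline={baseline:.3f} W  reduced={reduced:.3f} W  saving={saving:.1%}")
```

Under these assumptions, a 10% voltage reduction cuts switching power by roughly 19%. In practice, lowering voltage usually also limits the achievable clock frequency, which this sketch deliberately ignores.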

Still, bigger improvements are possible — but only by rethinking everything we know about computers.

The chip industry has benefited over the past 50 years of computing from deterministic hardware. This was no accident. But it was a costly decision, and one that chip design no longer may be able to afford.

Every logic gate is, in effect, running thousands or millions of computations concurrently and amalgamating the results so that the outcome cannot be swayed by quantum effects, and that takes power. As those gates become smaller, the number of simultaneous experiments is diminishing, and their true quantum nature is being revealed. Gates are becoming less reliable because of variability in manufacturing and variations in environmental conditions.

So how do we approach non-determinism? “Design and verification for non-deterministic (anything that does not have a 100% probability) behavior has always been a challenge,” says Michael Frank, fellow and system architect at Arteris IP. “Examples include asynchronous clock boundaries, schedulers, probabilistic network (queuing) models, or approximate computation. One way to address this is by randomizing stimuli and comparing the results to expected distributions instead of a fixed value. These types of systems are not really that new. They used to be called ‘fuzzy logic,’ a term introduced by Dr. Lotfi Zadeh of the University of California at Berkeley in 1965.”
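The approach Frank describes, randomizing stimuli and comparing results to an expected distribution, might look something like the following. This is a minimal sketch, not a real verification flow. The two arbiter models, the 50/50 expected distribution, and the pass threshold are all assumptions made for illustration.

```python
import random
from collections import Counter


def chi_square(observed, expected_probs):
    """Pearson chi-square statistic of observed counts vs. expected probabilities."""
    total = sum(observed.values())
    stat = 0.0
    for outcome, p in expected_probs.items():
        expected = total * p
        stat += (observed.get(outcome, 0) - expected) ** 2 / expected
    return stat


def run_trials(dut, n_trials, rng):
    """Drive the device model with random stimuli and histogram its outputs."""
    return Counter(dut(rng.getrandbits(8), rng) for _ in range(n_trials))


def fair_arbiter(stimulus, rng):
    """Hypothetical non-deterministic model: grants port 0 or 1 with equal odds."""
    return rng.randrange(2)


def biased_arbiter(stimulus, rng):
    """Hypothetical buggy model: grants port 0 about 90% of the time."""
    return 0 if rng.random() < 0.9 else 1


EXPECTED = {0: 0.5, 1: 0.5}
CRITICAL = 10.83  # chi-square critical value for 1 degree of freedom, p = 0.001

rng = random.Random(1234)  # seeded so a failing run can be reproduced
for name, dut in [("fair", fair_arbiter), ("biased", biased_arbiter)]:
    stat = chi_square(run_trials(dut, 10_000, rng), EXPECTED)
    print(name, "PASS" if stat < CRITICAL else "FAIL", round(stat, 2))
```

The check never asserts a single expected value. It asks only whether the observed histogram is statistically plausible given the specification, so an occasional false failure is part of the design and is controlled by the chosen significance level.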

Embracing less accuracy
Non-deterministic computing is showing up in several forms. Some systems are non-deterministic in that each execution may provide a different answer. Others are non-deterministic in that the output for a particular input cannot be determined in advance. Artificial intelligence (AI) inference is deterministic, in that the same result is achieved every time when the conditions are identical, but it is not end-to-end deterministic in a statistical sense, because a very minor change in conditions can create a completely unknown response. When you present an inference engine with an image, a sound, or any other input that it has never encountered before, you do not know for certain what the result will be.

To go from a trained model to a deployed inference engine on the edge requires quantization, and that can add another form of non-deterministic error. Quantization reduces the size of the data types used for inference, and the quantized results often are checked against the original samples to see how much inaccuracy quantization adds.
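The effect of reducing data-type size can be illustrated with a simple round trip. This is a minimal sketch assuming symmetric uniform quantization, which is only one of several schemes in use. The weight values and function names are hypothetical.

```python
def quantize(values, bits):
    """Symmetric uniform quantization to signed integer codes of the given width."""
    qmax = 2 ** (bits - 1) - 1             # largest code, e.g. 127 for 8 bits
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak else 1.0   # real-valued step per integer code
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]


def max_abs_error(values, bits):
    """Worst-case round-trip error introduced by quantizing to `bits` bits."""
    codes, scale = quantize(values, bits)
    restored = dequantize(codes, scale)
    return max(abs(a - b) for a, b in zip(values, restored))


# Hypothetical weights, chosen only to show the trend.
weights = [0.82, -0.41, 0.07, -0.99, 0.33, 0.005]
for bits in (16, 8, 4):
    print(f"{bits:2d} bits: max abs error = {max_abs_error(weights, bits):.6f}")
```

With this scheme the round-trip error of any single value is bounded by half a quantization step, so each bit removed roughly doubles the worst-case error. Whether that matters for the final inference result depends on the model and the task, which is why the comparison against original samples is needed.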

“What operand lengths do you need for the signals, the weights, etc., to get the desired degree of fidelity in the result, even though it’s a non-deterministic machine?” asks Ravi Subramanian, senior vice president and general manager for Siemens EDA. “We have to define the notion of fidelity for a particular purpose. This is the fidelity of the result when you’re trying to do image recognition, or fidelity of the result when you’re trying to predict a heart waveform. There is no rigorous science available for this today, and people are trying to develop methods where they look at this as a multi-variate optimization problem. You have a set of operand lengths you would like to optimize under a set of constraints. The way the learning occurs is based on the set of input data that comes in. It is a very complicated problem, which today is really driven by heuristics.”

But what does it mean for results to be 2% less accurate? How much additional learning has to be done to improve that and what are the implications for accuracy required in inference? “The history of computer architecture has gone in waves,” adds Subramanian. “The components of the wave are workload, architecture, efficiency. Today, the main driver is for workloads. There is a plethora of architectures, but in many cases, we have architectures in search of a workload. This is simply a telling point in the evolution of the technology hype curve. Until you have a well-established contract with a dominant workload, you won’t be able to explore the efficiency phase. How much power am I spending to get that additional 1% accuracy in the result when I’m inferencing, because my training is better? Until we are in the efficiency phase, we will be wasting a lot of energy.”

Along with a reduction in the fidelity of the results, new underlying execution hardware is being considered, which is not fully deterministic or error-free.

One option involves the re-introduction of analog computing, which is highly efficient at doing the multiply-accumulate functions required for machine learning (ML). It can do the work faster while consuming a fraction of the power. Unfortunately, analog is easily influenced by manufacturing variation and other things going on around it. Noise, power fluctuations, temperature, and many other environmental factors can influence results. Researchers are working to remove as much of that variability as possible, but in many cases it comes at the expense of resolution.
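The sensitivity of an analog multiply-accumulate to variation can be approximated in software. This is a rough sketch that assumes each product is perturbed by independent Gaussian noise proportional to its magnitude. Real analog circuits have more complicated error behavior, and the noise levels here are hypothetical.

```python
import random
import statistics


def exact_mac(inputs, weights):
    """Ideal multiply-accumulate (dot product)."""
    return sum(x * w for x, w in zip(inputs, weights))


def noisy_mac(inputs, weights, sigma, rng):
    """Multiply-accumulate where each product carries relative Gaussian noise."""
    return sum(x * w * (1.0 + rng.gauss(0.0, sigma))
               for x, w in zip(inputs, weights))


rng = random.Random(0)
inputs = [rng.uniform(0, 1) for _ in range(64)]
weights = [rng.uniform(-1, 1) for _ in range(64)]
ideal = exact_mac(inputs, weights)

for sigma in (0.01, 0.05):  # assumed 1% and 5% per-product variation
    samples = [noisy_mac(inputs, weights, sigma, rng) for _ in range(1000)]
    bias = statistics.mean(samples) - ideal
    spread = statistics.stdev(samples)
    print(f"sigma={sigma}: mean offset={bias:+.4f}, spread={spread:.4f}")
```

Repeating the same computation yields a distribution of answers around the ideal value rather than a single number. The spread of that distribution is one way to express the effective resolution of the analog result.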

“Analog has tremendous benefits from a power density and performance point of view,” says Tim Vehling, senior vice president of product and business development at Mythic. “Analog as a compute technology for edge AI is just getting going. AI, by definition, is statistical. It’s not a precise calculation. You train the model in a statistical way, and then your model runs, and deep learning scientists don’t know exactly how or why it does what it does. They can’t explain how it works. It’s not like a mathematical equation. There is a lot of deep-learning black magic in between. Because of the nature of AI, analog lends itself quite well because AI is already noise-immune. It deals with noise by the way you train the model.”

Another technique that is good at doing matrix computations is photonics. “You can do the multiply and accumulate more power-efficiently in optics,” says Priyank Shukla, staff product marketing manager at Synopsys. “You can multiply wavelengths in an easier way, in more power-efficient way, compared to how you can multiply currents using transistors.”

It is still early days for photonic computation, and this technology is also highly influenced by environmental conditions.

Quantum computation
Another form of non-deterministic hardware that is seen as the new frontier of computing is quantum computing. These computers are likely to be able to solve problems far beyond the scope of traditional deterministic computers, but they also can produce different answers each time the same question is asked of them. Attempts are being made to reduce the error levels in these types of machines, but their very nature suggests that it will be the responsibility of software to deal with uncertainty.

Quantum computers are a family of devices, not a single architecture. “In conventional computers you may have x86 and Arm, and everyone knows the difficulties associated with comparing those,” says Joel Wallman, R&D operating manager at Keysight Technologies. “Comparing any two quantum computers is that same kind of comparison. There are a few constellations, usually based on physical implementations, that are much easier to compare to each other. For instance, all superconducting qubits are fairly comparable in terms of running in similar timescales, and they will have similar error rates. They will also have similar expressability, meaning you would need a similar number of primitive instructions to run the same program on each of them. With this type of machine you can parallelize your primitive instructions, but with ion traps you generally can’t. To parallelize primitive instructions with ion traps you need more lasers, because those are the things that actually execute the instructions — shining lasers at the trapped ions. And there’s only so many lasers you can shine into a small point while keeping it well controlled.”

Today, quantum computers are plagued by high error rates, making it hard to define how fast they are. “You are trying to assess the performance of something that is simply not performing well enough to actually perform the computation,” adds Wallman. “The simplest metric that I think can be used is, ‘What’s the largest number of qubits that you can actually use and have a reasonable chance of getting the right answer?’ Alternatively, you can ask about the actual number of instructions that you can implement on a quantum computer before you get junk.”
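The second metric Wallman describes, the number of instructions that can run before the output is junk, can be estimated with simple arithmetic. This is a back-of-the-envelope sketch that assumes every operation fails independently with the same probability, which is a simplification because real devices show correlated and gate-dependent errors. The error rates and the 50% success target are hypothetical.

```python
import math


def success_probability(error_rate, n_ops):
    """Chance that n_ops operations all succeed, assuming independent errors."""
    return (1.0 - error_rate) ** n_ops


def max_ops(error_rate, target_success):
    """Largest operation count whose success probability stays at or above target."""
    return math.floor(math.log(target_success) / math.log(1.0 - error_rate))


for error_rate in (1e-2, 1e-3, 1e-4):  # assumed per-operation error rates
    budget = max_ops(error_rate, 0.5)
    achieved = success_probability(error_rate, budget)
    print(f"error rate {error_rate:g}: about {budget} operations "
          f"(success probability {achieved:.3f})")
```

Under this model the instruction budget scales roughly inversely with the error rate, so a tenfold improvement in per-operation accuracy buys about ten times as many usable instructions.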

IBM basically has combined those two elements to create a figure of merit for quantum computers. “It’s called Quops,” says Wallman. “They’re talking about how many layers of operations can be done per second. With superconducting qubits, you’re talking hundreds or thousands, but for ion traps you’re talking about a much lower number, although the ion traps tend to be a little more accurate.”

There are other significant differences between conventional computing and both quantum and photonic computing. For example, neither of the newer technologies has a natural means of storing information. “Researchers are still questioning the state of art with respect to what are computational elements, storage elements and what that means about the types of data flow that will compute well,” says Subramanian. “People are learning about how they can use it effectively for tasks where they can dramatically change the timescale when looking at very large-scale problems.”

Getting statistical
If hardware does not provide the same results each time it is executed, or the fidelity of results is uncertain, it means that software has to become statistical. “I believe we need to learn a lot more about statistics,” says Arteris’ Frank. “Statistical methods — randomization, simulated annealing — and approximations have been in use for back-end tools and verification for a while, and propagating them forward requires some thought. Applying these paradigms to ‘digital’ circuits somehow strikes me to be similar to converting logic back into analog representation. Results coming out of neural network components might require some type of ‘analog’ interpretation.”
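One simple way software can become statistical is to run the same computation many times and report the most common answer along with how often it appeared. This is a minimal sketch. The `noisy_solver` function is a hypothetical stand-in for non-deterministic hardware, and its 70% accuracy is an assumption made for illustration.

```python
import random
from collections import Counter


def noisy_solver(rng, correct=42, p_correct=0.7):
    """Hypothetical hardware stand-in: usually right, otherwise a random answer."""
    return correct if rng.random() < p_correct else rng.randrange(100)


def majority_answer(run_once, shots):
    """Repeat a computation and return the most frequent result and its share."""
    counts = Counter(run_once() for _ in range(shots))
    answer, hits = counts.most_common(1)[0]
    return answer, hits / shots


rng = random.Random(2024)
answer, share = majority_answer(lambda: noisy_solver(rng), shots=200)
print(f"answer={answer}, observed in {share:.0%} of runs")
```

The returned share acts as a crude confidence measure. Calling code can require a minimum share before accepting the answer, or request more runs when the vote is close.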

It also will require a rethink in terms of programming skills. “While programming has become a basic skill, it has to be a 101 course,” says Anoop Saha, senior manager, strategy and business development at Siemens EDA. “Then it needs to be specialized into various domains. We need more knowledge about algorithms and data science. The ideas and knowledge of statistics, and the idea of mathematical modeling and data science, are more fundamental than machine learning.”

Getting to the next plateau of computing will require more than small incremental changes, and the most fundamental tenets that will have to be given up are determinism and fidelity. That affects both hardware and software, but software will see the greater impact. This will limit the kinds of problems these new approaches can be directed toward until the necessary foundational building blocks have been put in place. In the meantime, the vast majority of computational needs must be satisfied by incremental changes to traditional compute architectures.