Areanna, a US analog AI chip startup, recently disclosed its AI chip architecture for the first time at the TinyML Summit. Unlike typical analog AI chips, the chip uses an SRAM array and integrates the analog-to-digital converter (ADC) and digital-to-analog converter (DAC) functions directly into the memory array.
In short, a DAC converts a digital quantity represented in binary or BCD code into a proportional analog output, while an ADC converts a continuous analog signal into a digital signal.
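As a rough illustration (a generic model, not Areanna's circuit), an ideal N-bit DAC and ADC pair can be sketched in a few lines; the full-scale voltage and bit width here are arbitrary assumptions:

```python
FULL_SCALE = 1.0   # assumed full-scale voltage, in volts
BITS = 8
LEVELS = 2 ** BITS

def dac(code: int) -> float:
    """Digital-to-analog: map a code in 0..2^N-1 to a proportional voltage."""
    return code / LEVELS * FULL_SCALE

def adc(voltage: float) -> int:
    """Analog-to-digital: quantize a continuous voltage to the nearest code."""
    code = round(voltage / FULL_SCALE * LEVELS)
    return max(0, min(LEVELS - 1, code))   # clamp to the valid code range

print(dac(128))   # 0.5
print(adc(0.5))   # 128
```

The DAC is a simple proportional mapping; the ADC adds the rounding (quantization) and clamping steps that make it the more expensive direction.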
However, these two functions usually account for most of the chip area and power consumption in in-memory computing. Integrating them into the memory array can therefore further reduce memory power consumption and leaves considerable headroom for computing performance. The US semiconductor publication EE Times believes the chip may change analog computing technology.
1. NSF seed funding: a unique array design may break the data-conversion bottleneck
Founded in 2019, Areanna received seed funding from the US National Science Foundation in the form of Small Business Innovation Research (SBIR) grants totaling US$225,000. The company holds two patents on its architecture.
Areanna's two founders, Behdad Youssefi and Patrick Satarzadeh, both come from the electronic test and measurement company Tektronix, and are Areanna's only full-time employees.
In addition, Areanna has two part-time engineers and several consultants. In 2020, the startup produced a test chip with a single tile that can perform partial matrix multiplication. The chip's benchmark power efficiency is 40 TOPS/W, its compute density is 2 TOPS/mm², and the memory bandwidth of each core is 2 TB/s.
Areanna's test chip runs on an architecture it calls compute and quantize in memory (CQIM). The architecture is based on analog in-memory computing, conceptually similar to the approach of AI chip startups such as Mythic and Gyrfalcon. However, Areanna uses an SRAM array instead of the more common nonvolatile memory, and adds several unique techniques.
Because AI at the edge offers advantages such as privacy, low latency, and efficient use of network bandwidth, research on AI edge devices has attracted growing attention, but the power consumption of edge devices has always been a major problem. In-memory computing performs computation inside the memory itself, reducing the energy cost of memory access, and is one candidate solution for edge AI.
The design of Areanna's SRAM array is the key to its core technology. The array integrates the ADC and DAC functions internally, freeing up the power and area they would otherwise consume and further improving chip performance.
In traditional in-memory computing, a DAC is usually placed on each row/input and an ADC on each column/output. According to Areanna's data, these two functions account for 85% of the chip's power consumption and 98% of its silicon area. At the TinyML Summit, Behdad Youssefi said that the traditional analog approach merely "replaces the memory bottleneck of the von Neumann architecture with a data-conversion bottleneck."
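A toy model of that conventional layout (illustrative assumptions only, not Areanna's design) makes the conversion cost visible: every matrix-vector multiply spends one DAC operation per input row and one ADC operation per output column:

```python
import numpy as np

def dac(code, bits=8):
    return code / 2 ** bits               # ideal digital-to-analog step

def adc(value, bits=8):
    return int(round(value * 2 ** bits))  # ideal analog-to-digital step

def imc_matvec(weights, in_codes, bits=8):
    """Conventional analog IMC: one DAC per input row, one ADC per output column."""
    analog_in = np.array([dac(c, bits) for c in in_codes])  # len(in_codes) DAC ops
    analog_out = weights.T @ analog_in                      # analog accumulation
    outs = [adc(v, bits) for v in analog_out]               # one ADC per column
    return outs, {"dac_ops": len(in_codes), "adc_ops": weights.shape[1]}

W = np.array([[1, 0], [0, 2], [1, 1]])   # 3 inputs x 2 outputs
print(imc_matvec(W, [4, 8, 16]))         # ([20, 32], {'dac_ops': 3, 'adc_ops': 2})
```

Even in this tiny example the converters sit on every edge of the array, which is why they dominate area and power as the array scales.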
In Areanna's CQIM architecture, analog-to-digital and digital-to-analog conversion are implemented with the same circuit structure used for computation, called multiplying bit cells (MBCs).
2. High analog-signal integrity, 100% hardware utilization
Although Areanna's chip is based on analog computing, its circuitry is almost entirely digital and is manufactured in a digital process. Youssefi described the analog computation flow to EE Times: the chip reads weight parameters from SRAM bit cells, feeds them into multipliers, converts the resulting signals into charge on metal capacitors, and accumulates the results vertically to perform the analog computation.
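The charge-domain accumulation Youssefi describes can be sketched abstractly; the unit capacitance and cell model below are hypothetical, chosen only to show how products become charges that sum passively on a shared line:

```python
C_UNIT = 1e-15   # assumed unit capacitance per bit cell, in farads

def bitcell_charge(weight_bit: int, input_v: float) -> float:
    """One multiplying bit cell: the product w * V becomes a charge Q = C * w * V."""
    return C_UNIT * weight_bit * input_v

def column_accumulate(weight_bits, input_vs):
    """Charges from every cell in a column sum passively on the shared line."""
    return sum(bitcell_charge(w, v) for w, v in zip(weight_bits, input_vs))

q = column_accumulate([1, 0, 1], [0.5, 0.5, 1.0])
print(q)   # ~1.5e-15 coulombs: only cells whose weight bit is 1 contribute
```

Because the addition happens as charge sharing rather than as digital logic, no explicit adder tree is needed for the accumulation.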
Because the same MBC structure is reused for the analog computation, the architecture saves a great deal of chip area when performing A/D and D/A conversion. The design also eliminates the ADC sampling circuit, yielding a breakthrough in power consumption.
An important feature of the design is that each dot-product calculation needs only one quantization (one analog-to-digital conversion), regardless of the resolution of the analog operation. When converting between analog and digital signals, the sampled analog signal is continuous and can take infinitely many values, so it must be quantized into a finite set of values; doing this accurately and quickly is the hard part.
Youssefi stressed that in other in-memory computing architectures, analog AI chips often have to scale the digital signal after conversion. Areanna's chip instead scales the analog signal first and then quantizes it, preserving the integrity of the analog signal.
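The signal-integrity point can be checked numerically. In this sketch (a generic quantizer model, not Areanna's circuit), applying a gain in the analog domain before the single quantization keeps the worst-case error at half an LSB, whereas scaling the already-quantized digital value amplifies the quantization error by roughly the gain:

```python
def quantize(v, bits=4):
    """Ideal quantizer with step 1 / 2^bits."""
    step = 1 / 2 ** bits
    return round(v / step) * step

GAIN = 2.0
xs = [i / 1000 for i in range(400)]   # test inputs in [0, 0.4)

# Scale in the "analog" domain, then quantize once.
err_scale_then_quantize = max(abs(quantize(x * GAIN) - x * GAIN) for x in xs)
# Quantize first, then scale the digital result.
err_quantize_then_scale = max(abs(quantize(x) * GAIN - x * GAIN) for x in xs)

print(err_scale_then_quantize, err_quantize_then_scale)
```

Scaling after conversion multiplies the quantization noise along with the signal, which is why converting once, after analog scaling, preserves more of the original signal.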
In digital-to-analog conversion, conversion precision is usually expressed as resolution, given by the number of significant bits in the binary input. Youssefi noted that Areanna's architecture provides fully programmable resolution without compromising hardware utilization.
"If you want to provide variable resolution [for other memory computing solutions], you have to significantly reduce hardware utilization," he said. But we will not reduce the hardware utilization from 8 bits to 4 bits and then to 1 bit. Regardless of the resolution, it is still 100% hardware utilization. "
3. The SRAM array offers low power consumption and good scalability
In addition, compared with nonvolatile memory devices, SRAM has lower read and write power, so the chip does not need to draw much energy from outside. SRAM's low write power also makes dataflow optimization flexible.
At present, AI chips must move data and weights from memory to the processing units for machine learning, then store the intermediate results back to memory. This approach is inefficient: the unnecessary transfers increase both computation latency and power consumption. These "non-value-adding" data movements consume a great deal of energy; in fact, the actual computation on the data and weights accounts for only a small fraction of the total.
For a large neural network layer with many weights, keeping the weights stationary may effectively improve performance. For networks processing high-resolution images, the input activations are the most data-intensive data type, so it may make more sense to keep the input activations stationary.
Areanna's SRAM-based architecture allows dual-stationary dataflow optimization: two data types can be held stationary without additional hardware, further reducing power consumption.
"Because our calculations are done in parallel in the simulation domain, we don't actually need to move data," youssefi said. With this architecture, the areanna chip can fix the weight or any data selected by the user, and the partial sum output is always fixed. As a result, the two data types have not changed. " Users can choose the most effective way to set the algorithm (or for a specific layer in the neural network).
According to Youssefi, many current in-memory computing architectures have limited scalability. Some optimize power and performance through a logic process, while others improve storage density; when the two are put on the same chip, the technologies turn out to be incompatible.
Areanna's chip does not have this problem. Because its architecture is almost entirely digital, it can be manufactured in a standard CMOS process and is compatible with many other technologies.
Thanks to the standard process, the chip can also follow Moore's law down to smaller process nodes. Next, the company plans to build a larger test chip with multiple compute tiles; the second test chip is expected in 2022.
Conclusion: as the wave of artificial intelligence sweeps across every field, AI models are growing ever more complex, and the energy cost of the traditional computing architecture makes it hard to meet the future needs of edge AI applications. TSMC previously announced an improved SRAM memory array that greatly reduces chip power consumption through in-memory computing, demonstrating the feasibility of the SRAM-array approach to some extent.
As a method of computing on continuous data, analog computing can complement digital computing and has great potential. By integrating the ADC and DAC functions into the memory array, Areanna's chip offers a new way to reduce power consumption and chip area in analog computing.