Powering LLMs with Enhanced Optical Neural Networks


ChatGPT’s capacity to produce polished essays, emails, and code in response to a few simple prompts has garnered international attention. Researchers at MIT have reported a method that could pave the way for machine-learning programs many times more capable than the one behind ChatGPT. Furthermore, their technology might consume less energy than the state-of-the-art supercomputers powering today’s machine-learning models.

The team reports the first experimental demonstration of the new system, which uses hundreds of micron-scale lasers to perform computations based on the movement of light rather than electrons. According to the team, the new system is more than 100 times more energy efficient than current state-of-the-art digital computers for machine learning and offers 25 times higher compute density.

Moreover, they note “substantially several more orders of magnitude for future improvement.” This, the scientists add, “opens an avenue to large-scale optoelectronic processors to accelerate machine-learning tasks from data centers to decentralized edge devices.” In the future, small devices like cell phones may be able to run programs that today can only be computed at massive data centers.


Deep neural networks (DNNs), like the one powering ChatGPT, are massive machine-learning models that mimic the brain’s information processing. While machine learning keeps expanding, the digital technologies powering today’s DNNs are plateauing. In addition, because of their extreme energy needs, these systems are often found only in very large data centers. This is driving innovation in computing architecture.

The rise of DNNs is reshaping the discipline of data science. The exponential growth of these networks is taxing the capabilities of traditional computer hardware, and optical neural networks (ONNs) have recently emerged as a way to execute DNN tasks at high clock rates, in parallel, and with low-loss data transmission. ONNs have their own limitations, however: low electro-optic conversion efficiency, large device footprints, and channel crosstalk contribute to low compute density, while a lack of inline nonlinearity adds significant latency. The researchers have experimentally demonstrated a spatial-temporal-multiplexed ONN system that addresses all of these issues at once. Neurons are encoded using micrometer-scale arrays of vertical-cavity surface-emitting lasers (VCSELs), which are mass-produced and offer excellent electro-optic conversion efficiency.
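As a rough illustration of what spatial-temporal multiplexing can mean for a neural-network layer, the Python sketch below computes a matrix-vector product by feeding one input value per time step to every output channel and letting each channel accumulate its own weighted sum. This is a purely numerical toy under stated assumptions: the function names, the ReLU nonlinearity, and the example numbers are hypothetical, and nothing here models the optics or is taken from the researchers' implementation.

```python
# Toy numerical model of a space-time-multiplexed matrix-vector product.
# Assumption: inputs arrive one per time step and are broadcast to every
# output channel; each channel accumulates weight * input over time.
# Illustration only; this is not the researchers' optical implementation.

def time_multiplexed_matvec(weights, inputs):
    """Return the product of `weights` and `inputs`, one time step at a time."""
    n_out = len(weights)
    accumulators = [0.0] * n_out        # one accumulator per output channel
    for t, x_t in enumerate(inputs):    # temporal axis: one input value per step
        for row in range(n_out):        # spatial axis: every channel sees x_t
            accumulators[row] += weights[row][t] * x_t
    return accumulators


def relu(values):
    """Nonlinearity applied after accumulation (a hypothetical choice)."""
    return [max(0.0, v) for v in values]


# Hypothetical 2x3 weight matrix and 3-element input vector.
weights = [[1.0, -2.0, -0.5],
           [0.0, 1.0, 1.0]]
inputs = [2.0, 1.0, 4.0]

outputs = time_multiplexed_matvec(weights, inputs)  # [-2.0, 5.0]
activated = relu(outputs)                           # [0.0, 5.0]
```

The outer loop stands in for the time axis and the inner loop for the parallel output channels. In hardware the inner loop would happen simultaneously across channels, which is where the parallelism comes from.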

For the first time, the researchers present a compact design that addresses these problems at once. The architecture is built on arrays of vertical-cavity surface-emitting lasers (VCSELs), a technology already used in LiDAR remote sensing and laser printing. The researchers suggest that the reported energy-efficiency and compute-density figures could improve by a further two orders of magnitude in the near future. The optoelectronic processor opens new opportunities to speed up machine-learning workloads across both centralized and distributed infrastructures.