MIT AI Hardware Program

2025 Symposium

Monday, March 31, 2025 | 10:00 AM – 3:30 PM ET

MIT Schwarzman College of Computing
51 Vassar St, 8th Floor
Cambridge, MA 02139


About

The MIT AI Hardware Program is an academia-industry initiative between the MIT School of Engineering and the MIT Schwarzman College of Computing. We work with industry to define and bootstrap the development of translational technologies in hardware and software for the AI and quantum age.

Our annual symposium included a keynote talk from Professor Elsa Olivetti, reviews of the current project portfolio, presentations on new projects, and a networking reception featuring a poster session and interactive demos.

Agenda

9:30 – 10:00

Registration and Breakfast


10:00 – 10:10

Year in Review & the Year Ahead

Program Co-Leads

Jesús del Alamo, Donner Professor; Professor, Electrical Engineering and Computer Science; MacVicar Faculty Fellow
Aude Oliva, Director of Strategic Industry Engagement, MIT Schwarzman College of Computing; CSAIL Senior Research Scientist


10:10 – 11:40

Project Reviews

Updates on the current research portfolio of the MIT AI Hardware Program.


Increasing Architectural Resilience to Small Delay Faults

Peter Deutsch and Vincent Ulitzsch, PhD Candidates, Electrical Engineering and Computer Science

This research creates models for new fault modes in processors, addressing reliability challenges in large-scale data centers, and develops methods for designing resilient hardware, guiding cost-effective protection strategies at scale.
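
To make the notion of architectural masking concrete, here is a toy Monte Carlo sketch (ours, not the project's actual methodology): a timing-marginal flip-flop occasionally latches the stale value of a random bit, and we estimate how often that small delay fault is architecturally masked, i.e., produces no visible change in state.

    import random
    random.seed(0)

    def visible_fault_rate(p_fault=0.01, n_cycles=100_000):
        """Fraction of injected small-delay faults that are NOT masked."""
        faults = visible = 0
        prev_bit = 0
        for _ in range(n_cycles):
            bit = random.getrandbits(1)
            if random.random() < p_fault:    # the path misses timing this cycle
                faults += 1
                if prev_bit != bit:          # stale value differs: visible corruption
                    visible += 1
            prev_bit = bit
        return visible / max(faults, 1)

    print(visible_fault_rate())  # ~0.5: here, half of injected faults are masked

Estimates like this are what make protection cost-effective: hardware only needs to guard against the faults that actually become architecturally visible.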

In collaboration with Mengjia Yan, Assistant Professor of Electrical Engineering and Computer Science, and Joel S. Emer, Professor of the Practice, Electrical Engineering and Computer Science

Slides

Wafer-Scale 2D Transition Metal Dichalcogenides for Neuromorphic Applications

Jiadi Zhu, Research Affiliate, MIT Research Laboratory of Electronics

This project aims to explore the use of two-dimensional transition metal dichalcogenides (TMDs), such as MoS2 and WSe2, as neuromorphic devices. We will leverage the extremely low leakage current of wide-bandgap TMD materials to develop floating-gate transistors (FGFETs), where changes in charge stored at the floating gate alter the MoS2 channel conductance. Floating gate structures will first be simulated and then experimentally demonstrated on TMDs grown by metal-organic chemical vapor deposition (MOCVD) at back-end-of-line (BEOL)-compatible temperatures and integrated on a standard silicon CMOS process. For this, we will build on the MoS2 low-temperature 200 mm wafer-scale MOCVD growth technology recently demonstrated by our group, and we will fabricate highly-scaled heterostructure-based devices to ensure reproducible neuromorphic devices with record-low power consumption. We will also explore the impact of different processing steps such as lithography and deposition conditions on device performance and stability.

In collaboration with Tomás Palacios, Clarence J. Lebel Professor in Electrical Engineering, Electrical Engineering and Computer Science; Director, Microsystems Technology Laboratories, and Jing Kong, Jerry McAfee (1940) Professor in Engineering, Electrical Engineering and Computer Science

Slides

CIRCUIT: A Benchmark for Circuit Interpretation and Reasoning Capabilities of LLMs

Yan Xu, PhD Candidate, Electrical Engineering and Computer Science

The role of Large Language Models (LLMs) has not been extensively explored in analog circuit design, which could benefit from a reasoning-based approach that transcends traditional optimization techniques. In particular, despite their growing relevance, there are no benchmarks to assess LLMs’ reasoning capability about circuits. Therefore, we created the CIRCUIT dataset consisting of 510 question-answer pairs spanning various levels of analog-circuit-related subjects. The best-performing model on our dataset, GPT-4o, achieves 48.04% accuracy when evaluated on the final numerical answer. To evaluate the robustness of LLMs on our dataset, we introduced a unique feature that enables unit-test-like evaluation by grouping questions into unit tests. In this case, GPT-4o can only pass 27.45% of the unit tests, highlighting that the most advanced LLMs still struggle with understanding circuits, which requires multi-level reasoning, particularly when involving circuit topologies. This circuit-specific benchmark highlights LLMs’ limitations, offering valuable insights for advancing their application in analog integrated circuit design.
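
The unit-test-style scoring can be made concrete with a short sketch; the field names and numerical tolerance below are illustrative assumptions, not the actual CIRCUIT schema.

    from collections import defaultdict

    def unit_test_pass_rate(records, rel_tol=1e-2):
        """records: dicts with 'unit_test_id', 'prediction', and 'answer'
        (final numerical values). A unit test passes only if every
        question grouped under it is answered correctly."""
        groups = defaultdict(list)
        for r in records:
            tol = rel_tol * max(abs(r["answer"]), 1e-12)
            groups[r["unit_test_id"]].append(abs(r["prediction"] - r["answer"]) <= tol)
        return sum(all(oks) for oks in groups.values()) / len(groups)

This is why per-question accuracy (48.04%) can sit well above the unit-test pass rate (27.45%): a single missed question fails its entire group.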

In collaboration with Ruonan Han, Associate Professor, Electrical Engineering and Computer Science

Slides

Efficient Large Language Models and Generative AI

Song Han, Associate Professor, Electrical Engineering and Computer Science

The rapid advancement of generative AI, particularly large language models (LLMs), presents unprecedented computational challenges. The autoregressive nature of LLMs makes inference memory-bound, and generating long sequences further compounds the memory demand. Our research addresses these challenges by quantization (SmoothQuant, AWQ, SVDQuant) and KV cache optimization (StreamingLLM, QUEST, DuoAttention). We then present two efficient model architectures—HART, a hybrid autoregressive transformer for efficient visual generation, and VILA-U, a unified foundation model that seamlessly integrates visual understanding and generation in a single model.
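
As one concrete example from the KV cache line of work, here is a minimal sketch of a StreamingLLM-style eviction policy, keeping a few early "attention sink" tokens plus a sliding window of recent tokens. It is simplified: the released implementation also handles positional re-indexing.

    class SinkKVCache:
        """Bound KV memory: retain n_sink early tokens + a recent window."""
        def __init__(self, n_sink=4, window=1024):
            self.n_sink, self.window = n_sink, window
            self.keys, self.values = [], []

        def append(self, k, v):
            self.keys.append(k)
            self.values.append(v)
            if len(self.keys) > self.n_sink + self.window:
                # Evict the oldest non-sink entry, keeping the cache bounded
                # so generation length no longer grows memory use.
                del self.keys[self.n_sink]
                del self.values[self.n_sink]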

Slides

A 14-nm Energy-Efficient and Reconfigurable Analog Current-Domain In-Memory Compute SRAM Accelerator

Aya Amer, Postdoctoral Associate, Research Laboratory of Electronics

This work presents a low-power reconfigurable 12T-SRAM current-domain analog in-memory computing (IMC) SRAM macro design to address non-linearities, process variations, and limited throughput. The proposed design features a time-domain subthreshold multiply-and-accumulate (MAC) operation with a differential output current sensing technique. A reconfigurable current-controlled design supports different precisions and speeds. A 1-kbit macro is prototyped in a 14-nm CMOS process and achieves a measured bitwise energy efficiency of 580 TOPS/W while obtaining highly linear MAC operations. This is the highest energy efficiency reported for current-domain IMC methods. In addition, simulation results and estimations based on block and 1-kbit macro measurements show that increasing the macro size to 16 kbit can achieve 2128 TOPS/W, which is comparable to other charge-domain computing methods. Finally, a fully analog MLP classifier for voice-activity detection (VAD) is prototyped with 3 cascaded analog IMC macros, achieving ~90% classification accuracy at 5-dB SNR while consuming 0.58 nJ/classification.
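
As a quick unit check of the headline number: 580 TOPS/W is 580e12 operations per joule, i.e., under 2 fJ per bitwise operation.

    tops_per_watt = 580                    # reported bitwise energy efficiency
    ops_per_joule = tops_per_watt * 1e12   # 1 TOPS/W = 1e12 ops per joule
    print(f"{1e15 / ops_per_joule:.2f} fJ per bitwise op")  # ~1.72 fJ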

In collaboration with Anantha Chandrakasan, Dean of the School of Engineering and Vannevar Bush Professor of Electrical Engineering and Computer Science

Slides

Photonics for AI | AI for Photonics

Dirk Englund, Professor, Electrical Engineering and Computer Science

The hardware limitations of conventional electronics in deep neural network (DNN) applications have spurred explorations into alternative architectures, including architectures using optical- and/or quantum-domain signal-processing subroutines. This work investigates the scalability and performance metrics—such as throughput, energy consumption, and latency—of various such architectures, with a focus on recently developed hardware error correction techniques, in-situ training methods, initial field trials, and extensions into DNN-based inference on quantum signals with reversible, quantum-coherent resources.

Slides

11:40 – 12:00

UROP (Undergraduate Research Opportunities Program) Pitches

Project pitches from undergraduate students funded by the MIT AI Hardware Program.


PERE-Chains: AI-Supported Discovery of Privilege Escalation and Remote Exploit Chains

Cristián Colón, Undergraduate, Electrical Engineering and Computer Science

We’re developing PERE-Chains, a new tool that helps discover vulnerabilities in computer networks. It focuses on finding “exploit chains,” which attackers often use to escalate privileges or gain control of multiple computers remotely. PERE-Chains leverages LLMs and AI planning to quickly identify these chains. By pinpointing vulnerabilities that attackers might exploit, our method allows network security teams to prioritize fixes effectively. This not only simplifies security management but also significantly improves network protection by clearly showing which vulnerabilities should be addressed first.
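
One way to picture the chain-finding step (a toy stand-in, not PERE-Chains' actual planner): treat (host, privilege) pairs as graph nodes and vulnerabilities as edges, then search for a path from the attacker's foothold to the target. The CVE names and states below are hypothetical.

    from collections import deque

    def find_chain(edges, start, goal):
        """edges: dict mapping state -> list of (vuln_id, next_state)."""
        frontier, seen = deque([(start, [])]), {start}
        while frontier:
            state, chain = frontier.popleft()
            if state == goal:
                return chain  # ordered list of vulnerabilities to prioritize
            for vuln, nxt in edges.get(state, []):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, chain + [vuln]))
        return None

    chain = find_chain(
        {("web", "user"): [("CVE-A", ("web", "root"))],     # privilege escalation
         ("web", "root"): [("CVE-B", ("db", "root"))]},     # remote exploit
        ("web", "user"), ("db", "root"))
    print(chain)  # ['CVE-A', 'CVE-B']

In the full system, LLMs and AI planning prune and rank this search so that only plausible chains are explored, and the vulnerabilities on the shortest chains are the ones to fix first.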

In collaboration with Una-May O’Reilly, Principal Research Scientist, Computer Science & Artificial Intelligence Lab

Slides

Computing with Heat

Caio Silva, Undergraduate, Physics

Heat is often regarded as a waste byproduct of physical processes, something to be minimized and dissipated. However, by carefully designing thermal devices—such as metal alloys in specific shapes and structures—it is possible to control heat flow in ways that enable novel applications across various fields. One such application is Computing with Heat, where temperature and thermal currents serve as carriers of information for performing computational operations. Using topology optimization and differentiable programming, we have developed inverse-designed 2D metal metastructures capable of receiving temperature inputs and executing matrix multiplications through heat conduction. This work lays the foundation for leveraging thermal transport as a computational medium, opening possibilities for energy-efficient analog computing.
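
Why this works: steady-state conduction is linear, so the readout temperatures are a fixed linear map of the input temperatures. The toy sketch below extracts that effective matrix from a small conductance network (illustrative only, not the inverse-design code); inverse design shapes the conductivities until this matrix becomes the one you want.

    import numpy as np

    def effective_matrix(K_ii, K_ib, out_nodes):
        """K_ii: conductance couplings among interior nodes;
        K_ib: couplings from interior nodes to input (boundary) nodes.
        Interior energy balance: K_ii @ T_int + K_ib @ T_in = 0."""
        T_int_from_in = -np.linalg.solve(K_ii, K_ib)  # linear map: inputs -> interior
        return T_int_from_in[out_nodes]               # rows we read out as outputs

    K_ii = np.array([[-2.0, 1.0], [1.0, -2.0]])
    K_ib = np.array([[1.0, 0.0], [0.0, 1.0]])
    M = effective_matrix(K_ii, K_ib, out_nodes=[0, 1])
    print(M @ np.array([300.0, 310.0]))  # readout temperatures encode M @ T_in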

In collaboration with Giuseppe Romano, Research Scientist, Institute for Soldier Nanotechnologies

Slides

Simulation of Optical Phase Change Modulator for Analog Photonic Applications

Anthony Donegan, Undergraduate

Thermal and optical simulations of optical phase change material (PCM) modulator geometries for a neuromorphic photonic chip were performed using Lumerical and COMSOL. Optical simulations identified the ideal thickness and length of the PCM modulator and verified reasonable attenuation through the device. Thermal simulations identified the parameters required for the device's operation. Device fabrication and experimental verification are currently underway.

In collaboration with Juejun Hu, Professor, Materials Science and Engineering

Slides

DEXO: Hand Exoskeleton System for Teaching Robot Dexterous Manipulation In-The-Wild

Juan Alvarez, Undergraduate, Aeronautics and Astronautics

We introduce DEXO, a novel hand exoskeleton system designed to teach robots dexterous manipulation in-the-wild. Unlike traditional teleoperation systems, which are limited by the lack of haptic feedback and scalability, DEXO enables natural and intuitive control through kinematic mirroring and force transparency. The system’s passive exoskeleton design allows human users to directly control a robot’s dexterous hand, transmitting precise motion and force data for learning complex tasks in real-world environments. Equipped with integrated tactile sensors, DEXO captures high-fidelity interaction data, facilitating manipulation learning without the need for costly hardware or careful engineering. We evaluate the system across multiple dexterous tasks, demonstrating its capability to replicate human-level manipulation and its potential to scale the collection of high-quality demonstration data for training advanced robot learning models. Our experiments show significant improvements in task success rates compared to existing teleoperation methods, making DEXO a powerful tool for advancing robot dexterity.

In collaboration with Pulkit Agrawal, Associate Professor, Electrical Engineering & Computer Science

Slides

Ferroelectric Memory Devices for AI Hardware

Tyra Espedal, Undergraduate, Physics

Ferroelectric (FE) memory based on CMOS-compatible Hf0.5Zr0.5O2 (HZO) has emerged as a promising non-volatile memory (NVM) technology for AI hardware due to its potential for low-voltage and fast switching, long data retention, and high memory endurance. In this work, we systematically investigate the wake-up behavior of TiN- and W-based FE-HZO capacitors under repeated triangular sweeps at frequencies ranging from 1.4 Hz to 1 MHz. We find that wake-up is more effective with slow triangular sweep cycling. High-frequency cycling, on the other hand, limits the wake-up effect as a result of domain pinning through high-voltage-induced defect generation.

In collaboration with Jesús del Alamo, Donner Professor; Professor, Electrical Engineering and Computer Science; MacVicar Faculty Fellow

Slides

12:00 – 1:00

Lunch & Networking


1:00 – 1:30

Keynote


The Climate and Sustainability Implications of Generative AI

Elsa Olivetti, Professor, Materials Science and Engineering

Generative AI’s meteoric rise and explosive data center growth offer a unique opportunity to pioneer sustainable, strategic AI deployment, coupled with leadership in energy infrastructure modernization and decarbonization. Given the scale of the challenge, meeting unprecedented demand must be done with a mission-driven, holistic, and collaborative outlook. To address this challenge, MIT is linking research on energy supply and compute demand, integrating efforts across the entire computing lifecycle, from chip design, workflow management, and data center architecture to building footprint and power generation, with performance tradeoffs at hand and replacement cycles in mind, including how sustainable AI can drive broader societal decarbonization.

Slides

1:30 – 2:30

Highlights: Prospective New Projects

Presentations covering new research projects on AI and hardware.


Hardware-efficient Neural Architectures for Language Modeling

Lucas Torroba Hennigen, PhD Candidate, Computer Science and Artificial Intelligence Laboratory

The Transformer architecture has proven to be effective for modeling many structured domains including language, images, proteins, and more. A Transformer processes an input sequence through a series of “Transformer blocks”, each of which consists of an attention layer followed by a fully connected (FFN) layer. Both types of layers employ highly parallelizable matrix operations, and thus Transformers can take advantage of specialized hardware such as GPUs. However, the complexities of the attention and FFN layers are quadratic in sequence length and hidden state size, respectively, so making Transformers more efficient requires alternatives to these fundamental primitives. This proposal will develop efficient variants of attention and FFN layers that (1) enable scaling to longer sequences and larger hidden states, and (2) can make existing models more efficient to deploy in resource-constrained environments. We will couple these layers with hardware-efficient implementations that take advantage of the device on which these models will be trained and deployed.
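
For intuition, one well-known family of attention alternatives is linear attention: reordering (Q K^T) V as Q (K^T V) under a positive feature map replaces the O(n^2 d) cost with O(n d^2). The sketch below is a generic variant of this idea, not the specific layers this proposal will develop.

    import numpy as np

    def linear_attention(Q, K, V, eps=1e-6):
        phi = lambda x: np.maximum(x, 0) + 1.0   # simple positive feature map
        Qf, Kf = phi(Q), phi(K)
        KV = Kf.T @ V                            # d x d summary, built once
        Z = Qf @ Kf.sum(axis=0) + eps            # per-query normalizer
        return (Qf @ KV) / Z[:, None]

    n, d = 2048, 64
    Q, K, V = (np.random.randn(n, d) for _ in range(3))
    out = linear_attention(Q, K, V)              # O(n * d^2) instead of O(n^2 * d)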

In collaboration with Yoon Kim, Assistant Professor, Electrical Engineering and Computer Science

Slides

Ferroelectric AI Hardware: Overcoming Conventional Paradigms and Scalability Limits

Suraj Cheema, Assistant Professor, Materials Science and Engineering & Electrical Engineering and Computer Science

In-memory computing (IMC) paradigms built on two-terminal memristor-based crossbar arrays of nonvolatile memory elements have emerged as a promising solution to address the growing demand for data-intensive computing and its exponentially rising energy consumption. However, these solutions suffer from poor array scalability due to a lack of self-rectifying behavior, resulting in sneak-path issues and the need for additional selector devices. Furthermore, the best-performing memristors are often based on emerging materials (2D van der Waals, complex oxides, electrolyte-based) that are not yet compatible with complementary metal-oxide-semiconductor (CMOS) and very large-scale integration (VLSI) processes, impeding high-density array integration. Here, we demonstrate the experimental realization of a self-rectifying memristor combining the ideal switching and rectification behavior of tunnel junctions and diodes, respectively (a hybrid ferroelectric-ionic tunnel diode, or HTD), in a scalable fabrication flow using the CMOS-compatible materials and VLSI processes employed in modern microelectronics. From a materials perspective, we harness the collective (ferroelectric-antiferroelectric polymorphism) and defective (ionic) switching character of HfO2-ZrO2 (HZO) to synergistically enhance both its electroresistance and rectifying behavior. From a device perspective, we leverage the conformal growth capability of atomic layer deposition (ALD) to integrate three-dimensional (3D) HTD structures, improving both array density and electrostatic control and yielding record-high on/off and rectification ratios across all two-terminal paradigms. From an array perspective, the enhanced self-rectifying behavior leads to the highest array scalability and storage capacity reported for any memristive system. Overall, the unprecedented memristive performance, which exploits the same materials and processes used in modern microelectronics, not only positions the HTD as an ideal hardware building block for future 3D IMC platforms, but also highlights the potential of engineering breakthrough properties in conventional CMOS materials toward accelerating the “lab-to-fab” technological translation of novel functional devices.

Slides

Analog Computing with Inverse-Designed Metastructures

Giuseppe Romano, Research Scientist, Institute for Soldier Nanotechnologies

The increasing demand for AI is spurring the development of innovative computing platforms, which often employ single complex analog operations rather than traditional Boolean logic. We propose inverse-designed metastructures that perform matrix-vector multiplications, leveraging heat as the signal carrier. Furthermore, we investigate optimized spatiotemporal-modulated structures for processing time-dependent signals.

Slides

Magnetic Tunnel Junction for Stochastic Neuromorphic Computing

Luqiao Liu, Associate Professor, Electrical Engineering and Computer Science

The rapidly growing demand for more efficient artificial intelligence (AI) hardware accelerators remains a pressing challenge. Crossbar arrays have been widely proposed as a promising in-memory computing architecture, but conventional nonvolatile-memory-based crossbar arrays inherently require a large number of analog-to-digital converters (ADCs), leading to significant area and energy inefficiencies. Here, we demonstrate three-terminal stochastic magnetic tunnel junctions (sMTJs) operated by spin-orbit torque (SOT) as novel interfacial components between the analog and digital domains for next-generation AI accelerators. By harnessing the intrinsic analog-current-to-digital-signal conversion of sMTJs, we replace conventional bulky, energy-hungry, and slow ADCs with compact, low-power, and fast stochastic current digitizers. Furthermore, a partial-sum approach is introduced to break down large matrix operations, optimizing computational efficiency and achieving high accuracy on the MNIST handwritten digit dataset. This work paves the way for future AI hardware designs that leverage device-level innovations to overcome the limitations of current in-memory computing systems.
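
A software stand-in for the partial-sum idea (the device model below, a sigmoidal switching probability with an assumed gain beta, is illustrative only): long dot products are split into chunks that fit the digitizer's analog range, each chunk's partial sum is estimated by averaging stochastic reads, and the chunk estimates are accumulated digitally.

    import numpy as np
    rng = np.random.default_rng(0)

    def smtj_estimate(current, n_samples=512, beta=4.0):
        """Estimate an analog partial sum from stochastic MTJ reads: the
        junction's switching probability follows a sigmoid of the input
        current, so averaging many reads and inverting the sigmoid
        recovers the input without a conventional ADC."""
        p = 1.0 / (1.0 + np.exp(-beta * current))
        p_hat = np.clip(rng.binomial(1, p, n_samples).mean(), 1e-3, 1 - 1e-3)
        return np.log(p_hat / (1.0 - p_hat)) / beta

    def partial_sum_dot(w, x, chunk=8):
        # Break a long dot product into chunks sized to the device range,
        # digitize each analog partial sum, then accumulate digitally.
        return sum(smtj_estimate(np.dot(w[i:i+chunk], x[i:i+chunk]))
                   for i in range(0, len(w), chunk))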

Slides

Compressing (in) the Wild: Continual Fine-tuning of Autoencoders for Camera Trap Image Compression

Timm Haucke, PhD Candidate, Electrical Engineering and Computer Science

This talk introduces an on-device fine-tuning strategy for autoencoder-based image compression, designed to achieve high compression ratios for camera trap images. Camera traps are an essential tool in ecology but are often deployed in remote areas and are therefore frequently bandwidth-constrained. We exploit the fact that camera traps are static cameras, which results in high temporal redundancy in the image background, and fine-tune an autoencoder-based compression model to specific sites, thereby achieving higher compression ratios than general compression models.
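
A per-site adaptation loop might look like the following sketch; the model interface (returning a reconstruction and a rate term) and all hyperparameters are our assumptions, not the authors' actual setup.

    import torch
    import torch.nn.functional as F

    def finetune_on_site(model, site_loader, steps=500, lr=1e-4, rate_weight=0.01):
        """Adapt a pretrained autoencoder codec to one static camera site,
        letting it spend fewer bits on the highly redundant background."""
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        model.train()
        for _, images in zip(range(steps), site_loader):
            recon, bits = model(images)  # assumed API: reconstruction + rate estimate
            loss = F.mse_loss(recon, images) + rate_weight * bits.mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
        return model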

In collaboration with Sara Beery, Assistant Professor, Electrical Engineering and Computer Science

Slides

Declarative Optimization for AI Workloads

Michael Cafarella, Research Scientist, Computer Science and Artificial Intelligence Laboratory (CSAIL)

Today’s AI engineer must make a huge number of narrow technical decisions: which models best suit which problems, which prompting methods to use, which test-time compute patterns to employ, whether to substitute conventional code for “easy” problems, and so on. These decisions are crucial for good performance, quality, and cost but are tedious and time-consuming to make. Moreover, they must be revisited when new models are released or existing prices are changed.

We propose a declarative AI programming framework that automatically optimizes the program on the user’s behalf. Like a relational database, it can marshal a wide range of optimization strategies, with the goal of making AI programs that are as fast, inexpensive, and high-quality as possible. Our prototype Palimpzest system can currently obtain AI programs that are up to 3.3x faster and 2.9x cheaper than a baseline method, while achieving equal or greater quality. Palimpzest is open source and is designed to integrate new optimization methods that should permit even greater future gains.
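
To illustrate the optimizer analogy with a hypothetical sketch (this is not Palimpzest's actual API): each logical step in the program has several physical implementations with estimated cost and quality, and the system picks the cheapest plan meeting a quality target, much as a relational optimizer picks join orders. The step names, models, and numbers below are made up.

    def choose_plan(steps, min_quality=0.9):
        plan = []
        for step, candidates in steps:
            # candidates: list of (implementation, est_cost, est_quality)
            viable = [c for c in candidates if c[2] >= min_quality]
            plan.append((step, min(viable, key=lambda c: c[1])))  # cheapest viable
        return plan

    steps = [
        ("extract_fields", [("gpt-4o", 1.00, 0.97), ("small-llm", 0.10, 0.92),
                            ("regex", 0.01, 0.60)]),
        ("classify", [("gpt-4o", 1.00, 0.95), ("small-llm", 0.08, 0.91)]),
    ]
    for step, (impl, cost, q) in choose_plan(steps):
        print(step, "->", impl, f"(cost={cost}, quality={q})")

Because the choice is recomputed from the cost/quality estimates, the plan can be re-optimized automatically when new models are released or prices change.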

Slides

2:30 – 3:30

Research Showcase

