
A Survey on Compute-in-Memory and Hyperdimensional Computing for Low-Power Machine Learning

Abstract

Compute-in-Memory (CiM) architectures and Hyperdimensional Computing (HDC) offer complementary approaches to energy-efficient machine learning (ML). CiM reduces data-movement energy by performing analog computation within memory arrays, while HDC provides algorithmic robustness under low precision and noise. This survey reviews recent advances at their intersection—from resistive and phase-change crossbars to hardware-aware training and in-memory HDC prototypes—highlighting key challenges in calibration, programmability, and co-design. Together, these paradigms outline a pathway toward sustainable, low-power intelligent systems.


Additional Key Words and Phrases

Compute-in-Memory (CiM), Analog In-Memory Computing (AIMC), Resistive Crossbar Arrays, Phase-Change Memory (PCM), Ferroelectric FETs (FeFETs), Hyperdimensional Computing (HDC), Vector Symbolic Architectures (VSA), Analog-Aware Training, Energy-Efficient Machine Learning (ML), Sustainable Computing, Hardware–Software Co-Design, Low-Precision Inference.


1. Introduction

1.1 Motivation: The Energy Cost of Machine Learning

Machine learning (ML) has become one of the dominant consumers of computational energy in the modern computing stack. The growing deployment of deep neural networks (DNNs) and large language models (LLMs) has amplified the mismatch between processor speed and memory bandwidth—commonly known as the von Neumann bottleneck. In contemporary systems, moving data between compute cores and off-chip memory consumes orders of magnitude more energy than performing arithmetic on that data. As a result, energy per inference and latency increasingly depend on memory access rather than computation itself.

Compute-in-Memory (CiM) architectures address this challenge by collapsing computation into the memory plane, thereby eliminating the costly transfer of intermediate data between separate compute and storage units. Using resistive or phase-change memory arrays (see §1.2), CiM circuits perform analog vector–matrix multiplications in situ, exploiting Ohm's law for per-cell multiplication and Kirchhoff's current law for summation to implement multiply–accumulate (MAC) operations natively in hardware. By replacing digital data movement with analog accumulation, these systems promise dramatic reductions in energy and area. Early demonstrations such as ISAAC [1] achieved \(5.5 \times\) lower energy and \(14.8 \times\) higher throughput than digital accelerators by integrating analog crossbars into a deep-learning pipeline. Later designs such as PUMA [2] introduced programmability, showing up to \(2{,}446 \times\) higher energy efficiency and \(66 \times\) lower latency relative to contemporary GPUs.

However, despite these improvements, CiM architectures remain constrained by analog non-idealities—conductance drift, asymmetry, retention loss, and noise—that degrade computational accuracy. Moreover, analog-to-digital converter (ADC) and digital-to-analog converter (DAC) overheads frequently dominate total energy cost, eroding the theoretical gains of in-memory computation. Even with careful calibration and retraining, hardware imperfections can propagate through deep learning models, particularly those trained with high numerical precision.

In parallel, Hyperdimensional Computing (HDC)—also known as Vector Symbolic Architecture (VSA)—has emerged as an alternative computational framework that naturally embraces approximation and noise tolerance. HDC represents data as very high-dimensional vectors (typically \(10^3\)–\(10^5\) elements) and manipulates them through simple algebraic operations such as binding, bundling, and permutation (see §1.2). These operations distribute information across all dimensions, allowing computation to remain robust even when a large fraction of individual bits are corrupted or flipped. Because of this intrinsic fault tolerance, HDC can maintain accuracy under low precision and stochastic errors that would significantly degrade conventional DNNs [3, 4].

The synergy between CiM and HDC is increasingly recognized as a promising avenue for low-power, sustainable ML. CiM offers physical energy efficiency by minimizing data movement, while HDC offers algorithmic resilience to the analog variability inherent to CiM hardware. Experiments have demonstrated that HDC can operate effectively on memristive crossbars [5], achieving competitive classification accuracy at substantially lower precision and energy than digital implementations. Both paradigms rely on massively parallel vector–matrix operations, making them naturally compatible at the architectural level. This alignment motivates a joint exploration of how algorithm–hardware co-design can combine CiM's physical efficiency with HDC's computational robustness to enable scalable, energy-efficient inference.

From a sustainability perspective, the stakes are high. The rapid growth of artificial intelligence (AI) has driven up data-center electricity use, cooling water demand, and carbon emissions. As model sizes and deployment scale increase, these trends stress power grids and raise operating costs. Architectures that push inference toward picojoule-level energy costs can materially curb AI's environmental footprint and enable operation within strict energy and carbon budgets. Technically, the path runs through calibration, variability, and programmability: if analog non-idealities (device drift, asymmetry, retention loss) can be tamed, converter overheads reduced, and analog-aware toolchains integrated, then hybrid analog–digital systems such as CiM coupled with HDC can deliver robust accuracy with minimal power both at the edge and in the cloud.

1.2 Background and Definitions

To understand the current state of this field, several key concepts must be defined.

CiM, also referred to as Analog In-Memory Computing (AIMC), integrates computation directly into memory arrays. The core operation is the analog vector–matrix multiplication, where input voltages applied across the rows of a resistive crossbar produce output currents on the columns proportional to the weighted sum of the inputs. Each memory cell stores a conductance value corresponding to a model weight, and the aggregated current encodes the MAC result. This operation, governed by Kirchhoff's and Ohm's laws, inherently performs parallel computation with extremely high density and energy efficiency. Reported device-to-system efficiencies for analog MAC operations typically fall in the sub- to single-picojoule range (approximately \(0.1\)–\(1\) pJ/MAC) across recent CiM platforms, with FeFET (see below) crossbars demonstrating sub-picojoule read energies [6, 7].
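
To make the mechanism concrete, a minimal NumPy sketch of one crossbar read is shown below; the array size, voltage range, and noise level are illustrative assumptions rather than parameters of any cited device:

```python
import numpy as np

def crossbar_vmm(voltages, conductances, noise_sigma=0.0, rng=None):
    """Idealized analog vector-matrix multiply on a resistive crossbar.

    voltages:     (rows,) input voltages driving the word lines
    conductances: (rows, cols) programmed cell conductances (the weights)
    noise_sigma:  relative programming/read noise (illustrative model)

    Each per-cell current is V_i * G_ij (Ohm's law); each column current is
    their sum over the rows (Kirchhoff's current law), i.e. one MAC result.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    if noise_sigma > 0:
        conductances = conductances * (1 + noise_sigma * rng.standard_normal(conductances.shape))
    return voltages @ conductances            # column currents = MAC outputs

# Illustrative 128x128 tile (ISAAC-style dimensions)
rng = np.random.default_rng(1)
G = rng.uniform(0.0, 1.0, size=(128, 128))    # normalized conductances
v = rng.uniform(0.0, 0.2, size=128)           # input voltages
ideal = crossbar_vmm(v, G)
noisy = crossbar_vmm(v, G, noise_sigma=0.05)
print("relative error:", np.linalg.norm(noisy - ideal) / np.linalg.norm(ideal))
```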

Several device technologies underpin CiM research: Resistive Random Access Memory (RRAM), Phase-Change Memory (PCM), and Ferroelectric FETs (FeFETs) each offer tunable multi-level conductance states, enabling analog computation but differing in endurance, linearity, and retention. System-level performance is typically reported as energy per MAC and as compute efficiency in tera-operations per second per watt (TOPS/W).
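
The two metrics are directly related: counting one MAC as two operations, an energy of E pJ per MAC corresponds to 2/E TOPS/W. A small helper (with illustrative values) makes the conversion explicit:

```python
def pj_per_mac_to_tops_per_watt(e_pj_per_mac, ops_per_mac=2):
    """Convert energy per MAC (pJ) into compute efficiency (TOPS/W).

    ops per joule = ops_per_mac / (e_pj_per_mac * 1e-12); dividing by
    1e12 ops per TOP leaves ops_per_mac / e_pj_per_mac, so the prefixes
    cancel exactly.
    """
    return ops_per_mac / e_pj_per_mac

print(pj_per_mac_to_tops_per_watt(1.0))   # 1.0 pJ/MAC ->  2 TOPS/W
print(pj_per_mac_to_tops_per_watt(0.1))   # 0.1 pJ/MAC -> 20 TOPS/W
```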

HDC represents information as hypervectors—long random vectors in high-dimensional spaces—using mathematical operations designed to preserve similarity and compositionality. Common primitives include:

  • Bundling: addition or majority vote to combine multiple hypervectors
  • Binding: elementwise multiplication or XOR to associate entities
  • Permutation: fixed reordering to encode sequence or position

These operations produce distributed representations that are inherently robust to noise, bit errors, and quantization. Empirical studies show that in-memory HDC prototypes maintain competitive classification accuracy under substantial device noise and limited precision (for example, conductance variability in PCM arrays), and broader surveys emphasize HDC's robustness to stochastic or unreliable hardware substrates; in practice, resilience at or around the 10% noise or bit-error level has been demonstrated, depending on task and device characteristics [3, 4, 5]. When implemented in hardware, such operations map naturally to analog crossbars performing vector additions and correlations, allowing HDC to exploit the same physical mechanisms as CiM.
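
A minimal sketch of these primitives for bipolar hypervectors, assuming NumPy and an illustrative dimensionality and noise level, shows how similarity survives random component corruption:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000                                   # hypervector dimensionality (illustrative)

def random_hv():
    return rng.choice([-1, 1], size=D)

def bind(a, b):                              # associate two entities
    return a * b                             # elementwise multiply (XOR analogue)

def bundle(*hvs):                            # combine by majority vote
    return np.sign(np.sum(hvs, axis=0))

def permute(a, shift=1):                     # encode sequence / position
    return np.roll(a, shift)

def similarity(a, b):                        # normalized dot product
    return float(a @ b) / D

# Encode a toy record {color: red, shape: square}, then corrupt 10% of it
color, red, shape, square = (random_hv() for _ in range(4))
record = bundle(bind(color, red), bind(shape, square))
noisy = np.where(rng.random(D) < 0.10, -record, record)

# Unbinding the corrupted record with `color` still lands near `red`
print(similarity(bind(noisy, color), red))      # clearly positive
print(similarity(bind(noisy, color), square))   # near zero
```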

Finally, non-idealities refer to the unavoidable deviations of analog devices from ideal behavior—such as conductance drift, asymmetry, write noise, retention loss, and voltage drop across interconnects (IR drop)—which cumulatively limit precision and accuracy. Addressing these effects requires calibration, retraining, or algorithmic compensation, often through hardware-aware training [8, 9].

In summary, the research area surveyed in this paper examines the intersection of two converging trends: (1) hardware-level analog CiM architectures designed to reduce energy by eliminating data movement, and (2) algorithm-level HDC frameworks designed to tolerate the noise, imprecision, and variability of such hardware. The remainder of this survey outlines the major challenges preventing widespread adoption, reviews the state of the art across devices and architectures, and identifies future research directions that could enable the co-design of CiM and HDC systems for energy-efficient ML.


2. Challenges and Key Research Problems

2.1 Device Non-Idealities and Calibration

Despite the theoretical efficiency of CiM architectures, practical implementations face severe analog non-idealities. Variations in device conductance, asymmetric write behavior, retention loss, and IR drop introduce errors that accumulate across layers during inference. Crossbar arrays fabricated from RRAM, PCM, or FeFETs all exhibit stochastic programming noise and limited linearity, constraining achievable precision to approximately 4–8 effective bits per MAC operation [6, 7].

Hardware-aware training techniques partially mitigate these effects by incorporating device models directly into the forward and backward passes [8]. Such approaches recover inference accuracy to within about one percentage point of floating-point digital baselines but require retraining for each device type, which limits scalability. Even recent analog training algorithms that remove dependence on calibrated zero-point conductance values [9] cannot yet guarantee stability over temperature or long-term drift. These imperfections remain a central barrier to deploying large-scale analog accelerators outside the laboratory.
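
The core idea can be sketched as a drop-in layer that perturbs its weights with fresh multiplicative noise on every training forward pass; the PyTorch snippet below is a simplified stand-in for the calibrated device models used in hardware-aware training, and the noise level sigma is an assumed parameter:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Linear):
    """Linear layer with multiplicative weight noise injected during training.

    A simplified sketch of hardware-aware training: each forward pass sees a
    freshly perturbed copy of the weights, so the learned solution remains
    accurate under analog programming/read variability. `sigma` is an assumed
    relative noise level, not a calibrated device model.
    """
    def __init__(self, in_features, out_features, sigma=0.05, bias=True):
        super().__init__(in_features, out_features, bias=bias)
        self.sigma = sigma

    def forward(self, x):
        w = self.weight
        if self.training and self.sigma > 0:
            # Sample new noise every step; gradients flow through the clean weights.
            w = w * (1.0 + self.sigma * torch.randn_like(w))
        return F.linear(x, w, self.bias)
```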

2.2 System-Level Bottlenecks

At the system level, converter overheads dominate the total energy budget. ADC and DAC interfaces often consume more energy than the crossbar compute itself, negating the theoretical advantages of in-situ MACs [1, 2]. Increasing converter precision improves accuracy but scales power quadratically; conversely, lowering precision amplifies quantization error and requires algorithmic compensation.
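
A back-of-the-envelope budget illustrates why amortizing conversions matters; the energy figures below are purely illustrative assumptions, not measurements from any cited design:

```python
def energy_per_mac(crossbar_pj, adc_pj_per_conversion, rows_per_conversion):
    """Energy per MAC when one ADC conversion is amortized over a column of
    `rows_per_conversion` analog MACs (toy model; ignores DACs and drivers)."""
    return crossbar_pj + adc_pj_per_conversion / rows_per_conversion

CROSSBAR_PJ, ADC_PJ = 0.5, 50.0               # purely illustrative values
for rows in (16, 64, 128, 256):
    total = energy_per_mac(CROSSBAR_PJ, ADC_PJ, rows)
    adc_share = 100 * (total - CROSSBAR_PJ) / total
    print(f"{rows:>4} rows per conversion: {total:.2f} pJ/MAC "
          f"({adc_share:.0f}% spent in the converter)")
```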

Architectural proposals such as ReHarvest [10] decouple and pool ADC resources dynamically across tiles, achieving about \(3 \times\) utilization gains and \(3.5 \times\) throughput improvement, yet practical implementations must still balance latency, concurrency, and signal-integrity constraints. Peripheral circuits—including decoders, drivers, and sense amplifiers—contribute additional overheads that narrow the efficiency gap between analog CiM and optimized digital accelerators. These findings highlight the need for holistic co-optimization of core, converter, and interconnect energy.

2.3 Software and Algorithmic Gaps

Even with improved devices and architectures, CiM's success depends on software support that can map, schedule, and calibrate workloads across thousands of noisy crossbar tiles. Current toolchains remain fragmented: ISAAC's mapping strategy [1] and PUMA's compiler [2] demonstrate feasibility but lack integration with analog-aware training loops. Moreover, existing frameworks cannot automatically adjust tile-level precision or re-balance workloads in response to device aging or environmental drift.

At the algorithmic level, most deep-learning models assume deterministic high-precision arithmetic, making them brittle under analog variability. HDC offers a compelling contrast: its distributed, high-dimensional representations inherently tolerate bit-flip and quantization noise [3, 5]. Yet systematic co-design of CiM hardware and HDC algorithms is still nascent. Bridging this gap—through standardized analog simulation stacks, compiler abstractions, and noise-aware learning paradigms—remains a key research frontier.

In summary, across these layers, three intertwined challenges impede progress toward commercially viable analog compute:

  1. Device reliability — how to maintain precision under non-ideal behavior;
  2. System efficiency — how to minimize ADC/DAC and peripheral overhead;
  3. Software adaptability — how to design compilers and algorithms that thrive under analog variability.

Addressing these issues will be essential to realize CiM’s potential as a sustainable platform for low-power ML.


3. Review of the State of the Art

3.1 Evolution of Compute-in-Memory Architectures

The trajectory of analog CiM research began with fixed-function accelerators and has progressively evolved toward programmable and hybrid analog–digital systems.

ISAAC introduced the first large-scale demonstration of in-situ analog arithmetic for convolutional and fully connected neural networks [1]. Its hierarchical organization of \(128 \times 128\) crossbar arrays integrated analog vector–matrix multipliers within a pipelined digital dataflow, achieving \(14.8 \times\) higher throughput, \(5.5 \times\) lower energy, and \(7.5 \times\) greater computational density than the fully digital DaDianNao baseline. ISAAC's architectural template—tile-level dataflow, weight replication for pipeline balance, and bit-serial encoding matched to ADC precision—remains foundational for later CiM research.

PUMA extended ISAAC's fixed-function model into a programmable CiM platform by introducing an instruction set architecture (ISA) and compiler that map diverse workloads—including CNNs, long short-term memory (LSTM) networks, and multilayer perceptrons (MLPs)—onto heterogeneous analog crossbars [2]. This generalization preserved analog efficiency while enabling flexibility comparable to digital accelerators. PUMA demonstrated up to \(2{,}446 \times\) higher energy efficiency and \(66 \times\) lower latency than contemporary (2019) GPUs on CNN, LSTM, and MLP workloads, establishing CiM as a feasible substrate for domain-specific inference.

Subsequent research focused on closing the accuracy gap caused by analog imperfections. Rasch et al. developed hardware-aware training that injects non-ideal device characteristics—such as conductance drift, asymmetry, and variability—directly into the forward and backward passes during retraining [8]. This closed-loop approach recovered inference accuracy to within approximately 1% of digital baselines across CNN, recurrent neural network (RNN), and transformer workloads. Later, zero-point-independent analog training algorithms eliminated the need for precisely calibrated reference conductance values, maintaining stability under 10% write noise and 30% asymmetry while accelerating convergence by \(3 \times\) relative to earlier analog training schemes [9].

More recently, Leroux et al. demonstrated the scalability of CiM architectures to transformer-class workloads. Their analog attention engine performed key, query, and value matrix–vector multiplications directly in memory using hybrid analog–digital gain-cell arrays [11]. The system achieved up to \(10^2 \times\) latency reduction and \(10^4 \times\) energy savings compared with GPU baselines, illustrating that analog computing can address not only classical inference tasks but also the energy bottlenecks of LLM attention layers.

Collectively, these works mark CiM’s progression from architectural proof-of-concept to programmable analog accelerator and, finally, to hybrid analog–digital engines capable of supporting modern AI workloads. Remaining challenges center on end-to-end software integration and reproducible evaluation across device technologies.

3.2 Device Innovations and Co-Design Perspectives

Device-level advances underpin every gain in CiM performance. Haensch et al. synthesized over a decade of crossbar research into a co-design taxonomy spanning materials, circuits, architectures, and workloads [6]. Their analysis quantified how device parameters—conductance linearity, retention, and variability—propagate to system-level trade-offs in energy, delay, and accuracy. Reported operating energies ranged from 0.1 to 1 pJ per MAC, with endurance between \(10^5\) and \(10^8\) cycles across RRAM, PCM, and FeFET technologies.

Soliman et al. presented the first experimental demonstration of a CMOS-compatible multi-level FeFET crossbar, where each cell supports up to sixteen analog states via partial polarization control [7]. The prototype achieved sub-picojoule read energy, endurance beyond \(10^6\) cycles, and retention exceeding \(10^5\) seconds (approximately 28 hours) without significant drift—validating FeFETs as a promising non-volatile platform for analog MACs.

At the circuit and architecture level, Xu et al. proposed ReHarvest, which decouples ADCs from individual crossbars and dynamically shares them across tiles [10]. This resource-harvesting approach improved ADC utilization by \(3.2 \times\) and overall throughput by \(3.5 \times\) while reducing redundant converter power by \(3.1 \times\). Such pooling strategies redefine energy bottlenecks as scheduling problems rather than device-physics limitations.

Finally, Lammie et al. demonstrated that analog variability can enhance robustness instead of merely degrading performance. Using a PCM-based CiM chip, they showed that intrinsic noise and stochasticity act as implicit regularizers that smooth decision boundaries, reducing adversarial attack success rates by approximately 25% with negligible accuracy loss [12].

Together, these studies emphasize that CiM’s efficiency depends on co-design across layers: material science determines circuit precision; circuit design constrains architecture; and architecture informs compiler and workload design. Future progress hinges on metrics that unify these abstractions into a single evaluative framework.

3.3 Hyperdimensional Computing as a Complementary Paradigm

In parallel to hardware advances, HDC has matured into a robust mathematical and algorithmic framework aligned with the properties of analog hardware. Kleyko et al. provided a comprehensive synthesis of HDC's foundations, showing that simple arithmetic operations—binding, bundling, and permutation—form a universal substrate for symbolic reasoning and associative memory [3]. High-dimensional representations (\(10^3\)–\(10^5\) dimensions) distribute information such that minor perturbations to individual components yield minimal degradation of global similarity, explaining HDC's resilience to bit-flip and quantization errors.

Karunaratne et al. realized these principles experimentally by mapping HDC primitives directly onto memristive crossbars performing analog matrix–vector multiplications [5]. Their prototype surpassed 90% classification accuracy on benchmark tasks under up to 10% device noise while consuming more than an order of magnitude less energy than digital baselines.

Subsequent surveys [4, 13] extended this foundation to hardware implementations across CMOS, spintronic, photonic, and memristive substrates. They reported that HDC models maintain functional accuracy under precision as low as a few bits per component, demonstrating compatibility with the limited dynamic range of analog CiM devices. Collectively, these results position HDC as an algorithmic paradigm inherently tolerant to the same noise sources that constrain analog accelerators, making it a natural candidate for co-execution on CiM hardware.

3.4 Bridging CiM and HDC

Recent work has begun to merge CiM’s physical efficiency with HDC’s algorithmic robustness, highlighting their conceptual and operational convergence. Both paradigms rely on vector–matrix operations that can be implemented efficiently through analog current summation in crossbar arrays. In HDC, similarity computation between hypervectors reduces to high-dimensional dot products—precisely the operation that analog CiM performs most efficiently.

Karunaratne et al. demonstrated an in-memory HDC pipeline that executes binding and bundling directly on resistive arrays [5], confirming that high-dimensional representations maintain accuracy even under significant analog noise. Similarly, Kleyko et al. suggested that HDC constitutes a representative algorithmic framework for assessing emerging analog substrates and advocated for standardized benchmarking frameworks that compare energy and precision trade-offs across devices [3, 13].

These studies suggest a unified research direction: analog-aware HDC on CiM hardware. Under this paradigm, analog noise becomes a controllable algorithmic parameter rather than a hardware defect, and compiler and runtime layers can tune system precision to match task tolerance. Empirical metrics—such as energy per MAC (approximately 0.1–1 pJ), efficiency (order-of-magnitude TOPS/W improvements), and bit-error tolerance (around the 10% level, depending on task and device)—indicate the potential of such co-designed systems to surpass the efficiency of digital accelerators while maintaining functional robustness.
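
To make this convergence concrete, the sketch below (NumPy; the dimensionality, class count, and noise level are illustrative assumptions) computes HDC similarity scores as a single noisy crossbar-style vector–matrix product and checks that the correct class prototype is still recovered:

```python
import numpy as np

rng = np.random.default_rng(0)
D, C = 10_000, 10                     # hypervector dimension, number of classes

prototypes = rng.choice([-1, 1], size=(D, C))      # class prototypes as columns
query = prototypes[:, 3].copy()                    # a query belonging to class 3
query[rng.random(D) < 0.10] *= -1                  # corrupt 10% of its components

# Map the similarity search to one analog VMM: columns hold prototype
# "conductances", the query drives the rows, and each column current is a
# dot-product similarity score. Multiplicative noise models analog variability.
noisy_prototypes = prototypes * (1 + 0.1 * rng.standard_normal(prototypes.shape))
scores = query @ noisy_prototypes / D

print(np.round(scores, 3))
print("predicted class:", int(np.argmax(scores)))  # still class 3
```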

The next phase of research will likely focus on end-to-end co-simulation frameworks that integrate device-level variability, circuit-level scheduling, and algorithm-level learning dynamics. Bridging CiM and HDC not only offers a pathway to sustainable low-power inference but also redefines how noise and approximation can be leveraged as computational resources.


4. Future Research Directions

4.1 Unified Simulation and Compiler Frameworks

A key barrier to reproducibility and adoption in CiM research is the absence of standardized software infrastructure. Existing compilers such as those used in ISAAC [1] and PUMA [2] provide functional mapping from neural graphs to analog tiles, but lack integration with hardware-aware training or dynamic calibration. Future frameworks should unify analog device models, crossbar scheduling, and training loops within a single stack—enabling end-to-end co-simulation of energy, accuracy, and throughput.

Such frameworks could incorporate parametric noise models drawn from experimental characterization data [6, 7] and expose these parameters to the training process, effectively teaching models to adapt to hardware variability. Integrating compiler abstractions for converter sharing and precision scaling, inspired by ReHarvest [10], would permit automated exploration of ADC/DAC configurations under power and accuracy constraints. A unified, open-source toolchain would therefore accelerate cross-comparisons and lower the entry barrier for algorithm–hardware co-design.

4.2 Cross-Layer Algorithm–Hardware Co-Design

The next generation of analog accelerators must transcend device- or architecture-centric optimization and instead pursue cross-layer learning loops. Hardware-aware training [8, 9] provides a starting point, but future systems should adapt online to non-idealities such as conductance drift or temperature fluctuations. One promising direction is closed-loop calibration, in which analog feedback signals adjust training gradients or precision scheduling in real time.

On the algorithmic side, HDC offers principles for constructing inherently noise-tolerant representations [3, 5]. Embedding HDC-style encoding and similarity operations within CiM architectures could create resilient hybrid models that degrade gracefully under analog noise. Research should explore how HDC's statistical error-correction properties interact with device-level variability and whether such methods can serve as a regularization mechanism for analog neural networks. Ultimately, effective co-design will require rethinking the interface between compiler, runtime, and learning algorithm so that precision and noise become first-class optimization variables rather than fixed constraints.

4.3 Standardized Evaluation and Benchmarking

Meaningful comparison across analog platforms remains difficult due to inconsistent reporting of precision, noise, and energy metrics. Following the methodology of Haensch et al. [6], the community would benefit from benchmark suites that specify device-to-system metrics—e.g., energy per multiply–accumulate (pJ/MAC), effective bit precision, and drift parameters—under standardized workloads. Benchmarks should include both conventional deep-learning tasks and HDC workloads that stress robustness under quantization and noise [13].

Complementary to technical metrics, reporting should incorporate sustainability indicators such as joules per inference, embodied carbon, and lifetime energy amortization. Establishing such cross-layer benchmarks would allow CiM and HDC researchers to quantify real-world efficiency gains and track progress toward climate-aligned computing goals.
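
As an example of such an indicator, joules per inference follows directly from a model's MAC count and the platform's energy per MAC; the helper below makes the arithmetic explicit, with an assumed workload size rather than a measured figure:

```python
def joules_per_inference(macs_per_inference, pj_per_mac):
    """Energy per inference in joules, ignoring converter and peripheral
    overheads (those must be added for a complete budget)."""
    return macs_per_inference * pj_per_mac * 1e-12

# Assumed example: a 2-GMAC vision model at 0.5 pJ/MAC
print(joules_per_inference(2e9, 0.5))   # -> 0.001 J = 1 mJ per inference
```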

4.4 Toward Sustainable, Large-Scale Deployment

Scaling analog-in-memory computing beyond laboratory prototypes requires addressing manufacturability, endurance, and integration with existing digital ecosystems. Demonstrations such as the analog attention engine for LLMs [11] show that hybrid analog–digital architectures can already support transformer-level workloads, but deploying them at datacenter scale will require adaptive control of thermal budgets, error correction, and online recalibration.

On the societal front, the motivation for energy-efficient computation has never been clearer: AI workloads now account for a rapidly growing share of global datacenter electricity use. Mature CiM–HDC co-designed systems could deliver orders-of-magnitude reductions in energy and water consumption, mitigating environmental impact while enabling edge intelligence under stringent power constraints. Achieving this vision will depend on open collaboration between device engineers, system architects, and algorithm designers to transform analog computing from a research curiosity into a cornerstone of sustainable AI infrastructure.

In summary, future progress in CiM and HDC will depend on integrating the physical, architectural, and algorithmic layers into a cohesive ecosystem. Unified simulators and compilers will make analog systems programmable; cross-layer learning loops will make them adaptive; standardized benchmarks will make them comparable; and sustainability-driven design will make them societally relevant. Together, these directions chart a path toward robust, efficient, and environmentally responsible ML at scale.


5. Conclusion

The convergence of CiM and HDC represents one of the most promising directions for sustainable ML. By collapsing data movement into the memory plane and embracing algorithmic noise tolerance, these paradigms jointly address the physical and computational inefficiencies that dominate modern AI workloads. Over the past decade, progress has transformed CiM from fixed-function prototypes into programmable analog accelerators and expanded HDC from cognitive theory into a hardware-ready framework resilient to low precision. Yet several obstacles remain: non-ideal analog behavior still limits accuracy, converter overhead constrains energy efficiency, and the absence of unified software stacks hinders large-scale adoption.

The next frontier will require integrating device-level realism, system-level scheduling, and algorithm-level robustness into a cohesive design philosophy. Co-simulation environments, analog-aware compilers, and standardized benchmarking will enable reproducible progress, while hybrid CiM–HDC systems can redefine noise not as a defect but as a resource for efficient computation. Ultimately, advancing this field is about more than architectural optimization—it is about aligning ML’s computational growth with the planet’s finite energy resources. If realized, CiM–HDC co-design could enable the next generation of intelligent systems that are not only fast and capable, but also fundamentally sustainable.


References


  1. Shafiee, A., Nag, A., Muralimanohar, N., Balasubramonian, R., Strachan, J. P., Hu, M., Williams, R. S., & Srikumar, V. (2016). ISAAC: A convolutional neural network accelerator with in-situ analog arithmetic in crossbars. ISCA, 14–26. https://doi.org/10.1145/3007787.3001139 

  2. Ankit, A., El Hajj, I., Chalamalasetti, S. R., Ndu, G., Foltin, M., Williams, R. S., Faraboschi, P., Hwu, W.-M. W., Strachan, J. P., Roy, K., & Milojicic, D. S. (2019). PUMA: A programmable ultra-efficient memristor-based accelerator for machine learning inference. ASPLOS, 715–731. https://doi.org/10.1145/3297858.3304049 

  3. Kleyko, D., Davies, M., Frady, E. P., Kanerva, P., Kent, S. J., Olshausen, B. A., Osipov, E., Rabaey, J. M., Rachkovskij, D. A., Rahimi, A., & Sommer, F. T. (2022). Vector symbolic architectures as a computing framework for emerging hardware. Proceedings of the IEEE, 110(10), 1538–1571. https://doi.org/10.1109/JPROC.2022.3209104 

  4. Kleyko, D., Rachkovskij, D. A., Osipov, E., & Rahimi, A. (2022). A survey on hyperdimensional computing aka vector symbolic architectures, part I: Models and data transformations. ACM Computing Surveys, 55(6), Article 130, 1–40. https://doi.org/10.1145/3538531 

  5. Karunaratne, G., Le Gallo, M., Cherubini, G., Benini, L., Rahimi, A., & Sebastian, A. (2020). In-memory hyperdimensional computing. Nature Electronics, 3, 327–337. https://doi.org/10.1038/s41928-020-0410-3 

  6. Haensch, W., Raghunathan, A., Roy, K., Chakrabarti, B., Phatak, C. M., Wang, C., & Guha, S. (2023). Compute-in-memory with non-volatile elements for neural networks: A review from a co-design perspective. Advanced Materials, 35(37), e2204944. https://doi.org/10.1002/adma.202204944 

  7. Soliman, T., Chatterjee, S., Laleni, N., Müller, F., Kirchner, T., Wehn, N., Kämpfe, T., Chauhan, Y. S., & Amrouch, H. (2023). First demonstration of in-memory computing crossbar using multi-level cell FeFET. Nature Communications, 14, 6348. https://doi.org/10.1038/s41467-023-42110-y 

  8. Rasch, M. J., Mackin, C., Le Gallo, M., Chen, A., Fasoli, A., Odermatt, F., Li, N., Nandakumar, S. R., Narayanan, P., Tsai, H., Burr, G. W., Sebastian, A., & Narayanan, V. (2023). Hardware-aware training for large-scale and diverse deep learning inference workloads using in-memory computing-based accelerators. Nature Communications, 14, 5282. https://doi.org/10.1038/s41467-023-40770-4 

  9. Rasch, M. J., Carta, F., Fagbohungbe, O., & Gokmen, T. (2024). Fast and robust analog in-memory deep neural network training. Nature Communications, 15, 7133. https://doi.org/10.1038/s41467-024-51221-z 

  10. Xu, J., Liu, H., Duan, Z., Liao, X., Jin, H., Yang, X., Li, H., Liu, C., Mao, F., & Zhang, Y. (2024). ReHarvest: An ADC resource-harvesting crossbar architecture for ReRAM-based DNN accelerators. ACM Transactions on Architecture and Code Optimization (TACO), 21(3), Article 63, 1–26. https://doi.org/10.1145/3659208 

  11. Leroux, N., Manea, P.-P., Sudarshan, C., Finkbeiner, J., Siegel, S., Strachan, J. P., & Neftci, E. (2025). Analog in-memory computing attention mechanism for fast and energy-efficient large language models. Nature Computational Science, 5, 813–824. https://doi.org/10.1038/s43588-025-00854-1 

  12. Lammie, C., Büchel, J., Vasilopoulos, A., Le Gallo, M., & Sebastian, A. (2025). The inherent adversarial robustness of analog in-memory computing. Nature Communications, 16, 1756. https://doi.org/10.1038/s41467-025-56595-2 

  13. Kleyko, D., Rachkovskij, D. A., Osipov, E., & Rahimi, A. (2023). A survey on hyperdimensional computing aka vector symbolic architectures, part II: Applications, cognitive models, and challenges. ACM Computing Surveys, 55(9), Article 175, 1–52. https://doi.org/10.1145/3558000