Project 1 – Heterogeneous and Disaggregated Memory
Emerging applications such as large machine learning model training and graph analytics have working sets in the terabyte range and exhibit poor data reuse. Moreover, the cost of DRAM increasingly dominates the overall cost of a computing node, especially in cloud computing environments. Reliability and power consumption of the memory subsystem continue to be challenges, especially as we go to 1 nm scale device geometries. At the same time, there are exciting new developments such as CXL, photonic interconnects, and 3D/2.5D packaging, which enable rethinking the memory subsystem in new ways, beyond traditional hardware-managed cache hierarchies, to balance latency, bandwidth, power consumption, and reliability.
One consequence of these new developments is the growing semantic gap between the programmer’s view of data (allocation, deallocation, and movement between different types of memories) and the physical implementation of data movement in the computing system. Our overarching research goal is to bridge this semantic gap with new ISAs for data movement, hardware/software codesign of the data tiering mechanisms, and compiler/language support for programmers to specify the data movement granularity and needs of a given application. We are extending this to disaggregated systems and systems that can improve performance by scaling up as opposed to scaling out. We are benchmarking real-world applications on real hardware and developing models of architectural enhancements for hardware/software codesign in the gem5 simulation environment.
Collaborators: Prof. Jason Lowe-Power (CS Department, UC Davis)
Lead Students: Mark Hildebrand (Ph.D. at UC Davis, now at Intel), Julian T. Angeles (MS)
Funding Sources: Intel, NSF
Recent Published Papers
- Hildebrand, Mark, Julian T. Angeles, Jason Lowe-Power, and Venkatesh Akella. “A Case Against Hardware Managed DRAM Caches for NVRAM Based Systems.” In 2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 194-204. IEEE, 2021.
- Hildebrand, Mark, Jawad Khan, Sanjeev Trika, Jason Lowe-Power, and Venkatesh Akella. “AutoTM: Automatic tensor movement in heterogeneous memory systems using integer linear programming.” In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 875-890. 2020.
Project 2 – Secure HPC
We at LBNL/UC Davis are working on DESC (Data Enclaves for Secure Computing) to support emerging collaborative workflows that harness the data generated by sensors and instruments to enable next-generation DOE missions. The goal of DESC is end-to-end protection of data and the associated scientific IP (models, outputs of the models) – from the sensor to the edge node to the HPC facility and possibly back to the instruments in the field. The underlying workflows are complex both in terms of the heterogeneous infrastructure (5G networks, different types of sensors and computing hardware) and the need to support collaboration between untrusted parties across organizational boundaries. Furthermore, the data captured and generated by these workflows have complex and ad hoc ownership, usage, and sharing rights/privileges and must adhere to compliance and audit requirements. Our current work is focused on codesigning next-generation hardware/software to support DESC by extending the hardware-based memory isolation mechanisms underlying TEEs (trusted execution environments).
Collaborators: Dr. Sean Peisert (LBL and UC Davis), Prof. Jason Lowe-Power (CS Department, UC Davis)
Lead Student: Ayaz Akram (Ph.D.)
Funding Sources: DOE, LBL
Recent Published Papers
- Akram, Ayaz, Anna Giannakou, Venkatesh Akella, Jason Lowe-Power, and Sean Peisert. “Performance analysis of scientific computing workloads on general purpose TEEs.” In 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 1066-1076. IEEE, 2021.
- Akram, Ayaz, Venkatesh Akella, Sean Peisert, and Jason Lowe-Power. “Enabling design space exploration for RISC-V secure compute environments.” (2021).
- Lowe-Power, Jason, Venkatesh Akella, Matthew K. Farrens, Samuel T. King, and Christopher J. Nitta. “Position Paper: A case for exposing extra-architectural state in the ISA.” In Proceedings of the 7th International Workshop on Hardware and Architectural Support for Security and Privacy, pp. 1-6. 2018.
Project 3 – High Performance Computing Architecture
In this project we are investigating computer architecture for high-performance, scalable data analytics. The project has four aspects. First, we show that contention is the main bottleneck in the memory subsystem, especially in chiplet-based heterogeneous data-parallel computing systems. Second, we propose new ways of designing (partitioned) memory controllers that reduce contention by creating dedicated paths (enabled by wavelength division multiplexing) between the cores and DRAM banks. Third, we show how DRAM microarchitecture can be codesigned with electronics and optics to expose a massive amount of parallelism to the compute elements. In combination with the partitioned memory controller idea, this helps us develop a DRAM-based memory subsystem that has not only significantly higher bandwidth but also very low (and significantly more deterministic) latency. Finally, armed with this new memory subsystem, we are exploring scalable architectures for petascale graph analytics and disaggregated memory suitable for data-centric HPC applications.
Collaborators: Prof. Jason Lowe-Power (CS Department, UC Davis) and Prof. Ben Yoo (ECE Department, UC Davis)
Students: Marjan Fariborz (Ph.D.), Mahyar Samani (Ph.D.), Terry O’Neill (Ph.D.), Pouya Fotouhi (Ph.D., now at Nvidia)
Funding Sources: ARO
Recent Published Papers
- Fotouhi, Pouya, Marjan Fariborz, Roberto Proietti, Jason Lowe-Power, Venkatesh Akella, and S. J. Yoo. “HTA: A Scalable High-Throughput Accelerator for Irregular HPC Workloads.” In International Conference on High Performance Computing, pp. 176-194. Springer, Cham, 2021.
- Fariborz, Marjan, Mahyar Samani, Pouya Fotouhi, Roberto Proietti, Il-Min Yi, Venkatesh Akella, Jason Lowe-Power, Samuel Palermo, and S. J. Yoo. “LLM: Realizing Low-Latency Memory by Exploiting Embedded Silicon Photonics for Irregular Workloads.” In International Conference on High Performance Computing, pp. 44-64. Springer, Cham, 2022.