William Wang

I'm a researcher at Arm Research in Cambridge, where I lead the systems research on non-volatile memories. My current research focus is on addressing persistent memory programming challenges with the right architectural and microarchitectural support.

If you're interested in finding out more about my research area, an overview can be found in my recent keynote [slides|video] at PIRL'20. You can also find a few blog posts I wrote to introduce research publications on microarchitectural support, architectural support, and language support for persistent programming.

Selected Publications

HPCA'21, March 2021

The gem5 Simulator: Version 20.0+

Research Infrastructure

arXiv, September 2020

Relaxed Persist Ordering Using Strand Persistency

Addressing NVM Programming Challenges

ISCA'20, June 2020

Hardware Implementation of Strand Persistency

Addressing NVM Programming Challenges

NVMW'20, March 2020

Language support for memory persistency

Addressing NVM Programming Challenges

IEEE Micro Top Picks, May-June 2019

Strand Persistency

Addressing NVM Programming Challenges

NVMW'19, March 2019

Software Wear Management for Persistent Memories

Addressing NVM Endurance Challenges

FAST'19, February 2019

Quantifying the performance overheads of PMDK

Addressing NVM Programming Challenges

MEMSYS'18, October 2018

Persistency for synchronization-free regions

Addressing NVM Programming Challenges

PLDI'18, June 2018
MEMSYS'17, October 2017
MEMSYS'15, October 2015
IISWC'13, September 2013

Selected Patents


Processor Architecture, UK App No. 2105240.2

An apparatus and method are described for generating debug information. The apparatus has processing circuitry for executing a sequence of instructions that includes a plurality of debug information triggering instructions, and debug information generating circuitry for coupling to a debug port. On executing a given debug information triggering instruction, the processing circuitry is arranged to trigger the debug information generating circuitry to generate a debug information signal whose form is dependent on a control parameter specified by the given debug information triggering instruction. The generated debug information signal is output from the debug port for reference by a debugger. The control parameter is such that the form of the debug information signal enables the debugger to determine a state of the processing circuitry when the given debug information triggering instruction was executed.

April 2021

Systems and Methods for Defining and Enforcing Ordered Constraints

Processor Architecture, US17/129,515

In a particular implementation, a method includes: receiving, at a computing device, first and second instructions of a plurality of instructions obtained from a memory, where the first instruction corresponds to a preceding instruction of a second instruction, and where the second instruction corresponds to a succeeding instruction of the first instruction; determining a dependency of the first and second instructions; sending the first and second instructions to an issue queue of the computing device; executing, at the computing device, the first and second instructions; and completing, at the computing device, the first and second instructions.

December 2020

Draining Operation to Cause Store Data to be Written to Persistent Memory

Processor Microarchitecture, US17/069,057

An apparatus comprises a write buffer to buffer store requests issued by the processing circuitry, prior to the store data being written to at least one cache. Draining circuitry detects a draining trigger event having potential to cause loss of state stored in the at least one cache. In response to the draining trigger event, the draining circuitry performs a draining operation to identify whether the write buffer buffers any committed store requests requiring persistence, and when the write buffer buffers at least one committed store request requiring persistence, to cause the store data associated with the at least one committed store request to be written to persistent memory. This helps to eliminate barrier instructions from software, simplifying persistent programming and improving performance.

October 2020

Draining Operation for Draining Dirty Cache Lines to Persistent Memory

Processor Microarchitecture, UK App No. 2014440.8

September 2020

Hardware Bitvector Techniques

Processor Microarchitecture, US16/882,402

Various implementations described herein are related to a device having energy harvesting circuitry that experiences power failures. The device may include computing circuitry having a processor coupled to the energy harvesting circuitry. The processor may be configured to reduce a number of write operations to a log structure having a hardware bit-vector used by the computing circuitry to boost computational progress even with the power failures.

May 2020

Task-Aware Checkpointing for Intermittent Compute

Processor Microarchitecture, UK App No. 2001782.8

February 2020

Instruction ordering

Processor Architecture, US Patent No.: 10,956,166

March 2019

Method and apparatus for implementing lock-free data structures

Processor Architecture, US Patent No.: 10,866,890

November 2018

Robust Transactional Memory

Processor Microarchitecture, US Patent No.: 10,445,238

April 2018

Energy conservation for memory applications

Memory System, US15/927,478

March 2018

Multi-tier cache placement mechanism

Processor Microarchitecture, US Patent No.: 10,831,678

November 2017

Initialisation of a storage device

Memory System, US15/798,259

October 2017

Apparatus and method of handling caching of persistent data

Memory System, US Patent No.: 10,642,743

June 2017

Initialisation of a storage device

Memory System, UK Patent No.: 2561011, US Patent No.: 10,964,386

March 2017

Control of refresh operation for memory regions

Memory System, UK Patent No.: 2560968, US Patent No.: 10,847,204

March 2017

Non-volatile buffer for memory operations

Processor Microarchitecture, Memory System, US Patent No.: 10,719,236

November 2015

Research Interests

In addition to architectural and microarchitectural support for persistent memory, I've looked a bit into programming language and operating system support for persistent memory, and thought about tackling translation, protection and persistence holistically. Apart from systems research related to non-volatile memory, I've also looked into novel use cases of emerging non-volatile memories, ranging from off-chip to on-chip memory use cases, and from uses as memory to uses as analog computing devices for machine learning and bioinformatics.

Outside of memory, I'm interested in pushing up the boundary between hardware and software, e.g., from ISA to EDGE, just as non-volatile memory shifts up the boundary between volatility and non-volatility from memory/storage to caches/memory. You can find some of my thoughts in this direction in the short write-up (in Chinese).

In the past, I also led the project on accelerating genomics on Arm, from which the bit shuffle ISA extensions were accepted into SVE2. You can find the invited talk on Genomics at Arm that I gave at the Cambridge University Computer Laboratory, as well as a short abstract on Hardware Accelerators for Genomic Data Processing. In addition, I spent a couple of years working on the gem5 simulator around 2010, refactoring the memory subsystems and verifying the AArch64 ISA support (including getting Linux to boot on gem5 AArch64), alongside many system and CPU studies with the simulator.

Please get in touch if you are interested in exploring collaboration opportunities. At present, I'm very fortunate to work with talented PhD students Sivert Sliper and Dmitrii Ustiugov as their industrial supervisor, along with their academic supervisors.