I'm a researcher at Arm Research in Cambridge, where I lead the systems research on non-volatile memories. My current research focus is on addressing persistent memory programming challenges with the right architectural and microarchitectural support.
If you're interested in finding out more about my research area, you can find an overview in my keynote [slides|video] at PIRL'20 (October 2020), my invited talk [slides] at VCEW'21 (June 2021), my talk [slides] at Dagstuhl Seminar 21462 (November 2021), and my invited talk [slides] at the NANDA Workshop (September 2022). You can also find a few blog posts I wrote to introduce research publications on microarchitectural support, architectural support, and language support for persistent programming.
Architectural support for synchronization of asynchronous instructions in a dataflow triggered instruction set architecture.
An apparatus, method and computer program, the apparatus comprising processing circuitry to execute instructions, issue circuitry to issue the instructions for execution by the processing circuitry, and candidate instruction storage circuitry to store a plurality of condition-dependent instructions, each specifying at least one condition. The issue circuitry is configured to issue a given condition-dependent instruction in response to a determination or a prediction of the at least one condition specified by the given condition-dependent instruction being met, and when the given condition-dependent instruction is a sequence-start instruction, the issue circuitry is responsive to the determination or prediction to issue a sequence of instructions comprising the sequence-start instruction and at least one subsequent instruction.
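As a rough illustration of the issue behaviour described above, here is a minimal software sketch, not the patented design. It assumes a dictionary of named state values stands in for the machine state, models only the "determination" case (no prediction), and uses hypothetical names throughout (`Instr`, `Triggered`, `followers`, `issue_cycle`, `run_demo`).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, int]  # assumption: machine state as named integer values


@dataclass
class Instr:
    name: str
    action: Callable[[State], None]


@dataclass
class Triggered:
    """A condition-dependent instruction held in candidate storage."""
    instr: Instr
    condition: Callable[[State], bool]
    # Non-empty only for a sequence-start instruction (hypothetical encoding).
    followers: List[Instr] = field(default_factory=list)


def issue_cycle(candidates: List[Triggered], state: State) -> List[str]:
    """Issue the first candidate whose condition is met.

    A sequence-start instruction brings its subsequent instructions with it.
    Returns the names of the instructions issued in this cycle.
    """
    for cand in candidates:
        if cand.condition(state):
            issued = [cand.instr] + cand.followers
            for ins in issued:
                ins.action(state)
            return [ins.name for ins in issued]
    return []  # no condition met, nothing issues


def run_demo():
    state = {"ready": 1, "acc": 0}

    def add(n: int) -> Callable[[State], None]:
        def apply(s: State) -> None:
            s["acc"] += n
        return apply

    def clear_ready(s: State) -> None:
        s["ready"] = 0

    candidates = [
        # Ordinary condition-dependent instruction; its condition is not met here.
        Triggered(Instr("idle", lambda s: None), lambda s: s["acc"] > 100),
        # Sequence-start instruction followed by two subsequent instructions.
        Triggered(
            Instr("start", add(1)),
            lambda s: s["ready"] == 1,
            followers=[Instr("step2", add(10)), Instr("done", clear_ready)],
        ),
    ]
    first = issue_cycle(candidates, state)
    second = issue_cycle(candidates, state)
    return first, second, state
```

In `run_demo`, the first cycle issues the whole sequence because the `ready` condition holds, and the second cycle issues nothing because the sequence cleared `ready`. Real hardware would evaluate candidates in parallel and could issue speculatively on a predicted condition, which this sketch leaves out.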
An apparatus and method are described for generating debug information. The apparatus has processing circuitry for executing a sequence of instructions that includes a plurality of debug information triggering instructions, and debug information generating circuitry for coupling to a debug port. On executing a given debug information triggering instruction, the processing circuitry is arranged to trigger the debug information generating circuitry to generate a debug information signal whose form is dependent on a control parameter specified by the given debug information triggering instruction. The generated debug information signal is output from the debug port for reference by a debugger. The control parameter is such that the form of the debug information signal enables the debugger to determine a state of the processing circuitry when the given debug information triggering instruction was executed.
In a particular implementation, a method includes: receiving, at a computing device, first and second instructions of a plurality of instructions obtained from a memory, where the first instruction corresponds to a preceding instruction of a second instruction, and where the second instruction corresponds to a succeeding instruction of the first instruction; determining a dependency of the first and second instructions; sending the first and second instructions to an issue queue of the computing device; executing, at the computing device, the first and second instructions; and completing, at the computing device, the first and second instructions.
In addition to architectural and microarchitectural support for persistent memory, I've looked into programming language and operating system support for persistent memory, and thought about tackling translation, protection and persistence holistically. Apart from systems research related to non-volatile memory, I've also looked into novel use cases of emerging non-volatile memories, ranging from off-chip to on-chip memory use cases, and from uses as memory to uses as analog computing devices for machine learning and bioinformatics.
Outside of memory, I'm interested in pushing up the boundary between hardware and software, e.g., from RISC to EDGE, just as non-volatile memory shifts up the boundary between volatility and non-volatility from memory/storage to caches/memory. You can find some of my thoughts in this direction in the short write-up (in Chinese), and in my patents related to distributed memory spatial architectures such as the hybrid dataflow architecture with a triggered instruction set architecture. In addition to looking up and looking forward in computer architecture, in my spare time I also enjoy looking back at (and looking down the stacks of) past computer architectures. Here's an example collection of computer architectures that I curated and researched.
In the past, I also led the project on accelerating genomics on Arm, from which the bit shuffle ISA extensions were accepted into SVE2. You can find the invited talk on Genomics at Arm that I gave at the Cambridge University Computer Laboratory, as well as a short abstract on Hardware Accelerators for Genomic Data Processing. In addition, I spent a couple of years working on the gem5 simulator around 2010, refactoring the memory subsystems and verifying the AArch64 ISA support (including getting Linux to boot on gem5 AArch64), alongside many system and CPU studies with the simulator.
Please get in touch if you are interested in exploring collaboration opportunities. At present, I'm very fortunate to work with talented PhD students Sivert Sliper and Dmitrii Ustiugov as their industrial supervisor, along with their academic supervisors.