I'm a researcher at Arm Research in Cambridge, where I lead the systems research on non-volatile memories. My current research focus is on addressing persistent memory programming challenges with the right architectural and microarchitectural support.
If you're interested in finding out more about my research area, an overview can be found in my recent keynote [slides|video] at PIRL'20. You can also find a few blog posts I wrote to introduce research publications on microarchitectural support, architectural support, and language support for persistent programming.
An apparatus and method are described for generating debug information. The apparatus has processing circuitry for executing a sequence of instructions that includes a plurality of debug information triggering instructions, and debug information generating circuitry for coupling to a debug port. On executing a given debug information triggering instruction, the processing circuitry is arranged to trigger the debug information generating circuitry to generate a debug information signal whose form is dependent on a control parameter specified by the given debug information triggering instruction. The generated debug information signal is output from the debug port for reference by a debugger. The control parameter is such that the form of the debug information signal enables the debugger to determine a state of the processing circuitry when the given debug information triggering instruction was executed.
In a particular implementation, a method includes: receiving, at a computing device, first and second instructions of a plurality of instructions obtained from a memory, where the first instruction corresponds to a preceding instruction of a second instruction, and where the second instruction corresponds to a succeeding instruction of the first instruction; determining a dependency of the first and second instructions; sending the first and second instructions to an issue queue of the computing device; executing, at the computing device, the first and second instructions; and completing, at the computing device, the first and second instructions.
An apparatus comprises a write buffer to buffer store requests issued by the processing circuitry, prior to the store data being written to at least one cache. Draining circuitry detects a draining trigger event having potential to cause loss of state stored in the at least one cache. In response to the draining trigger event, the draining circuitry performs a draining operation to identify whether the write buffer buffers any committed store requests requiring persistence, and when the write buffer buffers at least one committed store request requiring persistence, to cause the store data associated with the at least one committed store request to be written to persistent memory. This helps to eliminate barrier instructions from software, simplifying persistent programming and improving performance.
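The draining operation described above can be sketched as a small software model. This is only an illustration under stated assumptions, not the hardware design itself: the names `StoreRequest`, `WriteBuffer`, and `drain`, the `committed` and `needs_persistence` flags, and the use of a dictionary to stand in for persistent memory are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StoreRequest:
    # Hypothetical fields; the abstract only says that some committed
    # store requests require persistence.
    address: int
    data: int
    committed: bool
    needs_persistence: bool


@dataclass
class WriteBuffer:
    entries: list = field(default_factory=list)

    def drain(self, persistent_memory: dict) -> int:
        """Model of the draining operation run on a draining trigger event.

        Committed stores that require persistence are written to
        persistent memory; every other entry stays in the buffer.
        Returns the number of stores that were drained.
        """
        drained = 0
        remaining = []
        for request in self.entries:
            if request.committed and request.needs_persistence:
                persistent_memory[request.address] = request.data
                drained += 1
            else:
                remaining.append(request)
        self.entries = remaining
        return drained


# Usage: only the first request is both committed and persistent.
persistent_memory = {}
write_buffer = WriteBuffer([
    StoreRequest(0x10, 1, committed=True, needs_persistence=True),
    StoreRequest(0x20, 2, committed=False, needs_persistence=True),
    StoreRequest(0x30, 3, committed=True, needs_persistence=False),
])
drained_count = write_buffer.drain(persistent_memory)
```

In this model software never issues an explicit barrier: the drain is driven entirely by the trigger event, which is the property the abstract highlights.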
Various implementations described herein are related to a device having energy harvesting circuitry that experiences power failures. The device may include computing circuitry having a processor coupled to the energy harvesting circuitry. The processor may be configured to reduce a number of write operations to a log structure having a hardware bit-vector used by the computing circuitry to boost computational progress even with the power failures.
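The abstract does not say how the bit-vector reduces log writes. One plausible reading, which is purely an assumption here, is an undo log in which a bit per memory word records whether that word has already been logged since the last checkpoint, so repeated writes to the same word add only one log entry. The class `UndoLog` and its methods are hypothetical names for this sketch.

```python
class UndoLog:
    """Software model of an undo log filtered by a bit-vector (assumed design)."""

    def __init__(self, num_words: int):
        # One flag per memory word, standing in for the hardware bit-vector.
        self.logged = [False] * num_words
        # (address, old_value) pairs, assumed to live in non-volatile memory.
        self.entries = []

    def write(self, memory: list, address: int, value: int) -> None:
        # Log the old value only on the first write since the last checkpoint.
        if not self.logged[address]:
            self.entries.append((address, memory[address]))
            self.logged[address] = True
        memory[address] = value

    def checkpoint(self) -> None:
        # Progress is now safe: discard the log and clear the bit-vector.
        self.entries.clear()
        self.logged = [False] * len(self.logged)

    def rollback(self, memory: list) -> None:
        # After a power failure, restore old values in reverse order.
        for address, old_value in reversed(self.entries):
            memory[address] = old_value
        self.checkpoint()


# Usage: three writes touch two distinct words, so two log entries result.
memory = [0, 0, 0, 0]
log = UndoLog(len(memory))
log.write(memory, 1, 5)
log.write(memory, 1, 7)
log.write(memory, 2, 9)
entries_before_failure = len(log.entries)
log.rollback(memory)
```

The filter is what saves writes: without it, every store would append to the log, which is costly on a device that may lose power at any moment.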
In addition to architectural and microarchitectural support for persistent memory, I've looked a bit into programming language and operating system support for persistent memory, and thought about tackling translation, protection, and persistence holistically. Apart from systems research related to non-volatile memory, I've also looked into novel use cases of emerging non-volatile memories, ranging from off-chip to on-chip memory use cases, and from uses as memory to uses as analog computing devices for machine learning and bioinformatics.
Outside of memory, I'm interested in exploring how to push up the boundary between hardware and software, e.g., from ISA to EDGE, just as non-volatile memory shifts up the boundary between volatility and non-volatility from memory/storage to caches/memory. You can find some of my thoughts in this direction in the short write-up (in Chinese).
In the past, I also led the project on accelerating genomics on Arm, a project in which the bit shuffle ISA extensions were accepted into SVE2. You can find the invited talk on Genomics at Arm that I gave at the Cambridge University Computer Laboratory, as well as a short abstract on Hardware Accelerators for Genomic Data Processing. In addition, I spent a couple of years around 2010 working on the gem5 simulator, refactoring the memory subsystems and verifying the AArch64 ISA support (including getting Linux to boot on gem5 AArch64), alongside many system and CPU studies with the simulator.
Please get in touch if you are interested in exploring collaboration opportunities. At present, I'm very fortunate to work with talented PhD students Sivert Sliper and Dmitrii Ustiugov as their industrial supervisor, along with their academic supervisors.