The day school / workshop on language and run-time support for concurrent systems was a great success, with nearly 40 researchers from the UK, USA, Belgium and Spain. Transactional memory was a particular focus of the day, with presentations from leading researchers in this field, such as Tim Harris (Microsoft Research Cambridge) and Tony Hosking (Purdue University).

Don't forget ISMM 2009 in Dublin. The call for papers is here.

The workshop was organised by MM-NET, a network of researchers interested in memory management for programming languages. The project's mission is to strengthen collaboration between UK industry and academia to further the research and development of advanced memory management systems. MM-NET's previous workshops have been highly successful in this respect, bringing together participants ranging from researchers working on logics for reasoning about memory use to developers of embedded systems.

We are very grateful to Microsoft Research Cambridge, UK, who kindly hosted the workshop and provided lunch and coffee/tea, and to the EPSRC for supporting Tony Hosking's visit to the UK.

Programme

If you haven't yet sent Richard your slides, please do so asap.

08:30 Tea/Coffee/Biscuits available for early arrivals
08:45 Introduction
Session 1: Programming language abstractions (chair: Ian Watson)
09:00 Tony Hosking, Purdue U Open nested transaction abstractions for fine-grained concurrency
09:20 Alastair Reid, ARM System on Chip C (SoC-C): Efficient programming abstractions for heterogeneous multicore Systems on Chip
09:40 Ferad Zyulkyarov, Barcelona Atomic Quake - Use Case of Transactional Memory in an Interactive Multiplayer Game Server
10:00 Discussion
10:30 Break
Session 2: Haskell (chair: John Reppy)
11:00 Simon Marlow, MSR Comparing and Optimising Parallel Haskell Implementations on Multicore
11:20 Neil Brown, U. Kent STM implementation of CHP
11:40 Nehir Sonmez, Barcelona Profiling STM Applications in Haskell
12:00 Discussion
12:30 Lunch
Session 3: Managing contention (chair: Tony Hosking)
14:00 Tim Harris, MSR Transactional memory with strong atomicity using off-the-shelf memory protection hardware
14:20 Mohammad Ansari, U. Manchester Steal-on-abort: Improving Transactional Memory Performance through Dynamic Transaction Reordering
14:40 Carl Ritson, U. Kent Multi-core scheduling for light-weight communicating processes
15:00 Discussion
15:30 Break
Session 4: Hardware (chair: Richard Jones)
16:00 Stephan Diestelhorst, AMD AMD's Advanced Synchronization Facility
16:20 Mikel Lujan, U. Manchester Object Based TM
16:40 Sutirtha Sanyal, Barcelona Accelerating Hardware Transactional Memory (HTM)
17:00 Discussion
17:30 Wrap up


Abstracts

Tony Hosking Open nested transaction abstractions for fine-grained concurrency
We are seeing many proposals supporting atomic transactions in programming languages, software libraries, and hardware, some with and some without support for nested transactions. In the long run, it is important to support nesting, and to go beyond closed nesting to open nesting. I will argue what general form open nesting should take and why, namely that it is a property of classes (data types), not code regions, and must include support for programmed concurrency control as well as programmed rollback. I will also touch on the implications for software and hardware transactional memory of supporting open nesting of this kind, and briefly describe a prototype implementation for Java that we are developing as a way-point for further research. [pdf]
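To illustrate the idea of programmed rollback in open nesting, here is a minimal Python sketch (a toy model, not Hosking's Java prototype): an open-nested operation commits its effect immediately and registers a compensating action that runs only if the enclosing transaction later aborts. The `Transaction` API below is entirely hypothetical.

```python
# Toy sketch of open nesting with programmed rollback (hypothetical API).
# An open-nested action takes effect at once; if the outer transaction
# aborts, its registered compensations are run in reverse order.

class Aborted(Exception):
    pass

class Transaction:
    def __init__(self):
        self.compensations = []   # undo actions for open-nested commits

    def open_nested(self, action, compensation):
        action()                              # effect is visible immediately
        self.compensations.append(compensation)

    def abort(self):
        for undo in reversed(self.compensations):
            undo()                            # programmed rollback
        raise Aborted()

counter = {"n": 0}
tx = Transaction()
tx.open_nested(lambda: counter.update(n=counter["n"] + 1),
               lambda: counter.update(n=counter["n"] - 1))
try:
    tx.abort()            # the outer transaction fails
except Aborted:
    pass
# counter["n"] is back to 0: the increment was compensated, not buffered
```

The point of the sketch is that rollback is a semantic, per-data-type operation (here, a decrement) rather than a restoration of raw memory state.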

Alastair Reid System on Chip C (SoC-C): Efficient programming abstractions for heterogeneous multicore Systems on Chip
The architectures of system-on-chip (SoC) platforms found in high-end consumer devices are getting more and more complex as designers strive to deliver increasingly compute-intensive applications on near-constant energy budgets. Workloads running on these platforms require the exploitation of heterogeneous parallelism and increasingly irregular memory hierarchies. The conventional approach to programming such hardware is very low-level but this yields software which is intimately and inseparably tied to the details of the platform it was originally designed for, limiting the software's portability, and, ultimately, the architectural choices available to designers of future platform generations. The key insight of this paper is that many of the problems experienced in mapping applications onto SoC platforms come not from deciding how to map a program onto the hardware but from the need to restructure the program and the number of interdependencies introduced in the process of implementing those decisions. We tackle this complexity with a set of language extensions which allows the programmer to introduce pipeline parallelism into sequential programs, manage distributed memories, and express the desired mapping of tasks to resources. The compiler takes care of the complex, error-prone details required to implement that mapping. We demonstrate the effectiveness of SoC-C and its compiler with a "software defined radio" example (the PHY layer of a Digital Video Broadcast receiver) achieving a 3.4x speedup on 4 cores. [ppt,paper]

Tim Harris Transactional memory with strong atomicity using off-the-shelf memory protection hardware
I'll introduce a new way to provide "strong atomicity" in an implementation of atomic blocks using transactional memory. Strong atomicity lets us offer clear semantics to programs, even if they access the same locations inside and outside atomic blocks. It also avoids differences between hardware-implemented transactions and software-implemented ones. Our new idea is to use off-the-shelf page-level memory protection hardware to detect conflicts between normal memory accesses and transactional ones. The page-level system ensures correctness but gives poor performance because of the costs of manipulating memory protection hardware from user-mode and the costs of synchronizing protection settings between processors or cores. However, in practice, we show how a combination of careful object placement and dynamic code update allow us to eliminate almost all of the protection changes. Existing implementations of strong atomicity in software rely on detecting conflicts by conservatively treating some non-transacted accesses as short transactions. In contrast, our page-level technique provides a foundation that lets us be less conservative about how non-transacted accesses are treated; we avoid changes to non-transacted code until a possible conflict is detected dynamically, and we can respond to phase changes where a given instruction sometimes generates conflicts and sometimes does not. We evaluate our implementation with C# versions of many of the STAMP benchmarks. Our implementation requires no changes to the operating system. [pptx]
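As a rough model of the mechanism described above, here is a toy Python sketch (not the actual implementation, which uses real memory-protection hardware): pages touched by a transaction are marked protected, and a plain, non-transactional access to a protected page traps, signalling a conflict.

```python
# Toy model of page-level conflict detection between transactional and
# non-transactional accesses. A real system would set hardware page
# protections and catch the resulting access fault; here a set of page
# numbers stands in for the protection table.

PAGE = 4096

class ConflictTrap(Exception):
    pass

class PageTable:
    def __init__(self):
        self.protected = set()

    def tx_access(self, addr):
        self.protected.add(addr // PAGE)   # transaction claims the page

    def plain_access(self, addr):
        if addr // PAGE in self.protected: # would be a hardware fault
            raise ConflictTrap(hex(addr))

pt = PageTable()
pt.tx_access(0x1234)         # transactional access inside an atomic block
pt.plain_access(0x9000)      # different page: no conflict
conflict = False
try:
    pt.plain_access(0x1000)  # same page as 0x1234: trapped
except ConflictTrap:
    conflict = True
```

Detection is conservative at page granularity, which is why the talk's techniques for object placement matter: unrelated data sharing a page with transactional data would otherwise trap spuriously.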

Simon Marlow Comparing and Optimising Parallel Haskell Implementations on Multicore
We investigate the differences and tradeoffs imposed by different implementations of two parallel Haskell dialects running on a multicore machine. The GpH and Eden dialects of Haskell are both constructed using the highly-optimising sequential GHC compiler, and share a common code base. We consider implementations of both dialects using physically shared-memory and message-passing primitives on a commodity eight-core machine, reporting for the first time on a new shared-memory implementation of Eden, and providing new comparative performance results for all four systems. Since the physically shared-memory implementation of GpH is still immature, our testing has revealed several areas for improvement. We evaluate some of those improvements using our benchmarks, and suggest other improvements for future work. [pptx]

Neil Brown STM implementation of CHP
This talk will outline a "process-oriented" concurrency model centred around encapsulated processes communicating over synchronous channels. We will explain how the ability to choose between communicating on different channels can be very useful, including choosing between different conjoined sets of channel communications. We will give an overview of how this choice has been implemented with Software Transactional Memory. [pdf]
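The choice the talk describes can be illustrated with a toy Python sketch (an assumed model, not the CHP implementation, which uses Haskell's STM): "choice" atomically commits to whichever channel has a communication ready, and would retry if none do.

```python
# Toy sketch of choice ("alt") over channels. An STM implementation runs
# this inside a transaction and calls 'retry' when no channel is ready,
# blocking until one of them changes; here we simply return None.

from collections import deque

def try_alt(channels):
    # Attempt each communication in turn; commit the first that is ready.
    for name, chan in channels:
        if chan:
            return name, chan.popleft()
    return None          # an STM implementation would 'retry' here

a, b = deque(), deque()
b.append("msg")
result = try_alt([("a", a), ("b", b)])
# picks channel "b", the only one with a value ready
```

The appeal of STM here is that the whole choice, including conjoined sets of communications, is one atomic step, so no channel is half-committed when the transaction retries.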

Carl Ritson Multi-core scheduling for light-weight communicating processes
With the growing ubiquity of multi-processor computer systems, developers are exploring alternative development paradigms to leverage hardware parallelism. Process-oriented programming, in which a program is designed as a network of communicating component processes, provides one option for developing scalable concurrent software. At the University of Kent, the CoSMoS project applies process-oriented programming via the occam-pi language to develop large agent-based simulations designed to capture complex emergent behaviour. These real-time interactive simulations involve thousands of agent processes executing concurrently. In this talk we discuss our run-time kernel for the occam-pi language, which allows us to efficiently schedule these large numbers of communicating processes on commodity multi-core computer systems. [pdf]

Sutirtha Sanyal Accelerating Hardware Transactional Memory (HTM)
Transactional Memory (TM) is an emerging concept which promises to make parallel programming easier compared to earlier lock-based approaches. However, to be efficient, the underlying TM system should protect only truly shared data and leave thread-local data out of the transaction. This paper proposes a scheme, in the context of a lazy-lazy Hardware Transactional Memory (HTM) system, to dynamically identify variables which are local to a thread and exclude them from the read-set and write-set of the transaction. To achieve this, we propose modest micro-architectural changes and modifications to the virtual memory management unit of the operating system kernel. Two broad categories of local variables are identified and excluded: local variables residing in the thread-private stack, and local variables which are dynamically allocated in the heap but used only within a single thread. For evaluation we have implemented a lazy-lazy model of HTM, in line with TCC, in a full system simulator. We modified the 2.6.13 version of the Linux kernel for our experiments. We observed a significant speedup on all benchmarks, since committing filtered commit-sets over the shared bus saves a significant amount of interconnect bandwidth. With this minimal protection feature we obtained an average speed-up of 1.24x on the standardized STAMP benchmarks. [ppt]
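The filtering idea can be sketched in a few lines of Python (an illustrative toy with made-up address ranges, not the paper's micro-architectural mechanism): accesses falling inside a thread's private stack range are simply never recorded, so they cannot generate conflicts or enlarge the commit-set.

```python
# Toy sketch of excluding thread-local accesses from a transaction's
# write-set. STACK_LO/STACK_HI are a hypothetical per-thread stack range;
# the real scheme identifies such ranges via the virtual memory system.

STACK_LO, STACK_HI = 0x7000_0000, 0x7001_0000   # hypothetical stack range

def record_write(writeset, addr):
    if STACK_LO <= addr < STACK_HI:
        return              # thread-local: no conflict possible, skip it
    writeset.add(addr)      # shared data: must be tracked and committed

ws = set()
record_write(ws, 0x7000_8000)   # stack variable: filtered out
record_write(ws, 0x1000_0000)   # heap location shared between threads: kept
# only the shared location remains in the write-set
```

A smaller write-set means less data broadcast on the shared bus at commit time, which is where the reported speedup comes from.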

Ferad Zyulkyarov Atomic Quake - Use Case of Transactional Memory in an Interactive Multiplayer Game Server
This talk makes the first attempt to present an experience of using transactions in a rich and complex parallel application: a parallel version of the multiplayer Quake server. Compared to the TM workloads used so far, Atomic Quake exhibits irregular parallelism and includes I/O, system calls, error handling, and instances of privatization. Inside complex transactions there are function calls, memory management, and nested transactions. This presentation focuses on the new principles emerging in parallel programming with the use of transactions, and reports on the challenges and the programmer effort required to transactify Quake. [pptx,ppt,pdf]

Nehir Sonmez Profiling STM Applications in Haskell
We present the profiling work that we have been doing for Haskell STM. We discuss certain metrics that were proposed, and present our results on atomic-block-based profiling and its merits. [ppt]

Mikel Lujan Object Based TM
Transactional Memory (TM) can enable multi-core hardware that dispenses with conventional bus-based cache coherence, resulting in simpler and more extensible systems. This is increasingly important as we move into the many-core era. Within TM, however, the processes of conflict detection and committing still require synchronization and the broadcast of data. By increasing the granularity of when synchronization is required, the demands on communication are reduced. Software implementations of TM have taken advantage of the fact that the object structure of data can be employed to further raise the level at which interference is observed. This talk describes the first hardware TM approach where the object structure is recognized and harnessed. This leads to novel commit and conflict detection mechanisms, and also to an elegant solution to the virtualization of version management, without the need for additional software TM support. [pdf]
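The granularity shift described above can be sketched in Python (a toy model of the conflict-detection rule, not the hardware design): read and write sets hold object identities rather than cache lines, and two transactions conflict only if one writes an object the other has read or written.

```python
# Toy sketch of object-granularity conflict detection. Each transaction is
# modelled as a (readset, writeset) pair of object identities; the standard
# TM conflict rule is applied at the object level.

def conflicts(tx_a, tx_b):
    ra, wa = tx_a
    rb, wb = tx_b
    # a write in one transaction conflicts with any access to the same
    # object in the other
    return bool(wa & (rb | wb) or wb & ra)

t1 = ({"account1"}, {"account2"})   # reads account1, writes account2
t2 = ({"account3"}, {"account3"})   # touches only account3
t3 = ({"account2"}, set())          # reads account2

# t1 and t2 touch disjoint objects, so they commute; t1 writes an object
# that t3 reads, so they conflict.
```

Detecting interference per object rather than per cache line is what reduces the synchronization and broadcast traffic the abstract refers to.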

Mohammad Ansari Steal-on-abort: Improving Transactional Memory Performance through Dynamic Transaction Reordering
In Transactional Memory (TM), aborted transactions waste computing resources and reduce performance. Ideally, concurrent execution of transactions should be optimally ordered to minimize aborts, but such an ordering is often either complex or infeasible to obtain. This talk presents a new technique called steal-on-abort, which aims to improve transaction ordering at runtime. When a transaction is aborted, it is typically restarted immediately. However, due to close temporal locality, the immediately restarted transaction may repeat its conflict with the same transaction that aborted it the first time. In steal-on-abort, the aborted transaction is stolen by its opponent transaction, and queued behind it, thus trying to break the close temporal locality. It operates at runtime, and requires no application-specific information or offline pre-processing. [ppt]
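The core queue manipulation is simple enough to sketch in Python (a toy scheduler with invented names, not the paper's implementation): rather than restarting immediately, the aborted transaction is moved to the run queue of the thread whose transaction aborted it, so it executes only after its opponent has finished.

```python
# Toy sketch of steal-on-abort. Each worker thread has a queue of pending
# transactions; on abort, the victim is "stolen" by the opponent's thread
# and queued behind the opponent, breaking the repeated-conflict pattern.

from collections import deque

queues = {"threadA": deque(["tx1"]), "threadB": deque(["tx2"])}

def steal_on_abort(victim_tx, victim_thread, opponent_thread):
    queues[victim_thread].remove(victim_tx)    # don't restart immediately
    queues[opponent_thread].append(victim_tx)  # queue behind the opponent

# tx1 (on threadA) is aborted by tx2 (on threadB):
steal_on_abort("tx1", "threadA", "threadB")
# threadB now runs tx2 to completion first, then the stolen tx1
```

Because the reordering is driven purely by observed aborts, it needs no application-specific knowledge, matching the abstract's claim of no offline pre-processing.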

Stephan Diestelhorst AMD's Advanced Synchronization Facility
AMD has recently introduced the Advanced Synchronization Facility (ASF), an experimental microprocessor extension that enables the construction of short atomic sections in hardware. These atomic sections can be used to flexibly create atomic read-modify-write constructs, such as double compare-and-exchange, using only a slightly extended instruction set. The presentation will show the key components of ASF and will highlight recent related activities at AMD. [pdf]
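A double compare-and-exchange, the kind of construct the abstract says ASF atomic sections make easy to build, can be modelled in Python as follows (a semantic sketch only: a lock stands in for the hardware atomic section, and the memory/indexing scheme is invented):

```python
# Toy model of double compare-and-exchange (DCAS): both locations must
# hold their expected values for either to be updated, and the whole
# check-and-update is one atomic step. Hardware like ASF provides the
# atomicity directly; a lock simulates it here.

from threading import Lock

_atomic_section = Lock()

def dcas(mem, a, expect_a, new_a, b, expect_b, new_b):
    with _atomic_section:            # stands in for the ASF atomic section
        if mem[a] == expect_a and mem[b] == expect_b:
            mem[a], mem[b] = new_a, new_b
            return True
        return False

m = {0: 10, 1: 20}
ok = dcas(m, 0, 10, 11, 1, 20, 21)    # both locations match: succeeds
fail = dcas(m, 0, 10, 12, 1, 21, 22)  # m[0] is now 11, not 10: fails
```

Multi-word atomic updates like this are awkward to build from single-word compare-and-swap alone, which is the gap such hardware extensions target.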

Problems with this page?
Contact the mm-net webmaster
Last modified Thu 9 Oct 2008