The day school / workshop on language and run-time support for concurrent systems was a great success, with nearly 40 researchers from the UK, USA, Belgium and Spain. Transactional memory was a particular focus of the day, with presentations from leading researchers in this field, such as Tim Harris (Microsoft Research Cambridge) and Tony Hosking (Purdue University).
Don't forget ISMM 2009 in Dublin. The call for papers is
here.
The workshop was organised by MM-NET. MM-NET is a network
of researchers interested in memory management for programming
languages. The project's mission is to strengthen collaboration
between UK industry and academia to further research and development
of advanced memory management systems. MM-NET's previous workshops
have been highly successful in this respect, bringing together
researchers with interests ranging from logics for reasoning about
memory use to developers of embedded systems.
We are very grateful to Microsoft Research, Cambridge, UK, who kindly hosted the workshop, and provided lunch and coffee/tea, and to the EPSRC for supporting Tony Hosking's visit to the UK.
If you haven't yet sent Richard your slides, please do so asap.
| 8:30 | Tea/Coffee/Biscuits available for early arrivals. | |
| 8:45 | Introduction | |
| Session 1: Programming language abstractions (chair: Ian Watson) | ||
| 9:00 | Tony Hosking, Purdue U | Open nested transaction abstractions for fine-grained concurrency |
| 9:20 | Alastair Reid, ARM | System on Chip C (SoC-C): Efficient programming abstractions for heterogeneous multicore Systems on Chip |
| 9:40 | Ferad Zyulkyarov, Barcelona | Atomic Quake - Use Case of Transactional Memory in an Interactive Multiplayer Game Server |
| 10:00 | Discussion | |
| 10:30 | Break | |
| Session 2: Haskell (chair: John Reppy) | ||
| 11:00 | Simon Marlow, MSR | Comparing and Optimising Parallel Haskell Implementations on Multicore |
| 11:20 | Neil Brown, U. Kent | STM implementation of CHP |
| 11:40 | Nehir Sonmez, Barcelona | Profiling STM Applications in Haskell |
| 12:00 | Discussion | |
| 12:30 | Lunch | |
| Session 3: Managing contention (chair: Tony Hosking) | ||
| 2:00 | Tim Harris, MSR | Transactional memory with strong atomicity using off-the-shelf memory protection hardware |
| 2:20 | Mohammad Ansari, U. Manchester | Steal-on-abort: Improving Transactional Memory Performance through Dynamic Transaction Reordering |
| 2:40 | Carl Ritson, U. Kent | Multi-core scheduling for light-weight communicating processes |
| 3:00 | Discussion | |
| 3:30 | Break | |
| Session 4: Hardware (chair: Richard Jones) | ||
| 4:00 | Stephan Diestelhorst, AMD | AMD's Advanced Synchronization Facility |
| 4:20 | Mikel Lujan, U. Manchester | Object Based TM |
| 4:40 | Sutirtha Sanyal, Barcelona | Accelerating Hardware Transactional Memory (HTM) |
| 5:00 | Discussion | |
| 5:30 | Wrap up | |
Tony Hosking
Open nested transaction abstractions for fine-grained concurrency
We are seeing many proposals supporting atomic transactions in programming languages, software libraries, and hardware, some with and some without support for nested transactions. In the long run, it is important to support nesting, and to go beyond closed nesting to open nesting. I will argue as to the general form open nesting should take and why, namely that it is a property of classes (data types) not code regions, and must include support for programmed concurrency control as well as programmed rollback. I will also touch on the implications for software or hardware transactional memory in order to support open nesting of this kind, and briefly describe a prototype implementation for Java that we are developing as a way-point for further research.
[pdf]
Alastair Reid
System on Chip C (SoC-C): Efficient programming abstractions for heterogeneous multicore Systems on Chip
The architectures of system-on-chip (SoC) platforms found in high-end
consumer devices are getting more and more complex as designers strive
to deliver increasingly compute-intensive applications on near-constant
energy budgets. Workloads running on these platforms require the
exploitation of heterogeneous parallelism and increasingly irregular
memory hierarchies. The conventional approach to programming such
hardware is very low-level but this yields software which is intimately
and inseparably tied to the details of the platform it was originally
designed for, limiting the software's portability, and, ultimately, the
architectural choices available to designers of future platform
generations. The key insight of this paper is that many of the problems
experienced in mapping applications onto SoC platforms come not from
deciding how to map a program onto the hardware but from the need to
restructure the program and the number of interdependencies introduced
in the process of implementing those decisions. We tackle this
complexity with a set of language extensions which allows the programmer
to introduce pipeline parallelism into sequential programs, manage
distributed memories, and express the desired mapping of tasks to
resources. The compiler takes care of the complex, error-prone details
required to implement that mapping. We demonstrate the effectiveness of
SoC-C and its compiler with a "software defined radio" example (the PHY
layer of a Digital Video Broadcast receiver) achieving a 3.4x speedup on
4 cores. [ppt,paper]
Tim Harris
Transactional memory with strong atomicity using off-the-shelf memory protection hardware
I'll introduce a new way to provide "strong atomicity" in
an implementation of atomic blocks using transactional memory.
Strong atomicity lets us offer clear semantics to programs, even if
they access the same locations inside and outside atomic blocks.
It also avoids differences between hardware-implemented transactions
and software-implemented ones. Our new idea is to use
off-the-shelf page-level memory protection hardware to detect conflicts
between normal memory accesses and transactional ones. The
page-level system ensures correctness but gives poor performance
because of the costs of manipulating memory protection hardware
from user-mode and the costs of synchronizing protection settings
between processors or cores. However, in practice, we show how a
combination of careful object placement and dynamic code update
allows us to eliminate almost all of the protection changes. Existing
implementations of strong atomicity in software rely on detecting
conflicts by conservatively treating some non-transacted accesses
as short transactions. In contrast, our page-level technique provides
a foundation that lets us be less conservative about how non-transacted
accesses are treated; we avoid changes to non-transacted
code until a possible conflict is detected dynamically, and we can
respond to phase changes where a given instruction sometimes generates
conflicts and sometimes does not. We evaluate our implementation
with C# versions of many of the STAMP benchmarks.
Our implementation requires no changes to the operating system. [pptx]
Simon Marlow
Comparing and Optimising Parallel Haskell Implementations on Multicore
We investigate the differences and tradeoffs imposed by different
implementations of two parallel Haskell dialects running on a multicore
machine. The GpH and Eden dialects of Haskell are both constructed using
the highly-optimising sequential GHC compiler, and share a common code
base. We consider implementations of both dialects using physically shared-memory and
message-passing primitives on a commodity eight-core machine, reporting for the
first time on a new shared-memory implementation of Eden, and providing
new comparative performance results for all four systems.
Since the physically shared-memory implementation of GpH is still immature,
our testing has revealed several areas for improvement. We evaluate
some of those improvements using our benchmarks, and suggest other
improvements for future work. [pptx]
Neil Brown
STM implementation of CHP
This talk will outline a "process-oriented" concurrency model
centred around encapsulated processes communicating over synchronous
channels. We will explain how the ability to choose between communicating
on different channels can be very useful, including choosing between
different conjoined sets of channel communications. We will give an
overview of how this choice has been implemented with Software
Transactional Memory. [pdf]
Carl Ritson
Multi-core scheduling for light-weight communicating processes
With the growing ubiquity of multi-processor computer systems,
developers are exploring alternative development paradigms to
leverage hardware parallelism. Process-oriented programming, in
which a program is designed as a network of communicating component
processes, provides one option for developing scalable concurrent
software.
At the University of Kent, the CoSMoS project applies process-oriented
programming via the occam-pi language to develop large agent-based
simulations designed to capture complex emergent behaviour. These
real-time interactive simulations involve thousands of agent processes
executing concurrently. In this talk we discuss our run-time kernel
for the occam-pi language which allows us to efficiently schedule
these large numbers of communicating processes on commodity multi-core
computer systems. [pdf]
Sutirtha Sanyal
Accelerating Hardware Transactional Memory (HTM)
Transactional Memory (TM) is an emerging concept which promises
to make parallel programming easier than earlier lock-based approaches.
However, to be efficient, the underlying TM system should protect
only truly shared data and leave thread-local data out of the transaction.
This paper proposes a scheme in the context of a lazy-lazy Hardware Transactional Memory
(HTM) system to dynamically identify variables which are local to a thread and
exclude them from the readset and writeset of the transaction. To achieve
this, we propose modest micro-architectural changes and modifications
in the virtual memory management unit of the operating system kernel.
Two broad categories of local variables are identified and excluded: those residing in the
thread-private stack, and those dynamically allocated
in the heap but used only within a single thread.
For evaluation we have implemented a lazy-lazy model of HTM in line
with the TCC in a full system simulator. We modified the 2.6.13 version of
the Linux kernel for our experiments. We observed a significant speedup
on all benchmarks, because committing filtered commit-sets over the shared bus saves a significant
amount of interconnection bandwidth. With this minimal
protection feature we obtained an average speed-up of 1.24x on the standardized
STAMP benchmarks.
[ppt]
Ferad Zyulkyarov
Atomic Quake - Use Case of Transactional Memory in an Interactive Multiplayer Game Server
This talk makes a first attempt to report on the experience of using
transactions in a rich and complex parallel application: a parallel version of
the multiplayer Quake server. Compared to the TM workloads used so far, Atomic
Quake exhibits irregular parallelism and has I/O and system calls, error handling, and
instances of privatization. Inside complex transactions, there are function
calls, memory management and nested transactions. This presentation focuses on the
new principles emerging in parallel programming with the use of transactions. It
reports on the challenges and the programmer effort required to transactify
Quake. [pptx,ppt,pdf]
Nehir Sonmez
Profiling STM Applications in Haskell
We present the profiling work that we've been doing for Haskell
STM. We discuss certain metrics that were proposed and
present our results on atomic-block-based profiling and its
merits.
Mikel Lujan
Object Based TM
Transactional Memory (TM) can enable multi-core hardware that dispenses
with conventional bus-based cache coherence, resulting in simpler and
more extensible systems. This is increasingly important as we move into
the many-core era. Within TM, however, the processes of conflict
detection and committing still require synchronization and the broadcast
of data. By increasing the granularity of when synchronization is
required, the demands on communication are reduced. Software
implementations of TM have taken advantage of the fact that the object
structure of data can be employed to further raise the level at which
interference is observed.
This talk describes the first hardware TM approach where the object
structure is recognized and harnessed. This leads to novel commit and
conflict detection mechanisms, and also to an elegant solution to the
virtualization of version management, without the need for additional
software TM support. [pdf]
Mohammad Ansari
Steal-on-abort: Improving Transactional Memory Performance through Dynamic Transaction Reordering
In Transactional Memory (TM), aborted transactions waste computing
resources, and reduce performance. Ideally, concurrent execution of
transactions should be optimally ordered to minimize aborts, but such an
ordering is often either complex, or unfeasible, to obtain. This talk
presents a new technique called steal-on-abort, which aims
to improve transaction ordering at runtime. When a transaction is
aborted, it is typically restarted immediately. However, due to close
temporal locality, the immediately restarted transaction may repeat its
conflict with the same transaction that aborted it the first time. In
steal-on-abort, the aborted transaction is stolen by its opponent
transaction, and queued behind it, thus trying to break the close
temporal locality. It operates at runtime, and requires no
application-specific information or offline pre-processing.
[ppt]
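The queueing idea in this abstract can be illustrated with a toy simulation. The sketch below is not the speakers' implementation: the tick-based model, the rule that a newly started transaction always loses a conflict, and the names `simulate` and `WORKLOAD` are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical workload: one queue of (name, duration_in_ticks, data_touched)
# per worker. T1 and T2 both touch "x", so they conflict; T3 is independent.
WORKLOAD = [
    [("T1", 3, {"x"})],
    [("T2", 1, {"x"}), ("T3", 1, {"y"})],
]

def simulate(workload, steal_on_abort):
    """Run the workload and return (aborts, commit_order, ticks)."""
    queues = [deque(q) for q in workload]   # copy so the input stays reusable
    running = [None] * len(queues)          # per worker: [txn, ticks_remaining]
    aborts, committed, tick = 0, [], 0
    while any(queues) or any(running):
        # Idle workers try to start the transaction at the head of their queue.
        for w, q in enumerate(queues):
            if running[w] is not None or not q:
                continue
            txn = q.popleft()
            # Assumed conflict rule: the newcomer aborts if any transaction
            # running on another worker touches the same data.
            opponent = next(
                (o for o, r in enumerate(running)
                 if r is not None and o != w and r[0][2] & txn[2]),
                None,
            )
            if opponent is None:
                running[w] = [txn, txn[1]]
                continue
            aborts += 1
            if steal_on_abort:
                # The opponent's worker "steals" the victim, which will run
                # right after the opponent commits.
                queues[opponent].appendleft(txn)
            else:
                # Baseline: retry on the same worker at the next tick.
                q.appendleft(txn)
        # Advance every running transaction by one tick; commit finished ones.
        for w, r in enumerate(running):
            if r is not None:
                r[1] -= 1
                if r[1] == 0:
                    committed.append(r[0][0])
                    running[w] = None
        tick += 1
    return aborts, committed, tick
```

In this toy workload, calling `simulate(WORKLOAD, steal_on_abort=False)` records three aborts, because T2 keeps retrying while T1 is still running. With `steal_on_abort=True` there is a single abort, and the second worker goes on to run T3 instead of retrying T2.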
Stephan Diestelhorst
AMD's Advanced Synchronization Facility
AMD has recently introduced the Advanced Synchronization Facility
(ASF), an experimental microprocessor extension that enables the
construction of short atomic sections in hardware. These atomic
sections can be used to flexibly create atomic read-modify-write
constructs, such as double compare-and-exchange, using only a
slightly extended instruction set.
The presentation will show the key components of ASF and will
highlight recent related activities at AMD.
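For readers unfamiliar with double compare-and-exchange, the sketch below emulates its semantics only. It does not use or model ASF instructions: a global lock stands in for the atomicity that hardware would provide, and `Cell` and `dcas` are hypothetical names introduced for illustration.

```python
import threading

class Cell:
    """A mutable memory word (hypothetical helper for this sketch)."""
    def __init__(self, value):
        self.value = value

# Assumption: one global lock stands in for the hardware atomicity guarantee.
_guard = threading.Lock()

def dcas(a, b, expect_a, expect_b, new_a, new_b):
    """Double compare-and-exchange: update both cells only if both match.

    Returns True if the exchange happened, False if either cell held an
    unexpected value (in which case neither cell is modified).
    """
    with _guard:
        if a.value == expect_a and b.value == expect_b:
            a.value, b.value = new_a, new_b
            return True
        return False
```

A caller would typically read both cells, compute new values, and retry in a loop whenever `dcas` returns False, as with a single-word compare-and-swap.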
Last modified Thu 9 Oct 2008