Threading Overview

One of the goals of the Liberty platform is to scale well in modern multi-core environments. In support of this, work management, scheduling, and dispatching will be centralized in the threading and event services to allow for greater control over how work is distributed and executed within a server.

Managing Threads

In the classic application server, thread pools were everywhere. There were thread pools for the web container, thread pools for the ORB, thread pools for asynch beans, thread pools for JMX notification, thread pools for messaging, thread pools for DRS, thread pools for DCS, etc. This proliferation of thread pools led to a proliferation of threads which, in turn, led to significant resource consumption beyond what was actually needed.

In addition to the increased resource consumption, each of these pools attempted to manage its own work without any coordination with the others. This generally resulted in sub-optimal dispatching policies.

To help alleviate the issues (real or imagined) with having so many thread pools, components within Liberty are discouraged from (read that as "don't do it") explicitly creating threads or thread pools. The goal is to move components from a model where they "own" threads to a model where they submit tasks for execution and rely on the runtime to handle the mechanics.

Scheduler Implementation

The current implementation of the scheduler is a basic thread pool that employs work stealing. As with a standard thread pool, a global queue is available to hold work that is submitted for execution. In addition to that global queue, each thread in the pool maintains its own local pile of work. The scheduler is non-preemptive.

When work is submitted to the scheduler, the new work can either be added to the global work queue or pushed to the bottom of a double-ended queue owned by the submitting thread. Threads within the pool first look for work on their own work pile. If work is found, it is executed. If no work is found on the local work pile, the thread then looks for work on the global work queue. If no work is found there either, the thread looks at the other active threads in the pool and attempts to steal work from one of them (the victim).
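
The search order can be sketched roughly as follows. The class and field names are inventions for this example, and a ConcurrentLinkedDeque stands in for the lock-free work-stealing deque described in the next section.

    import java.util.List;
    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedDeque;

    // Illustrative sketch of the work-search order only; not the Liberty scheduler.
    class WorkerSketch implements Runnable {
        final ConcurrentLinkedDeque<Runnable> localPile = new ConcurrentLinkedDeque<>();
        private final Queue<Runnable> globalQueue;   // shared by the whole pool
        private final List<WorkerSketch> workers;    // potential steal victims

        WorkerSketch(Queue<Runnable> globalQueue, List<WorkerSketch> workers) {
            this.globalQueue = globalQueue;
            this.workers = workers;
        }

        @Override
        public void run() {
            while (!Thread.currentThread().isInterrupted()) {
                Runnable task = localPile.pollLast();        // 1. own work pile first
                if (task == null) {
                    task = globalQueue.poll();               // 2. then the global work queue
                }
                if (task == null) {
                    task = stealFromVictim();                // 3. finally, steal from a peer
                }
                if (task != null) {
                    task.run();
                } else {
                    Thread.yield();                          // nothing to do; back off briefly
                }
            }
        }

        private Runnable stealFromVictim() {
            for (WorkerSketch victim : workers) {
                if (victim != this) {
                    Runnable stolen = victim.localPile.pollFirst();  // thieves take the other end
                    if (stolen != null) {
                        return stolen;
                    }
                }
            }
            return null;
        }
    }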

The Work Stealing Deque

The data structure used to maintain the local work piles is a double-ended queue or deque. The thread that owns the deque adds and removes work from one end while thieves take it from the other end.

Since a deque is owned by a single thread, the push and pop operations can be done without any synchronization until the top and bottom of the list are within some threshold. When synchronization is required, it is done via an atomic compare-and-swap operation that is non-blocking.

There are three types of deques implemented in the threading code. Each of the implementations consists of a circular buffer to hold work, an index to the bottom of the list, and a composite object that holds the top index and the largest steal size.

Steal-Half
A Steal-Half deque is used to implement the local work pile. A compare-and-swap operation is used to change the steal range each time the size of the deque crosses a power-of-two boundary. The steal range represents which elements can be stolen by a thief. By changing the steal range at each power-of-two boundary, a thief is allowed to steal up to half of the elements in a deque at one time. Allowing larger steals lets workloads be balanced more quickly.
Steal-One
The classic work-stealing deque. This deque only allows a thief to take one element at a time but it allows the owner of the deque to push and pop work without any synchronization until it removes the last item.
Steal-All
A Steal-All deque is an implementation that does not allow the owner to pop items or steal from victims but it allows thieves to take all of the items on the deque in one shot. When in strict-stealing mode, this deque is used to implement the foreign deques as foreign threads will never run work destined for another thread pool.
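
As a rough illustration of the steal-one variant, the sketch below uses a fixed-capacity circular buffer, an owner-only bottom index, and a non-blocking compare-and-set on the top index. It omits the composite top object, steal ranges, buffer growth, and the memory-ordering refinements a production implementation needs; the class name is an invention for this example.

    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.concurrent.atomic.AtomicReferenceArray;

    // Greatly simplified steal-one deque sketch; not the Liberty implementation.
    final class StealOneDequeSketch<T> {
        private final AtomicReferenceArray<T> buffer;            // circular buffer of work
        private final AtomicInteger top = new AtomicInteger(0);  // thieves take from here
        private volatile int bottom = 0;                         // owner pushes and pops here

        StealOneDequeSketch(int capacity) {
            buffer = new AtomicReferenceArray<>(capacity);
        }

        // Owner only: push work onto the bottom of the deque.
        void push(T task) {
            int b = bottom;
            if (b - top.get() >= buffer.length()) {
                throw new IllegalStateException("deque is full; this sketch does not grow");
            }
            buffer.set(b % buffer.length(), task);
            bottom = b + 1;                                      // publish after the element is stored
        }

        // Owner only: pop the most recently pushed work.
        T pop() {
            int b = bottom - 1;
            bottom = b;
            int t = top.get();
            if (b < t) {                                         // deque was already empty
                bottom = t;
                return null;
            }
            T task = buffer.get(b % buffer.length());
            if (b > t) {
                return task;                                     // more than one element; no race with thieves
            }
            // Exactly one element left: race the thieves for it with compare-and-set.
            if (!top.compareAndSet(t, t + 1)) {
                task = null;                                     // a thief won the race
            }
            bottom = t + 1;
            return task;
        }

        // Thieves: take a single element from the top of the deque.
        T steal() {
            int t = top.get();
            int b = bottom;
            if (t >= b) {
                return null;                                     // nothing to steal
            }
            T task = buffer.get(t % buffer.length());
            if (!top.compareAndSet(t, t + 1)) {
                return null;                                     // lost the race to another thief or the owner
            }
            return task;
        }
    }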

Benefits of Work Stealing

There are several benefits to a work stealing scheduler. Because each thread works primarily from its own pile, the common push and pop paths need little or no synchronization, contended locks and hot cache lines are largely avoided, and idle threads rebalance the load by stealing from busier ones.

Strict Work-Stealing Scheduling Policy

Work submitted to the executor from a thread that is not part of the pool can either be pushed to a foreign deque or added to the global work queue. If work is pushed to a foreign deque, the scheduling policy is strict work stealing.

In this mode local threads will have to move work generated by threads outside of the pool to their own deques during steal operations. While this policy avoids locking and hot cache lines, it takes more instructions to manage the work, since each item will always end up on at least two queues before execution.
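
The choice between the two destinations for foreign submissions might look something like the following; the class, field, and method names are hypothetical and do not reflect the actual Liberty submit path.

    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedDeque;
    import java.util.concurrent.ConcurrentLinkedQueue;

    // Hypothetical routing of work submitted from a thread that is not part of the pool.
    final class ForeignSubmitSketch {
        private final Queue<Runnable> globalQueue = new ConcurrentLinkedQueue<>();
        private final ConcurrentLinkedDeque<Runnable> foreignDeque = new ConcurrentLinkedDeque<>();
        private final boolean strictStealing;

        ForeignSubmitSketch(boolean strictStealing) {
            this.strictStealing = strictStealing;
        }

        // Called on a thread that does not belong to the pool.
        void submitFromForeignThread(Runnable task) {
            if (strictStealing) {
                // Strict work-stealing policy: the work waits on a foreign deque until a
                // pool thread moves it to its own deque during a steal operation, so it
                // passes through at least two queues before it runs.
                foreignDeque.addLast(task);
            } else {
                // Otherwise the work goes straight onto the shared global work queue.
                globalQueue.add(task);
            }
        }
    }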

Submitting Tasks

The Liberty Event Engine is built on top of the threading services and can be used to easily implement most pipeline or continuation workflows. If the basic services provided by the event engine are insufficient, components may directly submit tasks for execution to an ExecutorService instance made available by the threading code.

Outside of the event engine, there are two ways to acquire a reference to an executor. The first is to resolve a reference to the default scheduler, which is bound in the service registry under the java.util.concurrent.ExecutorService interface. The second option is for a component to request its own named instance of the service by calling the getStage method on the com.ibm.ws.threading.WorkStageManager service bound in the service registry.
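
For example, a component using OSGi Declarative Services might have the default executor injected as shown below. The component class and its method are hypothetical, and field injection assumes Declarative Services 1.3 or later.

    import java.util.concurrent.ExecutorService;

    import org.osgi.service.component.annotations.Component;
    import org.osgi.service.component.annotations.Reference;

    // Illustrative component that resolves the default scheduler registered
    // under java.util.concurrent.ExecutorService.
    @Component
    public class TaskSubmittingComponent {

        @Reference
        private ExecutorService executor;   // injected from the service registry

        public void doWorkAsynchronously() {
            executor.execute(() -> {
                // This runs on a pool thread managed by the threading service.
                System.out.println("running on " + Thread.currentThread().getName());
            });
        }
    }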

In both cases, the object that is returned implements the java.util.concurrent.ExecutorService contract. This allows the various styles of Callable and Runnable scheduling; see the ExecutorService javadoc for details.
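
Once a reference has been acquired, tasks can be submitted using the standard java.util.concurrent APIs; the helper class below is illustrative only, and the executor parameter stands for either the default scheduler or a named stage.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Future;

    public class SubmitExamples {

        static void submitExamples(ExecutorService executor) throws Exception {
            // Fire-and-forget Runnable.
            executor.execute(() -> System.out.println("runnable task"));

            // Runnable tracked by a Future that completes when the task finishes.
            Future<?> done = executor.submit(() -> System.out.println("tracked runnable"));
            done.get();

            // Callable that produces a result.
            Future<Integer> answer = executor.submit(() -> 21 + 21);
            System.out.println("result = " + answer.get());
        }
    }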