Wednesday, December 28, 2011

Java 7: Understanding the Phaser

Java 7 introduces a flexible thread synchronization mechanism called Phaser. If you need to wait for threads to arrive before you can continue or start another set of tasks, then Phaser is a good choice. Here is the listing; everything is explained step by step below.
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.concurrent.Phaser;

public class PhaserExample {

 public static void main(String[] args) throws InterruptedException {

  List<Runnable> tasks = new ArrayList<>();

  for (int i = 0; i < 2; i++) {
   Runnable runnable = new Runnable() {
    @Override
    public void run() {
     int a = 0, b = 1;
     for (int i = 0; i < 2000000000; i++) {
      a = a + b;
      b = a - b;
     }
    }
   };

   tasks.add(runnable);

  }

  new PhaserExample().runTasks(tasks);

 }

 void runTasks(List<Runnable> tasks) throws InterruptedException {

  final Phaser phaser = new Phaser(1) {
   protected boolean onAdvance(int phase, int registeredParties) {
    return phase >= 1 || registeredParties == 0;
   }
  };

  for (final Runnable task : tasks) {
   phaser.register();
   new Thread() {
    public void run() {
     do {
      phaser.arriveAndAwaitAdvance();
      task.run();
     } while (!phaser.isTerminated());
    }
   }.start();
   Thread.sleep(500);
  }

  phaser.arriveAndDeregister();

 }

}

This example reveals a lot about the internals of a Phaser. Let's go through the code:

Line 8-26: the main method creates two Runnable tasks
Line 28: the task list is passed to the runTasks method

The runTasks method uses a Phaser to synchronize the tasks so that every task in the list must arrive at the barrier before all of them are executed in parallel. The task list is executed twice. The first cycle starts when both threads have arrived at the barrier for the first time (see image mark 1). The second cycle starts when both threads have arrived at the barrier again (see image mark 2).
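To check the claim that each task runs exactly twice, here is a minimal sketch of a counting variant of runTasks. The class name CountingPhaserRun, the method runTwice and the AtomicInteger counter are hypothetical additions for illustration; the Phaser setup and the worker loop mirror the listing, with the long-running task replaced by a counter increment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Phaser;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper class, not part of the original snippet.
class CountingPhaserRun {

    // Starts taskCount worker threads and returns how often the (trivial) task ran in total.
    static int runTwice(int taskCount) {
        final AtomicInteger runs = new AtomicInteger();

        // Same rule as in the listing: terminate on the advance that ends phase 1.
        final Phaser phaser = new Phaser(1) {
            @Override
            protected boolean onAdvance(int phase, int registeredParties) {
                return phase >= 1 || registeredParties == 0;
            }
        };

        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            phaser.register(); // one party per worker thread
            Thread thread = new Thread() {
                @Override
                public void run() {
                    do {
                        phaser.arriveAndAwaitAdvance(); // wait until all parties have arrived
                        runs.incrementAndGet();         // stands in for task.run()
                    } while (!phaser.isTerminated());
                }
            };
            threads.add(thread);
            thread.start();
        }

        // The main thread gives up its own party, which lets the workers proceed.
        phaser.arriveAndDeregister();

        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println("Task executions: " + runTwice(2)); // prints "Task executions: 4"
    }
}
```

Joining the worker threads is only needed here so that the counter can be read safely; the listing itself lets the main thread finish right after it deregisters.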


Notice: "party" is the Phaser's term for a participant in the synchronization. In this example each party corresponds to a thread, so when one party arrives, one thread has arrived at the synchronization barrier.

Line 34: create a Phaser that has one registered party (this means that, at this point, the phaser expects one party, i.e. one thread, to arrive before it can start the execution cycle)
Line 35: override the onAdvance method so that the task list is executed twice (line 36 returns true, which terminates the phaser, once the phase is equal to or higher than 1)
Line 40: iterate over the list of tasks
Line 41: register a party for the thread that is about to be started. Notice that a Phaser instance does not know the task instances. It's a simple counter of registered, unarrived and arrived parties, shared across participating threads. If two parties are registered, then two parties must arrive at the phaser before the first cycle can start.
Line 45: tell the thread to wait at the barrier until the number of arrived parties equals the number of registered parties
Line 50: this line delays execution, just for demonstration purposes. The original code snippet prints internal information about the Phaser state to standard out.
Line 51: two tasks are registered; together with the main thread, three parties are registered in total.
Line 53: arrive and deregister one party. This results in two registered parties and two arrived parties, which causes the waiting threads (line 45) to execute the first cycle. (In fact the third party arrived while three were registered, but that makes no difference.)
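The effect of the onAdvance override can be tried in isolation. The following is a minimal sketch, assuming a single registered party so that no extra threads are needed; the class name OnAdvanceDemo, the method countCycles and the cycles parameter are hypothetical and not part of the original snippet. It generalizes the rule phase >= 1 from the listing to phase >= cycles - 1.

```java
import java.util.concurrent.Phaser;

// Hypothetical helper class, not part of the original snippet.
class OnAdvanceDemo {

    // Returns how many times the loop body ran before the phaser terminated.
    static int countCycles(final int cycles) {
        Phaser phaser = new Phaser(1) {
            @Override
            protected boolean onAdvance(int phase, int registeredParties) {
                // Returning true terminates the phaser.
                // With cycles == 2 this is the rule from the listing: phase >= 1.
                return phase >= cycles - 1 || registeredParties == 0;
            }
        };

        int executed = 0;
        do {
            // With a single party, every arrival completes the phase immediately.
            phaser.arriveAndAwaitAdvance();
            executed++; // stands in for task.run()
        } while (!phaser.isTerminated());
        return executed;
    }

    public static void main(String[] args) {
        System.out.println(countCycles(2)); // prints 2
        System.out.println(countCycles(3)); // prints 3
    }
}
```

Because the loop checks isTerminated() only after the work has been done, the body still runs once after the advance that terminates the phaser. That is why phase >= 1 yields two executions rather than one.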

The original code snippet stored in my Git repository creates the following output:

After phaser init -> Registered: 1 - Unarrived: 1 - Arrived: 0 - Phase: 0
After register -> Registered: 2 - Unarrived: 2 - Arrived: 0 - Phase: 0
After arrival -> Registered: 2 - Unarrived: 1 - Arrived: 1 - Phase: 0
After register -> Registered: 3 - Unarrived: 2 - Arrived: 1 - Phase: 0
After arrival -> Registered: 3 - Unarrived: 1 - Arrived: 2 - Phase: 0
Before main thread arrives and deregisters -> Registered: 3 - Unarrived: 1 - Arrived: 2 - Phase: 0
On advance -> Registered: 2 - Unarrived: 0 - Arrived: 2 - Phase: 0
After main thread arrived and deregistered -> Registered: 2 - Unarrived: 2 - Arrived: 0 - Phase: 1
Main thread will terminate ...
Thread-0:go  :Wed Dec 28 16:09:16 CET 2011
Thread-1:go  :Wed Dec 28 16:09:16 CET 2011
Thread-0:done:Wed Dec 28 16:09:20 CET 2011
Thread-1:done:Wed Dec 28 16:09:20 CET 2011
On advance -> Registered: 2 - Unarrived: 0 - Arrived: 2 - Phase: 1
Thread-0:go  :Wed Dec 28 16:09:20 CET 2011
Thread-1:go  :Wed Dec 28 16:09:20 CET 2011
Thread-1:done:Wed Dec 28 16:09:23 CET 2011
Thread-0:done:Wed Dec 28 16:09:23 CET 2011

Line 1: when the Phaser is initialized in line 34 of the code snippet, one party is registered and none has arrived
Line 2: after the first thread is registered in line 41 of the code example, there are two registered parties and two unarrived parties. Since no thread has reached the barrier yet, no party has arrived.
Line 3: the first thread arrives and waits at the barrier (line 45 in the code snippet)
Line 4: the second thread is registered: three registered, two unarrived, one arrived
Line 5: the second thread has arrived at the barrier, hence two have arrived now
Line 7: one party is deregistered in line 53 of the code example, therefore the onAdvance method is called and returns false. This starts the first cycle since the number of registered parties equals the number of arrived parties (i.e. two). Phase 1 is started -> cycle one (see image mark 1)
Line 8: since all threads are notified and start their work, two parties are unarrived again and none has arrived
Line 14: after the threads have executed their tasks once, they arrive again (code line 45) and the onAdvance method is called; it returns true this time, which terminates the phaser, and the second cycle is executed
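The counter values in the first part of the output can be reproduced without any worker threads. This is a minimal sketch, not the original snippet from the Git repository: the class name PhaserCounters and the methods state and trace are hypothetical, and the non-blocking arrive() stands in for the arriveAndAwaitAdvance() calls that the worker threads make in the listing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Phaser;

// Hypothetical helper class, not part of the original snippet.
class PhaserCounters {

    // Formats the phaser's counters in the same style as the output above.
    static String state(Phaser p) {
        return "Registered: " + p.getRegisteredParties()
                + " - Unarrived: " + p.getUnarrivedParties()
                + " - Arrived: " + p.getArrivedParties()
                + " - Phase: " + p.getPhase();
    }

    // Replays the register/arrive sequence of the example in a single thread.
    static List<String> trace() {
        List<String> states = new ArrayList<>();
        Phaser phaser = new Phaser(1);   // the main thread's party
        states.add(state(phaser));       // Registered: 1 - Unarrived: 1 - Arrived: 0 - Phase: 0

        phaser.register();               // party for the first task
        states.add(state(phaser));       // Registered: 2 - Unarrived: 2 - Arrived: 0 - Phase: 0
        phaser.arrive();                 // first task reaches the barrier
        states.add(state(phaser));       // Registered: 2 - Unarrived: 1 - Arrived: 1 - Phase: 0

        phaser.register();               // party for the second task
        states.add(state(phaser));       // Registered: 3 - Unarrived: 2 - Arrived: 1 - Phase: 0
        phaser.arrive();                 // second task reaches the barrier
        states.add(state(phaser));       // Registered: 3 - Unarrived: 1 - Arrived: 2 - Phase: 0

        phaser.arriveAndDeregister();    // last arrival: the phase advances
        states.add(state(phaser));       // Registered: 2 - Unarrived: 2 - Arrived: 0 - Phase: 1
        return states;
    }

    public static void main(String[] args) {
        for (String line : trace()) {
            System.out.println(line);
        }
    }
}
```

The sketch uses the default onAdvance, which only terminates the phaser when no parties remain registered, so the advance to phase 1 behaves like the first advance in the listing.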

OK, go through it and look into my comments in the original code snippet to learn more.

8 comments:

  1. Can't the same thing be achieved with CountDownLatch or simply wait? Anyway, thanks for bringing this up, as it was lost among other popular additions in JDK 7 like the fork-join framework.

  2. This was going to be my exact same question, but reading the JavaDoc for Phaser, then rereading the article, the obvious difference is that you can deregister the number of parties that are associated with the lock at anytime, thereby adding a level of flexibility not available to CyclicBarrier or CountDownLatch.

  3. Interesting stuff! How is this different from the Barrier concept from BSP? Or perhaps a better way to put it is, why call the Phaser a Phaser and not a Barrier class?

  4. Sarah, because Barrier is already an overloaded term.

    However, Phaser is simply a cyclic barrier with mutable party counts.

  5. Thanks for that clarification. I agree "Barrier" is an overloaded word, I'm not sure it helps to add a new bit of jargon to the maelstrom, but then naming things is one of the more difficult problems in CS ;)

  6. Nice example. But the anonymous inner class is less readable.

  7. There is no output of the above program. You forgot to check it.

  8. Can you explain why the constructor argument needs to be 1 when creating the Phaser?
    final Phaser phaser = new Phaser(1).
