tag:blogger.com,1999:blog-57014157907597555712024-03-10T10:07:02.579+01:00Niklas' BlogNiklas Schlimmhttp://www.blogger.com/profile/12402045792243894660noreply@blogger.comBlogger40125tag:blogger.com,1999:blog-5701415790759755571.post-36847257369497126002019-03-01T16:09:00.001+01:002019-03-01T16:09:09.253+01:00Setting up MongoDB for bi-temporal dataSee this tutorial: <a href="http://www.projectbarbel.org/docs/mongotutorial">http://www.projectbarbel.org/docs/mongotutorial</a>Niklas Schlimmhttp://www.blogger.com/profile/12402045792243894660noreply@blogger.com0tag:blogger.com,1999:blog-5701415790759755571.post-80187960733213690642019-02-27T22:33:00.001+01:002019-02-27T22:34:58.331+01:00Manage bitemporal data with BarbelHisto For the last 15 years I've worked on projects for insurance businesses, implementing a variety of policy management systems. A major requirement has always been to store policies and their changes in a way that is traceable for audits and customer claims. Implementing bullet-proof bitemporal data storage has always taken a considerable amount of time (and nerves). Every time we've implemented a new policy management system we have been on the lookout for a reusable component for bitemporal data. The few options we found did not really satisfy our needs, or had too many technical constraints. For that reason I've decided to implement my own open source library that I'd like to share with you guys: <a href="http://www.projectbarbel.org/">BarbelHisto</a>. With this lightweight library I want to address the bitemporal data storage requirement without any bothersome constraints. Just managing bitemporal data, that's it. No technology baggage. 
Niklas Schlimmhttp://www.blogger.com/profile/12402045792243894660noreply@blogger.com0tag:blogger.com,1999:blog-5701415790759755571.post-68776902425800052222019-01-30T15:23:00.002+01:002019-02-05T19:07:37.153+01:00Modern State Pattern using Enums and Functional Interfaces <p>It’s often the case that the behaviour of an object should change depending on the object’s state. Consider a <code>ShoppingBasket</code> object. You can add articles to your basket as long as the order isn’t submitted. But once it’s submitted, you typically don’t want to be able to change that order anymore. So there are two states in such a shopping basket object: <code>READONLY</code> and <code>UPDATEABLE</code>. <a name='more'></a> Here is the <code>ShoppingBasket</code> class.</p>
<script src="https://gist.github.com/nschlimm/13fe09e5d9decd8201edf4b278e97e14.js"></script>
<p>In such a class, you can add articles and perform an order. Once you’ve performed an order, the client of such an object would still be able to change that order object, which should not be possible. To prevent clients from updating an order that was already submitted, we want to change the behaviour of the <code>ShoppingBasket</code>. It should not be possible to add articles or change the <code>orderNo</code> field once the order is submitted. What’s an intelligent, object-oriented, modern Java solution to such a problem? What I usually do in such cases is use an <code>enum</code> to implement a GoF state pattern. Here is such an <code>enum</code>:</p>
<script src="https://gist.github.com/nschlimm/782eee30034d2330c4c3412d17a8f383.js"></script>
<p>My <code>UpdateState</code> enum takes a <code>Runnable</code> object as constructor argument. You can use more complicated functional interfaces to suit specific needs; the sky is the limit in terms of complexity here. But for now, it’s an ordinary <code>Runnable</code> interface. The <code>UpdateState</code> enum has exactly two states: <code>UPDATEABLE</code> and <code>READONLY</code>. The <code>UPDATEABLE</code> value’s validation always succeeds, while the <code>READONLY</code> value’s validation always fails, which results in an <code>IllegalStateException</code> (using the Apache Commons Lang <code>Validate</code> class). The <code>UpdateState</code> enum has a method called <code>set()</code> which takes an argument and returns exactly that argument. But before returning the argument, the <code>set()</code> method runs the state-dependent <code>Runnable</code> action. Now, why all that hassle?</p>
<script src="https://gist.github.com/nschlimm/94ce85f9cc553b2ae9546031162504b9.js"></script>
<p>The <code>ShoppingBasket</code> now has a state field of enum type <code>UpdateState</code>. That state field defaults to <code>UPDATEABLE</code> because when you create the <code>ShoppingBasket</code> it’s always updateable, meaning: the order wasn’t submitted yet. When you fire the order through the <code>order()</code> method, the state changes to <code>READONLY</code>. Since the state changed to read-only, the <code>ShoppingBasket</code> will change its behaviour, specifically when clients try to access the class fields. Let’s look at the <code>setOrderNo()</code> method for instance. The <code>setOrderNo()</code> method does not assign the order number directly to the <code>orderNo</code> field anymore; instead it calls the <code>UpdateState</code> enum’s <code>set()</code> method, which returns the value you want to set. That return value is assigned to the <code>orderNo</code> field. The <code>set()</code> method of the <code>UpdateState</code> enum always checks whether updates are allowed. So when your <code>ShoppingBasket</code>’s state is <code>UPDATEABLE</code>, the <code>set()</code> method will succeed, but when it’s <code>READONLY</code>, the <code>set()</code> method of that state will throw an <code>IllegalStateException</code>. This is exactly what we wanted to achieve in the beginning: make the object read-only once the order is submitted.</p>
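Condensed into one self-contained snippet, the mechanism looks roughly like this. This is a sketch that follows the names used in this post, not the original gists; to keep it dependency-free, a plain <code>IllegalStateException</code> is thrown instead of using the Commons Lang <code>Validate</code> call.

```java
// Sketch of the enum-based state pattern described above (illustrative,
// not the original gist code).
enum UpdateState {
    // UPDATEABLE: the state-dependent action does nothing, so set() succeeds.
    UPDATEABLE(() -> { }),
    // READONLY: the action always fails, so set() throws.
    READONLY(() -> { throw new IllegalStateException("object is read-only"); });

    private final Runnable action;

    UpdateState(Runnable action) {
        this.action = action;
    }

    // Runs the state-dependent action, then passes the given value through.
    public <T> T set(T value) {
        action.run();
        return value;
    }
}

class ShoppingBasket {
    private UpdateState state = UpdateState.UPDATEABLE;
    private String orderNo;

    public void setOrderNo(String orderNo) {
        this.orderNo = state.set(orderNo); // throws if state is READONLY
    }

    public void order() {
        this.state = UpdateState.READONLY; // order submitted: freeze the object
    }

    public String getOrderNo() {
        return orderNo;
    }
}
```

Every accessor delegates the actual assignment to <code>state.set(...)</code>, so the state check lives in exactly one place instead of being repeated as if-else logic in each method.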
<p>Notice that you can make such a state pattern implementation as complex as required. It’s a very elegant, short option to drive your object’s behaviour by the object’s state. And it saves you a lot of non-object-oriented if-else logic in all the accessor methods. Consider classes that have 20 fields: you don’t want to check state each time in every method. That would clearly clutter up your class code. Using the demonstrated state pattern, you save lines of code and your class stays quite tidy. Change the functional interface used in the <code>UpdateState</code> enum and you’ll realize the great potential of state-dependent behaviour that can be implemented with very few lines of code.</p>Niklas Schlimmhttp://www.blogger.com/profile/12402045792243894660noreply@blogger.com0tag:blogger.com,1999:blog-5701415790759755571.post-83628798732870101012019-01-29T16:18:00.002+01:002019-02-05T19:06:28.796+01:00Passing multiple arguments into stream filter predicates<p>When I am working with Java streams I use filters intensively to find objects. I often have the situation where I'd like to pass two arguments to the filter function. Unfortunately the standard API only accepts a <code>Predicate</code>, not a <code>BiPredicate</code>.</p>
<p>To work around this limitation I define all my predicates as methods in a class, say <code>Predicates</code>. That predicate class takes a constant parameter.</p>
<a name='more'></a>
<script src="https://gist.github.com/nschlimm/a1931f2fe787c7f03a4b17d023373ff4.js"></script>
<p>When I use the <code>Predicates</code> class, I instantiate it with the constant parameter of my choice. Then I can pass the instance methods as method references to the filter. Like so:</p>
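Both pieces together can be sketched in one self-contained example. The class and method names here are illustrative stand-ins; the gists in this post are the authoritative versions.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of the technique described above: the "second argument"
// of the predicate is bound as a constructor parameter, leaving a one-argument
// instance method that fits java.util.function.Predicate.
class Predicates {
    private final int threshold; // the constant parameter

    Predicates(int threshold) {
        this.threshold = threshold;
    }

    // One free parameter left: usable as a method reference in filter().
    boolean greaterThanThreshold(int candidate) {
        return candidate > threshold;
    }
}

public class PredicatesDemo {
    static List<Integer> filterGreaterThan(List<Integer> values, int limit) {
        Predicates p = new Predicates(limit); // bind the constant argument
        return values.stream()
                     .filter(p::greaterThanThreshold) // method reference as Predicate
                     .collect(Collectors.toList());
    }
}
```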
<script src="https://gist.github.com/nschlimm/e3c45256b9eb6130866089219a56ffb0.js"></script>Niklas Schlimmhttp://www.blogger.com/profile/12402045792243894660noreply@blogger.com0tag:blogger.com,1999:blog-5701415790759755571.post-27698443060142984142012-08-17T13:16:00.006+02:002012-08-21T11:11:15.876+02:005' on IT-Architecture: the modern software architectBefore I start writing about this let me adjust something right at the beginning:<br />
<blockquote style="background-color: #cfe2f3;">Yes of course, there is the role of a "software architect" in any non-trivial software development project. Even in times of agile projects, dynamic markets and vague terms like "emergence". The simple reason for that is that emergence and democracy in teams only work within constraints. Though, it's not always clever to assign somebody the role explicitly. In an ideal world one developer in that team evolves into the architecture role. </blockquote>When I started working as an IT professional at a *big* American software & IT consulting company, I spent around five years programming. After that time I got my first architecture job on a big project at a German automotive manufacturer. My main responsibility was to design the solution, advise developers, project managers and clients, and to organize the development process. I wrote many documents, but I didn't code anymore. The result was that I lost expertise in my <i>core business</i>: programming. So after a while my assessments and gut instinct got worse, which resulted in worse decisions. As a side effect of generic (vague) talk, it got harder to gain acceptance from developers, project managers and clients. When I realized all that, I decided to do more development again. Today, I have been doing architecture for 10 years, and I develop code in the IDE of my choice at least 20-30% of my time. <br />
<br />
<a name='more'></a><b>Activity profile</b><br />
<br />
Whilst programming is a <i>necessary activity</i>, there is a whole bunch of activities that are <i>sufficient</i> to be successful as an architect. Doing architecture is a lot about collaboration, evaluating alternatives objectively (neutral and fair-minded) and about decision making. It's a lot about communication, dealing with other individuals that almost always have their own opinions. Furthermore it's a lot about forming teams and designing the ideal development process around those teams to solve the concrete problem. Last but not least it's about designing (structuring) the solution in a way that all functional and non-functional requirements are well covered. You can do all that more or less without up-to-the-minute technical knowledge. But I believe an architect can do better if he/she has technical expertise gathered through day-to-day coding. In the long run you cannot be a technical architect without sufficient coding practice.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjLQCtzNf-I5BVqEBeGtJaG5YL4xU7i1KrLR1j6mb_1fCZaxXcRaHYOPp-8PoEl-z_AVeRtL0x_8-vsVXvuqmE5Kk_0EriYj_PpHlWF6XIhJWt4pFArVbFnDThYgQsEY5A0idR_efKNgWs/s1600/Foto.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjLQCtzNf-I5BVqEBeGtJaG5YL4xU7i1KrLR1j6mb_1fCZaxXcRaHYOPp-8PoEl-z_AVeRtL0x_8-vsVXvuqmE5Kk_0EriYj_PpHlWF6XIhJWt4pFArVbFnDThYgQsEY5A0idR_efKNgWs/s320/Foto.JPG" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Figure 1: Activities of the software architect</td></tr>
</tbody></table><br />
<b>Solving tradeoffs</b><br />
<br />
When I worked as an architect I often found myself in difficult <i>tradeoff situations</i>. That is, I wanted to improve one quality attribute, but to achieve that I needed to downgrade another. Here is a simple but very common example: it's often desirable to have a highly changeable system with the best possible performance. However, these two attributes - performance and changeability - typically correlate negatively; when you increase changeability you often lose efficiency. Doing architecture often means finding the golden mean between competing system qualities - it means choosing the alternative that represents the best compromise. It's about finding the balance between system qualities and the environmental factors of that system (e.g. stakeholders, requirements). The operations manager will focus on the efficiency of a new system, while the development manager will argue that it's important to have a changeable system that generates little maintenance cost. The client wants a new system with the highest possible degree of business process automation. These situations consume a considerable amount of time and energy. <br />
<br />
<b>Sharing knowledge and communication</b><br />
<br />
Another supremely important activity: <i>sharing knowledge</i> in a team of technical experts and other stakeholders. The core problem of software development is to transform the fuzzy knowledge of domain experts into the merciless logical machine code of silly computers that only understand two digits: 0 and 1. This is a long way through the venturesome and endless jungle of human misunderstandings! Therefore, architects communicate a lot. They use models to do that. Models serve as a mapping mechanism between human brains and computers. The set of problems that can arise during the knowledge-to-binary transformation is very diverse. It's impossible for every team member to know all of them. That's another reason why sharing knowledge in a team is so important.<br />
<br />
<b>Nobody is perfect!</b><br />
<br />
Needless to say that <i>nobody is perfect</i>. Every team is different and so is every concrete situation. So in one situation somebody may be the right architect for the team, while in other team set-ups that person doesn't fit. Architects can also have different strengths. I know architects that communicate and socialize very well but don't do so well in designing solutions or organizing the development process. Although they don't master each individual skill, they're all good architects. The common ground is that they were all down-to-earth developers.<br />
<br />
That's all I wanted to express today. <br />
So long, NiklasNiklas Schlimmhttp://www.blogger.com/profile/12402045792243894660noreply@blogger.com2tag:blogger.com,1999:blog-5701415790759755571.post-88656971168247123972012-07-18T16:59:00.003+02:002012-07-22T17:38:30.179+02:005' on IT-Architecture: root concepts explained by the pioneers of software architectureThe last couple of weeks I have been working on a new software architecture course specifically for the insurance and financial sector. During the preparations I was reading many of the most cited articles on software architecture. The concepts described in these articles are so fundamental (and still up-to-date) that every architect really should know about them. I have enjoyed reading such "old" stuff. I first read most of the cited articles during my studies at university in the mid 90s. It is surprising to realize that, the longer you're in this business, the more you agree with the ideas explained - in articles that were written 40 years ago! I've decided to quote the original text passages - maybe I thought it would be presumptuous to explain them in my own words ;-) I hope you enjoy reading these text passages from the pioneers of software architecture.<br />
<br />
<a name='more'></a><b>On the criteria for system decomposition</b><br />
<br />
"Many readers will now see what criteria were used in each decomposition. In the first decomposition the criterion used was to make each major step in the processing a module. One might say that to get the first decomposition one makes a flowchart. This is the most common approach to decomposition or modularization. It is an outgrowth of all programmer training which teaches us that we should begin with a rough flowchart and move from there to a detailed implementation. The flowchart was a useful abstraction for systems with on the order of 5,000-10,000 instructions, but as we move beyond that it does not appear to be sufficient; something additional is needed.<br />
<br />
The second decomposition was made using "information hiding" as a criterion. The modules no longer correspond to steps in the processing. [...] Every module in the second decomposition is characterized by its knowledge of a design decision which it hides from all others. Its interface or definition was chosen to reveal as little as possible about its inner workings."<br />
<br />
in: On the Criteria To Be Used in Decomposing Systems into Modules, D.L. Parnas, 1972<br />
<br />
<b>On the information hiding design principle</b><br />
<br />
"Our module structure is based on the decomposition criterion known as information hiding [IH]. According to this principle, system details that are likely to change independently should be the secrets of separate modules; the only assumptions that should appear in the interfaces between modules are those that are considered unlikely to change. Each data structure is used in only one module; it may be directly accessed by one or more programs within the module but not by programs outside the module. Any other program that requires information stored in a module’s data structures must obtain it by calling access programs belonging to that module.<br />
<br />
Applying this principle is not always easy. It is an attempt to minimize the expected cost of software and requires that the designer estimate the likelihood of changes. Such estimates are based on past experience, and may require knowledge of the application area, as well as an understanding of hardware and software technology."<br />
<br />
in: The Modular Structure of Complex Systems, D.L. Parnas, 1985<br />
<br />
<b>On module hierarchies</b><br />
<br />
"In discussions of system structure it is easy to confuse the benefits of a good decomposition with those of a hierarchical structure. We have a hierarchical structure if a certain relation may be defined between the modules or programs and that relation is a partial ordering. The relation we are concerned with is "uses" or "depends upon". [...] The partial ordering gives us two additional benefits. First, parts of the system are benefited (simplified) because they use the services of lower levels. Second, we are able to cut off the upper levels and still have a usable and useful product. [...] The existence of the hierarchical structure assures us that we can "prune" off the upper levels of the tree and start a new tree on the old trunk. If we had designed a system in which the "low level" modules made some use of the "high level" modules, we would not have the hierarchy, we would find it much harder to remove portions of the system, and "level" would not have much meaning in the system."<br />
<br />
in: On the Criteria To Be Used in Decomposing Systems into Modules, D.L. Parnas, 1972<br />
<br />
<b>On the separation of concerns</b><br />
<br />
"Let me try to explain to you, what to my taste is characteristic for all intelligent thinking. It is, that one is willing to study in depth an aspect of one's subject matter in isolation for the sake of its own consistency, all the time knowing that one is occupying oneself only with one of the aspects. We know that a program must be correct and we can study it from that viewpoint only; we also know that it should be efficient and we can study its efficiency on another day, so to speak. In another mood we may ask ourselves whether, and if so: why, the program is desirable. But nothing is gained —on the contrary!— by tackling these various aspects simultaneously. It is what I sometimes have called "the separation of concerns", which, even if not perfectly possible, is yet the only available technique for effective ordering of one's thoughts, that I know of. This is what I mean by "focussing one's attention upon some aspect": it does not mean ignoring the other aspects, it is just doing justice to the fact that from this aspect's point of view, the other is irrelevant. It is being one- and multiple-track minded simultaneously."<br />
<br />
in: On the role of scientific thought, Edsger W. Dijkstra, 1974<br />
<br />
<b>On conceptual integrity</b><br />
<br />
"Such design coherence in a tool not only delights, it also yields ease of learning and ease of use. The tool does what one expects it to do. I argued [...] that conceptual integrity is the most important consideration in system design. Sometimes the virtue is called coherence, sometimes consistency, sometimes uniformity of style [...] The solo designer or artist usually produces works with this integrity subconsciously; he tends to make each microdecision the same way each time he encounters it (barring strong reasons). If he fails to produce such integrity, we consider the work flawed, not great."<br />
<br />
in: The Design of Design, Frederick P. Brooks, 2010 (originally introduced in: The Mythical Man Month, 1975)Niklas Schlimmhttp://www.blogger.com/profile/12402045792243894660noreply@blogger.com0tag:blogger.com,1999:blog-5701415790759755571.post-64145341017803302772012-06-25T15:39:00.003+02:002012-07-19T08:19:00.390+02:005' on IT-Architecture: four laws of robust software systems<a href="http://www.murphys-laws.com/murphy/murphy-true.html">Murphy's Law</a> ("If anything can go wrong, it will") was born at Edwards Air Force Base in 1949 at North Base. It was named after Capt. Edward A. Murphy, an engineer working on Air Force Project MX981, (a project) designed to see how much sudden deceleration a person can stand in a crash. One day, after finding that a transducer was wired wrong, he cursed the technician responsible and said, "If there is any way to do it wrong, he'll find it." <br />
<br />
<a name='more'></a>For the reason described above it may be good to put a quality assurance process in place. I could also call this blog "the four laws of steady software quality". It's about some fundamental techniques that can help to achieve superior quality over the long run. This is particularly important if you're developing a central component that will cause serious damage if it fails in production. OK, here is my (never final and not holistic) list of practical quality assurance tips.<br />
<br />
Law 1: facilitate change<br />
<br />
There is nothing permanent except change. If a system isn't designed in accordance with this supremely important reality, the probability of failure may increase above average. A widely used technique to facilitate change is the development of a sufficient set of <a href="http://en.m.wikipedia.org/wiki/Unit_testing">unit tests</a>. Unit testing enables you to uncover regressions in existing functionality after changes have been made to a system. It also encourages you to really think about the desired functionality and required design of the component under development.<br />
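As a trivial illustration of how a unit test pins down existing behaviour so that regressions surface after a change, here is a hypothetical, framework-free example (the <code>Pricing</code> class and its discount rule are invented for illustration only):

```java
// Hypothetical production code: a small business rule we want to protect
// against regressions (10% discount above 100 EUR order value).
class Pricing {
    static double discountedTotal(double total) {
        return total > 100.0 ? total * 0.9 : total;
    }
}

// A minimal hand-rolled regression test; in practice you would use a test
// framework such as JUnit, but the principle is the same.
public class PricingTest {
    public static void main(String[] args) {
        // These assertions document the expected behaviour; if a later change
        // breaks the rule, the test fails and uncovers the regression.
        check(Pricing.discountedTotal(50.0) == 50.0);   // below the limit: no discount
        check(Pricing.discountedTotal(200.0) == 180.0); // above the limit: 10% off
        System.out.println("all pricing tests passed");
    }

    static void check(boolean condition) {
        if (!condition) throw new AssertionError("regression detected");
    }
}
```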
<br />
Law 2: don't rush through the functional testing phase<br />
<br />
In economics, the marginal utility of a good is the gain (or loss) from an increase (or decrease) in the consumption of that good. The law of diminishing marginal utility says that the marginal utility of each (homogeneous) unit decreases as the supply of units increases (and vice versa). The first <a href="http://en.wikipedia.org/wiki/Functional_testing">functional test</a> cases often walk through the main scenarios covering the main paths of the considered software. None of the code tested has been executed before, so these test cases have a very high marginal utility. Subsequent test cases may walk through the same code ranges except for specific side paths at specific validation conditions, for instance. These test cases may cover three or four additional lines of code in your application. As a result, they have a smaller marginal utility than the first test cases. <br />
<br />
My law about functional testing suggests: as long as the execution of the next test case yields a significant utility, the following applies: the more time you invest into testing, the better the outcome! So don't rush through a functional testing phase and miss useful test cases (this assumes the special case in which usefulness can be quantified). Try to find the useful test cases that promise a significant gain in perceptible quality. On the other hand, if you're executing test cases with a negative marginal utility, you're actually investing more effort than you gain in terms of perceptible quality. There is a special (but not uncommon) situation where the client does not run functional tests on a systematic basis. This law then suggests: the longer the application is in the test environment, the better the outcome. <br />
<br />
Law 3: run (non-functional) benchmark tests<br />
<br />
Another piece of good, lasting software quality is a regular <a href="http://en.wikipedia.org/wiki/Load_testing">load test</a>. To make results usable, load tests need a defined, steady environment and a baseline of measured values (a benchmark). These values are at least: CPU, response time and memory footprint. Load tests of new releases can be compared to load tests of older releases. That way we can also bypass the often-stated requirement that the load test environment needs to have the same capacity parameters as the production environment. In many cases it is possible to see the real big issues with a relatively small set of parallel users (e.g. 50 users). <br />
<br />
It makes limited sense to do load testing if single-user <a href="http://en.wikipedia.org/wiki/Profiling_%28computer_programming%29">profiling</a> results are bad. Therefore it's a good idea to perform repeatable profiling test cases with every release. This way profiling results can be compared to each other (again: the benchmark idea). We do CPU and elapsed-time profiling as well as memory profiling. Profiling is an activity that runs in parallel to actual development. It makes sense to focus on the main scenarios used regularly in production. <br />
<br />
Law 4: avoid dependency lock-in<br />
<br />
The difference between trouble and severe crisis is the time it takes to fix the problem that causes the trouble. For this reason you always need a way back to your previous release - a fallback scenario to avoid a production crisis with severe business impact. You enable rollback by avoiding dependency lock-in. Runtime dependencies of your application on neighbouring systems may arise through joint interface or contract changes during development. If you implemented requirements that resulted in changed interfaces and contracts, then you cannot simply roll back; that's obvious. Therefore you need to avoid too many interface and contract changes. Small release cycles help to reduce dependencies between application versions in one release because fewer changes are rolled to production. Another counteraction against dependency lock-in is to keep neighbouring systems downwards compatible for one version. <br />
<br />
That's it in terms of robust systems.<br />
Cheers, NiklasNiklas Schlimmhttp://www.blogger.com/profile/12402045792243894660noreply@blogger.com2tag:blogger.com,1999:blog-5701415790759755571.post-2317585183822012692012-05-15T16:38:00.000+02:002012-05-15T16:38:34.806+02:005' on IT-Architecture: three laws of good software architectureThe issue with architectural decisions is that they affect the whole system and/or you often need to make them early in the development process. It means a lot of effort if you change such a decision a couple of months later. From an economic standpoint, architectural decisions are often irrevocable. Good architecture is one that allows an architect to make late decisions without a major effect on effort and cost. Let's put that on record.<a name='more'></a><br />
<br />
Law 1: Good architecture is one that enables architects to have a minimum of irrevocable decisions.<br />
<br />
To minimize the set of irrevocable decisions the system needs to be responsive to change. There is a major lesson I have learned about software development projects: nothing is permanent except change. The client changes his opinion about requirements. The stakeholders change their viewpoint of what's important. People join and leave the project team. The fact that change alone is unchanging leads me to the second law of good architecture, that is:<br />
<br />
Law 2: To make decisions revocable you need to design for flexibility.<br />
<br />
This is the most provocative statement, and I have controversial discussions about it. The reason is that flexibility introduces the need for abstraction. Abstraction uses a strategy of simplification, wherein formerly concrete details are left ambiguous, vague, or undefined (from <a href="http://en.wikipedia.org/wiki/Abstraction">Wikipedia</a>). This simplification process isn't always simple to do, and to follow for others in particular. "Making something easy to change makes the overall system a little more complex, and making everything easy to change makes the entire system very complex. Complexity is what makes software hard to change." (<a href="http://martinfowler.com/ieeeSoftware/whoNeedsArchitect.pdf">from M. Fowler</a>) This is one core problem of building good software architecture: developing software that is easy to change but at the same time understandable. There are several concepts that try to tackle this paradoxical problem: <a href="http://en.wikipedia.org/wiki/Design_Patterns">design patterns</a> and <a href="http://www.objectmentor.com/resources/articles/Principles_and_Patterns.pdf">object oriented design principles</a>. Polymorphism, loose coupling and high cohesion are flexibility enablers to me.<br />
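As a toy illustration of polymorphism and loose coupling as flexibility enablers (my own hypothetical example, not from the cited articles): a caller bound to an interface can have its implementation swapped without being changed itself, which keeps the original decision revocable.

```java
// Hypothetical sketch: the tax calculation decision stays revocable because
// Invoice depends only on the abstraction, not on a concrete implementation.
interface TaxPolicy {
    double taxFor(double amount);
}

class FlatTax implements TaxPolicy {
    public double taxFor(double amount) { return amount * 0.19; }
}

class NoTax implements TaxPolicy {
    public double taxFor(double amount) { return 0.0; }
}

class Invoice {
    private final TaxPolicy taxPolicy; // loose coupling: only the interface is known

    Invoice(TaxPolicy taxPolicy) {
        this.taxPolicy = taxPolicy;
    }

    // Revoking the tax decision later means passing a different TaxPolicy;
    // this method never changes.
    double grossTotal(double net) {
        return net + taxPolicy.taxFor(net);
    }
}
```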
<br />
Law 3: To make use of flexibility one needs to refactor mercilessly.<br />
<br />
Flexibility is not an end in itself. You need to actively make use of flexible design. If something changes and makes a previous design or architectural decision obsolete, you need to go into the code and change the software. Otherwise the effort of building flexible software is useless, and technical debt may cause late delays and a maintenance nightmare. Taking rigorous action on your code base requires continuous feedback about the qualities of your software. To be able to refactor, it is therefore essential that the code base is covered by a sufficient amount of automated tests. In an ideal scenario everything is integrated into a continuous integration environment to receive permanent feedback about the health of your code base.Niklas Schlimmhttp://www.blogger.com/profile/12402045792243894660noreply@blogger.com10tag:blogger.com,1999:blog-5701415790759755571.post-21246930916110430102012-05-04T15:09:00.004+02:002012-05-08T13:12:13.216+02:00Java 7: NIO.2 I/O operations on asynchronous channels are not atomicThis part of my NIO.2 series wasn't on schedule when I started writing about NIO.2 asynchronous file channels. It deals with an important detail: read and write operations are not atomic. What that means is that <code>AsynchronousFileChannel -> write()</code> does not guarantee that all bytes passed as parameter are written to the destination file. Instead, it returns the number of bytes written as the result of the corresponding I/O operation, and the client needs to deal with situations where the number of bytes written isn't equal to the number of bytes remaining in the passed <code>ByteBuffer</code>. <br />
<br />
<a name='more'></a>Let's recall the method signatures of the <code>read()</code> and <code>write()</code> operations in the <code>AsynchronousFileChannel</code> interface for a moment.<br />
<br />
<pre class="java" name="code">public abstract <A> void write(ByteBuffer src,
                               long position,
                               A attachment,
                               CompletionHandler<Integer,? super A> handler);

public abstract <A> void read(ByteBuffer dst,
                              long position,
                              A attachment,
                              CompletionHandler<Integer,? super A> handler);
</pre><br />
As you can see, these signatures offer to pass a completion handler. I've already introduced the completion handler in my last blog about closing file channels safely. You can also use the completion handler to enforce that all bytes are written or read when you perform I/O operations on an asynchronous channel. Here is the code snippet that does the job.<br />
<br />
<script src="https://gist.github.com/2506857.js">
</script><br />
The <code>readAll</code> (line 14) and <code>writeFully</code> (line 35) methods both call the corresponding <code>read</code> or <code>write</code> operations on asynchronous file channels recursively. This recursion ends when the bytes of the source <code>ByteBuffer</code> have been transferred completely. Notice that the main thread has to wait for these recursions to finish. Therefore a <code>CountDownLatch</code> stops the main thread until all bytes are processed by the I/O thread that executes the <code>CompletionHandler</code>.<br />
<br />
The explained procedure works because the position of the source or destination <code>ByteBuffer</code> is always in sync with the actual bytes transferred. Another important fact is that the <code>write()</code> and <code>read()</code> operations in the <code>CompletionHandler</code> are chained: when one write task completes, a new one is issued, and when that one completes, another is issued, and so forth. Although different JVM threads will participate, there won't be an issue in sharing the same (non-thread-safe) <code>ByteBuffer</code> instance.<br />
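The chaining idea can be sketched in a small, self-contained example (class, method, and file names are mine, not the gist's): the completion handler re-issues <code>write()</code> until the buffer is drained, and a <code>CountDownLatch</code> makes the main thread wait for the I/O threads.

```java
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CountDownLatch;

public class WriteFullyExample {

    // Re-issues write() from the completion handler until the buffer is drained,
    // then releases the latch. The ByteBuffer position stays in sync with the
    // bytes actually transferred, so no extra bookkeeping is needed.
    static void writeFully(final AsynchronousFileChannel channel, final ByteBuffer src,
                           long position, final CountDownLatch done) {
        channel.write(src, position, position, new CompletionHandler<Integer, Long>() {
            @Override
            public void completed(Integer written, Long pos) {
                if (src.hasRemaining()) {
                    writeFully(channel, src, pos + written, done); // chain the next write
                } else {
                    done.countDown();
                }
            }
            @Override
            public void failed(Throwable exc, Long pos) {
                exc.printStackTrace();
                done.countDown();
            }
        });
    }

    public static long demo() throws Exception {
        Path file = Files.createTempFile("writefully", ".tmp");
        CountDownLatch done = new CountDownLatch(1);
        AsynchronousFileChannel channel =
                AsynchronousFileChannel.open(file, StandardOpenOption.WRITE);
        try {
            writeFully(channel, ByteBuffer.wrap(new byte[100000]), 0L, done);
            done.await(); // block the main thread until the I/O threads are finished
        } finally {
            channel.close();
        }
        return Files.size(file);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // expect 100000
    }
}
```

Note how the position of the next write is derived from the attachment plus the bytes reported by the previous completion, which is exactly the chaining described above.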
<br />
The NIO.2 file channels series:<br />
- <a href="http://niklasschlimm.blogspot.de/2012/04/java-7-asynchronous-file-channels-part.html">Introduction</a><br />
- <a href="http://niklasschlimm.blogspot.de/2012/04/java-7-asynchronous-file-channels-part_05.html">Applying custom thread pools</a><br />
- <a href="http://niklasschlimm.blogspot.de/2012/05/java-7-9-nio2-file-channels-on-test.html">Closing file channels without losing data</a><br />
- <a href="http://niklasschlimm.blogspot.de/2012/05/java-7-10-nio2-file-channels-on-test.html">I/O operations are not atomic<br />
</a><br />
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js">
</script><br />
<script src="http://gist.github.com/raw/454771/gist-line-number-hack.js">
</script><br />
<script type="text/javascript">
addLineNumbersToAllGists()
</script>Niklas Schlimmhttp://www.blogger.com/profile/12402045792243894660noreply@blogger.com1tag:blogger.com,1999:blog-5701415790759755571.post-85506608667799373712012-05-04T15:08:00.002+02:002012-05-08T13:11:54.615+02:00Java 7: Closing NIO.2 file channels without losing dataClosing an asynchronous file channel can be very difficult. If you submitted I/O tasks to the asynchronous channel, you want to be sure that those tasks are executed properly. This can actually be a tricky requirement on asynchronous channels, for several reasons. The default channel group uses daemon threads as worker threads, which isn't a good choice, because these threads are simply abandoned if the JVM exits. If you use a custom thread pool executor with non-daemon threads (see <a href="http://niklasschlimm.blogspot.de/2012/04/java-7-asynchronous-file-channels-part_05.html">last part of this series</a>), you need to manage the lifecycle of your thread pool yourself. If you don't, the threads stay alive when the main thread exits. Hence, the JVM does not exit at all; all you can do is kill it. <br />
<br />
<a name='more'></a>Another issue when closing asynchronous channels is mentioned in the javadoc of <code>AsynchronousFileChannel</code>: "Shutting down the executor service while the channel is open results in unspecified behavior." This is because the <code>close()</code> operation on <code>AsynchronousFileChannel</code> issues tasks to the associated executor service that simulate the failure of pending I/O operations (in that same thread pool) with an <code>AsynchronousCloseException</code>. Hence, you'll get a <code>RejectedExecutionException</code> if you call <code>close()</code> on an asynchronous file channel instance after you have closed the associated executor service. <br />
<br />
That all being said, the proposed way to safely configure the file channel and shutdown that channel goes like this:<br />
<br />
<script src="https://gist.github.com/2137930.js">
</script><br />
The custom thread pool executor service is defined in lines 6 and 7. The file channel is defined in lines 10 to 13. In lines 18 to 20 the asynchronous channel is closed in an orderly manner: first the channel itself is closed, then the executor service is shut down, and last but not least the thread awaits termination of the thread pool executor. <br />
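The same shutdown order can be shown in a compact, self-contained sketch (pool size, file name, and timeout are illustrative, not taken from the gist): close the channel first, shut down the executor second, then await termination.

```java
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.EnumSet;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SafeCloseExample {

    public static boolean demo() throws Exception {
        Path file = Files.createTempFile("safeclose", ".tmp");
        // Custom pool with non-daemon threads - its lifecycle is our responsibility.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        AsynchronousFileChannel channel = AsynchronousFileChannel.open(
                file, EnumSet.of(StandardOpenOption.WRITE), pool);
        channel.write(ByteBuffer.wrap("some data".getBytes()), 0).get();
        // Orderly shutdown: channel first, executor second, then await termination.
        // Shutting down the executor while the channel is still open would provoke
        // RejectedExecutionExceptions from close().
        channel.close();
        pool.shutdown();
        return pool.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // true when the pool terminated cleanly
    }
}
```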
<br />
Although this is a safe way to close a channel with a custom executor service, there's a new issue introduced. Clients submit asynchronous write tasks (line 16) and may want to be sure that, once they've been submitted successfully, those tasks will definitely be executed. Always waiting for <code>Future.get()</code> to return (line 23) isn't an option, because in many cases this would reduce *asynchronous* file channels to absurdity. The snippet above will return lots of "Task wasn't executed!" messages because the channel is closed immediately after the write operations are submitted to the channel (line 18). To avoid such 'data loss' you can implement your own <code>CompletionHandler</code> and pass it to the requested write operation.<br />
<br />
<script src="https://gist.github.com/2146334.js">
</script><br />
The <code>CompletionHandler.failed()</code> method (line 16) catches any runtime exception during task processing. You can implement compensation code here to avoid data loss. When you work on mission-critical data, it may be a good idea to use <code>CompletionHandler</code>s. But *still* there's another issue: clients can submit tasks, but they don't know whether the pool will successfully process those tasks. Successful in this context means that the bytes submitted actually reach their destination (the file on the hard disk). If you want to be sure that all submitted tasks are actually processed before closing, it gets a little trickier. You need a 'graceful' closing mechanism that waits until the work queue is empty *before* it actually closes the channel and the associated executor service (this isn't possible using standard lifecycle methods). <br />
<b><br />
Introducing GracefulAsynchronousFileChannel</b><br />
<br />
My last snippets introduce the <code>GracefulAsynchronousFileChannel</code>. You can get the complete code <a href="https://github.com/nschlimm/playground/blob/master/java7-playground/src/main/java/com/schlimm/java7/nio/investigation/closing/graceful/GracefulAsynchronousFileChannel.java">here in my Git repository</a>. The behaviour of that channel is this: it guarantees to process all successfully submitted write operations and throws a <code>NonWritableChannelException</code> once the channel prepares for shutdown. It takes two things to implement that behaviour. Firstly, you need to override <code>afterExecute()</code> in an extension of <code>ThreadPoolExecutor</code> that sends a signal when the queue is empty. This is what <code>DefensiveThreadPoolExecutor</code> does. <br />
<br />
<script src="https://gist.github.com/2301734.js">
</script><br />
The <code>afterExecute()</code> method (line 12) is executed after each processed task, by the thread that processed that given task. The implementation sends the <code>isEmpty</code> signal in line 18. The second thing you need to gracefully close a channel is a custom implementation of the <code>close()</code> method of <code>AsynchronousFileChannel</code>.<br />
<br />
<script src="https://gist.github.com/2301874.js">
</script><br />
Study that code for a while. The interesting bits are in line 11, where the <code>innerChannel</code> gets replaced by a read-only channel. That causes any subsequent asynchronous write requests to fail with a <code>NonWritableChannelException</code>. In line 16 the <code>close()</code> method waits for the <code>isEmpty</code> signal to happen. When this signal is sent after the last write task, the <code>close()</code> method continues with an orderly shutdown procedure (line 27 ff.). Basically, the code adds a shared lifecycle state across the file channel and the associated thread pool. That way both objects can communicate during the shutdown procedure and avoid data loss.<br />
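The shared-lifecycle idea can be sketched without the channel part (class and method names here are mine, loosely modeled on <code>DefensiveThreadPoolExecutor</code>; the numbers are illustrative): <code>afterExecute()</code> signals a condition when the work queue drains, and a graceful close method waits on that condition before shutting down.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class DrainSignallingPool extends ThreadPoolExecutor {

    private final ReentrantLock lock = new ReentrantLock();
    private final Condition isEmpty = lock.newCondition();

    public DrainSignallingPool(int threads) {
        super(threads, threads, 0L, TimeUnit.MILLISECONDS,
              new LinkedBlockingQueue<Runnable>());
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
        super.afterExecute(r, t);
        lock.lock();
        try {
            if (getQueue().isEmpty()) {
                isEmpty.signalAll(); // tell closeGracefully() the queue has drained
            }
        } finally {
            lock.unlock();
        }
    }

    // Blocks until the work queue is empty, then performs an orderly shutdown.
    public void closeGracefully() throws InterruptedException {
        lock.lock();
        try {
            while (!getQueue().isEmpty()) {
                isEmpty.await();
            }
        } finally {
            lock.unlock();
        }
        shutdown();
        awaitTermination(5, TimeUnit.SECONDS);
    }

    public static int demo() throws InterruptedException {
        DrainSignallingPool pool = new DrainSignallingPool(2);
        final AtomicInteger processed = new AtomicInteger();
        for (int i = 0; i < 100; i++) {
            pool.execute(new Runnable() {
                public void run() { processed.incrementAndGet(); }
            });
        }
        pool.closeGracefully();
        return processed.get(); // all 100 submitted tasks were processed
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

The check-then-await sequence holds the same lock under which <code>afterExecute()</code> signals, so the signal cannot be missed; the trailing <code>shutdown()</code>/<code>awaitTermination()</code> covers tasks that were already taken off the queue but are still running.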
<br />
Here is a logging client that uses the <code>GracefulAsynchronousFileChannel</code>.<br />
<br />
<script src="https://gist.github.com/2308688.js">
</script><br />
The client starts two threads, one thread issues write operations in an infinite loop (line 6 ff.). The other thread closes the file channel asynchronously after one second of processing (line 25 ff.). If you run that client, then the following output is produced:<br />
<pre class="java" name="code">Starting graceful shutdown ...
Deal with the fact that the channel was closed asynchronously ... java.nio.channels.NonWritableChannelException
Channel blocked for write access ...
Waiting for signal that queue is empty ...
Issueing signal that queue is empty ...
Received signal that queue is empty ... closing
File closed ...
Pool closed ...
Expected file size (bytes): 400020
Actual file size (bytes): 400020
No write operation was lost!
</pre>The output shows the orderly shutdown procedure of the participating threads. The logging thread needs to deal with the fact that the channel was closed asynchronously. After the queued tasks are processed, the channel resources are closed. No data was lost; everything that the client issued was really written to the file destination. No <code>AsynchronousCloseException</code>s or <code>RejectedExecutionException</code>s occur in such a graceful closing procedure.<br />
<br />
That's all in terms of safely closing asynchronous file channels. The complete code <a href="https://github.com/nschlimm/playground/tree/master/java7-playground/src/main/java/com/schlimm/java7/nio/investigation/closing/graceful">is here in my Git repository</a>. I hope you've enjoyed it a little. Looking forward to your comments.<br />
Cheers, Niklas<br />
<br />
The NIO.2 file channels series:<br />
- <a href="http://niklasschlimm.blogspot.de/2012/04/java-7-asynchronous-file-channels-part.html">Introduction</a><br />
- <a href="http://niklasschlimm.blogspot.de/2012/04/java-7-asynchronous-file-channels-part_05.html">Applying custom thread pools</a><br />
- <a href="http://niklasschlimm.blogspot.de/2012/05/java-7-9-nio2-file-channels-on-test.html">Closing file channels without losing data</a><br />
- <a href="http://niklasschlimm.blogspot.de/2012/05/java-7-10-nio2-file-channels-on-test.html">I/O operations are not atomic<br />
</a><br />
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js">
</script><br />
<script src="http://gist.github.com/raw/454771/gist-line-number-hack.js">
</script><br />
<script type="text/javascript">
addLineNumbersToAllGists()
</script>Niklas Schlimmhttp://www.blogger.com/profile/12402045792243894660noreply@blogger.com0tag:blogger.com,1999:blog-5701415790759755571.post-91861534242896614872012-04-19T14:22:00.001+02:002012-04-20T17:31:33.240+02:00Threading stories: ThreadLocal in web applicationsThis week I spent considerable time eliminating all the <code>ThreadLocal</code> variables in our web applications. The reason was that they created classloader leaks and we couldn't undeploy our applications properly anymore. Classloader leaks happen when a GC root keeps referencing an application object after the application was undeployed. If an application object is still referenced after undeploy, then the whole classloader can't be garbage collected, because the object in question references your application's class file, which in turn references the classloader. This will cause an <code>OutOfMemoryError</code> after you've undeployed and redeployed a couple of times.<br />
<br />
<a name='more'></a><code>ThreadLocal</code> is one classic candidate that can easily create classloader leaks in web applications. The server manages its threads in a pool. These threads live longer than your web application. In fact, they don't die at all until the underlying JVM dies. Now, if you put a <code>ThreadLocal</code> that references an object of your class into a pooled thread, you *must* be careful. You need to make sure that this variable is removed again using <code>ThreadLocal.remove()</code>. The issue in web applications is: where is the right place to safely remove <code>ThreadLocal</code> variables? Also, you may not want to modify that "removing code" every time a colleague decides to add another <code>ThreadLocal</code> to the managed threads. <br />
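The effect is easy to reproduce outside a servlet container; here is a minimal sketch (names are mine) where a single-threaded executor stands in for the server's worker pool, and a value set by one "request" leaks into the next:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class StaleThreadLocalDemo {

    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<String>();

    public static String demo() throws Exception {
        // A single reused worker thread stands in for the server's thread pool.
        ExecutorService pooled = Executors.newSingleThreadExecutor();
        // "Request" 1 sets a thread local and forgets to call remove().
        pooled.submit(new Runnable() {
            public void run() { CURRENT_USER.set("alice"); }
        }).get();
        // "Request" 2 runs on the same pooled thread and sees the stale value.
        String leaked = pooled.submit(new Callable<String>() {
            public String call() { return CURRENT_USER.get(); }
        }).get();
        pooled.shutdown();
        return leaked;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // "alice" leaked into the next task
    }
}
```

In a real container the leaked reference additionally pins the web application's classloader, which is what prevents clean undeployment.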
<br />
We've developed a wrapper class around <code>ThreadLocal</code> that keeps all the thread local variables in one single <code>ThreadLocal</code> variable. Here is the code.<br />
<br />
<script src="https://gist.github.com/2234464.js">
</script><br />
The advantage of the utility class is that no developer needs to manage the thread local variable lifecycle individually. The class puts all the thread locals into one map of variables. The <code>destroy()</code> method can be invoked wherever you can safely remove all thread locals in your web application. In our case that's a <code>ServletRequestListener -> requestDestroyed()</code> method. You will also need to place finally blocks elsewhere; typical places are near the <code>HttpServlet</code>, in the <code>init()</code>, <code>doPost()</code> and <code>doGet()</code> methods. This removes all thread locals in the pooled worker threads after the request is done or an exception is thrown unexpectedly. Sometimes it happens that the <code>main</code> thread of the server leaks thread local variables. If that is the case, you need to find the right places to call the <code>ThreadLocalUtil -> destroy()</code> method. To do that, figure out where the main thread actually *creates* the thread local variables. You could use your debugger to do that.<br />
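A minimal sketch of the wrapper idea (the gist has more to it; method names and the string-keyed map are illustrative): one static <code>ThreadLocal</code> holding a map, and a single <code>destroy()</code> that releases everything the current thread has accumulated.

```java
import java.util.HashMap;
import java.util.Map;

public final class ThreadLocalUtil {

    // All per-thread variables live in this single thread local map.
    private static final ThreadLocal<Map<String, Object>> VARIABLES =
            new ThreadLocal<Map<String, Object>>() {
                @Override
                protected Map<String, Object> initialValue() {
                    return new HashMap<String, Object>();
                }
            };

    public static void put(String key, Object value) {
        VARIABLES.get().put(key, value);
    }

    public static Object get(String key) {
        return VARIABLES.get().get(key);
    }

    // Invoke once per request, e.g. in ServletRequestListener.requestDestroyed(),
    // to release everything the current worker thread has accumulated.
    public static void destroy() {
        VARIABLES.remove();
    }

    private ThreadLocalUtil() {
    }
}
```

Because every piece of per-thread state goes through this class, a single <code>destroy()</code> call in one well-chosen place cleans up the pooled thread, no matter how many "thread locals" colleagues add later.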
<br />
Many guys out there suggest omitting <code>ThreadLocal</code> in web applications, for several reasons. It can be very difficult to remove them in a pooled-thread environment so that you can undeploy the applications safely. <code>ThreadLocal</code> variables can be useful, but it's fair to consider other techniques before applying them. An alternative for web applications to carry request-scoped parameters is the <code>HttpServletRequest</code>. Many web frameworks allow for generic request parameter access as well as request/session attribute access, without ties to the native Servlet/Portlet API. Also, many frameworks support request-scoped beans that can be injected into an object tree using dependency injection. All these options fulfill most requirements and should be considered prior to using <code>ThreadLocal</code>.<br />
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js">
</script><br />
<script src="http://gist.github.com/raw/454771/gist-line-number-hack.js">
</script><br />
<script type="text/javascript">
addLineNumbersToAllGists()
</script>Niklas Schlimmhttp://www.blogger.com/profile/12402045792243894660noreply@blogger.com4tag:blogger.com,1999:blog-5701415790759755571.post-79974948194327891692012-04-05T09:42:00.009+02:002012-05-08T13:11:34.104+02:00Java 7: NIO.2 File Channels on the test bench - Part 2 - Applying custom thread poolsAsynchronous file processing isn't a guarantee of high performance. In my <a href="http://niklasschlimm.blogspot.de/2012/04/java-7-asynchronous-file-channels-part.html">last post</a> I demonstrated that conventional I/O can be faster than asynchronous channels. There are some additional important facts to know when applying NIO.2 file channels. The <code>Iocp</code> class that performs all the asynchronous I/O tasks in NIO.2 file channels is, by default, backed by a so-called "cached" thread pool. That's a thread pool that creates new threads as needed, but will reuse previously constructed threads *when* they are available. Look at the code of the <code>ThreadPool</code> class held by the <code>Iocp</code>.<br />
<a name='more'></a><br />
<script src="https://gist.github.com/1950482.js">
</script><br />
The thread pool in the default channel group is constructed as a <code>ThreadPoolExecutor</code> with a maximum thread count of Integer.MAX_VALUE and a keep-alive time of Long.MAX_VALUE. The threads are created as daemon threads by the thread factory. A synchronous hand-over queue is used to trigger thread creation when all threads are busy. There are several issues with this configuration: <br />
<br />
1. If you perform write operations on asynchronous channels in a burst, you will create thousands of worker threads, which likely results in an <code>OutOfMemoryError: unable to create new native thread</code>. <br />
2. When the JVM exits, all daemon threads are abandoned - finally blocks are not executed, stacks are not unwound.<br />
<br />
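The described default configuration corresponds roughly to the following sketch (built from the description above, not copied from the JDK source): a zero-core pool with an unbounded maximum, daemon threads, and a synchronous hand-over queue, so every submission that finds all workers busy creates a fresh thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class DefaultPoolSketch {

    public static int demo() throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                0, Integer.MAX_VALUE,                   // no core threads, unbounded maximum
                Long.MAX_VALUE, TimeUnit.MILLISECONDS,  // effectively infinite keep-alive
                new SynchronousQueue<Runnable>(),       // hand-over queue, no capacity
                new ThreadFactory() {                   // daemon worker threads
                    public Thread newThread(Runnable r) {
                        Thread t = new Thread(r);
                        t.setDaemon(true);
                        return t;
                    }
                });
        final CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < 3; i++) {
            pool.execute(new Runnable() {
                public void run() {
                    try { release.await(); } catch (InterruptedException ignored) { }
                }
            });
        }
        int threadsCreated = pool.getPoolSize(); // one fresh thread per busy submission
        release.countDown();
        pool.shutdown();
        return threadsCreated;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // a burst of N blocked tasks means N threads
    }
}
```

Scale the burst from 3 tasks to thousands and you get exactly the <code>unable to create new native thread</code> failure mode described in point 1.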
In <a href="http://niklasschlimm.blogspot.de/2012/03/threading-stories-about-robust-thread.html">my other blog</a> I have explained why unbounded thread pools can cause trouble. Therefore, if you use asynchronous file channels, it may be an option to use custom thread pools instead of the default thread pool. The following snippet shows an example custom setting.<br />
<br />
<script src="https://gist.github.com/2135620.js">
</script><br />
The javadoc of <code>AsynchronousFileChannel</code> states that the custom executor should "minimally [...] support an unbounded work queue and should not run tasks on the caller thread of the execute method." That's a risky statement; it is only reasonable if resources aren't an issue, which is rarely the case. It may make sense to use bounded thread pools for asynchronous file channels: you cannot get a too-many-threads issue, nor can you flood your heap with work-queue tasks. In the example above you have five threads that execute asynchronous I/O tasks and a work queue with a capacity of 2500 tasks. If the capacity limit is exceeded, the rejected-execution handler applies the <code>CallerRunsPolicy</code>, where the client has to execute the write task synchronously. This can (dramatically) slow down system performance because the workload is "pushed back" to the client and executed synchronously. However, it can also save you from much more severe issues whose outcome is unpredictable. It's good practice to work with bounded thread pools and to keep the thread pool sizes configurable, so that you can adjust them at runtime. Again, to learn more about robust thread pool settings <a href="http://niklasschlimm.blogspot.de/2012/03/threading-stories-about-robust-thread.html">see my other blog entry</a>.<br />
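A hedged sketch of such a bounded setup (the pool sizes mirror the numbers above; class and file names are illustrative, and the shutdown order follows the closing rules discussed elsewhere in this series):

```java
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.EnumSet;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedPoolChannelExample {

    public static int demo() throws Exception {
        // Five workers, a work queue capped at 2500 tasks, and CallerRunsPolicy:
        // overflowing writes are executed synchronously on the submitting thread.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                5, 5, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>(2500),
                new ThreadPoolExecutor.CallerRunsPolicy());
        Path file = Files.createTempFile("bounded", ".tmp");
        AsynchronousFileChannel channel = AsynchronousFileChannel.open(
                file, EnumSet.of(StandardOpenOption.WRITE), pool);
        int written;
        try {
            written = channel.write(ByteBuffer.wrap(new byte[1024]), 0).get();
        } finally {
            channel.close();           // close the channel before the executor
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
        return written;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo() + " bytes written");
    }
}
```

Keeping the pool size and queue capacity in configuration rather than hard-coded, as recommended above, lets you tune the back-pressure behaviour at runtime.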
<br />
<blockquote class="tr_bq" style="background-color: #cfe2f3;">Thread pools with synchronous hand-over queues and unbounded maximum thread pool sizes can aggressively create new threads and thus can seriously harm system stability by consuming (pc registers and java stacks) runtime memory of the JVM. The 'longer' (elapsed time) the asynchronous task, the more likely you'll run into this issue.</blockquote><blockquote class="tr_bq" style="background-color: #cfe2f3;">Thread pools with unbounded work queues and fixed thread pool sizes can aggressively create new tasks and objects and thus can seriously harm system stability by consuming heap memory and CPU through excessive garbage collection activity. The larger (in size) and longer (in elapsed time) the asynchronous task, the more likely you'll run into this issue.</blockquote><script src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js">
</script><br />
That's all in terms of applying custom thread pools to asynchronous file channels. My next blog in this series will explain how to close asynchronous channels safely without losing data.<br />
<br />
The NIO.2 file channels series:<br />
- <a href="http://niklasschlimm.blogspot.de/2012/04/java-7-asynchronous-file-channels-part.html">Introduction</a><br />
- <a href="http://niklasschlimm.blogspot.de/2012/04/java-7-asynchronous-file-channels-part_05.html">Applying custom thread pools</a><br />
- <a href="http://niklasschlimm.blogspot.de/2012/05/java-7-9-nio2-file-channels-on-test.html">Closing file channels without losing data</a><br />
- <a href="http://niklasschlimm.blogspot.de/2012/05/java-7-10-nio2-file-channels-on-test.html">I/O operations are not atomic<br />
</a><br />
<br />
<script src="http://gist.github.com/raw/454771/gist-line-number-hack.js">
</script><br />
<br />
<script type="text/javascript">
addLineNumbersToAllGists()
</script>Niklas Schlimmhttp://www.blogger.com/profile/12402045792243894660noreply@blogger.com5tag:blogger.com,1999:blog-5701415790759755571.post-40250984929140553392012-04-05T09:40:00.009+02:002012-05-08T13:11:11.642+02:00Java 7: NIO.2 File Channels on the test bench - Part 1 - IntroductionAnother blog post about new JDK 7 features. This time I am writing about the new <code>AsynchronousFileChannel</code> class. I have been analyzing the new JDK 7 features in depth for a couple of weeks now and I have decided to number my posts consecutively. Just to make sure I don't get confused :-) Here is my 7th post about Java 7 (I admit that - by coincidence - this was also a little confusing). Using NIO.2 asynchronous file channels effectively is a wide topic. There are some things to consider here. I have decided to divide the material into four posts. In this first part I will introduce the concepts involved when you use asynchronous file channels. Since these file channels work asynchronously, it is interesting to look at their performance compared to conventional I/O. The second part deals with issues like memory and CPU consumption and explains how to use the new NIO.2 channels safely in a high-performance scenario. You also need to understand how to close asynchronous channels without losing data; that's part three. Finally, in part four, we'll take a look at concurrency. <br />
<a name='more'></a><br />
<blockquote class="tr_bq"><div style="background-color: #cfe2f3; text-align: left;">Notice: I won't explain the complete API of asynchronous file channels. There are enough posts out there that do a good job on that. My posts dive more into practical applicability and issues you may have when using asynchronous file channels.</div></blockquote><br />
OK, enough vague talking, let's get started. Here is a code snippet that opens an asynchronous channel (line 7), writes a sequence of bytes to the beginning of the file (line 9) and waits for the result to return (line 10). Finally, in line 14 the channel is closed.<br />
<br />
<script src="https://gist.github.com/1950035.js">
</script><br />
<br />
<b>Important participants in asynchonous file channel calls</b><br />
<br />
Before I dive into the code, let's quickly introduce the concepts involved in the asynchronous (file) channel galaxy. The call graph in figure 1 shows the sequence diagram of a call to the <code>open()</code> method of the <code>AsynchronousFileChannel</code> class. A <code>FileSystemProvider</code> encapsulates all the operating system specifics. To amuse everybody, I am using a Windows 7 client as I write this. Therefore a <code>WindowsFileSystemProvider</code> calls the <code>WindowsChannelFactory</code>, which actually creates the file and calls the <code>WindowsAsynchronousFileChannelImpl</code>, which returns an instance of itself. The most important concept is the <code>Iocp</code>, the I/O completion port. It is an API for performing multiple simultaneous asynchronous input/output operations. A completion port object is created and associated with a number of file handles. When I/O services are requested on the object, completion is indicated by a message queued to the I/O completion port. Other processes requesting I/O services are not notified of the completion of the I/O services, but instead check the I/O completion port's message queue to determine the status of their I/O requests. The I/O completion port manages multiple threads and their concurrency. As you can see from the diagram, the <code>Iocp</code> is a subtype of <code>AsynchronousChannelGroup</code>. So in JDK 7 asynchronous channels the asynchronous channel group is implemented as an I/O completion port. It owns the <code>ThreadPool</code> responsible for performing the requested asynchronous I/O operations. The <code>ThreadPool</code> actually encapsulates a <code>ThreadPoolExecutor</code>, which has done all the multi-threaded asynchronous task execution management since Java 1.5. Write operations to asynchronous file channels result in calls to the <code>ThreadPoolExecutor.execute()</code> method. <br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgGCc-GVJ0YDZA1754T4VPib8d__D8ldeWVqywCNPPW9uRNvxHveVPzLth45kV6W2NFJdmwL55TXjnYlCAzDKOl7pW-KvJApd4H8RZgpKwYKfYLprdfcwxH7IPLtGcaYaaPkPmilQ0-XeE/s1600/FileChannelCallgraph.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="254" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgGCc-GVJ0YDZA1754T4VPib8d__D8ldeWVqywCNPPW9uRNvxHveVPzLth45kV6W2NFJdmwL55TXjnYlCAzDKOl7pW-KvJApd4H8RZgpKwYKfYLprdfcwxH7IPLtGcaYaaPkPmilQ0-XeE/s320/FileChannelCallgraph.JPG" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Figure 1: Callgraph on open call to asynchronous file channel</td></tr>
</tbody></table><br />
<b>Some benchmarks</b><br />
<br />
It's always interesting to look at the performance. Asynchronous non-blocking I/O must be fast, right? To find an answer to that question I ran some benchmarks. Again, I am using <a href="http://www.javaspecialists.eu/archive/Issue124.html">Heinz' tiny benchmarking framework</a> to do that. My machine is an Intel Core i5-2310 CPU @ 2.90 GHz with four cores (64-bit). In a benchmark I need a baseline. My baseline is a simple conventional synchronous write operation into an ordinary file. Here is the snippet:<br />
<br />
<script src="https://gist.github.com/1950373.js">
</script><br />
As you can see in line 25, the benchmark performs a single write operation into an ordinary file. And these are the results:<br />
<br />
<pre class="java" name="code">Test: Performance_Benchmark_ConventionalFileAccessExample_1
Warming up ...
EPSILON:20:TESTTIME:1000:ACTTIME:1014:LOOPS:365947
EPSILON:20:TESTTIME:1000:ACTTIME:1014:LOOPS:372298
Starting test intervall ...
EPSILON:20:TESTTIME:1000:ACTTIME:1000:LOOPS:364706
EPSILON:20:TESTTIME:1000:ACTTIME:1014:LOOPS:368309
EPSILON:20:TESTTIME:1000:ACTTIME:1014:LOOPS:370288
EPSILON:20:TESTTIME:1000:ACTTIME:1001:LOOPS:364908
EPSILON:20:TESTTIME:1000:ACTTIME:1014:LOOPS:370820
Mean: 367.806,2
Std. Deviation: 2.588,665
Total started thread count: 12
Peak thread count: 6
Deamon thread count: 4
Thread count: 5</pre><br />
The following snippet is another benchmark which also issues a write operation (line 25), this time to an asynchronous file channel:<br />
<br />
<script src="https://gist.github.com/1950310.js">
</script><br />
This is the result of the above benchmark on my machine:<br />
<br />
<pre class="java" name="code">Test: Performance_Benchmark_AsynchronousFileChannel_1
Warming up ...
EPSILON:20:TESTTIME:1000:ACTTIME:1015:LOOPS:42667
EPSILON:20:TESTTIME:1000:ACTTIME:1015:LOOPS:193351
Starting test intervall ...
EPSILON:20:TESTTIME:1000:ACTTIME:1015:LOOPS:191268
EPSILON:20:TESTTIME:1000:ACTTIME:1015:LOOPS:186916
EPSILON:20:TESTTIME:1000:ACTTIME:1014:LOOPS:189842
EPSILON:20:TESTTIME:1000:ACTTIME:1014:LOOPS:191103
EPSILON:20:TESTTIME:1000:ACTTIME:1015:LOOPS:192005
Mean: 190.226,8
Std. Deviation: 1.795,733
Total started thread count: 17
Peak thread count: 11
Deamon thread count: 9
Thread count: 10</pre><br />
Since the snippets above do the same thing, it's safe to say that asynchronous file channels aren't necessarily faster than conventional I/O. That's an interesting result, I think. It's difficult to compare conventional I/O and NIO.2 in a single-threaded benchmark. NIO.2 was introduced to provide an I/O technique for highly concurrent scenarios. Therefore, asking what's faster - NIO or conventional I/O - isn't quite the right question. The more appropriate question would be: what is "more concurrent"? However, for now, the results above suggest:<br />
<div style="text-align: center;"><blockquote class="tr_bq" style="background-color: #cfe2f3;">Consider using conventional I/O when only one thread is issuing I/O operations.</blockquote></div>That's enough for now. I have explained the basic concepts and also pointed out that conventional I/O still has its right to exist. In the second post I will introduce some of the issues you may encounter when you use default asynchronous file channels. I will also show how to avoid those issues by applying some more viable settings.<br />
<br />
The NIO.2 file channels series:<br />
- <a href="http://niklasschlimm.blogspot.de/2012/04/java-7-asynchronous-file-channels-part.html">Introduction</a><br />
- <a href="http://niklasschlimm.blogspot.de/2012/04/java-7-asynchronous-file-channels-part_05.html">Applying custom thread pools</a><br />
- <a href="http://niklasschlimm.blogspot.de/2012/05/java-7-9-nio2-file-channels-on-test.html">Closing file channels without losing data</a><br />
- <a href="http://niklasschlimm.blogspot.de/2012/05/java-7-10-nio2-file-channels-on-test.html">I/O operations are not atomic<br />
</a><br />
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js">
</script><br />
<script src="http://gist.github.com/raw/454771/gist-line-number-hack.js">
</script><br />
<script type="text/javascript">
addLineNumbersToAllGists()
</script>Niklas Schlimmhttp://www.blogger.com/profile/12402045792243894660noreply@blogger.com9tag:blogger.com,1999:blog-5701415790759755571.post-63510478353352326632012-02-02T15:40:00.000+01:002012-02-02T15:40:37.334+01:00Java 7: A complete invokedynamic exampleAnother blog entry in my current Java 7 series. This time it's dealing with <code>invokedynamic</code>, a new bytecode instruction on the JVM for method invocation. The <code>invokedynamic</code> instruction allows dynamic linkage between a call site and the receiver of the call. That means you can link the class that is performing a method call to the class (and method) that is receiving the call <i>at run-time</i>. All the other JVM bytecode instructions for method invocation, like <code>invokevirtual</code>, hard-wire the target type information into your compilation, i.e. into your class file. Let's look at an example. <a name='more'></a><br />
<br />
<script src="https://gist.github.com/1723147.js"></script><br />
The bytecode snippet above shows an <code>invokevirtual</code> method call of <code>java.lang.String -> length()</code> in line 20. It refers to item 65 in the constant pool table, which is a <code>MethodRef</code> entry (see line 6). Items 42 and 66 in the constant pool table refer to the class and the method descriptor entries. As you can see, the target type and method of the <code>invokevirtual</code> call are completely resolved and hard-wired into the bytecode. Now, let's return to <code>invokedynamic</code>!<br />
<br />
It is important to notice that it is not possible to compile Java code into bytecode that contains an <code>invokedynamic</code> instruction. Java is <a href="http://docs.oracle.com/javase/7/docs/technotes/guides/vm/multiple-language-support.html#typing">statically typed</a>. That means that Java performs type checking at compile time. Therefore, in Java, it is possible (and wanted!) to hard-wire all type information of method call receivers into the callers class file. The caller knows the type name of the call target, as demonstrated in our example above. The use of <code>invokedynamic</code> - on the other hand - enables the JVM to resolve exactly that type information at run-time. This is only required (and wanted!) for dynamic languages, such as JRuby or Rhino. <br />
<br />
Now, suppose you want to implement a new language on the JVM that is dynamically typed. I am not suggesting you should invent *another* language on the JVM, but *suppose* you would, and *suppose* your new language should be dynamically typed. That would mean, in your new language, the linking between a caller and a receiver of a method call is performed at run-time. Since Java 7 this is possible on the bytecode level using the <code>invokedynamic</code> instruction. <br />
<br />
Because I cannot create an <code>invokedynamic</code> instruction using a Java compiler, I will create a class file that contains <code>invokedynamic</code> myself. Once this class file is created I will run that class file's <code>main</code> method using an ordinary <code>java</code> launcher. How can you create a class file without a compiler? This is possible by using bytecode manipulation frameworks like <a href="http://asm.ow2.org/">ASM </a>or <a href="http://www.csg.is.titech.ac.jp/%7Echiba/javassist/">Javassist</a>.<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhKCSN15nrHv8_ak6qsXhsMnnZspiahKz0QvFJg-d0IKbjo3EKVm_Y7fqLDJusE5XlVRd2deWM1oXaBgJizm_lVXAjMkca_I4L-Jd_Kf9R706qN4DAfRg62yNXXQU363n0jkUfNvjQZQEk/s1600/Foto.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhKCSN15nrHv8_ak6qsXhsMnnZspiahKz0QvFJg-d0IKbjo3EKVm_Y7fqLDJusE5XlVRd2deWM1oXaBgJizm_lVXAjMkca_I4L-Jd_Kf9R706qN4DAfRg62yNXXQU363n0jkUfNvjQZQEk/s320/Foto.JPG" width="320" /></a></div><br />
The following code snippet shows the <code>SimpleDynamicInvokerGenerator</code> that can generate a class file <code>SimpleDynamicInvoker.class</code> which contains an invokedynamic instruction.<br />
<br />
<script src="https://gist.github.com/1710583.js">
</script><br />
I am using <a href="http://asm.ow2.org/">ASM</a> here, an all-purpose Java bytecode manipulation and analysis framework, to do the job of creating a correct class file format. In line 30 the call to <code>visitInvokeDynamicInsn</code> creates the <code>invokedynamic</code> instruction. Generating a class that performs an <code>invokedynamic</code> call is only half of the story. You also need some code that links the dynamic call site to the actual target; this is the real purpose of <code>invokedynamic</code>. Here is an example.<br />
<br />
<script src="https://gist.github.com/1710613.js">
</script><br />
The bootstrap method in lines 9-14 selects the actual target of the dynamic call. In our case the target is the <code>sayHello()</code> method. To learn how the bootstrap method is linked to the <code>invokedynamic</code> instruction we need to dive into the bytecode of <code>SimpleDynamicInvoker</code> that we've generated with <code><a href="https://github.com/nschlimm/playground/blob/master/bytecode-playground/src/main/java/com/schlimm/bytecode/invokedynamic/generator/SimpleDynamicInvokerGenerator.java">SimpleDynamicInvokerGenerator</a></code>. <br />
<br />
<script src="https://gist.github.com/1710655.js">
</script><br />
In line 49 you can see the <code>invokedynamic</code> instruction. The logical name of the dynamic method is <code>runCalculation</code>, which is a fictitious name. You can use any name that makes sense; even names like "+" are allowed. The instruction refers to item 20 in the constant pool table (see line 33). This in turn refers to index 0 in the <code>BootstrapMethods</code> attribute (see line 8). There you can see the link to the <code><a href="https://github.com/nschlimm/playground/blob/master/bytecode-playground/src/main/java/com/schlimm/bytecode/invokedynamic/linkageclasses/SimpleDynamicLinkageExample.java">SimpleDynamicLinkageExample.bootstrapDynamic</a></code> method that links the <code>invokedynamic</code> instruction to the call target.<br />
<br />
Now if you call the <code>SimpleDynamicInvoker</code> using the <code>java</code> launcher, then the <code>invokedynamic</code> call is executed.<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjHjUhyuUb6MPjEeuCxjdgVJBofG4DZrTG1oGxuj_VXoDMbfdvWN2U31iPB2eeTHdDJ3NTh_e4FaJn2XAXs_B66SwWoRHMNR3dKy1NgN_Qg2_HrIqiwqeyf2Niv2cwk2emXrukdcQRDYhc/s1600/invoke.bmp" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="158" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjHjUhyuUb6MPjEeuCxjdgVJBofG4DZrTG1oGxuj_VXoDMbfdvWN2U31iPB2eeTHdDJ3NTh_e4FaJn2XAXs_B66SwWoRHMNR3dKy1NgN_Qg2_HrIqiwqeyf2Niv2cwk2emXrukdcQRDYhc/s320/invoke.bmp" width="320" /></a></div><br />
The following sequence diagram illustrates what's happening when the <code>SimpleDynamicInvoker</code> is called using the <code>java</code> launcher.<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEirM-7zOBEWW8dr2FHRarB7VaPfrVaGPPZJBM8-uDn7BLvJOSlgFhEu4vmgMHWI-1Qu9TtEn-dczWsDc2I05_Ph9uuTXAHk3kOUEuS1Mn9H_p1nfUw4hhscE_R6RSagnHD2I26ll2F-TT4/s1600/Unbenannt.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEirM-7zOBEWW8dr2FHRarB7VaPfrVaGPPZJBM8-uDn7BLvJOSlgFhEu4vmgMHWI-1Qu9TtEn-dczWsDc2I05_Ph9uuTXAHk3kOUEuS1Mn9H_p1nfUw4hhscE_R6RSagnHD2I26ll2F-TT4/s320/Unbenannt.JPG" width="320" /></a></div><br />
The first call of <code>runCalculation</code> using <code>invokedynamic</code> issues a call to the <code>bootstrapDynamic</code> method. This method does the dynamic linkage between the calling class (<code>SimpleDynamicInvoker</code>) and the receiving class (<code>SimpleDynamicLinkageExample</code>). The bootstrap method returns a <code>MethodHandle</code> that targets the receiving class. This method handle is cached for repetitive invocations of the <code>runCalculation</code> method.<br />
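Since the gists above are embedded as scripts, here is a compact sketch of what such a linkage class can look like, written against the <code>java.lang.invoke</code> API. The class and method names follow the post; the bodies are my reconstruction, not the original gist sources.<br />

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class SimpleDynamicLinkageExample {

    // The actual receiver of the dynamic call
    public static void sayHello() {
        System.out.println("There we go!");
    }

    // Called by the JVM the first time the invokedynamic instruction
    // executes; links the dynamic call site to the sayHello() method.
    // The returned ConstantCallSite caches the method handle, so
    // subsequent invocations skip the bootstrap step entirely.
    public static CallSite bootstrapDynamic(MethodHandles.Lookup lookup,
            String name, MethodType type)
            throws NoSuchMethodException, IllegalAccessException {
        return new ConstantCallSite(lookup.findStatic(
                SimpleDynamicLinkageExample.class, "sayHello",
                MethodType.methodType(void.class)));
    }
}
```

Note that the logical name passed in (here <code>runCalculation</code>) is ignored in this sketch; a real dynamic language runtime would use it to select the call target.<br />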
<br />
That's all in terms of <code>invokedynamic</code>. I have some more sophisticated examples published <a href="https://github.com/nschlimm/playground/tree/master/bytecode-playground/src/main/java/com/schlimm/bytecode/invokedynamic">here</a> in my Git repo. I hope you've enjoyed reading this - in times of shortage!<br />
<br />
Cheers, Niklas<br />
<br />
References:<br />
<br />
<a href="http://docs.oracle.com/javase/7/docs/technotes/guides/vm/multiple-language-support.html">http://docs.oracle.com/javase/7/docs/technotes/guides/vm/multiple-language-support.html</a><br />
<a href="http://asm.ow2.org/">http://asm.ow2.org/</a><br />
<a href="http://java.sun.com/developer/technicalArticles/DynTypeLang/">http://java.sun.com/developer/technicalArticles/DynTypeLang/</a><br />
<a href="http://asm.ow2.org/doc/tutorial-asm-2.0.html">http://asm.ow2.org/doc/tutorial-asm-2.0.html</a><br />
<a href="http://weblogs.java.net/blog/forax/archive/2011/01/07/calling-invokedynamic-java">http://weblogs.java.net/blog/forax/archive/2011/01/07/calling-invokedynamic-java</a><br />
<a href="http://nerds-central.blogspot.com/2011/05/performing-dynamicinvoke-from-java-step.html">http://nerds-central.blogspot.com/2011/05/performing-dynamicinvoke-from-java-step.html</a><br />
<br />
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js">
</script><br />
<br />
<script src="http://gist.github.com/raw/454771/gist-line-number-hack.js">
</script><br />
<br />
<script type="text/javascript">
addLineNumbersToAllGists()
</script>Niklas Schlimmhttp://www.blogger.com/profile/12402045792243894660noreply@blogger.com11tag:blogger.com,1999:blog-5701415790759755571.post-70764829200994722492012-01-17T17:11:00.218+01:002012-03-02T07:31:12.922+01:00Java 7: How to write really fast Java codeWhen I first wrote this blog my intention was to introduce you to the class <code>ThreadLocalRandom</code>, which is new in Java 7, for generating random numbers. I have analyzed the performance of <code>ThreadLocalRandom</code> in a series of micro-benchmarks to find out how it performs in a single-threaded environment. The results were relatively surprising: although the code is very similar, <code>ThreadLocalRandom</code> is twice as fast as <code>Math.random()</code>! The results drew my interest and I decided to investigate this a little further. I have documented my analysis process. It is an exemplary introduction to the analysis steps, technologies and some of the JVM diagnostic tools required to understand differences in the performance of small code segments. Some experience with the described toolset and technologies will enable you to write faster Java code for your specific Hotspot target environment.<a name='more'></a><br />
<br />
OK, that's enough talk, let's get started! My machine is an ordinary Intel x86, Family 6, 3 GHz, 32-bit, dual core running Windows XP Professional.<br />
<br />
<code>Math.random()</code> works on a static singleton instance of <code>Random</code> whilst <code>ThreadLocalRandom -> current() -> nextDouble()</code> works on a thread-local instance of <code>ThreadLocalRandom</code>, which is a subclass of <code>Random</code>. <code>ThreadLocal</code> introduces the overhead of a variable look-up on each call to the <code>current()</code>-method. Considering what I've just said, it's really a little surprising that it's twice as fast as <code>Math.random()</code> in a single thread, isn't it? I didn't expect such a significant difference. <br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgGL3U0VtrJDKMZ-hhjzeTAm3Rub6XnAVc_wZUT4Doavv1yNy2irbDyPIHr9rKFHAksAIH1WXUkR2ev4mDKlqEvlPrSxJX8__4bAf0n5Zt7_KYz5_M86eyZPR1q2KilE2tE-49UTW5Dnx0/s1600/ThreadLocalRandom.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgGL3U0VtrJDKMZ-hhjzeTAm3Rub6XnAVc_wZUT4Doavv1yNy2irbDyPIHr9rKFHAksAIH1WXUkR2ev4mDKlqEvlPrSxJX8__4bAf0n5Zt7_KYz5_M86eyZPR1q2KilE2tE-49UTW5Dnx0/s320/ThreadLocalRandom.JPG" width="320" /></a></div><br />
Again, I am using a tiny micro-benchmarking framework presented <a href="http://www.javaspecialists.eu/archive/Issue124.html">in one of Heinz' blogs</a>. The framework that Heinz developed takes care of several challenges in benchmarking Java programs on modern JVMs. These challenges include: warm-up, garbage collection, accuracy of Java's time API, verification of test accuracy and so forth. <br />
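To make the measurement idea concrete, here is a heavily stripped-down harness of my own (a sketch only, not Heinz' framework): warm up first so the JIT can compile the hot path, then count how many runs complete in a fixed interval.<br />

```java
public class MiniBench {

    // Counts how often task.run() completes within intervalMs,
    // after a warm-up round of the same length. A real harness
    // additionally handles GC pauses, timer accuracy and result
    // verification, which this sketch deliberately omits.
    public static long countRuns(Runnable task, long intervalMs) {
        long warmupEnd = System.currentTimeMillis() + intervalMs;
        while (System.currentTimeMillis() < warmupEnd) {
            task.run(); // warm-up, result discarded
        }
        long count = 0;
        long end = System.currentTimeMillis() + intervalMs;
        while (System.currentTimeMillis() < end) {
            task.run();
            count++;
        }
        return count;
    }
}
```

Higher counts mean faster code, which is why the result listings below report a "mean execution count".<br />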
<br />
Here are my runnable benchmark classes:<br />
<br />
<pre class="java" name="code">public class ThreadLocalRandomGenerator implements BenchmarkRunnable {

    private double r;

    @Override
    public void run() {
        r = r + ThreadLocalRandom.current().nextDouble();
    }

    public double getR() {
        return r;
    }

    @Override
    public Object getResult() {
        return r;
    }
}

public class MathRandomGenerator implements BenchmarkRunnable {

    private double r;

    @Override
    public void run() {
        r = r + Math.random();
    }

    public double getR() {
        return r;
    }

    @Override
    public Object getResult() {
        return r;
    }
}
</pre><br />
Let's run the benchmark using Heinz' framework:<br />
<br />
<script src="https://gist.github.com/1583786.js">
</script><br />
Notice: To make sure the JVM does not identify the code as "dead code" I return a field variable and print out the result of my benchmarking immediately. That's why my runnable classes implement an interface called <a href="https://github.com/nschlimm/playground/blob/master/java7-playground/src/main/java/com/schlimm/java7/concurrency/random/BenchmarkRunnable.java">BenchmarkRunnable</a>. I am running this benchmark three times. The first run is in default mode, with inlining and JIT optimization enabled:<br />
<br />
<pre class="java" name="code">Benchmark target: MathRandomGenerator
Mean execution count: 14773594,4
Standard deviation: 180484,9
To avoid dead code coptimization: 6.4005410634212025E7
Benchmark target: ThreadLocalRandomGenerator
Mean execution count: 29861911,6
Standard deviation: 723934,46
To avoid dead code coptimization: 1.0155096190946539E8
</pre><br />
Then again without JIT optimization (VM option <code>-Xint</code>):<br />
<br />
<pre class="java" name="code">Benchmark target: MathRandomGenerator
Mean execution count: 963226,2
Standard deviation: 5009,28
To avoid dead code coptimization: 3296912.509302683
Benchmark target: ThreadLocalRandomGenerator
Mean execution count: 1093147,4
Standard deviation: 491,15
To avoid dead code coptimization: 3811259.7334526842
</pre><br />
The last test is with JIT optimization, but with <code>-XX:MaxInlineSize=0</code> which (almost) disables inlining:<br />
<br />
<pre class="java" name="code">Benchmark target: MathRandomGenerator
Mean execution count: 13789245
Standard deviation: 200390,59
To avoid dead code coptimization: 4.802723374491231E7
Benchmark target: ThreadLocalRandomGenerator
Mean execution count: 24009159,8
Standard deviation: 149222,7
To avoid dead code coptimization: 8.378231170741305E7
</pre><br />
Let's interpret the results carefully: with full JVM JIT optimization, <code>ThreadLocalRandom</code> is twice as fast as <code>Math.random()</code>. Turning JIT optimization off shows that the two perform equally well (or rather, equally badly). Method inlining seems to account for 30% of the performance difference. The remaining difference may be due to <a href="http://www.oracle.com/technetwork/java/whitepaper-135217.html">other optimization techniques</a>.<br />
<br />
One reason why the JIT compiler can tune <code>ThreadLocalRandom</code> more effectively is the improved implementation of <code>ThreadLocalRandom.next()</code>. <br />
<br />
<pre class="java" name="code">public class Random implements java.io.Serializable {
    ...
    protected int next(int bits) {
        long oldseed, nextseed;
        AtomicLong seed = this.seed;
        do {
            oldseed = seed.get();
            nextseed = (oldseed * multiplier + addend) & mask;
        } while (!seed.compareAndSet(oldseed, nextseed));
        return (int) (nextseed >>> (48 - bits));
    }
    ...
}

public class ThreadLocalRandom extends Random {
    ...
    protected int next(int bits) {
        rnd = (rnd * multiplier + addend) & mask;
        return (int) (rnd >>> (48 - bits));
    }
    ...
}
</pre><br />
The first snippet shows <code>Random.next()</code> which is used intensively in the benchmark of <code>Math.random()</code>. Compared to <code>ThreadLocalRandom.next()</code> the method requires significantly more instructions, although both methods do the same thing. In the <code>Random</code> class the <code>seed</code> variable stores a global shared state to all threads, it changes with every call to the <code>next()</code>-method. Therefore <code>AtomicLong</code> is required to safely access and change the <code>seed</code> value in calls to <code>nextDouble()</code>. <code>ThreadLocalRandom</code> on the other hand is - well - thread local :-) The <code>next()</code>-method does not have to be thread safe and can use an ordinary <code>long</code> variable as seed value. <br />
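To see the difference in isolation, here is a self-contained sketch of both seed-update strategies. The LCG constants are the well-known ones from <code>java.util.Random</code>; started from the same seed, both variants produce the same number sequence, but the shared variant pays for a CAS retry loop on every call.<br />

```java
import java.util.concurrent.atomic.AtomicLong;

public class SeedDemo {

    // Linear congruential generator constants from java.util.Random
    private static final long MULTIPLIER = 0x5DEECE66DL;
    private static final long ADDEND = 0xBL;
    private static final long MASK = (1L << 48) - 1;

    private final AtomicLong sharedSeed = new AtomicLong(42);
    private long localSeed = 42;

    // Shared state: CAS retry loop, as in Random.next()
    public int nextShared(int bits) {
        long oldseed, nextseed;
        do {
            oldseed = sharedSeed.get();
            nextseed = (oldseed * MULTIPLIER + ADDEND) & MASK;
        } while (!sharedSeed.compareAndSet(oldseed, nextseed));
        return (int) (nextseed >>> (48 - bits));
    }

    // Thread-confined state: plain field update, as in ThreadLocalRandom.next()
    public int nextLocal(int bits) {
        localSeed = (localSeed * MULTIPLIER + ADDEND) & MASK;
        return (int) (localSeed >>> (48 - bits));
    }
}
```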
<br />
<b>About method inlining and <code>ThreadLocalRandom</code><br />
</b><br />
One very effective JIT optimization is method inlining. In hot paths executed frequently the hotspot compiler decides to inline the code of called methods (child method) into the callers method (parent method). "Inlining has important benefits. It dramatically reduces the dynamic frequency of method invocations, which saves the time needed to perform those method invocations. But even more importantly, inlining produces much larger blocks of code for the optimizer to work on. This creates a situation that significantly increases the effectiveness of traditional compiler optimizations, overcoming a major obstacle to increased Java programming language performance."<br />
<br />
Since <a href="http://download.java.net/jdk7/archive/b142/binaries/">OpenJDK 7 debug builds</a> you can monitor method inlining by using diagnostic JVM options. Running the code with '<code>-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining</code>' will show the inlining efforts of the JIT compiler. Here are the relevant sections of the output for <code>Math.random()</code> benchmark:<br />
<br />
<pre class="java" name="code">@ 13 java.util.Random::nextDouble (24 bytes)
@ 3 java.util.Random::next (47 bytes) callee is too large
@ 13 java.util.Random::next (47 bytes) callee is too large
</pre><br />
The JIT compiler cannot inline the <code>Random.next()</code> method that is called in <code>Random.nextDouble()</code>. This is the inlining output for <code>ThreadLocalRandom.next()</code>:<br />
<br />
<pre class="java" name="code">@ 8 java.util.Random::nextDouble (24 bytes)
@ 3 java.util.concurrent.ThreadLocalRandom::next (31 bytes)
@ 13 java.util.concurrent.ThreadLocalRandom::next (31 bytes)
</pre><br />
Due to the fact that the <code>next()</code>-method is shorter (31 bytes) it can be inlined. Because the <code>next()</code>-method is called intensively in both benchmarks this log suggests that method inlining may be one reason why <code>ThreadLocalRandom</code> performs significantly faster. <br />
<br />
To verify that, and to find out more, we need to dive into the assembly code. With Java 7 JDKs it is possible to print the assembly code to the console. See <a href="https://wikis.oracle.com/display/HotSpotInternals/PrintAssembly">here</a> for how to enable the <code>-XX:+PrintAssembly</code> VM option. The option prints the JIT-optimized code, which means you can see the code the JVM actually executes. I have copied the relevant assembly code into the links below.<br />
<br />
Assembly code of ThreadLocalRandomGenerator.run() <a href="https://gist.github.com/1583170">here</a>.<br />
Assembly code of MathRandomGenerator.run() <a href="https://gist.github.com/1583188">here</a>.<br />
Assembly code of Random.next() called by Math.random() <a href="https://gist.github.com/1583197">here</a>.<br />
<br />
<a href="http://en.wikipedia.org/wiki/X86_instruction_listings">Assembly code</a> is machine-specific, low-level code; it's more complicated to read than <a href="http://en.wikipedia.org/wiki/Java_bytecode_instruction_listings">bytecode</a>. Let's try to verify that method inlining has a relevant effect on performance in my benchmarks, and whether there are other obvious differences in how the JIT compiler treats <code>ThreadLocalRandom</code> and <code>Math.random()</code>. In <code>ThreadLocalRandomGenerator.run()</code> there is no procedure call to any of the subroutines like <code>Random.nextDouble()</code> or <code>ThreadLocalRandom.next()</code>. There is only one virtual (hence expensive) method call to <code>ThreadLocal.get()</code> visible (see line 35 in the <code>ThreadLocalRandomGenerator.run()</code> assembly). All the other code is inlined into <code>ThreadLocalRandomGenerator.run()</code>. In the case of <code>MathRandomGenerator.run()</code> there are <i>two</i> virtual method calls to <code>Random.next()</code> (see block B4, line 204 ff. in the assembly code of <code>MathRandomGenerator.run()</code>). This fact confirms our suspicion that method inlining is one important root cause of the performance difference. Furthermore, due to the synchronization overhead, considerably more (and some expensive!) assembly instructions are required in <code>Random.next()</code>, which is also counterproductive in terms of execution speed.<br />
<br />
<b>Understanding the overhead of the <code>invokevirtual</code> instruction</b><br />
<br />
So why is (virtual) method invocation expensive and method inlining so effective? The pointer of <code>invokevirtual</code> instructions is not an offset of a concrete method in a class instance. The compiler does not know the internal layout of a class instance. Instead, it generates symbolic references to the methods of an instance, which are stored in the runtime constant pool. Those runtime constant pool items are resolved <i>at run time</i> to determine the actual method location. This dynamic (run-time) binding requires verification, preparation and resolution, which can considerably affect performance. (see <a href="http://java.sun.com/docs/books/jvms/second_edition/html/Compiling.doc.html#14787">Invoking Methods</a> and <a href="http://java.sun.com/docs/books/jvms/second_edition/html/Concepts.doc.html#22574">Linking</a> in the JVM Spec for details)<br />
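The resolve-then-invoke step can be made visible with the method handle API: the symbolic reference (<code>String.length()</code> with descriptor <code>()I</code>) is resolved explicitly before it can be invoked. This is only an illustration of the linking concept; the JVM's internal linking is of course not implemented in Java.<br />

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class VirtualCallDemo {

    // Resolves the symbolic reference String.length()I to a concrete
    // method handle, then invokes it - mirroring the two steps the JVM
    // performs when it links an invokevirtual call site
    public static int callLength(String s) {
        try {
            MethodHandle length = MethodHandles.lookup().findVirtual(
                    String.class, "length", MethodType.methodType(int.class));
            return (int) length.invokeExact(s);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```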
<br />
That's all for now. The disclaimer: of course, the list of topics you need to understand to solve performance riddles is endless. There is a lot more to understand than micro-benchmarking, JIT optimization, method inlining, Java bytecode, assembly language and so forth. Also, there are a lot more root causes for performance differences than just virtual method calls or expensive thread synchronization instructions. However, I think the topics I have introduced are a good start into such deep-diving stuff. Looking forward to critical and enjoyable comments!<br />
<br />
Cheers,<br />
NiklasNiklas Schlimmhttp://www.blogger.com/profile/12402045792243894660noreply@blogger.com21tag:blogger.com,1999:blog-5701415790759755571.post-46118226416873680992011-12-28T19:13:00.009+01:002011-12-29T18:28:35.261+01:00Java 7: Understanding the PhaserJava 7 introduces a flexible thread synchronization mechanism called <code>Phaser</code>. If you need to wait for threads to arrive before you can continue or start another set of tasks, then <code>Phaser</code> is a good choice. Here is the listing, everything is explained step-by-step.<br />
<a name='more'></a><pre class="java" name="code">import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.concurrent.Phaser;

public class PhaserExample {

    public static void main(String[] args) throws InterruptedException {
        List<Runnable> tasks = new ArrayList<>();
        for (int i = 0; i < 2; i++) {
            Runnable runnable = new Runnable() {
                @Override
                public void run() {
                    int a = 0, b = 1;
                    for (int i = 0; i < 2000000000; i++) {
                        a = a + b;
                        b = a - b;
                    }
                }
            };
            tasks.add(runnable);
        }
        new PhaserExample().runTasks(tasks);
    }

    void runTasks(List<Runnable> tasks) throws InterruptedException {
        final Phaser phaser = new Phaser(1) {
            protected boolean onAdvance(int phase, int registeredParties) {
                return phase >= 1 || registeredParties == 0;
            }
        };
        for (final Runnable task : tasks) {
            phaser.register();
            new Thread() {
                public void run() {
                    do {
                        phaser.arriveAndAwaitAdvance();
                        task.run();
                    } while (!phaser.isTerminated());
                }
            }.start();
            Thread.sleep(500);
        }
        phaser.arriveAndDeregister();
    }
}
</pre><br />
This example lets us learn a lot about the internals of a <code>Phaser</code>. Let's go through the code:<br />
<br />
<b>Lines 8-26</b>: the <code>main</code>-Method that creates two <code>Runnable</code> tasks<br />
<b>Line 28</b>: Task list is passed to the <code>runTasks</code>-Method<br />
<br />
The <code>runTasks</code>-Method actually uses a <code>Phaser</code> to synchronize the tasks in such a way that each task in the list needs to arrive at the barrier before they are executed in parallel. The task list is executed twice. The first cycle starts when both threads have arrived at the barrier (see image mark 1). The second cycle starts when both threads have arrived at the barrier again (see image mark 2).<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4cHZ2jHnPRC8Rbd-pnsGdlWMsqD2SR3gBSiGEvuO4cKtUYf1yZwxs1DaR0JHJKJRjxCT5ZKiMbiAEUTogog0wRa1NxK7CZRg6wVrZfPJClRhFGnHtGIS-UP-VIP075X5cNQsLBGGrR9g/s1600/Unbenannt.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4cHZ2jHnPRC8Rbd-pnsGdlWMsqD2SR3gBSiGEvuO4cKtUYf1yZwxs1DaR0JHJKJRjxCT5ZKiMbiAEUTogog0wRa1NxK7CZRg6wVrZfPJClRhFGnHtGIS-UP-VIP075X5cNQsLBGGrR9g/s320/Unbenannt.JPG" width="320" /></a></div><br />
<blockquote style="background-color: #cfe2f3;">Notice: "party" is a term in the <code>Phaser</code> context that is equivalent to what we mean by a thread. When one party arrives, then one thread arrived at the synchronization barrier.</blockquote><br />
<b>Line 34</b>: create a <code>Phaser</code> that has one registered party (this means: at this time phaser expects one thread=party to arrive before it can start the execution cycle)<br />
<b>Line 35</b>: implement the <code>onAdvance</code>-Method to ensure that this task list is executed exactly twice (line 36 returns <code>true</code>, and thereby terminates the phaser, once the phase is equal to or higher than 1)<br />
<b>Line 40</b>: iterate over the list of tasks<br />
<b>Line 41</b>: register this thread with the <code>Phaser</code>. Notice that a <code>Phaser</code> instance does not know the task instances. <span style="background-color: #ffe599;">It's a simple counter of registered, unarrived and arrived parties, shared across participating threads.</span> If two parties are registered then two parties must arrive at the phaser to be able to start the first cycle.<br />
<b>Line 45</b>: tell the thread to wait at the barrier until the arrived parties equal the registered parties<br />
<b>Line 50</b>: Just for demonstration purposes, this line delays execution. The <a href="https://github.com/nschlimm/playground/blob/master/java7-playground/src/main/java/com/schlimm/java7/concurrency/phaser/PhaserExample.java">original code snippet</a> prints internal infos about the Phaser state to standard out.<br />
<b>Line 51</b>: two tasks are registered, in total three parties are registered.<br />
<b>Line 53</b>: deregister one party. This results in two registered parties and two arrived parties. This causes the threads waiting (Line 45) to execute the first cycle. (in fact the third party arrived while three were registered - but it does not make a difference)<br />
<br />
<a href="https://github.com/nschlimm/playground/blob/master/java7-playground/src/main/java/com/schlimm/java7/concurrency/phaser/PhaserExample.java">The original code snippet</a> stored in my Git repository creates the following output:<br />
<br />
<pre class="java" name="code">After phaser init -> Registered: 1 - Unarrived: 1 - Arrived: 0 - Phase: 0
After register -> Registered: 2 - Unarrived: 2 - Arrived: 0 - Phase: 0
After arrival -> Registered: 2 - Unarrived: 1 - Arrived: 1 - Phase: 0
After register -> Registered: 3 - Unarrived: 2 - Arrived: 1 - Phase: 0
After arrival -> Registered: 3 - Unarrived: 1 - Arrived: 2 - Phase: 0
Before main thread arrives and deregisters -> Registered: 3 - Unarrived: 1 - Arrived: 2 - Phase: 0
On advance -> Registered: 2 - Unarrived: 0 - Arrived: 2 - Phase: 0
After main thread arrived and deregistered -> Registered: 2 - Unarrived: 2 - Arrived: 0 - Phase: 1
Main thread will terminate ...
Thread-0:go :Wed Dec 28 16:09:16 CET 2011
Thread-1:go :Wed Dec 28 16:09:16 CET 2011
Thread-0:done:Wed Dec 28 16:09:20 CET 2011
Thread-1:done:Wed Dec 28 16:09:20 CET 2011
On advance -> Registered: 2 - Unarrived: 0 - Arrived: 2 - Phase: 1
Thread-0:go :Wed Dec 28 16:09:20 CET 2011
Thread-1:go :Wed Dec 28 16:09:20 CET 2011
Thread-1:done:Wed Dec 28 16:09:23 CET 2011
Thread-0:done:Wed Dec 28 16:09:23 CET 2011
</pre><br />
<b>Line 1</b>: when the <code>Phaser</code> is initialized in line 34 of the code snippet then one party is registered and none arrived<br />
<b>Line 2</b>: after the first thread is registered in line 41 of the code example there are two registered parties and two unarrived parties. Since no thread has reached the barrier yet, no party has arrived.<br />
<b>Line 3</b>: the first thread arrives and waits at the barrier (line 45 in the code snippet)<br />
<b>Line 4</b>: register the second thread, three registered, two unarrived, one arrived<br />
<b>Line 5</b>: the second thread arrived at the barrier, hence two arrived now<br />
<b>Line 7</b>: one party is deregistered in the code line 53 of the code example, therefore <code>onAdvance</code>-Method is called and returns <code>false</code>. This starts the first cycle since registered parties equals arrived parties (i.e. two). Phase 1 is started -> cycle one (see image mark 1)<br />
<b>Line 8</b>: since all threads are notified and start their work, two parties are unarrived again, non arrived<br />
<b>Line 14</b>: after the threads have executed their tasks once, they arrive again (code line 45), the <code>onAdvance</code>-Method is called, and the 2nd cycle is executed<br />
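The register/arrive bookkeeping shown in the output can also be observed without any worker threads. Here is a small single-threaded sketch of my own (not part of the original snippet) that exercises the counters directly:<br />

```java
import java.util.concurrent.Phaser;

public class PhaserPartiesDemo {

    // Returns {registered, arrived, phase} after a short sequence of
    // register/arrive calls, all performed on the calling thread
    public static int[] demo() {
        Phaser phaser = new Phaser(1);                  // one registered party
        phaser.register();                              // now two registered
        int registered = phaser.getRegisteredParties();
        phaser.arrive();                                // first arrival, does not block
        int arrived = phaser.getArrivedParties();
        phaser.arriveAndDeregister();                   // last arrival -> phase advances
        int phase = phaser.getPhase();                  // phase 0 is finished, now 1
        return new int[] { registered, arrived, phase };
    }
}
```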
<br />
OK, go through it and look into my comments in <a href="https://github.com/nschlimm/playground/blob/master/java7-playground/src/main/java/com/schlimm/java7/concurrency/phaser/PhaserExample.java">the original code snippet</a> to learn more.Niklas Schlimmhttp://www.blogger.com/profile/12402045792243894660noreply@blogger.com9tag:blogger.com,1999:blog-5701415790759755571.post-60089536179949214242011-12-15T18:02:00.003+01:002011-12-16T09:53:08.725+01:00Java 7: Fork and join decomposable input patternIn my <a href="http://niklasschlimm.blogspot.com/2011/12/java-7-fork-and-join-and-jar-jam.html">recent blog</a> I have introduced the fork and join framework of Java 7. This blog presents a little framework on top of raw fork and join. The framework implements the decomposable input pattern (dip) - which originated from my own laziness when I was using the framework a couple of times. I realized that I was writing the same code every time I implemented a slightly different use case. And you know, let's write a little piece of software that I can reuse. The decomposable input pattern framework was born. <br />
<br />
<a name='more'></a>You can <a href="https://github.com/nschlimm/playground/blob/master/forkjoindip-project-build/forkjoindip-project-1.0-SNAPSHOT.jar?raw=true">download the binary here</a>.<br />
The <a href="http://nschlimm.github.com/playground/forkjoindip-project-build/javadoc/apidocs/index.html">API-documentation is hosted here</a>.<br />
And the <a href="https://github.com/nschlimm/playground/blob/master/forkjoindip-project-build/forkjoindip-project-1.0-SNAPSHOT-sources.jar?raw=true">sources are also available here</a>.<br />
<br />
Now what's different when you use that framework? I'd say the difference is that the dip-framework follows good <a href="http://www.objectmentor.com/resources/articles/Principles_and_Patterns.pdf">OO design principles</a>, like the open-closed principle that says: "A module should be open for extension but closed for modification." In other words, I have separated concerns in a fork and join scenario to make the whole more flexible and easier to change.<br />
<br />
In my last blog I presented a code snippet that illustrated how to use plain fork and join to calculate offers of car insurances. Let's see how this can be done using my dip-framework.<br />
<br />
The input to the proposal calculation is - well - a list of proposals :-) In the dip framework you wrap the input of a <code>ForkJoinTask</code> into a subclass of <code>DecomposableInput</code>. The name originates from the fact that input to <code>ForkJoinTask</code> is decomposable. Here is the snippet:<br />
<br />
<script src="https://gist.github.com/1481682.js">
</script><br />
<br />
The class wraps the raw input to the <code>ForkJoinTask</code> and provides a method that describes how that input can be decomposed. It also provides a method <code>computeDirectly()</code> that decides whether this input is small enough for direct computation or needs to be decomposed further.<br />
<br />
The output of proposal calculation is a list of maps of prices. If you have four input proposals, you'll get a list of four maps with various prices. In the dip framework, you wrap the output into a subclass of <code>ComposableResult</code>.<br />
<br />
<script src="https://gist.github.com/1481730.js">
</script><br />
<br />
The class implements the <code>compose</code> method that can compose an atomic result of a computation into the existing raw result. It returns a <code>ComposableResult</code> instance that holds the new composition.<br />
<br />
I agree it's a little abstract. Not only is concurrency inherently complex, I am also putting another abstraction on top of it. But once you've used the framework you'll realize its strength. So stay tuned, we're almost finished :-)<br />
<br />
Now, you have an input and an output, and the last thing you need is a computation object. In my example that's the pricing engine. To connect the pricing engine to the dip framework, you'll need to implement a subclass of <code>ComputationActivityBridge</code>. <br />
<br />
<script src="https://gist.github.com/1481754.js">
</script><br />
<br />
The <code>PricingEngineBridge</code> implements the <code>compute</code> method that calls the pricing engine. It translates the <code>DecomposableInput</code> into an input that the pricing engine accepts. And it creates an instance of <code>ComposableResult</code> that contains the output of the pricing engine.<br />
<br />
Last thing to do is to get the stuff started.<br />
<br />
<script src="https://gist.github.com/1481770.js">
</script><br />
<br />
The example creates an instance of <code>GenericRecursiveTask</code> and passes the <code>ListOfProposals</code> as well as the <code>PricingEngineBridge</code> as input. If you pass that to the <code>ForkJoinPool</code> then you receive an instance of <code>ListOfPrices</code> as output.<br />
<br />
What's the advantage when you use the dip-framework? For instance:<br />
<br />
- you could pass arbitrary processing input to <code>GenericRecursiveTask</code> by implementing a subclass of <code>DecomposableInput</code><br />
- you could implement your own custom <code>RecursiveTask</code> the same way I have implemented <code>GenericRecursiveTask</code> and pass the proposals and the <code>PricingEngineBridge</code> to that task<br />
- you could implement a custom <code>ForkAndJoinProcessor</code> and use that by subclassing <code>GenericRecursiveTask</code>: that way you can control the creation of subtasks and their distribution across threads<br />
- you could exchange the processing activity (here: <code>PricingEngineBridge</code>) by implementing a custom <code>ComputationActivityBridge</code> and try alternative pricing engines, or do something completely different than calculating prices ...<br />
<br />
I think I have made my point: the whole is closed for modification, but open for extension now.<br />
<br />
The complete example code is <a href="https://github.com/nschlimm/playground/tree/master/java7-playground/src/main/java/com/schlimm/java7/concurrency/forkjoin/dippricingengine">here in my git repository</a>.<br />
<br />
Let me know if you like it. Looking forward to critical and enjoyable comments.<br />
<br />
Cheers, NiklasNiklas Schlimmhttp://www.blogger.com/profile/12402045792243894660noreply@blogger.com5tag:blogger.com,1999:blog-5701415790759755571.post-13809243845098765422011-12-09T16:01:00.010+01:002011-12-11T17:39:58.625+01:00Java 7: Fork and join and the jam jarAnother Java 7 blog, this time about the new concurrency utilities, plain fork and join in particular. Everything is explained with a straightforward code example. Compared to Project Coin, concurrency is an inherently complex topic, so the code example is a little more involved. Let's get started.<br />
<br />
<a name='more'></a><b>The Fork and Join Executor Service<br />
</b><br />
Fork and join employs an efficient task scheduling algorithm that ensures optimized resource usage (memory and CPU) on multi-core machines. That algorithm is known as "<a href="http://supertech.csail.mit.edu/papers/steal.pdf">work stealing</a>". Idle threads in a fork join pool attempt to find and execute subtasks created by other active threads. This is very efficient 'cause larger units get divided into smaller units of work that get distributed across all active threads (and CPUs). Here is an analogy to explain the strength of fork and join algorithms: if you have a jam jar and you fill it with ping-pong balls, there is a lot of air left in the glass. Think of the air as unused CPU resource. If you fill your jam jar with peas (or sand) there is less air in the glass. Fork and join is like filling the jam jar with peas. There is also more volume in your glass using peas, 'cause there is less air (less waste). Fork and join algorithms always ensure an optimal (smaller) number of active threads than work-sharing algorithms do. This is for the same "peas reason". Think of the jam jar as your thread pool and the peas as your tasks. With fork and join you can host more tasks (and more total volume) with the same number of threads (in the same jam jar).<br />
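The stealing itself can be observed with the standard API: <code>ForkJoinPool</code> exposes a <code>getStealCount()</code> counter. The following minimal, self-contained sketch (the summing workload and the threshold of 1,000 are arbitrary demo choices, not from the post) forks one half of each range so that idle threads can steal it:<br />

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class WorkStealingDemo {

    // sums the range [from, to) by recursive decomposition
    static class SumTask extends RecursiveTask<Long> {
        private final long from, to;
        SumTask(long from, long to) { this.from = from; this.to = to; }

        @Override
        protected Long compute() {
            if (to - from <= 1_000) {             // small enough: compute directly
                long sum = 0;
                for (long i = from; i < to; i++) sum += i;
                return sum;
            }
            long mid = (from + to) / 2;
            SumTask left = new SumTask(from, mid);
            left.fork();                          // let an idle thread steal this half
            long rightResult = new SumTask(mid, to).compute();
            return left.join() + rightResult;     // join the (possibly stolen) half
        }
    }

    public static long sum(long n) {
        ForkJoinPool pool = new ForkJoinPool();
        long result = pool.invoke(new SumTask(0, n));
        // how many forked subtasks were stolen by idle threads (varies per run)
        System.out.println("steal count: " + pool.getStealCount());
        pool.shutdown();
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sum(1_000_000)); // sum of 0..999999 = 499999500000
    }
}
```

The steal count differs between runs and machines; on a single-core box it may well be zero, which is exactly the point: stealing only happens when there are idle workers.<br />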
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhCg4AOBcrLbNihO_krbYeu3gh1YsiSO1f7us5uVdEA2up_9giJG339purqjtubOZh4wilmLaZlz6eqMkIrMYiErCTNx8KsInuFKNLA9jgMQHHgYqrZcAuDCUokoDyOPUP61d2ptEZ7q_Y/s1600/weird.JPG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="172" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhCg4AOBcrLbNihO_krbYeu3gh1YsiSO1f7us5uVdEA2up_9giJG339purqjtubOZh4wilmLaZlz6eqMkIrMYiErCTNx8KsInuFKNLA9jgMQHHgYqrZcAuDCUokoDyOPUP61d2ptEZ7q_Y/s320/weird.JPG" width="267" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Image 1: Fork and join in the jam jar</td></tr>
</tbody></table><br />
Here is a plain fork and join code example:<br />
<br />
<script src="https://gist.github.com/1451620.js">
</script><br />
<br />
Fork and join tasks typically follow a similar control flow. In my example I want to calculate the prices for a list of car insurance offers. Let's go through the example.<br />
<br />
<b>Line 10</b>: Fork and join tasks extend <code>RecursiveTask</code> or <code>RecursiveAction</code>. Tasks return a result, actions don't. <code>RecursiveTask</code>s let you specify the return type using generics. The result of my example is a <code>List</code> of <code>Map</code>s which contain the prices for the car insurance covers. One map of prices for each proposal. <br />
<b>Line 12</b>: The task will calculate prices for proposals.<br />
<b>Line 22</b>: Fork and join tasks implement the <code>compute</code> method. Again, the <code>compute</code> method returns a list of maps that contain prices. If there are four proposals in the input list, then there will be four maps of prices.<br />
<b>Line 24-26</b>: Is the task stack (list of proposals) small enough to compute directly? If yes, then compute in this thread, which means call the pricing engine to calculate the proposal. If no, continue: split the work and call the task recursively.<br />
<b>Line 31</b>: Determine where to split the list.<br />
<b>Line 33</b>: Create a new task for the first part of the split list.<br />
<b>Line 34</b>: Fork that task: allow some other thread to perform that smaller subtask. That thread will call <code>compute</code> recursively on that subtask instance. <br />
<b>Line 35</b>: Create a new task for the second part of the split list.<br />
<b>Line 36</b>: Prepare the composed result list of the two divided subtasks (you need to compose the results of the two subtasks into a single result of the parent task)<br />
<b>Line 37</b>: Compute the second subtask in this current thread and add the result to the result list.<br />
<b>Line 38</b>: In the meantime the first subtask f1 was computed by some other thread. Join the result of the first subtask into the composed result list. <br />
<b>Line 39</b>: Return the composed result.<br />
<br />
You need to start the fork and join task. <br />
<b><br />
Line 49</b>: Create the main fork and join task with the initial list of proposals.<br />
<b>Line 53</b>: Create a fork and join thread pool.<br />
<b>Line 55</b>: Submit the main task to the fork and join pool.<br />
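Condensed into one self-contained sketch, the control flow walked through above looks roughly like this. The pricing engine is stubbed with a dummy calculation here, and the threshold of one proposal per direct computation is an illustrative choice; see the linked repository for the real code:<br />

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ProposalPricingTask extends RecursiveTask<List<Map<String, Double>>> {

    private static final int DIRECT_THRESHOLD = 1; // price one proposal per task

    private final List<String> proposals; // simple stand-in for Proposal objects

    public ProposalPricingTask(List<String> proposals) { this.proposals = proposals; }

    @Override
    protected List<Map<String, Double>> compute() {
        if (proposals.size() <= DIRECT_THRESHOLD) {
            return priceDirectly();                      // small enough: call the engine
        }
        int mid = proposals.size() / 2;                  // where to split
        ProposalPricingTask first = new ProposalPricingTask(proposals.subList(0, mid));
        first.fork();                                    // another thread may steal this
        ProposalPricingTask second = new ProposalPricingTask(proposals.subList(mid, proposals.size()));
        List<Map<String, Double>> result = new ArrayList<>(second.compute()); // this thread
        result.addAll(first.join());                     // join the forked half
        return result;
    }

    // dummy stand-in for the real PricingEngine
    private List<Map<String, Double>> priceDirectly() {
        List<Map<String, Double>> prices = new ArrayList<>();
        for (String proposal : proposals) {
            Map<String, Double> cover = new HashMap<>();
            cover.put("liability", 100.0 + proposal.length());
            prices.add(cover);
        }
        return prices;
    }

    public static List<Map<String, Double>> priceAll(List<String> proposals) {
        return new ForkJoinPool().invoke(new ProposalPricingTask(proposals));
    }

    public static void main(String[] args) {
        List<Map<String, Double>> prices = priceAll(Arrays.asList("A", "BB", "CCC", "DDDD"));
        System.out.println(prices.size() + " maps of prices"); // one map per proposal
    }
}
```

Note that the composed list does not necessarily preserve the input order, since each task appends its directly computed half before joining the forked one.<br />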
<br />
That's it. You can look into the complete code <a href="https://github.com/nschlimm/playground/tree/master/java7-playground/src/main/java/com/schlimm/java7/concurrency/forkjoin/pricingengine">here</a>. You'll need the <a href="https://github.com/nschlimm/playground/blob/master/java7-playground/src/main/java/com/schlimm/java7/concurrency/forkjoin/pricingengine/PricingEngine.java">PricingEngine.java</a> and the <a href="https://github.com/nschlimm/playground/blob/master/java7-playground/src/main/java/com/schlimm/java7/concurrency/forkjoin/pricingengine/Proposal.java">Proposal.java</a>.<br />
<br />
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js">
</script><br />
<br />
<script src="http://gist.github.com/raw/454771/gist-line-number-hack.js">
</script><br />
<br />
<script type="text/javascript">
addLineNumbersToAllGists()
</script>Niklas Schlimmhttp://www.blogger.com/profile/12402045792243894660noreply@blogger.com3tag:blogger.com,1999:blog-5701415790759755571.post-34995785008940123922011-12-02T08:19:00.000+01:002011-12-02T08:19:09.821+01:00Java 7: Project Coin in code examplesThis blog introduces - by code examples - some new Java 7 features summarized under the term <a href="http://openjdk.java.net/projects/coin/">Project Coin</a>. The goal of Project Coin is to add a set of small language changes to JDK 7. These changes do simplify the Java language syntax. Less typing, cleaner code, happy developer ;-) Let's look into that.<br />
<br />
<a name='more'></a><br />
<div style="background-color: #cfe2f3; text-align: center;"><b>Prerequisites</b></div><br />
Install <a href="http://www.oracle.com/technetwork/java/javase/downloads/index.html">Java 7 SDK</a> on your machine<br />
Install <a href="http://eclipse.org/downloads/">Eclipse Indigo</a> 3.7.1<br />
<br />
You need to look out for the correct bundles for your operating system.<br />
<br />
In your Eclipse workspace you need to define the installed Java 7 JDK in your runtime. In the Workbench go to Window > Preferences > Java > Installed JREs and add your Java 7 home directory. <br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjq9lehwZXgO0sfx_WV627GUD_udfFM8KCvQprEeuk8fMdZDUzpgxgjzYDl0cmRAPsZGdp1Mbzwj0MPcPsXOgf27LTQQE31EO3FuMg7Aw2apchcoHxrnONdrmt1VDN_jc57M2kIm7yCC_8/s1600/Installed.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="287" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjq9lehwZXgO0sfx_WV627GUD_udfFM8KCvQprEeuk8fMdZDUzpgxgjzYDl0cmRAPsZGdp1Mbzwj0MPcPsXOgf27LTQQE31EO3FuMg7Aw2apchcoHxrnONdrmt1VDN_jc57M2kIm7yCC_8/s320/Installed.jpg" width="320" /></a></div><br />
Next you need to set the compiler level to 1.7 in Java > Compiler.<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSTeL3rWHGe5TJwR8zj0nKKzCevcn4-lPI2UeFsACR87qrCGsFdS736Ex4OlRHXEGZfc7npboefkSlKAN_TufZLq67mUqjzjfv0P9McMg-LgpKALzJ4PhBfpYSZNGYxI-WJa8NprrST-E/s1600/Compiler.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="287" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSTeL3rWHGe5TJwR8zj0nKKzCevcn4-lPI2UeFsACR87qrCGsFdS736Ex4OlRHXEGZfc7npboefkSlKAN_TufZLq67mUqjzjfv0P9McMg-LgpKALzJ4PhBfpYSZNGYxI-WJa8NprrST-E/s320/Compiler.jpg" width="320" /></a></div><br />
<div style="background-color: #cfe2f3; text-align: center;"><b>Project Coin</b></div><br />
<b>Improved literals<br />
</b><br />
A literal is the source code representation of a fixed value.<br />
<br />
"In Java SE 7 and later, any number of underscore characters (_) can appear anywhere between digits in a numerical literal. This feature enables you to separate groups of digits in numeric literals, which can improve the readability of your code." (from <a href="http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html">the Java Tutorials</a>)<br />
<br />
<pre class="java" name="code">public class LiteralsExample {
public static void main(String[] args) {
System.out.println("With underscores: ");
long creditCardNumber = 1234_5678_9012_3456L;
long bytes = 0b11010010_01101001_10010100_10010010;
System.out.println(creditCardNumber);
System.out.println(bytes);
System.out.println("Without underscores: ");
creditCardNumber = 1234567890123456L;
bytes = 0b11010010011010011001010010010010;
System.out.println(creditCardNumber);
System.out.println(bytes);
}
}</pre><br />
Notice the underscores in the literals (e.g. 1234_5678_9012_3456L). Results written to the console:<br />
<br />
<pre class="java" name="code">With underscores:
1234567890123456
-764832622
Without underscores:
1234567890123456
-764832622</pre><br />
As you can see, the underscores do not make a difference to the values. They are just used to make the code more readable.<br />
<br />
<b>SafeVarargs</b><br />
<br />
Pre-JDK 7, you always got an unchecked warning when calling certain varargs library methods. Without the new <code>@SafeVarargs</code> annotation this example would create unchecked warnings.<br />
<br />
<pre class="java" name="code">public class SafeVarargsExample {
@SafeVarargs
static void m(List<string>... stringLists) {
Object[] array = stringLists;
List<integer> tmpList = Arrays.asList(42);
array[0] = tmpList; // compiles without warnings
String s = stringLists[0].get(0); // ClassCastException at runtime
}
public static void main(String[] args) {
m(new ArrayList<string>());
}
}</string></integer></string></pre><br />
The new <code>@SafeVarargs</code> annotation does not help to get around the annoying <code>ClassCastException</code> at runtime. Also, it can only be applied to static and final methods. Therefore, I believe it will not be a great help. Future versions of Java will have compile time errors for unsafe code like the one in the example above.<br />
<br />
<b>Diamond</b><br />
<br />
In Java 6 it required some patience to create, say, a list of maps. Look at this example:<br />
<br />
<script src="https://gist.github.com/1409477.js">
</script><br />
As you can see in the right part of the assignments in lines 3 and 4, you need to repeat the type information for the <code>listOfMaps</code> variable as well as for the <code>aMap</code> variable. This isn't necessary anymore in Java 7:<br />
<br />
<script src="https://gist.github.com/1409490.js">
</script><br />
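In essence, the two gists differ only on the right-hand side of the assignments. A condensed, self-contained version of both styles:<br />

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DiamondExample {

    public static List<Map<String, Integer>> build() {
        // Java 6 style: the type arguments must be repeated on the right-hand side
        List<Map<String, Integer>> listOfMaps6 = new ArrayList<Map<String, Integer>>();

        // Java 7: the diamond operator <> lets the compiler infer them
        List<Map<String, Integer>> listOfMaps7 = new ArrayList<>();
        Map<String, Integer> aMap = new HashMap<>();
        aMap.put("answer", 42);
        listOfMaps7.add(aMap);
        return listOfMaps7;
    }

    public static void main(String[] args) {
        System.out.println(build()); // [{answer=42}]
    }
}
```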
<b>Multicatch</b><br />
<br />
In Java 7 you do not need a catch clause for every single exception; you can catch multiple exceptions in one clause. You remember code like this:<br />
<pre class="java" name="code">public class HandleExceptionsJava6Example {
public static void main(String[] args) {
Class string;
try {
string = Class.forName("java.lang.String");
string.getMethod("length").invoke("test");
} catch (ClassNotFoundException e) {
// do something
} catch (IllegalAccessException e) {
// do the same !!
} catch (IllegalArgumentException e) {
// do the same !!
} catch (InvocationTargetException e) {
// yeah, well, again: do the same!
} catch (NoSuchMethodException e) {
// ...
} catch (SecurityException e) {
// ...
}
}
}</pre><br />
Since Java 7 you can write it like this, which makes our lives a lot easier:<br />
<br />
<pre class="java" name="code">public class HandleExceptionsJava7ExampleMultiCatch {
public static void main(String[] args) {
try {
Class string = Class.forName("java.lang.String");
string.getMethod("length").invoke("test");
} catch (ClassNotFoundException | IllegalAccessException | IllegalArgumentException | InvocationTargetException | NoSuchMethodException | SecurityException e) {
// do something, and only write it once!!!
}
}
}</pre><br />
<b>String in switch statements</b><br />
<br />
Since Java 7 one can use string variables in switch clauses. Here is an example:<br />
<br />
<pre class="java" name="code">public class StringInSwitch {
public void printMonth(String month) {
switch (month) {
case "April":
case "June":
case "September":
case "November":
case "January":
case "March":
case "May":
case "July":
case "August":
case "December":
default:
System.out.println("done!");
}
}
}</pre><br />
<b>Try-with-resource</b><br />
<br />
This feature really helps in terms of reducing unexpected runtime exceptions. In Java 7 you can use the so-called try-with-resource clause that automatically closes all open resources when the try block is left, whether normally or because of an exception. Look at the example:<br />
<br />
<pre class="java" name="code">import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
public class TryWithResourceExample {
public static void main(String[] args) throws FileNotFoundException {
// Java 7 try-with-resource
String file1 = "TryWithResourceFile.out";
try (OutputStream out = new FileOutputStream(file1)) {
out.write("Some silly file content ...".getBytes());
":-p".charAt(3);
} catch (StringIndexOutOfBoundsException | IOException e) {
System.out.println("Exception on operating file " + file1 + ": " + e.getMessage());
}
// Java 6 style
String file2 = "WithoutTryWithResource.out";
OutputStream out = new FileOutputStream(file2);
try {
out.write("Some silly file content ...".getBytes());
":-p".charAt(3);
} catch (StringIndexOutOfBoundsException | IOException e) {
System.out.println("Exception on operating file " + file2 + ": " + e.getMessage());
}
// Let's try to operate on the resources
File f1 = new File(file1);
if (f1.delete())
System.out.println("Successfully deleted: " + file1);
else
System.out.println("Problems deleting: " + file1);
File f2 = new File(file2);
if (f2.delete())
System.out.println("Successfully deleted: " + file2);
else
System.out.println("Problems deleting: " + file2);
}
}</pre><br />
The try-with-resource clause is used to open the first file we want to operate on. The call <code>":-p".charAt(3)</code> then generates a runtime exception. Notice that I do not explicitly close the resource. This is done automatically when you use try-with-resource. It *isn't* when you use the Java 6 equivalent for the second file.<br />
<br />
The code will write the following result to the console:<br />
<br />
<pre class="java" name="code">Exception on operating file TryWithResourceFile.out: String index out of range: 3
Exception on operating file WithoutTryWithResource.out: String index out of range: 3
Successfully deleted: TryWithResourceFile.out
Problems deleting: WithoutTryWithResource.out</pre><br />
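A related detail worth knowing: try-with-resource works with any <code>AutoCloseable</code>, and if both the try block and the automatic <code>close()</code> call throw, the exception from <code>close()</code> does not replace the primary one; it is recorded via the new <code>Throwable.getSuppressed()</code> method. A small self-contained sketch (class names are illustrative):<br />

```java
public class SuppressedExample {

    // any AutoCloseable works in try-with-resources
    static class NoisyResource implements AutoCloseable {
        @Override
        public void close() {
            throw new IllegalStateException("failure in close()");
        }
    }

    public static Throwable[] demo() {
        try (NoisyResource r = new NoisyResource()) {
            throw new RuntimeException("primary failure");
        } catch (RuntimeException e) {
            // the close() exception did not swallow the primary one --
            // it was attached as a suppressed exception (new in Java 7)
            return e.getSuppressed();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()[0].getMessage()); // failure in close()
    }
}
```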
That's it in terms of Project Coin. Very useful stuff in my eyes.Niklas Schlimmhttp://www.blogger.com/profile/12402045792243894660noreply@blogger.com5tag:blogger.com,1999:blog-5701415790759755571.post-57870570778820941162011-11-13T17:37:00.002+01:002011-11-13T17:39:44.468+01:00Java EE 6 and the snowball effectJava EE application servers increase their feature sets (APIs and administration features) whilst business applications get smaller and smaller. This introduces a new issue: if you need a single feature of a new application server version you'll get a complete package of features that you didn't need in the first place (the snowball effect). Let me give you an example: in WebSphere 7 IBM provides a high-speed integration adapter for IMS assets. We need that, but we don't need all the rest that gives us a headache in terms of migration efforts. Now, if the number of APIs increases in Java EE with every version, I predict that this problem will get more and more complicated. That's a reason why I don't appreciate the fact that Java EE standardizes former framework functionality like dependency injection (CDI). Business applications may get smaller, but application server feature sets get huge this way. Is that a good trend? <br />
<br />
<a name='more'></a>I typically try to keep application bundles small in my business applications. Not necessarily in size, but in terms of packaged features. This way I don't create artificial dependencies. You need to deploy a new web service version? Then I can deploy only that specific web service module without creating a snowball effect of additional required deployments. Small packages reduce the need for branch development. Small packages reduce the risk of instability. Small packages reduce the necessity to synchronize deployments of different development teams. If you finish your work or you need to deploy a new version of your application, then you should create as few dependencies as possible. In an ideal scenario you don't need to talk to any other developer if you want to deploy the features you're responsible for. I think this should also apply to application server software. The trend to shrink business application feature sets while increasing those of application servers should stop. Or: administration features of application servers should be separated and compatible with the different Java EE standards. A more modularized approach would be desirable.Niklas Schlimmhttp://www.blogger.com/profile/12402045792243894660noreply@blogger.com23tag:blogger.com,1999:blog-5701415790759755571.post-3617905798424326712011-11-12T12:12:00.072+01:002011-11-13T13:08:05.922+01:00Characteristics of successful developersMany blogs exist about personal (soft) characteristics of successful developers. Here is a short listing of some interesting links:<br />
<br />
<a href="http://www.supercoders.com.au/blog/50characteristicsofagreatsoftwaredeveloper.shtml">50 characteristics of a great software developer</a><br />
<a href="http://www.readwriteweb.com/archives/top_10_software_engineer_traits.php">Top 10 Traits of a Rockstar Software Engineer</a><br />
<a href="http://javablog.franksalinas.net/2009/05/09/five-essential-skills-for-software-developers/">Five essential skills for software developers</a><br />
<a href="http://agilemanifesto.org/">Manifesto for Agile Software Development<br />
</a><a href="http://manifesto.softwarecraftsmanship.org/">Manifesto for Software Craftsmanship</a><br />
<br />
This blog now is my personal view on that very topic. It's of course subjective, shaped by my own history and environment, and I don't claim that the list is complete. Also, I do not have the discipline to live up to all those characteristics 100% myself. We're all human, so don't take them too seriously :-) Last but not least: success should not be the target of your work. The target is to work on your own virtues, and some of those virtues are the topic of this blog.<br />
<br />
<a name='more'></a><b>The will to be good at something</b><br />
<br />
It's not easy to work as a developer! I say that for a couple of reasons that make our lives a little harder compared to other professions. For instance, the technology cycle in the IT world is very short; current knowledge becomes outdated within a few years. Therefore, we need to learn continuously as new things become important. To stay on top of things we really need the strong will to be good at our job. That's probably the most important characteristic to me: being an excellent knowledge worker with great technical abilities, and having the will to be that over decades!<br />
<br />
<b>To ask one's way</b><br />
<br />
Because it's impossible to know everything needed to do the job, it's absolutely necessary that a developer finds their way through a new topic. How I typically do that: I use Google and I talk to other experts to find out what they think. "I did not know what to do!" is not an argument for me. 'Cause if I don't know enough about a new technology yet, I spend the energy that's necessary to learn what I need to know to do the job. We need to work through the learning curve and make the extra effort to get good at what we're doing!<br />
<b><br />
To make oneself useful</b><br />
<br />
If I have some time left because I completed my tasks earlier than expected, then I take a coffee and play tabletop soccer. I take a rest. Afterwards I think about what I could do to help the team achieve its targets, 'cause some of my team mates probably didn't finish! (at least if I didn't meet them at tabletop soccer) If everyone's finished, then I think about improvements to the process or team organisation. I make myself useful.<br />
<br />
<b>To care</b><br />
<br />
Some years ago I attended a software architecture course held by one of my idols, <a href="http://www.bredemeyer.com/">Dana Bredemeyer</a>. I had a discussion with him about what it really takes to make a team successful or to be a successful team leader. He said: "Well, you need some people that really care!" I think there is a lot of truth in that statement. If we do not care about quality, timelines, good team culture, respectful communication (!!), clean code, software craftsmanship, if all this doesn't matter to us, then I believe the probability is higher that we fail. <br />
<br />
<b>Being productive</b><br />
<br />
Philippe Kruchten put it right in his <a href="http://www.ibm.com/developerworks/rational/library/4032.html">TAO for the software architect</a>:<br />
<br />
"Those who know don't talk.<br />
Those who talk don't know.<br />
Those who do not have a clue are still debating about the process.<br />
Those who know just do it."<br />
<br />
I am trying to be productive every week - at the end of a week I look back and I ask myself what I have produced. This could be paperwork, community days or (best!!) programming code.<br />
<br />
<b>Working solution-oriented</b><br />
<br />
In many situations where people had trouble achieving their targets, I saw them debating all the problems and the difficulties of solving the issue. They blamed each other and discussed THE PAST. I try not to do that: I don't blame others, and I don't just look at the difficulties. I try to suggest solutions instead! And yes, there is always a solution to a problem. Most of the time there are at least three solutions. <br />
<br />
<b>Be good with people</b><br />
<br />
Because our job typically involves working in a (ideally cross-functional!) team, it's important that we're (more or less) good at dealing with other individuals. They have their own strengths and weaknesses, just like ourselves. It's important to treat all team mates with respect, regardless of their technical competence or contributions. Of course, sometimes people deserve a clear statement, but try to do these things one-on-one. Make sure nobody loses face. Attend the meetings at the coffee bar, be good at tabletop soccer and go out once in a while to have a beer with your team. You know what I'm talking about.<br />
<br />
That's it. I am looking forward to your thoughts and comments!Niklas Schlimmhttp://www.blogger.com/profile/12402045792243894660noreply@blogger.com2tag:blogger.com,1999:blog-5701415790759755571.post-5979531714446575752011-10-25T15:38:00.005+02:002012-03-22T07:12:39.939+01:00Threading stories: volatile and synchronizedIn <a href="http://niklasschlimm.blogspot.com/2011/10/threading-stories-why-volatile-matters.html">my last blog</a> about the <code>volatile</code> modifier I introduced a little program that illustrates the behaviour of <code>volatile</code> in a Java 6 (26) Hotspot VM. Since that day I have had some interesting discussions that I wanted to share in this blog. They add some valuable insights on the <code>volatile</code> modifier.<br />
<br />
<a name='more'></a>Here is my little program, which I have adapted a little to make it easier to follow. My previous example was originally intended as a thread contention example, which will be the topic of one of my upcoming posts. <br />
<pre class="java" name="code">import java.util.Timer;
import java.util.TimerTask;
public class AnotherVolatileExampleA {
private volatile boolean expired = false;
private long counter = 0;
private Object mutex = new Object();
private class Worker implements Runnable {
@Override
public void run() {
synchronized (mutex) {
final Timer timer = new Timer();
timer.schedule(new TimerTask() {
public void run() {
expired = true;
System.out.println("Timer interrupted main thread ...");
timer.cancel();
}
}, 1000);
while (!expired) {
counter++; // do some work
}
System.out.println("Main thread was interrupted by timer ...");
};
}
}
public static void main(String[] args) throws InterruptedException {
AnotherVolatileExampleA volatileExample = new AnotherVolatileExampleA();
Thread thread1 = new Thread(volatileExample.new Worker(), "Worker-1");
thread1.start();
}
}
</pre><br />
Now, this program still behaves similarly to <a href="http://niklasschlimm.blogspot.com/2011/10/threading-stories-why-volatile-matters.html">the one of my last blog</a>. With <code>volatile</code> on the <code>expired</code> field the result written to the console is:<br />
<br />
<pre class="java" name="code">Timer interrupted main thread ...
Main thread was interrupted by timer ...</pre><br />
Without <code>volatile</code> on the <code>expired</code> field the result is:<br />
<br />
<pre class="java" name="code">Timer interrupted main thread ...
</pre><br />
One question in a discussion was why that happens although everything takes place in a <code>synchronized</code> block. The Java VM specification says the <code>synchronized</code> keyword guarantees that (informally speaking) a variable is written to the memory heap and is read from the memory heap (<a href="http://java.sun.com/docs/books/jvms/second_edition/html/Threads.doc.html#22253">read here</a>). Now, this is true, but it misses the point that the thread only needs to read the variable ONCE within a single <code>synchronized</code> block. In the example above the <code>expired</code> variable is read once at the very first iteration of the while loop. Afterwards the thread does not need to read the variable again. Consider this program:<br />
<br />
<pre class="java" name="code">import java.util.Timer;
import java.util.TimerTask;
public class AnotherVolatileExampleB {
private boolean expired = false;
private long counter = 0;
private Object mutex = new Object();
private class Worker implements Runnable {
@Override
public void run() {
final Timer timer = new Timer();
timer.schedule(new TimerTask() {
public void run() {
expired = true;
System.out.println("Timer interrupted main thread ...");
timer.cancel();
}
}, 1000);
boolean tmpExpired = false;
while (!tmpExpired) {
synchronized (mutex) {
tmpExpired = expired;
}
counter++; // do some work
}
System.out.println("Main thread was interrupted by timer ...");
}
}
public static void main(String[] args) throws InterruptedException {
AnotherVolatileExampleB volatileExample = new AnotherVolatileExampleB();
Thread thread1 = new Thread(volatileExample.new Worker(), "Worker-1");
thread1.start();
}
}
</pre><br />
In that case the <code>synchronized</code> block is within the while loop (lines 23-25) and the thread is now forced to re-read the <code>expired</code> variable from main memory in each iteration, 'cause entering a <code>synchronized</code> block guarantees at least one fresh read from memory (the same applies to Java 5 locks). The result of that program will be as expected from a <code>synchronized</code> block:<br />
<br />
<pre class="java" name="code">Timer interrupted main thread ...
Main thread was interrupted by timer ...
</pre><br />
Therefore, if you wish to read a variable from memory in a <code>synchronized</code> block (or within a Java 5 lock), remember that the thread is only guaranteed to read the variable from the memory heap once. The <code>volatile</code> modifier, on the other hand, always guarantees a "memory heap read" (<a href="http://java.sun.com/docs/books/jvms/second_edition/html/Threads.doc.html#22258">see here</a>).Niklas Schlimmhttp://www.blogger.com/profile/12402045792243894660noreply@blogger.com4tag:blogger.com,1999:blog-5701415790759755571.post-82958328794164822072011-10-19T08:16:00.035+02:002012-03-22T07:13:23.622+01:00Threading stories: Why volatile mattersMany years ago when I learned Java (in 2000) I was not so concerned about multithreading. In particular I wasn't concerned about the <code>volatile</code> modifier. I don't know why, but I never had problems without <code>volatile</code>, so maybe I thought it could not be that relevant. I suddenly changed my mind when I first analyzed a weird behaviour of an application that only showed up when the application was deployed to a server JVM. Today's JVMs do a lot of magic to optimize runtime performance of server applications. In this blog I show you an example to get familiar with problems that arise in multithreaded applications when you don't recognize the importance of understanding how Java treats shared data in multithreaded programs.<br />
<br />
<a name='more'></a>This code snippet demonstrates why understanding <code>volatile</code> is important. Here is the code that you can use to play around with. Notice that in line 8 the <code>expired</code> variable is declared <code>volatile</code>:<br />
<pre class="java" name="code">import java.util.Timer;
import java.util.TimerTask;

public class VolatileExample {

    // the flag shared between the timer task and the worker threads;
    // remove 'volatile' to reproduce the visibility problem
    private volatile boolean expired;
    private long counter = 0;
    private final Object mutex = new Object();

    public Object[] execute(Object... arguments) {
        synchronized (mutex) {
            expired = false;
            final Timer timer = new Timer();
            timer.schedule(new TimerTask() {
                public void run() {
                    expired = true;
                    System.out.println("Timer interrupted main thread ...");
                    timer.cancel();
                }
            }, 1000);
            while (!expired) {
                counter++; // do some work
            }
            System.out.println("Main thread was interrupted by timer ...");
        }
        return new Object[] { counter, expired };
    }

    private class Worker implements Runnable {
        @Override
        public void run() {
            while (!Thread.currentThread().isInterrupted()) {
                execute();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        VolatileExample volatileExample = new VolatileExample();
        Thread thread1 = new Thread(volatileExample.new Worker(), "Worker-1");
        Thread thread2 = new Thread(volatileExample.new Worker(), "Worker-2");
        thread1.start();
        thread2.start();
        Thread.sleep(60000);
        thread1.interrupt();
        thread2.interrupt();
    }
}
</pre>Run this on the HotSpot VM with the <code>-server</code> option set. You'll get the following, expected output:<br />
<pre class="java" name="code">Timer interrupted main thread ...
Main thread was interrupted by timer ...
Timer interrupted main thread ...
Main thread was interrupted by timer ...
Timer interrupted main thread ...
Main thread was interrupted by timer ...
Timer interrupted main thread ...
Main thread was interrupted by timer ...
Timer interrupted main thread ...
Main thread was interrupted by timer ...
</pre>Now remove the <code>volatile</code> modifier in line 8 above and restart, again with the <code>-server</code> option set. What you should get is the following output:<br />
<pre class="java" name="code">Timer interrupted main thread ...
</pre>What happened? The timer thread sets the <code>expired</code> flag to <code>true</code>, but the main thread never sees the change. This is exactly what <code>volatile</code> is all about: it ensures that all threads see the current value of a shared variable. If you declare a variable <code>volatile</code>, every thread reads its value from main memory. In the example above, the timer thread set <code>expired</code> in its own working memory, and the worker thread was never forced to notice the update in main memory! Notice that I cancel the timer when I set <code>expired</code> to <code>true</code>; this causes the timer thread to die immediately after the <code>run()</code> method completes. Main memory may be updated at that point, but the worker thread continues to work on the 'cached' value in its thread-local memory.<br />
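If you need a shared flag with the same visibility guarantee, <code>java.util.concurrent.atomic.AtomicBoolean</code> is an alternative worth knowing. Here is a minimal sketch; the class and method names are mine, not part of the example above:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class AtomicFlagExample {

    // AtomicBoolean gives the same visibility guarantee as a volatile boolean,
    // plus atomic compound operations such as compareAndSet
    private final AtomicBoolean expired = new AtomicBoolean(false);
    private long counter;

    public long spinUntilExpired() {
        while (!expired.get()) {  // always observes the latest value
            counter++;            // do some work
        }
        return counter;
    }

    public void expire() {
        expired.set(true);        // immediately visible to all threads
    }
}
```

For a plain on/off flag, <code>volatile boolean</code> is sufficient; <code>AtomicBoolean</code> earns its keep when you also need check-then-set logic to be atomic.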
<br />
Next: restart the code again without the <code>volatile</code> modifier, but this time with the <code>-client</code> JVM option set (which is the default mode on Windows). The result is the following:<br />
<pre class="java" name="code">Timer interrupted main thread ...
Main thread was interrupted by timer ...
Timer interrupted main thread ...
Main thread was interrupted by timer ...
Timer interrupted main thread ...
Main thread was interrupted by timer ...
Timer interrupted main thread ...
Main thread was interrupted by timer ...
Timer interrupted main thread ...
Main thread was interrupted by timer ...
</pre><br />
In client mode the JVM obviously behaves differently and does not optimize as aggressively as in server mode. So even if you omit the <code>volatile</code> modifier, you may not see an error during development. The JVM options influence how aggressively the JVM optimizes your code. Without <code>volatile</code> it is not guaranteed that data changes made by the timer thread are visible to the main thread. In this case everything still happens to work in client mode, which shows that the correctness of your program can depend on the JVM options set.Niklas Schlimmhttp://www.blogger.com/profile/12402045792243894660noreply@blogger.com7tag:blogger.com,1999:blog-5701415790759755571.post-89788535524306042352011-09-30T16:10:00.014+02:002011-10-05T16:43:37.855+02:00Benchmark series on simple caching solutions in JavaCaching is a very common solution when you don't want to repeat CPU-intensive tasks. Over the last days I benchmarked several options for caching with <code>ConcurrentHashMap</code>. In this blog I publish the first results. I used <a href="http://www.javaspecialists.eu/archive/Issue124.html">Heinz Kabutz' Performance Checker</a> to do this, and I added some features based on my reading of <a href="http://www.ibm.com/developerworks/java/library/j-benchmark1/index.html">this article series</a>.<br />
<br />
<a name='more'></a><br />
<br />
<div style="background-color: #cfe2f3; text-align: center;"><b>My Conclusions up front</b></div><br />
I am testing three different cache implementations: the "check null", "check map" and "putIfAbsent" caches. The code is linked below the results. I am also using three different cache sizes: 10 (small), 100000 (large) and 1000000 (very large) possible key values. In other words: the 10-unit cache can hold 10 different key values, the 100000-unit cache can hold 100000 different key values, and so on.<br />
<br />
None of the implementation options shows significant performance differences at equivalent cache sizes. This is disappointing (!!) but also good to know. It is also a little surprising, I think, because for example the "putIfAbsent" cache appears to be more complex than the others. All solutions seem to degrade logarithmically in performance as the cache grows to very large sizes. The main reason for that should be the increased time for cache initialization, because in my test harness I start with an empty cache and a fixed set of possible key values (i.e. 10, 100000 or 1000000). I'll come up with a benchmark series for a fully initialized cache later.<br />
<br />
Knowing what I know now, I would carefully recommend the "putIfAbsent" cache solution. It has equivalent performance but gives you great flexibility to design the behaviour of your cache in highly concurrent scenarios. See <a href="http://niklasschlimm.blogspot.com/2011/09/your-web-applications-work-by-sheer.html">the pattern solution of my last article</a> about multithreading as an example of more complex use cases.<br />
<br />
If you're interested in the test harness take a look at the implementations: <code><a href="https://github.com/nschlimm/playground/blob/master/webappbenchmarker-playground/src/main/java/com/schlimm/webappbenchmarker/command/std/CacheSolution_CheckMap.java">CacheSolution_CheckMap.java</a></code>, <code><a href="https://github.com/nschlimm/playground/blob/master/webappbenchmarker-playground/src/main/java/com/schlimm/webappbenchmarker/command/std/CacheSolution_CheckNull.java">CacheSolution_CheckNull.java</a></code> and <code><a href="https://github.com/nschlimm/playground/blob/master/webappbenchmarker-playground/src/main/java/com/schlimm/webappbenchmarker/command/std/CacheSolution_PutIfAbsent.java">CacheSolution_PutIfAbsent.java</a></code>. I appreciate your comments very much!<br />
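If you just want the gist of the three strategies without opening the links, here is a rough, condensed sketch of each. This is my own version, not the benchmarked harness code; <code>compute()</code> stands in for the CPU-intensive work:

```java
import java.util.concurrent.ConcurrentHashMap;

public class CacheSketches {

    private final ConcurrentHashMap<Integer, String> cache = new ConcurrentHashMap<>();

    // "check null": get first, compute and put on a miss
    // (under contention two threads may compute the same key)
    String checkNull(Integer key) {
        String value = cache.get(key);
        if (value == null) {
            value = compute(key);
            cache.put(key, value);
        }
        return value;
    }

    // "check map": containsKey first, then get (two lookups on a hit)
    String checkMap(Integer key) {
        if (!cache.containsKey(key)) {
            cache.put(key, compute(key));
        }
        return cache.get(key);
    }

    // "putIfAbsent": compute eagerly, let the map keep the first value atomically
    String putIfAbsent(Integer key) {
        String value = compute(key);
        String previous = cache.putIfAbsent(key, value);
        return previous != null ? previous : value;
    }

    private String compute(Integer key) {
        return "value-" + key; // stands in for the CPU-intensive task
    }
}
```

Note that in this naive form all three variants may compute a value more than once under contention; the "putIfAbsent" variant at least guarantees that every caller sees the same cached instance.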
<br />
My JVM ran in mixed JIT mode with the <code>-server</code> option set. OK, let's get into it!<br />
<br />
<div style="background-color: #cfe2f3; text-align: center;"><b>Here are the results<br />
</b></div><br />
<div style="background-color: #cfe2f3;"><script src="https://gist.github.com/1253756.js">
</script></div><br />
5 test runs each / 500 ms each test run<br />
CL before/after = Classes Loaded before and after test harness<br />
JIT before/after = Total JIT time before and after test harness<br />
Small cache = 10 units<br />
Large cache = 100000 units<br />
Very large cache = 1000000 units<br />
<br />
<div style="background-color: #cfe2f3; text-align: center;"><b>Check null cache</b></div><br />
<script src="https://gist.github.com/1253758.js">
</script><br />
<br />
<div style="background-color: #cfe2f3; text-align: center;"><b>Check map cache</b></div><br />
<script src="https://gist.github.com/1253760.js">
</script><br />
<br />
<div style="background-color: #cfe2f3; text-align: center;"><b>The putIfAbsent cache</b></div><br />
<script src="https://gist.github.com/1253766.js">
</script>Niklas Schlimmhttp://www.blogger.com/profile/12402045792243894660noreply@blogger.com3tag:blogger.com,1999:blog-5701415790759755571.post-16640061585668344282011-09-09T22:01:00.005+02:002012-03-22T07:14:16.683+01:00Your web applications work - by sheer coincidence!This blog describes a solution to a typical concurrency problem in a web application environment, and it illustrates that you - in all likelihood - cannot be sure that your application is thread-safe. It just works - by sheer coincidence.<br />
<br />
<a name='more'></a>Last week we had a severe problem in a critical web module of our production environment. We had to restart a server at a time when we had very high user and transaction volumes (we do ~500,000 transactions in that module per business day). The server shutdown deleted our XSL template cache. As a consequence, many threads tried to recompile the templates concurrently after the restart, which in turn introduced a CPU overload issue. We could only restart the server by blocking the incoming requests at the web server level. In fact, we only allowed a few requests to pass the web server and enter the application server. After some warmup time we opened the web server again and the system started to work at an acceptable CPU load.<br />
<br />
We've looked at the application code that caused the issue and decided to implement an intelligent concurrency pattern that fulfills following requirements:<br />
<br />
- the number of threads that perform a CPU-intensive task concurrently (like XSL template compilation) should be limited to a configurable size (=> avoid CPU overload on startup in user load peak times)<br />
- cache the result of each CPU-intensive task so that it executes only once (we had that already in our error-prone solution)<br />
- enable the system to determine whether the CPU-intensive task needs to execute again (=> if the application was redeployed the templates need to recompile)<br />
<br />
I'll spare you (and myself) the pain of posting the old, non-thread-safe code. The bugs in that code were:<br />
<br />
<b>firstly </b>- it did not limit the number of threads allowed to compile templates<br />
<b>secondly </b>- it did not ensure that only one thread compiles a specific template at a given time (e.g. our startup page) - instead many threads tried to compile the <b>same</b> template concurrently - this one is critical!<br />
<br />
That all being said, here comes the solution to such a "too-many-threads" concurrency issue. Here's the code, and I also added a class diagram, because I believe this is a common situation in web applications.<br />
<br />
The following snippet shows our new <code>HTMLDocumentGenerator</code>. Its responsibility is to create and cache <code>ConcurrentTransformer</code> instances, one for each XSL document. The responsibility of <code>ConcurrentTransformer</code> is to compile an XSL template and to store the result in its <code>template</code> variable. <code>HTMLDocumentGenerator</code> owns the cache (Line 3), defines a limit on the number of threads that can compile XSL documents (Line 5) and declares a <code>Semaphore</code> that implements a "thread bouncer" (Line 6). The <code>createDocument</code> method (Lines 10-20) creates and caches the new <code>ConcurrentTransformer</code> instances for each XSL document. Example: <code>/start/index.xsl</code> will have its own <code>ConcurrentTransformer</code> instance and <code>/start/main_menu.xsl</code> will also have its own unique instance.<br />
<br />
<script src="https://gist.github.com/1206220.js">
</script><br />
<br />
Notice that we use <code>ConcurrentHashMap.putIfAbsent()</code>, which allows cache lookup and cached-object creation in a single step. This is equivalent to:<br />
<br />
<pre class="java" name="code">if (!map.containsKey(key))
    return map.put(key, value);
else
    return map.get(key);
</pre><br />
The difference is that <code>putIfAbsent</code> performs this check-then-act sequence atomically. This is a perfect approach for a multithreaded, high-volume application. You could do the same with your own locks or <code>synchronized</code> blocks, but it would hardly be as safe (and as fast!) as the example above. You will understand this statement if you look at the implementation of <code>ConcurrentHashMap</code>: it locks segments internally, not the whole table! This allows very high volumes of concurrent access without lock contention. In Line 18 we call the method <code>generateTemplate</code> that actually performs the CPU-intensive task (template compilation). <br />
<br />
In Line 5 we declared a <code>Semaphore</code> which acts as a bouncer for the CPU-intensive code sections. Using this class you can control the number of threads that perform a certain task concurrently. It's important to declare the <code>Semaphore</code> in the <code>HTMLDocumentGenerator</code> class, because the thread limit applies to all cached <code>ConcurrentTransformer</code> instances. <br />
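The bouncer idea in isolation looks roughly like this. This is a sketch with assumed names and an assumed permit count, not the production code:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;

// Sketch of the "thread bouncer": at most 'permits' threads may run
// the expensive section concurrently; every additional thread waits.
public class CompileBouncer {

    private final Semaphore bouncer;

    public CompileBouncer(int permits) {
        this.bouncer = new Semaphore(permits);
    }

    public <T> T guarded(Callable<T> expensiveTask) throws Exception {
        bouncer.acquire();               // blocks once the limit is reached
        try {
            return expensiveTask.call(); // the CPU-intensive work
        } finally {
            bouncer.release();           // always hand the permit back
        }
    }
}
```

The acquire/release pair in a try/finally is the important part: a permit must be returned even if the expensive task throws, otherwise the bouncer slowly "leaks" capacity.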
<br />
Now let's look at the <code>ConcurrentTransformer</code> that I declared as a protected inner class of <code>HTMLDocumentGenerator</code>. As an inner class, <code>ConcurrentTransformer</code> instances have direct access to the member variables of <code>HTMLDocumentGenerator</code>. There is one <code>ConcurrentTransformer</code> for each unique XSL template. <br />
<br />
<script src="https://gist.github.com/1206265.js">
</script> <br />
<br />
Let's go through <code>generateTemplate</code> step-by-step:<br />
<br />
<b>Line 9</b>: check if the thread needs to compile this XSL template. If <code>false</code>, return the compiled template. <code>mustCompile</code> returns <code>true</code> if, for example, the <code>template</code> variable is <code>null</code>, which means the template wasn't compiled yet. The first thread that enters <code>generateTemplate</code> will get <code>mustCompile() = true</code> because the <code>ConcurrentTransformer</code> was just created by that thread and the <code>template</code> variable is <code>null</code>.<br />
<b>Line 12</b>: Block all subsequent threads, because only ONE thread can compile THIS XSL template at a given time.<br />
<b>Line 14</b>: check again if the thread needs to compile the XSL template. Sounds weird? Imagine a second thread that was blocked at Line 12 because the first call to <code>mustCompile</code> returned <code>true</code>. This thread does not need to compile again, because the other (faster/first) thread already compiled the template. <br />
<b>Line 17</b>: check if the thread exceeds the permitted number of threads allowed to run compile jobs. Because the <code>bouncer</code> was defined in the <code>HTMLDocumentGenerator</code>, only a limited number of threads will enter the next code block. This actually was the tricky part of the solution: lock threads that want to compile a specific template, but also ensure that the total number of active compilation tasks does not exceed a defined limit! Because the <code>bouncer</code> applies to all cached <code>ConcurrentTransformer</code> instances, this is possible.<br />
<b>Line 19-25</b>: Do the actual compilation work, which is the CPU-intensive part that caused the system overload.<br />
<b>Line 27</b>: Release a permit to allow another thread (in a different <code>ConcurrentTransformer</code> instance) to enter the compilation code.<br />
<b>Line 30</b>: Release the lock so that other threads waiting for that XSL document can continue their processing. (They will return the template that was just compiled, see Line 14.)<br />
<br />
Done! This solution enforces that (1) only one thread will try to compile a specific XSL document at a time and that (2) only a limited number of threads can do compilation work. <br />
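Since the embedded gists may not render everywhere, here is a condensed, self-contained sketch of the combined pattern: per-key locking plus the shared permit limit. The names and the <code>expensiveWork()</code> placeholder are my assumptions, not the production code:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: one lock per cache key, one global permit limit for all keys.
public class GuardedCompilingCache {

    private final ConcurrentHashMap<String, Entry> cache = new ConcurrentHashMap<>();
    private final Semaphore bouncer = new Semaphore(4); // assumed global compile limit

    public String get(String key) throws InterruptedException {
        Entry entry = new Entry();
        Entry existing = cache.putIfAbsent(key, entry); // lookup-or-create in one step
        if (existing != null) {
            entry = existing;
        }
        return entry.compute(key);
    }

    private class Entry {
        private final ReentrantLock lock = new ReentrantLock();
        private volatile String result; // the "compiled template"

        String compute(String key) throws InterruptedException {
            if (result != null) {
                return result;              // fast path, no locking
            }
            lock.lock();                    // only one thread per key past this point
            try {
                if (result != null) {
                    return result;          // re-check: a faster thread already compiled
                }
                bouncer.acquire();          // global limit on concurrent expensive work
                try {
                    result = expensiveWork(key);
                } finally {
                    bouncer.release();
                }
                return result;
            } finally {
                lock.unlock();
            }
        }
    }

    private String expensiveWork(String key) {
        return "compiled:" + key; // stands in for XSL template compilation
    }
}
```

Usage: <code>cache.get("/start/index.xsl")</code> compiles once, every later call for the same key returns the cached result, and at no point do more than four threads (in this sketch) run <code>expensiveWork</code> at the same time.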
<br />
Here is the class diagram that shows the pattern-style structure of the solution:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiFmCKz7XxfNzv05OKhTyiaAViFGYRsJAIYYTv69Vss6Gp5CDSmoRhgg8JnWznqg90KVHU5kNS7LcWel6iWHebdBavjOAlUNiU3N0os4xdvh7cRSbGmEmgaajINOV_Nze6JBKYfhlHLPvw/s1600/ConcurrentTrasnformer.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="177" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiFmCKz7XxfNzv05OKhTyiaAViFGYRsJAIYYTv69Vss6Gp5CDSmoRhgg8JnWznqg90KVHU5kNS7LcWel6iWHebdBavjOAlUNiU3N0os4xdvh7cRSbGmEmgaajINOV_Nze6JBKYfhlHLPvw/s320/ConcurrentTrasnformer.JPG" width="320" /></a></div><b><br />
<br />
Sleeper bugs and why systems work by coincidence</b><br />
<br />
Some weeks ago I met <a href="http://www.javaspecialists.eu/">Dr. Heinz Kabutz</a>. I attended his <a href="http://www.javaspecialists.eu/courses/master.jsp">Java Specialist Master Course</a> and he opened my eyes to problems like this. Multithreading was his first lesson and he started by saying something like this: "Your web applications are not thread-safe, they just work - by sheer coincidence!" Although it's a provocative statement, he is not wrong ... The described concurrency bug was in our code for a _long_ time, say 8 years? It just didn't show up. What changed? We migrated from WebSphere Application Server 6 to 7, and we decided to use a new XSL parser, optimized for z/OS. The old XSL parser performed well at compilation time, but the compilation result did not perform so well. The new XSL parser promised an optimized compilation result, but it obviously took more CPU during compilation. Now that compilation was CPU-intensive with the new parser, our concurrency bug suddenly turned out to be a big problem. I'd call that a "sleeper bug": it sits there for years, and suddenly it sabotages your system. I believe there are many bugs like this in today's Java applications, and I believe it's not too provocative to say that you too will have some sleeper bugs in your systems. Your systems work - but by sheer coincidence! <br />
<br />
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js">
</script><br />
<br />
<script src="http://gist.github.com/raw/454771/gist-line-number-hack.js">
</script><br />
<br />
<script type="text/javascript">
addLineNumbersToAllGists()
</script>Niklas Schlimmhttp://www.blogger.com/profile/12402045792243894660noreply@blogger.com7