Last week we had a severe problem in a critical web module of our production environment. The problem was that we had to restart a server at a time where we had very high user and transaction volumes (we do ~500.000 transactions in that module per business day). The server shutdown deleted our XSL template cache. As a consequence many threads tried to recompile the templates concurrently after restart. This in turn introduced a (CPU) system overload issue. We could only restart the server by blocking the incoming requests at the web server level. In fact we only allowed some threads to bypass the web server and to enter the application server. After some warmup time we opened the web server and the system started to work at an acceptable CPU load.
We've looked at the application code that caused the issue and decided to implement an intelligent concurrency pattern that fulfills following requirements:
- the number of threads that perform a CPU intense task concurrently (like XSL template compilation) should be limited to a configurable size (=> avoid CPU overload on startup in user load peek times)
- cache the result of each CPU intense task so that it will only execute once (we had that already in our error prone solution)
- enable the system to determine whether the CPU intense task needs to execute again (=> if the application was redeployed the templates need to recompile)
I spare me the effort (and the painfullness) to post the old non-safe code. The bugs in that code were:
firstly - it did not limit the number of threads allowed to compile templates
secondly - it did not ensure that only one thread compiles a specific template at a given time (e.g. our startup page) - instead many threads tried to compile the same template concurrently - this one is critical!
That all being said, here comes the solution to such a "too-many-threads concurrency issue". Here's the code and I also added a class diagram 'cause I believe this is a common situation in web applications.
The following snippet shows our new
HTMLDocumentGenerator
, its responsibility is to create and cache ConcurrentTransformer
instances, one for each XSL document. The responsibility of ConcurrentTransformer
is to compile a XSL template and to store the result in its template
variable. HTMLDocumentGenerator
owns the cache (Line 3), defines a limit of threads that can compile XSL documents (Line 5) and declares a Semaphore
that implements a "thread bouncer" (Line 6). The createDocument
-Method (Lines 10-20) creates and caches the new ConcurrentTransformer
instances for each XSL document. Example: /start/index.xsl
will have its own ConcurrentTransformer
instance and /start/main_menu.xsl
will also have its own unique instance.Notice that we use
ConcurrentHashmap.putIfAbsent()
which allows cache lookup and cached object creation in a single step. This is equivalent to:if (!map.containsKey(key)) return map.put(key, value); else return map.get(key);
This is a perfect approach in a multithreaded high volume application. You could do the same with owned locks or
synchronized
blocks, but it will hardly be so safe (and fast!) like the example above. You will understand my statement if you look at the implementation of ConcurrentHashmap
. It locks segments internally and not the whole table! This allows very high volumes of concurrent access without lock contention. In Line 18 we call the method generateTemplate
that actually performs the CPU intense task (template compilation). In Line 5 we declared a
Semaphore
which acts as a bouncer for the CPU intense code sections. Using this class you can control the number of concurrent threads that perform a certain task concurrently. It's important to declare the Semaphore
in the HTMLDocumentGenerator
class, 'cause the thread limit applies to all cached ConcurrentTransformer
instances. Now let's look at the
ConcurrentTransformer
that I declared as a protected inner class of HTMLDocumentGenerator
. Using an inner class ConcurrentTransformer
instances have immediate access to member variables of HTMLDocumentGenerator
. There is one ConcurrentTransformer
for each unique XSL template. Let's go through
generateTemplate
step-by-step:Line 9: check if thread needs to compile this XSL template. If
false
, return the compiled template. mustCompile
returns true
if for example the template
variable is null
, which means the template wasn't compiled yet. The first thread that enters generateTemplate
will get mustCompile() = true
'cause the ConcurrentTransformer
was just created by that thread and the template
variable is null
.Line 12: Block all subsequent threads 'cause only ONE thread can compile THIS XSL template at a given time.
Line 14: check again if thread needs to compile the XSL template. Sounds weird? Imagine a second thread that was blocked at Line 12 'cause the first call to
mustCompile
returned true
. This thread does not need to compile again 'cause the other (faster/first) thread already compiled the template. Line 17: check if thread exceeds the permitted number of threads allowed to do compile jobs. Because the
bouncer
was defined in the HTMLDocumentGenerator
only a limited number of threads will enter the next code block. This actually was the tricky part of the solution: Lock threads that want to compile a specific template but also ensure that the total number of active compilation tasks does not exceed a defined limit! Because the bouncer
applies to all cached ConcurrentTransformer
instances this is possible.Line 19-25: Do the actual compilation work which is the CPU intense part that aused the system overload.
Line 27: Release a permit to allow other thread (in a different
ConcurrentTransformer
instance) to enter the compilation code.Line 30: Release the lock so that other threads waiting for that XSL document can continue their processing. (they will return the template that was just compiled, see line 14)
Done! This solution will enforce that (1) only one thread will try to compile a specific XSL document and that (2) only a limitted amount of threads can do compilation work.
Here is the class diagram that shows the pattern-style structure of the solution:
Sleeper bugs and why systems work by coincidence
Some weeks ago I met Dr. Heinz Kabbutz. I attended his Java Specialist Master Course and he opened my eyes for problems like this. Multi-threading was his first lesson and he started by saying something like this: "Your web applications are not thread safe, they just work - by sheer coincidence!" Although its a provocative statement, he is not wrong by saying that ... The described concurrency bug was in our code for a _long_ time, say for 8 years? It just didn't show up. What changed? We migrated from WebSphere Application Server 6 to 7. And we decided to use a new XSL parser, optimized for z/OS. The old XSL parser performed good at compilation time, but the compilation result did not perform so well. The new XSL parser promised an optimized compilation result. But obviously it did take more CPU during compilation. Now, since the compilation was CPU intense with the new parser our consurrency bug turned out to be a big problem suddenly. I'd call that a "sleeper bug". It is there for years, and suddenly it sabotages your system. I believe there are many bugs like this in todays Java applications, and I believe it's not too provocative to say that you will also have some sleeper bugs in your systems. Your systems work - but by sheer coincidence!
Hi Niklas,
ReplyDeleteI remember your comment at the end of day 1 of our course: "I only understood half of what you said." and you meant that as a compliment :-)
Thank you so much for this write-up - it is a perfect example of why we need to understand concurrency and the effect that concurrency bugs can have.
In the Law of the Overstocked Haberdashery (http://www.javaspecialists.eu/archive/Issue149.html), I mentioned this in regard to how many threads are healthy (even if they are inactive): What if a few hundred threads become active suddenly?
I'm glad you sorted out your problems and happy that you attended the course just-in-time (JIT).
Regards from Crete :-)
Heinz
Quick question, should'nt the 'mustCompile' method by synchronized, and would that remove the need for calling that method twice?
ReplyDeleteThis is a VERY impressive post. My name is Jim Manico from the OWASP Foundation where we just added your article to our appSec news feed.
ReplyDeleteWould you be interested in converting this and other information into a "OWASP Thread Safety Cheat Sheet" by any chance? We have quite a few already - check out https://www.owasp.org/index.php/Cheat_Sheets and let me know if you would like to join the party.
jim@owasp.org
Aloha!
@Anonymous
ReplyDeletethe first mustCompile call is necessary to prevent subsequent threads from locking overhead when THIS template is compiled already. They just return w/o locking anything ... The second call is necessary if two threads are in a race condition after the first mustCompile call. In that case the one thread gets the lock, calls mustCompile and will compile the template 'cause mustCompile returns "true". The second thread was waiting at the lock in the meantime. When the lock is released it enters the locked code block and calls mustCompile, this time it returns "false" and the thread immedieately returns the compiled template ...
Hi Niklas,
ReplyDeleteCool blog! Is there an email address I can contact you in private?
I'd suggest using a thread-safe Map of Futures generated by submit calls to a ThreadPoolExecutor for Callable XSL compiler objects.
ReplyDeletejava.util.concurrent makes this kind of stuff much easier.
@Anonymous
ReplyDeleteThx for the comment. I do think your siggestion makes sense outside an application server. In our case the issue is that the application server continues to start threads that try to compile the same templates ....