Connection timeouts: be careful with your threads
Last week we had a bottleneck in our application, and it took us several days to find it. So here is what you should not do.
The architecture is as follows:
- 2 Alteons to spread the load
- 2 reverse proxies
- a layer of firewalls
- 2 HTTP compressors
- 4 iPlanet web servers (Sun Java Web Server 6.0)
- a cluster of 40 WebLogic servers running on 4 different boxes
- a database server
Everything runs on HP-UX 11. iPlanet only serves static pages and images, while WebLogic hosts the presentation (JSP/Servlet) and EJB layers (it does 99% of the work).
The problem was that a lot of connections were timing out. We kept focusing on WebLogic, monitoring thread and memory activity to see if we had a deadlock or a memory leak (there are some JNI calls somewhere). WebLogic wasn't doing anything; it was not under stress at all. The application was configured to handle 2000 concurrent users, which is why each WebLogic instance was running 60 threads (60 threads * 40 instances = 2400).
After some days of analysing and tweaking WebLogic, we remembered that we had a layer of iPlanet servers in front of it. The admin guys were sure that there was no problem with iPlanet because each server was supposed to run 512 threads. That wasn't the case. Each instance was configured with 128 threads (128 * 4 instances = 512), so 512 was the total for the whole layer, not the figure per server. With only 512 threads at the front, most requests never reached WebLogic's 2400 threads. That was why WebLogic was doing nothing and connections were timing out.
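The arithmetic above is easy to check with a few lines of Python. This is only a sketch built from the figures in this post; the names tiers, total_threads and bottleneck are made up for the illustration and are not part of any iPlanet or WebLogic tooling.

```python
# Thread capacity per tier, listed front to back, using the figures from this post.
# (name, number of instances, threads per instance)
tiers = [
    ("iPlanet", 4, 128),
    ("WebLogic", 40, 60),
]


def total_threads(instances, threads_per_instance):
    """Total worker threads available across all instances of a tier."""
    return instances * threads_per_instance


capacity = {name: total_threads(n, t) for name, n, t in tiers}

# The tier with the fewest threads caps how many requests can be in flight.
bottleneck = min(capacity, key=capacity.get)

for name, total in capacity.items():
    print(f"{name}: {total} threads")
print(f"Narrowest tier: {bottleneck}")
```

Run as is, it reports 512 threads for iPlanet, 2400 for WebLogic, and iPlanet as the narrowest tier.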
A rule of thumb is that the number of threads should decrease as you move toward the back-end layers. The web layer deals with presentation (JSP, servlets, tags, HTTP session, SSO…), and not every call goes through the app server to the database. So the number of threads on your web layer should be higher than the number of threads on your app server.
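A rule like this could be turned into a small sanity check to run before going live. The sketch below is hypothetical: check_tiers is an invented helper, the thread totals have to be supplied by hand, and the 3000 figure in the second call is only an example value, not a recommendation.

```python
def check_tiers(tiers):
    """Return a warning for each tier that has fewer threads than the tier behind it.

    tiers is a list of (name, total_threads) pairs ordered front to back.
    """
    warnings = []
    for (front, front_threads), (back, back_threads) in zip(tiers, tiers[1:]):
        if front_threads < back_threads:
            warnings.append(
                f"{front} ({front_threads} threads) is narrower than "
                f"{back} ({back_threads} threads)"
            )
    return warnings


# The configuration from this post: the front tier is the narrow one.
for warning in check_tiers([("iPlanet", 512), ("WebLogic", 2400)]):
    print("WARNING:", warning)

# A hypothetical configuration that follows the rule of thumb: no warnings.
assert check_tiers([("iPlanet", 3000), ("WebLogic", 2400)]) == []
```

The check only compares totals between neighbouring layers; it says nothing about what the right absolute numbers are for a given load.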