The EJB Specification, Concurrency, and Batch Processing

Sunday, March 1, 2009 12:54
Posted in category JBoss, JEE, Java

The EJB specification leaves little room for concurrent processing inside the EJB container. This poses a problem for developers who need to run long-running batch processes as part of an application's business logic. The EJB specification states the following:

The enterprise bean must not attempt to manage threads. The enterprise bean must not attempt to start, stop, suspend, or resume a thread, or to change a thread’s priority or name. The enterprise bean must not attempt to manage thread groups.

These functions are reserved for the EJB container. Allowing the enterprise bean to manage threads would decrease the container’s ability to properly manage the runtime environment.

While researching this topic, I found that although the specification states these restrictions, EJB containers do not actually enforce them. I was able to test an implementation using a ThreadPoolExecutor in JBoss with no problems. Most write-ups on long-running batch processing recommend running the work outside the EJB container. That design respects the EJB programming restrictions, but it is not always feasible for developers.

Currently, I am faced with a business requirement to process a large set of files within the flow of a business method. The flow is to select files based on some criteria, specify what to do with the file collection, and submit the task asynchronously through JMS to a message-driven bean (MDB) for background processing. When the job is done, the database is updated with the new file paths and the user is notified of successful completion. There are several underlying issues:

  1. The user submits a request to process a very large set of files. Processing this request may take hours and will most likely exceed the default transaction timeout configured for the message-driven bean, at which point a transaction timeout exception is thrown and the operation halts. This can be overcome to a certain extent by increasing the transaction timeout in the application server settings.
  2. The single large task executes sequentially and does not take advantage of all available CPUs on a multiprocessor machine.
  3. It would be more efficient and reduce execution time if the task could be broken up into smaller tasks (or batches) and executed in parallel, or distributed across a group of servers which can process each sub-task individually.
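
As a rough illustration of issue 3, here is a minimal sketch of splitting a file list into batches and running them on a thread pool sized to the available CPUs. The class and method names (BatchRunner, partition, processAll, processBatch) are hypothetical, and processBatch is only a placeholder for the real file handling. Run inside an EJB container, this is exactly the rule-bending discussed here, so treat it as a sketch under that assumption rather than a recommended design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper for illustration; not part of any EJB or JBoss API.
final class BatchRunner {

    // Splits the items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }

    // Runs every batch on the pool and returns the total number of files handled.
    static int processAll(List<String> files, int batchSize)
            throws InterruptedException, ExecutionException {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (List<String> batch : partition(files, batchSize)) {
                tasks.add(() -> processBatch(batch));
            }
            int total = 0;
            // invokeAll blocks until every batch has finished (or failed).
            for (Future<Integer> result : pool.invokeAll(tasks)) {
                total += result.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    // Placeholder for the real work; here it only reports the batch size.
    static int processBatch(List<String> batch) {
        return batch.size();
    }
}
```

Each batch is independent, so a failed batch surfaces as an ExecutionException from its own Future instead of aborting work that other threads have already completed.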

The design concepts are relatively simple, but the implementation details are complex because of the programming restrictions in the EJB specification. This screams concurrency! The most commonly suggested solution for concurrency is JMS. One solution I found is to break up the task and create multiple JMS messages, each of which replies to an acknowledgment queue. The session bean then listens on that ack queue, using a message selector (filter) to collect the results of each sub-task. This is a good solution, but it is more complex to implement than bending the rules and using a ThreadPoolExecutor. I think the real issue with managing threads within the EJB container is reentrancy. In my case, I am calling a remote service, so there is no chance of that.
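
The acknowledgment-queue idea can be sketched without a JMS provider. The example below does not use the JMS API at all: two in-memory BlockingQueue instances stand in for the work queue and the ack queue, plain daemon threads stand in for MDB instances, and a correlation ID comparison plays the role of the message selector. All names (ScatterGather, Message, submitAndWait) are hypothetical, and a real implementation would use JMS sessions, producers and consumers instead.

```java
import java.util.List;
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// In-memory stand-in for the JMS pattern; hypothetical, for illustration only.
final class ScatterGather {

    // Minimal message: a correlation ID (the role JMSCorrelationID plays) plus a payload.
    static final class Message {
        final String correlationId;
        final String payload;

        Message(String correlationId, String payload) {
            this.correlationId = correlationId;
            this.payload = payload;
        }
    }

    private final BlockingQueue<Message> workQueue = new LinkedBlockingQueue<>();
    private final BlockingQueue<Message> ackQueue = new LinkedBlockingQueue<>();

    // Stand-in for one MDB instance: take a sub-task, then post an acknowledgment.
    private void startWorker() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    Message task = workQueue.take();
                    ackQueue.put(new Message(task.correlationId, "done:" + task.payload));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true); // do not keep the JVM alive for this sketch
        worker.start();
    }

    // Sends one message per sub-task, then counts the acks that carry this job's ID.
    int submitAndWait(List<String> subTasks, int workers, long timeoutSeconds)
            throws InterruptedException {
        for (int i = 0; i < workers; i++) {
            startWorker();
        }
        String jobId = UUID.randomUUID().toString();
        for (String subTask : subTasks) {
            workQueue.put(new Message(jobId, subTask));
        }
        int acked = 0;
        while (acked < subTasks.size()) {
            Message ack = ackQueue.poll(timeoutSeconds, TimeUnit.SECONDS);
            if (ack == null) {
                break; // timed out waiting for the remaining acks
            }
            // A JMS selector would filter on the broker side; here we filter after receipt.
            if (jobId.equals(ack.correlationId)) {
                acked++;
            }
        }
        return acked;
    }
}
```

The caller learns how many sub-tasks were acknowledged before the timeout, which is the information the session bean needs in order to decide whether the whole job completed.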

There are many factors to consider when implementing a batch processing framework for your application. There is no one-size-fits-all solution. You really have to evaluate all the options, weigh the pros and cons of each, and decide for yourself which approach works best in your environment. Here is a list of articles I found very helpful in making that decision:


One Response to “The EJB Specification, Concurrency, and Batch Processing”

  1. Snehal Antani says:

    November 15th, 2009 at 11:39 am

    An interesting design point to consider is that the degree of parallelism should be a point-in-time decision, influenced by the available capacity of the system, service-level agreements (SLAs) of concurrent workloads (OLTP, real-time, etc.), approaching deadlines for other batch jobs in execution, and so on. It’s easy for a developer to arbitrarily spin off 10 commonj threads for parallel execution, but 10 threads may be too few or too many given the other workloads in the system. When designing parallel processing for batch, design the application so that it does not care about the degree of parallelization. Shift the burden of assessing the degree of parallelization to the infrastructure, where some external component applies a partitioning algorithm to the job and dispatches the many instances of the job across the collection of application server threads, with each job instance operating on its own segment of the data. This is how we designed it in WebSphere Compute Grid, IBM’s batch processing technology.

    The following article and presentation discuss batch application design patterns. They are generally technology agnostic, though the examples are provided in the context of WebSphere Compute Grid.

    Batch application design:
    - http://sites.google.com/site/snehalantani/designingBatchApps.zip
    - http://sites.google.com/site/snehalantani/DesigningBatchApplications.pdf

    Batch processing infrastructure overview:
    - http://sites.google.com/site/snehalantani/WebSphereDataIntensiveApps.pdf
    - http://www-128.ibm.com/developerworks/websphere/techjournal/0804_antani/0804_antani.html

    Customer experience with modern batch processing:
    - http://www-01.ibm.com/software/tivoli/features/ccr2/ccr2-2008-12/swissre-websphere-compute-grid-zos.html

    Latest Compute Grid overview presentation:
    - http://sites.google.com/site/snehalantani/latestpresentationmaterial
