End-to-end monitoring of BPEL process instances across composite borders is a great feature of Oracle SOA/BPM Suite 11g. It is shown in nearly every PreSales presentation, and if you know SOA Suite 10g or have worked with other kinds of distributed, heterogeneous systems, it is a real improvement.
But when you implement large process chains, you may find that the newly won process transparency raises new challenges. Imagine a root process that creates several sub-process instances. Without any extra work, you get a single flow trace for the root process and all of its sub-process instances. For large process chains you need to consider the following facts:
– Transparency: Although the flow trace shows an end-to-end view of the whole execution tree, finding a faulted sub-process in it can be a real challenge. It doesn’t matter whether you start the search from the root process instance or from one of the sub-processes: the flow trace always displays all components of the execution context. And each time you click on a sub-process and then return to the flow trace, you might have to expand all child nodes again.
– DataSetTooLargeException: As your flow trace grows longer and longer, you will notice that there is a maximum size for the audit trail that Enterprise Manager can display. Exceeding it usually results in an exception like this:
java.lang.RuntimeException: oracle.soa.management.facade.DataSetTooLargeException: Requested audit trail size is larger than threshold … chars
For large execution trees, sub-process instances might not be displayed at all, or you might not be able to drill down into the details.
– Low Memory: It is not only the visual representation of your instance that struggles. A huge audit trail also means that more memory is needed to execute the process instance. The demand can grow to the point where the process instance crashes because the server runs low on memory.
– Purging: With large flow traces you should always keep an eye on the capacity of your soa-infra database. Why? Usually you have purging routines installed to keep your system healthy, that is, to regularly delete “old” instances from the dehydration store. In a nutshell, the purging routines eliminate “completed” instances. If one of your sub-process instances goes into a faulted state, all instances in the same execution context are skipped by the purge (unless you set the ignore_state attribute to true). This means that even if 99 percent of your instances have executed correctly and could be purged, the whole instance data, which can be huge as stated earlier, is kept in your dehydration store.
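The purging behaviour described in the last point can be illustrated with a small model. This is only a sketch of the rule as stated above, not Oracle's purge implementation: the `Instance` class, the state names and the `purgeable_ids` function are hypothetical and exist purely for illustration. It assumes that instances are grouped by their execution context ID (ECID) and that a group is purged only if every instance in it is completed, unless ignore_state is set.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical state names for this model; the real dehydration store
# uses its own state codes.
COMPLETED = "completed"
FAULTED = "faulted"


@dataclass(frozen=True)
class Instance:
    instance_id: int
    ecid: str   # execution context ID shared by all instances of one flow trace
    state: str


def purgeable_ids(instances, ignore_state=False):
    """Return the IDs a purge run would delete in this simplified model."""
    by_ecid = defaultdict(list)
    for inst in instances:
        by_ecid[inst.ecid].append(inst)

    result = []
    for group in by_ecid.values():
        # Without ignore_state, a single non-completed instance
        # keeps the whole execution context in the store.
        if ignore_state or all(i.state == COMPLETED for i in group):
            result.extend(i.instance_id for i in group)
    return sorted(result)


instances = [
    Instance(1, "ecid-A", COMPLETED),
    Instance(2, "ecid-A", COMPLETED),
    Instance(3, "ecid-A", FAULTED),    # blocks the purge of 1 and 2 as well
    Instance(4, "ecid-B", COMPLETED),
]

print(purgeable_ids(instances))                     # [4]
print(purgeable_ids(instances, ignore_state=True))  # [1, 2, 3, 4]
```

In this model a single faulted instance keeps its two completed siblings in the store, which is why the size of an execution context matters for purging.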
So how do you deal with all these challenges? There are two relatively small changes, which we describe in the following two posts:
1) Splitting large flow traces by setting a new execution context
2) Using the CompositeInstanceTitle-Property and composite sensors to simplify instance identification via the Enterprise Manager