Tuesday, 26 September 2017

Best Practices to Follow from a Message Broker Development Perspective

1. MESSAGE FLOW DEVELOPMENT STANDARDS

           - Below are best practices for the message flow development stage, including how to avoid message flow implementations that can cause performance problems.
  To separate configuration information from business logic, externalize configuration information to a file or database. This technique need not reduce performance, because reading a configuration or parameter file can be a one-time activity performed when the first instance of a node is created or when the first message is processed, rather than a lookup for every message. Because Message Broker is more CPU-bound than I/O-bound, it is usually best to avoid per-message I/O operations involving files or databases whenever possible.
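
For example, configuration can be cached in an ESQL shared variable so that the database is read only when the first message arrives. This is a minimal sketch, assuming a hypothetical CONFIG table (columns NAME and VALUE) reachable through the node's data source:

CREATE COMPUTE MODULE ConfigCache_Compute
    -- Shared row variable: persists across messages and flow instances
    DECLARE configCache SHARED ROW;

    CREATE FUNCTION Main() RETURNS BOOLEAN
    BEGIN
        IF NOT EXISTS(configCache.Entry[]) THEN
            LOADCFG : BEGIN ATOMIC
                -- Re-check inside the atomic block so only one thread loads
                IF NOT EXISTS(configCache.Entry[]) THEN
                    SET configCache.Entry[] =
                        SELECT C.NAME, C.VALUE FROM Database.CONFIG AS C;
                END IF;
            END LOADCFG;
        END IF;
        SET OutputRoot = InputRoot;
        RETURN TRUE;
    END;
END MODULE;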

  Avoid overuse of Compute and JavaCompute nodes, because tree copying is processor-intensive; instead, put reusable nodes into subflows. No additional nodes are inserted into the message flow as a result of using subflows.

  For efficient code reuse, consider using ESQL modules and broker schemas rather than subflows. The extra Compute nodes added to perform initialization and finalization for the processing done in a subflow result in extra message tree copying, which is relatively expensive because it is a copy of a structured object. A reusable routine is sketched below.
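
As a sketch (hypothetical schema and procedure names), a routine declared at broker schema level in a shared .esql file is reusable from any Compute node module:

BROKER SCHEMA com.example.common

-- Reusable routine, visible to modules in other schemas via a PATH statement
CREATE PROCEDURE AddAuditStamp(IN root REFERENCE)
BEGIN
    SET root.XMLNSC.Message.Audit.ProcessedAt = CURRENT_TIMESTAMP;
END;

A module in another schema adds PATH com.example.common; at the top of its .esql file and then simply issues CALL AddAuditStamp(OutputRoot); with no subflow, and no extra Compute nodes, involved.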

  Avoid consecutive short message flows in which the output of a message flow is immediately processed by another message flow as opposed to the output of the message flow being read by an external application. By using consecutive short message flows you are forcing additional parsing and serialization of messages which is likely to be expensive. The only exception to this is the use of the Aggregation nodes.

  It is important to think about the structure of your message flows and about how they will process incoming data. If a unique message flow is produced for each different type of message, it is referred to as a specific flow; if a few message flows each process a whole group of message types, they are called generic flows.

There are advantages to both the specific and generic approaches. From a message throughput point of view it is better to implement specific flows. From a management and operation point of view it is better to use generic flows. Which approach you choose will depend on what is important in your own situation.

  Maximize the use of the built-in parsers. It is better to attach more than one wire format to a single logical message model, and let the Message Broker writers convert the data when it is written to the wire, than to use many lines of ESQL or Java to copy field values from one logical message model to another. This often requires more time and effort in constructing the model, but it saves coding effort in return and provides a smaller, longer-lasting runtime memory footprint.

  Avoid the parsing cost incurred when a message flow needs to look at a field in the body of a large incoming message (possibly several megabytes in size) in order to make a routing decision.

A technique to reduce this cost is to have the application that creates the message copy the field needed for routing into a header within the message: into an MQRFH2 header for an MQ message, or a JMS property for a JMS message. It would then no longer be necessary to parse the message body, potentially saving a large amount of processing effort. The MQRFH2 header or JMS Properties folder would still need to be parsed, but this is a much smaller amount of data, and the header parsers are also more efficient than the general parser for a message body, because the structure of the header is known.
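
As a sketch, assuming the sending application has placed a hypothetical RouteTo value in the MQRFH2 usr folder, a Compute node can then route on the small header alone (here driving a RouteToLabel node):

-- Read the routing key from the small, cheap-to-parse MQRFH2 header
DECLARE routeKey CHARACTER InputRoot.MQRFH2.usr.RouteTo;

IF routeKey = 'BILLING' THEN
    SET OutputLocalEnvironment.Destination.RouterList.DestinationData[1].labelName = 'BillingLabel';
ELSE
    SET OutputLocalEnvironment.Destination.RouterList.DestinationData[1].labelName = 'DefaultLabel';
END IF;

The message body itself is never referenced, so it is never parsed.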

A second approach to not parsing data is to not send it in the first place. Where two applications communicate consider sending only the changed data rather than sending the full message. This requires additional complexity in the receiving application but could potentially save significant parsing processing dependent on the situation. This technique also has the benefit of reducing the amount of data to be transmitted across the network.

  Use opaque parsing (XMLNS and XMLNSC domains only) where you do not need to access the elements of the sub tree; for example, where you need to copy a portion of the input tree to the output message but do not care about its contents in this particular message flow. You accept the content of the sub folder and have no need to validate or process it in any way.

- Opaque parsing is a technique that allows a whole XML sub tree to be placed in the message tree as a single element. The entry in the message tree is the bit stream of the original input message. This technique has two benefits:

It reduces the size of the message tree, since the XML sub tree is not expanded into individual elements.

The cost of parsing is reduced, since less of the input message is expanded as individual elements and added to the message tree.

        For example: the element <p56:requestAssessorAvailability> in the ESQL code snippet below is large, with many child elements. In this case the cost of populating the message tree would be high. As no part of <p56:requestAssessorAvailability> is needed in the message flow, we can opaquely parse this element.

CREATE LASTCHILD OF OutputRoot
DOMAIN('XMLNS')
PARSE (BitStream
ENCODING InputRoot.Properties.Encoding
CCSID InputRoot.Properties.CodedCharSetId
FORMAT 'XMLNS_OPAQUE'
TYPE 'p56:requestAssessorAvailability');
             
                 Note: It is not currently possible to use the CREATE statement to opaquely parse a message in the XMLNSC domain.
  Use the compact parsers (XMLNSC, MRM XML and RFH2C). The compact parsers discard comments and white space in the input message; depending on the contents of your messages, this may or may not have a noticeable effect. By comparison, the other parsers include everything in the original message, so white space and comments are inserted into the message tree.


  Avoid using ResetContentDescriptor (RCD) nodes. An RCD node changes the message domain, which requires the complete message tree to be re-parsed. This is both a memory- and CPU-intensive activity. A logical combination of IF statements, the "CREATE with PARSE" statement, and the ESQL ASBITSTREAM function can be used to eliminate RCD nodes and multiple Compute/Filter nodes, as sketched below.
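
For example, this sketch re-parses an unparsed BLOB body in the XMLNSC domain inside a Compute node, doing the work an RCD node would otherwise do:

-- Re-parse the input bit stream in a different domain (XMLNSC)
CREATE LASTCHILD OF OutputRoot
    DOMAIN('XMLNSC')
    PARSE(InputRoot.BLOB.BLOB
          ENCODING InputRoot.Properties.Encoding
          CCSID InputRoot.Properties.CodedCharSetId);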

  Do not use Trace nodes in production environments. The ${Root} expression is an expensive operation, because it causes the complete message tree to be parsed. This happens even if the trace destination is not active.

  Wherever possible, use user exits and redirect the audit / logging information appropriately. The user exit feature gives the flexibility to activate and deactivate exits dynamically during message processing.

  Using a destination list is recommended over adding more output nodes when a message has to be written to multiple destinations, as sketched below.
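
A sketch with hypothetical queue names: a single MQOutput node whose Destination Mode is set to 'Destination List' writes the same message to both queues named below.

-- Build a destination list instead of wiring one MQOutput node per queue
SET OutputLocalEnvironment.Destination.MQ.DestinationData[1].queueName = 'BILLING.IN';
SET OutputLocalEnvironment.Destination.MQ.DestinationData[2].queueName = 'AUDIT.IN';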

  If XMLT (XSL Transform) nodes are to be used, make use of style sheet caching wherever possible.

  Ensure that the transaction mode is set to 'No' for input nodes and 'Automatic' for output nodes during message processing. These settings should differ only when processing has to be done under a transaction.

  Always have an exception-handling mechanism in your message flows rather than relying on the default broker exception handler. The default exception handler can block message consumption when the processing of a single poisoned message fails.

  If there are database-manipulating nodes, promote the data source name property, as it might not be the same across environments (development, test, production, and so on). Promoting the property lets you change it at flow level, rather than at each node, during deployments to the various environments. The same applies to other node properties, such as the style sheet name of an XMLT node.

  Revisit all the Java nodes and ensure that clearMessage() is called on every MbMessage object, especially in the finally block. MbMessage objects are used to create the output message tree, environment tree, local environment tree, exception list tree, and so on. So wherever message trees are created, clear them out in a try - finally block.

  Each message flow instance runs as a thread. For processing integrity it is not good to spawn additional threads in message flow nodes. If a business requirement demands it, all such threads should be managed by the node itself and released when the node is deleted, ensuring that there is no thread blocking during message processing.
       



2. PUBLISH/SUBSCRIBE BEST PRACTICES

  When using pub/sub, the number of subscribers per topic affects message throughput, as it determines how many output messages have to be written per publication on a topic. The messages are written in a single unit of work, so the broker queue manager log needs to be tuned when using persistent messages. You should also consider message batching, which is achieved through the Commit Count property; its value is specified on the message flow configuration panel in the BAR file editor.

  Use of Publication nodes can lead to increased use of the broker database, especially for retained publications, because the broker stores them in the broker database. Be judicious about whether publications really need to be retained.

  Details of each subscription registration and de-registration are stored in a broker database table. If the level of dynamic subscribing and unsubscribing by applications is high, there will be a correspondingly high level of broker database operations. All I/O and database operations are expensive, so design the solution so that these operations are minimized, or tune the database for high performance.

  When designing a publish / subscribe model, consider content-based routing over topic-based routing. With content-based routing, an SQL expression can be evaluated against the contents of a message to decide whether a subscriber really needs to receive it, which helps reduce the number of messages sent from the broker to the subscribers. With topic-based routing, a subscriber gets every message on the registered topic and may discard many of them based on content. Content-based routing thus helps subscribers receive mostly the messages they really need.

  Where the number of subscribers matching on a topic is high (in the hundreds or thousands), the resulting message rate and message volume may be beyond the capabilities of a single queue manager; this also depends on the publication rate. In this case, consider using a collective of brokers in order to distribute load. The subscribers can then be allocated across the members of the collective rather than all trying to use the same broker.

  Use of a broker collective also improves the availability of the publish/subscribe service. In the event of a failure of any one broker, subscribers would need to connect to another broker in the collective and re-subscribe. A publisher may also need to reconnect to another broker in the collective.






3. DATABASE BEST PRACTICES FROM IIB PERSPECTIVE 


  I/O operations and database operations are expensive. Wherever possible, minimize the number of such operations in the solution, and build a cache where appropriate. The decision is purely driven by the business scenario; excessive cache building is also not recommended.

  Tune the application heap size and the application control heap size. It is not possible to recommend a fixed value, because it depends on the business conditions and the solution implementation. To determine a value, issue the largest message transaction against the database (as per the business requirement) and monitor the heap size.

  Tune the buffer pool size if the application works with large objects such as BLOBs, CLOBs and VARCHARs (these are accessed using the memory area of the database).

  Ensure that locklist and maxlocks are large enough; otherwise, reduce the unit of work by issuing commit statements more often.

  Use indexes wherever possible to reduce the contention between message flow instances and applications.

  Where a message flow only reads data from a table, consider using a read only view of that table. This reduces the amount of locking within the database manager and reduces the processing cost of the read.

  If database operations are unavoidable then at least reduce them by:
Making the database local to the system where message broker resides.
Having high buffer sizes.
Using fast disks for data and logs.

  When using the SELECT statement, make the WHERE clause efficient, to minimize the amount of data retrieved from the database.
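
For example (hypothetical table and column names), filter and project in the database rather than pulling whole rows into the flow:

-- Retrieve only the needed columns, for only the matching customer
DECLARE custId CHARACTER InputRoot.XMLNSC.Order.CustomerId;
SET Environment.Variables.Customer[] =
    SELECT D.NAME, D.CREDIT_LIMIT
    FROM Database.CUSTOMERS AS D
    WHERE D.CUSTOMER_ID = custId;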

  When possible, use stored procedures, as they are already compiled and stored in the database. This increases the speed of data retrieval.

  When possible, avoid complex joins, as they are expensive in processing time.





4. LARGE FILE HANDLING BEST PRACTICES

  Manipulation of a large message tree can demand a great deal of storage. If you design a message flow that handles large messages made up of repeating structures, you can code specific ESQL statements that reduce the storage load on the broker. These statements cause the broker to perform limited parsing of the message, and to keep in storage, at any one time, only the part of the message tree that reflects a single record.

  Copy the body of the input message as a bit stream into a special folder in the output message. This creates a modifiable copy of the input message that is not parsed and therefore uses a minimal amount of memory.
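
A minimal sketch of that technique, assuming an XMLNSC input message: serialize the body once with ASBITSTREAM and attach it as a single XMLNSC.BitStream element, which the XMLNSC writer later emits verbatim without the tree ever being expanded:

-- Serialize the input body once, without expanding it in the output tree
DECLARE bodyBlob BLOB ASBITSTREAM(InputRoot.XMLNSC
                                  OPTIONS FolderBitStream
                                  ENCODING InputRoot.Properties.Encoding
                                  CCSID InputRoot.Properties.CodedCharSetId);

-- Attach it as a single unparsed element in the output tree
CREATE LASTCHILD OF OutputRoot DOMAIN('XMLNSC') NAME 'XMLNSC';
CREATE LASTCHILD OF OutputRoot.XMLNSC
    TYPE XMLNSC.BitStream
    NAME 'Body'
    VALUE bodyBlob;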

  Avoid any inspection of the input message; this avoids the need to parse the message.

You can refer to the following IBM Knowledge Center link for more information on manipulating a large message tree:
http://www-01.ibm.com/support/knowledgecenter/api/content/SSMKHH_9.0.0/com.ibm.etools.mft.doc/ac20702_.htm

5. ESQL CODING STANDARDS AND BEST PRACTICES

        - Below are best programming practices for ESQL development in message flows, including developing reusable code and writing optimized ESQL, purely from a performance-improvement perspective.
  Array subscripts [ ] are expensive in terms of performance, because each subscript is evaluated dynamically at run time. By avoiding the use of array subscripts wherever possible, you can improve the performance of your ESQL code. You can use reference variables instead, which maintain a pointer into the array and can then be reused; for example:
DECLARE myref REFERENCE TO InputRoot.XML.Invoice.Purchases.Item[1];
-- Continue processing for each item in the array
WHILE LASTMOVE(myref)=TRUE DO
   -- Add 1 to each item in the array
   SET myref = myref + 1;
   -- Do some processing
   -- Move the dynamic reference to the next item in the array
   MOVE myref NEXTSIBLING;
END WHILE;


  Avoid the use of CARDINALITY in a loop; for example:

WHILE ( I < CARDINALITY(InputRoot.MRM.A.B.C[]) ) DO

The CARDINALITY function must be evaluated each time the loop is traversed, which is costly in performance terms. This is particularly true with large arrays, because the loop is traversed more frequently. It is more efficient to determine the size of the array before the WHILE loop (unless it changes within the loop) so that it is evaluated only once; for example:

SET ARRAY_SIZE = CARDINALITY(InputRoot.MRM.A.B.C[]);
WHILE ( I < ARRAY_SIZE ) DO

  Reduce the number of DECLARE statements (and therefore the performance cost) by declaring a variable and setting its initial value within a single statement. Alternatively, you can declare multiple variables of the same data type within a single ESQL statement rather than in multiple statements. This technique also helps to reduce memory usage.
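
For example:

-- One statement declares and initializes; another declares three variables
-- of the same type in a single DECLARE
DECLARE itemCount INTEGER 0;
DECLARE firstName, lastName, city CHARACTER;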

  The EVAL statement is sometimes used when there is a requirement to determine correlation names dynamically. However, it is expensive in terms of CPU use, because the statement is effectively processed twice: the first pass determines the component parts and constructs the statement to be run; the constructed statement is then run.


  Avoid using the PASSTHRU statement with a CALL statement to invoke a stored procedure. As an alternative, use CREATE PROCEDURE ... EXTERNAL ... and CALL, as sketched below.
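
A sketch with a hypothetical procedure name: declare the database stored procedure once with EXTERNAL NAME, then invoke it directly.

-- Declared once at module or schema level; maps onto the database routine
CREATE PROCEDURE getCustomerCredit(IN custId CHARACTER, OUT credit DECIMAL)
    LANGUAGE DATABASE
    EXTERNAL NAME "MYSCHEMA.GET_CUSTOMER_CREDIT";

Within Main(), CALL getCustomerCredit(id, credit); then runs the procedure without the repeated statement preparation that PASSTHRU can incur.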

  When using the PASSTHRU statement, use host variables (parameter markers) for data values rather than coding literal values. This allows the dynamic SQL statement to be reused by the dynamic SQL statement processor within the database. An SQL PREPARE of a dynamic statement is an expensive operation in performance terms, so it is more efficient to run it only once and then EXECUTE the statement repeatedly, rather than to PREPARE and EXECUTE every time.
For example, the following statement contains two literal data values, 100 and IBM:
PASSTHRU('UPDATE SHAREPRICES AS SP SET Price = 100 WHERE SP.COMPANY = ''IBM''');

This statement is effective only while the price is 100 and the company is IBM. When either the price or the company changes, another statement is required, with another SQL PREPARE, which impacts performance.

However, by using the following statement, Price and Company can change without requiring another statement or another PREPARE:

PASSTHRU('UPDATE SHAREPRICES AS SP SET Price = ? WHERE SP.COMPANY = ?',
         InputRoot.XML.Message.Price, InputRoot.XML.Message.Company);


  Use reference variables to refer to long correlation names such as InputRoot.XMLNSC.A.B.C.D.E. Declare a reference pointer as shown in the following example:

DECLARE refPtr REFERENCE TO InputRoot.XMLNSC.A.B.C.D;

To access element E of the message tree, use the correlation name refPtr.E.

You can use REFERENCE and MOVE statements to help reduce the amount of navigation within the message tree, which improves performance. This technique can be useful when you are constructing a large number of SET or CREATE statements; rather than navigating to the same branch in the tree, you can use a REFERENCE variable to establish a pointer to the branch and then use the MOVE statement to process one field at a time.

  String manipulation functions used within ESQL can be CPU intensive; functions such as LENGTH, SUBSTRING, and RTRIM must access individual bytes in the message tree. These functions are expensive in performance terms, so minimizing their use can help to improve performance. Use the REPLACE function in preference to a complete re-parsing. Where possible, also avoid executing the same concatenations repeatedly, by storing intermediate results in variables.

  Avoid nested IF statements; instead, use ELSEIF or CASE WHEN clauses to get a quicker drop-out.
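
For example, with a hypothetical message type field:

DECLARE msgType CHARACTER InputRoot.XMLNSC.Envelope.Type;
DECLARE route CHARACTER;
-- A flat chain drops out at the first match instead of descending into
-- nested IF blocks
IF msgType = 'ORDER' THEN
    SET route = 'orders';
ELSEIF msgType = 'INVOICE' THEN
    SET route = 'invoices';
ELSE
    SET route = 'other';
END IF;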

  Use the FORMAT clause of the CAST function where possible to perform date and time formatting.
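
For example:

-- Parse and format date values directly with CAST ... FORMAT
DECLARE orderDate DATE CAST('2017-09-26' AS DATE FORMAT 'yyyy-MM-dd');
DECLARE printable CHARACTER CAST(orderDate AS CHARACTER FORMAT 'dd MMMM yyyy');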

  Performance is affected when the SET statement is used to create many fields, because navigating over all the fields that precede the specified field causes a loss of performance, as shown in the following example:

SET OutputRoot.XMLNSC.TestCase.StructureA.ParentA.field1 = '1';
SET OutputRoot.XMLNSC.TestCase.StructureA.ParentA.field2 = '2';
SET OutputRoot.XMLNSC.TestCase.StructureA.ParentA.field3 = '3';
SET OutputRoot.XMLNSC.TestCase.StructureA.ParentA.field4 = '4';

If you are accessing or creating consecutive fields or records, you can solve this problem by using reference variables, for example:

SET OutputRoot.XMLNSC.TestCase.StructureA.ParentA.field1 = '1';
DECLARE outRef REFERENCE TO OutputRoot.XMLNSC.TestCase.StructureA.ParentA;
SET outRef.field2 = '2';
SET outRef.field3 = '3';
SET outRef.field4 = '4';
SET outRef.field5 = '5';
When referencing repeating input message tree fields, you can use the following ESQL:
DECLARE myChar CHAR;
DECLARE inputRef REFERENCE TO InputRoot.MRM.myParent.myRepeatingRecord[1];
WHILE LASTMOVE(inputRef) DO
SET myChar = inputRef;
MOVE inputRef NEXTSIBLING NAME 'myRepeatingRecord';
END WHILE;

  Wherever possible, avoid using the LocalEnvironment tree; use the Environment tree to store information while the message flow processes the message. Only one copy of the Environment tree exists for the whole message flow instance, whereas the LocalEnvironment tree is copied at every node to which the message is propagated.
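
For example (hypothetical field names):

-- Stored once per flow instance; visible to every node without copying
SET Environment.Variables.OrderContext.CustomerId = InputRoot.XMLNSC.Order.CustomerId;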

  If multiple output messages must be generated from the same input message, use the PROPAGATE statement in ESQL. This lets the storage of the output message tree be reclaimed after every propagation, reducing the memory utilization of the message flow.
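
A sketch, assuming a hypothetical repeating Item structure: each iteration builds a small output tree, and PROPAGATE releases it again once the downstream nodes have run.

DECLARE item REFERENCE TO InputRoot.XMLNSC.Batch.Item[1];
WHILE LASTMOVE(item) DO
    SET OutputRoot.MQMD = InputRoot.MQMD;
    SET OutputRoot.XMLNSC.Order.Item = item;
    -- By default PROPAGATE clears OutputRoot afterwards, reclaiming storage
    PROPAGATE TO TERMINAL 'out';
    MOVE item NEXTSIBLING REPEAT TYPE NAME;
END WHILE;
-- Everything has already been propagated; suppress the automatic propagation
RETURN FALSE;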

  Use ROW and LIST constructors to create lists of fields. It is also good to initialize variables at declaration time. Wherever possible, reduce the number of ESQL statements: this improves performance and reduces the number of internal memory objects that are created and parsed.

  Limit the use of shared variables to a small number of entries (tens rather than hundreds or thousands) when using an array of ROW variables, or order the entries by probability of use. The current implementation is not indexed, so performance can degrade with higher numbers of entries.

  Code ESQL using the fewest lines possible. This helps to reduce memory and CPU usage at run time: in general, the fewer ESQL statements that are executed, the more efficient the processing.

  Message-processing throughput in ESQL is faster than in Java (Java is at least 10-20% slower than ESQL), because Java typically accesses message fields using XPath syntax, which slows down the message flow: an XPath expression essentially searches the whole XML document each time it is evaluated, whereas ESQL uses field references to navigate directly to the fields it needs.
