Troubleshooting the Message Broker Profile
Use the methods below to troubleshoot and trace errors that occur with the EI Message Broker profile in a given environment.
Debugging
The following table provides descriptions of the important classes in the EI Message Broker that will be useful when you debug a session.
Class | Description | |
---|---|---|
Inbound |
| All inbound events (such as message arrival, subscription add/close events etc.) are handled through this class. |
| The incoming message goes through this processor first, where its message ID and destination data are populated to ensure the message order closest to the message arrival time. | |
| This processor will take the message content chunks, convert them to the andes core chunk size and delegate the rest of the work to the | |
| This processor will write the message metadata and content chunks to the storage database using a batch approach. | |
| Upon saving the message to storage, this handler is triggered to notify a message received event, or to notify a message acknowledged event from the consumer. | |
| This event is used to communicate the transaction commit/rollback events from the publisher. | |
Outbound |
| This processor is used to deliver the message to one/all of the active subscriptions (based on the message destination). |
| This class is used to handover the message to the consumer after reading from the internal message buffer ( | |
| There are multiple slot delivery workers managed by the | |
| This is where the coordinator logic resides within the broker. All slots are managed and distributed through this class across the cluster. | |
AMQP |
| A channel is used for delivering and accepting messages to/from the broker. Each AMQP consumer/publisher has its own unique channel with a channel ID. |
| This is used as the bridge between the Qpid messaging events and Andes events. |
Message tracing
This is a broker-specific logging implementation for tracing a message through its inbound event until it is delivered to the consumer application. This implementation has minimal impact on the performance of the broker functionality. To enable message tracing in the broker:
- Open the log4j.properties file stored in the <EI_HOME>/wso2/broker/conf folder and uncomment the following line:
#log4j.logger.org.wso2.andes.tools.utils.MessageTracer=TRACE,CARBON_TRACE_LOGFILE
Once message tracing is enabled, you can start the server and execute a grep command with the relevant message ID you want to trace. This will print all the logs related to your message ID on your terminal.
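Where grep is not available, or the matching lines need further processing, the same filtering can be scripted. The following is a minimal Python sketch, not part of the product: the function names are hypothetical, and it assumes only that the message ID appears in each relevant trace line as a complete number. The location and exact layout of the trace log file are not specified here, so the path is left to the caller.

```python
import re
from typing import Iterable, List


def trace_lines(lines: Iterable[str], message_id: str) -> List[str]:
    """Return the lines that mention message_id as a complete number."""
    # Reject matches that sit inside a longer run of digits, so that
    # searching for 123 does not also return lines about 1234.
    pattern = re.compile(r"(?<!\d)" + re.escape(message_id) + r"(?!\d)")
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


def trace_file(path: str, message_id: str) -> List[str]:
    """Apply trace_lines to a log file on disk (path supplied by the caller)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return trace_lines(handle, message_id)
```

Matching on digit boundaries is a deliberate choice: a plain substring search, like an unanchored grep, would also return lines for any longer ID that happens to contain the one you are tracing.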
Heap dump and thread stack analysis
As with any other Java product, if the broker cluster fails due to resource exhaustion, the heap and thread dumps usually point you towards the cause. It is therefore important to be able to retrieve heap and thread dumps from an environment at the point when an error occurs, which avoids having to reproduce the exact issue (especially for production issues). Resource exhaustion can happen for two reasons:
- A bug in the system.
- An actual limitation of resources caused by low configuration values.
You can easily create a heap dump and thread dump using the CarbonDump tool that is shipped with your product. These will also provide information about the product version and any patch inconsistencies.
Using Wireshark to analyze protocol communication
Wireshark is a network traffic analysis tool with strong filtering features. Given that the broker uses the AMQP and MQTT protocols (which are different from HTTP), Wireshark is a good way of capturing the network traffic and verifying that the packets are exchanged in the expected order with the correct data.
Detecting database anomalies
This section explains how you can identify errors by evaluating the condition of the database. Even though most of the database schema is self-explanatory, it is still good to know the special cases where the slot ranges are being stored and how the safe zone is being evaluated. The following diagram illustrates the slot-based message delivery algorithm:
Given that the coordinator is the decision maker for all operations, information on slots must also be maintained in a central location. Therefore, all the slot-related information in the database is stored mainly in four tables, as shown below.
Table | Description |
---|---|
| Each slot, the assigned node ID and the current status are maintained here. |
| Whenever a node communicates a possible slot range to the coordinator, the node will decide on the appropriate message ID range to be included in the slot and update this table with the last |
| This table contains the last published message ID for each node in order to calculate the global safe zone (minimum messageID from all nodes) that is required for deleting slots upon completion. |
| Whenever a slot is given by the coordinator to a broker node for processing, its |
With the above information, you can perform the following validations on the database at any given time:
- There should not be any slots in the MB_SLOT table if the MB_METADATA table is empty. This is an eventual guarantee. Even if there are slots queued for deletion, this rule must still be satisfied after some time.
- There should be no records in the MB_METADATA table if the MB_CONTENT table is empty (one-to-one relationship).
- Given the minimum message ID in the MB_NODE_TO_LAST_PUBLISHED_ID table, all slots within the MB_SLOT table with the “assigned” status (state = 2) and the endMessageID less than the minimum published ID should be deleted (or at least be cleared after some time).
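The three validations can be expressed as a small check over data that you have already read from the database. The following Python sketch is illustrative only: the Slot and Report types, the field names, and the find_anomalies function are hypothetical and do not reflect the actual column names of the broker schema. The only values taken from the text above are the "assigned" state value (2) and the rule that the safe zone is the minimum last published message ID across all nodes.

```python
from typing import Dict, List, NamedTuple

ASSIGNED_STATE = 2  # "assigned" status value, as stated in the validation list


class Slot(NamedTuple):
    """Hypothetical view of one MB_SLOT row; field names are illustrative."""
    slot_id: str
    state: int
    end_message_id: int


class Report(NamedTuple):
    slots_without_metadata: bool
    metadata_without_content: bool
    stale_assigned_slots: List[str]


def find_anomalies(
    slots: List[Slot],
    metadata_count: int,
    content_count: int,
    last_published_ids: Dict[str, int],
) -> Report:
    """Evaluate the three database validations against a snapshot."""
    # Rule 1: MB_SLOT should be empty when MB_METADATA is empty.
    slots_without_metadata = metadata_count == 0 and len(slots) > 0

    # Rule 2: MB_METADATA should be empty when MB_CONTENT is empty.
    metadata_without_content = content_count == 0 and metadata_count > 0

    # Rule 3: assigned slots that end below the global safe zone
    # (minimum last published ID over all nodes) should have been deleted.
    stale: List[str] = []
    if last_published_ids:
        safe_zone = min(last_published_ids.values())
        stale = [
            slot.slot_id
            for slot in slots
            if slot.state == ASSIGNED_STATE and slot.end_message_id < safe_zone
        ]

    return Report(slots_without_metadata, metadata_without_content, stale)
```

Because the first and third rules are eventual guarantees, a single positive result is not necessarily an error. Take two snapshots some time apart and treat a finding as an anomaly only if it persists.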
Retrieving logs from the JMS client
You can simply monitor the logs from the JMS clients connecting to the broker by enabling the following startup property on the clients:
-Damqj.protocol.logging.level=true
Monitoring Java metrics
The metrics dashboard of the broker provides general JVM metrics as well as broker-specific metrics to help you identify how the broker is running in a loaded/relaxed environment. This functionality will give you information such as the unexpected increases of delivery channels, latencies of database reads/writes etc., which will help you identify possible errors in the system. See the documentation on metrics for instructions on how to configure and use the metrics dashboard.
Identifying common warnings/logs
The following table details some of the most common warning messages/logs that can be encountered when working with EI Message Broker, along with the possible causes and solutions for each.
Warning | Cause | Solutions / Approach |
---|---|---|
[WARN] Invalid message state transition from <state1> | This means that the message lifecycle has deviated from the expected execution path. Example: getting SENT -> SCHEDULED_TO_SEND. | First, distinguish the expected execution order using the “org.wso2.andes.kernel.MessageStatus” class and then try to identify the exact point at which the issue occurred. There is a high probability of race conditions. |
[WARN] Invalid State transition from <stateA> suggested : <stateB> Slot ID : MyQueue|22309482... | This means that the slot lifecycle (used for delivery) has deviated from the expected execution path. Example: CREATED -> DELETED. | First, distinguish the expected execution order using the “org.wso2.andes.kernel.SlotState” class and then try to identify the exact point at which the issue occurred. There is a high probability of race conditions. |
[WARN] Error when trying to read property <property>. Switching to default value : <defaultValue> | This can happen if the broker's configuration file ( | Be sure that the correct version of the |
[INFO] Local subscription ADDED [TestQueue]ID=635@NODE/10.100.5.115:4000/T=1456518837302/D=true/X=false/O=null/E= | This log is printed whenever a new queue/topic subscription is added to the cluster. The lifecycle of a subscription is ADDED -> DELETED. In the case of durable topic subscriptions, a subscription may be disconnected before being deleted. If a subscription is disconnected, the messages will still persist until the subscriber is deleted. Note that this log is not printed if a durable subscription is disconnected and added for the second time after the cluster starts. | Info level log. |
Channel created (ID: 21765) {org.wso2.andes.kernel.AndesChannel} | This log is printed every time a publish/subscribe channel is established between the client and the broker node. An Andes channel is mapped (one-to-one) to an | Info level log. |
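To see how often each of these messages occurs in a log file, the lines can be counted by the fixed text fragment each message contains. The following Python sketch is illustrative only: the label names and the count_known_messages function are hypothetical, and the fragments are copied from the table above rather than from the broker source code.

```python
from collections import Counter
from typing import Iterable

# Fixed text fragments taken from the table above; the labels are illustrative.
KNOWN_MESSAGES = {
    "message_state_transition": "Invalid message state transition from",
    "slot_state_transition": "Invalid State transition from",
    "property_default": "Error when trying to read property",
    "subscription_added": "Local subscription ADDED",
    "channel_created": "Channel created (ID:",
}


def count_known_messages(lines: Iterable[str]) -> Counter:
    """Count log lines per known message type (case-sensitive match)."""
    counts: Counter = Counter()
    for line in lines:
        for label, fragment in KNOWN_MESSAGES.items():
            if fragment in line:
                counts[label] += 1
                break  # a line is attributed to one message type at most
    return counts
```

A steadily rising count of state transition warnings, or an unexpected jump in created channels, is a reasonable cue to start the deeper analysis described in the sections above.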