
Troubleshooting the Message Broker Profile

You can use the methods described below to troubleshoot and trace errors that occur with the EI Message Broker profile in a given environment.

Debugging

The following table provides descriptions of the important classes in the EI Message Broker that will be useful when you debug a session.

 

 

| Group | Class | Description |
| --- | --- | --- |
| Inbound | org.wso2.andes.kernel.disruptor.inbound.InboundEventManager | All inbound events (such as message arrival and subscription add/close events) are handled through this class. |
| Inbound | org.wso2.andes.kernel.disruptor.inbound.MessagePreProcessor | The incoming message goes through this processor first, where its message ID and destination data are populated so that the message order stays as close as possible to the message arrival time. |
| Inbound | org.wso2.andes.kernel.disruptor.inbound.ContentChunkHandler | This processor takes the message content chunks, converts them to the Andes core chunk size and delegates the rest of the work to the MessageWriter. |
| Inbound | org.wso2.andes.kernel.disruptor.inbound.MessageWriter | This processor writes the message metadata and content chunks to the storage database using a batch approach. |
| Inbound | org.wso2.andes.kernel.disruptor.inbound.StateEventHandler | Upon saving the message to storage, this handler is triggered to notify a message-received event, or to notify a message-acknowledged event from the consumer. |
| Inbound | org.wso2.andes.kernel.disruptor.inbound.InboundTransactionEvent | This event is used to communicate the transaction commit/rollback events from the publisher. |
| Outbound | org.wso2.andes.kernel.disruptor.delivery.DeliveryEventHandler | This processor is used to deliver the message to one/all of the active subscriptions (based on the message destination). |
| Outbound | org.wso2.andes.kernel.MessageFlusher | This class is used to hand over the message to the consumer after reading from the internal message buffer (readButUndeliveredMessages). |
| Outbound | org.wso2.andes.kernel.slot.SlotDeliveryWorker | There are multiple slot delivery workers managed by the SlotDeliveryWorkerManager. These read messages from the database after selecting a slot range from the coordinator. The messages are then pushed to the message flusher for delivery. |
| Outbound | org.wso2.andes.kernel.slot.SlotManagerClusterMode | This is where the coordinator logic resides within the broker. All slots are managed and distributed through this class across the cluster. |
| AMQP | org.wso2.andes.server.AMQChannel | A channel is used for delivering and accepting messages to/from the broker. Each AMQP consumer/publisher has its own unique channel with a channel ID. |
| AMQP | org.wso2.andes.amqp.QpidAndesBridge | This is used as the bridge between the Qpid messaging events and Andes events. |

 

Message tracing

This is a broker-specific logging implementation for tracing a message from its inbound event until it is delivered to the consumer application. The implementation has minimal impact on broker performance. To enable message tracing in the broker:

  1. Open the log4j.properties file stored in the <EI_HOME>/wso2/broker/conf folder.
  2. Uncomment the following line by removing the leading #:

    #log4j.logger.org.wso2.andes.tools.utils.MessageTracer=TRACE,CARBON_TRACE_LOGFILE

Once message tracing is enabled, you can start the server and execute a grep command with the relevant message ID you want to trace. This will print all the logs related to your message ID on your terminal.
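If you prefer to script the search instead of running grep by hand, the filtering step can be sketched in Python as shown below. This is only an illustration and not part of the product: the function name trace_lines_for_message and the sample log lines are hypothetical, and the real trace log layout may differ.

```python
import re
from typing import Iterable, List


def trace_lines_for_message(lines: Iterable[str], message_id: str) -> List[str]:
    """Return the log lines that mention the given message ID as a whole word."""
    # Word boundaries keep ID 1001 from also matching 10010 (similar to grep -w).
    pattern = re.compile(r"\b" + re.escape(message_id) + r"\b")
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


# Hypothetical sample lines; the real trace log layout may look different.
sample_log = [
    "TRACE {MessageTracer} - Message ID: 1001 : message received\n",
    "TRACE {MessageTracer} - Message ID: 1002 : message received\n",
    "TRACE {MessageTracer} - Message ID: 1001 : delivered to subscriber\n",
]

for line in trace_lines_for_message(sample_log, "1001"):
    print(line)
```

To search a real log, pass an open file object instead of the sample list. The file to open depends on how the CARBON_TRACE_LOGFILE appender is configured in your log4j.properties file.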

Heap dump and thread stack analysis

As with any other Java product, if the broker cluster fails due to resource exhaustion, the heap and thread dumps usually point you towards the cause of the exhaustion. Therefore, it is important to be able to retrieve heap and thread dumps from an environment at the point when an error occurs. This avoids having to reproduce the exact issue again (especially for production issues). Resource exhaustion can happen for two reasons:

  • A bug in the system.
  • An actual limitation of resources caused by low configuration values.

You can easily create a heap dump and thread dump using the CarbonDump tool that is shipped with your product. These will also provide information about the product version and any patch inconsistencies.

Using Wireshark to analyze protocol communication

Wireshark is a network traffic analysis tool with great filtering features. Given that the broker uses the AMQP and MQTT protocols (which are different from HTTP), Wireshark is a good way of capturing the network traffic and verifying that the packets arrive in the expected order with the correct data.

Detecting database anomalies

This section explains how you can identify errors by evaluating the condition of the database. Even though most of the database schema is self-explanatory, it is still good to know the special cases where the slot ranges are being stored and how the safe zone is being evaluated. The following diagram illustrates the slot-based message delivery algorithm:

Given that the coordinator is the decision maker on all operations, information on slots must also be maintained in a central location. Therefore, all the slot-related information in the database is stored mainly in the four tables shown below.

| Table | Description |
| --- | --- |
| MB_SLOT | Each slot, the assigned node ID and the current status are maintained here. |
| MB_SLOT_MESSAGE_ID | Whenever a node communicates a possible slot range to the coordinator, the node will decide on the appropriate message ID range to be included in the slot and update this table with the last endMessageID submitted by the node for a given queue/topic. |
| MB_NODE_TO_LAST_PUBLISHED_ID | This table contains the last published message ID for each node in order to calculate the global safe zone (minimum messageID from all nodes) that is required for deleting slots upon completion. |
| MB_QUEUE_TO_LAST_ASSIGNED_ID | Whenever a slot is given by the coordinator to a broker node for processing, its endMessageID is updated in this table against the destination name. |

With the above information, you can validate the following conditions in the database at any given time:

  1. There should not be any slots in the MB_SLOT table if the MB_METADATA table is empty. This is an eventual guarantee. Even if there are slots queued for deletion, this rule must still be satisfied after some time.

  2. There should be no records in the MB_METADATA table if the MB_CONTENT table is empty (one-to-one relationship).

  3. Given the minimum message ID in the MB_NODE_TO_LAST_PUBLISHED_ID table, all slots within the MB_SLOT table with the “assigned” status (state = 2) and an endMessageID less than the minimum published ID should be deleted (or at least be cleared after some time).
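The third validation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the broker's own logic: the dictionary keys (slot_id, state, end_message_id), the node names and the sample values are hypothetical stand-ins for rows you would read from the MB_SLOT and MB_NODE_TO_LAST_PUBLISHED_ID tables. Check the actual column names in your schema before adapting it.

```python
from typing import Dict, List

# "Assigned" is stored as state 2, as stated in validation 3 above.
ASSIGNED_STATE = 2


def safe_zone(last_published_ids: Dict[str, int]) -> int:
    """Global safe zone: the minimum last published message ID over all nodes."""
    # Raises ValueError if no node has published yet (empty mapping).
    return min(last_published_ids.values())


def stale_assigned_slots(slots: List[dict], last_published_ids: Dict[str, int]) -> List[dict]:
    """Assigned slots that end below the safe zone and should eventually be deleted."""
    limit = safe_zone(last_published_ids)
    return [
        slot for slot in slots
        if slot["state"] == ASSIGNED_STATE and slot["end_message_id"] < limit
    ]


# Hypothetical rows, as if read from MB_NODE_TO_LAST_PUBLISHED_ID and MB_SLOT.
last_published = {"node-1": 5000, "node-2": 4200}
slots = [
    {"slot_id": "MyQueue|1000-1999", "state": 2, "end_message_id": 1999},
    {"slot_id": "MyQueue|4000-4999", "state": 2, "end_message_id": 4999},
    {"slot_id": "MyQueue|2000-2999", "state": 1, "end_message_id": 2999},  # some other state
]

for slot in stale_assigned_slots(slots, last_published):
    print("Should be deleted eventually:", slot["slot_id"])
```

Because the rule is an eventual guarantee, a slot reported by a single run is not necessarily an error. A slot that keeps appearing across repeated checks is the one worth investigating.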

Retrieving logs from the JMS client

You can simply monitor the logs from the JMS clients connecting to the broker by enabling the following startup property on the clients:

-Damqj.protocol.logging.level=true

Monitoring Java metrics

The metrics dashboard of the broker provides general JVM metrics as well as broker-specific metrics that help you see how the broker behaves under both heavy and light load. It shows information such as unexpected increases in delivery channels and the latencies of database reads/writes, which helps you identify possible errors in the system. See the documentation on metrics for instructions on how to configure and use the metrics dashboard.

Identifying common warnings/logs

The following entries detail some of the most common warning messages/logs that can be encountered when working with EI Message Broker, along with the possible causes and solutions for each.

Warning/log: [WARN] Invalid message state transition from <state1> suggested: <state2>  Message ID: 93293291982
Cause: This means that the message lifecycle has deviated from the expected execution path. Example: getting SENT -> SCHEDULED_TO_SEND.
Solution/approach: First, distinguish the expected execution order using the org.wso2.andes.kernel.MessageStatus class and then try to identify the exact point at which the issue occurred. There is a high probability of race conditions.

Warning/log: [WARN] Invalid State transition from <stateA> suggested : <stateB> Slot ID : MyQueue|22309482...
Cause: This means that the slot lifecycle (used for delivery) has deviated from the expected execution path. Example: CREATED -> DELETED.
Solution/approach: First, distinguish the expected execution order using the org.wso2.andes.kernel.SlotState class and then try to identify the exact point at which the issue occurred. There is a high probability of race conditions.

Warning/log: [WARN] Error when trying to read property <property>. Switching to default value : <defaultValue>
Cause: This can happen if the broker's configuration file (broker.xml) is not up to date, or if it has been edited.
Solution/approach: Be sure that the correct version of the broker.xml file is within the pack.

Warning/log: [INFO] Local subscription ADDED [TestQueue]ID=635@NODE/10.100.5.115:4000/T=1456518837302/D=true/X=false/O=null/E=amq.direct/ET=org.wso2.andes.server.exchange.DirectExchange$1@2db707df/EUD=0/S=true {org.wso2.andes.subscription.SubscriptionStore}
Cause: This log is printed whenever a new queue/topic subscription is added to the cluster. The lifecycle of a subscription is ADDED -> DELETED. In the case of durable topic subscriptions, a subscription may be disconnected before being deleted. If a subscription is disconnected, the messages will still persist until the subscriber is deleted. Note that this log is not printed if a durable subscription is disconnected and added for the second time after the cluster starts.
Solution/approach: This is an INFO-level log.

Warning/log: Channel created (ID: 21765) {org.wso2.andes.kernel.AndesChannel}
Cause: This log is printed every time a publish/subscribe channel is established between the client and the broker node. An Andes channel is mapped (one-to-one) to an AMQChannel or an MQTTPublisherChannel.
Solution/approach: This is an INFO-level log.
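When many invalid message state transition warnings appear, it can help to pull the states and message IDs out of the log so that each ID can then be followed with message tracing. The Python sketch below is only an illustration: the pattern is modelled on the sample warning shown above, the function name parse_message_transition is hypothetical, and the exact log layout in your environment may differ.

```python
import re
from typing import Optional, Tuple

# Pattern modelled on the sample warning above; whitespace is matched loosely.
MESSAGE_TRANSITION = re.compile(
    r"Invalid message state transition from\s+(\S+)"
    r"\s+suggested\s*:\s*(\S+)"
    r"\s+Message ID\s*:\s*(\d+)"
)


def parse_message_transition(line: str) -> Optional[Tuple[str, str, int]]:
    """Return (from_state, suggested_state, message_id), or None if the line does not match."""
    match = MESSAGE_TRANSITION.search(line)
    if match is None:
        return None
    return match.group(1), match.group(2), int(match.group(3))


# Hypothetical line built from the example transition SENT -> SCHEDULED_TO_SEND.
line = ("[WARN] Invalid message state transition from SENT "
        "suggested: SCHEDULED_TO_SEND  Message ID: 93293291982")
print(parse_message_transition(line))
```

Lines that do not match the pattern return None, so the function can be applied to every line of a log file and the non-matching lines simply skipped.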