Troubleshooting AR Terminations


Post  giby.varghese@gmail.com on Sun Nov 15, 2009 6:46 pm

AR System server terminated when a signal/exception was received by the server (ARNOTE 20)

How do you troubleshoot this problem?


Re: Troubleshooting AR Terminations

Post  giby.varghese@gmail.com on Sun Nov 15, 2009 6:46 pm

Overview



This error indicates that a thread in AR Server crashed. The log entry usually contains the RPC program number right before the error text and after the time stamp. If the number shown is 390600, it implies that AR Server itself crashed and must be manually restarted. Sometimes the error occurs on just a single thread, such as a Fast, List, or Private thread. The RPC program numbers are 390620 (Fast) and 390635 (List); private threads use the RPC program numbers shown in the Admin Tool under File->Server Information, on the Server Ports and Queues tab.
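As a rough illustration of reading the RPC program number out of an error line, here is a minimal Python sketch. The exact line layout (time stamp, then RPC number, then the message) is an assumption based on the description above, the sample line is made up, and `classify_termination` is a hypothetical helper name, not part of any AR System tool.

```python
import re

# RPC program numbers named in the post above; anything else is treated
# as a private thread (an assumption made for this sketch).
KNOWN_QUEUES = {
    390600: "AR Server itself (manual restart required)",
    390620: "Fast",
    390635: "List",
}

# Assumed line shape: "<time stamp>  <rpc number> : <message> (ARNOTE 20)".
TERMINATION = re.compile(r"\b(390\d{3})\b.*\(ARNOTE 20\)")


def classify_termination(line):
    """Return (rpc_number, queue_name) for an ARNOTE 20 line, else None."""
    match = TERMINATION.search(line)
    if match is None:
        return None
    rpc = int(match.group(1))
    return rpc, KNOWN_QUEUES.get(rpc, "Private (see Server Ports and Queues tab)")


# Made-up sample line, for illustration only.
sample = ("Sun Nov 15 18:46:02 2009  390620 : AR System server terminated when "
          "a signal/exception was received by the server (ARNOTE 20)")
print(classify_termination(sample))
```

If the real log layout differs, only the regular expression should need adjusting.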



A thread crash is usually a serious problem. Sometimes the impact on the user community is minimal; sometimes it means that AR Server is unusable.



Threads can crash for the following reasons:

- During code execution, the server encounters a condition that the code does not handle, and the thread terminates.

- Something kills the thread.

- A resource (database, memory, etc.) is unavailable.

- An error condition exposes a case that the code does not handle, and the thread terminates.



The armonitor.exe process is designed to automatically restart threads if they die. However, depending on the nature of the crash, it may not be possible to restart the threads, in which case restarting AR Server is the only other option.



Troubleshooting Overview

There are basically three options for troubleshooting termination errors:

1. Apply the latest patch to AR Server.
2. Enable logging and try to capture the termination to find out what AR Server is doing at the time of the crash.
3. Implement a special debug version of AR Server so that when the termination occurs, a core dump file is produced that can be analyzed to determine root cause.

One could argue that a fourth option is to upgrade to a higher version of AR Server, but this would be a last resort, and it would also imply that the current AR Server is an unsupported version or an FCS release.



The options that are pursued depend on the specific case and environment. Sometimes terminations occur randomly, which makes them very challenging to capture with logging enabled. Sometimes change control protocols (or other reasons) make applying a patch difficult in a production environment, or the server is already on the latest patch.







Details Regarding Option 1

From Support's perspective, this is the best option to try first. It is the easiest and quickest option, and it usually has the highest chance of success (although this depends on the current patch level and on the bugs fixed between the current patch and the latest patch). The drawback is that it is a change. Change control policies vary and are sometimes cumbersome to deal with in customer environments. It is also possible that the change could introduce other problems.

Another compelling reason to try the latest patch first is that the debug server used in option #3 is compiled at the latest patch level. If the latest patch has not been tried, then applying a debug server may resolve the problem, but it would be unknown why. Therefore, from an efficiency standpoint, Support always recommends options #1 and #2 before going the route of a debug server.



Details Regarding Option 2

This is a status-quo option. The idea is to figure out what AR Server is doing at the exact moment the termination occurs. The best-case scenario is that we find another error condition, such as a failing filter Push Fields action or a database error. Error conditions can trigger terminations, and fixing the error condition then resolves the termination. However, if the logs do not expose any error condition (besides the termination itself), then we are mainly just doing general troubleshooting. General troubleshooting involves looking for patterns in the problem (does the termination occur when a certain API call fires, when a certain table is accessed, when certain workflow fires, etc.). This helps to further define the problem, which can speed up resolution, but it also usually implies that a debug server will be required to determine root cause.

The specific logs to enable are the API, Escalation, Filter, SQL and Thread logs on the server (via the Admin Tool). If plugins are suspected of being involved at all, enable the Plugin log too. Client-side logs, especially Active Link logging, would also be requested; however, terminations are usually not reproducible, and if they are not reproducible, it is nearly impossible to get any kind of client logging. Note: The arerror log, which is always enabled, is also required.
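The pattern search described above can start from the arerror log alone. The following Python sketch tallies terminations by hour of day; the time stamp layout at the start of each line is an assumption, the sample lines are made up, and `terminations_by_hour` is a hypothetical helper name.

```python
from collections import Counter
from datetime import datetime

# Assumed time stamp layout at the start of each line,
# e.g. "Sun Nov 15 18:46:02 2009" (24 characters).
STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"
STAMP_LENGTH = 24


def terminations_by_hour(lines):
    """Count ARNOTE 20 lines per hour of day; skip lines that do not parse."""
    counts = Counter()
    for line in lines:
        if "(ARNOTE 20)" not in line:
            continue
        try:
            stamp = datetime.strptime(line[:STAMP_LENGTH], STAMP_FORMAT)
        except ValueError:
            continue  # time stamp not in the assumed layout
        counts[stamp.hour] += 1
    return counts


# Made-up sample lines, for illustration only.
sample_lines = [
    "Sun Nov 15 18:46:02 2009  390620 : AR System server terminated (ARNOTE 20)",
    "Sun Nov 15 18:59:30 2009  390635 : AR System server terminated (ARNOTE 20)",
    "Mon Nov 16 02:10:45 2009  390620 : AR System server terminated (ARNOTE 20)",
    "Mon Nov 16 02:11:00 2009  390620 : some unrelated message",
]
hourly = terminations_by_hour(sample_lines)
print(sorted(hourly.items()))
```

A cluster in one hour may point at a scheduled escalation or batch job, which narrows down which of the server logs to read first.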



Details Regarding Option 3

This option is only taken if option #1 has already been done. Depending on the nature of the problem and the specific environment, it is sometimes necessary to forgo option #2 and go straight to a debug server.

Debug servers are applied just like a patch, only using the file replacement method (they are never compiled into an installer). There is a separate document on the procedure that is passed out with the debug server. This document explains how to use a Microsoft debugger called ADPlus which is a separately running process that attaches itself to the arserver.exe process.
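For orientation only, this Python sketch assembles the kind of ADPlus command line that attaches to a running process in crash mode. The procedure document supplied with the debug server is the authority; the output directory and the `build_adplus_command` helper are hypothetical, and the sketch assumes the script-based `adplus.vbs` run through `cscript`.

```python
def build_adplus_command(process_name="arserver.exe", output_dir=r"C:\dumps"):
    """Build an ADPlus crash-mode command line as an argument list.

    -crash : wait for the process to crash, then write a dump
    -pn    : attach by process name
    -o     : directory for the dump files (hypothetical path here)
    """
    return ["cscript", "adplus.vbs", "-crash", "-pn", process_name, "-o", output_dir]


# Print the command instead of running it, so nothing is attached by accident.
command = build_adplus_command()
print(" ".join(command))
```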



When the termination occurs, the debug server causes a core dump file to be produced. The core dump contains information from memory about the termination. It is analyzed by an engineer and usually leads to the root cause of the problem, or at least to a deeper, code-level understanding of what is causing the termination (terminations can sometimes occur because of a bug in a database client, the OS, or the network layer).
