Troubleshooting the ASP.NET System.OutOfMemoryException with DebugDiag v1.1

 

 

 

 

Outline

 

·        Diagnosis

·        Preliminary Steps (before the OOM exception occurs)

1.      Debug=false Check

2.      Install DebugDiag 1.1

3.      Replace SOS.dll

4.      Configure a Crash Rule

5.      Configure a Perfmon Capture

·        Wait

·        Reactive Steps (after waiting for the OOM condition to be reached)

1.      Check User Dump Count

2.      Stop Perfmon Capture

3.      Recover the Server

4.      Data Collection

 

·        Other Considerations

1.      Leak Tracking

2.      Isolation into Application Pools

 

 

 

 

 

 

Diagnosis:

 

Use this page when ASP.NET reports a 'System.OutOfMemoryException' in a client browser, in the event logs, or in a memory dump.

 

Unlike ASP.NET 1.1, ASP.NET 2.0 and 3.5 are very good about recording their exceptions in the System event log and Application event log.  The steps on this page are geared toward troubleshooting events like these:

 

ASP.NET 2.0.50727.0      

Event ID 1334      

"An unhandled exception occurred and the process was terminated.

Exception: System.OutOfMemoryException

Message: Exception of type 'System.OutOfMemoryException' was thrown.

 

ASP.NET 2.0.50727.0          

Event ID 1309      

Event code: 3005 Event message: An unhandled exception has occurred.

Process name: w3wp.exe    

Exception type: OutOfMemoryException    

Exception message: Exception of type 'System.OutOfMemoryException' was thrown. 

 

.NET Runtime 2.0 Error Reporting     

Event ID 5000      

EventType clr20r3, P1 w3wp.exe, P2 6.0.3790.3959, P3 45d6968e, P4 system, P5 2.0.0.0, P6 4889de7a, P7 2dc7, P8 16, P9 system.outofmemoryexception, P10 NIL.
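
When sifting through many Event ID 5000 entries, it can help to split the description into its fields and test for the OOM bucket. The following is only a minimal sketch: the function names are hypothetical, and it assumes the exception type sits in the P9 field, as it does in the sample event above.

```python
def parse_event_fields(description):
    """Split an Event ID 5000 description into its EventType and P1..P10 fields."""
    fields = {}
    # Drop the trailing period, then split the comma-separated "name value" pairs.
    for part in description.rstrip(". ").split(","):
        key, _, value = part.strip().partition(" ")
        if key:
            fields[key] = value.strip()
    return fields


def is_oom_event(description):
    """True when the P9 field names System.OutOfMemoryException (assumed position)."""
    fields = parse_event_fields(description)
    return fields.get("P9", "").lower() == "system.outofmemoryexception"


# Sample description copied from the Event ID 5000 entry above.
SAMPLE = ("EventType clr20r3, P1 w3wp.exe, P2 6.0.3790.3959, P3 45d6968e, "
          "P4 system, P5 2.0.0.0, P6 4889de7a, P7 2dc7, P8 16, "
          "P9 system.outofmemoryexception, P10 NIL.")
```

Calling is_oom_event(SAMPLE) returns True for the sample; descriptions whose P9 field names another exception type return False.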

 

 

 

 

 

Preliminary steps:

 

1.      Debug=false Check

 

Ensure every web.config file has the debug attribute of its <compilation> element set to false rather than true.  Probably the most common cause of OOM conditions is leaving debug set to true on a high-traffic web server.
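
On a server with many sites, checking every web.config by hand gets tedious. Here is a minimal sketch that walks a folder tree and lists the web.config files whose <compilation> element still has debug set to true. The function name is hypothetical, and the script assumes the files are well-formed XML (malformed ones are skipped, so check those by hand).

```python
import os
import xml.etree.ElementTree as ET


def find_debug_true(root_dir):
    """Return paths of web.config files whose <compilation> element has debug="true"."""
    hits = []
    for folder, _dirs, files in os.walk(root_dir):
        for name in files:
            if name.lower() != "web.config":
                continue
            path = os.path.join(folder, name)
            try:
                tree = ET.parse(path)
            except ET.ParseError:
                continue  # malformed XML: skip rather than guess
            for element in tree.iter():
                # Strip any XML namespace so "{ns}compilation" still matches.
                tag = element.tag.rsplit("}", 1)[-1]
                debug = element.get("debug", "").strip().lower()
                if tag == "compilation" and debug == "true":
                    hits.append(path)
                    break
    return hits
```

Point find_debug_true at the folder that holds your sites and review each path it returns. An empty list means no file under that folder has debug turned on.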

 

2.      Install DebugDiag v 1.1

 

Install DebugDiag v1.1 (or higher) by browsing to http://microsoft.com/downloads and searching All Downloads for DEBUGDIAG.  (These instructions do not work with DebugDiag v1.0; it must be v1.1 or higher.)

 

3.      Replace SOS.dll

 

If it takes only a few minutes to reach the OOM condition, this step is not needed.  If, however, it takes several hours or a few days to reach the OOM condition, I highly recommend replacing the SOS.dll on the server with an improved version.  This prevents a possible memory leak in DebugDiag 1.1 that would ultimately kill both the DebugDiag process and the IIS process it is attached to.

 

To replace the SOS.dll  (for use with a 32-bit iis process and Debugdiag 1.1 x86)

A.      If applicable, deactivate or remove any debugdiag rules you may have already created.

B.      Download the new sos.dll from http://viisual.net/tools/sosdll/SOS.dll and save it to the server(s) that you need to monitor CLR exceptions on.

C.      On the affected server(s), open Windows Explorer and navigate to C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727. Rename the sos.dll in that folder to sos.dll.old and copy the new sos.dll into this folder.

D.     Also in Windows Explorer, navigate to C:\WINDOWS\Microsoft.NET\Framework\v1.1.4322. Rename the sos.dll in that folder to sos.old and copy the new sos.dll into this folder.

E.      Navigate to C:\Program Files\DebugDiag\Exts (or possibly C:\Program Files (x86)\DebugDiag\Exts) and copy the new sos.dll into this folder.  For good measure, also copy the new sos.dll into the C:\Program Files\DebugDiag folder. 

F.       In the services console (start > Run > open: services.msc) please stop and restart the Debug Diagnostic Service.

G.     Create the crash rule using the steps below.

 

To replace the SOS.dll  (for use with a 64-bit iis process and Debugdiag 1.1 64-bit Beta)

A.      If applicable, deactivate or remove any debugdiag rules you may have already created.

B.      Download the new sos.dll from http://viisual.net/tools/sosdll/AMD64/SOS.dll and save it to the server(s) that you need to monitor CLR exceptions on.

C.      On the affected server(s), open Windows Explorer and navigate to C:\windows\Microsoft.NET\Framework64\v2.0.50727. Rename the sos.dll in that folder to sos.dll.old and copy the new sos.dll into this folder.

D.     Also in Windows Explorer, navigate to C:\Program Files\DebugDiag\Exts and copy the new sos.dll into this folder.  For good measure, also copy the new sos.dll into the C:\Program Files\DebugDiag folder. 

E.      In the services console (start > Run > open: services.msc) please stop and restart the Debug Diagnostic Service.

F.       Create the crash rule using the steps below.
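
If several servers need the same swap, the rename-and-copy part of the steps above can be scripted. This is only a sketch of the file operations: the function name is hypothetical, it does not stop or restart the Debug Diagnostic Service, and it assumes you run it with enough rights to write to the target folder.

```python
import os
import shutil


def swap_in_sos(new_dll, target_dir, backup_suffix=".old"):
    """Rename the existing sos.dll in target_dir, then copy new_dll in its place."""
    target = os.path.join(target_dir, "sos.dll")
    backup = target + backup_suffix
    if os.path.exists(target):
        # Refuse to overwrite an earlier backup of the original file.
        if os.path.exists(backup):
            raise FileExistsError("backup already exists: " + backup)
        os.rename(target, backup)
    shutil.copy2(new_dll, target)
    return target
```

Call it once per folder named in the steps, passing the path of the downloaded SOS.dll and the folder to update. The original file is kept next to the new one as sos.dll.old, so the swap can be undone by hand.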

 

 

4.      Configure a Crash Rule

 

The following steps configure a DebugDiag crash rule so that a memory dump is triggered automatically at the moment the System.OutOfMemoryException is raised.

 

Launch Debugdiag from the Programs Menu

 

Select “Crash” as the rule type

 

 

Choose either “All IIS/Com+ related processes”

          Or

“A specific process”

          Or

“A Specific IIS Web Application Pool”

         (If unsure, go with “All IIS/Com+ related processes.”)

 

 

Select the Exceptions button

 

 

Select the Add Exception button

 

 

 

Highlight the Exception Code E0434F4D in the left-side pane. 

 

 

 

 

 

In the .NET Exception Type field, carefully type (or cut-and-paste):

 

System.OutOfMemoryException

 

The match must be exact: if you mistype the name, get the capitalization wrong, or leave a space at the beginning or end, the dumps will not be produced.

 

Set Action Type to: Full Userdump

 

Set Action Limit to:  1, 2, or 3

(We probably need only 1 dump.)

 

Click the Save and Close button, then click Next twice.  Activate the rule whenever you wish to begin monitoring the IIS processes for the OOM condition.

 

As long as the rule is active, debugdiag is monitoring the IIS process(es) and waiting for the next System.OutOfMemoryException to be thrown.
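
Because a mistyped .NET Exception Type silently produces no dumps, it may be worth checking the string before pasting it into the rule. The sketch below is a hypothetical helper, not a DebugDiag feature; it only compares the text you intend to enter against the exact name given in the steps above.

```python
EXPECTED = "System.OutOfMemoryException"


def check_exception_type(entered):
    """Return a list of problems with the entered name; an empty list means an exact match."""
    problems = []
    if entered != entered.strip():
        problems.append("leading or trailing whitespace")
    core = entered.strip()
    if core != EXPECTED:
        if core.lower() == EXPECTED.lower():
            problems.append("wrong capitalization")
        else:
            problems.append("misspelled")
    return problems
```

For example, check_exception_type("system.outofmemoryexception") reports wrong capitalization, while the exact name returns an empty list.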

 

 

5.      Configure a PERFMON Capture

 

Before the problem occurs, please set up perfmon captures on the web server(s).  This can be done locally (a server’s own perfmon monitoring the same server) or remotely (one server’s perfmon monitoring another server).  The following steps may not line up perfectly with every version of Windows (2000, XP, 2003, Vista, 2008, and Windows 7), but they should be adequate for getting across the idea of how to set it up.

 

Steps:

 

Click the Windows Start Button > Run > Open: Perfmon [Enter]

Expand “Performance Logs and Alerts”

Right Click on “Counter Logs”

Choose “New Log Settings…”

 


 

Enter a descriptive name (such as “OOM”)

Note the log file location for later (or go to the “Log Files” tab and change the location)

Click the “Add” button

Click the “All Counters” and “All Instances” radio buttons

Select the following from the “Performance Object” dropdown, being sure to “Add” each one as you select it:

   

·        Add every Object that begins with “.NET”  (such as, .NET CLR Data, .NET CLR Exceptions,  .NET CLR Interop, etc.)

·        Add every Object that begins with ASP.NET (such as ASP.NET, ASP.NET Applications, etc.)

·        Memory

·        Process

·        Processor

·        Thread

·        Web Service

 


 

 

Click “Close”

Click “OK”

 

Note:  you may have to choose “ADD OBJECTS” instead.  When the object is added, presumably all counters for that object should be included.  My list of steps for perfmon may be in need of revision.

 

Stop the perfmon capture by right-clicking it and selecting Stop, and leave it stopped until you’re ready to begin troubleshooting.  When you are ready, start the capture by right-clicking it and selecting Start.  We want the perfmon capture to be running long before the OOM condition is reached, but we also want to avoid perfmon .blg files that are over 1 gigabyte in size.  So be cautious about when you start the capture, and consider stopping and restarting it every few hours.
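
To keep an eye on the 1 gigabyte guideline while waiting, a small script can list any .blg files that have grown past it. This is a minimal sketch; the function name is hypothetical, and the folder to check is whatever log file location you noted when creating the counter log.

```python
import os

ONE_GIGABYTE = 1024 ** 3


def oversized_logs(log_dir, limit_bytes=ONE_GIGABYTE):
    """Return (path, size_in_bytes) pairs for .blg files in log_dir larger than limit_bytes."""
    results = []
    for name in sorted(os.listdir(log_dir)):
        if not name.lower().endswith(".blg"):
            continue
        path = os.path.join(log_dir, name)
        size = os.path.getsize(path)
        if size > limit_bytes:
            results.append((path, size))
    return results
```

If the list comes back non-empty, that is a good cue to stop the capture and start a fresh one.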

 

 

 

 

WAIT

 

Wait for the problem to occur. 

 

The debugdiag crash rule should be active while waiting. 

 

The perfmon capture should be active while waiting.   (But don’t let it grow too big please.)

 

If you want to inject debugdiag’s leak trackers, you may also do this while waiting.

 

 

 

 

 

 Reactive Steps

 

When the OOM state is reached and problems are reported, log onto the affected server and…

 

 

1.      Check User Dump Count in DebugDiag

 

When the next System.OutOfMemoryException is thrown inside the selected process, DebugDiag should create a memory dump of the process(es) being monitored.  In theory, a dump is triggered when ASP.NET throws the first System.OutOfMemoryException.  Assuming this works as planned, the userdump count in DebugDiag will increase from 0 to 1.

 

If for some reason the dumps are not automatically produced as expected when the OOM condition begins, a good Plan B will be to manually trigger a single set of hang dumps when the OOM condition has begun to hang the website (but before it crashes the IIS process).  Steps for Plan B:

 

a)      Launch DebugDiag

b)      Click Cancel if given the choice of making a Crash Rule or Hang Rule

c)       Expand the Tools menu

d)      Select “Create IIS/Com+ Hang Dump”

e)      Wait for the dumps to finish (this may take anywhere from 30 seconds to a few minutes, depending on the size of the w3wp.exe)

 

 

 

2.      STOP PERFMON CAPTURE

 

   Stop the Performance Monitor log that corresponds to the affected server.

In Performance Monitor:

1. Right click on your log that is now listed under "Counter Logs"

2. Choose "Stop Log"

3. Save it as a .blg file to the location of your choice (it should save automatically)

4.  You can zip and upload this log later.

 

 

3.      RECOVER the SERVER

 

After DebugDiag has finished making its dumps (whether automated dumps from the crash rule or manually triggered dumps in the middle of a barrage of OOM exceptions), feel free to restart the w3wp.exe to recover from the OOM condition by (a) recycling the application pool, (b) running iisreset, or (c) rebooting the server.

 

4.      DATA COLLECTION

 

If you kept the default settings, you can find the dump file(s) by clicking the icon of the manila folder.

 

 

 

You can load sos.dll into windbg.exe and try to interpret the dump for yourself.

 

You can also run the two analysis scripts in Debugdiag’s Advanced Analysis tab against the dumps.

 

Or you can zip the dump, open a support case with Microsoft, and ask the ASP.NET team to analyze the dump to reveal the root cause of the System.OutOfMemoryException.  To collect and zip the dumps, event logs, IIS log, and .NET config files, expand the Tools menu in DebugDiag, select Advanced Data Collection, and select Create Full Cabinet File.

 

 

 

 

 

 

 

Other Considerations

 

 

IS LEAK TRACKING NEEDED?

 

Sometimes yes and sometimes no.  DebugDiag’s leak tracking feature is great for troubleshooting memory leaks in native code, but not for leaks in managed code.  So if a memory leak in native code is what leads ASP.NET to complain that it cannot find free blocks of contiguous memory to use, then, yes, we do want to inject leak tracking before the System.OutOfMemoryException dump is triggered.  If, however, it is managed code that is occupying the memory space, we do not want to inject leak tracking, because leak tracking itself can use up a lot of memory.

 

Usually I prefer to start troubleshooting the System.OutOfMemoryException by getting a System.OutOfMemoryException dump without leak tracking.  From that dump I can judge whether or not the memory is being taken up by managed code (no leak tracking needed) or by native code (leak tracking is a good idea).   If leak tracking is needed, we go for a second dump.

 

If we decide that we need to “inject leak tracking” into a process, we still follow the same steps above.  Then, while waiting for the System.OutOfMemoryException to be thrown, we inject leak tracking as follows.

 

Browse to the website in focus to get the w3wp.exe spawned for the AppPool that is in focus.  Give it a bit of stress for 20 minutes or so before injecting leaktrack.

 

Switch to the Processes tab in DebugDiag

 

Focus on the w3wp.exe processes.  Scroll to the right on the Processes tab to see what the Application Pool Names are.  This way you can tell which w3wp.exe corresponds with which Application Pool.

 

Right-click all of the w3wp.exe processes that we need to focus on, one at a time, and select "Monitor for leaks" from the gray menu.

 


 

 

 

Wait for ASP.NET to begin reporting OOM exceptions and collect the dump(s) as usual.

 

 

 

 

 

 

 

 

ISOLATION into New AppPools?

 

One good question to ask in some cases is whether too many websites and/or web applications are assigned to the same application pool.  If multiple webapps share one IIS Application Pool, it may be good to consider isolating some of the main websites or main webapps into their own new Application Pools.  You also might want to isolate the websites/webapps that you’re most suspicious of into their own application pools.  But only do this with careful testing first, as isolation can break dependencies or state that different web applications might be sharing.  And, yes, it is possible to have too many application pools.  Generally, however, ten or fifteen AppPools may be safe.