Our team is responsible for backend process of supply and
delivery of customer online shopping for one of the worlds leading retailers.
Recently we were working on the new project release and also
system upgrade across all the servers where we operate for day to day
activities. Interestingly we had a very peculiar issue and took some time out
of our schedules and halted our rollout operations. Our new server build is ready and our
application is installed on our OPS environment for the signoff of the build
with the latest application versions.
Our Applications deployment specifications were like this:
1)DotNet Framework
1.1 / 2.0 / 3.5
2)IIS 6.0
3)We have lot of applications with Windows / web -
Asp.net website , WCF and Webservers also.
4)Only One App-pool has been configured on IIS.
Intermittently we faced the server error occurring and it was stopping our testing activities.
Initially we thought it might be some memory or pagesize issue or app-pool
recycle issue.
Client
Application (Windows Service) consuming IIS hosted WCF service is failing with
0x80070006 (The handle is invalid) error, once App Pool is recycled.
And the
event log captured says something like this :-
Event
code: 3005
Event
message: An unhandled exception has occurred.
Event
time: 06/07/2012 17:48:54
Event
time (UTC): 06/07/2012 16:48:54
Event
ID: 9bae3be46e794c248586e5ee0f66504f
Event
sequence: 3
Event
occurrence: 2
Event
detail code: 0
Application
information:
Application
domain: /LM/W3SVC/1/Root/GHS_PCS_WS-1-129860669341875000
Trust
level: Full
Application
Virtual Path: /GHS_PCS_WS
Application
Path: D:\Inetpub\wwwroot\GHS_PCS_WS\
Machine
name: GHS9159FSR01
Process
information:
Process
ID: 22360
Process
name: w3wp.exe
Account
name: NT AUTHORITY\NETWORK SERVICE
Exception
information:
Exception
type: COMException
Exception
message: The handle is invalid. (Exception from HRESULT: 0x80070006 (E_HANDLE))
Request
information:
Request
URL: http://localhost/GHS_PCS_WS/DeliveryWebService.svc
Request
path: /GHS_PCS_WS/DeliveryWebService.svc
User
host address: 127.0.0.1
User:
NT AUTHORITY\SYSTEM
Is
authenticated: True
Authentication
Type: Negotiate
Thread
account name: NT AUTHORITY\NETWORK SERVICE
Thread
information:
Thread
ID: 6
Thread
account name: NT AUTHORITY\NETWORK SERVICE
Is
impersonating: False
Stack
trace:
Custom
event details:
For
more information, see Help and Support Center at
http://go.microsoft.com/fwlink/events.asp.
What is
the exact text of any error messages that you received that are associated with
this problem? Event code: 3005
Event
message: An unhandled exception has occurred.
Event
time: 06/07/2012 17:48:54
Event
time (UTC): 06/07/2012 16:48:54
Event
ID: 9bae3be46e794c248586e5ee0f66504f
Event
sequence: 3
Event
occurrence: 2
Event
detail code: 0
Application
information:
Application
domain: /LM/W3SVC/1/Root/GHS_PCS_WS-1-129860669341875000
Trust
level: Full
Application
Virtual Path: /GHS_PCS_WS
Application
Path: D:\Inetpub\wwwroot\GHS_PCS_WS\
Machine
name: GHS9159FSR01
Process
information:
Process
ID: 22360
Process
name: w3wp.exe
Account
name: NT AUTHORITY\NETWORK SERVICE
Exception
information:
Exception
type: COMException
Exception
message: The handle is invalid. (Exception from HRESULT: 0x80070006 (E_HANDLE))
Request
information:
Request
URL: http://localhost/GHS_PCS_WS/DeliveryWebService.svc
Request
path: /GHS_PCS_WS/DeliveryWebService.svc
User
host address: 127.0.0.1
User:
NT AUTHORITY\SYSTEM
Is
authenticated: True
Authentication
Type: Negotiate
Thread
account name: NT AUTHORITY\NETWORK SERVICE
Thread
information:
Thread
ID: 6
Thread
account name: NT AUTHORITY\NETWORK SERVICE
Is
impersonating: False
We could replicate the issue when we do a apppool recycle
and after resetting iis it was working allright. But in Live server we could
not sustain such a uncertainty as it involves huge loss.
A team of 3 engineers started analyzing the issue and step
by step opting out the possibilities and the options tried are
1. Iis reset
2. Force
App-pool recycle
3. Scheduled
time App-pool recycle.
4. Stop
App-pool recycling.
5. Change
the Page Size limit
6. Change
Authentication types.
7. App Pool
separation for different framework applications.
But in spite of all the above changes and implementing all
the suggestions on some of the blogs we could not over come the issue and we
have to raise a ticket to Microsoft team as we didn’t had much time to do more
R&D on the issue.
This is the first time in my career where I interacted with
Microsoft for technical assistance.
A assignee of the ticket has started the next hour to
investigate the issue. He has called me many times to understand our project
the system specifications, and also learnt how to replicate the issue.
We had already sent the Event logs to the team for their
analysis. But they had come back with more log requirements for their
investigation
1. Application
Event Log
2. IIS Log
3. Process
Monitor Log
4. ETW Trace
Log
5. WCF Trace
Log
6. Com+ and
Com Diagonostic Tracing
7. AdPlus
FullOnFirst Crash Dump.
We had to replicate the issue and capture all the above logs
at the same time.
After this they analyzed the logs and came up with the
suggestions.
1. Remove
impersonation in WCF web site.
2. Separation
of Webservies and WCF services website
3. App-Pool
Separation of all the Framework and different type of application
When we implement these changes, those issues disappeared
and now its smooth operation.
Some Information on the logs:
1) AddPlus- FullOnFirst Crash Dump:
A user mode memory dump is a file typically with a DMP extension that contains a portion of virtual memory and CPU registers.
Two important items captured by the memory dump include the call stack for each thread and the exception record. This information is captured for the moment in time that the memory dump is taken.
A call stack is the way the operating system keeps track of the currently executing functions within the thread. Each thread within the process will have its own call stack. The call stack is a LIFO queue with the currently executing function being at the top of the stack.
The call stack reported by the tools used to capture the memory dump will only be as good as the symbols that are present. Without symbols for a module the reports output by the tools will show the memory offset from the beginning of the module, or try to get an export from the binaries, in which case it will be from the nearest function call. In the event that the memory dump is captured on a machine that does not have the symbols, or the symbols are not setup correctly they will be recalculated by memory address when it is analyzed on a machine with the proper files.
2) Process Monitor Logs:
Process Monitor is an advanced monitoring tool that shows real-time file system, registry and process activity. It combines the features of two legacy Sysinternals utilities, Filemon and Regmon, and adds a number of other enhancements.
Process Monitor can be used to track system and software activity to troubleshoot some of product issues, especially when it is necessary to track what particular application or process accesses a file or a registry key.
The main Process Monitor window lists all system operations along with their exact time, process name, ID and the result for each single operation:
To access advanced information on any single operation right-click on the operation line and choose Properties:
When analyzing a Process Monitor log, it is recommended to filter out entries. For example you can right click on Successes under Results, and exclude it. You can also filter out Processes and generally any field you like.
3) IIS Logs
The IIS logs provide a great deal of information about the activity of a Web application. You can find the IIS logs in systemroot\System32\LogFiles\W3SVCnumber, where numberis the site ID for the Web site.
4)ETW Tracing
Event Tracing for Windows (ETW) is a feature of the Windows operating system that allows you to troubleshoot issues with HTTP requests in IIS components like Active Server Pages (ASP), Internet Server Application Programming Interface (ISAPI) extensions, and the Secure Sockets Layer (SSL) Filter service. If an HTTP request in IIS fails or becomes unresponsive while ETW is enabled, you can view ETW trace data, called events, to determine which component caused the failure
Difference between IIS Logs and ETW Tracing
In general, IIS log is more traditonal way to collect data, and ETW provides more dynamic and verbose information.
For performance tracking, both provide time-taken information per request.
By using ETW, we can collect data in real time or file. But, you need to write consumer to get it in real-time.
5) Com+ and Com Diagnostic Tracing.
COM tracing can help you troubleshoot issues that are related to permission, activation, or registration.
The tracing mechanism produces a binary file. The binary file must be formatted by Microsoft so that it can be analyzed.
You can trace the following COM+ and COM providers:
• COMSVCS
• COMADMIN
• DCOMSCM (RPCSS)
• OLE32
6) WCF Trace Log:
Windows Communication Foundation (WCF) outputs the following data for diagnostic tracing:
• Traces for process milestones across all components of the applications, such as operation calls, code exceptions, warnings and other significant processing events.
• Windows error events when the tracing feature malfunctions. See Event Logging.
WCF tracing is built on top of System.Diagnostics. To use tracing, you should define trace sources in the configuration file or in code. WCF defines a trace source for each WCF assembly. The System.ServiceModel trace source is the most general WCF trace source, and records processing milestones across the WCF communication stack, from entering/leaving transport to entering/leaving user code. The System.ServiceModel.MessageLogging trace source records all messages that flow through the system.
Tracing is not enabled by default. To activate tracing, you must create a trace listener and set a trace level other than "Off" for the selected trace source in configuration; otherwise, WCF does not generate any traces. If you do not specify a listener, tracing is automatically disabled. If a listener is defined, but no level is specified, the level is set to "Off" by default, which means that no trace is emitted.
7)Event Logs
The application Event log contains events logged by programs. For example, a database program may record a file error in the application log. Events that are written to the application log are determined by the developers of the software program.
References:
1) http://blogs.msdn.com/b/cclayton/archive/2010/02/11/how-to-collect-a-memory-dump-using-adplus-and-debugdiag.aspx
2)http://msdn.microsoft.com/en-us/library/ms733025.aspx
3)http://support.microsoft.com/kb/308427
Some Information on the logs:
1) AddPlus- FullOnFirst Crash Dump:
A user mode memory dump is a file typically with a DMP extension that contains a portion of virtual memory and CPU registers.
Two important items captured by the memory dump include the call stack for each thread and the exception record. This information is captured for the moment in time that the memory dump is taken.
A call stack is the way the operating system keeps track of the currently executing functions within the thread. Each thread within the process will have its own call stack. The call stack is a LIFO queue with the currently executing function being at the top of the stack.
The call stack reported by the tools used to capture the memory dump will only be as good as the symbols that are present. Without symbols for a module the reports output by the tools will show the memory offset from the beginning of the module, or try to get an export from the binaries, in which case it will be from the nearest function call. In the event that the memory dump is captured on a machine that does not have the symbols, or the symbols are not setup correctly they will be recalculated by memory address when it is analyzed on a machine with the proper files.
2) Process Monitor Logs:
Process Monitor is an advanced monitoring tool that shows real-time file system, registry and process activity. It combines the features of two legacy Sysinternals utilities, Filemon and Regmon, and adds a number of other enhancements.
Process Monitor can be used to track system and software activity to troubleshoot some of product issues, especially when it is necessary to track what particular application or process accesses a file or a registry key.
The main Process Monitor window lists all system operations along with their exact time, process name, ID and the result for each single operation:
To access advanced information on any single operation right-click on the operation line and choose Properties:
When analyzing a Process Monitor log, it is recommended to filter out entries. For example you can right click on Successes under Results, and exclude it. You can also filter out Processes and generally any field you like.
3) IIS Logs
The IIS logs provide a great deal of information about the activity of a Web application. You can find the IIS logs in systemroot\System32\LogFiles\W3SVCnumber, where numberis the site ID for the Web site.
4)ETW Tracing
Event Tracing for Windows (ETW) is a feature of the Windows operating system that allows you to troubleshoot issues with HTTP requests in IIS components like Active Server Pages (ASP), Internet Server Application Programming Interface (ISAPI) extensions, and the Secure Sockets Layer (SSL) Filter service. If an HTTP request in IIS fails or becomes unresponsive while ETW is enabled, you can view ETW trace data, called events, to determine which component caused the failure
Difference between IIS Logs and ETW Tracing
In general, IIS log is more traditonal way to collect data, and ETW provides more dynamic and verbose information.
For performance tracking, both provide time-taken information per request.
By using ETW, we can collect data in real time or file. But, you need to write consumer to get it in real-time.
5) Com+ and Com Diagnostic Tracing.
COM tracing can help you troubleshoot issues that are related to permission, activation, or registration.
The tracing mechanism produces a binary file. The binary file must be formatted by Microsoft so that it can be analyzed.
You can trace the following COM+ and COM providers:
• COMSVCS
• COMADMIN
• DCOMSCM (RPCSS)
• OLE32
6) WCF Trace Log:
Windows Communication Foundation (WCF) outputs the following data for diagnostic tracing:
• Traces for process milestones across all components of the applications, such as operation calls, code exceptions, warnings and other significant processing events.
• Windows error events when the tracing feature malfunctions. See Event Logging.
WCF tracing is built on top of System.Diagnostics. To use tracing, you should define trace sources in the configuration file or in code. WCF defines a trace source for each WCF assembly. The System.ServiceModel trace source is the most general WCF trace source, and records processing milestones across the WCF communication stack, from entering/leaving transport to entering/leaving user code. The System.ServiceModel.MessageLogging trace source records all messages that flow through the system.
Tracing is not enabled by default. To activate tracing, you must create a trace listener and set a trace level other than "Off" for the selected trace source in configuration; otherwise, WCF does not generate any traces. If you do not specify a listener, tracing is automatically disabled. If a listener is defined, but no level is specified, the level is set to "Off" by default, which means that no trace is emitted.
7)Event Logs
The application Event log contains events logged by programs. For example, a database program may record a file error in the application log. Events that are written to the application log are determined by the developers of the software program.
References:
1) http://blogs.msdn.com/b/cclayton/archive/2010/02/11/how-to-collect-a-memory-dump-using-adplus-and-debugdiag.aspx
2)http://msdn.microsoft.com/en-us/library/ms733025.aspx
3)http://support.microsoft.com/kb/308427