Alois Kraus

blog

  Home  |   Contact  |   Syndication    |   Login
  133 Posts | 8 Stories | 368 Comments | 162 Trackbacks

News



Archives

Post Categories

Programming

One common thing while debugging is to search who owns/references a handle. For x86 processes you can simply search the full address space but in x64 this approach no longer works. While debugging a hang from a memory dump I did find out that the process MainWindowHandle did change to a different window which was not the main window handle. Searching for the values of NativeWindow objects did also lead to no trace on the managed heap so it must be some unmanaged window handle.

First I look at the Process objects which could contain interesting data

0:033> !DumpHeap -type System.Diagnostics.Process -stat
Statistics:
              MT    Count    TotalSize Class Name
000007fef3c7d090        1           24 System.Diagnostics.ProcessModuleCollection
000007fef3c76470        1           24 System.Diagnostics.ProcessThreadCollection
000007fef3c62c70        1           64 System.Func`2[[System.Diagnostics.Process, System],[System.Boolean, mscorlib]]
000007fef3c47718        8         2240 System.Diagnostics.Process
000007fef3c89140       57         3192 System.Diagnostics.ProcessThread
000007fef5b44458        2         3664 System.Object[]
000007fef3c71b90      291        16296 System.Diagnostics.ProcessModule
000007fef3c871c0      161        20608 System.Diagnostics.ProcessInfo

When I enable .prefer_dml 1 in Windbg I can click on the process instance and get a list of all 8 Process objects where I can look at each one. The -mt command with the Method Table (MT) as argument is the only way to get only Process objects and nothing else. This is exactly what the blue underlined links in Windbg do when you click on them.

   
0:033> !DumpHeap /d -mt 000007fef3c47718
         Address               MT     Size
0000000003b52658 000007fef3c47718      280     
0000000003c29970 000007fef3c47718      280     
0000000003c8b680 000007fef3c47718      280     
0000000003e192e0 000007fef3c47718      280     
0000000003e19d80 000007fef3c47718      280     
0000000003e1c818 000007fef3c47718      280     
0000000003e1ce90 000007fef3c47718      280     
0000000003e520d0 000007fef3c47718      280    

By clicking through the blue links I do not need to enter any command by hand which is quite convenient.

0:033> !DumpObj /d 0000000003b52658
Name:        System.Diagnostics.Process
MethodTable: 000007fef3c47718
EEClass:     000007fef38a2238
Size:        280(0x118) bytes
File:        C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System\v4.0_4.0.0.0__b77a5c561934e089\System.dll
Fields:
              MT    Field   Offset                 Type VT     Attr            Value Name
000007fef5b97de0  4002c1e       f0       System.Boolean  1 instance                1 haveProcessId
000007fef5b992b8  4002c1f       d8         System.Int32  1 instance             5848 processId
000007fef5b97de0  4002c20       f1       System.Boolean  1 instance                0 haveProcessHandle
000007fef3c4f2c8  4002c21       20 ...SafeProcessHandle  0 instance 0000000000000000 m_processHandle
000007fef5b97de0  4002c22       f2       System.Boolean  1 instance                0 isRemoteMachine
000007fef5b96508  4002c23       28        System.String  0 instance 0000000002b347d0 machineName
000007fef3c871c0  4002c24       30 ...stics.ProcessInfo  0 instance 0000000000000000 processInfo
000007fef5b992b8  4002c25       dc         System.Int32  1 instance          2035711 m_processAccess
000007fef3c76470  4002c26       38 ...sThreadCollection  0 instance 0000000000000000 threads
000007fef3c7d090  4002c27       40 ...sModuleCollection  0 instance 0000000000000000 modules
000007fef5b97de0  4002c28       f3       System.Boolean  1 instance                1 haveMainWindow
000007fef5b9a338  4002c29       b8        System.IntPtr  1 instance          11505b4 mainWindowHandle

The main question was who was owning the window handle. While examining all NativeWindow instances which wrap a window handle I could not find the handle in use by any managed window. In that case I want to search the complete address space which can be non trivial in a multi GB x64 application.

Luckily there exists some not so well known Windbg command to help you out. It is the !address command which shows you all allocated regions of memory (mapped files, loaded dlls, thread stacks, native heaps, …). It has also scripting support to execute a command for every address region. Now I only need to search for the window handle for every allocated memory region. %1 is the start address and %2 is the length of the memory region.

0:033> !address -c:"s -d %1 %2 11505b4"
00000000`00df6560  011505b4 00000000 00000004 00000000  ................
00000000`03b52710  011505b4 00000000 00000000 00000000  ................
00000000`28b57370  011505b4 00000000 0000200e 00000000  ......... ......
00000000`28ce4568  011505b4 00000000 000100d4 00000000  ................
00000000`4c603798  011505b4 00000000 00010052 00000000  ........R.......

Then I search one by one who references the pointers to the window handle

0:033> !address -c:"s -d %1 %2 28b57370"
00000000`1b6d2220  28b57370 00000000 000030f8 00000000  ps.(.....0......
00000000`1b6d2740  28b57370 00000000 000030f8 00000000  ps.(.....0......
00000000`1b6e5740  28b57370 00000000 00000000 00000000  ps.(............
00000000`1b706530  28b57370 00000000 00000003 00000000  ps.(............
00000000`203387e8  28b57370 00000000 00001fd7 00000000  ps.(............
00000000`20355450  28b57370 00000000 00000010 00000000  ps.(............
00000000`20448af8  28b57370 00000000 b60115d0 ffffffff  ps.(............
00000000`204854b0  28b57370 00000000 c15f5308 000007fe  ps.(.....S_.....
00000000`204d6a90  28b57370 00000000 00000000 00000000  ps.(............
00000000`28b53290  28b57370 00000000 00000000 00000000  ps.(............
00000000`2bb8f7a0  28b57370 00000000 00000000 00000000  ps.(............
00000000`2c0ffc40  28b57370 00000000 00000001 00000000  ps.(............

Knowing that some thread must hold the window handle I have searched the call stacks with method arguments for all stack locations. The memory region marked red is a region inside a thread stack.

!address
+        0`2b790000        0`2bb8c000        0`003fc000        Stack      [~33; 16d8.2bbc]
+        0`2bb8c000        0`2bb90000        0`00004000        Stack      [~33; 16d8.2bbc]

The chances are high that this thread is the owner of the window handle. From the call stack it gets obvious:

0:051> ~33s
ntdll!NtWaitForMultipleObjects+0xa:
00000000`77c8186a c3              ret
0:033> kb
# RetAddr           : Args to Child                                                           : Call Site
00 000007fe`fdb01430 : 00000000`00000000 00000000`004dbe58 00000000`004dc200 00000000`77c8592e : ntdll!NtWaitForMultipleObjects+0xa
01 00000000`77a31723 : 00000000`2bb8f820 00000000`2bb8f810 00000000`00000000 00000000`2bb8f860 : KERNELBASE!WaitForMultipleObjectsEx+0xe8
02 00000000`77b48f7d : 00000000`2bb8f8a0 00000000`000927c0 00000000`00000001 00000000`000010f4 : kernel32!WaitForMultipleObjectsExImplementation+0xb3
03 00000000`77b462b2 : 00000000`2bb8f9f0 00000000`20481fb0 00000000`00000001 00000000`2bb8f9e0 : user32!RealMsgWaitForMultipleObjectsEx+0x12a
04 000007fe`ff67acd6 : 00000000`2bb8f9e0 00000000`00000000 00000000`2bb8f9f0 00000000`20481fb0 : user32!MsgWaitForMultipleObjectsEx+0x46
05 000007fe`ff79a2a2 : 00000000`80010115 00000000`2bb8faf0 00000000`80010115 00000000`203f1c22 : ole32!CCliModalLoop::BlockFn+0xc2
06 000007fe`c11c1e6d : 00000000`250ab220 00000000`20481f50 00000000`00000000 00000000`20481fb0 : ole32!CoWaitForMultipleHandles+0x102
07 000007fe`c11a33ba : 00000000`00000000 00000000`20481f50 00000000`00000000 00000000`00000000 : mshtml!CDwnTaskExec::ThreadExec+0x153
08 00000000`77a259ed : 00000000`00000000 000007fe`c10d0000 00000000`00000000 00000000`00000000 : mshtml!CExecFT::ThreadProc+0x4e
09 00000000`77c5c541 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : kernel32!BaseThreadInitThunk+0xd
0a 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x1d

It was a hosted Internet Explorer window which has become for some tenth of a second active causing some code to break which assumed that the MainWindowHandle the main window of an application causing sporadic hangs on some automated test agents.

There is one more thing. When you are working in a bigger software shop you have some form of continuous integration and automated testing after a build has succeeded. Usually the tests are automatically deployed on several test agents to run tests in parallel. This makes it a challenge to debug sporadically failing / hanging tests. What do you do? Try to repro on your machine, add tracing, run the tests in a loop in isolation? There is one thing you can do get down to the root cause of sporadically failing tests without all of that.

[Test]
public void Test()
{
    try 
    {
         // blahh bad test 

    }
    catch(Exception)
    {
        ProcessDumper.DumpProcess(); // Take a dump of your own process or several more if you need to
        throw;
    } 
}

Too easy? If you have some hung tests which only fail 1/50 times and you cannot attach a debugger or add tracing because the error goes away if you change the timing automatically taking memory dumps when the test fails can do miracles. This only makes sense if you can open the dump in Windbg. Even long time developers do not want to get into this arcane debugging stuff because "this happens only once a year I do not want to learn all that strange stuff!". I can assure you you will constantly take dumps once you know how to read them because it is the fastest and most efficient way to diagnose random failures without much guessing what the probable root cause could be.

For your test you can simply add procdump from the SysInternals suite as resource, write the file to %temp% and you are ready to create one or more dump files.

I am a huge fan of as much data as I can get to nail a hard to find issue. Memory dumps are an important part of gathering forensic data of crashes and hangs.

If you still do not like Windbg here is a secret: You can get quite far with Visual Studio 2013 and its latest Service Packs! The only thing you must have are matching pdbs from a symbol server or your drop folder. The parallel stacks window in Visual Studio is a great way to check a memory dump with many threads. Besides this Windbg is not really good at resolving STL containers correctly in release builds.

The following code will cause an unhandled exception in C++ in release builds because of a double free call:

#include "stdafx.h"
#include <comutil.h>

bstr_t *g_globalVar;

DECLSPEC_NOINLINE void Allocate()
{
    g_globalVar = new bstr_t(L"Hello world");
    delete g_globalVar;
    _tprintf(L"Created and deleted");
}

DECLSPEC_NOINLINE void FreeASecondTime()
{
    delete g_globalVar;
    _tprintf(L"Deleted a second time");
}

int _tmain(int argc, _TCHAR* argv[])
{
    ::Sleep(10 * 1000); // Give some time to attach dumper
    Allocate();
    FreeASecondTime();

    return 0;
}

With procdump you can create a dump for e.g. first chance exceptions (you can event filter for specific exception types)

C:\>procdump -accepteula -ma -e 1  DoubleFree c:\temp\DoubleFreeDump.dmp

ProcDump v6.00 - Writes process dump files
Copyright (C) 2009-2013 Mark Russinovich
Sysinternals - www.sysinternals.com
With contributions from Andrew Richards

Process:               DoubleFree.exe (5872)
Number of dumps:       1
Hung window check:     Disabled
Exception monitor:     First Chance+Unhandled
Exception filter:      *
Terminate monitor:     Disabled
Dump file:             c:\temp\DoubleFreeDump_YYMMDD_HHMMSS.dmp


Press Ctrl-C to end monitoring without terminating the process.

[23:35:48] Exception: C0000005.ACCESS_VIOLATION

First-chance Exception.
Writing dump file c:\temp\DoubleFreeDump_140929_233548.dmp ...
Writing 10MB. Estimated time (less than) 1 seconds.
Dump written.

Dump count reached.

Now you use VS2013 with File - Open menu to open the dump file:

image

 

Now you need to press Debug with Native Only to start "debugging" and open the Call Stack Window to find the second free call:

image

 

This works with unmanaged, managed and mixed code as well. A very useful VS addition is the Parallel Stacks window which can help you a lot to find hangs in your code. For our single threaded application it looks not terribly useful but if you have 100 threads it can help you a lot.

image

What can you do if you do not have the pdbs because the build did run, test did fail, binaries and pdbs are deleted after a few days and you only have a memory dump left? It is not as bad as it sounds. Since you know the baseline of your software you can rebuild the dlls and pdbs on your local machine.

When you load the dump initially and try to get a stack trace you get:

0:000> kb
ChildEBP RetAddr  Args to Child              
WARNING: Stack unwind information not available. Following frames may be wrong.
00c8f430 003f1d05 00c8f5f0 00c8f7cc 7f82f000 DoubleFree+0x11b9c
00c8f510 003f180b 00c8f6d0 00c8f7cc 7f82f000 DoubleFree+0x11d05
00c8f5f0 003f191b 00c8f7c4 00c8f7cc 7f82f000 DoubleFree+0x1180b
00c8f6d0 003f1b0b 00000001 00c8f898 00c8f7cc DoubleFree+0x1191b
00c8f7c4 003f1d6c 00000000 00000000 7f82f000 DoubleFree+0x11b0b
00c8f898 003f2fe9 00000001 00e6cad8 00e6cb60 DoubleFree+0x11d6c
00c8f8e8 003f31dd 00c8f8fc 7717919f 7f82f000 DoubleFree+0x12fe9
00c8f8f0 7717919f 7f82f000 00c8f940 7744a22b DoubleFree+0x131dd
00c8f8fc 7744a22b 7f82f000 e2df5dd0 00000000 kernel32!BaseThreadInitThunk+0xe
00c8f940 7744a201 ffffffff 7743f20b 00000000 ntdll!__RtlUserThreadStart+0x20
00c8f950 00000000 003f10d2 7f82f000 00000000 ntdll!_RtlUserThreadStart+0x1b

Not terribly useful. But if you add to your symbol path your output folder where you newly built pdb resides.

For this test I have converted the project to a managed C++ project for reasons you will see shortly:

.sympath+  D:\Source\DoubleFree\Debug\

and then force Windbg to load it even if the pdb timestamps do not match:

.reload /i DoubleFree.exe

you get full fidelity with line numbers back:

0:000> !ClrStack
OS Thread Id: 0xd74 (0)
Child SP       IP Call Site
00e3f2f0 7743cd7c [InlinedCallFrame: 00e3f2f0] Unknown
00e3f260 00fe1522 <Module>._bstr_t.Data_t.Release(Data_t*) [c:\program files (x86)\microsoft visual studio 12.0\vc\include\comutil.h @ 773]
00e3f270 00fe1791 <Module>.FreeASecondTime() [c:\program files (x86)\microsoft visual studio 12.0\vc\include\comutil.h @ 639]
00e3f27c 00fe1181 <Module>.wmain(Int32, Char**) [d:\source\doublefree\doublefree.cpp @ 28]
00e3f2f0 00f5a135 [InlinedCallFrame: 00e3f2f0] 
00e3f2ec 00fe112e DomainBoundILStubClass.IL_STUB_PInvoke()
00e3f2f0 73dc2552 [InlinedCallFrame: 00e3f2f0] <Module>._wmainCRTStartup()
00e3f498 73dc2552 [GCFrame: 00e3f498] 

Unfortunately Visual Studio has no way to load mismatched pdbs even if you have rebuilt the dll and pdb from the same source code base.

 

image

But all is not lost. You can get Visual Studio to load mismatched pdbs:

image

How? There is a little tool called chkmatch which modifies your "new" pdb to match and "old" binary. When you have a managed application you can save the dll to e.g. c:\temp with the managed debugger extension command !SaveModule in Windbg which is the reason why I switched from a pure C++ to a managed C++ application to make it possible to dump a dll from memory to disc. The .writeMem command did not work so I had to revert to a managed host process to debug a pure C++ issue.

Then you can make the "old" exe matching with your new pdb:

C:\Windows\system32>chkmatch -m c:\temp\DoubleFree.exe c:\temp\DoubleFree.pdb
ChkMatch - version 1.0
Copyright (C) 2004 Oleg Starodumov
http://www.debuginfo.com/

Executable: c:\temp\DoubleFree.exe
Debug info file: c:\temp\DoubleFree.pdb

Executable:
TimeDateStamp: 5429d867
Debug info: 2 ( CodeView )
TimeStamp: 5429d867  Characteristics: 0  MajorVer: 0  MinorVer: 0
Size: 68  RVA: 00007aec  FileOffset: 00006eec
CodeView format: RSDS
Signature: {36317c4a-a90c-482a-a8a7-c96ac8050a2e}  Age: 1
PdbFile: D:\Source\DoubleFree\Release\DoubleFree.pdb
Debug info: 12 ( Unknown )
TimeStamp: 5429d867  Characteristics: 0  MajorVer: 0  MinorVer: 0
Size: 20  RVA: 00007b30  FileOffset: 00006f30

Debug information file:
Format: PDB 7.00
Signature: {7cec4b85-0aca-4680-b797-82ead65f95ad}  Age: 1

Writing to the debug information file...
Result: Success.

Viola. You have matching pdbs and file and line numbers back to debug your dump file with Visual Studio. A common scenario where this is useful is that if your build stores only the binaries in a repository but does not upload for build performance reasons the pdbs. When you get dumps from failed test runs from some test machines you have perhaps still the binaries but not the pdbs. In that case Windbg or the chkmatch tool can help you out to get back file and line information from otherwise unreadable call stacks.

I know that the VS product group does not want a "load mismatched pdbs" feature in VS but it would be very useful if you have a dump but not exactly matching symbols. At least for the guys which do not want to learn Windbg for such "rare" cases.

posted on Monday, September 29, 2014 10:50 AM