WinDbg loses connection when debugging over the network, and the target machine freezes
I'm trying to get WinDbg kernel debugging over the network to work, but it always loses the connection after I break into the debugger (Debug->Break) and then try to resume (Debug->Go). If I never break into the debugger, the connection appears to stay stable for some period of time, and I can even see debug print statements in WinDbg as I use the target system during that grace period. The connection also seems good while the target is in debug break, because I can gather information from it: I use "!ustr srv!SrvComputerName" to get the target computer name, and it returns the correct name. Any help would be much appreciated.

Setting up the systems: I followed the instructions on the MSDN website to set up my target and host systems.
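
For readers who have not done this setup, here is a minimal sketch of the command lines involved, written as a small Python helper that only builds the strings. The `bcdedit /dbgsettings net` and `windbg -k net:` forms are the standard ones for network kernel debugging; the function name `kdnet_commands`, the IP address, the port, and the key `a.b.c.d` are placeholders chosen for illustration, not values from the question.

```python
import ipaddress


def kdnet_commands(host_ip, port, key):
    """Build the target-side and host-side command lines for network kernel debugging.

    All argument values are supplied by the caller; nothing here is taken
    from the original poster's machines.
    """
    # Raises ValueError if host_ip is not a well-formed IPv4 address.
    ipaddress.IPv4Address(host_ip)
    # The question uses the dynamic port range 49152-65535.
    if not 49152 <= port <= 65535:
        raise ValueError("port must be in the range 49152-65535")

    # Run these in an elevated command prompt on the TARGET, then reboot it.
    target_commands = [
        "bcdedit /debug on",
        f"bcdedit /dbgsettings net hostip:{host_ip} port:{port} key:{key}",
    ]
    # Run this on the HOST; port and key must match the target's settings.
    host_command = f"windbg -k net:port={port},key={key}"
    return target_commands, host_command


if __name__ == "__main__":
    target, host = kdnet_commands("192.168.0.10", 50000, "a.b.c.d")
    print("\n".join(target))
    print(host)
```

If the host machine is swapped, as happens later in this thread, only the `hostip` value on the target needs to change.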

Debugging: Below are my attempts to resolve this issue.

  1. Disabling Flow Control and using Half Duplex mode on the network adapter. I tried this after reading this post: WinDbg, host machine lose network if test machine is on the same switch
  2. Buying new network adapters. According to this webpage, my network adapter should support network kernel debugging. However, further investigation showed that vendors have a bad habit of not updating their device IDs, so I decided to rule out this possibility by buying new adapters from different vendors.
  3. Changing the network port. I tried a handful of different ports in the range 49152-65535, in case one of them was already being used for a different purpose.
  4. Unplugging the Ethernet cable and plugging it back in. I tried this after the connection had been lost, hoping it would re-establish the connection.
  5. Rebooting the target system. Same reason as #4.
  6. Changing PCIe slots. I was running out of options.
  7. Moving the host system to a different network switch. No change.
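
Regarding attempt #3, one way to rule out a port conflict on the host without guessing is to try binding the UDP port before starting WinDbg. The following is a minimal sketch using only Python's standard `socket` module; the helper names and the candidate range 50000-50039 are assumptions for illustration, and a successful bind only shows that nothing else on the host holds the port at that moment.

```python
import socket

# Port range quoted in the question for network kernel debugging.
DYNAMIC_PORTS = range(49152, 65536)


def udp_port_is_free(port, bind_ip="0.0.0.0"):
    """Return True if a UDP socket can be bound to (bind_ip, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.bind((bind_ip, port))
        except OSError:
            # Another process already holds the port (or binding is not permitted).
            return False
        return True


def first_free_udp_port(candidates):
    """Return the first candidate inside DYNAMIC_PORTS that is free, else None."""
    for port in candidates:
        if port in DYNAMIC_PORTS and udp_port_is_free(port):
            return port
    return None


if __name__ == "__main__":
    # Hypothetical candidate range; run this on the host with WinDbg closed.
    print(first_free_udp_port(range(50000, 50040)))
```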

Observation:

  1. Wireshark shows that the target system sends UDP packets to the host system as soon as it boots, but the host system does not respond until WinDbg is launched. More interestingly, the target system continues sending UDP packets to the host even after the target has become unresponsive. Unfortunately, I don't understand the data in the UDP packets.
  2. WinDbg can consistently re-establish the connection with the target system if WinDbg is restarted. The target system seems to be stuck in debug break.
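
The first observation can also be reproduced without Wireshark. Below is a minimal sketch, assuming WinDbg is closed so the debug port is free on the host: it binds the port, waits for one datagram, and reports who sent it and how large it was. It makes no attempt to decode the payload, and the function name and the port in the usage line are illustrative.

```python
import socket


def wait_for_datagram(port, timeout_s=10.0, bind_ip="0.0.0.0"):
    """Wait for one UDP datagram on `port`.

    Returns (sender_ip, payload_length), or None if nothing arrives
    within `timeout_s` seconds.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((bind_ip, port))
        sock.settimeout(timeout_s)
        try:
            payload, (sender_ip, _sender_port) = sock.recvfrom(65535)
        except socket.timeout:
            return None
        return sender_ip, len(payload)


if __name__ == "__main__":
    # Hypothetical port; use the one configured on the target.
    print(wait_for_datagram(50000, timeout_s=30.0))
```

If this returns the target's IP address, packets are reaching the host at the socket level, which points away from the switch and cabling and toward the host NIC, driver, or debugger side.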

System Info: The host system is running Windows 8.1 Pro. The target system is running a Windows 8.1 Enterprise Evaluation (8GB of RAM).

WinDbg print out:

Microsoft (R) Windows Debugger Version 6.3.9600.17237 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.

Using NET for debugging
Opened WinSock 2.0
Waiting to reconnect...
Connected to target **.**.*.*** on port ***** on local IP **.**.*.***
Connected to Windows 8 9600 x64 target at (Fri Mar 27 18:58:06.217 2015 (UTC - 7:00)), ptr64 TRUE
Kernel Debugger connection established.

************* Symbol Path validation summary **************
Response                         Time (ms)     Location
Deferred                                       srv*C:\Symbols*http://msdl.microsoft.com/download/symbols
Symbol search path is: srv*C:\Symbols*http://msdl.microsoft.com/download/symbols
Executable search path is: 
Windows 8 Kernel Version 9600 MP (4 procs) Free x64
Product: WinNt, suite: TerminalServer SingleUserTS
Built by: 9600.17031.amd64fre.winblue_gdr.140221-1952
Machine Name:
Kernel base = 0xfffff801`00e70000 PsLoadedModuleList = 0xfffff801`0113a2d0
Debug session time: Fri Mar 27 18:58:06.918 2015 (UTC - 7:00)
System Uptime: 0 days 0:47:15.869
Break instruction exception - code 80000003 (first chance)
*******************************************************************************
*                                                                             *
*   You are seeing this message because you pressed either                    *
*       CTRL+C (if you run console kernel debugger) or,                       *
*       CTRL+BREAK (if you run GUI kernel debugger),                          *
*   on your debugger machine's keyboard.                                      *
*                                                                             *
*                   THIS IS NOT A BUG OR A SYSTEM CRASH                       *
*                                                                             *
* If you did not intend to break into the debugger, press the "g" key, then   *
* press the "Enter" key now.  This message might immediately reappear.  If it *
* does, press "g" and "Enter" again.                                          *
*                                                                             *
*******************************************************************************
nt!DbgBreakPointWithStatus:
fffff801`00fcab90 cc              int     3
0: kd> g
... Retry sending the same data packet for 64 times.
The transport connection between host kernel debugger and target Windows seems lost.
please try resync with target, recycle the host debugger, or reboot the target Windows.
... Retry sending the same data packet for 128 times.
... Retry sending the same data packet for 192 times.

At this point WinDbg is no longer responsive and continues resending data packets. The target system is also unresponsive.
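
When comparing several hosts or adapters, it can help to detect this failure signature automatically from a saved WinDbg log. This is a small sketch that relies only on the two message texts visible in the output above; the function name and the shape of the returned dictionary are made up for illustration.

```python
import re

# Message texts copied from the WinDbg output shown in the question.
RETRY_RE = re.compile(r"Retry sending the same data packet for (\d+) times")
LOST_MARKER = (
    "transport connection between host kernel debugger "
    "and target Windows seems lost"
)


def summarize_transport_loss(log_text):
    """Report whether the log shows a lost transport and the highest retry count."""
    retries = [int(m.group(1)) for m in RETRY_RE.finditer(log_text)]
    return {
        "lost": LOST_MARKER in log_text,
        "max_retries": max(retries, default=0),
    }


if __name__ == "__main__":
    sample = (
        "0: kd> g\n"
        "... Retry sending the same data packet for 64 times.\n"
        "The transport connection between host kernel debugger "
        "and target Windows seems lost.\n"
        "... Retry sending the same data packet for 128 times.\n"
    )
    print(summarize_transport_loss(sample))
```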

Brok answered 28/3, 2015 at 3:18 Comment(3)
Don't post solutions in the question. You can post an answer to your question if you like. - Vincents
Is that the protocol? - Brok
Yes, it is. stackoverflow.com/help/self-answer - Vincents
B
5

I finally solved this problem by switching the host system. In the beginning, I thought the target system was the problem, because MSDN only lists NIC requirements for the target system. It appears that there may be requirements on the host system as well.

New host system: Desktop (Identical to target system)

  • OS: Windows 8.1 Enterprise Evaluation x64
  • NIC: VEN_10EC&DEV_8168

Previous host system: Laptop

  • OS: Windows 8.1 Pro x64
  • NIC: VEN_8086&DEV_1502

NOTE: I don't really know the root cause. Both NICs are on the Supported Ethernet NICs list, I used the same WinDbg version that came with the WDK, and all systems are on the same switch.

Brok answered 1/4, 2015 at 0:52 Comment(1)
Exactly the same experience for me. My target is Windows 10 x64 with NIC: VEN_8086&DEV_1502. When I used a Dell laptop with VEN_10EC&DEV_8168 as the host, the connection would fail as described in the OP. When I used a different PC with VEN_8086&DEV_100F as the host, the connection worked fine. No changes to the target were necessary (except changing the hostip setting, of course). - Feinberg
I
6

I found a simpler solution that worked for me with VMware. The problem is in VMware itself: the NAT connection has a UDP timeout, which for me defaulted to 30 seconds. This value is configurable. Go to Edit -> Virtual Network Editor -> Change Settings (UAC prompt) -> select NAT in the list -> NAT Settings -> UDP timeout. The maximum value is 32767. Raising it completely solved my problem.

Incomplete answered 14/3, 2019 at 13:5 Comment(0)
D
2

I had a similar problem and solved it by using a USB-to-Ethernet adapter on the host machine instead of the built-in NIC.

Disbursement answered 9/2, 2016 at 12:36 Comment(1)
Can you please share the specific vendor and product? - Brok
T
0

I also ran into this issue, and noticed that when I tried to force a shutdown of the VMware guest OS, the WinDbg connection seemed to recover before the guest was actually closed. After several tries, I found a weird workaround:

When the WinDbg connection between the host and the VMware guest is lost, click "shutdown VMWare Guest", but do NOT confirm. You may find that the WinDbg connection recovers. Then cancel the shutdown.

It's very strange; it seems that VMware itself was blocking the network debugging connection. At least it's a workaround worth trying.

Another workaround, which sometimes works, is to kill WinDbg in Task Manager, run it again, and reconnect to the VMware guest. It may take anywhere from seconds to minutes to reconnect.

By the way, my Ethernet card is an Intel Ethernet Connection I218-V.

Tavish answered 15/7, 2016 at 7:21 Comment(0)
I
0

The problem is with the host. If you don't want to change your host and want to keep debugging on it, you might want to try using a serial port instead. It gives better performance. Take a look at the following link for setting up debugging of a virtual machine over a COM port:

https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/attaching-to-a-virtual-machine--kernel-mode-?redirectedfrom=MSDN

Iterate answered 4/5, 2020 at 17:50 Comment(0)
