What happens if you ignore an X11 BadWindow error?

Asked 1/1, 2014 at 17:34 Answered 2/1, 2014 at 21:38

I have a legacy Motif application written in the early 1990s (I can't rewrite the UI in Qt or even modify the app extensively without going through a time-consuming safety assessment). This app used to run on AIX, where it ran for weeks on end under intensive use and was rock stable. We have now ported it to Linux. During sustained beta testing the app has crashed about once a week with the following message:

Error of failed request: BadWindow (invalid Window parameter)
Major opcode of failed request: 4 (X_DestroyWindow)

I have since learned that these errors can be ignored using a custom X11 error handler (the default X11 error handler just prints the error message and exits), as described here:

http://motifdeveloper.com/tips/tip29.html

I have implemented a custom X11 error handler that ignores BadWindow errors as described in that article. So my question is: can somebody who knows more about X11 development and the inner workings of an X server than I do enlighten me about whether BadWindow errors can really be ignored like that?

P.S. I am going to try and debug this further by running our app in Synchronous mode but that's slow going because I have no way to reproduce this error on demand. Any tips about debugging BadWindow errors would also be appreciated.

Achates answered 1/1, 2014 at 17:34 Comment(2)

It depends on what you mean by "can". Will it disconnect you from the internet and eat your lunch? Probably not. Will the application work as if nothing has happened? Probably not. The error by itself is not fatal, but it is a manifestation of some error in the application logic, and that error could well be severe – Demisemiquaver 1/1, 2014 at 17:38

So far I have seen no ill effects but I'm going to have it exhaustively tested. Plus this happens only on Linux, not AIX and the UI has not been subjected to major changes for years. – Achates 1/1, 2014 at 18:50

If your program consists of a single process (single connection to the X display) then this error will almost always reflect a bug in the program.

The trick is knowing how to debug it. Because Xlib is asynchronous, XDestroyWindow() is fire-and-forget, any later operation on the destroyed window may also be fire-and-forget, and the error is reported at some future time (during some other, unrelated X call). This means a stack trace taken when the X error arrives is meaningless, which makes the problem hard to debug.

To fix this, call XSynchronize(dpy, True) to force all calls to be synchronous. This will make the app slow so don't leave it on in production. http://www.x.org/releases/X11R7.6/doc/man/man3/XSynchronize.3.xhtml

In synchronous mode, an Xlib call that uses a bad window fails immediately. So you can set a debugger breakpoint, for example on your error handler function, and get a meaningful backtrace. That should show you which Xlib call causes the problem, and hopefully it will be clear whether it's a double-delete of a widget, use of a destroyed widget, or something else.

If your app does have multiple processes or multiple display connections, such as in a window manager, then it's possible for a BadWindow to be unavoidable (if you try to mess with another app's window, then there's an unavoidable race where the other app's window might be destroyed). In that case, ignoring BadWindow is the correct solution, though best practice is to ignore it only during those calls which are known to trigger it, so you still can get errors that might be bugs. A common idiom for this is to implement an error_trap_push()/error_trap_pop() which just install and de-install your error handler which ignores errors. Push an error trap when you're touching an external window which could be deleted outside of your control.

Anemometer answered 2/1, 2014 at 21:38 Comment(1)

Thanks a bunch for your answer. I'll try to run the app synchronised, have it tested and see what that reveals. – Achates 3/1, 2014 at 22:15

This looks like a button (or similar UI element) being deleted more than once. Typically, buttons are implemented as dedicated windows with the button graphic drawn in them; that way you can simply tie the callback handler to a click event in the associated window.

The error says your program has tried deleting a non-existent window ID, and the easiest way for that to happen is indeed that it has been deleted twice (alternatively, something has changed the ID recorded for a UI element somewhere).

At this point, you do not want to ignore the error, you want to get sufficient logging in place to figure out where the problem with your application lies.

Yasui answered 1/1, 2014 at 18:2 Comment(3)

What bugs me about this is that these errors did not occur on AIX, only on Linux, so I was thinking it must be an implementation difference in the X servers. – Achates 1/1, 2014 at 18:46

@osxnerd Could be a difference in the underlying libraries, as well. Worst case, you can end up with UI that thinks it has clickable widgets up, but they're actually deleted, making the application unusable for the end-user. So, instrument it with sufficient debug logging to find the error and eliminate it. – Yasui 1/1, 2014 at 19:23

If the error is intermittent it's probably a race condition of some kind. All kinds of random unrelated differences in the platforms could result in triggering it or not because they might affect timing. Could also be a different Motif behavior or something, or as you say a different X server. – Anemometer 3/1, 2014 at 13:41

In this case, the error is telling you that your program requested to destroy a window id that doesn't exist. If you ignore it, then you may have a leak of whatever window you really intended to destroy; or you may simply be trying to destroy the same window id twice, and nothing will change. Without tracking down the root cause of why your program is calling XDestroyWindow with an invalid id, it's difficult to say what happens if you just ignore it.

Lepto answered 2/1, 2014 at 21:24 Comment(0)
