reduce noise during a crash

Ray Strode requested to merge wip/halfline/silence-x-io-errors into master
<adamw> mcatanzaro: mclasen: halfline: what do you guys think of https://bugzilla.redhat.com/show_bug.cgi?id=1556831 ? the reasoning kinda makes sense to me. is there a considered reason why shell explicitly aborts when it loses touch with wayland? could we change that so we don't get these fairly useless tracebacks?
<adamw> (assuming we'd get an xwayland crash report filed instead, which would likely be more useful)
<halfline> yea i personally think it just adds noise
<halfline> same story on the other side
<mclasen> adamw: if it was easy to run without xwayland we would already do it. not sure it makes much of a difference which way we die
<halfline> the problem is whenever one side crashes both sides crash
<adamw> mclasen: the argument in the bug report is that shell should die in a way which doesn't cause abrt to kick in, basically
<halfline> and it takes effort to figure out which side crashed first
<halfline> we should suppress knock on crashes, since they're just noise not signal
<adamw> right, but this is the specific path where shell knows it lost connection to wayland...it's actually *intentionally written to abort* in that case
<adamw> it calls g_error("lost connection to xwayland") or whatever the message is, that's where we get all these abrt reports for "lost connection to xwayland" from
<adamw> there's a direct link to the line in the bug: https://gitlab.gnome.org/GNOME/mutter/blob/7e17dd00/src/wayland/meta-xwayland.c#L417
<adamw> that's what he's suggesting changing
<mcatanzaro> We had a WebKit bug recently where the web process intentionally aborted if it lost connection to the network process
<mcatanzaro> Which should only happen when the network process crashes
<mcatanzaro> But the network process was not crashing
<mcatanzaro> This bug has caused something like 2000 crashes in the past couple days
<mcatanzaro> We would never have known if we removed the web process abort
<mcatanzaro> The bug reporter was not impressed when I said the crash was intentional, and tried to convince me to change it to an exit() instead, but then we would have zero crash reports for this issue.
<adamw> mcatanzaro: the expectation here is we'd get reports for the *xwayland* crash
<mcatanzaro> adamw: Yes of course that's the expectation... that was the expectation in the WebKit case too, that we'd get reports for the network process crash
<adamw> where i'm coming from here is https://bugzilla.redhat.com/show_bug.cgi?id=1510059#c303
<adamw> that is the bug which *every single crash of this kind in f27* is currently considered a duplicate of by libreport
<halfline> mcatanzaro: i'd rather miss an occasional bug than get flooded with noise
<mcatanzaro> Clearly something needs to change, but it could just as easily be handled by ABRT
<halfline> doing what ?
<mcatanzaro> I guess making any changes to ABRT is probably too much to expect, though
<halfline> what change would you propose to make to abrt ?
<mcatanzaro> halfline: ABRT has logic somewhere to ignore expected crashes like this
<halfline> why would that be better?
<adamw> i have filed a satyr issue on this too
<halfline> if it's ignoring them
<halfline> versus them not happening ?
<mcatanzaro> I assume it could still count them, but not open a bunch of bugzilla bugs.
<adamw> but yeah, i agree with halfline, it doesn't seem obviously better to abort and then make libreport ignore the abort, versus just exiting
<mcatanzaro> Then if the count goes way up, we can say: hmmm, problem.
<halfline> mcatanzaro: what would the count tell you?
<halfline> yea but what problem?
<halfline> more likely the problem is Xwayland is crashing
<halfline> or something
<halfline> the count doesn't really help you
<halfline> since the Xwayland crash will get shown separately
<halfline> unless you're saying you look at the number of xwayland crashes and the count and see if there's a big discrepancy ?
<halfline> we had a similar issue with gtk a while back btw
<adamw> yeah, the only problem i can see is if we for some reason *don't* get the xwayland crashes reported
<mcatanzaro> If Xwayland ever dies without leaving a core dump, or ABRT refuses to report the crash for whatever reason ("this backtrace is unusable" being a common culprit), then the XWayland crash won't be reported... anyway, it's fine either way, I'm just observing that we would have had a ton of trouble with this recent WebKit issue had we disabled the client process crash
<halfline> every time the display server went down every application would spam the log with a message saying as much
* adamw goes to look at xwayland crash reports, for that matter.
<halfline> totally not useful to see 50 apps all say "session is over" at the same time
<adamw> oh, yeah, we still get that with gnome :P
<adamw> but that's "just" logspam, at least it doesn't affect bugzilla.
<mcatanzaro> Ah good point, I forgot this happened once for every single application....
