
Wip/kill x on login too plz

Ray Strode requested to merge wip/kill-x-on-login-too-plz into master
<didrocks> jadahl: hey! Some news on the gdm issue I was mentioning yesterday. So, it seems that mutter is making the GNOME Shell gdm process restart in a loop after logging in (once the X server is paused): gdb stacktrace attached on https://gitlab.gnome.org/GNOME/gdm/issues/429#note_344826
<gitlab-bot> GNOME issue 429 in gdm "Regression in 3.30.1 (from 3.30.0): gnome-shell process in gdm is spiking at 100% once logged in" [Opened]
<didrocks> would you have any idea what could cause this?
<jadahl> didrocks: did you ask halfline?
<jadahl> i don't know much about gdm
<didrocks> jadahl: yes, basically that's what led to GNOME Shell restarting in a loop
<jadahl> what is?
<didrocks> and for $reason, now that I fall back to Xorg instead of wayland (nvidia blob driver), once logged in, GNOME Shell restarts continuously, hitting my CPU hard
<jadahl> the udev rule?
<jadahl> right, but I don't think I know more about gdm and how it starts things more than you I'm afraid
<jadahl> the same issue could be triggered with a WaylandEnable=false entry in custom.conf ?
<didrocks> ok, as this was mutter calling meta_restart on the COGL error, I thought that was more on the Shell side than gdm
<didrocks> yes
<jadahl> afaik the udev rule is just a dynamic way of doing that
<jadahl> oh, it does?
<didrocks> basically it's "once logged in, if GNOME Shell (gdm) is running on Xorg"
<didrocks> yes, see my last comment on the bug report (the first 5 lines)
<didrocks> basically cogl_get_graphics_reset_status() doesn't return COGL_GRAPHICS_RESET_STATUS_NO_ERROR nor COGL_GRAPHICS_RESET_STATUS_PURGED_CONTEXT_RESET, goes to the "default" case -> meta_restart
<didrocks> and so on, and so on…
<didrocks> (only happens once I switched to my user session though)
<jadahl> aha
<jadahl> so its the context-lost handling going crazy then
<didrocks> looks like so…
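The dispatch didrocks describes can be sketched roughly as below. This is a minimal illustration, not mutter's code: `ResetStatus`, `ResetAction` and `action_for_reset_status()` are hypothetical stand-ins for the value returned by `cogl_get_graphics_reset_status()` and for the reaction to it. The only behaviour taken from the log is that anything other than the two named statuses falls into the default case, which is where `meta_restart` gets called.

```c
/* Hypothetical stand-ins for the Cogl reset-status values named in the
 * log. The real enum lives in Cogl and is not reproduced here. */
typedef enum {
  RESET_STATUS_NO_ERROR,
  RESET_STATUS_PURGED_CONTEXT_RESET,
  RESET_STATUS_OTHER /* assumption: any other status the driver reports */
} ResetStatus;

/* Hypothetical summary of what the compositor does in response. */
typedef enum {
  ACTION_NONE,    /* nothing was lost, carry on */
  ACTION_RECOVER, /* handled in place, without restarting the process */
  ACTION_RESTART  /* the default case: restart the whole process */
} ResetAction;

ResetAction
action_for_reset_status (ResetStatus status)
{
  switch (status)
    {
    case RESET_STATUS_NO_ERROR:
      return ACTION_NONE;

    case RESET_STATUS_PURGED_CONTEXT_RESET:
      return ACTION_RECOVER;

    default:
      /* If the driver keeps reporting such a status after every restart,
       * the process restarts over and over, which matches the loop
       * described above. */
      return ACTION_RESTART;
    }
}
```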
<didrocks> jadahl: FYI, I have a gdb shell opened right now, tell me if you have time or if I can help for anything
<jadahl> didrocks: I suppose what we need is a way to tell if it's a context lost because of gdm switch away, or if it's a context lost that should be handled
<jadahl> can't say I know at hand how to make that distinction
<didrocks> makes sense (surprised though that it doesn't seem to hit everyone fallbacking to Xorg for gdm). Is it something that should be discussed with halfline?
<halfline1> check if its foreground vt i guess...
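One way the foreground-VT check could look on Linux is sketched below, assuming the process already knows which VT it runs on. The kernel exposes the active VT in `/sys/class/tty/tty0/active` (for example `tty2`). The helper names and the `our_vt` parameter are hypothetical, and nothing in the log says this is how mutter or gdm would do it.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse the contents of /sys/class/tty/tty0/active, e.g. "tty2\n".
 * Returns the VT number, or -1 if the text does not have that form. */
int
parse_active_vt (const char *contents)
{
  const char *prefix = "tty";
  const char *digits;
  char *end = NULL;
  long vt;

  if (strncmp (contents, prefix, strlen (prefix)) != 0)
    return -1;

  digits = contents + strlen (prefix);
  vt = strtol (digits, &end, 10);
  if (end == digits || vt <= 0)
    return -1;
  if (*end != '\0' && *end != '\n')
    return -1;

  return (int) vt;
}

/* Read the active VT from sysfs. Returns -1 if it cannot be determined. */
int
read_active_vt (void)
{
  char buf[32];
  FILE *file = fopen ("/sys/class/tty/tty0/active", "r");

  if (file == NULL)
    return -1;
  if (fgets (buf, sizeof buf, file) == NULL)
    {
      fclose (file);
      return -1;
    }
  fclose (file);

  return parse_active_vt (buf);
}

/* A context loss is treated as expected only when another VT is known
 * to be in the foreground. An unknown active VT is treated as "not
 * expected", so real errors are still handled. */
int
context_loss_is_expected (int our_vt, int active_vt)
{
  return active_vt > 0 && active_vt != our_vt;
}
```

A caller would pass `read_active_vt ()` as `active_vt` and fall back to the existing error handling whenever `context_loss_is_expected ()` returns 0.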
<jadahl> didrocks: only nvidia has the context lost thing
<didrocks> jadahl: Laney has an nvidia card as well, he checked that he falls back to Xorg as well but doesn't have that issue apparently, which is puzzling…
<halfline1> i wouldnt be surprised if theres a driver bug involved
<didrocks> yeah, I think that I'm on the "unlucky card triggering this nvidia driver bug" case :p
<csoriano> so... do I understand correctly that no gjs code can be executed in different threads?
<csoriano> so whenever I do a glib async call, the operation within is actually running in the main loop?
<halfline> csoriano: nope
<halfline> all javascript code is run on the main thread
<csoriano> so say I want to perform some threaded expensive code, I would need to make it in C and expose it to JS?
<halfline> the c gtask itself or whatever is probably running on a thread though
<halfline> yea
<csoriano> ayy
<csoriano> thanks
<halfline> unless you can break it into chunks and process it a little at a time
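The "a little at a time" idea can be sketched in plain C as below. The job keeps its own progress, and each call processes at most one chunk before returning control to the caller. `ChunkedSum` and its functions are hypothetical names used only for illustration. In a GLib program the step function would typically be driven from an idle callback that returns `G_SOURCE_CONTINUE` while work remains; the same pattern applies to a gjs idle handler, since the JavaScript runs on the main thread either way.

```c
#include <stddef.h>

/* State of a long-running job that is processed in small slices. */
typedef struct {
  const int *items;
  size_t     count;
  size_t     next;   /* index of the first unprocessed item */
  long       total;  /* running result */
} ChunkedSum;

void
chunked_sum_init (ChunkedSum *job, const int *items, size_t count)
{
  job->items = items;
  job->count = count;
  job->next = 0;
  job->total = 0;
}

/* Process at most chunk_size items, then return so the main loop can
 * handle other events. Returns 1 while work remains, 0 when finished. */
int
chunked_sum_step (ChunkedSum *job, size_t chunk_size)
{
  size_t remaining = job->count - job->next;
  size_t end;

  if (chunk_size > remaining)
    chunk_size = remaining;
  end = job->next + chunk_size;

  for (; job->next < end; job->next++)
    job->total += job->items[job->next];

  return job->next < job->count;
}
```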
<halfline> didrocks: hmm
<halfline>       /* The ARB_robustness spec says that, on error, the application
<halfline>          should destroy the old context and create a new one. Since we
<halfline>          don't have the necessary plumbing to do this we'll simply
<halfline>          restart the process. Obviously we can't do this when we are
<halfline>          a wayland compositor but in that case we shouldn't get here
<halfline>          since we don't enable robustness in that case. */
<halfline> so ... i guess just ignoring the error isn't really an option
<halfline> well maybe it's an option but it's an option that might come back and bite us :-)
<halfline> so taking a step back
<halfline> ideally gnome-shell wouldn't be running at all
<halfline> it's for the login screen
<halfline> we already kill the login screen session for wayland
<halfline> we can't completely kill the login screen for X because we can't kill X in the background
<halfline> but i suppose we could kill gnome-shell
<halfline> and just bring gnome-shell back when the user switches back
<halfline> then we get most of the benefit we have on wayland
<halfline> and we avoid driver bugs like this
<heftig> btw, why can't you kill the greeter just before the user's x server launches?
<halfline> flicker
<halfline> i mean since wayland is the default now we could just live with flicker i guess
<halfline> for the fallback case
<heftig> yeah, my thoughts right now too
<heftig> I already get flickering on my nvidia box on login already with the current state
<halfline> you do ?
<heftig> yeah.
* halfline boggles
<halfline> like it shows a VT in between ?
<heftig> it goes black, a second later there's an orange flash, another second of black, then shell starts
<halfline> orange?!
<heftig> yeah, orange.
<halfline> wtf
<halfline> where is orange coming from
<heftig> i blame the driver
<halfline> hmm
<heftig> it's efifb for vt + nvidia for x (no modeset)
<halfline> if you boot with fbcon=map:9  to the kernel command line (so you don't have VTs mapped) does it still do the orange ?
<halfline> another idea
<halfline> we might be able to coax X to be able to die in the background if we start it with -novtswitch.  might not work for some older drivers, but i think there's a good chance it would work for modesetting based drivers and maybe nvidia
<heftig> halfline: i'll try that when i'm back home
<didrocks> the second options sounds more risky, no?
<halfline> maybe, dunno until we try it :-)
<didrocks> worth a try anyway, indeed :)
<halfline> i did have a look through most of the drivers recently for a rhel bug and most of them check vtSema on shutdown and avoid code paths if vt switched away
<didrocks> interesting
<halfline> the one glaring exception was qxl
<halfline> (but it had the opposite problem, was missing said code completely)
<halfline> didrocks: i just sketched out the change, will push to a branch for now
<halfline> not going to try it at this moment, i have a rhel install i gotta start and need to go through some rhel bugs
Edited by Ray Strode
