Cannot switch default GPU for Mutter in a dual NVIDIA GPU configuration
Affected version
GNOME 44 (Wayland on NVIDIA's proprietary drivers, no extensions) on Fedora 38 Workstation
Bug summary
GNOME fails to initialize rendering of any desktop application when two NVIDIA GPUs are running on the same computer.
Steps to reproduce
- Have a system with two NVIDIA GPUs installed.
- Install Fedora 38 Workstation.
- Install the proprietary NVIDIA drivers from Fedora's software repository.
- Reboot.
What happened
The desktop itself renders fine, but any program that renders through GNOME's Mutter fails to open, with the following errors in the journal:
gnome-shell[2329]: meta_window_set_stack_position_no_sync: assertion 'window->stack_position >= 0' failed
gnome-shell[2329]: WL: error in client communication
What did you expect to happen
The programs should open with no errors and render on the screens.
Relevant logs, screenshots, screencasts etc.
Specs:
AMD Ryzen 9 5900X
MSI MPG X570 Gaming Edge WiFi
NVIDIA GeForce GTX 1660 Super, with three screens attached to it
NVIDIA GeForce GTX 750 Ti
More info can be found on the existing issue at NVIDIA's egl-wayland repo on GitHub: https://github.com/NVIDIA/egl-wayland/issues/82
TL;DR of the GitHub issue, together with some guesses made with the NVIDIA developer
The system has two NVIDIA GPUs, a 1660 Super and a 750 Ti. The 1660 Super is the main GPU that controls all the displays, and is connected to the PCIe slot of the CPU. The 750 Ti is a secondary GPU, only used for NVENC/CUDA workloads with no display outputs attached to it, and is connected to the PCIe slot of the chipset.
Through some troubleshooting, we worked out that, because of the order in which the computer initializes the devices, the 750 Ti is initialized first, so it gets priority as the main renderer when the desktop environment starts up.
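For anyone trying to check the enumeration order on their own machine, here is a minimal sketch that lists the DRM card nodes from sysfs together with their PCI address, vendor ID and boot_vga flag. It assumes the usual /sys/class/drm layout on Linux; the function name is made up for illustration, and the cardN numbering is only a hint about the order in which the kernel registered the devices, not proof of which GPU Mutter picks.

```python
import os
import re


def _read(directory, name):
    """Return the stripped contents of directory/name, or None if unreadable."""
    try:
        with open(os.path.join(directory, name)) as handle:
            return handle.read().strip()
    except OSError:
        return None


def list_drm_cards(sysfs_root="/sys/class/drm"):
    """Return (card, pci_address, vendor_id, boot_vga) tuples, ordered by card number."""
    if not os.path.isdir(sysfs_root):
        return []
    # Keep only cardN entries; skip connectors such as card0-DP-1 and render nodes.
    names = [n for n in os.listdir(sysfs_root) if re.fullmatch(r"card\d+", n)]
    cards = []
    for name in sorted(names, key=lambda n: int(n[4:])):
        # The "device" symlink resolves to the PCI device directory.
        device = os.path.realpath(os.path.join(sysfs_root, name, "device"))
        cards.append(
            (name, os.path.basename(device), _read(device, "vendor"), _read(device, "boot_vga"))
        )
    return cards


if __name__ == "__main__":
    for card, pci_address, vendor, boot_vga in list_drm_cards():
        # Vendor 0x10de is NVIDIA; boot_vga is 1 for the GPU the firmware booted on.
        print(f"{card}: pci={pci_address} vendor={vendor} boot_vga={boot_vga}")
```

Matching the PCI addresses against the output of lspci should show which cardN node belongs to the 1660 Super and which to the 750 Ti.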
I was advised to mark the 1660 Super as the primary render device by setting the udev tag mutter-device-preferred-primary through a udev rule. This made no difference, however: the issue persists, and the 750 Ti still holds priority as the OpenGL renderer.
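For reference, this is roughly the shape of the udev rule in question, written as a small helper that formats the rule line for a given DRM node. It is a sketch, not necessarily the exact rule I used: the helper name is hypothetical, and /dev/dri/card1 is only an assumed node for the 1660 Super, so check which node is which on your own system first.

```python
def preferred_primary_rule(dev_node):
    """Format a udev rule line that tags dev_node as Mutter's preferred primary GPU."""
    if not dev_node.startswith("/dev/dri/card"):
        raise ValueError(f"expected a /dev/dri/cardN node, got {dev_node!r}")
    # Doubled braces in the f-string produce the literal ENV{DEVNAME} udev key.
    return f'ENV{{DEVNAME}}=="{dev_node}", TAG+="mutter-device-preferred-primary"'


if __name__ == "__main__":
    # Assumption: card1 is the GPU that should become primary.
    print(preferred_primary_rule("/dev/dri/card1"))
```

The printed line would go into a rules file under /etc/udev/rules.d/ (the file name is up to you), followed by a reboot so the tag is present when gnome-shell starts.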
This seems to be further corroborated by the fact that glxinfo returns the 750 Ti as the default renderer:
OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: NVIDIA GeForce GTX 750 Ti/PCIe/SSE2
OpenGL core profile version string: 4.6.0 NVIDIA 530.41.03
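To make this check repeatable after each attempted fix, the renderer line can be pulled out of the glxinfo output with a few lines of Python. This is a minimal sketch: the function name is made up, and it assumes glxinfo is installed (on Fedora it should come from the glx-utils package) and that the output keeps the "OpenGL renderer string:" label shown above.

```python
import re
import subprocess


def renderer_from_glxinfo(output):
    """Return the value of the 'OpenGL renderer string' line, or None if absent."""
    match = re.search(r"^OpenGL renderer string:\s*(.+)$", output, re.MULTILINE)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    try:
        # -B asks glxinfo for the brief summary, which includes the renderer line.
        result = subprocess.run(["glxinfo", "-B"], capture_output=True, text=True)
    except FileNotFoundError:
        print("glxinfo not found")
    else:
        print(renderer_from_glxinfo(result.stdout))
```

On the affected setup this reports the 750 Ti; if the preferred-primary tag were taking effect for OpenGL clients, I would expect to see the 1660 Super here instead.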
Swapping the cards between the slots did fix the issue, but it is not a good option: the chipset's PCIe slot runs at x4, which bottlenecks the stronger GPU, and, in my opinion, it is just a workaround that avoids the issue rather than fixing it.