"error in client communication" on samsung-m0
What problem did you encounter
When upgrading phoc from 0.17.1 to 0.20.0 (also 0.21.0) on samsung-m0, Phosh (0.17.0 and 0.21.0) does not start up anymore.
With G_MESSAGES_DEBUG=all
, I get in the log of phoc (and pid 1992 is phosh here):
phoc-wlroots-INFO: 22:09:47.806: [wayland] error in client communication (pid 1992)
The error comes from here: https://gitlab.freedesktop.org/wayland/wayland/-/blob/73468bab7d12fd4d7b62d3b50a56e77acd977607/src/wayland-server.c#L452-455
There's nothing useful in the phosh log, I'll attach larger segments of log below.
So my understanding is:
- phosh sends a message to phoc
- phoc tries to parse it with wlroots, which uses libwayland
- libwayland can't parse the message and runs
destroy_client_with_error
- phosh isn't aware that it sent a wrong message
- phoc closes phosh somewhat gracefully
- phosh accepts this and closes
- therefore there is no coredump (even after enabling them)
We only see this on the SIII, not on the SII or pinephone or any other device.
Some more information is here: https://gitlab.alpinelinux.org/alpine/aports/-/merge_requests/35027
@Newbyte narrowed it down to this patch in phoc which causes the change of behavior: phoc!329 (merged)
Unfortunately this is not suitable for bisect.
Given that the error in client communication comes from a message that phosh is sending, I think this is a bug in phosh.
How to reproduce
- buy a samsung-m0 (got one for 10 €)
- run phoc >= 0.20.0 on it with phosh >= 0.17.0 (e.g. by using postmarketOS edge)
- enable verbose logging, on pmOS: write
export G_MESSAGES_DEBUG=all
into~/.profile
What is the (wrong) result?
black screen, phosh closes
What is the expected behaviour?
phosh shows up
How are you running phosh?
-
Mobile phone (model: samsung-m0 , operating system version: postmarketOS edge)
Releveant logfiles
phosh 0.17.0 starting up properly with phoc 0.17.0:
...
dbus-daemon[1161]: Successfully activated service 'org.a11y.atspi.Registry'
SpiRegistry daemon is running with well-known name - org.a11y.atspi.Registry
(phosh:1149): phosh-idle-manager-DEBUG: 22:07:58.936: Acquired name org.gnome.Mutter.IdleMonitor
(phosh:1149): phosh-system-prompter-DEBUG: 22:07:58.936: acquired name: org.gnome.keyring.SystemPrompter
(phosh:1149): phosh-notify-manager-DEBUG: 22:07:58.936: Acquired name org.freedesktop.Notifications
(phosh:1149): phosh-session-manager-DEBUG: 22:07:59.011: Session is now active
(phosh:1149): phosh-wifiinfo-DEBUG: 22:07:59.037: Updating wifi status
...
phosh 0.17.0 startup fails with phoc 0.20.0 (or 0.21.0, not sure which one I tried here, but it should be the same):
...
dbus-daemon[2004]: Successfully activated service 'org.a11y.atspi.Registry'
SpiRegistry daemon is running with well-known name - org.a11y.atspi.Registry
(phosh:1992): phosh-idle-manager-DEBUG: 22:09:47.787: Acquired name org.gnome.Mutter.IdleMonitor
(phosh:1992): phosh-system-prompter-DEBUG: 22:09:47.793: acquired name: org.gnome.keyring.SystemPrompter
(phosh:1992): phosh-notify-manager-DEBUG: 22:09:47.793: Acquired name org.freedesktop.Notifications
phoc-wlroots-INFO: 22:09:47.806: [wayland] error in client communication (pid 1992)
(phoc:1945): phoc-gtk-shell-DEBUG: 22:09:47.806: Destroying gtk_shell 0xae5dc620 (res 0xae28ca50)
...
So "Session is now active" is replaced with the wayland error.
I also ran phosh through strace and found that it's doing some udev.conf parsing right before this happens. Looking at the code, udev is only used in the torch code of phosh? But I'm not sure if it's really coming from there:
open("/etc/udev/udev.conf", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 16
fcntl64(16, F_SETFD, FD_CLOEXEC) = 0
fcntl64(16, F_SETFD, FD_CLOEXEC) = 0
read(16, "# see udev.conf(5) for details\n\n"..., 1024) = 49
read(16, "", 1024) = 0
close(16) = 0
clock_gettime64(CLOCK_REALTIME, {tv_sec=1661902588, tv_nsec=430514912}) = 0
clock_gettime64(CLOCK_REALTIME, {tv_sec=1661902588, tv_nsec=431186828}) = 0
clock_gettime64(CLOCK_REALTIME, {tv_sec=1661902588, tv_nsec=432224117}) = 0
clock_gettime64(CLOCK_REALTIME, {tv_sec=1661902588, tv_nsec=432718449}) = 0
clock_gettime64(CLOCK_REALTIME, {tv_sec=1661902588, tv_nsec=433204157}) = 0
clock_gettime64(CLOCK_REALTIME, {tv_sec=1661902588, tv_nsec=433656031}) = 0
clock_gettime64(CLOCK_REALTIME, {tv_sec=1661902588, tv_nsec=434104863}) = 0
clock_gettime64(CLOCK_REALTIME, {tv_sec=1661902588, tv_nsec=434605362}) = 0
clock_gettime64(CLOCK_REALTIME, {tv_sec=1661902588, tv_nsec=435104986}) = 0
clock_gettime64(CLOCK_MONOTONIC, {tv_sec=1505, tv_nsec=603698862}) = 0
sendmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="5\0\0\0\10\0\f\0\2\0\0\0\5\0\0\0\0\0\20\09\0\0\0@\220\10\09\0\0\0"..., iov_len=248}], msg_iovlen=1, msg_control=[{cmsg_len=16, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, cmsg_data=[17]}], msg_controllen=16, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 248
close(17) = 0
poll([{fd=3, events=POLLIN}, {fd=5, events=POLLIN}, {fd=12, events=POLLIN}, {fd=15, events=POLLIN}], 4, 0) = 1 ([{fd=5, revents=POLLIN}])
read(5, "\5\0\0\0\0\0\0\0", 16) = 8
futex(0xb64db9b0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0xb64db9b0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0xb64db9b0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0xb64db9b0, FUTEX_WAKE_PRIVATE, 2147483647phoc-wlroots-INFO: 23:36:28.441: [wayland] error in client communication (pid 6901)
(phoc:6846): phoc-gtk-shell-DEBUG: 23:36:28.441: Destroying gtk_shell 0xae5f0840 (res 0xae2a0b90)
(phoc:6846): phoc-phosh-private-DEBUG: 23:36:28.441: Destroying phosh 0xaee8cca0 (res 0xae04f930)
(phoc:6846): phoc-server-DEBUG: 23:36:28.441: Shell state changed: 0
(phoc:6846): phoc-phosh-private-DEBUG: 23:36:28.441: Destroying private_keyboard_event 0xae25ef70 (res 0xae04fd40)
) = 0
I'm planning to analyze this further over the next days.
Also I wonder if this relates to: https://source.puri.sm/Librem5/phosh/-/issues/422