I just saw that gupnp-universal-cp on Ubuntu 18.04.2 also hogs the CPU while the box is on the network. Among others, gupnp-universal-cp links against libgupnp, libgssdp, libsoup, and a whole bunch of GNOME libraries, but it doesn't use dLeyna. In addition, this program also uses 100% CPU time if a D-Link DIR-635 router is present. Blocking the FRITZ!Box and the router made gupnp-universal-cp behave well again.
Designs
Child items
...
Show closed items
Linked items
0
Link issues together to show that they're related.
Learn more.
I have done some more investigation using a small test program (see also the issue on GitHub linked from previous comment).
When calling gupnp_service_proxy_set_subscribed() (file libgupnp/gupnp-service-proxy.c) from my client code with subscribed set to TRUE, function subscribe() gets called, which calls soup_session_queue_message() in turn and sets subscribe_got_response() as callback. On success, that function is supposed to set the subscription ID (proxy->priv->sid).
While waiting for the message completion, emit_notifications() is repeatedly called (callback for an idle source constructed in server_handler()). First thing this function does is checking the proxy->priv->sid pointer, and returning TRUE if it is still NULL.
Now, for the problematic FRITZ!Box 7240, the subscribe_got_response() callback is called only after 1 minute. In the meantime, we have a busy loop which causes 100% CPU load. The message status is SOUP_STATUS_IO_ERROR in this case:
After that, the CPU load from my small test program goes down again.
I have tested with the current gupnp-1.0 branch (all other libraries used as shipped with Ubuntu 18.04.2), but I suspect that the problem persists in master as well since the code looks quite similar.
Maybe it would be possible to have a rate limit on emit_notifications() or some explicit synchronization to avoid the high CPU load while waiting for the subscription ID ?
Yes, exactly that code in Sven's comment is on the busy path which is burning my CPU cycles away. I take it that the SOUP_STATUS_IO_ERROR is equivalent to "subscribe message is canceled".
So here is a scenario. :) Is it possible to improve the situation?
The flow is slightly weird. The emit_notifications() idle function should not be triggered if there wasn't at least one notification from the client which should only happen if subscribe() has returned successful.
Can you re-test with environment variables G_MESSAGES_DEBUG=all and GUPNP_DEBUG=1 and attach the log?
Be aware that there are 3 different FRITZ!Box devices plus several other devices in the log. Subscription was only attempted for the FRITZ!Box 7240. The program ran for about 2 minutes, with the first minute at 100% CPU load.
Here are dumps for the D-Link DIR-635 when trying to subscribe to the "urn:schemas-upnp-org:service:Layer3Forwarding:1" service. CPU usage is at 100% for 40 seconds.
As soon as possible after the subscription is accepted, the publisher also sends the first, or
initial event message to the subscriber. This message includes the names and current values
for all evented variables. (The data type and range for each variable is described in a service
description. Clause 2, “Description” explains this in more detail.) This initial event message is
always sent, even if the control point unsubscribes before it is delivered. The device shallinsure that the control point has received the response to the subscription request beforesending the initial event message, to insure that the control point has received the SID
(subscription ID) and can thereby correlate the event message to the subscription.
I managed to reproduce it by hacking GUPnP to be non-compliant as well. My proposed fix seems to work, but I think that needs more checks regarding possible side effects and a UPnP test run...
Unfortunately, your change doesn't have the intended effect here. I am still at 100% CPU load.
To clarify, I have changed line 2204 reading
if (!proxy->priv->notify_idle_src) {
into
if (!proxy->priv->notify_idle_src || proxy->priv->sid == NULL) {
near the end of server_handler(). I am working on the gupnp-1.0 branch, not master, because I'd have to install more recent dependencies to get master to work. I could do that, however, if it helps, but the code doesn't look much different.
Also, gupnp-universal-cp still works the same way as it did when linking against the libgupnp shipping with Ubuntu, except it doesn't hog the CPU anymore (checked by setting LD_LIBRARY_PATH and comparing the outputs of original vs patched versions). It still shows the same set of ~50 UPnP services, including those of the FRITZ!Box and D-Link routers.
Is there anything I could test for you to confirm that this is the right fix?