Socket I/O timeout not reported to app, breaking it
If a socket I/O timeout happens, libsoup HTTP/2 implementation paths are never calling the async job finish, consumes all max-connections-per-host with broken connections and then never successfully posts anything to that host within the SoupSession.
The problem goes away when passing env SOUP_FORCE_HTTP1=1
.
Using the following reproducer with libsoup-3.0.3: http2-wait-bug.c
It periodically posts to a given URL and reports when a response is received back. If now the server doesn't respond within timeout (e.g. it just took longer to do something or due to some other issues ends up with a proxy timeout), it never recovers with HTTP/2. This can be trivially reproduced by posting to a server one is in control of and then just adding a sleep longer than 5 seconds into the request handler. If the sleep isn't there at the start of running the reproducer, it starts out fine. Leaving it running and editing the request handler on the server to sleep some longer time than the libsoup configured timeout, e.g. 60 seconds, the libsoup sample breaks down after receiving max-connections-per-host amount of socket timeouts:
(process:312323): libsoup-http2-DEBUG: 23:58:23.122: [C2-S0] [-] [SESSION] IO error: Socket I/O timed out
Posting to server - 5
Posting to server - 6
Posting to server - 7
Posting to server - 8
Posting to server - 9
So it detects the timeout fine, but never calls on_result
in the sample and new requests stop being sent out (after both configured max-connections-per-host are in this state) as it can't find a connection to use. The libsoup debug spam just ceases as it's stuck in trying to get a connection to serve it. I did a lot of debugging on that on top of an older version, but it probably isn't useful to report those findings here, as the core problem appears to be that it doesn't propagate the I/O error up as needed, and I didn't get deep with that line of debugging.
If forcing HTTP1, it reports the error fine:
Failed to send data - 4: Socket I/O timed out
and never stops trying again on_interval
. If the server is modified back to not artificially sleep, it also recovers fully — eventually. However it takes much longer than I'd expect, I can see some 5+ new posts done that eventually give "Socket I/O timed out" after I've already removed the sleep. So there may be some issues in the HTTP/1.1 code paths as well for this.