Feature: per-host or global download delay
SoupSession already has max-conns and max-conns-per-host properties, which is great.
I'd like it to have something like Scrapy's DOWNLOAD_DELAY setting as well. That setting delays the start of each new request until the specified amount of time has passed since the previous request started. It is documented at https://doc.scrapy.org/en/latest/topics/settings.html#std:setting-DOWNLOAD_DELAY and implemented in Downloader._process_queue in https://github.com/scrapy/scrapy/blob/master/scrapy/core/downloader/__init__.py.
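For illustration, the core of that behaviour for a single delay "slot" (the whole session, or one host) can be sketched in C as below. DelaySlot, delay_slot_wait_us() and delay_slot_mark_started() are hypothetical names, not existing Soup or Scrapy API. Times are assumed to be monotonic microseconds, such as the values GLib's g_get_monotonic_time() returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping for one delay slot (session-wide or per host). */
typedef struct {
    bool    has_started;   /* false until the first request in this slot starts */
    int64_t last_start_us; /* monotonic time the last request started, in microseconds */
} DelaySlot;

/* Returns how many microseconds to wait before the next request in this
 * slot may start; 0 means it may start immediately. */
int64_t
delay_slot_wait_us (const DelaySlot *slot, int64_t delay_us, int64_t now_us)
{
    assert (delay_us >= 0);
    if (!slot->has_started || delay_us == 0)
        return 0;
    /* Remaining time until delay_us has elapsed since the last start. */
    int64_t penalty = slot->last_start_us + delay_us - now_us;
    return penalty > 0 ? penalty : 0;
}

/* Records that a request in this slot has just started. */
void
delay_slot_mark_started (DelaySlot *slot, int64_t now_us)
{
    slot->has_started = true;
    slot->last_start_us = now_us;
}
```

A caller would check delay_slot_wait_us() before starting a queued message. If the result is non-zero, it would defer the message, for example with a GLib timeout. It would call delay_slot_mark_started() only when the request actually starts.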
In Scrapy, this setting applies globally unless Scrapy's equivalent of max-conns-per-host is set, in which case it applies per host instead. Since Soup always applies some per-host connection limit, I think the most useful option would be a new download-delay-per-host property, which limits the client's impact on any one server.
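A per-host delay needs the same bookkeeping, keyed by host name. The sketch below uses a tiny fixed-size table with a linear lookup only to stay self-contained. HostDelayTable, host_wait_us() and host_mark_started() are hypothetical. Inside Soup, this state would more plausibly live on whatever per-host structure soup-session.c already keeps for max-conns-per-host, though that is an assumption about its internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_HOSTS 64 /* hypothetical fixed capacity, just for this sketch */

typedef struct {
    char    host[256];
    int64_t last_start_us; /* monotonic start time of the last request, in microseconds */
} HostEntry;

typedef struct {
    HostEntry entries[MAX_HOSTS];
    int       n_entries;
    int64_t   delay_us; /* stands in for the proposed download-delay-per-host */
} HostDelayTable;

/* Returns the entry for host, or NULL if no request to it has started yet. */
static HostEntry *
find_host (HostDelayTable *table, const char *host)
{
    assert (host != NULL);
    for (int i = 0; i < table->n_entries; i++)
        if (strcmp (table->entries[i].host, host) == 0)
            return &table->entries[i];
    return NULL;
}

/* Microseconds to wait before a request to host may start; 0 means now. */
int64_t
host_wait_us (HostDelayTable *table, const char *host, int64_t now_us)
{
    HostEntry *entry = find_host (table, host);
    if (entry == NULL)
        return 0;
    int64_t penalty = entry->last_start_us + table->delay_us - now_us;
    return penalty > 0 ? penalty : 0;
}

/* Records that a request to host started at now_us.
 * Returns false if the table is full or the host name is too long. */
bool
host_mark_started (HostDelayTable *table, const char *host, int64_t now_us)
{
    HostEntry *entry = find_host (table, host);
    if (entry == NULL) {
        if (table->n_entries == MAX_HOSTS ||
            strlen (host) >= sizeof table->entries[0].host)
            return false;
        entry = &table->entries[table->n_entries++];
        strcpy (entry->host, host);
    }
    entry->last_start_us = now_us;
    return true;
}
```

Requests to different hosts do not delay each other here. Only consecutive starts to the same host are spaced out.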
I could also imagine some value in a session-global download-delay property, which would limit the client's impact on its own network connection. If both are set, I think both deferred start times should be computed for each request, and the request should start at whichever time is later.
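The "whichever is later" rule is a maximum over the earliest start time each slot allows. Below is a minimal C sketch of that rule. It assumes hypothetical names and monotonic microsecond timestamps, with one DelaySlot for the session and one for the request's host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-slot state: one for the session, one per host. */
typedef struct {
    bool    has_started;
    int64_t last_start_us;
} DelaySlot;

/* Earliest time this slot allows a new request to start (never before now_us). */
static int64_t
slot_earliest_start_us (const DelaySlot *slot, int64_t delay_us, int64_t now_us)
{
    assert (delay_us >= 0);
    if (!slot->has_started)
        return now_us;
    int64_t t = slot->last_start_us + delay_us;
    return t > now_us ? t : now_us;
}

/* Both limits must hold, so the request starts at the later of the two times. */
int64_t
deferred_start_us (const DelaySlot *global, int64_t global_delay_us,
                   const DelaySlot *host, int64_t host_delay_us,
                   int64_t now_us)
{
    int64_t g = slot_earliest_start_us (global, global_delay_us, now_us);
    int64_t h = slot_earliest_start_us (host, host_delay_us, now_us);
    return g > h ? g : h;
}
```

When the request actually starts, both the global slot and the host's slot would record that start time, so the next request is measured against both again.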
I'd particularly like to see this for applications like RSS feed readers, which can make huge numbers of requests in the background, where the delay isn't visible to the user.
I can roughly see how this might fit into soup-session.c, but I'd like to know whether it would be likely to be merged, and whether anyone has implementation advice.