[gvfs-smb] Error reading: Invalid argument
/copy_metadata 373
A few days ago I stumbled upon Issue #373 (closed) “GVFS-SMB write problems to SMB2 shares” by @sawozny where he described an odd issue when writing to a SMB3 share. That reminded me of this similar bug I had troubleshooted in the past but did not get around to report. This is such report, after confirming it is still happening. The bug could actually lie in gvfs, libsmbclient or even the SMB server, but 373 (closed as no longer reproducible) makes me think they may have the same origin. Nonetheless, it seems preferably to open this as a separate bug.
Test scenario: an up-to-date Ubuntu 18.04.2 machine:
- gvfs 1.36.1-0ubuntu1.3.3
- libsmbclient 2:4.7.6+dfs-ubuntu-0ubuntu2.16
- Connection to remote share using SMB3†
† client min protocol set to SMB3 on /etc/samba/smb.conf, albeit the connection log shows "signed SMB2 message"
Steps to reproduce:
SHARE=/run/user/$UID/gvfs/smb-share:server=... # Point to a mounted, writable gvfs share
SHAREFILE="$SHARE/bug.txt" # A remote file
TEMPFILE=`mktemp` # A local file
SIZE=278529
rm "$SHAREFILE" # Ensure the remote file doesn't exist
dd if=/dev/zero of="$SHAREFILE" bs=$SIZE count=1 status=none # Create file
dd if="$SHAREFILE" of="$TEMPFILE" bs=$SIZE count=1 status=none # Copy locally
The code itself should be self-explanatory: We remove a smb file if it exists, create it and finally read its contents.
The parameter status=none is simply to make dd
less verbose. I initially identified the problem as happening when the last read was using a larger buffer than the remainder of the file (e.g. when the first dd uses $(($SIZE + 1))
), but it is also happening with an exact read.
Expected value:
Second dd reads the 278529 file and saves it locally at $TEMPFILE
What actually happens:
The second dd fails with:
dd: error reading `/run/user/…/bug.txt': Invalid argument
strace shows that read()
is returning EINVAL.
And now all the funny bits:
- 278529 is a magic value. It is always failing for me with a size of 278529, but working with a size of 278528.
- Sizes larger than 278529 also fail. I see all values from 278529 to 282623 (278528 + 4095) fail.
- Sizes of the form
278528 + 4096*n
always work - Sizes from 282624 (278528 + 4096) to 286720 (278528 + 8192) fail almost always, 282624 and 286720 always succeed, and a variable number of values inbetween will work (different ones on every iteration). There seem to be about 1/256th chance that it succeeds in this range
- Small files, with as few as 28 bytes may also fail, but very infrequently
- I see in the log a time where the EINVAL was on writing such small file (i.e. #373 (closed))
- It is time-dependant. If you place a
sleep 1
between the two dd, it will no longer fail. Run the above commands on a script or as a single command execution, not manually. - The rm is also required (if not the first run). It must be a fresh file.
I am attaching a debug log with GVFS_SMB_DEBUG=10 GVFS_DEBUG=1
following the steps at https://wiki.gnome.org/Projects/gvfs/debugging . The separators were added to the log between the execution of the different commands.
The only obvious difference between the two files is that in the later there is an cli_close failed, reporting a nt_Status error of c000020c (this is STATUS_CONNECTION_DISCONNECTED), with a smbc_error 1 233 (0xe9) -> 103
As this is purging the server, next interaction will show a harmless reconnect in the log.
Interestingly enough, a file of size 282624 also shows that same c000020c error code, but does not result in EINVAL being returned to the application.
The following direct interaction via smbclient isn't showing any error:
dd if=/dev/urandom of=bug.txt bs=278529 count=1
printf 'dele bug.txt\nput bug.txt\nget bug.txt\n' | smbclient -k //server/share
The value 278528 seems to be the sum of 131072+131072+16384 chosen as read sizes for smbc_read().
Is gvfs somehow adapting the read sizes of their GVfsJobReads according to some internal benchmarking of the transmission bandwidth? It's a bit strange that it does two 131072 reads, then down to 16384, finally 32768 (which returns the lonely remaining byte), then up to 65536 and later 131072 although there are no more contents at the file. I pondered if after returning 0 the first time, the would get an EOF bit, that was what produced the JobClose failure. But there are two spurious reads as well for 278527, which is a successful transmission.