dirdiff: Use mmap in _files_same for a speed improvement
When one file has content and the other doesn't, the code tries to open both with mmap and fails on the empty file. We can't simply report them as different, because regex filters or blank-line handling can make them compare equal. Since an empty file can't be mmapped, I only use mmap for 'big' files. I'm not sure about the right size threshold, though (currently set to CHUNK_SIZE, 4096 bytes).
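To illustrate the size threshold described above, here is a minimal sketch, not the actual `_files_same` code. The names `files_same_sketch` and `open_for_compare` are hypothetical, and the sketch only compares raw bytes; it leaves out the regex and blank-line handling mentioned above.

```python
import mmap
import os

CHUNK_SIZE = 4096  # threshold and read size; value taken from the description


def open_for_compare(handle, size, threshold):
    """Return an mmap for files above the threshold, else the plain handle.

    mmap.mmap() raises ValueError for an empty file, so small (and in
    particular empty) files keep using ordinary buffered reads.
    """
    if size > threshold:
        return mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ)
    return handle


def files_same_sketch(path_a, path_b, threshold=CHUNK_SIZE):
    """Hypothetical helper: byte-for-byte comparison, chunk by chunk."""
    sizes = [os.path.getsize(p) for p in (path_a, path_b)]
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        sources = []
        try:
            for handle, size in zip((fa, fb), sizes):
                sources.append(open_for_compare(handle, size, threshold))
            while True:
                # Both file objects and mmap objects support read(n).
                chunk_a = sources[0].read(CHUNK_SIZE)
                chunk_b = sources[1].read(CHUNK_SIZE)
                if chunk_a != chunk_b:
                    return False
                if not chunk_a:  # both sources exhausted at the same point
                    return True
        finally:
            # Close any mmaps before the underlying files are closed.
            for src in sources:
                if isinstance(src, mmap.mmap):
                    src.close()
```

Because the decision is made per file, an empty file paired with a large one is read through its normal handle while the large one is mmapped, so the comparison never attempts to map a zero-length file.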
Rounds | Max size (KB) | master (s) | mmap (s) |
---|---|---|---|
10000 | 4 | 11.46 | 12.62 |
1000 | 4 | 1.54 | 1.67 |
1000 | 40 | 2.20 | 1.91 |
1000 | 400 | 9.81 | 5.21 |
1000 | 4000 | 91.05 | 47.50 |
Each 'round' runs the following comparisons:
- 1 x empty vs empty file (fastest equal, won't read the files)
- 1 x 1-byte vs 1-byte file (fast equal, reads both until the end)
- 1 x max size vs max size (slow equal, reads both until the end)
- 1 x empty vs 1-byte file (fast different, first chunk differs)
- 1 x max size vs max size (fast different, first chunk differs)
- 1 x max size vs max size (slow different, reads both until the end)
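A round like the one listed above could be reproduced roughly as follows. This is a sketch under assumptions, not the script used for the table: `make_round_pairs`, `time_rounds` and `read_compare` are hypothetical names, the "slow different" case is assumed to differ only in the last byte, and `read_compare` is a simple stand-in for the comparison function being measured.

```python
import os
import tempfile
import time


def make_round_pairs(directory, max_size):
    """Write the six file pairs of one round.

    Returns a list of (label, path_a, path_b, expected_equal) tuples.
    """
    def write(name, data):
        path = os.path.join(directory, name)
        with open(path, "wb") as f:
            f.write(data)
        return path

    big = b"x" * max_size
    first_diff = b"y" + big[1:]   # differs in the first chunk
    last_diff = big[:-1] + b"y"   # assumption: differs only at the very end
    return [
        ("empty vs empty", write("a0", b""), write("b0", b""), True),
        ("1 byte vs 1 byte", write("a1", b"x"), write("b1", b"x"), True),
        ("max vs max, equal", write("a2", big), write("b2", big), True),
        ("empty vs 1 byte", write("a3", b""), write("b3", b"x"), False),
        ("max vs max, first chunk differs",
         write("a4", big), write("b4", first_diff), False),
        ("max vs max, last byte differs",
         write("a5", big), write("b5", last_diff), False),
    ]


def read_compare(path_a, path_b):
    """Stand-in comparison: read both files fully and compare the bytes."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return fa.read() == fb.read()


def time_rounds(compare, pairs, rounds):
    """Run `rounds` rounds over all pairs and return the elapsed seconds."""
    start = time.perf_counter()
    for _ in range(rounds):
        for _label, path_a, path_b, expected in pairs:
            assert compare(path_a, path_b) == expected
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pairs = make_round_pairs(tmp, 4 * 1024)
        print(f"{time_rounds(read_compare, pairs, 10):.3f} s")
```

Passing the comparison function as a parameter lets the same pairs be timed against both the plain-read and the mmap code paths.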
mmap is faster when files are large, and slightly slower when files are around 4 KB.
Part of the old dirdiff-mmap merge request.
Edited by Kai Willadsen