Dirdiff mmap - improve dirdiff performance using mmap
Changes:
- Use mmap for files larger than CHUNK_SIZE (currently set to 4096 bytes);
- When "ignore blank lines" is enabled, retain the content so it can be rematched after blank lines are removed;
- remove_blank_lines now also drops lines that contain only spaces;
- all_same accepts an iterator and runs faster;
- Use generators for large data and lists for small data;
- Use a regex to normalize line endings;
- Apply regex.sub instead of apply_text_filters;
- Add tests for _files_same, all_same and remove_blank_lines.
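
A minimal sketch of the ideas listed above, for illustration only. It is not the code from this branch: the function bodies, the `_open_buffer` helper, and the early length check are assumptions, and text filters are left out entirely. Files larger than CHUNK_SIZE are memory-mapped, smaller ones are read in full, and both are compared chunk by chunk so that a difference stops the comparison early.

```python
import mmap
import os
from contextlib import ExitStack

CHUNK_SIZE = 4096  # bytes; the threshold named in the description


def _open_buffer(stack, path, chunk_size):
    """Hypothetical helper: mmap large files, read small ones into bytes."""
    handle = stack.enter_context(open(path, "rb"))
    if os.fstat(handle.fileno()).st_size > chunk_size:
        # mmap cannot map an empty file, so only large files take this path.
        return stack.enter_context(
            mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ)
        )
    return handle.read()


def files_same(path_a, path_b, chunk_size=CHUNK_SIZE):
    """Return True if both files hold identical bytes (no text filters)."""
    with ExitStack() as stack:
        buf_a = _open_buffer(stack, path_a, chunk_size)
        buf_b = _open_buffer(stack, path_b, chunk_size)
        # Assumption: without filters, different lengths mean different files.
        if len(buf_a) != len(buf_b):
            return False
        # all() over a generator stops at the first chunk that differs.
        return all(
            buf_a[i:i + chunk_size] == buf_b[i:i + chunk_size]
            for i in range(0, len(buf_a), chunk_size)
        )


def all_same(items):
    """Return True if every item of an iterable equals the first one."""
    iterator = iter(items)
    try:
        first = next(iterator)
    except StopIteration:
        return True  # an empty iterable is trivially "all the same"
    return all(item == first for item in iterator)


def remove_blank_lines(text):
    """Drop empty lines and lines that contain only spaces."""
    return "\n".join(line for line in text.splitlines() if line.strip(" "))
```

Because `all_same` only calls `iter()` and `next()`, it can take a generator and never needs the whole sequence in memory, which is the property the change list relies on for large inputs.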
Tests executed:
- empty files (fastest equal, won't read files)
- 1 B vs 1 B file (fast equal, read both until end)
- 4 MB vs 4 MB file (slow equal, read both until end)
- empty vs 1 B file (fast different, first chunk diff)
- 1 B vs 4 MB file (fast different, first chunk diff)
- 4 MB vs 4 MB file (slow different, read both until end)
cProfile Results:
- master branch: 34291 function calls in 0.337 seconds
- this branch: 1115 function calls in 0.069 seconds
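
For anyone who wants to take a comparable measurement, the following is a rough sketch of profiling a single call with cProfile. The `profile_call` helper is hypothetical and is not the harness that produced the numbers above. The first line of the report it returns has the same "N function calls in X seconds" form.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Show the ten entries with the highest cumulative time.
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


if __name__ == "__main__":
    # sorted() stands in for the directory comparison being measured.
    _, report = profile_call(sorted, range(1000, 0, -1))
    print(report)
```

Function-call counts are stable between runs, while the timings depend on the machine and on whether the files are already in the page cache.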
Edited by Hugo Sena Ribeiro