Draft: Highlight Changes Within Lines Feature
Highlight Changes Within Line Feature
Adds highlighting to modifications between lines, based on myers-diff
Algorithm. I've used the following resources to implement it:
-
This paper which contains the detailed and the proof that the algorithm works and get the smallest difference/change between two lines: " http://www.xmailserver.org/diff2.pdf ".
-
These awesome series of articles which contain a detailed explanation of the algorithm: "https://blog.jcoglan.com/2017/02/12/the-myers-diff-algorithm-part-1/"
-
Also some useful questions that explain an overview of the algorithm: "https://stackoverflow.com/questions/42635889/myers-diff-algorithm-vs-hunt-mcilroy-algorithm"
My implementation is based on backtracking from the smallest edit found. I'm trying to map the removed/added lines numbers so that it's easily accessed, without actually looping through all of them.
First, Each of the removed/added lines is saved in removed/added hash table using the key of this line number in the new/old file. Mapping occurs here by assigning the key to the removed line in the removed hash table to the line to be added in the new file.
Then looping through each of the removed/added lines, the algorithm checks if there is any added/removed line with the same key in the proper hash table
In case if the key exist, then we apply myers-diff
algorithm between them(the removed/added
and the added/removed lines), get the smallest edit script, and
backtrack from it while highlighting the corresponding characters
in each position.
This is still a WIP though, and these are the issues I've found:
-
In case of a line contains a non UTF-8 characters, highlighting gets messed up due to accessing the line by bytes, while Gtk.TextIter is expecting a proper character offset.
-
In case of large number of edits, and small similarities, highlighting gets distributed across the line, and it becomes hard to see (I suggest adding a threshold in this case for the number of edits according to each line's length).
-
Commit loading becomes slow in case of the commit has a very large number of removals/additions between large number of lines.