Experiment with new behavior for TAB characters
From gnome-terminal#8042 (closed) and several forum posts elsewhere (see also gnome-terminal#3461 (closed)):
TAB characters are often found in text files. Think of source code, think of TSV files, think of `/etc/fstab`, `/etc/hosts`, `/etc/services` etc. `cat`'ing them to the terminal doesn't produce the "expected" result if they cross the linebreak. TABs can go missing, and two words previously separated by a TAB can become concatenated into one word.
This is because TAB is not a printable character; it is a control instruction that moves the cursor, just like escape sequences.
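To make the failure concrete, here is a minimal Python sketch of the current behavior for a single logical line. It is a simplified model, not VTE's code: it assumes fixed tab stops every 8 columns, a "pending wrap" state after the last column is written, and that untouched cells are copied as spaces. The names `render` and `copied` are made up for this example.

```python
def render(text, width, tab=8):
    """Simplified model: TAB only moves the cursor, it never writes or wraps."""
    rows = [[" "] * width]
    col, pending_wrap = 0, False
    for ch in text:
        if ch == "\t":
            # Jump to the next tab stop, but stop at the right edge.
            # If the cursor is already at the right edge this is a no-op.
            col = min((col // tab + 1) * tab, width - 1)
            continue
        if pending_wrap:
            # The previous character filled the last column: wrap now.
            rows.append([" "] * width)
            col, pending_wrap = 0, False
        rows[-1][col] = ch
        if col == width - 1:
            pending_wrap = True
        else:
            col += 1
    return ["".join(r) for r in rows]


def copied(text, width):
    """Text as it would be copied back from the soft-wrapped rows (in this model)."""
    return "".join(render(text, width)).rstrip()


print(repr(copied("abcdefgh\tXYZ", 8)))  # 'abcdefghXYZ': the TAB is gone
print(repr(copied("abcde\tXYZ", 8)))     # 'abcde  XYZ': the TAB got cut short
```

In the first call the word ends exactly at the right edge, so the TAB is a no-op and the two words become one.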
We should experiment with a fix to the problem, a new behavior where printing each and every TAB character always results in exactly one copy-pasteable TAB character (assuming that we're printing a text file on an empty canvas, that is, this TAB character doesn't jump over existing letters).
I think it's unlikely that any app would emit a TAB character near the end of the line, relying on the current behavior of stopping at the right edge (okay there's some chance that some app does this), and especially not on the no-op behavior if the cursor is already at the right edge.
That being said, the new mode could be conditional, to allow easy reverting if any app encounters problems, possibly only for the lifetime of that app. Maybe DECSET 1009 or 2009 (the number 9 referring to TAB).
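For illustration, toggling such a mode would use the usual DECSET/DECRST private mode framing (`CSI ? Pm h` to set, `CSI ? Pm l` to reset). A minimal Python sketch follows; the number 2009 is only the suggestion from above, not an allocated mode, and the helper names are made up.

```python
import sys

CSI = "\x1b["
TAB_MODE = 2009  # hypothetical: merely the number floated in this issue


def decset(mode):
    """Escape sequence that sets a DEC private mode."""
    return f"{CSI}?{mode}h"


def decrst(mode):
    """Escape sequence that resets a DEC private mode."""
    return f"{CSI}?{mode}l"


def run_with_legacy_tabs(app):
    """Switch the (hypothetical) new TAB mode off only while `app` runs."""
    sys.stdout.write(decrst(TAB_MODE))
    sys.stdout.flush()
    try:
        app()
    finally:
        sys.stdout.write(decset(TAB_MODE))
        sys.stdout.flush()
```

A wrapper like `run_with_legacy_tabs` is one way the "only for the lifetime of that app" idea could look from the outside, assuming the new behavior were the default.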
It's not trivial to design the desired layout after a TAB character wraps into the next line, especially if the window width is not a multiple of the TAB width. Possible behaviors include:
- Shorten the tab at the end of the line. Whatever follows the TAB will start from the beginning of the next line.
- Split the tab. Some spaces at the end of the previous line, some at the beginning of the new, to keep the exact number of desired spaces; i.e. resulting in the same visual arrangement as if the file was sent through `expand`.
- Insist on the desired width of each particular TAB character (as computed from the column position). If it does not fit at the end of the line then an empty area is left there and the entire TAB is moved to the beginning of the next line.
These all work differently, all produce different results with a file that has nicely formatted columns (after wrapping those columns may not even align), and all behave differently on a subsequent resize.
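A rough Python sketch of the three options, to compare them side by side. It is only a model under assumptions that are mine: tab stops are a hardwired multiple of 8, the desired TAB width is computed from the logical column, the window is at least 8 cells wide, and TAB cells are shown as spaces. The function name `layout` and the strategy names are made up.

```python
TAB = 8  # assumed: hardwired multiple-of-8 tab stops


def layout(line, width, strategy):
    """Lay out one logical line (printables and TABs) into rows of `width` cells."""
    assert width >= TAB and strategy in ("shorten", "split", "insist")
    rows, row, logical = [], "", 0

    def flush():
        nonlocal row
        rows.append(row)
        row = ""

    for ch in line:
        if ch != "\t":
            if len(row) == width:
                flush()
            row += ch
            logical += 1
            continue
        desired = TAB - logical % TAB  # width the TAB has in the original file
        logical += desired
        if len(row) == width:
            flush()
        room = width - len(row)
        if desired <= room:
            row += " " * desired
        elif strategy == "shorten":
            row += " " * room              # TAB ends at the right edge
        elif strategy == "split":
            row += " " * room              # first part on this row
            flush()
            row += " " * (desired - room)  # remainder on the next row
        else:  # "insist"
            flush()                        # leave the rest of this row empty
            row += " " * desired           # whole TAB on the next row
    rows.append(row)
    return rows


line = "abcdefghij\tXY\tZ"
for strategy in ("shorten", "split", "insist"):
    print(strategy, layout(line, 12, strategy))
# shorten ['abcdefghij  ', 'XY      Z']
# split ['abcdefghij  ', '    XY      ', 'Z']
# insist ['abcdefghij', '      XY', '      Z']
```

With these assumptions the "split" rows are exactly `line.expandtabs(8)` cut into pieces of 12 cells, which is the `expand` equivalence mentioned in the list.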
I'm leaning towards the middle one being the best behavior, but it's the hardest to implement in VTE. `VteRowData` would need to be prepared for possibly beginning with a fragment, getting/freezing/thawing a row all prepared for this, `row_stream` being able to point to a middle of a character of `text_stream`, rewrapping adjusted, text selection adjusted etc.
(A long time ago when I rewrote Midnight Commander's viewer I faced this dilemma, and randomly chose the third option. Not sure if I would choose that today.)
An obvious followup question, closely related to the previous one: After wrapping, should TABs behave according to the logical offset (the column within the original text file) or the visual offset (the visual column it is wrapped to)?
If according to the logical column then the list of tab stops being a finite set (up to column 1000 currently) is no longer a viable approach. Maybe the whole dynamic tab stops CTC/HTS/TBC/TSR business has to be removed (at least if this new mode is in effect) and operate with a hardwired multiple-of-8 approach. Also, how to track the logical column if some cursor moving escape sequences are seen?
If according to the visual column then rewrapping won't be able to recreate the layout as if the file was `cat`'ed at the new window size.
(In `mcview` I picked the logical column approach.)
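The difference between the two can be seen by computing the width each TAB would get. This is a small Python sketch with assumptions of mine: tab stops every 8 columns, and cells laid out contiguously across rows (as in the "split" option), so the visual column is simply the number of cells emitted so far modulo the window width. The name `tab_widths` is made up.

```python
TAB = 8  # assumed: hardwired multiple-of-8 tab stops


def tab_widths(line, width, basis):
    """Width in cells of each TAB, computed from the 'logical' or 'visual' column."""
    assert basis in ("logical", "visual")
    widths, emitted = [], 0
    for ch in line:
        if ch == "\t":
            col = emitted if basis == "logical" else emitted % width
            w = TAB - col % TAB
            widths.append(w)
            emitted += w
        else:
            emitted += 1
    return widths


line = "abcdefghijklmn\tX\tY"
print(tab_widths(line, 12, "logical"))  # [2, 7]
print(tab_widths(line, 20, "logical"))  # [2, 7]
print(tab_widths(line, 12, "visual"))   # [6, 7]
print(tab_widths(line, 20, "visual"))   # [2, 7]
```

The logical widths don't depend on the window width. The visual ones do, so widths recorded at 12 columns cannot simply be reused after a resize to 20 columns.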
If only plain printable characters and newlines are involved, but no TABs, rewrapping on resize guarantees that the layout after a resize is identical to what printing the file at that new size would produce. This is even true if the line is printed slowly, char-by-char, and the window is resized in between.
Do we want to keep this property even when TABs are involved? If yes then this criterion will restrict the possible answers to the earlier questions.
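For the TAB-free case the property can be written down as a tiny check. This is a Python sketch for a single logical line of printable characters only; `wrap`, `rewrap` and `print_more` are made-up names, not VTE functions.

```python
def wrap(line, width):
    """Rows produced by printing a TAB-free logical line at the given width."""
    return [line[i:i + width] for i in range(0, len(line), width)]


def rewrap(rows, width):
    """Rows after resizing: rejoin the soft-wrapped rows and wrap again."""
    return wrap("".join(rows), width)


def print_more(rows, more, width):
    """Continue printing onto an already wrapped line at the current width."""
    return wrap("".join(rows) + more, width)


text = "the quick brown fox jumps over the lazy dog"

# Resizing after printing equals printing at the new size.
assert rewrap(wrap(text, 7), 11) == wrap(text, 11)

# Printing half, resizing, then printing the rest gives the same result too.
half = wrap(text[:20], 7)
assert print_more(rewrap(half, 11), text[20:], 11) == wrap(text, 11)
```

Whatever TAB design is picked, the question is whether an equivalent of these two assertions should still hold once `text` may contain TABs.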
Anyhow, unfortunately it doesn't seem easy to come up with the right design.