Multi-table HTML import fails without <caption>
Desired and Expected Behavior
This concerns importing an HTML file containing multiple tables.
By way of background, the following valid HTML file imports as desired and expected: tables.html
As you can see in these screenshots, the two tables appear as separate sheets:
Undesired and Unexpected Behavior
This applies to a version compiled from freshly pulled git sources.
Now suppose we remove the <caption> tags. That results in the following, which is still 100% valid HTML: tablex.html
Alas, gnumeric interprets this incorrectly, as shown in this screenshot. The second table tramples on the first, and data is lost.
Single-Sheet Behavior
Older versions behave differently. gnumeric version '1.12.51' puts both tables on the same sheet, one after another. This is sometimes desirable, and sometimes merely OK. No data is lost. I am not complaining about this.
Remarks
The HTML standard does not require a table to have a <caption> tag, so gnumeric should not assume a caption will be present. The decision-making should be driven primarily by the table tag, with an assist from the caption tag, if any. Note that the caption tag, if present, must come immediately after the table tag, which simplifies things. I suggest choosing the sheet name for each table as follows:
- Take the name of the sheet from the caption, if any.
- Otherwise, take the name from the table tag's ID attribute if any.
- Otherwise, generate the sheet name in sequence (Sheet1, Sheet2, ...).
If two or more tables are encountered with the same name, they should be imported to the same sheet, one after another, with no trampling, as seen in the single-sheet example above.
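For illustration only, here is a rough sketch of the proposed naming and merging rules. It is written in Python with the standard-library html.parser, not against gnumeric's actual importer, and the names TableCollector and assign_sheets are made up for this example. It assumes tables are not nested, and it assumes the SheetN counter advances only for tables that have neither a caption nor an id; the proposal above does not settle that detail.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table> as a dict with its caption, id and rows.

    Hypothetical helper for this sketch; nested tables are not handled.
    """

    def __init__(self):
        super().__init__()
        self.tables = []
        self._current = None      # table currently being parsed, if any
        self._in_caption = False
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            # The table tag alone starts a new table; no caption is needed.
            self._current = {"caption": "", "id": dict(attrs).get("id"), "rows": []}
            self.tables.append(self._current)
        elif self._current is None:
            return
        elif tag == "caption":
            self._in_caption = True
        elif tag == "tr":
            self._current["rows"].append([])
        elif tag in ("td", "th"):
            if not self._current["rows"]:
                # Tolerate a cell that appears without an enclosing <tr>.
                self._current["rows"].append([])
            self._current["rows"][-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "table":
            self._current = None
            self._in_caption = self._in_cell = False
        elif tag == "caption":
            self._in_caption = False
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._current is None:
            return
        if self._in_caption:
            self._current["caption"] += data
        elif self._in_cell:
            self._current["rows"][-1][-1] += data


def assign_sheets(html_text):
    """Return {sheet name: rows}, applying the three naming rules in order."""
    parser = TableCollector()
    parser.feed(html_text)
    parser.close()

    sheets = {}   # dicts keep insertion order, so sheets stay in document order
    counter = 0
    for table in parser.tables:
        # Rule 1: caption, if any. Rule 2: id attribute, if any.
        name = table["caption"].strip() or table["id"]
        if not name:
            # Rule 3: generated name (assumption: count only unnamed tables).
            counter += 1
            name = f"Sheet{counter}"
        # Same name: append below the rows already there, never overwrite.
        sheets.setdefault(name, []).extend(table["rows"])
    return sheets
```

Calling assign_sheets on a file with two caption-less, id-less tables would give two entries, Sheet1 and Sheet2, each holding only its own rows. Two tables sharing a caption would end up in one entry with the second table's rows following the first.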