Please add "total least squares" orthogonal regression
@lukehutch
Submitted by Luke Hutchison. Link to original bug (#605044)
Description
The linear regression line currently offered for charts in Gnumeric uses ordinary least squares (OLS), which minimizes the squared vertical distance between the points and the regression line rather than the orthogonal (perpendicular) distance. Specifically, "A fitted linear regression model can be used to identify the relationship between a single predictor variable x_j and the response variable y when all the other predictor variables in the model are 'held fixed'." -- http://en.wikipedia.org/wiki/Linear_regression
In some datasets, neither variable is "held fixed", and interpreting the OLS line as the trend in the data would then be wrong. An example is a scatterplot of two variables whose points form a cluster that is horizontally narrow but vertically quite tall. OLS will likely produce a shallow regression line, because it minimizes only the vertical squared error. A much more natural best-fit line would be steep, possibly even of infinite gradient (a perfectly vertical line), since that is what properly shows the trend in the points. The difference is that when there is no "held fixed" variable, the perpendicular distance to the regression line must be minimized, not the vertical distance. The total least squares (TLS) method that accomplishes this is described here:
http://en.wikipedia.org/wiki/Total_least_squares
(For two variables, the TLS line is the first principal component of the data, drawn through the centroid of the points.) The result should be expressed in the form ax+by+c=0 rather than y=mx+c, because the gradient can be infinite.
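To make the request concrete, here is a minimal sketch of a two-variable orthogonal fit, written in Python for illustration rather than in Gnumeric's own code. The function name `tls_line` and the sample data are hypothetical and not part of Gnumeric. It assumes the textbook closed form for the principal axis of a 2D point cloud and returns the line as normalized coefficients (a, b, c), so a vertical line needs no special case.

```python
import math

def tls_line(xs, ys):
    """Orthogonal (total least squares) line fit for 2D points.

    Returns (a, b, c) such that a*x + b*y + c = 0 and a*a + b*b == 1.
    Hypothetical helper for illustration only.
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length sequences with at least 2 points")

    # Centroid: the orthogonal best-fit line always passes through it.
    mx = sum(xs) / n
    my = sum(ys) / n

    # Second moments about the centroid (entries of the covariance matrix).
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

    # Angle of the first principal axis. If sxx == syy and sxy == 0 the
    # direction is not unique; atan2(0, 0) yields 0, i.e. a horizontal line.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)

    # Unit normal to the line direction (cos(theta), sin(theta)).
    a = -math.sin(theta)
    b = math.cos(theta)
    c = -(a * mx + b * my)
    return a, b, c

# Usage: a cluster that is horizontally narrow but vertically tall
# (made-up sample data).
xs = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95]
ys = [0.0, 4.0, 1.0, 3.0, 2.0, 5.0]
a, b, c = tls_line(xs, ys)
# Because (a, b) is a unit vector, |a*x + b*y + c| is the orthogonal
# distance from a point (x, y) to the fitted line.
print(a, b, c)
```

Returning (a, b, c) instead of a slope and intercept means a chart would have to draw the line from two points on it rather than from y=mx+c, but it avoids dividing by b when the fit is vertical or nearly so.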
Could you please add a TLS regression line method to Gnumeric? For XY plots, standard linear regression only gives a meaningful line when one of the variables can be treated as fixed. Thank you.
Version: GIT