Commit ec6e6517 authored by Michael Meeks

New Dependency code + docs,

misc. minor fixes.
parent a9b66126
2000-01-05 Michael Meeks <mmeeks@gnu.org>
* src/sheet.c (sheet_move_range, sheet_insert_cols, sheet_delete_cols),
(sheet_insert_rows, sheet_delete_rows): use sheet_recalc_dependencies.
* src/eval.c (cell_eval_content): Hack out stubs of value tree
pruning. (cell_eval): ditto.
Stupidly remove all inline optimization.
(sheet_recalc_dependencies): implement.
2000-01-03 Michael Meeks <mmeeks@gnu.org>
* src/clipboard.c (do_clipboard_paste_cell_region): add back
region_get_deps.
2000-01-02 Michael Meeks <mmeeks@gnu.org>
* src/sheet.c (sheet_get_extent): fit inside the max_used bounds.
* src/eval.c (search_range_deps): add back.
(cell_get_single_dependencies): make API nicer + rename to
(get_single_dependencies). (sheet_region_get_deps): revive, add
single dependency checking + more logical name.
1999-12-31 Michael Meeks <mmeeks@gnu.org>
* src/eval.c (cell_get_dependencies): add debug.
(sheet_dump_dependencies): dump pending eval_queue.
(drop_cell_range_dep): improve debug + kill printout.
* src/cell.c (cell_relocate): When we relocate, we change the content,
hence do a cell_content_changed.
* src/clipboard.c (paste_cell): Do a 'cell_relocate' when we relocate
a cell :-).
1999-12-30 Michael Meeks <mmeeks@gnu.org>
* src/eval.c (drop_cell_range_dep): fast range dep drop.
(cell_drop_dependencies): kill iterate over sheet in favour of faster
drop_cell_range_dep. ( drastic simplification )
Kill 'remove_list' (dependency_remove_cell): Kill.
* src/sheet.c (sheet_cell_remove_from_hash): drop dependencies.
* src/Gnumeric.idl: Add ';'.
* src/eval.c (cell_eval_content): Comment out & 'fix' the 'changed'
return value.
1999-12-24 Michael Meeks <mmeeks@gnu.org>
* src/eval.c (DependencySmall): Use a CellPos, we don't need or want
a Sheet pointer in this structure. (dependency_small_hash): remove sheet
usage. (dependency_small_equal, handle_cell_small_dep): ditto.
(DependencyRange): Kill redundant ref_count member, this was just ==
g_list_length (DependencyRange->cell_list), and was not used sensibly.
(dependency_remove_cell): check for cell_list == NULL not ref_count.
(add_cell_range_dep, dependency_range_ctor, dump_range_dep): kill
ref_count.
1999-12-23 Michael Meeks <mmeeks@gnu.org>
* src/eval.c (cell_eval_content): documented + added 'changed' return
value + check using value_equal if there is no change.
* src/value.c (value_equal): implement.
* src/value.h (struct _Value): comment redundant 'CellRef cell' member.
* src/eval.c (dependency_range_ctor): kill redundant sheet in
DependencyRange. (handle_cell_small_dep): add helpful precondition.
(handle_cell_range_deps): Get the right sheet's deps.
(add_cell_range_dep): Add DependencyData arg.
(range_equal_func): remove sheet check.
(search_cell_deps, search_intersheet_deps): ditto.
1999-12-22 Michael Meeks <mmeeks@gnu.org>
* src/clipboard.c (do_clipboard_paste_cell_region): remove redundant
dependency queueing, this is done by cell_relocate in paste_cell anyhow.
* src/eval.c (dump_range_dep, dump_small_dep): implement.
(add_cell_range_dep): insert the inter sheet dependencies into the
correct sheet's hash!
(cell_get_range_dependencies): hence use only the sheet's hash and not
all of the sheets !
(cell_drop_dependencies): iterate over all sheets to kill dependencies.
(dump_cell_list): implement.
(handle_cell_range_dep): kill + (handle_tree_deps): fixup for ARRAY.
* src/expr.c (eval_range): update warning + failure case.
* src/workbook.c (workbook_can_detach_sheet): implement.
(workbook_setup_edit_area, misc_output): debugging dumps.
* src/eval.c (cell_drop_dependencies): only drop depends from single
ref-counted formulae; (region_get_dependencies): kill it was used
incorrectly all over the place. (search_range_deps): ditto.
(search_intersheet_deps): implement. (sheet_get_intersheet_deps): implement.
* src/clipboard.c (do_clipboard_paste_cell_region): remove get_deps +
add warning.
* src/workbook.c (workbook_do_destroy): call sheet_dump_dependencies.
(dump_dep): move to eval.c (misc_output): hack debugging code.
(workbook_can_detach_sheet): add warning + update ( this was broken ).
* src/eval.c (cell_add_explicit_dependency): kill.
(region_get_dependencies): add warning.
(cell_drop_dependencies): Add handle_tree_deps; NB. this can be
all moved elsewhere for speed later.
(sheet_dump_dependencies): implement.
* src/sheet.c (sheet_move_range): Kill region_get_dependencies +
queue_recalc_list: cell_relocate has to do this anyway.
(sheet_insert_cols, sheet_delete_cols),
(sheet_insert_rows, sheet_delete_rows): ditto.
* src/eval.c (dependency_small_hash, dependency_small_equal): implement.
(handle_cell_range_deps, cell_add_dependencies, handle_tree_deps),
(handle_cell_range_dep): rename + add 'add/remove' parameter.
(cell_eval_content): update debug to ParsePosition.
(handle_cell_range_deps): update with switch.
(handle_cell_small_dep): implement.
(cell_get_dependencies): hack.
(cell_get_range_dependencies): split out.
1999-12-21 Michael Meeks <mmeeks@gnu.org>
* src/eval.c (dependency_hash_init): kill.
(dependency_data_new, dependency_data_destroy): implement.
(cell_add_dependencies, cell_add_explicit_dependency),
(region_get_dependencies, cell_get_dependencies): update preconditions.
* src/sheet.c (sheet_destroy, sheet_new): use new functions.
* src/sheet.h (Sheet): Use DependencyData.
* src/eval.h: Add new / destroy functions for deps.
* src/gnumeric.h: add DependencyData.
* src/main.c: Add debug_deps.
2000-01-06 Jody Goldberg <jgoldberg@home.com>
* *.[ch] : rename struct expr_relocate_info -> ExprRelocateInfo.
......@@ -476,16 +622,6 @@
(item_bar_event) : No need to call canvas to world. We zoom manually.
(item_bar_get_line_points) : Delete.
1999-12-19 Michael Meeks <mmeeks@gnu.org>
* src/ranges.h (range_overlap): macroify.
* src/ranges.c (range_overlap): comment out.
* src/eval.c (cell_queue_recalc): inline
(cell_queue_deps_recalc): implement.
(cell_queue_deps_for_recalc): implement.
1999-12-19 Michael Meeks <mmeeks@gnu.org>
* configure.in: bump version to 0.46.
......
A discussion of the new dependency code, version 0.1
by Michael Meeks <mmeeks@gnu.org>
The dependency code is, conceptually, a comparatively simple part of the
gnumeric code. It is designed to determine which cells depend on the cell of
interest. The main use of this is in triggering recomputation of cells which
depend on a cell that has just changed.
1. Overview of the Dependencies
The majority of the code related to dependencies can be found in module
eval.c, and this should be the first reference for functions.
1.1 Data structures and their meaning
The main dependency data is anchored on a per-sheet basis using the
structure DependencyData. This stores all the dependencies for the sheet
in two hash tables.
There are two types of dependencies, single and range. Loosely these
describe single ( eg. =A1 ) cell references vs. large ( eg. =SUM (A1:Z500) )
range references. The two hash tables store DependencyRange structures in the
'range_hash' member, and DependencySingle structures in the 'single_hash'
member.
The DependencyRange structure defines a range reference. This essentially
lists the Cells ( not necessarily in this sheet ) that depend on the range
specified. Hence to find the cells that depend on a cell you have just altered
you must search all the range structures in the range_hash for that sheet.
The DependencySingle structure stores the degenerate case of a
DependencyRange. Essentially it stores the cells that depend on a unit range.
This allows for an extremely fast, constant time ( on average ) hashed lookup.
This contrasts with the range hash, all of which has to be traversed per
dependency calculation.
NB. the DependencySingle hash has to use CellPos's since there is no guarantee
that a cell will exist at a given position in the sheet that is depended on.
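The single dependency lookup described above can be sketched in plain C. This
is only an illustration: eval.c uses GHashTables and GLists, whereas the fixed
size table, the SingleDep / SingleTable types and the single_* functions below
are hypothetical names invented for the sketch. The point it shows is that the
key is a CellPos, so dependents can be recorded against a position at which no
Cell exists yet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the structures in eval.c. */
typedef struct { int col, row; } CellPos;

#define MAX_DEPS  8    /* illustrative capacity of one entry */
#define N_BUCKETS 64   /* illustrative table size            */

typedef struct {
    int     used;                  /* is this slot occupied ?            */
    CellPos pos;                   /* position depended on ( the key )   */
    CellPos dependents[MAX_DEPS];  /* cells whose formulae reference pos */
    size_t  n_dependents;
} SingleDep;

typedef struct {
    SingleDep slots[N_BUCKETS];    /* open addressing, linear probing */
} SingleTable;

static unsigned cellpos_hash(CellPos p)
{
    return ((unsigned)p.row << 8) ^ (unsigned)p.col;
}

static int cellpos_equal(CellPos a, CellPos b)
{
    return a.col == b.col && a.row == b.row;
}

/* Find the slot holding pos, or the empty slot where it would go. */
static SingleDep *single_slot(SingleTable *t, CellPos pos)
{
    unsigned start, probes;

    assert(t != NULL);
    start = cellpos_hash(pos) % N_BUCKETS;
    for (probes = 0; probes < N_BUCKETS; probes++) {
        SingleDep *s = &t->slots[(start + probes) % N_BUCKETS];
        if (!s->used || cellpos_equal(s->pos, pos))
            return s;
    }
    return NULL; /* table is full */
}

/* Record that 'dependent' references the single cell at 'pos'. */
static int single_add(SingleTable *t, CellPos pos, CellPos dependent)
{
    SingleDep *s = single_slot(t, pos);

    if (s == NULL || (s->used && s->n_dependents == MAX_DEPS))
        return 0;
    if (!s->used) {
        s->used = 1;
        s->pos = pos;
        s->n_dependents = 0;
    }
    s->dependents[s->n_dependents++] = dependent;
    return 1;
}

/* Dependents of pos, or NULL if nothing depends on it. */
static const SingleDep *single_lookup(SingleTable *t, CellPos pos)
{
    SingleDep *s = single_slot(t, pos);
    return (s != NULL && s->used) ? s : NULL;
}
```

With a zero-initialised SingleTable, calling single_add (&t, a2, a1) records
that A1 depends on A2 whether or not a Cell has been created at A2, and
single_lookup (&t, a2) finds it again without touching any other entry.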
1.2 The generation gap
In order to ensure that circular dependencies cannot create a recursive loop
that locks up gnumeric, there is a generation counter on each cell. There is
also a generation count per workbook, which is incremented on each recalculate.
When a cell is re-calculated its generation is set to the workbook's generation
count. A cell will only be re-evaluated if its generation is not the current
generation.
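A minimal sketch of the generation check, assuming a much reduced Cell and
Workbook. The 'ref' and 'eval_count' fields and the *_sketch function names are
hypothetical and exist only to make the example self-contained; the real
structures hold far more. It shows why a cycle such as A1 '=B1', B1 '=A1'
terminates: a cell is stamped with the current generation before its reference
is followed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal types for illustration only. */
typedef struct { int generation; } Workbook;

typedef struct Cell Cell;
struct Cell {
    int   generation;  /* generation in which we were last evaluated   */
    Cell *ref;         /* the one cell our formula references, or NULL */
    int   eval_count;  /* illustrative: number of real evaluations     */
};

/* Start a new recalculation pass. */
static void workbook_next_generation_sketch(Workbook *wb)
{
    assert(wb != NULL);
    wb->generation++;
}

/* Evaluate 'cell' at most once per workbook generation. */
static void cell_eval_sketch(const Workbook *wb, Cell *cell)
{
    assert(wb != NULL && cell != NULL);

    if (cell->generation == wb->generation)
        return;                     /* already done in this pass */

    /* Stamp first, so that a circular reference back to us stops. */
    cell->generation = wb->generation;

    if (cell->ref != NULL)
        cell_eval_sketch(wb, cell->ref);
    cell->eval_count++;
}
```

Calling cell_eval_sketch twice on the same cell within one generation does
nothing the second time; only after workbook_next_generation_sketch is the
cell eligible again.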
2. Cell Evaluation
The routine cell_eval will evaluate a cell and, if its value changes,
queue the cells that depend on it. These are obtained via
cell_get_dependencies, which looks up the cell in the single_hash and traverses
the range_hash to obtain its result. Often cell recalculation happens through
workbook_recalc, which works through the workbook's eval_queue re-evaluating
the cells there.
2.1 Evaluation queue vs. recursion
There are two ways in which a cell's recalculation can occur. Firstly,
simply traversing the ExprTree (expr.h) will result in many cells being
evaluated. Essentially each dereference in the ExprTree ( eg. =A1 ) will cause
the re-calculation of A1's ExprTree before we can continue ( recursively ).
Secondly, after the root ExprTree has been evaluated the value is set on the
Cell and the dependencies of this cell are queued on the workbook's eval_queue.
2.2 Short circuiting dependencies
Clearly, if after evaluating a cell's ExprTree we generate an identical
Value (value.h) we can prune a huge chunk off the dependency tree by not
bothering to notify dependencies of a change ( since there hasn't been one ).
Hence cell_eval_content returns a gboolean indicating whether the cell's
value has changed.
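The idea of the 'changed' return value can be sketched as follows. The Value
type here is a hypothetical three-case stand-in, not the one in value.h, and
value_equal_sketch / cell_set_value_sketch are illustrative names only. The
sketch shows the principle, not the current state of cell_eval_content.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, cut-down value type; value.h defines the real one. */
typedef enum { VAL_EMPTY, VAL_INT, VAL_STRING } ValType;

typedef struct {
    ValType     type;
    int         i;   /* valid when type == VAL_INT    */
    const char *s;   /* valid when type == VAL_STRING */
} Value;

/* Non-zero if a and b hold the same type and the same contents. */
static int value_equal_sketch(const Value *a, const Value *b)
{
    assert(a != NULL && b != NULL);
    if (a->type != b->type)
        return 0;
    switch (a->type) {
    case VAL_EMPTY:  return 1;
    case VAL_INT:    return a->i == b->i;
    case VAL_STRING: return strcmp(a->s, b->s) == 0;
    }
    return 0;
}

/*
 * Store a freshly computed value. Returns non-zero only if it differs
 * from the old one, ie. only if dependents would need to be queued.
 */
static int cell_set_value_sketch(Value *current, Value fresh)
{
    int changed = !value_equal_sketch(current, &fresh);
    *current = fresh;
    return changed;
}
```

A caller would queue the cell's dependents only when cell_set_value_sketch
returns non-zero, which is where the pruning comes from.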
3. Dependencies the bottleneck
Since dependencies tend to fan out, and massively increase the area that
needs to be recomputed, it is clearly advantageous to spend time culling the
dependency tree as intelligently as possible. Furthermore, since for each cell
that changes it is necessary to determine its dependencies, the lookup of
a cell's dependencies needs to be fast.
3.1 Why two methods
First, consider the case where only range dependencies are used; this
is a fairly simple first implementation. If we have N cells that each have a
dependency on one other, randomly chosen cell, then we will have approx N
ranges in the range hash. For each cell we re-calculate we need to iterate over
the entire range hash to determine its dependencies. Hence we have a
fundamentally O(N^2) algorithm, which is very bad news. This scheme spends
almost all of its time in search_cell_deps.
To overcome this problem we partition dependencies into single cell
dependencies and range dependencies. This way for the common =A1 case, we don't
add an entry in the range_hash, we simply add an entry in the single_hash.
Hence for each cell_get_dependencies call we have one less entry in the range
hash to iterate over, which saves N iterations of search_cell_deps.
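The quadratic cost can be made concrete by counting containment tests. The
sketch below is an assumption-laden illustration, not code from eval.c: the
Range and CellPos types are simplified and scan_all_ranges is a hypothetical
helper that does what a range-only scheme has to do for every changed cell.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified types, not the ones used in eval.c. */
typedef struct { int col, row; } CellPos;
typedef struct { CellPos start, end; } Range;

/* Non-zero if p lies inside r ( bounds inclusive ). */
static int range_contains(const Range *r, CellPos p)
{
    return p.col >= r->start.col && p.col <= r->end.col &&
           p.row >= r->start.row && p.row <= r->end.row;
}

/*
 * Look up the dependents of every cell in 'changed' by scanning the
 * whole range list each time. Returns the number of containment tests
 * made and stores the number of matches in *hits.
 */
static unsigned long scan_all_ranges(const Range *ranges, size_t n_ranges,
                                     const CellPos *changed, size_t n_changed,
                                     unsigned long *hits)
{
    unsigned long tests = 0;
    size_t i, j;

    assert(hits != NULL);
    *hits = 0;
    for (i = 0; i < n_changed; i++) {
        for (j = 0; j < n_ranges; j++) {
            tests++;
            if (range_contains(&ranges[j], changed[i]))
                (*hits)++;
        }
    }
    return tests;
}
```

With 100 unit ranges and 100 changed cells this performs 100 * 100 tests to
find 100 matches. If the same 100 references live in the single hash instead,
the range list is empty and the scan performs no tests at all.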
3.2 Inter-sheet dependencies
Inter-sheet dependencies are managed simply by inserting the dependency
information into the sheet in which the cells that are depended on reside.
Furthermore all inter-sheet dependencies are managed using DependencyRanges
regardless of size. Whilst storing dependencies in the Cell's sheet
accelerates the cell_get_dependencies common case ( since it only has to check
the current sheet for inter-sheet dependencies ) it slows down the less common
cell_drop_dependencies, since this has to scan all the sheets' dependency
ranges to remove inter-sheet references.
3.3 What is hashed
Whilst the two hashes ( range_hash, single_hash ) are both GHashTables
what is hashed is quite different.
3.3.1 What does the range hash do ?
The hashing on the range_hash is merely used to determine if there is
already a range in the hash with the same dimensions as a new dependency range
being added. This accelerates insertion of dependencies; the hash is traversed
as a simple un-ordered list at all other times.
3.3.2 Why not a direct Cell * -> GList * mapping for DependencySingle ?
This is not done since there is no guarantee that cells that are
depended on are in existence yet. Hence it is quite valid for A1 to be '=A2'
before A2 exists. If A2 does not exist then A2 has no Cell structure yet. This
could be obviated by creating the depended-on cells, but this would be inelegant.
4. How dependencies are generated and removed
The dependencies are both generated and mostly removed by the
handle_tree_deps function. This traverses the ExprTree either adding or
removing dependencies on cells as they are met.
4.1 Handling ExprTrees
The ExprTree is recursively traversed by handle_tree_deps; the recursion
may terminate in handle_value_deps, which either adds or removes
dependencies, according to the 'add' parameter.
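The shape of such a traversal can be sketched with a toy expression tree. The
ExprNode and DepCounts types and the function name below are hypothetical; the
real ExprTree (expr.h) has many more node kinds and the real code stores
dependencies in the two hashes rather than in a grid of counters. What the
sketch shows is one walker serving both directions, so that removal replays
exactly the walk that did the adding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy expression tree; expr.h defines the real ExprTree. */
typedef enum { EXPR_CONSTANT, EXPR_CELL_REF, EXPR_BINARY } ExprKind;

typedef struct ExprNode ExprNode;
struct ExprNode {
    ExprKind        kind;
    int             col, row;      /* used by EXPR_CELL_REF */
    const ExprNode *left, *right;  /* used by EXPR_BINARY   */
};

#define SHEET_COLS 4
#define SHEET_ROWS 4

/* Illustrative store: how many references point at each position. */
typedef struct { int count[SHEET_COLS][SHEET_ROWS]; } DepCounts;

/* Walk 'tree', adding ( add != 0 ) or removing ( add == 0 ) dependencies. */
static void handle_tree_deps_sketch(DepCounts *deps, const ExprNode *tree,
                                    int add)
{
    assert(deps != NULL);
    if (tree == NULL)
        return;

    switch (tree->kind) {
    case EXPR_CONSTANT:
        break;                      /* constants depend on nothing */
    case EXPR_CELL_REF:
        assert(tree->col >= 0 && tree->col < SHEET_COLS);
        assert(tree->row >= 0 && tree->row < SHEET_ROWS);
        deps->count[tree->col][tree->row] += add ? 1 : -1;
        break;
    case EXPR_BINARY:
        handle_tree_deps_sketch(deps, tree->left, add);
        handle_tree_deps_sketch(deps, tree->right, add);
        break;
    }
}
```

Walking the tree for '=A1+B2' with add set records one dependency on each of
A1 and B2; walking the same, unmodified tree with add cleared takes them away
again, which is why the original tree must still be available at removal time.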
4.2 Removal of dependencies
The removal of single dependencies is performed by traversing the
ExprTree again; this saves a search of every cell in the dependency hash
looking for this Cell's position. This relies on sheet_cell_remove_from_hash
and sheet_cell_add_to_hash dropping and adding dependencies, and on
formula_unlink / link when a cell's formula is changed, since the original
ExprTree is needed to remove its dependencies correctly. The current code
implements this correctly internally, but this needs bearing in mind if
extensive work is done to cell.c.
4.3 Special cases
4.3.1 Implicit intersection
This is as yet unimplemented, but will further reduce the number of
ranges to clip against. Essentially an implicit intersection reduces a range
to an adjacent single reference under certain circumstances.
4.3.2 Array Formula
These luckily have a simple dependency structure: since the formula
stored is identical in each cell, the cells may all depend on the corner cell
using a fast single mapping.
4.3.3 The INDIRECT function
This is rather a special case; this function returns a value that
references a different cell, hence the dependency has to be treated rather
differently. This is yet to be implemented.
5. Future Expansion
There are several avenues open for future expansion. Clearly further
accelerating the range search will give big speedups on large sheets. This
could be done by clipping the dependency ranges several times against smaller
ranges homing in on the cell of interest, and storing the results for future
reference. Clearly many of the MStyle range related optimizations would be
useful here as well.
5.1 Pruning the tree
It should be trivial to chop huge chunks of the dependency tree out
if we find that the value we have generated is identical to that which we
already had. This is currently implemented, but commented out, since
dependencies need to be recalculated on move / cut / paste etc. and in this
case the value does not change. Simply NULL'ing cell->value in cell_relocate
might fix this.
5.2 Multi-threading
With the current structure, it might well be possible to add
multi-threading support to the evaluation engine. The structure of this would
take advantage of the partitioning already provided by the sheet boundary. To
do this it would be necessary to move the eval_queue to a per-sheet entity, and
to put a locking / signaling mechanism on the queue such that inter-sheet
dependencies could be pre-pended to the queue ( thus ensuring rapid
evaluation ), and waited on safely. Since each cell is evaluated but once
per re-calc, it would then be safe to read the Cell's value when it disappeared
from the eval_queue.
5.3 ExprTree recursion
Whether it is always entirely necessary to re-evaluate a cell solely
on the basis that it is in the ExprTree is non-obvious to me. Clearly if this
cell is in the dependency queue it would make perfect sense, however if there
is as yet no chance that this cell has been changed, it makes little sense
to re-calculate it ( and its tree'd dependencies ). The only problem here is
determining whether any of the currently queued dependencies would alter this
cell's dependencies.
\ No newline at end of file
1999-12-30 Jody Goldberg <jgoldberg@home.com>
1999-12-23 Michael Meeks <mmeeks@gnu.org>
* lotus.c (read_workbook): comment out format_prefix.
1999-12-30 Jody Goldberg <jgoldberg@home.com>
......
......@@ -215,9 +215,7 @@ read_workbook (Workbook *wb, FILE *f)
case LOTUS_LABEL: