Commit 2c356b79 authored by Jody Goldberg

tidy up the docs a mite.

parent be8e4b1a
......@@ -7,11 +7,6 @@ change_logs = ChangeLog \
OChangeLog-2000-02-23 OChangeLog-2000-10-10 \
appicondir = $(datadir)/pixmaps
appicon_DATA = gnome-gnumeric.png \
gnome-application-x-gnumeric.png \
Applicationsdir = $(datadir)/applications/
Applications_in_files =
Applications_DATA = gnumeric.desktop
......@@ -54,9 +49,3 @@ dist-hook: gnumeric.spec
mkdir $(distdir)/samples
cp $(srcdir)/samples/*gnumeric $(distdir)/samples
cp gnumeric.spec $(distdir)
<title>Gnumeric TODO list</title>
<entry size="big" status="0%" target="1.0">
<title>Dependencies for CELL, INDIRECT functions and Sheet objects</title>
</section> <!--Calculation-->
<title>Graphics Component</title>
<entry size="small" status="0%" target="1.0">
<title>Need to highlight current graphics on the first page</title>
Current druid is not highlighting the first graphic.
</section> <!--Graphics Druid-->
<title>Graphics Component</title>
<entry size="medium" status="0%" target="1.0">
<title>Need to support captions</title>
We currently do not support any kind of captions.
</section> <!--Graphics Component-->
/* $Id$/
Cell Content Jody Goldberg <>
The Cell is one of the most widely used structures in Gnumeric. This document
describes the operations and state changes that are available.
location : Cells contain pointers to their sheet, the row, and the column
that contains them. This may change.
value : A Value *. All displayable cells should have a value.
renderedvalue : A placeholder for future use.
parse_format : Expressions store the results of autoformat and values store the
format used to parse the value that was entered.
When a cell is first created it is initialized to have value_empty and no location.
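The fields above can be pictured as the C sketch below. It is a hypothetical simplification, not the real cell.h: the types Sheet, ColRowInfo, ValueType and Value, the value_empty singleton and cell_new are assumptions made for illustration. It only shows a new cell starting out with value_empty and no location.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch; these are not Gnumeric's real definitions. */
typedef struct Sheet Sheet;           /* opaque in this sketch */
typedef struct ColRowInfo ColRowInfo; /* opaque in this sketch */

typedef enum { VALUE_EMPTY, VALUE_INTEGER, VALUE_STRING } ValueType;

typedef struct {
	ValueType type;               /* payload omitted in this sketch */
} Value;

typedef struct {
	/* location: the sheet, column and row that contain the cell */
	Sheet      *sheet;
	ColRowInfo *col;
	ColRowInfo *row;

	Value      *value;            /* every displayable cell has one */
	void       *renderedvalue;    /* placeholder for future use */
	const char *parse_format;     /* format used to parse the entered value */
} Cell;

/* One shared empty value, standing in for value_empty. */
static Value value_empty = { VALUE_EMPTY };

/* A new cell starts out holding value_empty, with no location. */
static Cell *
cell_new (void)
{
	Cell *cell = calloc (1, sizeof (Cell)); /* all pointers start NULL */

	assert (cell != NULL);
	cell->value = &value_empty;
	return cell;
}
```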
- Mark sheet as dirty
- Queue depends for recalc
- Calculate
- Render
- Dimension
- Spans
- Redraw
- link expression to master list
- Update the edit area
- Recalc the auto expression
Future plans
- just in time rendering and span calculation
- span overlap
- Have spans use run length encoding
A discussion of the new dependency code, version 0.2
by Michael Meeks <>
The dependency code is a conceptually simple part of the gnumeric
code. It is designed to determine which objects depend on which cells.
Its main use is triggering recomputation of things that depend on
something that has changed.
1. Overview of the Dependencies
The majority of the code related to dependencies can be found in module
eval.c, and this should be the first reference for functions.
1.1 Data structures and their meaning
The main dependency data is anchored on a per-sheet basis using the
structure DependencyData. This stores all the dependencies for the sheet
in two hash tables.
There are two types of dependencies, single and range. Loosely, these
describe single cell references (e.g. =A1) vs. large range references
(e.g. =SUM(A1:Z500)). The two hash tables store DependencyRange structures in
the 'range_hash' member, and DependencySingle structures in the 'single_hash'
member.
The DependencyRange structure defines a range reference. It lists the
dependents (not necessarily in this sheet) that depend on the specified range.
Hence, to find the cells that depend on a cell you have just altered, you must
search all the range structures in the range_hash for that sheet.
The DependencySingle structure mapping stores the degenerate case of a
DependencyRange: essentially, the cells that depend on a unit range.
This allows extremely fast, constant-time hashed lookup, in contrast with
the range hash, all of which has to be traversed per dependency calculation.
NB. DependencySingle has to use CellPos, since there is no guarantee that
a cell will exist at a given position in the sheet that is depended on.
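A minimal C sketch of these two structures follows. It assumes a plain chained hash keyed by CellPos in place of the GHashTable, integer ids in place of real dependents, and a linked list standing in for the traversal of range_hash; the helpers add_single, add_range and collect_deps are hypothetical. It shows why a lookup is one hashed probe for single references plus a walk over every range.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch: a chained hash stands in for GHashTable and
 * dependents are plain integer ids. */
typedef struct { int col, row; } CellPos;
typedef struct { CellPos start, end; } Range;

typedef struct DepList { int dependent; struct DepList *next; } DepList;

typedef struct DependencySingle {
	CellPos pos;                     /* a position: the cell may not exist */
	DepList *deps;
	struct DependencySingle *next;   /* hash bucket chain */
} DependencySingle;

typedef struct DependencyRange {
	Range range;
	DepList *deps;
	struct DependencyRange *next;
} DependencyRange;

#define N_BUCKETS 64

typedef struct {
	DependencySingle *single_hash[N_BUCKETS]; /* hashed by position */
	DependencyRange  *ranges;   /* stands in for range_hash: walked in full */
} DependencyData;

static unsigned
pos_hash (CellPos p)
{
	return ((unsigned) p.col * 31u + (unsigned) p.row) % N_BUCKETS;
}

static int
pos_equal (CellPos a, CellPos b)
{
	return a.col == b.col && a.row == b.row;
}

static DepList *
dep_prepend (DepList *list, int dependent)
{
	DepList *node = malloc (sizeof *node);

	assert (node != NULL);
	node->dependent = dependent;
	node->next = list;
	return node;
}

static void
add_single (DependencyData *d, CellPos p, int dependent)
{
	unsigned b = pos_hash (p);
	DependencySingle *s = d->single_hash[b];

	while (s != NULL && !pos_equal (s->pos, p))
		s = s->next;
	if (s == NULL) {
		s = calloc (1, sizeof *s);
		assert (s != NULL);
		s->pos = p;
		s->next = d->single_hash[b];
		d->single_hash[b] = s;
	}
	s->deps = dep_prepend (s->deps, dependent);
}

static void
add_range (DependencyData *d, Range r, int dependent)
{
	DependencyRange *dr = calloc (1, sizeof *dr);

	assert (dr != NULL);
	dr->range = r;
	dr->deps = dep_prepend (NULL, dependent);
	dr->next = d->ranges;
	d->ranges = dr;
}

/* Collect up to max dependents of p: one hashed probe, then every range. */
static int
collect_deps (const DependencyData *d, CellPos p, int *out, int max)
{
	const DependencySingle *s;
	const DependencyRange *r;
	const DepList *l;
	int n = 0;

	for (s = d->single_hash[pos_hash (p)]; s != NULL; s = s->next)
		if (pos_equal (s->pos, p))
			for (l = s->deps; l != NULL && n < max; l = l->next)
				out[n++] = l->dependent;
	for (r = d->ranges; r != NULL; r = r->next)
		if (p.col >= r->range.start.col && p.col <= r->range.end.col &&
		    p.row >= r->range.start.row && p.row <= r->range.end.row)
			for (l = r->deps; l != NULL && n < max; l = l->next)
				out[n++] = l->dependent;
	return n;
}
```

For example, with =A1 registered for dependent 1 and =SUM(A1:Z500) for dependent 2, collecting the dependents of A1 yields both, while B2 yields only dependent 2.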
2. Dependent Evaluation
The routine dependent_eval evaluates a dependent and, if its value
changes, queues the cells that depend on it. These are obtained via
cell_foreach_dep, which looks up the single_hash and traverses the
range_hash to obtain its result. Cell recalculation often happens through
workbook_recalc, which works through the workbook's eval_queue re-evaluating
the cells there.
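The queue-driven part of this can be sketched as a toy model. Everything here is hypothetical: ToyCell and ToyWorkbook, the rule that a formula is the sum of its inputs, and the queued flag that stops a cell from being queued twice are assumptions, not Gnumeric's eval.c. toy_dependent_eval plays the role of dependent_eval, and toy_workbook_recalc that of workbook_recalc.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not Gnumeric's eval.c: a cell with inputs
 * evaluates to the sum of their values, otherwise to its constant. */
#define MAX_CELLS 16
#define MAX_LINKS 4

typedef struct {
	int  inputs[MAX_LINKS], n_inputs;
	int  dependents[MAX_LINKS], n_dependents;
	int  constant, value;
	bool queued;                  /* true while waiting in the queue */
} ToyCell;

typedef struct {
	ToyCell cells[MAX_CELLS];
	int     queue[MAX_CELLS];     /* ring buffer of cell ids */
	int     head, count;
	int     evaluations;          /* how often toy_dependent_eval ran */
} ToyWorkbook;

/* Record that cell `to` reads cell `from`. */
static void
toy_link (ToyWorkbook *wb, int from, int to)
{
	ToyCell *f = &wb->cells[from], *t = &wb->cells[to];

	assert (f->n_dependents < MAX_LINKS && t->n_inputs < MAX_LINKS);
	f->dependents[f->n_dependents++] = to;
	t->inputs[t->n_inputs++] = from;
}

static void
toy_queue_push (ToyWorkbook *wb, int id)
{
	if (wb->cells[id].queued)
		return;               /* already pending: never queue twice */
	wb->cells[id].queued = true;
	wb->queue[(wb->head + wb->count++) % MAX_CELLS] = id;
}

/* Re-evaluate one cell; queue its dependents only if its value changed. */
static void
toy_dependent_eval (ToyWorkbook *wb, int id)
{
	ToyCell *c = &wb->cells[id];
	int v = c->n_inputs > 0 ? 0 : c->constant;

	for (int i = 0; i < c->n_inputs; i++)
		v += wb->cells[c->inputs[i]].value;
	wb->evaluations++;
	if (v == c->value)
		return;
	c->value = v;
	for (int i = 0; i < c->n_dependents; i++)
		toy_queue_push (wb, c->dependents[i]);
}

/* Drain the queue, much as workbook_recalc works through the eval_queue. */
static void
toy_workbook_recalc (ToyWorkbook *wb)
{
	while (wb->count > 0) {
		int id = wb->queue[wb->head];

		wb->head = (wb->head + 1) % MAX_CELLS;
		wb->count--;
		wb->cells[id].queued = false;
		toy_dependent_eval (wb, id);
	}
}
```

With A1 = 5, B1 = A1 and C1 = A1 + B1, one recalculation evaluates three cells and leaves C1 at 10. Queuing A1 again with the same value stops after re-evaluating A1 alone, because its value did not change.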
2.1 Evaluation queue vs. recursion
There are two ways in which a cell's recalculation can occur. Firstly,
simply traversing the ExprTree (expr.h) will result in many cells being
evaluated. Essentially, each dereference in the ExprTree (e.g. =A1) causes
A1's ExprTree to be re-calculated before we can continue (recursively).
This is actually fairly expensive when the expression contains a reference like
Each dependent can be in two states
3. Dependencies: the bottleneck
Since dependencies tend to fan out and massively increase the area that
needs to be recomputed, it is clearly advantageous to spend time culling the
dependency tree as intelligently as possible. Furthermore, since the
dependencies of every changed cell must be determined, the lookup of
a cell's dependencies needs to be fast.
3.1 Why two methods
First, consider the case where only range dependencies are used; this
is a fairly simple first implementation. If we have N cells that each have
a random dependency on one other cell, then we will have approximately N ranges in
the range hash. For each cell we re-calculate we need to iterate over the
entire range hash to determine its dependencies. Hence we have a fundamentally
O(N^2) algorithm, which is very bad news. This scheme spends almost all of its
time in search_cell_deps.
To overcome this problem we partition dependencies into single cell
dependencies and range dependencies. This way, for the common =A1 case, we don't
add an entry in the range_hash; we simply add an entry in the single_hash.
Hence cell_foreach_dep has one less entry in the range hash to
iterate over, which saves N iterations of search_cell_deps.
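A rough count makes the difference concrete. It assumes each of the N recalculated cells triggers one dependency lookup, that a hashed lookup costs constant expected time, and that R (introduced here for illustration) is the number of genuine range entries left in the range hash:

$$
T_{\text{ranges only}} \approx \sum_{i=1}^{N} N = N^2,
\qquad
T_{\text{partitioned}} \approx \sum_{i=1}^{N} \bigl( O(1) + R \bigr) = O\bigl( N (1 + R) \bigr).
$$

When most references are of the =A1 kind, R is much smaller than N and the cost is close to linear.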
Another common case is having a lot of formulae sharing the same range
in the sheet as a dependency (e.g. cutting and pasting =SUM($A$1:$A$100)). In
this case there is a single dependency range entry with many cells listed as its
3.2 Inter-sheet dependencies
Inter-sheet dependencies are managed simply by inserting the dependency
information into the sheet in which the depended-on cells reside.
This is essentially exactly what is expected, given that cells are linked to
the cells they depend on. Removing inter-sheet dependencies is also identical
to removing normal dependencies, except that it is more likely to throw up formulae
that have not been correctly unlinked when a sheet is destroyed.
3.3 What is hashed
Whilst the two hashes (range_hash, single_hash) are both GHashTables,
what is hashed is quite different.
3.3.1 What does the range hash do?
The hashing on the range_hash is merely used to determine whether there is
already a range in the hash with the same dimensions as a new dependency range
being added. This accelerates insertion of dependencies; at all other times the
hash is traversed as a simple unordered list.
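The insertion-time role of the range hash can be sketched as follows. This is a hypothetical stand-in, assuming a small chained hash keyed on the four range coordinates instead of a GHashTable; Range, RangeEntry, RangeHash and range_dep_add are invented names. A range whose dimensions match an existing entry just gains another dependent, so pasting =SUM($A$1:$A$100) into many cells leaves a single entry.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch, not Gnumeric's range_hash. */
typedef struct { int start_col, start_row, end_col, end_row; } Range;

typedef struct RangeEntry {
	Range range;
	int  *deps;                 /* dependents sharing this exact range */
	int   n_deps, cap;
	struct RangeEntry *next;    /* bucket chain */
} RangeEntry;

#define N_BUCKETS 32

typedef struct {
	RangeEntry *buckets[N_BUCKETS];
	int         n_entries;
} RangeHash;

static unsigned
range_hash_fn (const Range *r)
{
	unsigned h = (unsigned) r->start_col;

	h = h * 31u + (unsigned) r->start_row;
	h = h * 31u + (unsigned) r->end_col;
	h = h * 31u + (unsigned) r->end_row;
	return h % N_BUCKETS;
}

static int
range_equal (const Range *a, const Range *b)
{
	return a->start_col == b->start_col && a->start_row == b->start_row &&
	       a->end_col == b->end_col && a->end_row == b->end_row;
}

/* The hash is consulted only here, to find an entry with the same dimensions. */
static RangeEntry *
range_dep_add (RangeHash *h, Range r, int dependent)
{
	unsigned b = range_hash_fn (&r);
	RangeEntry *e = h->buckets[b];

	while (e != NULL && !range_equal (&e->range, &r))
		e = e->next;
	if (e == NULL) {
		e = calloc (1, sizeof *e);
		assert (e != NULL);
		e->range = r;
		e->next = h->buckets[b];
		h->buckets[b] = e;
		h->n_entries++;
	}
	if (e->n_deps == e->cap) {
		int cap = e->cap ? e->cap * 2 : 4;
		int *grown = realloc (e->deps, (size_t) cap * sizeof *grown);

		assert (grown != NULL);
		e->deps = grown;
		e->cap = cap;
	}
	e->deps[e->n_deps++] = dependent;
	return e;
}
```

Adding ten dependents on A1:A100 produces one entry holding ten dependents; a different range such as A1:A50 creates a second entry.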
3.3.2 Why not a direct Cell * -> GList * mapping for DependencySingle ?
This is not done since there is no guarantee that the cells involved in
dependencies exist yet. Hence it is quite valid for A1 to be '=A2'
before A2 exists. If A2 does not exist then A2 has no Cell structure yet. This
could be obviated by creating the depended-on cells, but this would be inelegant.
4. How dependencies are generated and removed
The dependencies are both generated and mostly removed by the
handle_tree_deps function. This traverses the ExprTree either adding or
removing dependencies on cells as they are met.
4.1 Handling ExprTrees
The ExprTree is recursively traversed by handle_tree_deps, and the
traversal may terminate in handle_value_deps, which either adds or removes
dependencies according to the 'add' parameter.
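The add/remove symmetry can be sketched with a toy expression tree. The node layout, the Oper values, DepCounts (which just counts registrations in place of the real hashes) and the toy_ prefixed functions are assumptions; only the idea of recursing through the tree and handling references at the leaves according to an add flag comes from the text. Running the same traversal with add set to false undoes exactly what the add pass registered, which is why the original ExprTree is needed for removal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy expression tree, not Gnumeric's expr.h. */
typedef enum {
	OPER_CONSTANT, OPER_CELLREF, OPER_RANGEREF, OPER_BINARY, OPER_FUNCALL
} Oper;

typedef struct ExprTree ExprTree;
struct ExprTree {
	Oper oper;
	int  col, row;                   /* cell reference, or range start */
	int  end_col, end_row;           /* range end */
	const ExprTree *left, *right;    /* OPER_BINARY operands */
	const ExprTree *const *args;     /* OPER_FUNCALL, NULL-terminated */
};

/* Stand-in for the per-sheet dependency data: just counts registrations. */
typedef struct { int n_single, n_range; } DepCounts;

/* Leaf handling: references add or drop a dependency; constants do nothing. */
static void
toy_handle_value_deps (DepCounts *deps, const ExprTree *leaf, bool add)
{
	int delta = add ? 1 : -1;

	if (leaf->oper == OPER_CELLREF)
		deps->n_single += delta;
	else if (leaf->oper == OPER_RANGEREF)
		deps->n_range += delta;
}

/* Recurse through operators and function arguments down to the leaves. */
static void
toy_handle_tree_deps (DepCounts *deps, const ExprTree *tree, bool add)
{
	switch (tree->oper) {
	case OPER_BINARY:
		toy_handle_tree_deps (deps, tree->left, add);
		toy_handle_tree_deps (deps, tree->right, add);
		break;
	case OPER_FUNCALL:
		for (const ExprTree *const *a = tree->args; *a != NULL; a++)
			toy_handle_tree_deps (deps, *a, add);
		break;
	default:
		toy_handle_value_deps (deps, tree, add);
		break;
	}
}
```

For =A1+SUM(A1:B10), the add pass registers one single and one range dependency, and the remove pass brings both counts back to zero.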
4.2 Removal of dependencies
The removal of single dependencies is performed by traversing the
ExprTree again; this saves searching every cell in the dependency hash
for this Cell's position. This relies on sheet_cell_remove_from_hash
and sheet_cell_add_to_hash dropping and adding dependencies, and on
formula_unlink / link when a cell's formula is changed, since the original
ExprTree is needed to remove its dependencies correctly. The current code
implements this correctly internally, but this needs bearing in mind if
extensive work is done to cell.c.
4.3 Special cases
4.3.1 Implicit intersection
This is as yet unimplemented, but will further reduce the number of
ranges to clip against. Essentially, an implicit intersection reduces a range
to an adjacent single reference under certain circumstances.
4.3.2 Array Formula
Array formulae luckily have a simple dependency structure: since the formula
stored is identical in each cell, the cells may all depend on the corner cell
using a fast single mapping.
4.3.3 The INDIRECT function
This is rather a special case; this function returns a value that
references a different cell, hence the dependency has to be treated rather
differently. This is yet to be implemented.
5.1 Future Expansion
There are several avenues open for future expansion. Clearly further
accelerating the range search will give big speedups on large sheets. This
could be done by clipping the dependency ranges several times against smaller
ranges homing in on the cell of interest, and storing the results for future
reference. Clearly many of the MStyle range related optimizations would be
useful here as well.
5.2 Multi-threading
With the current structure, it might well be possible to add
multi-threading support to the evaluation engine. This would take
advantage of the partitioning already provided by the sheet boundary. To do
this it would be necessary to move the eval_queue to a per-sheet entity, and
to put a locking / signaling mechanism on the queue such that inter-sheet
dependencies could be prepended to the queue (thus ensuring rapid
evaluation) and waited on safely. Since each cell is evaluated only once
per recalc, it would then be safe to read a Cell's value once it had disappeared
from the eval_queue.
5.3 ExprTree recursion
Whether it is always entirely necessary to re-evaluate a cell solely
on the basis that it is in the ExprTree is non-obvious to me. Clearly if this
cell is in the dependency queue it would make perfect sense, however if there
is as yet no chance that this cell has been changed, it makes little sense
to re-calculate it ( and its tree'd dependencies ). The only problem here is
determining whether any of the currently queued dependencies would alter this
cell's dependencies.
<!DOCTYPE book PUBLIC "-//Davenport//DTD DocBook V3.0//EN" [
<book id="gnumeric-design">
<title>The Gnumeric Spreadsheet Internal Design</title>
<releaseinfo>July 2, 1998</releaseinfo>
<surname>de Icaza</surname>
<shortaffil>Instituto de Ciencias Nucleares, Universidad
Nacional Autónoma de México
<date>September 2, 1998; July 2, 1999.</date>
<title>Gnumeric Internal Design</title>
<title>Design Goals</title>
The Gnumeric Spreadsheet is a spreadsheet that is intended to
grow in the future to provide all of the features available in
commercial-grade spreadsheets.
I am not only intending to provide Gnumeric with a very good
computation engine, but I am also interested in making the GUI
experience for the user as pleasant as possible, and that
includes taking a lot of ideas from existing spreadsheets.
Gnumeric should be as compatible as possible with
Microsoft Excel from the user's point of view. This means formulae and
expressions should be compatible unless we find a serious
design problem in Excel.
An example of a design problem in Excel is the fact that
various internal functions have limits on the number of
arguments they can take: this is just bad coding and this sort
of limitation is unacceptable in Gnumeric. When writing code
for Gnumeric, no hard coded limits should be set (currently
Gnumeric breaks this rule by having hardcoded, in a few places,
the maximum number of columns to be 256).
<title>Basic organization</title>
The Gnumeric spreadsheet is basically organized as Workbooks
(see src/sheet.h for the definition of the workbook object).
There might be various workbooks loaded at the same time.
Every one of these workbooks might have a variable number of
Sheets (see src/sheet.h for the definition of the Sheet
object), and finally each sheet contains cells (The definition
of a Gnumeric Cell is in src/cell.h).
Workbooks only take care of keeping various sheets
together and giving a name to them.
Sheets are the repository of information: cells are
kept here, information on the columns and rows is kept
here, and the styles attached to the regions are also
kept here.
Sheets might have multiple views; this is required to
support split views and, in the future, the
GNOME document model. The actual front-end to the Sheet
object is the SheetView object, and each SheetView object
has a number of components:
Their scrollbars.</para>
<listitem><para> Their cell display engine (more in a
<listitem><para> Their bar display (column and row
<para>The cell display engine is basically a modified
GnomeCanvas that can deal with keystrokes and can do some
extra spreadsheet oriented tasks. This cell display
engine is of type GnumericSheet.</para>
<para>GnumericSheet objects usually contain a number of
Gnome Canvas items specially designed to be used for a
spreadsheet: the Grid Item and the Cursor Item:</para>
<listitem><para> The Grid item takes care of rendering the
actual contents of the Sheet object and displaying the
Cells in the way the user requested, this is the
actual "core" display engine.</para>
<listitem><para> The Cursor item is the item that actually
draws the spreadsheet cursor. Like every other Gnome
Canvas item, this item can take events, and it is
especially interesting as it provides the basic
facilities for dragging a region of cells. </para>
During the course of a user session, Gnumeric will
create Items of type Editor, a special item designed to
display the contents of a cell as it is being typed; it
is synchronized with a GtkEntry (to provide an
Excel-like way of typing text into the cells).
Sheets contain information for columns and rows in doubly
linked lists to facilitate their traversal (in the future,
when bottlenecks are identified, we will provide an
alternate quick access method based on hash tables, but the
information will still be linked forward and
The column and row information is stored in GLists that
contain ColRowInfo structures. These structures include a
number of interesting bits:
Their assigned position (or -1 if they are the "default"
style), field name "pos".
The actual width used in pixels (for the current
magnification setting) as well as their logical size,
plus the margins required in pixels for displaying
various cell adornments.
When a cell is allocated, both ColRowInfos (for column and
row) are allocated and properly linked, the cell is made to
point to these new structures.
The column ColRowInfos have a field called "data"; this is a
linked list of Cells for every row where a cell
Cells are stored in a hash table inside the Sheet data
structure for quick retrieval and they are also linked
properly in their respective columns.
A cell might display information in more than one column, so
Gnumeric needs to keep track of which cells (column, row
pairs) are being managed by which cell. This information is
kept in a per-row fashion in the data pointer in the
ColRowInfo structure for the rows. The code that registers and
unregisters the view areas is in the cellspan.c
<title>Formula storage</title>
When a formula is encountered, Gnumeric parses the formula
into a tree structure. The tree structure is later used to
evaluate the expression at a given location (each coordinate
in a cell reference is stored as either an absolute
reference or a relative reference to the cell position). To
read about the actual implementation of this, look in the
files <filename>gnumeric/src/expr.h</filename>,
To speed up formula duplication, Gnumeric reference counts
the parsed expression; this allows for quick duplication of
expressions. It should be noted that some file formats (for example,
the Excel file format) use formula references, which can
preserve the memory-saving features of reference counting.
Currently Gnumeric does not provide any way to keep these
references. A possible scheme for saving these cells correctly
would be to keep, in a special list, any formula being
saved that has a reference count bigger than one, track
these duplicates, and generate formula references in
the output file rather than outputting the actual formula.
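The reference counting and the proposed save scheme can be sketched together. This is a hypothetical illustration: Expr, its text field standing in for the parsed tree, the save_id marker and the output format are all invented, and a real file format would write its own formula-reference records. A shared expression is written in full the first time it is met and as a reference afterwards.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical reference-counted parsed expression. */
typedef struct {
	int         ref_count;
	const char *text;     /* stands in for the parsed tree */
	int         save_id;  /* -1 until written once during a save */
} Expr;

static Expr *
expr_new (const char *text)
{
	Expr *e = malloc (sizeof *e);

	assert (e != NULL);
	e->ref_count = 1;
	e->text = text;
	e->save_id = -1;
	return e;
}

/* Duplicating a formula into another cell just bumps the count. */
static Expr *
expr_ref (Expr *e)
{
	e->ref_count++;
	return e;
}

static void
expr_unref (Expr *e)
{
	assert (e->ref_count > 0);
	if (--e->ref_count == 0)
		free (e);
}

/* Write shared expressions (ref_count > 1) in full once, then as references.
 * Returns how many formulas were written in full. */
static int
save_cells (Expr *const *cells, int n, FILE *out)
{
	int next_id = 0, full = 0;

	for (int i = 0; i < n; i++) {
		Expr *e = cells[i];

		if (e->ref_count > 1 && e->save_id >= 0) {
			fprintf (out, "cell %d: shared formula #%d\n", i, e->save_id);
			continue;
		}
		if (e->ref_count > 1)
			e->save_id = next_id++;
		fprintf (out, "cell %d: %s\n", i, e->text);
		full++;
	}
	for (int i = 0; i < n; i++)
		cells[i]->save_id = -1;   /* ready for the next save */
	return full;
}
```

Three cells sharing =SUM($A$1:$A$100) plus one cell holding =B1*2 produce two full formulas and two references.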
<title>Resource management</title>
Data structures in Gnumeric are lightweight: they are
designed to consume little memory by reusing as much
information as possible. This is achieved by hashing and
reference-counting common information. This is done
with strings, parser/interpreter symbols and styles.
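The hash-and-reference-count idea can be sketched for strings. This is not the interface of str.c: IStr, istr_get, istr_unref, the fixed-size bucket table and the djb2 hash are assumptions. Equal strings resolve to one shared copy whose count tracks its users, and the copy is freed when the last user lets go.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical interned, reference-counted string; not str.c's API. */
#define N_BUCKETS 64

typedef struct IStr {
	char        *str;
	int          ref_count;
	struct IStr *next;
} IStr;

static IStr *istr_table[N_BUCKETS];

static unsigned
istr_hash (const char *s)
{
	unsigned h = 5381u;                   /* djb2 string hash */

	while (*s)
		h = h * 33u + (unsigned char) *s++;
	return h % N_BUCKETS;
}

/* Return the shared copy of s, creating it on first use. */
static IStr *
istr_get (const char *s)
{
	unsigned b = istr_hash (s);

	for (IStr *p = istr_table[b]; p != NULL; p = p->next)
		if (strcmp (p->str, s) == 0) {
			p->ref_count++;
			return p;
		}
	IStr *n = malloc (sizeof *n);
	assert (n != NULL);
	n->str = malloc (strlen (s) + 1);
	assert (n->str != NULL);
	strcpy (n->str, s);
	n->ref_count = 1;
	n->next = istr_table[b];
	istr_table[b] = n;
	return n;
}

/* Drop one reference; the string leaves the table when nobody uses it. */
static void
istr_unref (IStr *is)
{
	assert (is->ref_count > 0);
	if (--is->ref_count > 0)
		return;
	IStr **link = &istr_table[istr_hash (is->str)];
	while (*link != is)
		link = &(*link)->next;
	*link = is->next;
	free (is->str);
	free (is);
}
```

Requesting "Total" twice returns the same IStr with a count of two.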
To learn more about this, read:
<para>for strings:
<filename>gnumeric/src/str.c</filename> and
<para>for styles:
<filename>gnumeric/src/style.c</filename> and
for symbols:
<filename>gnumeric/src/symbol.c</filename> and
Gnumeric's plugin API is very simple. Every plugin must have
one function: init_plugin see
<filename>gnumeric/src/plugin.h</filename> for the full details.
This file rapidly gets out of date: see ../TODO
Embedding of graphs and charts will be implemented via
Guppi through the Bonobo document model.
Guppi information:
......@@ -4,21 +4,5 @@
translating.sgml \
Design \
Future-Roadmap \
saving.txt \
writing-functions.sgml \
excel-format-doc.txt \
linux-expo-99-gnumeric.tex \
linuxexpo.sty \
Dependencies.txt \
Styles.txt \
features.txt \
pref-attributes.txt \
guile-gnumeric.txt \
python-gnumeric.txt \
stf-export.txt \
stf-parser.txt \
......@@ -2,11 +2,8 @@ Each language has its own directory where the translated documentation
goes. 'C' being the default (English) language.
In order to avoid useless weight it is a good idea to use the images
in the C/images/ directory, that is using relative paths as
"../C/images/image_name" and only use "images/image_name" (that is using
in the C/figures/ directory, that is using relative paths as
"../C/figures/image_name" and only use "figures/image_name" (that is using
an images/ directory under the specific language directory) for those
images where a translation is useful (images with text strings for
A discussion of the style code, version 0.2
by Jody Goldberg <>
In order to solve several of the problems associated with blank cells
and to simplify the way styles are handled a new approach to styles was
developed by Michael Meeks. This solved all of the problems it set out to
address, but became bogged down in performance issues as attempts were made to