Port the loading code to Rust
The loading code is in rsvg-load.c. It has several responsibilities, which may be nice to split apart:
- rsvg_load_new() creates an RsvgLoad object. It holds the loading state (see below) and the part of the element tree that has been loaded so far.
- The two entry points for actual loading are rsvg_load_write() and rsvg_load_read_stream_sync(). The first one means "the caller is using the mostly-obsolete rsvg_handle_write() API", and that mode gets finished with rsvg_load_close(). The second one means "the caller is using the modern GInputStream API". Unfortunately, the first mode needs to jump through some hoops to handle compressed SVG data; it must buffer the entire thing and pass it to a stream reader at the end.
- Depending on the load mode, we create a libxml2 parser either with create_xml_push_parser() for rsvg_load_write(), or with create_xml_stream_parser() for GInputStream.
- All the rest of the file deals with handling SAX events from libxml2.
Loading state
LoadState is only meaningful if using the write() / close() mode. It tracks whether compressed data is being read or not. If the loading mode is with a GInputStream, the streams handle everything nicely themselves.
I don't think we need to port this part to Rust just yet.
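For reference, here is a minimal sketch of what such a state machine could look like if it ever moves to Rust. The Loader type and the variant names are hypothetical and not taken from rsvg-load.c; the only hard fact used is that a gzip stream starts with the bytes 0x1f 0x8b. It also assumes the first chunk is at least two bytes long, which a real loader could not rely on.

```rust
/// Hypothetical Rust counterpart of the C LoadState; variant names are illustrative.
#[derive(Debug, PartialEq)]
enum LoadState {
    /// Nothing has been written yet.
    Start,
    /// Plain XML; a real loader would feed each chunk to the push parser.
    ReadingPlain,
    /// gzip-compressed data; everything is buffered until close().
    ReadingCompressed(Vec<u8>),
    /// close() has been called.
    Closed,
}

/// The two magic bytes that start every gzip stream.
const GZIP_MAGIC: [u8; 2] = [0x1f, 0x8b];

/// Hypothetical loader for the write() / close() mode only.
struct Loader {
    state: LoadState,
}

impl Loader {
    fn new() -> Self {
        Loader { state: LoadState::Start }
    }

    /// Accepts one chunk of data and updates the state.
    /// Simplification: the gzip check only looks at the first chunk.
    fn write(&mut self, chunk: &[u8]) {
        // Temporarily move the state out so the buffer can be reused.
        let state = std::mem::replace(&mut self.state, LoadState::Closed);
        self.state = match state {
            LoadState::Start => {
                if chunk.starts_with(&GZIP_MAGIC) {
                    LoadState::ReadingCompressed(chunk.to_vec())
                } else {
                    LoadState::ReadingPlain
                }
            }
            LoadState::ReadingPlain => LoadState::ReadingPlain,
            LoadState::ReadingCompressed(mut buf) => {
                buf.extend_from_slice(chunk);
                LoadState::ReadingCompressed(buf)
            }
            // Writes after close() are ignored in this sketch.
            LoadState::Closed => LoadState::Closed,
        };
    }

    /// Finishes loading. Returns the buffered compressed bytes, if any,
    /// so the caller can pass them to a stream reader.
    fn close(&mut self) -> Option<Vec<u8>> {
        match std::mem::replace(&mut self.state, LoadState::Closed) {
            LoadState::ReadingCompressed(buf) => Some(buf),
            _ => None,
        }
    }
}
```

The stream mode needs none of this, which is why the state only exists for write() / close().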
XML events
This part would be really nice to port to Rust. I want to experiment with switching to a Rust XML parser, but so far none of the available ones support all the things that libxml2 does (mainly, support for XML entity expansion / doctype parsing, with guards for billion-laughs attacks and such).
We use libxml2's SAX API, which means we register callbacks and get called when stuff gets read from the XML stream. For example, there are callbacks for "element start", "element end", "read character data", etc.
In contrast, most Rust XML parsers use a STAX model, where one asks the parser for the next event and it returns an enum similar to
enum XmlEvent {
    StartElement(name, attribute_value_list),
    EndElement(name),
    CharacterData(chars),
    ... etc ...
}
We can probably map libxml2's SAX callbacks to Rust-friendly events like that. Then we hand them off to a Rust-side process_xml_event(ev: XmlEvent) function, or something similar, within the loading context.
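A rough sketch of that mapping follows. Everything except the XmlEvent and process_xml_event names is an assumption made for illustration: the LoadContext struct, its fields, and the sax_* functions are hypothetical. The real libxml2 callbacks are C functions that receive raw pointers, so the safe Rust functions here only stand in for the glue that would convert those arguments.

```rust
/// Owned, Rust-friendly version of the events that libxml2 reports through SAX.
#[derive(Debug, Clone, PartialEq)]
enum XmlEvent {
    StartElement { name: String, attributes: Vec<(String, String)> },
    EndElement { name: String },
    CharacterData(String),
}

/// Hypothetical loading context; it only tracks element nesting and text.
#[derive(Default)]
struct LoadContext {
    open_elements: Vec<String>,
    text: String,
}

impl LoadContext {
    /// Single entry point for all events, independent of the XML parser in use.
    fn process_xml_event(&mut self, ev: XmlEvent) -> Result<(), String> {
        match ev {
            XmlEvent::StartElement { name, .. } => {
                self.open_elements.push(name);
                Ok(())
            }
            XmlEvent::EndElement { name } => match self.open_elements.pop() {
                Some(open) if open == name => Ok(()),
                Some(open) => Err(format!("expected </{}>, got </{}>", open, name)),
                None => Err(format!("unexpected </{}>", name)),
            },
            XmlEvent::CharacterData(chars) => {
                self.text.push_str(&chars);
                Ok(())
            }
        }
    }
}

// Stand-ins for the SAX callbacks: each one builds an event and forwards it.

fn sax_start_element(
    ctx: &mut LoadContext,
    name: &str,
    atts: &[(&str, &str)],
) -> Result<(), String> {
    let attributes = atts
        .iter()
        .map(|&(k, v)| (k.to_owned(), v.to_owned()))
        .collect();
    ctx.process_xml_event(XmlEvent::StartElement { name: name.to_owned(), attributes })
}

fn sax_end_element(ctx: &mut LoadContext, name: &str) -> Result<(), String> {
    ctx.process_xml_event(XmlEvent::EndElement { name: name.to_owned() })
}

fn sax_characters(ctx: &mut LoadContext, chars: &str) -> Result<(), String> {
    ctx.process_xml_event(XmlEvent::CharacterData(chars.to_owned()))
}
```

With this split, only the sax_* layer knows about libxml2. If a suitable Rust XML parser shows up later, it could produce XmlEvent values directly and process_xml_event() would not need to change.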
Node and auxiliary data
While creating the element tree, RsvgLoad builds it up in load->treebase, but it calls rsvg_add_node_to_handle() to tell the RsvgHandle to put a pointer to each node in the main handle->priv->all_nodes array. That array is the actual owner of the nodes; they are freed from there when RsvgHandle is finalized.
The loading process ends when rsvg_load_destroy() returns the treebase to the handle. We could maintain the all_nodes array inside the loading process and return that array as well.
Within the element creation functions, there is also a call to rsvg_defs_register_node_by_id() somewhere; this maintains the map of ids to nodes. It is within RsvgHandlePrivate; the loading process could maintain that as well and finally hand it off to the handle.
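The idea of keeping all of this inside the loader and handing it over at the end could look roughly like the sketch below. The Load, LoadResult, and Node types and their methods are hypothetical; the field names only echo treebase, all_nodes, and the id map from the C code. Rc is used so that the tree, the node list, and the id map can share nodes, and the rule that the first node registered under an id wins is an assumption of this sketch.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Simplified stand-in for an SVG element node.
#[derive(Debug)]
struct Node {
    element_name: String,
    id: Option<String>,
}

/// Everything the loader hands to the handle when loading finishes.
struct LoadResult {
    treebase: Option<Rc<Node>>,
    all_nodes: Vec<Rc<Node>>,
    nodes_by_id: HashMap<String, Rc<Node>>,
}

/// Hypothetical loader state that owns the auxiliary data while loading.
#[derive(Default)]
struct Load {
    treebase: Option<Rc<Node>>,
    all_nodes: Vec<Rc<Node>>,
    nodes_by_id: HashMap<String, Rc<Node>>,
}

impl Load {
    /// Creates a node and records it in every auxiliary structure,
    /// without touching the handle.
    fn add_node(&mut self, element_name: &str, id: Option<&str>) -> Rc<Node> {
        let node = Rc::new(Node {
            element_name: element_name.to_owned(),
            id: id.map(str::to_owned),
        });

        // The first node created becomes the root of the tree.
        if self.treebase.is_none() {
            self.treebase = Some(Rc::clone(&node));
        }

        self.all_nodes.push(Rc::clone(&node));

        if let Some(id) = id {
            // Assumption: the first node registered under an id wins.
            self.nodes_by_id
                .entry(id.to_owned())
                .or_insert_with(|| Rc::clone(&node));
        }

        node
    }

    /// Consumes the loader and returns everything in one step.
    fn finish(self) -> LoadResult {
        LoadResult {
            treebase: self.treebase,
            all_nodes: self.all_nodes,
            nodes_by_id: self.nodes_by_id,
        }
    }
}
```

Because finish() takes self by value, the loader cannot be used after the hand-off, which mirrors rsvg_load_destroy() returning the treebase to the handle.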
Summary
- Split the XML logic into libxml2-specific stuff and general STAX-like event handlers (these last ones are easier to do in Rust).
- Have the loading process maintain all the auxiliary data, instead of depending on RsvgHandle.
- Pass all the auxiliary data back to RsvgHandle at the very end of the loading process.