[Discussion] Faster Data Structures for Messages
Fractal is slow for a lot of operations because they require iterating over all messages in a room. I tried to identify some of the more common operations and wrote them down. Optimizing for them, I came up with a solution which should be faster and easy to implement in Fractal as it is today.
Ideas
Idea 1 – Messages live in Rooms
The idea is to have two data structures, one per-room and one global:
- A date-ordered list of messages per room (Vec, VecDeque, LinkedList – tbd).
- A global hashmap evid → (roomid, ts, [evid]) which maps an event id to
- roomid the id of the room containing the event
- ts the server timestamp of the event
- [evid] ids of all events relating to the event
The only really surprising thing should be the ts field, which is used to get finding an event from O(n logn) down to O(log n). This is done by getting the message list for the correct room first an then performing a binary search.
Idea 2 – Messages live in a big ol’ Hashmap
The idea is very similar to idea 1, but the events live in the hashmap instead of the rooms.
- A date-ordered list of event ids per room (Vec, VecDeque, LinkedList – tbd).
- A global hashmap evid → (event, [evid]) which maps an event id to
- event an event
- [evid] ids of all events relating to the event
Analysis
A table of the complexities per operation:
Operation | Idea 1 [Vec] | Idea 1 [LL] | Idea 2 [Vec] | Idea 2 [LL] |
---|---|---|---|---|
Get roomid by eventid | O(1) | O(1) | O(1) | O(1) |
Get event by eventid | O(log n) | O(log n) | O(1) | O(1) |
New event | O(n) | O(log n) | O(n) | O(log n) |
Modify event* | O(n) | O(log n) | O(n) | O(log n) |
Keep track of event relations | O(1) | O(1) | O(1) | O(1) |
[Vec] = with Vec(Deque), [LL] = with LinkedList
* Needs reordering because timestamp and id might change
The tradeoff I made for idea 1 was keeping the messages ordered and in one place but as a price making retrieving a message more expensive. Adding a structure to speed up eventid → event lookup further is possible, but would require holding references into the lists, which is a messy thing to do in Rust as far as I know.
The tradeoff I made for idea 2 was a fast lookup of events at the cost of always having to take an extra indirection through the hashmap, which is not cache-efficient, when working with the per-room list of event ids.
The reasons to go against a LinkedList is its lacking cache friendliness compared to Vec(Deque). VecDeque is interesting as the common case is operating at the ends of the message list.
Please comment
Before stating to implement this, I would like some feedback from you a) if there are common operations missing b) if there is a better/improved solution.