cogl: reduce temporary allocations calculating redraw regions
When calculating redraw regions, many temporary allocations are made. For the array of rects (which usually holds only a few entries) we can use stack allocations of up to 1 page (256 cairo_rectangle_int_t). When building a region from rectangles, cairo and pixman are much faster if they are given all of the rectangles up front; otherwise they malloc quite a bit of temporary memory.
If we re-use the cairo_rectangle_int_t array we've already allocated (and preferably on the stack), we can delay the creation of regions until after the tight loop.
Additionally, it requires fewer allocations to union two cairo_region_t than to incrementally union the rectangles into the region.
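The pattern described above can be sketched roughly as follows. This is an illustration only, not the code from this change: rect_t stands in for cairo_rectangle_int_t, and offset_rects_batched() and total_area() are hypothetical names. In real code the batched hand-off would be a single call to cairo_region_create_rectangles() rather than total_area().

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for cairo_rectangle_int_t: four ints, 16 bytes on common ABIs. */
typedef struct { int x, y, width, height; } rect_t;

/* 256 rects * 16 bytes = 4096 bytes, i.e. one typical page. */
#define MAX_STACK_RECTS 256

/* Hypothetical batched consumer: sees every rectangle in a single call. */
static long
total_area (const rect_t *rects, int n_rects)
{
  long area = 0;
  int i;

  for (i = 0; i < n_rects; i++)
    area += (long) rects[i].width * rects[i].height;

  return area;
}

/* Hypothetical helper: translate n_rects rectangles by (dx, dy) into a
 * scratch array, then hand the whole array to the consumer at once.
 * Sets *used_heap to 1 if the scratch array had to be malloc'd.
 * Returns the consumer's result, or -1 if allocation failed. */
long
offset_rects_batched (const rect_t *in, int n_rects, int dx, int dy,
                      int *used_heap)
{
  rect_t stack_rects[MAX_STACK_RECTS];
  rect_t *rects = stack_rects;
  long result;
  int i;

  assert (n_rects >= 0);
  *used_heap = 0;

  /* Fall back to the heap only when the input exceeds one page of rects. */
  if (n_rects > MAX_STACK_RECTS)
    {
      rects = malloc (sizeof *rects * (size_t) n_rects);
      if (rects == NULL)
        return -1;
      *used_heap = 1;
    }

  /* Tight loop: only writes into the preallocated array, no region ops. */
  for (i = 0; i < n_rects; i++)
    {
      rects[i] = in[i];
      rects[i].x += dx;
      rects[i].y += dy;
    }

  /* One batched hand-off after the loop. */
  result = total_area (rects, n_rects);

  if (rects != stack_rects)
    free (rects);

  return result;
}
```

For the common case of a handful of rectangles, the sketch performs no heap allocation at all; the malloc path is taken only when more than 256 rectangles are passed in.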
Before (percentages are of total number of allocations):

     TOTAL    FUNCTION
[ 100.00%]    [Everything]
[ 100.00%]      [gnome-shell --wayland --display-server]
[  99.67%]        _start
[  99.67%]          __libc_start_main
[  99.67%]            main
[  98.60%]              meta_run
[  96.90%]                g_main_loop_run
[  96.90%]                  g_main_context_iterate.isra.0
[  96.90%]                    g_main_context_dispatch
[  90.27%]                      clutter_clock_dispatch
[  86.54%]                        _clutter_stage_do_update
[  85.00%]                          clutter_stage_cogl_redraw
[  84.98%]                            clutter_stage_cogl_redraw_view
[  81.09%]                              cairo_region_union_rectangle
After (overhead has dropped considerably):

     TOTAL    FUNCTION
[ 100.00%]    [Everything]
[  99.80%]      [gnome-shell --wayland --display-server]
[  99.48%]        _start
[  99.48%]          __libc_start_main
[  99.48%]            main
[  92.37%]              meta_run
[  81.49%]                g_main_loop_run
[  81.49%]                  g_main_context_iterate.isra.0
[  81.43%]                    g_main_context_dispatch
[  39.40%]                      clutter_clock_dispatch
[  26.93%]                        _clutter_stage_do_update
[  25.80%]                          clutter_stage_cogl_redraw
[  25.60%]                            clutter_stage_cogl_redraw_view