GCancellable: Use atomic logic instead of global mutex-based critical sections

GCancellable is meant to be used in multi-thread operations but all the
cancellable instances were sharing a single mutex to synchronize them
which can be less optimal when many instances are in place.
Especially when we're doing a lock/unlock dances that may leave another
thread to take the control of a critical section in an unexpected way.

This in fact was leading to some races in GCancellableSources causing
leaks because we were assuming that the "cancelled" callback was always
called before our dispose implementation.

As per this, just move to use atomic logic, with few spin-locks when
needed, given that in such cases the threads don't really need to be
paused for long times, so it's just more efficient than using mutexes
and conditionals, as it avoids context switches.

The lock is also now used only to protect the calls that may interact
with cancelled state or that depends on that, as per this we can just
reduce it to the cancel and reset case, other than to the connect one to
prevent the race that we could have when connecting to a cancellable
that is reset from another thread.

We don't really need to release the locks during callbacks now as they
are per instance, and there's really no function that we allowed to call
during a ::cancelled signal callback that may require an unlocked state.
This could been done in case with a recursive lock, that is easy enough
to implement but not really needed for this case.

Fixes: #2309, #2313
9 jobs for atomic-gcancellable in 31 minutes and 53 seconds (queued for 1 minute and 59 seconds)