Garbage Collection
This document covers the basics of Chain garbage collection.
Currently we run garbage collection only in non-archival nodes, to keep the size of the storage under control. Therefore, we remove blocks, chunks and state that is ‘old’ enough - which in current configuration means 5 epochs ago.
We run a single ‘round’ of GC after a new block is accepted to the chain - and in order not to delay the chain too much, we make sure that each round removes at most 2 blocks from the chain.
For more details look at function clear_data()
in file chain/chain/src/chain.rs
How it works:
Imagine the following chain (with 2 forks)
In the pictures below, let’s assume that epoch length is 5 and we keep only 3 epochs (rather than 5 that is currently set in production) - otherwise the image becomes too large 😉.
If head is in the middle of the epoch, the gc_stop
will be set to the first
block of epoch T-2, and tail
& fork_tail
will be sitting at the last block of
epoch T-3.
(and no GC is happening in this round - as tail is next to gc_stop
).
Next block was accepted on the chain (head jumped ahead), but still no GC happening in this round:
Now interesting things will start happening once head ‘crosses’ over to the next epoch.
First, the gc_stop
will jump to the beginning of the next epoch.
Then we’ll start the GC of the forks: by first moving the fork_tail
to match
the gc_stop
and going backwards from there.
It will start removing all the blocks that don’t have a successor (a.k.a the tip of the fork). And then it will proceed to lower height.
Will keep going until it ‘hits’ the tail.
In order not to do too much in one go, we’d only remove up to 2 block in each run (that happens after each head update).
Now, the forks are gone, so we can proceed with GCing of the blocks from the canonical chain:
Same as before, we’d remove up to 2 blocks in each run:
Until we catch up to the gc_stop
.