Garbage Collection

This document covers the basics of Chain garbage collection.

Currently we run garbage collection only in non-archival nodes, to keep the size of the storage under control. Therefore, we remove blocks, chunks and state that is ‘old’ enough - which in current configuration means 5 epochs ago.

We run a single ‘round’ of GC after a new block is accepted to the chain - and in order not to delay the chain too much, we make sure that each round removes at most 2 blocks from the chain.

For more details look at function clear_data() in file chain/chain/src/chain.rs

How it works:

Imagine the following chain (with 2 forks)

In the pictures below, let’s assume that epoch length is 5 and we keep only 3 epochs (rather than 5 that is currently set in production) - otherwise the image becomes too large 😉.

If head is in the middle of the epoch, the gc_stop will be set to the first block of epoch T-2, and tail & fork_tail will be sitting at the last block of epoch T-3.

(and no GC is happening in this round - as tail is next to gc_stop).

Next block was accepted on the chain (head jumped ahead), but still no GC happening in this round:

Now interesting things will start happening once head ‘crosses’ over to the next epoch.

First, the gc_stop will jump to the beginning of the next epoch.

Then we’ll start the GC of the forks: by first moving the fork_tail to match the gc_stop and going backwards from there.

It will start removing all the blocks that don’t have a successor (a.k.a the tip of the fork). And then it will proceed to lower height.

Will keep going until it ‘hits’ the tail.

In order not to do too much in one go, we’d only remove up to 2 block in each run (that happens after each head update).

Now, the forks are gone, so we can proceed with GCing of the blocks from the canonical chain:

Same as before, we’d remove up to 2 blocks in each run:

Until we catch up to the gc_stop.