Protocol Upgrade

This document describes the entire cycle of how a protocol upgrade is done, from the initial PR to the final release. It is important for everyone who contributes to the development of the protocol and its client(s) to understand this process.

Background

At NEAR, we use the term protocol version to mean the version of the blockchain protocol and is separate from the version of some specific client (such as nearcore), since the protocol version defines the protocol rather than some specific implementation of the protocol. More concretely, for each epoch, there is a corresponding protocol version that is agreed upon by validators through a voting mechanism. Our upgrade scheme dictates that protocol version X is backward compatible with protocol version X-1 so that nodes in the network can seamlessly upgrade to the new protocol. However, there is no guarantee that protocol version X is backward compatible with protocol version X-2.

Despite the upgrade mechanism, rolling out a protocol change can be scary, especially if the change is invasive. For those changes, we may want to have several months of testing before we are confident that the change itself works and that it doesn't break other parts of the system.

Protocol version voting and upgrade

When a new neard version, containing a new protocol version, is released, all node maintainers need to upgrade their binary. That typically means stopping neard, downloading or compiling the new neard binary and restarting neard. However the protocol version of the whole network is not immediately bumped to the new protocol version. Instead a process called voting takes place and determines if and when the protocol version upgrade will take place.

Voting is a fully automated process in which all block producers across the network vote in support or against upgrading the protocol version. The voting happens in the last block every epoch. Upgraded nodes will begin voting in favour of the new protocol version after a predetermined date. The voting date is configured by the release owner like this. Once at least 80% of the stake votes in favour of the protocol change in the last block of epoch X, the protocol version will be upgraded in the first block of epoch X+2.

For mainnet releases, the release on github typically happens on a Monday or Tuesday, the voting typically happens a week later and the protocol version upgrade happens 1-2 epochs after the voting. This gives the node maintainers enough time to upgrade their neard nodes. The node maintainers can upgrade their nodes at any time between the release and the voting but it is recommended to upgrade soon after the release. This is to accommodate for any database migrations or miscellaneous delays.

Starting a neard node with protocol version voting in the future in a network that is already operating at that protocol version is supported as well. This is useful in the scenario where there is a mainnet security release where mainnet has not yet voted or upgraded to the new version. That same binary with protocol voting date in the future can be released in testnet even though it has already upgraded to the new protocol version.

Nightly Protocol features

To make protocol upgrades more robust, we introduce the concept of a nightly protocol version together with the protocol feature flags to allow easy testing of the cutting-edge protocol changes without jeopardizing the stability of the codebase overall. The use of the nightly and nightly_protocol for new features is mandatory while the use of dedicated rust features for new protocol features is optional and only recommended when necessary. Adding rust features leads to conditional compilation which is generally not developer friendly. In Cargo.toml file of the crates we have in nearcore, we introduce rust compile-time features nightly_protocol and nightly:

nightly_protocol = []
nightly = [
    "nightly_protocol",
    ...
]

where nightly_protocol is a marker feature that indicates that we are on nightly protocol whereas nightly is a collection of new protocol features which also implies nightly_protocol.

When it is not necessary to use a rust feature for the new protocol feature the Cargo.toml file will remain unchanged.

When it is necessary to use a rust feature for the new protocol feature, it can be added to the Cargo.toml, to the nightly features. For example, when we introduce EVM as a new protocol change, suppose the current protocol version is 40, then we would do the following change in Cargo.toml:

nightly_protocol = []
nightly = [
    "nightly_protocol",
    "protocol_features_evm",
    ...
]

In core/primitives/src/version.rs, we would change the protocol version by:

#![allow(unused)]
fn main() {
#[cfg(feature = “nightly_protocol”)]
pub const PROTOCOL_VERSION: u32 = 100;
#[cfg(not(feature = “nightly_protocol”)]
pub const PROTOCOL_VERSION: u32 = 40;
}

This way the stable versions remain unaffected after the change. Note that nightly protocol version intentionally starts at a much higher number to make the distinction between the stable protocol and nightly protocol clearer.

To determine whether a protocol feature is enabled, we do the following:

  • We maintain a ProtocolFeature enum where each variant corresponds to some protocol feature. For nightly protocol features, the variant may optionally be gated by the corresponding rust compile-time feature.
  • We implement a function protocol_version to return, for each variant, the corresponding protocol version in which the feature is enabled.
  • When we need to decide whether to use the new feature based on the protocol version of the current network, we can simply compare it to the protocol version of the feature. To make this simpler, we also introduced a macro checked_feature

For more details, please refer to core/primitives/src/version.rs.

Feature Gating

It is worth mentioning that there are two types of checks related to protocol features:

  • Runtime checks that compare the protocol version of the current epoch and the protocol version of the feature. Those runtime checks must be used for both stable and nightly features.
  • Compile time checks that check if the rust feature corresponding with the protocol feature is enabled. This check is optional and can only be used for nightly features.

Testing

Nightly protocol features allow us to enable the most bleeding-edge code in some testing environments. We can choose to enable all nightly protocol features by

#![allow(unused)]
fn main() {
cargo build -p neard --release --features nightly
}

or enable some specific protocol feature by

#![allow(unused)]
fn main() {
cargo build -p neard --release --features nightly_protocol,<protocol_feature>
}

In practice, we have all nightly protocol features enabled for Nayduck tests and on betanet, which is updated daily.

Feature Stabilization

New protocol features are introduced first as nightly features and when the author of the feature thinks that the feature is ready to be stabilized, they should submit a pull request to stabilize the feature using this template. In this pull request, they should do the feature gating, increase the PROTOCOL_VERSION constant (if it hasn't been increased since the last release), and change the protocol_version implementation to map the stabilized features to the new protocol version.

A feature stabilization request must be approved by at least two nearcore code owners. Unless it is a security-related fix, a protocol feature cannot be included in any release until at least one week after its stabilization. This is to ensure that feature implementation and stabilization are not rushed.