Overview
This document describes the high-level architecture of nearcore. The focus here is on the implementation of the blockchain protocol, not the protocol itself. For reference documentation of the protocol, please refer to the nomicon.
Some parts of our architecture are also covered in this video series on YouTube.
Bird's Eye View
If we put the entirety of nearcore onto one picture, we get something like this:
Don't worry if this doesn't yet make a lot of sense: hopefully, by the end of this document the picture above will become much clearer!
Overall Operation
nearcore is a blockchain node -- it's a single binary (neard) which runs on some machine and talks to other similar binaries running elsewhere. Together, the nodes agree (using a distributed consensus algorithm) on a particular sequence of transactions. Once the transaction sequence is established, each node applies the transactions to the current state. Because transactions are fully deterministic, each node in the network ends up with an identical state. To allow greater scalability, the NEAR protocol uses sharding, which allows a node to hold only a small subset (shard) of the whole state.
neard is a stateful, restartable process. When neard starts, the node connects to the network and starts processing blocks (a block is a batch of transactions processed together; transactions are batched into blocks for greater efficiency). The results of processing are persisted in the database; RocksDB is used for storage. Usually, the node's data is found in the ~/.near directory. The node can be stopped at any moment and restarted later. While the node is offline, it misses blocks, so, after a restart, the sync process kicks in and brings the node up to speed with the network by downloading the missing bits of history from more up-to-date peer nodes.
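As a rough mental model, the lifecycle described above can be sketched as follows; all the types and functions here are made up for illustration and are not actual nearcore code:

```rust
// Hypothetical, highly simplified sketch of the node lifecycle described above.
// None of these types exist in nearcore; they only illustrate the flow.

struct Block {
    height: u64,
}

struct Database {
    head_height: u64,
}

impl Database {
    fn apply_and_persist(&mut self, block: Block) {
        // Applying a block is deterministic, so every node converges on the same state.
        self.head_height = block.height;
    }
}

fn main() {
    let mut db = Database { head_height: 0 };

    // 1. Sync: download and apply the blocks missed while the node was offline.
    for height in db.head_height + 1..=5 {
        db.apply_and_persist(Block { height });
    }

    // 2. Normal operation: apply new blocks as the network produces them.
    for height in 6..=8 {
        db.apply_and_persist(Block { height });
    }

    println!("node is at height {}", db.head_height);
}
```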
Major components of nearcore:
- JSON RPC. This HTTP RPC interface is how neard communicates with the non-blockchain outside world. For example, to submit a transaction, some client sends an RPC request with it to some node in the network. From that node, the transaction propagates through the network until it is included in some block. Similarly, a client can send an HTTP request to a node to learn about the current state of the blockchain. The JSON RPC interface is documented here.
- Network. If RPC is aimed "outside" the blockchain, "network" is how peer neard nodes communicate with each other within the blockchain. RPC carries requests from the users of the blockchain, while the network carries various messages needed to implement consensus. Two directly connected nodes communicate by sending protobuf-encoded messages over TCP. A node also includes logic to route messages for indirect peers through intermediaries. Oversimplifying a lot, it's enough for a new node to know the IP address of just one other network participant. From this bootstrap connection, the node learns how to communicate with any other node in the network.
- Client. Somewhat confusingly named, client is the logical state of the blockchain. After receiving and decoding a request, both RPC and network usually forward it, in parsed form, to the client. Internally, client is split into two somewhat independent components: chain and runtime.
  - Chain. The job of chain, in a nutshell, is to determine a global order of transactions. Chain builds and maintains the blockchain data structure. This includes block and chunk production and processing, consensus, and validator selection. However, chain is not responsible for actually applying transactions and receipts.
  - Runtime. If chain selects the order of transactions, Runtime applies transactions to the state. Chain guarantees that everyone agrees on the order and content of transactions, and Runtime guarantees that each transaction is fully deterministic. It follows that everyone agrees on the "current state" of the blockchain. Some transactions are as simple as "transfer X tokens from Alice to Bob", but a much more powerful class of transactions is supported: "run this arbitrary WebAssembly code in the context of the current state of the chain". Running such "smart contract" transactions securely and efficiently is a major part of what Runtime does. Today, Runtime uses a JIT compiler to do that.
- Storage. Storage is more of a cross-cutting concern than an isolated component. Many parts of a node want to durably persist various bits of state to disk. One notable case is the logical state of the blockchain and, in particular, the data associated with each account. Logically, the state of an account on a chain is a key-value map: HashMap<Vec<u8>, Vec<u8>>. But there is a twist: it should be possible to provide a succinct proof that a particular key indeed holds a particular value. To allow that, internally the state is implemented as a persistent (in both senses, "functional" and "on disk") Merkle-Patricia trie.
- Parameter Estimator. One kind of transaction we support is "run this arbitrary, Turing-complete computation". To protect the whole network from a loop {} transaction halting it, Runtime implements resource limiting: each transaction runs with a certain finite amount of "gas", and each operation costs a certain amount of gas to perform (see the sketch right after this list). The parameter estimator is essentially a set of benchmarks used to estimate the relative gas costs of various operations.
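To make the resource-limiting idea concrete, below is a minimal, self-contained sketch of gas accounting; the GasMeter type, the costs, and the budget are all illustrative and do not correspond to the actual nearcore implementation, whose costs come out of the parameter estimator.

```rust
/// Illustrative only: a toy gas meter showing how per-operation costs
/// bound the total amount of work a transaction may perform.
struct GasMeter {
    remaining: u64,
}

#[derive(Debug)]
struct OutOfGas;

impl GasMeter {
    fn new(limit: u64) -> Self {
        Self { remaining: limit }
    }

    /// Charge `cost` units of gas; fail once the budget is exhausted.
    fn charge(&mut self, cost: u64) -> Result<(), OutOfGas> {
        if self.remaining < cost {
            return Err(OutOfGas);
        }
        self.remaining -= cost;
        Ok(())
    }
}

fn main() {
    let mut meter = GasMeter::new(1_000);
    // Even an infinite loop terminates once the gas budget is spent.
    loop {
        if meter.charge(10).is_err() {
            println!("out of gas -- execution aborted");
            break;
        }
    }
}
```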
Entry Points
neard/src/main.rs contains the main function that starts a blockchain node. However, this file mostly only contains the logic to parse arguments and dispatch different commands. start_with_config in nearcore/src/lib.rs is the actual entry point and it starts all the actors.
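The "parse arguments, then dispatch" structure can be pictured roughly like this; the commands and the start_with_config body below are simplified stand-ins, not the real neard CLI:

```rust
// Simplified sketch of the "parse arguments, then dispatch" structure.
// The command names and functions are illustrative, not neard's real CLI.
use std::env;

fn start_with_config(home_dir: &str) {
    // In nearcore, this is where all the actors (client, network, RPC) are spawned.
    println!("starting node with data from {home_dir}");
}

fn main() {
    let args: Vec<String> = env::args().collect();
    match args.get(1).map(String::as_str) {
        Some("run") => start_with_config(args.get(2).map(String::as_str).unwrap_or("~/.near")),
        Some("init") => println!("initializing node configuration"),
        _ => eprintln!("usage: neard <run|init> [home-dir]"),
    }
}
```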
JsonRpcHandler::process in the jsonrpc crate is the RPC entry point. It implements the public API of a node, which is documented here.
PeerManagerActor::spawn in the network crate is the entry point for the other point of contact with the outside world -- the peer-to-peer network.
Runtime::apply in the runtime crate is the entry point for the transaction processing logic. This is where state transitions actually happen, after the chain has decided, according to the distributed consensus, which transitions need to happen.
Code Map
This section contains a high-level overview of important crates and data structures.
core/primitives
This crate contains most of the types that are shared across different crates.
core/primitives-core
This crate contains the types needed by the runtime.
core/store/trie
This directory contains the MPT state implementation. Note that we usually use TrieUpdate to interact with the state.
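The pattern of buffering changes on top of committed state, which TrieUpdate applies to the trie, can be illustrated with a plain hash-map overlay; this is only an illustration of the pattern, not the real TrieUpdate API, and it omits the Merkle structure and proofs entirely:

```rust
use std::collections::HashMap;

/// Illustration of the "buffered update over committed state" pattern.
/// The real TrieUpdate works over the Merkle-Patricia trie and records
/// state changes for proofs, which is omitted here.
struct StateUpdate {
    committed: HashMap<Vec<u8>, Vec<u8>>,
    pending: HashMap<Vec<u8>, Option<Vec<u8>>>, // None marks a deletion
}

impl StateUpdate {
    fn get(&self, key: &[u8]) -> Option<&Vec<u8>> {
        match self.pending.get(key) {
            Some(value) => value.as_ref(),   // a pending write or deletion wins
            None => self.committed.get(key), // otherwise fall back to committed state
        }
    }

    fn set(&mut self, key: Vec<u8>, value: Vec<u8>) {
        self.pending.insert(key, Some(value));
    }

    fn remove(&mut self, key: Vec<u8>) {
        self.pending.insert(key, None);
    }

    /// Finalize: fold the pending changes into the committed state.
    fn commit(mut self) -> HashMap<Vec<u8>, Vec<u8>> {
        for (key, value) in self.pending {
            match value {
                Some(value) => {
                    self.committed.insert(key, value);
                }
                None => {
                    self.committed.remove(&key);
                }
            }
        }
        self.committed
    }
}

fn main() {
    let mut update = StateUpdate { committed: HashMap::new(), pending: HashMap::new() };
    update.set(b"alice".to_vec(), b"100".to_vec());
    assert_eq!(update.get(b"alice"), Some(&b"100".to_vec()));
    let state = update.commit();
    assert_eq!(state.get(b"alice".as_slice()), Some(&b"100".to_vec()));
}
```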
chain/chain
This crate contains most of the chain logic (consensus, block processing, etc). ChainUpdate::process_block is where most of the block processing logic happens.
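Conceptually, processing a single block goes through a few stages; the outline below is a made-up simplification of those stages, not the actual ChainUpdate::process_block:

```rust
// Illustrative outline only; the real ChainUpdate::process_block is far more involved.
struct Block {
    height: u64,
    prev_height: u64,
}

#[derive(Debug)]
enum BlockError {
    InvalidHeader,
    UnknownParent,
}

fn process_block(head_height: &mut u64, block: &Block) -> Result<(), BlockError> {
    // 1. Validate the header (signatures, validator set, etc. in the real code).
    if block.height == 0 {
        return Err(BlockError::InvalidHeader);
    }
    // 2. Make sure we know the parent block; otherwise the block is orphaned.
    if block.prev_height != *head_height {
        return Err(BlockError::UnknownParent);
    }
    // 3. Apply the chunks (delegated to the runtime in nearcore) and update the head.
    *head_height = block.height;
    Ok(())
}

fn main() {
    let mut head = 10;
    process_block(&mut head, &Block { height: 11, prev_height: 10 }).unwrap();
    assert_eq!(head, 11);
}
```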
State update
The blockchain state of a node can be changed in the following two ways:
- Applying a chunk. This is how the state is normally updated: through Runtime::apply.
- State sync. State sync can happen in two cases:
  - A node is far enough behind the most recent block and triggers state sync to fast-forward to the state of a very recent block without having to apply the blocks in the middle.
  - A node is about to become a validator for some shard in the next epoch, but it does not yet have the state for that shard. In this case, it would run state sync through the catchup routine.
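A rough way to picture when each path is taken is a small predicate like the one below; the threshold and the flag are made up, and the real decision logic lives in the client code:

```rust
/// Illustrative only: decide whether the node should state-sync rather than
/// apply every missing block. The threshold is a made-up number.
fn needs_state_sync(head_height: u64, network_height: u64, becoming_validator_without_state: bool) -> bool {
    const FAR_BEHIND_THRESHOLD: u64 = 10_000; // hypothetical "too far behind" cutoff

    // Case 1: too far behind to catch up block by block.
    let far_behind = network_height.saturating_sub(head_height) > FAR_BEHIND_THRESHOLD;

    // Case 2: about to validate a shard whose state we do not hold (the catchup routine).
    far_behind || becoming_validator_without_state
}

fn main() {
    assert!(needs_state_sync(5, 50_000, false));
    assert!(needs_state_sync(49_999, 50_000, true));
    assert!(!needs_state_sync(49_999, 50_000, false));
}
```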
chain/chunks
This crate contains most of the sharding logic, which includes chunk creation, distribution, and processing. ShardsManager is the main struct that orchestrates everything here.
chain/client
This crate defines two important structs, Client and ViewClient. Client includes everything necessary for the chain (without network and runtime) to function and runs in a single thread. ViewClient is a "read-only" client that answers queries without interfering with the operations of Client. ViewClient runs in multiple threads.
chain/network
This crate contains the entire implementation of the p2p network used by NEAR blockchain nodes.
Two important structs here: PeerManagerActor and Peer. The peer manager orchestrates all the communications from the network to other components and from other components to the network. Peer is responsible for low-level network communications from and to a given peer (more details in this article). The peer manager runs in one thread, while each Peer runs in its own thread.
Architecture Invariant: Network communicates to Client through NetworkClientMessages and to ViewClient through NetworkViewClientMessages. Conversely, Client and ViewClient communicate to the network through NetworkRequests.
chain/epoch_manager
This crate is responsible for determining validators and other epoch-related information, such as the epoch id, for each epoch.
Note: EpochManager is constructed in NightshadeRuntime rather than in Chain, partially because we had this idea of making the epoch manager a smart contract.
chain/jsonrpc
This crate implements the JSON-RPC API server that enables submission of new transactions and inspection of the blockchain data, the network state, and the node status. When a request is processed, it generates a message to either ClientActor or ViewClientActor to interact with the blockchain. Queries of blockchain data, such as blocks, chunks, accounts, etc., usually generate a message to ViewClientActor. Transactions, on the other hand, are sent to ClientActor for further processing.
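The routing rule just described can be sketched as follows; the request and route enums are made up for illustration and are not the crate's real types:

```rust
// Illustrative routing sketch; not the real jsonrpc handler or actor messages.
enum RpcRequest {
    Query { method: String },              // read-only queries: block, chunk, account, ...
    SendTransaction { signed_tx: Vec<u8> }, // transactions that mutate the chain
}

enum Route {
    ViewClientActor, // answers read-only queries
    ClientActor,     // processes transactions
}

fn route(request: &RpcRequest) -> Route {
    match request {
        RpcRequest::Query { .. } => Route::ViewClientActor,
        RpcRequest::SendTransaction { .. } => Route::ClientActor,
    }
}

fn main() {
    let query = RpcRequest::Query { method: "block".to_string() };
    assert!(matches!(route(&query), Route::ViewClientActor));
}
```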
runtime/runtime
This crate contains the main entry point to the runtime -- Runtime::apply. This function takes ApplyState, which contains the necessary information passed from the chain to the runtime, a list of SignedTransaction and a list of Receipt, and returns an ApplyResult, which includes state changes, execution outcomes, etc.
Architecture Invariant: The state update is only finalized at the end of apply. During all intermediate steps, state changes can be reverted.
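A rough sketch of the shape of this entry point, with stand-in types instead of the real ones:

```rust
// Simplified sketch of the apply() shape described above; all types are stand-ins.
struct ApplyState { block_height: u64 } // info passed from the chain to the runtime
struct SignedTransaction;               // new transactions included in the chunk
struct Receipt;                         // receipts generated by earlier transactions
struct ApplyResult { outcomes: usize }  // state changes, execution outcomes, ...

fn apply(
    _apply_state: &ApplyState,
    transactions: &[SignedTransaction],
    receipts: &[Receipt],
) -> ApplyResult {
    // In nearcore, state changes are accumulated here and only finalized at the end,
    // so they can still be reverted at any intermediate step.
    ApplyResult { outcomes: transactions.len() + receipts.len() }
}

fn main() {
    let result = apply(&ApplyState { block_height: 1 }, &[SignedTransaction], &[Receipt]);
    assert_eq!(result.outcomes, 2);
}
```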
runtime/near-vm-logic
VMLogic contains all the implementations of host functions and is the interface between the runtime and Wasm. VMLogic is constructed when the runtime applies function call actions. In VMLogic, interaction with the NEAR blockchain happens in the following two ways:
- VMContext, which contains lightweight information such as the current block hash, current block height, epoch id, etc.
- External, which is a trait that contains functions to interact with the blockchain by either reading some nontrivial data or writing to the blockchain.
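A toy version of this split might look as follows; the field and method names are illustrative and are not the actual near-vm-logic definitions:

```rust
// Toy illustration of the two interaction channels; not the real near-vm-logic types.
use std::collections::HashMap;

/// Lightweight, read-only data handed to the VM up front.
struct VmContext {
    block_height: u64,
    block_hash: [u8; 32],
    epoch_id: [u8; 32],
}

/// Callbacks into the blockchain for anything that needs live reads or writes.
trait External {
    fn storage_read(&self, key: &[u8]) -> Option<Vec<u8>>;
    fn storage_write(&mut self, key: &[u8], value: &[u8]);
}

/// A mock blockchain backing the External trait, for illustration only.
struct MockChain {
    storage: HashMap<Vec<u8>, Vec<u8>>,
}

impl External for MockChain {
    fn storage_read(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.storage.get(key).cloned()
    }
    fn storage_write(&mut self, key: &[u8], value: &[u8]) {
        self.storage.insert(key.to_vec(), value.to_vec());
    }
}

fn main() {
    let context = VmContext { block_height: 42, block_hash: [0; 32], epoch_id: [0; 32] };
    let mut chain = MockChain { storage: HashMap::new() };
    chain.storage_write(b"counter", b"1");
    assert_eq!(chain.storage_read(b"counter"), Some(b"1".to_vec()));
    assert_eq!(context.block_height, 42);
}
```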
runtime/near-vm-runner
The run function in runner.rs is the entry point to the VM runner. This function essentially spins up the VM and executes some function in a contract. It supports different Wasm compilers, including wasmer0, wasmer2, and wasmtime, through compile-time feature flags. Currently we use wasmer0 and wasmer2 in production. The imports module exposes host functions defined in near-vm-logic to Wasm code. In other words, it defines the ABI of the contracts on NEAR.
neard
As mentioned before, neard is the crate that contains the main entry points. All the actors are spawned in start_with_config. It is also worth noting that NightshadeRuntime is the struct that implements RuntimeAdapter.
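The adapter relationship can be pictured with a simplified stand-in; the trait and method below are made up and only illustrate the pattern, not the real RuntimeAdapter:

```rust
// Simplified stand-in for the adapter relationship; not the real RuntimeAdapter trait.
trait RuntimeAdapter {
    /// Apply the transactions of one chunk and report how many were processed.
    fn apply_chunk(&self, num_transactions: usize) -> usize;
}

/// Stand-in for NightshadeRuntime: glues chain-facing calls to the runtime crate.
struct NightshadeRuntimeSketch;

impl RuntimeAdapter for NightshadeRuntimeSketch {
    fn apply_chunk(&self, num_transactions: usize) -> usize {
        // The real implementation calls into Runtime::apply and the epoch manager.
        num_transactions
    }
}

fn main() {
    let runtime: Box<dyn RuntimeAdapter> = Box::new(NightshadeRuntimeSketch);
    assert_eq!(runtime.apply_chunk(3), 3);
}
```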
core/store/src/db.rs
This file contains the schema (DBCol) of our internal RocksDB storage - a good starting point when reading the code base.
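To give a feel for what the schema looks like, here is a trimmed-down, illustrative column enum; the variant names are examples rather than the exact DBCol definition:

```rust
// Illustrative only: a few example column families in the spirit of DBCol.
// The actual enum in core/store/src/db.rs has many more columns.
enum Column {
    BlockHeaders,        // block headers keyed by block hash
    Blocks,              // full blocks keyed by block hash
    State,               // trie nodes of the Merkle-Patricia state
    ChunkHashesByHeight, // index from block height to chunk hashes
}

fn column_name(column: &Column) -> &'static str {
    match column {
        Column::BlockHeaders => "BlockHeaders",
        Column::Blocks => "Blocks",
        Column::State => "State",
        Column::ChunkHashesByHeight => "ChunkHashesByHeight",
    }
}

fn main() {
    println!("writing to column {}", column_name(&Column::Blocks));
}
```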
Cross Cutting Concerns
Observability
The tracing crate is used for structured, hierarchical event output and logging. We also integrate Prometheus for lightweight metric output. See the style documentation for more information on usage.
Testing
Rust has built-in support for writing unit tests by marking functions
with the #[test]
directive. Take full advantage of that! Testing not
only confirms that what was written works the way it was intended to but
also helps during refactoring since it catches unintended behaviour
changes.
Not all tests are created equal, though, and while some may only need milliseconds to run, others may run for several seconds or even minutes. Tests that take a long time should be marked as such by prefixing their name with slow_test_ or ultra_slow_test_:
#[test]
fn ultra_slow_test_catchup_random_single_part_sync() {
    test_catchup_random_single_part_sync_common(false, false, 13)
}
During local development, both slow and ultra-slow tests will not run with a typical just nextest invocation. You can run them locally with just nextest-slow or just nextest-all. CI will run the slow tests, and the ultra_slow ones are left to run on Nayduck.
Because ultra_slow tests are run on Nayduck, they need to be explicitly included in the nightly/expensive.txt file; for example:
expensive --timeout=1800 near-client near_client tests::catching_up::ultra_slow_test_catchup_random_single_part_sync
expensive --timeout=1800 near-client near_client tests::catching_up::ultra_slow_test_catchup_random_single_part_sync --features nightly
For more details regarding nightly tests, see nightly/README.md.
Note that what counts as a slow test is defined in .config/nextest.toml.