Benchmarking synthetic workloads

Benchmarking a synthetic workload starts a new network with empty state. State is then created, and afterwards transactions involving that state are generated. For example, the native token transfer workload creates n accounts with NEAR balance and then generates transactions that transfer the native token between these accounts.

This approach has the following benefits:

  • Relatively simple and quick setup, as no state from real-world networks is involved.
  • Fine-grained control over traffic intensity.
  • Enables comparing neard performance at different points in time or with different features.
  • Might expose performance bottlenecks.

The main drawbacks of synthetic benchmarks are:

  • The conclusions that can be drawn are limited, as real-world traffic is not homogeneous.
  • Calibrating traffic generation parameters can be cumbersome.

The tooling for synthetic benchmarks is available in benchmarks/synth-bm.

Workflows

The tooling's justfile contains recipes for the most relevant workflows.
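
The available recipes can be listed with just itself:

just --list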

Benchmark native token transfers

A typical workflow for benchmarking native token transfers using the above justfile looks roughly as follows (the [t1] and [t2] prefixes denote separate terminals):

  • set up the network
[t1]$ rm -rf .near && just init_localnet
# Modify the configuration (see the "Un-limit configuration" section)
[t1]$ just run_localnet
[t1]$ just create_sub_accounts
  • run the benchmark
# set the desired tx rate (`--interval-duration-micros`) and the total volume (`--num-transfers`) in the justfile
[t2]$ just benchmark_native_transfers

This benchmark generates a native token transfer workload involving the accounts provided in --user-data-dir. Transactions are generated by iterating through these accounts and sending native tokens to a randomly chosen receiver from the same set of accounts. To view all options, run:

cargo run --release -- benchmark-native-transfers --help
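
For illustration, an invocation might look like the sketch below. The values are placeholders and the flag set is not necessarily complete; the flags shown are the parameters discussed elsewhere in this document, so consult --help for the authoritative list.

# Sketch only: values are placeholders; check --help for the required flags.
cargo run --release -- benchmark-native-transfers \
    --rpc-url http://localhost:3030 \
    --user-data-dir user-data \
    --num-transfers 200000 \
    --interval-duration-micros 500 \
    --channel-buffer-size 30000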

For the native transfer benchmark, transactions are sent with wait_until: NONE, meaning the responses the near_synth_bm tool receives are essentially just an acknowledgement by the RPC that it received the transaction. Thus the numbers reported by the tool, as in

[2025-01-27T14:05:12Z INFO  near_synth_bm::native_transfer] Sent 200000 txs in 6.50 seconds
[2025-01-27T14:05:12Z INFO  near_synth_bm::rpc] Received 200000 tx responses in 6.49 seconds

are not directly indicative of runtime performance or transaction outcomes. The number of successfully processed transactions can be obtained by querying the near_transaction_processed_successfully_total metric, e.g. with: curl -s http://localhost:3030/metrics | grep transaction_processed. Automatic calculation of transactions per second (TPS) when RPC requests are sent with wait_until: NONE is coming up shortly.
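
Until then, a rough manual estimate can be derived by sampling the metric twice. The sketch below assumes the node's metrics are exposed on localhost:3030, that the counter is printed without labels and as an integer, and that curl, grep, and awk are available:

# Sample the success counter, wait, sample again, and divide by the elapsed time.
BEFORE=$(curl -s http://localhost:3030/metrics | grep '^near_transaction_processed_successfully_total' | awk '{print $2}')
sleep 10
AFTER=$(curl -s http://localhost:3030/metrics | grep '^near_transaction_processed_successfully_total' | awk '{print $2}')
echo "approx. TPS: $(( (AFTER - BEFORE) / 10 ))"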

Benchmark calls to the sign method of an MPC contract

This benchmark assumes that the accounts sending the transactions which invoke sign have been created as described above. Transactions can be sent to an RPC node of a network on which an instance of mpc/chain-signatures is deployed.

Transactions are sent to the RPC with wait_until: EXECUTED_OPTIMISTIC, as the throughput for sign is at a level at which neither the network nor the RPC is expected to be a bottleneck.
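
For context, wait_until is a field of NEAR's send_tx JSON-RPC method. A raw request illustrating the semantics might look roughly like the following sketch; the base64 payload is a placeholder and the exact request shape should be checked against the RPC documentation:

curl -s http://localhost:3030 \
  -H 'Content-Type: application/json' \
  -d '{
    "jsonrpc": "2.0",
    "id": "benchmark",
    "method": "send_tx",
    "params": {
      "signed_tx_base64": "<base64-encoded SignedTransaction>",
      "wait_until": "EXECUTED_OPTIMISTIC"
    }
  }'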

All options of the command can be shown with:

cargo run -- benchmark-mpc-sign --help

Auxiliary steps

Network setup and neard configuration

Details of bringing up and configuring a network are out of scope for this document. Instead, we give a brief overview of the setup regularly used to benchmark the TPS of common workloads in a single-node, single-shard setup.

Build neard

Choose the git commit and cargo features corresponding to what you want to benchmark. Most likely you will want a --release build to measure TPS. Place the corresponding neard binary in the justfile's directory or set the NEARD_PATH environment variable to point to it.
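
A typical sequence, assuming you build from the root of the nearcore repository and the justfile lives in benchmarks/synth-bm, might look like:

# Build an optimized neard binary.
cargo build --release -p neard
# Either copy the binary next to the justfile ...
cp target/release/neard benchmarks/synth-bm/
# ... or point the tooling at it via the environment variable mentioned above.
export NEARD_PATH=$(pwd)/target/release/neard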

Create sub accounts

Creating the state for synthetic benchmarks usually starts with creating accounts. We create sub accounts of the account specified by --signer-key-path. This avoids dealing with the registrar, which would be required for creating top-level accounts. To view all options, run:

cargo run --release -- create-sub-accounts --help
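
As a hedged illustration, an invocation might look like the sketch below. The paths are placeholders, --signer-key-path, --rpc-url and --user-data-dir are parameters mentioned in this document, and --num-sub-accounts is a hypothetical name for the flag controlling how many accounts are created; verify the exact flags via --help.

# Sketch only: --num-sub-accounts is a hypothetical flag name, paths are placeholders.
cargo run --release -- create-sub-accounts \
    --rpc-url http://localhost:3030 \
    --signer-key-path .near/validator_key.json \
    --num-sub-accounts 100 \
    --user-data-dir user-data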

Initialize the network

./neard --home .near init --chain-id localnet

Enable memtrie

The configuration generated by the above command does not enable memtrie. However, most benchmarks should run against a node with memtrie enabled, which can be achieved by setting the following in .near/config.json:

"load_mem_tries_for_tracked_shards": true

Un-limit configuration

Following the steps so far creates a configuration that throttles throughput due to various factors related to state witness size, gas/compute limits, and congestion control. If you want to benchmark a node that fully utilizes its hardware, you can apply the following modifications to effectively run with an unlimited configuration:

# Modifications in .near/genesis.json

"chain_id": "benchmarknet"
"gas_limit": 20000000000000000               # increase default by x20

# Modifications in .near/config.json
"view_client_threads": 8                     # increase default by x2
"load_mem_tries_for_tracked_shards": true    # enable memtrie 
"produce_chunk_add_transactions_time_limit": {
  "secs": 0,
  "nanos": 800000000                         # increase default by x4
}

Note that as nearcore evolves, these steps and BENCHMARKNET adjustments might need to be updated to achieve the effect of un-limiting configuration.
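
The same kind of jq patching can serve as a sketch for applying the modifications above. Double-check the result, since jq rewrites the whole file and the default values mentioned in the comments may change over time:

# Genesis changes (apply after `just init_localnet`, before the first run).
jq '.chain_id = "benchmarknet" | .gas_limit = 20000000000000000' \
  .near/genesis.json > .near/genesis.json.tmp && mv .near/genesis.json.tmp .near/genesis.json
# Config changes.
jq '.view_client_threads = 8
    | .load_mem_tries_for_tracked_shards = true
    | .produce_chunk_add_transactions_time_limit = {"secs": 0, "nanos": 800000000}' \
  .near/config.json > .near/config.json.tmp && mv .near/config.json.tmp .near/config.json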

Modifications of genesis.json need to be applied after the network has been initialized with just init_localnet but before the node is started; otherwise just run_localnet will fail. If you ran the node with the default config and want to switch to the unlimited config, the required steps are:

# Remove .near as you will need to initialize localnet again.
$ rm -rf .near
$ just init_localnet
# Modify the configuration
$ just run_localnet

Common parameters

The following parameters are common to multiple tasks:

rpc-url

The RPC endpoint to which transactions are sent.

Synthetic benchmarking may create thousands of transactions per second, which can hit network limitations if the RPC is located on a separate machine. In particular, sending transactions to nodes running on GCP requires care, as it can cause temporary IP address bans. For that scenario it is recommended to run a separate traffic generation VM located in the same GCP zone as the RPC node and send transactions to its internal IP.
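
Before starting a long run against a remote RPC, it can be worth a quick connectivity check from the traffic generation VM, e.g. against the node's status endpoint (the internal IP below is a placeholder):

curl -s http://<internal-ip>:3030/status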

interval-duration-micros

Controls the rate at which transactions are sent. Assuming your hardware is able to send a request at every interval tick, the number of transactions sent per second equals 1_000_000 / interval-duration-micros. The rate might be slowed down if channel-buffer-size becomes a bottleneck.
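
For example, --interval-duration-micros 500 targets 1_000_000 / 500 = 2_000 transactions per second, while --interval-duration-micros 10000 targets 100 transactions per second.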

channel-buffer-size

Before an RPC request is sent, the tooling awaits capacity on a buffered channel, so the number of outstanding RPC requests is limited by channel-buffer-size. This can slow down the rate at which transactions are sent if the node is congested. To disable that behavior, set channel-buffer-size to a large value, e.g. the total number of transactions to be sent.