Runtime Parameter Estimator

The runtime parameter estimator is a byzantine benchmarking suite. Byzantine benchmarking is not a commonly used term, but I feel it describes this suite quite well. It measures performance under the assumption that up to a third of validators and all users collude to make the system as slow as possible.

This benchmarking suite is used to check that the gas parameters defined in the protocol are correct. Correct in this context means that a chunk filled with 1 Pgas (peta gas) takes at most 1 second to be applied. Or, more generally, per 1 Tgas of execution we spend no more than 1 ms of wall-clock time.
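As a quick sanity check of that arithmetic, here is a minimal sketch of the implied time budget. The constant and helper names are made up for illustration and are not part of nearcore.

```rust
// Purely illustrative; these names are not part of nearcore.
const GAS_PER_SECOND: u64 = 1_000_000_000_000_000; // 1 Pgas per second
const NANOS_PER_SECOND: u64 = 1_000_000_000;

/// Wall-clock budget in nanoseconds implied by a given amount of gas.
fn time_budget_ns(gas: u64) -> u64 {
    gas / (GAS_PER_SECOND / NANOS_PER_SECOND)
}

fn main() {
    // A chunk filled with 1 Pgas must apply within 1 second.
    assert_eq!(time_budget_ns(1_000_000_000_000_000), 1_000_000_000);
    // 1 Tgas of execution corresponds to 1 ms (1_000_000 ns).
    assert_eq!(time_budget_ns(1_000_000_000_000), 1_000_000);
}
```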

For now, nearcore timing is the only one that matters. Things will become more complicated once there are multiple client implementations. But knowing that nearcore can serve requests fast enough shows that it is possible to be at least that fast. However, we should be careful not to couple costs too tightly to the specific implementation of nearcore, to leave room for innovation in new clients.

The estimator code is part of the nearcore repository in the directory runtime/runtime-params-estimator.

For a practical guide on how to run the estimator, please take a look at Running the Estimator in the workflows chapter.

Code Structure

The estimator contains a binary and a library module. main.rs contains the CLI argument parsing code and the logic to fill the test database.

The interesting code lives in lib.rs and its submodules. The comments at the top of that file provide a high-level overview of how estimations work. More details on specific estimations are available as comments on the enum variants of Cost in costs.rs.

If you roughly understand the three files above, you already have a great overview of the estimator. estimator_context.rs is another central file. A full estimation run creates a single EstimatorContext. Each estimation will use it to spawn a new Testbed with a fresh database that contains the same data as the setup in the estimator context.

Most estimations fill blocks with transactions to be executed and hand them to Testbed::measure_blocks. To allow for easy repetitions, the block is usually filled by an instance of the TransactionBuilder, which can be retrieved from a testbed.

But even filling blocks with transactions becomes repetitive since many parameters are estimated similarly. utils.rs has a collection of helpful functions that let you write estimations very quickly.
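To make this concrete, here is a self-contained sketch of that pattern. All types in it are heavily simplified stand-ins, not the estimator's real Testbed, TransactionBuilder, or measurement API; only the overall flow of filling a block with transactions, measuring it, and dividing by the block size mirrors what an estimation does.

```rust
// Simplified stand-ins; the real types in the estimator have different
// signatures. Only the overall flow is illustrated here.
struct Transaction;

struct TransactionBuilder;

impl TransactionBuilder {
    // Stand-in for a helper that builds one transaction of the kind being
    // estimated, e.g. a plain transfer.
    fn transfer(&mut self) -> Transaction {
        Transaction
    }
}

struct Testbed {
    tb: TransactionBuilder,
}

impl Testbed {
    fn transaction_builder(&mut self) -> &mut TransactionBuilder {
        &mut self.tb
    }

    // Stand-in for Testbed::measure_blocks: apply each block against the
    // test database and return one measurement per block.
    fn measure_blocks(&mut self, blocks: Vec<Vec<Transaction>>) -> Vec<u64> {
        blocks.iter().map(|block| block.len() as u64).collect()
    }
}

fn estimate_transfer_cost(testbed: &mut Testbed) -> u64 {
    let block_size: u64 = 100;
    // Fill one block with identical transactions.
    let block: Vec<Transaction> = (0..block_size)
        .map(|_| testbed.transaction_builder().transfer())
        .collect();
    // Measure the block and attribute the cost evenly to each transaction.
    let per_block = testbed.measure_blocks(vec![block]);
    per_block[0] / block_size
}

fn main() {
    let mut testbed = Testbed { tb: TransactionBuilder };
    println!("estimated cost: {}", estimate_transfer_cost(&mut testbed));
}
```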

Estimation Metrics

The estimation code is generally not concerned with the metric used to estimate gas. We use let clock = GasCost::measure(); and clock.elapsed() to measure the cost in whatever metric has been specified in the CLI argument --metric. But when you run estimations and especially when you want to interpret the results, you want to understand the metric used. Available metrics are time and icount.
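To picture what this pattern looks like in estimation code, here is a minimal, self-contained stand-in. It is not the estimator's GasCost type: the sketch only covers the wall-clock variant, while the real type can also be backed by qemu instruction counts, depending on --metric.

```rust
use std::time::{Duration, Instant};

// A minimal stand-in for the GasCost::measure() / elapsed() pattern quoted
// above. The real GasCost also supports the icount metric; this sketch only
// covers wall-clock time.
struct GasCost {
    start: Instant,
}

impl GasCost {
    fn measure() -> Self {
        GasCost { start: Instant::now() }
    }

    fn elapsed(self) -> Duration {
        // In the estimator this returns a cost in whichever metric was
        // selected via --metric; here it is simply the elapsed time.
        self.start.elapsed()
    }
}

fn main() {
    let clock = GasCost::measure();
    // ... the workload being estimated would run here ...
    let cost = clock.elapsed();
    println!("measured: {cost:?}");
}
```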

Starting with time, this is a simple wall-clock time measurement. At the end of the day, this is what counts in a validator setup. But unfortunately, this metric is very dependent on the specific hardware and what else is running on that hardware right now. Dynamic voltage and frequency scaling (DVFS) also plays a role here. To a certain degree, all these factors can be controlled. But it requires full control over a system (often not the case when running on cloud-hosted VMs) and manual labor to set it up.

The other supported metric icount is much more stable. It uses qemu to emulate an x86 CPU. We then insert a custom TCG plugin (counter.c) that counts the number of executed x86 instructions. It also intercepts system calls and counts the number of bytes seen in sys_read, sys_write and their variations. This gives an approximation for IO bytes, as seen on the interface between the operating system and nearcore. To convert to gas, we use three constants to multiply with instruction count, read bytes, and write bytes.
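In other words, the gas value is a weighted sum of the three counters. Below is an illustrative sketch; the constant values are made up, and only the shape of the formula follows the description above.

```rust
// Illustrative only: the constants below are made up, not the values the
// estimator actually uses. Only the shape of the formula (a weighted sum
// of the three counters) follows the description above.
const GAS_PER_INSTRUCTION: u64 = 1_000_000; // hypothetical
const GAS_PER_READ_BYTE: u64 = 20_000_000; // hypothetical
const GAS_PER_WRITE_BYTE: u64 = 60_000_000; // hypothetical

fn icount_to_gas(instructions: u64, read_bytes: u64, write_bytes: u64) -> u64 {
    instructions * GAS_PER_INSTRUCTION
        + read_bytes * GAS_PER_READ_BYTE
        + write_bytes * GAS_PER_WRITE_BYTE
}

fn main() {
    // e.g. 10 million instructions, 4 KiB read, 1 KiB written (made-up numbers)
    println!("{} gas", icount_to_gas(10_000_000, 4_096, 1_024));
}
```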

We run qemu inside a Docker container using the Podman runtime, to make sure the qemu and qemu plugin versions match the system libraries. Make sure to add --containerize when running with --metric icount.

The great thing about icount is that you can run it on different machines and it will always return the same result. It is not 100% deterministic but very close, so it can usually detect code changes that degrade performance in major ways.

The problem with icount is how unrepresentative it is of real-life performance. First, x86 instructions are not all equally complex. Second, how many of them are executed per cycle depends on instruction-level pipelining, branch prediction, memory prefetching, and other CPU features that an emulator like qemu simply does not capture. Third, the time it takes to serve bytes in system calls depends less on the total number of bytes and more on data locality and how well the data can be cached in the OS page cache. But regardless of all these inaccuracies, icount can still be useful for comparing different implementations, as long as both are measured with it.

From Estimations to Parameter Values

To calculate the final gas parameter values, there is more to be done than just running a single command. After all, these parameters are part of the protocol specification. They cannot be changed easily. And setting them to a wrong value can cause severe system instability.

Our current strategy is to run estimations with two different metrics and do so on standardized cloud hardware. The output is then sanity checked manually by several people. Based on that, the final gas parameter value is determined. Usually, it will be the higher output of the two metrics rounded up.

The PR #8031, which set the ed25519 verification gas parameters, is a good example of what such an analysis and report can look like.

More details on the process will be added to this document in due time.