Running a Mina Archive Node
This article was written by Gareth Davies and first published on Medium on Apr 27th, 2021. It is reproduced here with permission under a Creative Commons licence. You can follow Gareth on Twitter.
Mina is a succinct blockchain, and as a result, consensus nodes only store the recent history of the chain (the last k blocks, currently 290) before discarding it.
While prior transaction history is not required to prove the current state is valid (this is handled via a recursive zero-knowledge proof), many applications would like access to this prior transaction history. Examples include block explorers and wallets.
To solve this problem, users may optionally run an archive node that stores a summary of each block seen in a Postgres database. The archive node is just a regular mina daemon that connects to a running archive process.
How much does the archive database grow? After seeing 28,500 blocks on mainnet, the size as reported by
SELECT pg_size_pretty( pg_database_size('archiver') );
is 74 MB, or roughly 2.7 kB per block.
Setting up
An archive node setup consists of the following components:
- A mina daemon.
- The archive node package.
- A Postgres node, with a database created with the archive node schema.
Links to the relevant downloads and installation instructions are available in the official documentation.
Adding Redundancy
While the above configuration of a single daemon writing to a single archive node process is the most straightforward, it offers little redundancy if either the daemon or the archive process crashes. In those cases, blocks will likely be lost and will need to be recovered from other sources (or by re-syncing the node, if they are within the last k blocks).
For redundancy, multiple daemons can write to a single archive node process by each specifying the address of an archive process.
Further, we can have multiple daemons pointing to separate archive nodes, all writing to a single Postgres database. Note that in this instance, to ensure consistency of the data, the default transaction isolation level should be changed after creating the database and before connecting an archive process to it:
ALTER DATABASE <DATABASE NAME> SET DEFAULT_TRANSACTION_ISOLATION TO SERIALIZABLE ;
If you run Postgres via a cloud service such as AWS or GCP, you can add further redundancy with replication and failover by following the provider's best practices.
Alternate storage for block data
In addition to writing the block data to the Postgres database, we can also archive a representation of the block data, known as a precomputed block, by either writing it to the logs or uploading it to Google Cloud Storage. A fully redundant setup therefore combines multiple daemons and archive processes writing to a replicated Postgres database, with precomputed blocks also stored in the logs or in Google Cloud Storage.
The precomputed blocks can be huge (as much as 5 MB per block) and are much larger than what is stored by the archive node. If using -log-precomputed-blocks, ensure that your logging service can handle such long log lines.
Monitoring
Even with the above setup, you will likely want to monitor the status of your archive node and, in the case of data loss, be able to restore blocks to the database. The archive node has some tooling, detailed below, that will identify individual missing blocks, but for a quick overview of the status of the database, the following two queries can be used in conjunction.
SELECT count( * )
FROM (SELECT h::int FROM generate_series(1 , (select max(height) from blocks)) h
LEFT JOIN blocks b
ON h = b.height where b.height is null) as v
This query determines if there is a block for every height (up to the maximum height seen). It should return 0 when there are no missing blocks at any height. It does not, however, confirm that there is a canonical block at each height. As such, it should be used in conjunction with:
select count(*) from blocks where parent_id is null
This query checks for any missing parents of a block. For a complete archive database, this query should return 1 for the Genesis block, which does not have a parent. If the two queries return 0 and 1 respectively, you are likely in good shape. If they don't, you'll likely need to recover some missing blocks using the archive node tooling.
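To make the logic of the two checks concrete, here is a minimal Python sketch that applies the same two tests to an in-memory list of rows rather than to Postgres. The function names and the `(height, parent_id)` row shape are illustrative assumptions, not part of the archive node schema or tooling.

```python
def missing_heights(heights):
    """Return every height in 1..max(heights) that has no block at all.

    Mirrors the first query: generate the full series of heights and keep
    the ones with no matching row.
    """
    present = set(heights)
    if not present:
        return []
    return [h for h in range(1, max(present) + 1) if h not in present]


def looks_complete(rows):
    """Apply both checks to rows of (height, parent_id).

    rows is a hypothetical stand-in for the blocks table; parent_id is None
    when the parent is not known, as for the Genesis block.
    """
    rows = list(rows)
    gaps = missing_heights([height for height, _ in rows])
    null_parents = sum(1 for _, parent_id in rows if parent_id is None)
    # Expected result on a complete database: no gaps, one null parent.
    return len(gaps) == 0 and null_parents == 1
```

A database that passes the first check can still fail the second: two competing blocks at one height fill the gap in the series, but a block whose parent was never stored still has a null parent.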
Archive Tooling
To allow for data restoration, exporting, and verification, the archive node has the following tooling available:
- mina-missing-blocks-auditor: reports state hashes of blocks missing from the archive database.
- mina-extract-blocks: extracts all blocks or a chain (from a provided start and end hash) from the database.
- mina-archive-blocks: writes blocks to the archive database.
- mina-replayer: replays transactions from the archive node.
Some of these tools have been renamed over time, so if any of them are missing, ensure you are running the latest version of the archive node (1.1.5 at the time of writing).
mina-missing-blocks-auditor
This tool identifies any parent state hashes missing from the archive database. It can only report missing parents for blocks it has stored, i.e., if you are missing a consecutive run of blocks, it will only report the first missing parent. On a complete database, the output should contain only the Genesis block, as it has no parent hash.
The following output is returned on a database with missing blocks. The missing blocks are the ones identified by each parent_hash.
mina-missing-blocks-auditor --archive-uri <POSTGRES_URI>
{"timestamp":"2021-04-22 16:36:21.127744Z","level":"Info","source":{"module":"Dune__exe__Missing_blocks_auditor","location":"File \"src/app/missing_blocks_auditor/missing_blocks_auditor.ml\", line 30, characters 10-21"},"message":"Block has no parent in archive db","metadata":{"block_id":1145,"parent_hash":"3NLD34ddu4i8aPF6c7cD4aDh27MFWYTwamAoVRXWBFtcTeAbcJjA","pid":32,"state_hash":"3NK3zgLmMdptx9ubM1H4LTBadSQoQJ3ouYQbR6PQQnbq5vfTpGEo"}}
{"timestamp":"2021-04-22 16:36:21.127756Z","level":"Info","source":{"module":"Dune__exe__Missing_blocks_auditor","location":"File \"src/app/missing_blocks_auditor/missing_blocks_auditor.ml\", line 30, characters 10-21"},"message":"Block has no parent in archive db","metadata":{"block_id":3386,"parent_hash":"3NLk4z4hvUgiWPMyeF7iaKp78XyqHjGUqm2vabJB6v28VLuMyLQ2","pid":32,"state_hash":"3NKrHrebU3mQUiGwynuKxWW3d3snDCP2wczeDKw7S8t54kK8VkSX"}}
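Because each log line is a JSON record, the missing parent hashes can be pulled out with a few lines of scripting. The following is a minimal Python sketch, assuming the output has the shape shown above (a `message` field and a `metadata.parent_hash` field); the function name is illustrative, and other versions of the tool may log differently.

```python
import json


def missing_parent_hashes(log_text):
    """Collect parent_hash values from auditor output, in order, without duplicates."""
    hashes = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip blank lines and anything that is not a JSON record
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed lines
        # Message text taken from the sample output above.
        if record.get("message") != "Block has no parent in archive db":
            continue
        parent = record.get("metadata", {}).get("parent_hash")
        if parent and parent not in hashes:
            hashes.append(parent)
    return hashes
```

Given the captured output of the auditor as a string, the function returns the list of state hashes to fetch from another source. On a complete database the only entry should be the one reported for the Genesis block, which can be ignored.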
mina-extract-blocks
To extract blocks from an archive node, for example to recover missing blocks on a different database, you can use the mina-extract-blocks tool to output blocks in an extensional format. This format is more lightweight than the precomputed blocks, as it contains only the information required to restore data to the archive node.
This tool can export either all the blocks in the database with --all-blocks, or a chain between a range of state hashes by providing --start-state-hash and/or --end-state-hash.
mina-extract-blocks --archive-uri <POSTGRES_URI> --all-blocks
This tool outputs each extensional block as a file named after its state hash, e.g., 3NKGgTk7en3347KH81yDra876GPAUSoSePrfVKPmwR1KHfMpvJC5.json.
mina-archive-blocks
The mina-archive-blocks tool is used to restore blocks to the database. It can be used either with precomputed blocks, i.e., those written to the logs or uploaded to Google Cloud Storage, or with the extensional format as exported from another archive node.
For example, to import the extensional block we exported in the previous section, we could run:
mina-archive-blocks --extensional 3NKGgTk7en3347KH81yDra876GPAUSoSePrfVKPmwR1KHfMpvJC5.json --archive-uri <POSTGRES_URI>
We can pass multiple files to mina-archive-blocks to import a large number of blocks. To do this in a batch, we can make use of standard Linux tooling. For example, the following command will attempt to import all .json files in the current directory and its subdirectories. It will also write the names of successful and failed blocks to separate output files for later review.
find . -name "*.json" | xargs -I % mina-archive-blocks --extensional % --archive-uri <POSTGRES_URI> --log-successful false --successful-files success.txt --failed-files failed.txt
The approach for precomputed blocks is very similar; use the --precomputed flag instead.
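The same batch import can be scripted if you want to react to failures programmatically. The sketch below is a minimal Python version that mirrors the find and xargs command above, using only the flags shown in this article. The function names are illustrative, and treating a non-zero exit code as a failed import is an assumption; the files written via --failed-files remain the authoritative record.

```python
import subprocess
from pathlib import Path


def build_import_command(block_file, archive_uri, block_format="extensional"):
    """Build one mina-archive-blocks invocation, mirroring the command above."""
    if block_format not in ("extensional", "precomputed"):
        raise ValueError("block_format must be 'extensional' or 'precomputed'")
    return [
        "mina-archive-blocks",
        f"--{block_format}",
        str(block_file),
        "--archive-uri", archive_uri,
        "--log-successful", "false",
        "--successful-files", "success.txt",
        "--failed-files", "failed.txt",
    ]


def import_directory(directory, archive_uri, block_format="extensional",
                     run=subprocess.run):
    """Import every .json file under directory, one invocation per file.

    Returns the names of files whose invocation exited non-zero.
    Assumption: a non-zero exit code signals a failed import.
    """
    failed = []
    # rglob searches subdirectories too, like `find . -name "*.json"`.
    for path in sorted(Path(directory).rglob("*.json")):
        result = run(build_import_command(path, archive_uri, block_format))
        if result.returncode != 0:
            failed.append(path.name)
    return failed
```

Calling `import_directory(".", "<POSTGRES_URI>")` with a real connection string in place of the placeholder imports everything under the current directory. The `run` parameter exists only so the function can be exercised without invoking the real tool.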
mina-replayer
One issue when recovering blocks is that you have to trust the source of the data. For example, when downloading a backup, you have to trust that the data was not tampered with.
The mina-replayer tool takes a genesis ledger as input and replays the archive data to produce a ledger corresponding to a protocol state. Comparing the resulting ledger to a known one verifies that the archived data is correct and complete.
mina-replayer --archive-uri <POSTGRES_URI> --input-file genesis.json --output-file calculated_ledger.json
Bootstrapping
If you are new to Mina and wish to start an archive node, you will need an existing database to bootstrap from (as the node will only restore the last 290 blocks). While you could use the tools listed above to extract and import all blocks, it is easier to ask an existing archive node operator for a database dump exported with the pg_dump tool, which you can then import via psql. If you need assistance, ask on the project Discord.
If the dump is recent enough, i.e., within the last 290 blocks, then once your node syncs it will catch up on any blocks missing between the dump and the time you started the archive node.