A couple of months ago, I started playing/developing around the Blockchain, given the Hype it was having lately throughout the crypto space with cryptocurrencies and all the incredible possible applications thanks to the implementation of smart contracts such as NFT collections, NFT games and DAO. I have always been very interested in distributed systems, and the Blockchain is a distributed system in its purest form.
It all started with one of the most popular games, which was applying this new concept of Play2Earn, where the game's assets were Tokens (ERC-721), which the players owned and generated profit while playing with them. They could also trade them with other players through the market that the platform offered, my interest was triggered when I found certain transactions in the market where there were players who bought these assets for a price below the average and then resold them for profit.
After noticing this, I contacted a great colleague and friend since this represented two exciting benefits: first, to learn about this technology that had been talked about for years (still, neither of us had had the opportunity or interest to get into it so far); and second, it presented an actual use case that could also have an economic benefit. What a great incentive!
We started with simple iterations. We created a Bot that listened to the events and transactions of the particular Blockchain that we were interested in. Every time we found a transaction in which we could make a profit, we decided to participate. Obviously, we were not the only ones participating, so we failed most of the transactions, given that others were much faster than us. As the days passed, the more we investigated, the more we learned, and we were making more efficient implementations. We stopped using public nodes and deployed multiple nodes with a modified protocol version.
The Blockchain was an Ethereum fork, so we modified the implementation of the protocol (which is written in Go, and also my favourite language) to run our Bots as part of our internal implementation to be a bit faster. In this way, we end up doing Front-running of the transactions and gaining many more opportunities.
Front running is the act of placing a transaction in a queue with the knowledge of a future transaction. https://coinmarketcap.com/alexandria/glossary/front-running
In short, all this led me to be more involved in the whole topic of Blockchain development in recent weeks, and that is why I decided to learn a little more in-depth and share it with you while I do it. So the idea is that I'll be posting little articles about technologies involved in everything related to Blockchain, anything that I find interesting and that might be interesting for you too.
To not bore you anymore with this introduction, in this first article, I want to talk about something that is a fundamental part of the blockchain, such as the Merkle tree, and one straightforward implementation in Go. This article assumes you have a basic understanding of cryptographic hash functions and basic binary tree algorithms.
What is a Merkle Tree?
A Merkle tree or hash tree is a binary tree where a node can only have zero, one, or two children. Every leaf node is labelled with the cryptographic hash of a data block (for example, a Blockchain transaction), and every node that is not a leaf (called a branch or inner node) is labelled with the cryptographic hash of the labels of its child nodes. The tree is complete when there is a single node called the Merkle root.
Thanks to this design, this structure achieves the following properties:
- Easy to verify data integrity because the root hash (the Merkle root) will change if any of the data changes.
- A single hash value, the Merkle root, represents all data blocks.
- Easy to verify the inclusion of a piece of data into the tree using the Merkle proof.
Something interesting to note about the construction of a Merkle tree is that since it is a binary tree and its branches are the combinations of the child nodes then, what happens if we have an odd number of leaf nodes? What happens if we have an odd number of branches nodes?
When building a Merkle tree, the simple answer is that if we have an odd number of nodes at a particular level, we must duplicate the last node to concatenate with itself.
Suppose a user knows the root hash of the Merkle tree and wants to check whether the tree does indeed contain a specific piece of data, such as a transaction. In that case, he can do it by using just one path through the Merkle tree, which is proportional to the log2
of the number of leaf nodes rather than the total number of leaf nodes.
For example, following the diagram above, we can see how users can check the integrity of the L3
block or if it is present in the tree. If they know the root hash
and are provided with Hash 1-1
and Hash 0
, they can do it by following the next steps:
- Users first get the hash of the data block they want to check, in this case, the
L3
block:Hash(L3)
, to get theHash 1-0
. - Then they must combine the result with the next provided hash, which must be the node's hash that is its peer. In this case, it would be
Hash(Hash 1-0, Hash 1-1)
and so on until we get the Merkle root. - Finally, they only need to check that the obtained root hash equals the previously known root hash. If the hashes are the same, the
L3
block belongs to the tree, and its data has not been altered. Otherwise,L3
is not in the tree or its data was changed.
As can be seen, a Merkle tree allows rapid validation and provides a robust security system since if any of the blocks is modified, the result of its hash is different. Therefore, combining its hash with the subsequent hashes will produce a different root hash than the known one.
A Merkle tree is used in many computer applications. In Bitcoin and other cryptocurrencies, Merkle trees are especially used to maintain the state of the blockchain efficiently and securely.
On average, there are 500 transactions in a Bitcoin block. Without this data structure, each user/client who wants to check if one transaction is within a specific block or its integrity would have to request and download the 500 transactions to prove it. While using a Merkle tree implementation, they only need to download the path of nine nodes ceil(log2(500)) = 9
of the specific transaction and execute the process discussed before. With this, you can check the integrity of the transaction.
Even if the block contained 1M transactions, it would only need a path with 20 nodes to be able to perform validation.
As you can see, this data structure is compelling and is used in various systems such as:
- Version control systems, for example, Git.
- File system in P2P.
- Databases.
- Blockchain.
Merkle tree in Go
An easy way to build this tree is to use a pointer-based strategy. With pointers, we can easily see the relationship of nodes to their children and parents and use this knowledge to calculate their hashes correctly. Also, these relations will help us to build the path to prove a piece of data.
As we already know, in this type of tree, the leaf nodes are hashes of the data we want to build the tree, so we must make it from the leaves to the root using a layered construction algorithm from the bottom up.
We will start with two relatively simple structures, in which we will store the tree and the nodes of the tree with their respective values.
type Tree struct {
Root *Node
Leaves []*Node
h Hasher
}
type Node struct {
Parent *Node
Left *Node
Right *Node
Hash []byte
}
The NewFromHashes
function will help us create a tree from an array of hashes, as we already know the leaf nodes of this structure are hashes of the desired information.
func NewFromHashes(hashes [][]byte, h Hasher) *Tree {
t := &Tree{
Leaves: make([]*Node, 0, len(hashes)),
h: h,
}
// Add leaf nodes.
for _, h := range hashes {
t.Leaves = append(t.Leaves, &Node{Hash: h})
}
t.Root = t.buildRoot()
return t
}
The next one will iterate through each layer of nodes to form their relationships until it reaches the root node.
func (t *Tree) buildRoot() *Node {
nodes := t.Leaves
// We are iterating until we reach a single node, which will be our root.
for len(nodes) > 1 {
var parents []*Node
// Having an odd number of nodes at this level, we will duplicate the last node to concatenate it with itself.
if len(nodes)%2 != 0 {
nodes = append(nodes, nodes[len(nodes)-1])
}
// Pairing nodes to build a parent from the pair
for i := 0; i < len(nodes); i += 2 {
n := &Node{
Left: nodes[i],
Right: nodes[i+1],
// Compute the hash of the new node, which will be the combination of its children's hashes.
Hash: t.h.Hash(nodes[i].Hash, nodes[i+1].Hash),
}
parents = append(parents, n)
nodes[i].Parent, nodes[i+1].Parent = n, n
}
// Once all possible pairs are processed, the parents become the children, and we start all over again.
nodes = parents
}
return nodes[0]
}
The GetProof
function receives as a parameter the hash of the data/transaction that we want to verify and will return the hashes of the nodes that are needed to validate its integrity. Using the relation between nodes, we can retrieve those nodes easily.
I added some explanatory comments which I think may clarify the way we get the proof using pointers.
func (t *Tree) GetProof(hash []byte) ([][]byte, []int, error) {
var (
path [][]byte
idxs []int
)
// Find the leaf node for the specific hash.
for _, currentNode := range t.Leaves {
if bytes.Equal(currentNode.Hash, hash) {
// After finding the node, we will scale the tree using the relationship of the nodes to their parent nodes.
parent := currentNode.Parent
for parent != nil {
// If the current node is the left child, we need the right child to calculate the parent hash
// for the proof and vice versa.
// i.e:
// If CurrentNode == Left ; ParentHash = (CurrentNode.Hash, RightChild.Hash)
// If CurrentNode == Right ; ParentHash = (LeftChild.Hash, CurrentNode.Hash)
// So we have to add the corresponding hash to the path, and in idxs, we save the hash's position 0
// for left and 1 for right. In this way, when we want to verify the proof, we can know if
// the given hash is the left o right child.
if bytes.Equal(currentNode.Hash, parent.Left.Hash) {
path = append(path, parent.Right.Hash)
idxs = append(idxs, 1)
} else {
path = append(path, parent.Left.Hash)
idxs = append(idxs, 0)
}
currentNode = parent
parent = currentNode.Parent
}
return path, idxs, nil
}
}
return path, idxs, errors.New("hash does not belong to the tree")
}
If you go to the repository, you can find the function to verify the proof, and you will also find a complete example of this simple implementation. Github Repo.
package main
import (
"crypto/sha256"
"fmt"
"github.com/douglasmakey/mktree"
)
type transaction struct {
from string
to string
value string
}
func hashTrx(t transaction) []byte {
h := sha256.New()
h.Write([]byte(fmt.Sprintf("%v", t)))
return h.Sum(nil)
}
func main() {
trx1 := transaction{from: "mike", to: "bob", value: "100"}
trx2 := transaction{from: "bob", to: "douglas", value: "250"}
trx3 := transaction{from: "alice", to: "john", value: "100"}
trx4 := transaction{from: "vitalik", to: "elon", value: "10000"}
data := [][]byte{
hashTrx(trx1),
hashTrx(trx2),
hashTrx(trx3),
hashTrx(trx4),
}
// Create and verify the tree.
t := mktree.NewFromHashes(data, mktree.DefaultShaHasher)
fmt.Println("Hex: ", t.Root.Hex())
// Getting the proof of the first transaction and verify it.
proof, idxs, err := t.GetProof(hashTrx(trx1))
if err != nil {
panic(err)
}
fmt.Printf("Verify proof of trx1: %+v \n", trx1)
p := mktree.VerifyProof(t.Root.Hash, hashTrx(trx1), proof, idxs, mktree.DefaultShaHasher)
fmt.Println("Proof integrity: ", p)
// Change trx1 and try to verify it with the original proof.
trx1.value = "1000"
fmt.Printf("Verify proof of trx1: %+v \n", trx1)
p = mktree.VerifyProof(t.Root.Hash, hashTrx(trx1), proof, idxs, mktree.DefaultShaHasher)
fmt.Println("Proof integrity with a change trx: ", p)
// Modifying the second transaction to send money to me.
trx5 := transaction{from: "vitalik", to: "douglas", value: "10000"}
t.Leaves[1].Hash = hashTrx(trx5)
// We are going to verify the integrity of the tree after the modification
fmt.Println("Tree integrity: ", t.Verify())
}
I hope you enjoyed the article. Thanks for your time, and any feedback is always welcomed :heart:.
Links:
- https://www.geeksforgeeks.org/binary-tree-data-structure/
- https://en.wikipedia.org/wiki/Cryptographic_hash_function
- https://en.wikipedia.org/wiki/Merkle_tree
- https://en.bitcoinwiki.org/wiki/Merkle_tree
- https://en.bitcoin.it/wiki/Protocol_documentation#Merkle_Trees
- https://cointelegraph.com/explained/what-is-front-running-in-crypto-and-nft-trading