本文主要介绍以太坊Merkle Patricia Trie以及区块的校验存储
sha3: 以太坊所用的一种hash方法
rlp :Recursive Length Prefix,递归长度前缀编码,以太坊用于编码数据。用于编码任意的嵌套结构的二进制数据(在python中就是列表,其实rlp在Merkle Patricia Trie的实现中主要就是编码列表用的),它是以太坊中数据序列化/反序列化的主要方法,区块、交易等数据结构在持久化时会先经过RLP编码后再存储到数据库中。
先看一下以太坊block区块头的结构
class BlockHeader(rlp.Serializable):
"""A block header. If the block with this header exists as an instance of :class:`Block`, the connection can be made explicit by setting :attr:`BlockHeader.block`. Then, :attr:`BlockHeader.state_root`, :attr:`BlockHeader.tx_list_root` and :attr:`BlockHeader.receipts_root` always refer to the up-to-date value in the block instance. :ivar block: an instance of :class:`Block` or `None` :ivar prevhash: the 32 byte hash of the previous block :ivar uncles_hash: the 32 byte hash of the RLP encoded list of uncle headers :ivar coinbase: the 20 byte coinbase address :ivar state_root: the root of the block's state trie :ivar tx_list_root: the root of the block's transaction trie :ivar receipts_root: the root of the block's receipts trie :ivar bloom: TODO :ivar difficulty: the block's difficulty :ivar number: the number of ancestors of this block (0 for the genesis block) :ivar gas_limit: the block's gas limit :ivar gas_used: the total amount of gas used by all transactions in this block :ivar timestamp: a UNIX timestamp :ivar extra_data: up to 1024 bytes of additional data :ivar nonce: a 32 byte nonce constituting a proof-of-work, or the empty string as a placeholder """
fields = [
('prevhash', hash32),#父区块的hash
('uncles_hash', hash32),#矿工引用的uncle区块列表的hash
('coinbase', address),#挖出此区块矿工的地址
('state_root', trie_root),#state trie树树根
('tx_list_root', trie_root),#交易trie树树根
('receipts_root', trie_root),#收据trie树树根
('bloom', int256),
('difficulty', big_endian_int),#当前区块难度
('number', big_endian_int),#区块的高度
('gas_limit', big_endian_int),#区块gas上限
('gas_used', big_endian_int),#区块使用的gas
('timestamp', big_endian_int),#时间戳
('extra_data', binary),
('mixhash', binary),
('nonce', Binary(8, allow_empty=True))#工作量证明随机数,通过校验该值来判断矿工的工作量证明是否合法。
]
首先要提一点,以太坊用Merkle Patricia Trie 这种数据结构来做区块中交易的快速校验和全网状态state的快速验证。Merkle Patricia Trie是一种树,区块头中的state_root,tx_list_root,receipts_root 分别是State Trie ,Transactions Trie,Receipts Trie的树根,这三个树都是Merkle Patricia Trie。
普通trie树和merkle树看这篇博客
先说下trie树的一个扩展Radix Trie树,这个树也是只存key / value的一棵树
这棵树每个节点看起来是这样的[i0, i1 … in, value] ,现在如果我们的key是用16进制(hex)编码,那么n=15,每个节点列表长度为17,[i0,i1,…i15,value],列表中的0~15的某一位就表示key中的一个16进制字符。
Radix Trie树有个问题,如果我要存储一个长度为20的hex编码的key,那么这个树的高度就有20,每一层就只存了key中一个字符,这样效率就不高,以太坊就改进了一下,增加了一些节点类型来解决这个事情。
Merkle Patricia Trie树也是用来存key / value的一棵树
Merkle Patricia Trie的四种节点类型
1.NULL (represented as the empty string) 空节点
2.branch A 17-item node [ v0 … v15, vt ] 分支节点,0~15每一位表示16进制编码的一个字符
3.leaf A 2-item node [ encodedPath, value ] 这个value是原始key在叶子节点上对应的value值
4.extension A 2-item node [ encodedPath, key ] 这个key是子节点的hash
实际在Merkle Patricia Trie的实现中,还用了HP(hex prefix)编码方式去编码的key,其实就是十六进制编码再加前缀来表示节点类型。同时key是以bytes存储的,而一个hex char只有4位,一个byte有八位,所以不够八位的时候prefix 后面再加0000补全。
HP编码前缀的规则
hex char | bits | node type partial | path length |
---|---|---|---|
0 | 0000 | extension | even |
1 | 0001 | extension | odd |
2 | 0010 | terminating (leaf) | even |
3 | 0011 | terminating (leaf) | odd |
先看一个Merkle Patricia Trie存储的列子
以下是要存储到tire树中的四个键值对
<64 6f> : ‘verb’
<64 6f 67> : ‘puppy’
<64 6f 67 65> : ‘coin’
<68 6f 72 73 65> : ‘stallion’
添加之后,trie树结构:
rootHash: [ <16>, hashA ] // extension node
hashA: [ <>, <>, <>, <>, hashB, <>, <>, <>, hashC, <>, <>, <>, <>, <>, <>, <>, <> ] // branch node
hashC: [ <20 6f 72 73 65>, ‘stallion’ ] //leaf node
hashB: [ <00 6f>, hashD ] // extension node
hashD: [ <>, <>, <>, <>, <>, <>, hashE, <>, <>, <>, <>, <>, <>, <>, <>, <>, verb ] // branch node
hashE: [ <17>, hashF ] // extension node
hashF: [ <>, <>, <>, <>, <>, <>, hashG, <>, <>, <>, <>, <>, <>, <>, <>, <>, puppy ] // branch node
hashG: [ <35>, ‘coin’ ] //leaf node
注意到root node 有公共前缀hex char 6 ,6前面的1表示这是个extension node,并且原来长度是奇数,补了个前缀1后正好凑成一个byte。
存储在root node上的key,value 便是 <16>, hashA,而hashA又是子节点在leveldb中的地址;
其中,hashA = sha3(rlp_encode([ <>, <>, <>, <>, hashB, <>, <>, <>, hashC, <>, <>, <>, <>, <>, <>, <>, <> ]))
这个hashA的计算方式正是体现了Merkle Patricia Trie的‘Merkle’部分,每个节点的hash地址正是节点数据rlp编码过后的hash。如果某个矿工篡改了叶子节点的数据,比如State Trie某个叶子节点一个账户的余额,并生成了State Tire的root hash,其他矿工通过transaction list算出来的State Tire的root hash一定跟他的不一致。
hashA 指向一个分支节点,分支节点中的第4位有hashB,表示有公共前缀‘4’;hashB 的公共前缀是<00 6f>,及公共前缀为‘6f’,一直到hashD,hashD的value为’verb’ ,经过的整个路径6-4-6f 即为原来的key。
再到hashG叶子节点 ,经过的整个路劲6-4-6f-6-7-6-5 即为value ‘coin’ 的key
在State Trie中,路径path(key)都是sha3(ethereumAddress) 账户地址的hash,value 都是该账户的信息:[nonce(当前交易次数),balance(账户余额),storageRoot,codeHash]
Transactions Trie和Receipts Trie 与 State Trie的区别是,每个区块都有独立的Transactions Trie和Receipts Trie ,而State Trie是全局的。
在Transactions Trie中,路径path(key)都是rlp(transactionIndex),transactionIndex是交易的编号,value都是rlp(transactionData)
Merkle Patricia Trie参考资料
https://github.com/ethereum/wiki/wiki/Patricia-Tree
https://easythereentropy.wordpress.com/2014/06/04/understanding-the-ethereum-trie/
从队里中取出一个block数据并尝试添加到本地chain
尝试实例化block对象,传入的参数有header区块头, transaction_list交易列表, uncles 叔叔区块列表
区块一致性校验:
if parent:
if hasattr(parent, 'db') and self.db != parent.db and self.db.db != parent.db:
raise ValueError("Parent lives in different database")
if self.prevhash != parent.header.hash:
raise ValueError("Block's prevhash and parent's hash do not match")
if self.number != parent.header.number + 1:
raise ValueError("Block's number is not the successor of its parent number")
if not check_gaslimit(parent, self.gas_limit):
raise ValueError("Block's gaslimit is inconsistent with its parent's gaslimit")
if self.difficulty != calc_difficulty(parent, self.timestamp):
raise ValueError("Block's difficulty is inconsistent with its parent's difficulty")
if self.gas_used > self.gas_limit:
raise ValueError("Gas used exceeds gas limit")
if self.timestamp <= parent.header.timestamp:
raise ValueError("Timestamp equal to or before parent")
交易及状态校验:
#初始化一个空的transaction trie,通过添加transaction list中的交易来重新构建transaction trie,这个transaction trie的root hash必须跟矿工给出的transaction trie root hash 一致
self.transactions = Trie(db, trie.BLANK_ROOT)
self.receipts = Trie(db, trie.BLANK_ROOT)
# replay transactions if state is unknown
state_unknown = (header.prevhash != GENESIS_PREVHASH and
header.number != 0 and
header.state_root != trie.BLANK_ROOT and
(len(header.state_root) != 32 or
b'validated:' + self.hash not in db) and
not making)
#如果没有添加过该区块
if state_unknown:
assert transaction_list is not None
if not parent:
parent = self.get_parent_header()
#取出该区块父区块的state trie
self.state = SecureTrie(Trie(db, parent.state_root))
self.transaction_count = 0
self.gas_used = 0
# replay
for tx in transaction_list:
#执行每一笔交易,同时更新transaction trie和state trie
success, output = processblock.apply_transaction(self, tx)
self.finalize()
else:
# trust the state root in the header
self.state = SecureTrie(Trie(self.db, header._state_root))
self.transaction_count = 0
if transaction_list:
for tx in transaction_list:
self.add_transaction_to_list(tx)
if self.transactions.root_hash != header.tx_list_root:
raise ValueError("Transaction list root hash does not match")
# receipts trie populated by add_transaction_to_list is incorrect
# (it doesn't know intermediate states), so reset it
self.receipts = Trie(self.db, header.receipts_root)
processblock.apply_transaction:
def apply_transaction(block, tx):
#交易合法性校验
validate_transaction(block, tx)
log_tx.debug('TX NEW', tx_dict=tx.log_dict())
# start transacting #################
#sender交易数nonce加1
block.increment_nonce(tx.sender)
# print block.get_nonce(tx.sender), '@@@'
# buy startgas
assert block.get_balance(tx.sender) >= tx.startgas * tx.gasprice
#sender账户扣除startgas
block.delta_balance(tx.sender, -tx.startgas * tx.gasprice)
message_gas = tx.startgas - intrinsic_gas_used(tx)
#合约代码
message_data = vm.CallData([safe_ord(x) for x in tx.data], 0, len(tx.data))
#交易
message = vm.Message(tx.sender, tx.to, tx.value, message_gas, message_data, code_address=tx.to)
# MESSAGE
ext = VMExt(block, tx)
if tx.to and tx.to != CREATE_CONTRACT_ADDRESS:
result, gas_remained, data = apply_msg(ext, message)
log_tx.debug('_res_', result=result, gas_remained=gas_remained, data=data)
else: # CREATE
result, gas_remained, data = create_contract(ext, message)
assert utils.is_numeric(gas_remained)
log_tx.debug('_create_', result=result, gas_remained=gas_remained, data=data)
assert gas_remained >= 0
log_tx.debug("TX APPLIED", result=result, gas_remained=gas_remained,
data=data)
if not result: # 0 = OOG failure in both cases
log_tx.debug('TX FAILED', reason='out of gas',
startgas=tx.startgas, gas_remained=gas_remained)
block.gas_used += tx.startgas
#给矿工手续费
block.delta_balance(block.coinbase, tx.gasprice * tx.startgas)
output = b''
success = 0
else:
log_tx.debug('TX SUCCESS', data=data)
gas_used = tx.startgas - gas_remained
block.refunds += len(block.suicides) * opcodes.GSUICIDEREFUND
if block.refunds > 0:
log_tx.debug('Refunding', gas_refunded=min(block.refunds, gas_used // 2))
gas_remained += min(block.refunds, gas_used // 2)
gas_used -= min(block.refunds, gas_used // 2)
block.refunds = 0
# sell remaining gas
block.delta_balance(tx.sender, tx.gasprice * gas_remained)
block.delta_balance(block.coinbase, tx.gasprice * gas_used)
block.gas_used += gas_used
if tx.to:
output = b''.join(map(ascii_chr, data))
else:
output = data
success = 1
block.commit_state()
suicides = block.suicides
block.suicides = []
for s in suicides:
block.ether_delta -= block.get_balance(s)
block.del_account(s)
block.add_transaction_to_list(tx)
block.logs = []
return success, output
block.commit_state()
更新state trie账户状态
def commit_state(self):
"""Commit account caches"""
"""Write the acount caches on the corresponding tries."""
changes = []
if len(self.journal) == 0:
# log_state.trace('delta', changes=[])
return
addresses = sorted(list(self.caches['all'].keys()))
for addr in addresses:
#获取一个账户
acct = self._get_acct(addr)
# storage
for field in ('balance', 'nonce', 'code', 'storage'):
if addr in self.caches[field]:
v = self.caches[field][addr]
changes.append([field, addr, v])
#实例化account对象
setattr(acct, field, v)
#更新account的storage trie
t = SecureTrie(Trie(self.db, acct.storage))
for k, v in self.caches.get(b'storage:' + addr, {}).items():
enckey = utils.zpad(utils.coerce_to_bytes(k), 32)
val = rlp.encode(v)
changes.append(['storage', addr, k, v])
# if self.number > 18280 and False:
# try:
# self.db.logging = True
# except:
# pass
# sys.stderr.write("pre: %r\n" % self.account_to_dict(addr)['storage'])
# sys.stderr.write("pre: %r\n" % self.get_storage(addr).root_hash.encode('hex'))
# sys.stderr.write("changed: %s %s %s\n" % (encode_hex(addr), encode_hex(enckey), encode_hex(val)))
if v:
t.update(enckey, val)
else:
t.delete(enckey)
acct.storage = t.root_hash
#更新state trie中该账户的数据,从新生成root hash
self.state.update(addr, rlp.encode(acct))
log_state.trace('delta', changes=changes)
self.reset_cache()
self.db.put_temporarily(b'validated:' + self.hash, '1')
block.add_transaction_to_list(tx)
根据transaction list 重新构建transaction trie
def add_transaction_to_list(self, tx):
"""Add a transaction to the transaction trie. Note that this does not execute anything, i.e. the state is not updated. """
k = rlp.encode(self.transaction_count)
self.transactions.update(k, rlp.encode(tx))
r = self.mk_transaction_receipt(tx)
self.receipts.update(k, rlp.encode(r))
self.bloom |= r.bloom # int
self.transaction_count += 1
所有交易都添加完后,添加矿工奖励,self.finalize():
def finalize(self):
"""Apply rewards and commit."""
delta = int(BLOCK_REWARD + NEPHEW_REWARD * len(self.uncles))
self.delta_balance(self.coinbase, delta)
self.ether_delta += delta
for uncle in self.uncles:
r = BLOCK_REWARD * \
(UNCLE_DEPTH_PENALTY_FACTOR + uncle.number - self.number) \
/ UNCLE_DEPTH_PENALTY_FACTOR
r = int(r)
self.delta_balance(uncle.coinbase, r)
self.ether_delta += r
self.commit_state()
所有交易添加完后,再来校验state root ,transaction root 是否与block header 中的值一致,不一致的化,区块不合法。Block.init
if parent:
must_equal('prev_hash', self.prevhash, parent.hash)
must_equal('gas_used', original_values['gas_used'], self.gas_used)
must_equal('timestamp', self.timestamp, original_values['timestamp'])
must_equal('difficulty', self.difficulty, original_values['difficulty'])
must_equal('uncles_hash', utils.sha3(rlp.encode(uncles)), original_values['uncles_hash'])
assert header.block is None
must_equal('state_root', self.state.root_hash, header.state_root)
must_equal('tx_list_root', self.transactions.root_hash, header.tx_list_root)
must_equal('receipts_root', self.receipts.root_hash, header.receipts_root)
must_equal('bloom', self.bloom, original_values['bloom'])
# from now on, trie roots refer to block instead of header
header.block = self
# Basic consistency verifications
if not self.check_fields():
raise ValueError("Block is invalid")
if len(self.header.extra_data) > MAX_EXTRADATA_LENGTH:
raise ValueError("Extra data cannot exceed %d bytes" % MAX_EXTRADATA_LENGTH)
if self.header.coinbase == '':
raise ValueError("Coinbase cannot be empty address")
if not self.state.root_hash_valid():
raise ValueError("State Merkle root of block %r not found in "
"database" % self)
if (not self.is_genesis() and self.nonce and not self.header.check_pow()):
raise ValueError("PoW check failed")
if b'validated:' + self.hash not in self.db:
if self.number == 0:
self.db.put(b'validated:' + self.hash, '1')
else:
self.db.put_temporarily(b'validated:' + self.hash, '1')
至此,区块校验完成,之后便是将区块加入本地区块链
现在再回到chain service 最开始的_add_block函数中来
将区块加入本地链self.chain.add_block:
def add_block(self, block, forward_pending_transactions=True):
"returns True if block was added sucessfully"
_log = log.bind(block_hash=block)
# make sure we know the parent
if not block.has_parent() and not block.is_genesis():
_log.debug('missing parent')
return False
#再次检查矿工引用的uncle区块是否合法
if not block.validate_uncles():
_log.debug('invalid uncles')
return False
if not len(block.nonce) == 8:
_log.debug('nonce not set')
return False
#再次检查工作量证明是否合法
elif not block.header.check_pow(nonce=block.nonce) and\
not block.is_genesis():
_log.debug('invalid nonce')
return False
#再检查下父区块是否是区块头中指定的父区块
if block.has_parent():
try:
processblock.verify(block, block.get_parent())
except processblock.VerificationFailed as e:
_log.critical('VERIFICATION FAILED', error=e)
f = os.path.join(utils.data_dir, 'badblock.log')
open(f, 'w').write(to_string(block.hex_serialize()))
return False
if block.number < self.head.number:
_log.debug("older than head", head_hash=self.head)
# Q: Should we have any limitations on adding blocks?
self.index.add_block(block)
#存入本地chain
self._store_block(block)
# set to head if this makes the longest chain w/ most work for that number
if block.chain_difficulty() > self.head.chain_difficulty():
_log.debug('new head')
#更新head candidate,这个其实只针对矿工,只要不挖矿,就不用去管候选头区块也不用去打包交易
self._update_head(block, forward_pending_transactions)
elif block.number > self.head.number:
_log.warn('has higher blk number than head but lower chain_difficulty',
head_hash=self.head, block_difficulty=block.chain_difficulty(),
head_difficulty=self.head.chain_difficulty())
block.transactions.clear_all()
block.receipts.clear_all()
block.state.db.commit_refcount_changes(block.number)
block.state.db.cleanup(block.number)
self.commit() # batch commits all changes that came with the new block
return True
至此,新区块成功写入本地