python ethereum 代码分析 《3》

python ethereum 代码分析 《3》

python 版本以太坊

pyethereum部分

本文主要介绍以太坊Merkle Patricia Trie以及区块的校验存储

一些关键字解释

sha3: 以太坊所用的一种hash方法

rlp :Recursive Length Prefix,递归长度前缀编码,以太坊用于编码数据。用于编码任意的嵌套结构的二进制数据(在python中就是列表,其实rlp在Merkle Patricia Trie的实现中主要就是编码列表用的),它是以太坊中数据序列化/反序列化的主要方法,区块、交易等数据结构在持久化时会先经过RLP编码后再存储到数据库中。

BlockHeader

先看一下以太坊block区块头的结构

class BlockHeader(rlp.Serializable):

    """A block header. If the block with this header exists as an instance of :class:`Block`, the connection can be made explicit by setting :attr:`BlockHeader.block`. Then, :attr:`BlockHeader.state_root`, :attr:`BlockHeader.tx_list_root` and :attr:`BlockHeader.receipts_root` always refer to the up-to-date value in the block instance. :ivar block: an instance of :class:`Block` or `None` :ivar prevhash: the 32 byte hash of the previous block :ivar uncles_hash: the 32 byte hash of the RLP encoded list of uncle headers :ivar coinbase: the 20 byte coinbase address :ivar state_root: the root of the block's state trie :ivar tx_list_root: the root of the block's transaction trie :ivar receipts_root: the root of the block's receipts trie :ivar bloom: TODO :ivar difficulty: the block's difficulty :ivar number: the number of ancestors of this block (0 for the genesis block) :ivar gas_limit: the block's gas limit :ivar gas_used: the total amount of gas used by all transactions in this block :ivar timestamp: a UNIX timestamp :ivar extra_data: up to 1024 bytes of additional data :ivar nonce: a 32 byte nonce constituting a proof-of-work, or the empty string as a placeholder """

    fields = [
        ('prevhash', hash32),#父区块的hash
        ('uncles_hash', hash32),#矿工引用的uncle区块列表的hash
        ('coinbase', address),#挖出此区块矿工的地址
        ('state_root', trie_root),#state trie树树根
        ('tx_list_root', trie_root),#交易trie树树根
        ('receipts_root', trie_root),#收据trie树树根
        ('bloom', int256),
        ('difficulty', big_endian_int),#当前区块难度
        ('number', big_endian_int),#区块的高度
        ('gas_limit', big_endian_int),#区块gas上限
        ('gas_used', big_endian_int),#区块使用的gas
        ('timestamp', big_endian_int),#时间戳
        ('extra_data', binary),
        ('mixhash', binary),
        ('nonce', Binary(8, allow_empty=True))#工作量证明随机数,通过校验该值来判断矿工的工作量证明是否合法。
    ]

Merkle Patricia Trie

首先要提一点,以太坊用Merkle Patricia Trie 这种数据结构来做区块中交易的快速校验和全网状态state的快速验证。Merkle Patricia Trie是一种树,区块头中的state_root,tx_list_root,receipts_root 分别是State Trie ,Transactions Trie,Receipts Trie的树根,这三个树都是Merkle Patricia Trie。

普通trie树和merkle树看这篇博客

先说下trie树的一个扩展Radix Trie树,这个树也是只存key / value的一棵树
这棵树每个节点看起来是这样的[i0, i1 … in, value] ,现在如果我们的key是用16进制(hex)编码,那么n=15,每个节点列表长度为17,[i0,i1,…i15,value],列表中的0~15的某一位就表示key中的一个16进制字符。
Radix Trie树有个问题,如果我要存储一个长度为20的hex编码的key,那么这个树的高度就有20,每一层就只存了key中一个字符,这样效率就不高,以太坊就改进了一下,增加了一些节点类型来解决这个事情。
Merkle Patricia Trie树也是用来存key / value的一棵树
Merkle Patricia Trie的四种节点类型
1.NULL (represented as the empty string) 空节点
2.branch A 17-item node [ v0 … v15, vt ] 分支节点,0~15每一位表示16进制编码的一个字符
3.leaf A 2-item node [ encodedPath, value ] 这个value是原始key在叶子节点上对应的value值
4.extension A 2-item node [ encodedPath, key ] 这个key是子节点的hash

实际在Merkle Patricia Trie的实现中,还用了HP(hex prefix)编码方式去编码的key,其实就是十六进制编码再加前缀来表示节点类型。同时key是以bytes存储的,而一个hex char只有4位,一个byte有八位,所以不够八位的时候prefix 后面再加0000补全。
HP编码前缀的规则

hex char bits node type partial path length
0 0000 extension even
1 0001 extension odd
2 0010 terminating (leaf) even
3 0011 terminating (leaf) odd

先看一个Merkle Patricia Trie存储的列子
以下是要存储到tire树中的四个键值对
<64 6f> : ‘verb’
<64 6f 67> : ‘puppy’
<64 6f 67 65> : ‘coin’
<68 6f 72 73 65> : ‘stallion’
添加之后,trie树结构:
rootHash: [ <16>, hashA ] // extension node
hashA: [ <>, <>, <>, <>, hashB, <>, <>, <>, hashC, <>, <>, <>, <>, <>, <>, <>, <> ] // branch node
hashC: [ <20 6f 72 73 65>, ‘stallion’ ] //leaf node
hashB: [ <00 6f>, hashD ] // extension node
hashD: [ <>, <>, <>, <>, <>, <>, hashE, <>, <>, <>, <>, <>, <>, <>, <>, <>, verb ] // branch node
hashE: [ <17>, hashF ] // extension node
hashF: [ <>, <>, <>, <>, <>, <>, hashG, <>, <>, <>, <>, <>, <>, <>, <>, <>, puppy ] // branch node
hashG: [ <35>, ‘coin’ ] //leaf node

注意到root node 有公共前缀hex char 6 ,6前面的1表示这是个extension node,并且原来长度是奇数,补了个前缀1后正好凑成一个byte。
存储在root node上的key,value 便是 <16>, hashA,而hashA又是子节点在leveldb中的地址;
其中,hashA = sha3(rlp_encode([ <>, <>, <>, <>, hashB, <>, <>, <>, hashC, <>, <>, <>, <>, <>, <>, <>, <> ]))
这个hashA的计算方式正是体现了Merkle Patricia Trie的‘Merkle’部分,每个节点的hash地址正是节点数据rlp编码过后的hash。如果某个矿工篡改了叶子节点的数据,比如State Trie某个叶子节点一个账户的余额,并生成了State Tire的root hash,其他矿工通过transaction list算出来的State Tire的root hash一定跟他的不一致。
hashA 指向一个分支节点,分支节点中的第4位有hashB,表示有公共前缀‘4’;hashB 的公共前缀是<00 6f>,及公共前缀为‘6f’,一直到hashD,hashD的value为’verb’ ,经过的整个路径6-4-6f 即为原来的key。
再到hashG叶子节点 ,经过的整个路劲6-4-6f-6-7-6-5 即为value ‘coin’ 的key

在State Trie中,路径path(key)都是sha3(ethereumAddress) 账户地址的hash,value 都是该账户的信息:[nonce(当前交易次数),balance(账户余额),storageRoot,codeHash]

Transactions Trie和Receipts Trie 与 State Trie的区别是,每个区块都有独立的Transactions Trie和Receipts Trie ,而State Trie是全局的。
在Transactions Trie中,路径path(key)都是rlp(transactionIndex),transactionIndex是交易的编号,value都是rlp(transactionData)

Merkle Patricia Trie参考资料
https://github.com/ethereum/wiki/wiki/Patricia-Tree
https://easythereentropy.wordpress.com/2014/06/04/understanding-the-ethereum-trie/

区块的校验与存储

从队里中取出一个block数据并尝试添加到本地chain
尝试实例化block对象,传入的参数有header区块头, transaction_list交易列表, uncles 叔叔区块列表
区块一致性校验:

        if parent:
            if hasattr(parent, 'db') and self.db != parent.db and self.db.db != parent.db:
                raise ValueError("Parent lives in different database")
            if self.prevhash != parent.header.hash:
                raise ValueError("Block's prevhash and parent's hash do not match")
            if self.number != parent.header.number + 1:
                raise ValueError("Block's number is not the successor of its parent number")
            if not check_gaslimit(parent, self.gas_limit):
                raise ValueError("Block's gaslimit is inconsistent with its parent's gaslimit")
            if self.difficulty != calc_difficulty(parent, self.timestamp):
                raise ValueError("Block's difficulty is inconsistent with its parent's difficulty")
            if self.gas_used > self.gas_limit:
                raise ValueError("Gas used exceeds gas limit")
            if self.timestamp <= parent.header.timestamp:
                raise ValueError("Timestamp equal to or before parent")

交易及状态校验:

        #初始化一个空的transaction trie,通过添加transaction list中的交易来重新构建transaction trie,这个transaction trie的root hash必须跟矿工给出的transaction trie root hash 一致
        self.transactions = Trie(db, trie.BLANK_ROOT)
        self.receipts = Trie(db, trie.BLANK_ROOT)
        # replay transactions if state is unknown
        state_unknown = (header.prevhash != GENESIS_PREVHASH and
                         header.number != 0 and
                         header.state_root != trie.BLANK_ROOT and
                         (len(header.state_root) != 32 or
                          b'validated:' + self.hash not in db) and
                         not making)
        #如果没有添加过该区块
        if state_unknown:
            assert transaction_list is not None
            if not parent:
                parent = self.get_parent_header()
            #取出该区块父区块的state trie
            self.state = SecureTrie(Trie(db, parent.state_root))
            self.transaction_count = 0
            self.gas_used = 0
            # replay
            for tx in transaction_list:
                #执行每一笔交易,同时更新transaction trie和state trie
                success, output = processblock.apply_transaction(self, tx)
            self.finalize()
        else:
            # trust the state root in the header
            self.state = SecureTrie(Trie(self.db, header._state_root))
            self.transaction_count = 0
            if transaction_list:
                for tx in transaction_list:
                    self.add_transaction_to_list(tx)
            if self.transactions.root_hash != header.tx_list_root:
                raise ValueError("Transaction list root hash does not match")
            # receipts trie populated by add_transaction_to_list is incorrect
            # (it doesn't know intermediate states), so reset it
            self.receipts = Trie(self.db, header.receipts_root)

processblock.apply_transaction

def apply_transaction(block, tx):
    #交易合法性校验
    validate_transaction(block, tx)

    log_tx.debug('TX NEW', tx_dict=tx.log_dict())
    # start transacting #################
    #sender交易数nonce加1
    block.increment_nonce(tx.sender)
    # print block.get_nonce(tx.sender), '@@@'

    # buy startgas
    assert block.get_balance(tx.sender) >= tx.startgas * tx.gasprice
    #sender账户扣除startgas
    block.delta_balance(tx.sender, -tx.startgas * tx.gasprice)
    message_gas = tx.startgas - intrinsic_gas_used(tx)
    #合约代码
    message_data = vm.CallData([safe_ord(x) for x in tx.data], 0, len(tx.data))
    #交易
    message = vm.Message(tx.sender, tx.to, tx.value, message_gas, message_data, code_address=tx.to)

    # MESSAGE
    ext = VMExt(block, tx)
    if tx.to and tx.to != CREATE_CONTRACT_ADDRESS:
        result, gas_remained, data = apply_msg(ext, message)
        log_tx.debug('_res_', result=result, gas_remained=gas_remained, data=data)
    else:  # CREATE
        result, gas_remained, data = create_contract(ext, message)
        assert utils.is_numeric(gas_remained)
        log_tx.debug('_create_', result=result, gas_remained=gas_remained, data=data)

    assert gas_remained >= 0

    log_tx.debug("TX APPLIED", result=result, gas_remained=gas_remained,
                 data=data)

    if not result:  # 0 = OOG failure in both cases
        log_tx.debug('TX FAILED', reason='out of gas',
                     startgas=tx.startgas, gas_remained=gas_remained)
        block.gas_used += tx.startgas
        #给矿工手续费
        block.delta_balance(block.coinbase, tx.gasprice * tx.startgas)
        output = b''
        success = 0
    else:
        log_tx.debug('TX SUCCESS', data=data)
        gas_used = tx.startgas - gas_remained
        block.refunds += len(block.suicides) * opcodes.GSUICIDEREFUND
        if block.refunds > 0:
            log_tx.debug('Refunding', gas_refunded=min(block.refunds, gas_used // 2))
            gas_remained += min(block.refunds, gas_used // 2)
            gas_used -= min(block.refunds, gas_used // 2)
            block.refunds = 0
        # sell remaining gas
        block.delta_balance(tx.sender, tx.gasprice * gas_remained)
        block.delta_balance(block.coinbase, tx.gasprice * gas_used)
        block.gas_used += gas_used
        if tx.to:
            output = b''.join(map(ascii_chr, data))
        else:
            output = data
        success = 1
    block.commit_state()
    suicides = block.suicides
    block.suicides = []
    for s in suicides:
        block.ether_delta -= block.get_balance(s)
        block.del_account(s)
    block.add_transaction_to_list(tx)
    block.logs = []
    return success, output

block.commit_state()
更新state trie账户状态

    def commit_state(self):
        """Commit account caches"""
        """Write the acount caches on the corresponding tries."""
        changes = []
        if len(self.journal) == 0:
            # log_state.trace('delta', changes=[])
            return
        addresses = sorted(list(self.caches['all'].keys()))
        for addr in addresses:
            #获取一个账户
            acct = self._get_acct(addr)

            # storage
            for field in ('balance', 'nonce', 'code', 'storage'):
                if addr in self.caches[field]:
                    v = self.caches[field][addr]
                    changes.append([field, addr, v])
                    #实例化account对象
                    setattr(acct, field, v)
            #更新account的storage trie
            t = SecureTrie(Trie(self.db, acct.storage))
            for k, v in self.caches.get(b'storage:' + addr, {}).items():
                enckey = utils.zpad(utils.coerce_to_bytes(k), 32)
                val = rlp.encode(v)
                changes.append(['storage', addr, k, v])
                # if self.number > 18280 and False:
                # try:
                # self.db.logging = True
                # except:
                # pass
                # sys.stderr.write("pre: %r\n" % self.account_to_dict(addr)['storage'])
                # sys.stderr.write("pre: %r\n" % self.get_storage(addr).root_hash.encode('hex'))
                # sys.stderr.write("changed: %s %s %s\n" % (encode_hex(addr), encode_hex(enckey), encode_hex(val)))
                if v:
                    t.update(enckey, val)
                else:
                    t.delete(enckey)
            acct.storage = t.root_hash
            #更新state trie中该账户的数据,从新生成root hash
            self.state.update(addr, rlp.encode(acct))
        log_state.trace('delta', changes=changes)
        self.reset_cache()
        self.db.put_temporarily(b'validated:' + self.hash, '1')

block.add_transaction_to_list(tx)
根据transaction list 重新构建transaction trie

    def add_transaction_to_list(self, tx):
        """Add a transaction to the transaction trie. Note that this does not execute anything, i.e. the state is not updated. """
        k = rlp.encode(self.transaction_count)
        self.transactions.update(k, rlp.encode(tx))
        r = self.mk_transaction_receipt(tx)
        self.receipts.update(k, rlp.encode(r))
        self.bloom |= r.bloom  # int
        self.transaction_count += 1

所有交易都添加完后,添加矿工奖励,self.finalize()

    def finalize(self):
        """Apply rewards and commit."""
        delta = int(BLOCK_REWARD + NEPHEW_REWARD * len(self.uncles))
        self.delta_balance(self.coinbase, delta)
        self.ether_delta += delta

        for uncle in self.uncles:
            r = BLOCK_REWARD * \
                (UNCLE_DEPTH_PENALTY_FACTOR + uncle.number - self.number) \
                / UNCLE_DEPTH_PENALTY_FACTOR
            r = int(r)
            self.delta_balance(uncle.coinbase, r)
            self.ether_delta += r
        self.commit_state()

所有交易添加完后,再来校验state root ,transaction root 是否与block header 中的值一致,不一致的化,区块不合法。Block.init

        if parent:
            must_equal('prev_hash', self.prevhash, parent.hash)
        must_equal('gas_used', original_values['gas_used'], self.gas_used)
        must_equal('timestamp', self.timestamp, original_values['timestamp'])
        must_equal('difficulty', self.difficulty, original_values['difficulty'])
        must_equal('uncles_hash', utils.sha3(rlp.encode(uncles)), original_values['uncles_hash'])
        assert header.block is None
        must_equal('state_root', self.state.root_hash, header.state_root)
        must_equal('tx_list_root', self.transactions.root_hash, header.tx_list_root)
        must_equal('receipts_root', self.receipts.root_hash, header.receipts_root)
        must_equal('bloom', self.bloom, original_values['bloom'])

        # from now on, trie roots refer to block instead of header
        header.block = self

        # Basic consistency verifications
        if not self.check_fields():
            raise ValueError("Block is invalid")
        if len(self.header.extra_data) > MAX_EXTRADATA_LENGTH:
            raise ValueError("Extra data cannot exceed %d bytes" % MAX_EXTRADATA_LENGTH)
        if self.header.coinbase == '':
            raise ValueError("Coinbase cannot be empty address")
        if not self.state.root_hash_valid():
            raise ValueError("State Merkle root of block %r not found in "
                             "database" % self)
        if (not self.is_genesis() and self.nonce and not self.header.check_pow()):
            raise ValueError("PoW check failed")
        if b'validated:' + self.hash not in self.db:
            if self.number == 0:
                self.db.put(b'validated:' + self.hash, '1')
            else:
                self.db.put_temporarily(b'validated:' + self.hash, '1')

至此,区块校验完成,之后便是将区块加入本地区块链

现在再回到chain service 最开始的_add_block函数中来
将区块加入本地链self.chain.add_block


    def add_block(self, block, forward_pending_transactions=True):
        "returns True if block was added sucessfully"
        _log = log.bind(block_hash=block)
        # make sure we know the parent
        if not block.has_parent() and not block.is_genesis():
            _log.debug('missing parent')
            return False
        #再次检查矿工引用的uncle区块是否合法
        if not block.validate_uncles():
            _log.debug('invalid uncles')
            return False

        if not len(block.nonce) == 8:
            _log.debug('nonce not set')
            return False
        #再次检查工作量证明是否合法
        elif not block.header.check_pow(nonce=block.nonce) and\
                not block.is_genesis():
            _log.debug('invalid nonce')
            return False
        #再检查下父区块是否是区块头中指定的父区块
        if block.has_parent():
            try:
                processblock.verify(block, block.get_parent())
            except processblock.VerificationFailed as e:
                _log.critical('VERIFICATION FAILED', error=e)
                f = os.path.join(utils.data_dir, 'badblock.log')
                open(f, 'w').write(to_string(block.hex_serialize()))
                return False

        if block.number < self.head.number:
            _log.debug("older than head", head_hash=self.head)
            # Q: Should we have any limitations on adding blocks?

        self.index.add_block(block)
        #存入本地chain
        self._store_block(block)

        # set to head if this makes the longest chain w/ most work for that number
        if block.chain_difficulty() > self.head.chain_difficulty():
            _log.debug('new head')
            #更新head candidate,这个其实只针对矿工,只要不挖矿,就不用去管候选头区块也不用去打包交易
            self._update_head(block, forward_pending_transactions)
        elif block.number > self.head.number:
            _log.warn('has higher blk number than head but lower chain_difficulty',
                      head_hash=self.head, block_difficulty=block.chain_difficulty(),
                      head_difficulty=self.head.chain_difficulty())
        block.transactions.clear_all()
        block.receipts.clear_all()
        block.state.db.commit_refcount_changes(block.number)
        block.state.db.cleanup(block.number)
        self.commit()  # batch commits all changes that came with the new block
        return True

至此,新区块成功写入本地

阅读更多

更多精彩内容