
Motivation

Bitcoin nodes can connect to the P2P network via multiple network types; naively interpreting each reachable address as a node thus leads to an overestimation of the actual number of nodes. Consider, for example, a single node which connects to the network via IPv4, IPv6, and Tor, thereby contributing three distinct reachable addresses. Without additional information, there is no way of telling whether these three reachable addresses belong to three distinct nodes, each connected via a single network type; to two distinct nodes, one connected via two network types, the other via one; or to a single node, connected to the network via three network types.

Methodology

Fortunately, there is a way to infer the number of network types used by a node to connect to the P2P network. Whenever a node receives node advertisements from its peers, it only stores those addresses whose network types the node supports and discards all other addresses (e.g., an IPv4-only Bitcoin node will only retain advertised nodes with IPv4 addresses; an IPv4/Tor node only IPv4 and Tor addresses). Consequently, whenever a node advertises nodes to its peers, it does so from a pool consisting exclusively of addresses with network types used by the node. Inspecting the network types of node addresses advertised by a peer thus yields insights into the network types used by that peer.

The figures below illustrate this concept: On the left, the purple node requests node advertisements from its three peers by sending a getaddr message to each of them. The response is shown on the right: each peer replies by sending one or more addr messages to advertise nodes.

The fact that the nodes advertised by the leftmost peer comprise exclusively IPv4 and Tor addresses is a strong indication that this peer is connected to the Bitcoin network via both IPv4 and Tor. Likewise, the peer advertising IPv4 and IPv6 nodes is presumably connected via IPv4 and IPv6. Finally, the peer advertising exclusively IPv4 nodes appears to be connected only via IPv4.

[Figure, left: Node requests. Node sending getaddr message to its peers.]
[Figure, right: Node advertisements. Node receiving addr replies from its peers.]
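The inference step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `network_type` and `infer_network_types` are hypothetical, the classification relies on simple suffix checks for Tor and I2P hostnames, and the sample addresses are placeholders rather than real nodes.

```python
import ipaddress


def network_type(host: str) -> str:
    """Classify an advertised host string by network type.

    Assumption: Tor addresses end in ".onion" and I2P addresses in ".i2p";
    everything else is expected to parse as an IPv4 or IPv6 literal.
    """
    if host.endswith(".onion"):
        return "tor"
    if host.endswith(".i2p"):
        return "i2p"
    ip = ipaddress.ip_address(host)  # raises ValueError for anything else
    return "ipv4" if ip.version == 4 else "ipv6"


def infer_network_types(advertised_hosts) -> set:
    """Return the set of network types seen in one peer's advertisements."""
    return {network_type(host) for host in advertised_hosts}


# Placeholder advertisements from a single (hypothetical) peer.
advertised = ["192.0.2.1", "198.51.100.7", "exampleplaceholder.onion"]
peer_types = infer_network_types(advertised)  # contains "ipv4" and "tor"
```

A peer whose advertisements yield IPv4 and Tor addresses, as in this placeholder input, would be treated as connected via both network types.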

Note that in practice, node advertisements sent by peers in response to getaddr messages typically comprise around a thousand nodes, making the approach very robust (after all, the probability of a peer randomly selecting a thousand addresses from its pool of known nodes and missing one or more network types is practically zero).
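To get a feel for the robustness argument, the miss probability can be approximated numerically. The sketch below is a simplification, not a description of how peers actually select addresses: it assumes each of the advertised addresses is drawn independently, and that a given network type makes up a fixed share of the peer's address pool. The function name `miss_probability` and the example shares are hypothetical.

```python
def miss_probability(share: float, n_advertised: int = 1000) -> float:
    """Probability that none of n_advertised independent draws hits a
    network type that accounts for `share` of the peer's address pool."""
    return (1.0 - share) ** n_advertised


# Hypothetical pool shares of a single network type.
for share in (0.01, 0.05, 0.10):
    print(f"share={share:.2f}  P(miss)={miss_probability(share):.3e}")
```

Under these assumptions, the probability of missing a network type shrinks rapidly as its share of the peer's address pool grows.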

The figure below shows an example analysis of data obtained from nodes reachable via IPv4 addresses. Each IPv4 node on the Bitcoin network observed during data collection is represented on the (normalized) x axis. For each node, the y axis shows a breakdown of the advertised nodes into shares contributed by different network types. The data indicate that around 31% of IPv4 nodes advertised only IPv4 addresses. Around 57% of the nodes (between 31% and 88% on the x axis) advertised IPv4 and IPv6 addresses. The remaining 12% of nodes also advertised Tor v3 (and, to a small degree, I2P) addresses. In line with the proposed methodology, this invites the conclusion that 31% of nodes reachable via IPv4 addresses are IPv4-only nodes, connected to the network using only one network type; 57% are IPv4/IPv6 nodes, connected via two network types; and around 12% are connected via three network types: IPv4, IPv6, and Tor.

[Figure: Exemplified results. Network-type composition of nodes advertised via addr messages from IPv4 peers.]

Equipped with this information, it is straightforward to estimate the number of unique Bitcoin nodes: instead of simply adding up reachable addresses, each address is weighted with the inverse of the number of different network types observed in node advertisements received from that address. This way, if a node is using n network types, each of its n addresses will be weighted by 1/n, resulting in an overall contribution of one for each node.
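The weighting scheme can be illustrated with a minimal Python sketch. The function name `estimate_unique_nodes` and the input layout (a mapping from each reachable address to the set of network types observed in its advertisements) are assumptions made for this example. The second part applies the rounded shares from the IPv4 analysis above (31%, 57%, 12%) to a hypothetical sample of 100 IPv4 addresses.

```python
def estimate_unique_nodes(types_by_address: dict) -> float:
    """Sum 1/n over all addresses, where n is the number of network types
    observed in the advertisements received from that address."""
    return sum(1.0 / len(types) for types in types_by_address.values() if types)


# One node reachable via three network types contributes three addresses,
# each weighted 1/3; an IPv4-only node contributes one address weighted 1.
observed = {
    "192.0.2.1": {"ipv4", "ipv6", "tor"},
    "2001:db8::1": {"ipv4", "ipv6", "tor"},
    "exampleplaceholder.onion": {"ipv4", "ipv6", "tor"},
    "198.51.100.7": {"ipv4"},
}
estimate = estimate_unique_nodes(observed)  # about 2 nodes from 4 addresses

# Hypothetical sample of 100 IPv4 addresses split 31 / 57 / 12.
sample = {}
for i in range(31):
    sample[f"a{i}"] = {"ipv4"}
for i in range(57):
    sample[f"b{i}"] = {"ipv4", "ipv6"}
for i in range(12):
    sample[f"c{i}"] = {"ipv4", "ipv6", "tor"}
ipv4_contribution = estimate_unique_nodes(sample)  # 31 + 57/2 + 12/3 = 63.5
```

In this hypothetical sample, the 100 IPv4 addresses contribute roughly 63.5 nodes to the total; the remaining weight of the multi-network nodes is contributed by their IPv6 and Tor addresses.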

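Caveat: While the proposed approach can handle nodes connected to the network using multiple network types, it fails in cases where a node connects to the network using multiple addresses of the same network type. Such addresses cannot be attributed to the same node by this method, so the node is counted more than once.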