Nebula #

Nebula is a libp2p DHT crawler and monitor that tracks the liveness of peers. The crawler connects to DHT bootstrap peers and then recursively follows all entries in their k-buckets until all peers have been visited. The crawler supports the IPFS, Filecoin, Ethereum, Celestia, Avail, Polkadot, Kusama, Rococo, and Westend networks, among others.

GitHub

Components #

Nebula is split into two components: a crawler and a monitor. The crawler is responsible for discovering peers in the DHT, and the monitor is responsible for tracking their uptime.

How does it work? #

Crawler #

The Kademlia DHT network is a distributed system in which each peer maintains a routing table containing other peers at specific XOR distances from itself. These peers are grouped into so-called k-buckets based on the number of leading zeroes in the XOR of the other peer's PeerID and the peer's own PeerID. This number is also called the shared prefix bits or common prefix length. For example, bucket 0 includes peers without any shared prefix bits, and bucket 3 contains peers whose PeerIDs match the peer's own PeerID in the first three bits and differ in the fourth. Each bucket contains a maximum of 20 entries.
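The bucket assignment described above can be sketched in a few lines of Python. This is a minimal illustration, not Nebula's code: it assumes keys are byte strings of equal length, leaves out how such keys are derived from PeerIDs, and the function names are hypothetical.

```python
def common_prefix_length(a: bytes, b: bytes) -> int:
    """Number of leading zero bits in a XOR b (the shared prefix bits)."""
    if len(a) != len(b):
        raise ValueError("keys must have the same length")
    for i, (x, y) in enumerate(zip(a, b)):
        diff = x ^ y
        if diff:
            # 8 - bit_length() is the number of leading zero bits in this byte
            return i * 8 + (8 - diff.bit_length())
    return len(a) * 8  # identical keys share every bit


def bucket_index(own_key: bytes, other_key: bytes) -> int:
    """Bucket that other_key falls into, seen from own_key (hypothetical helper)."""
    return common_prefix_length(own_key, other_key)
```

For example, the one-byte keys `0xF0` and `0xE0` differ first in their fourth bit, so `bucket_index(b"\xf0", b"\xe0")` is 3.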

To begin the crawl, Nebula connects to a configurable set of bootstrap peers and successively follows every peer in their routing tables until it finds no new peers. Because Nebula knows the other peers' PeerIDs, it also knows about the k-buckets they maintain. To gather information from the routing tables of other peers, Nebula generates random keys whose XOR with the remote peer's PeerID has a given number of leading zeroes, so that one key falls into each of the buckets that the remote peer maintains. Nebula sends these keys to the remote peer and asks whether it knows any peers that are closer to those keys. In response, the remote peer returns the closest peers it knows of, which are all the peers in the respective bucket. Nebula performs this request in parallel for buckets 0 to 15 [source].
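To illustrate the key generation step, here is a hedged Python sketch that builds a random key sharing exactly a given number of leading bits with a target key. It works directly on raw keys and assumes a hypothetical helper name, `random_key_with_cpl`; it does not reproduce how Nebula maps such keys onto actual requests.

```python
import secrets


def random_key_with_cpl(target: bytes, cpl: int) -> bytes:
    """Random key that shares exactly `cpl` leading bits with `target`."""
    nbits = len(target) * 8
    if not 0 <= cpl < nbits:
        raise ValueError("cpl must be in [0, number of key bits)")
    t = int.from_bytes(target, "big")
    r = secrets.randbits(nbits)
    keep_mask = ((1 << cpl) - 1) << (nbits - cpl)  # top `cpl` bits of target
    flip_bit = 1 << (nbits - cpl - 1)              # first bit that must differ
    rand_mask = flip_bit - 1                       # remaining bits are random
    key = (t & keep_mask) | ((t ^ flip_bit) & flip_bit) | (r & rand_mask)
    return key.to_bytes(len(target), "big")


# One key per bucket 0..15, mirroring the range mentioned in the text.
target = secrets.token_bytes(32)  # stand-in for a remote peer's key
keys = [random_key_with_cpl(target, cpl) for cpl in range(16)]
```

Flipping the bit right after the shared prefix guarantees that the key lands in the intended bucket rather than in a deeper one.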

When a new peer is discovered, the crawler records the start of a session of availability and extends the session with every subsequent successful connection attempt. Conversely, a failed connection attempt terminates the session, and a later successful attempt starts a new one. Depending on the error that the connection attempt returns, Nebula either retries the connection or immediately marks the peer as offline. For example, if the remote peer responds with an error indicating that its resource limit is exceeded, Nebula retries two more times, after five and ten seconds.
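The session and retry behavior described above could look roughly like the following Python sketch. Everything here is an assumption made for illustration: the `ResourceLimitExceeded` error type, the `Session` fields, and both function names are hypothetical, and only the retry delays of five and ten seconds come from the text.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


class ResourceLimitExceeded(Exception):
    """Hypothetical stand-in for a 'resource limit exceeded' response."""


RETRY_DELAYS = (5.0, 10.0)  # seconds between attempts, as stated in the text


def connect_with_retry(connect: Callable[[], None],
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True if the peer was reached, False if it should be marked offline."""
    delays = iter(RETRY_DELAYS)
    while True:
        try:
            connect()
            return True
        except ResourceLimitExceeded:
            delay = next(delays, None)
            if delay is None:
                return False  # both retries used up
            sleep(delay)
        except ConnectionError:
            return False  # treated as non-retryable in this sketch


@dataclass
class Session:
    first_successful: float
    last_successful: float
    closed_at: Optional[float] = None


def record_visit(session: Optional[Session], reachable: bool,
                 now: float) -> Optional[Session]:
    """Open, extend, or close a session depending on the visit outcome."""
    if reachable:
        if session is None or session.closed_at is not None:
            return Session(now, now)       # start a new session
        session.last_successful = now      # extend the open session
        return session
    if session is not None and session.closed_at is None:
        session.closed_at = now            # a failed visit ends the session
    return session
```

Passing `sleep` as a parameter keeps the retry logic testable without actually waiting.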

Monitor #

The monitoring process periodically queries the database for peers that Nebula considers to be online and tries to connect to them in order to measure their uptime precisely. If the peer is dialable, Nebula updates the session with the new uptime; if it is not, Nebula “closes” the session and considers the peer offline. This allows for precise peer churn measurements.

The longer a peer has been seen online, the less frequently Nebula attempts to connect to it. This is based on the assumption that a peer that has remained online for a while is more likely to stay online. Conversely, a peer that Nebula has newly discovered in the network has a high probability of going offline (churning) relatively quickly.

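Nebula calculates when the next probe is due with the following formula:

NOW + floor(max_interval, ceil(min_interval, 1.2 * (NOW - previous_successful_dial)))

The scaled duration is clamped so that the resulting interval never falls below min_interval or exceeds max_interval. The maximum interval is set to 15 minutes and the minimum interval is set to 1 minute.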
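As a worked example, the formula can be written out in Python. This sketch assumes that the formula clamps the scaled duration to the range between the minimum and maximum interval; the constant and function names are ours, not Nebula's.

```python
from datetime import datetime, timedelta

MIN_INTERVAL = timedelta(minutes=1)
MAX_INTERVAL = timedelta(minutes=15)
FACTOR = 1.2


def next_probe_due(now: datetime, previous_successful_dial: datetime) -> datetime:
    """Time of the next probe, with the interval clamped to [1m, 15m]."""
    scaled = FACTOR * (now - previous_successful_dial)
    interval = min(MAX_INTERVAL, max(MIN_INTERVAL, scaled))
    return now + interval


now = datetime(2024, 1, 1, 12, 0, 0)
soon = next_probe_due(now, now - timedelta(seconds=10))  # 12s scaled, raised to 1m
mid = next_probe_due(now, now - timedelta(minutes=5))    # 6m, inside the range
late = next_probe_due(now, now - timedelta(hours=1))     # 72m scaled, capped at 15m
```

With these inputs the next probe is due 1, 6, and 15 minutes after `now`, respectively.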

The monitoring process doesn’t establish a proper libp2p connection (which involves the protocol negotiation) but only dials the peer on a transport level by calling DialPeer. It also liberally retries dialing peers if errors occur [source].

What data does Nebula gather? #

The crawler component establishes a proper libp2p connection to the remote peer. This means that Nebula and the remote peer exchange their lists of supported protocols and user agent information. Furthermore, in order to connect to the remote peer, Nebula must know its network addresses. Nebula also measures the latency to dial, connect, and crawl. The dial latency covers only the establishment of a connection at the transport level, whereas the connect latency also includes the protocol handshake. The crawl latency measures how long it took to extract the information from the remote peer's k-buckets.

To summarize, Nebula gathers the following information about all peers it was able to connect to:

peer ID
user agent
supported protocols
all advertised Multiaddresses
connection latencies (dial, connect, crawl - see above)
potential connection errors
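One way to picture the information listed above is as a single record per visited peer. The following Python dataclass is only a sketch: the class name, field names, and example values are hypothetical and do not reflect Nebula's actual database schema.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from typing import List, Optional


@dataclass
class CrawlResult:
    peer_id: str
    agent_version: Optional[str] = None                   # user agent
    protocols: List[str] = field(default_factory=list)    # supported protocols
    multiaddrs: List[str] = field(default_factory=list)   # all advertised addresses
    dial_latency: Optional[timedelta] = None     # transport-level connection only
    connect_latency: Optional[timedelta] = None  # also includes the handshake
    crawl_latency: Optional[timedelta] = None    # time to read the k-buckets
    error: Optional[str] = None                  # set if the connection failed


# Placeholder values for illustration only.
result = CrawlResult(
    peer_id="peer-A",
    agent_version="example-agent/0.0.0",
    protocols=["/ipfs/kad/1.0.0"],
    multiaddrs=["/ip4/192.0.2.1/tcp/4001"],
    dial_latency=timedelta(milliseconds=80),
    connect_latency=timedelta(milliseconds=210),
    crawl_latency=timedelta(seconds=2),
)
```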

The monitoring component periodically checks if peers are still online. This allows us to additionally measure sessions of uptime for each peer.

Because we crawl and probe the network periodically, and because Multiaddresses contain IP addresses, we can also answer the following questions:

When do peers update their IPFS node?
How long has a peer been online?
In which country/city is a peer located? (powered by ipregistry)
Does the peer run in a data center? (powered by ipregistry)

On top of the above, Nebula also tracks neighbor information. We consider the peers in a peer's k-buckets to be neighbors of that peer. This information forms a graph in which each node is a peer and each edge corresponds to a k-bucket entry.
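The neighbor graph can be sketched as an adjacency structure. The Python below is a minimal illustration that assumes the crawl results are available as a mapping from each peer to the peers found in its k-buckets; the function name and the choice of directed edges (from a peer to its k-bucket entries) are our assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_neighbor_graph(routing_tables: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each peer to the set of peers found in its k-buckets."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for peer, neighbors in routing_tables.items():
        for neighbor in neighbors:
            graph[peer].add(neighbor)  # edge: peer -> k-bucket entry
            graph[neighbor]            # touching the key makes the neighbor a node
        graph[peer]                    # peers with empty buckets are nodes too
    return dict(graph)


tables = {"A": ["B", "C"], "B": ["A"]}
graph = build_neighbor_graph(tables)
```

Here `graph` contains three nodes: `A` points to `B` and `C`, `B` points to `A`, and `C` has no outgoing edges because it was never crawled.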

Deployment #

We are running the crawler to measure the following networks:

Network	Crawl frequency	Crawl duration
IPFS	every 2 hours	5 minutes
Filecoin	every 2 hours	1 minute
Ethereum CL	every 2 hours	20 minutes
Ethereum EL	every 2 hours	23 minutes
Polkadot	every 2 hours	3 minutes
Celestia	every 2 hours	1 minute
Avail	every 2 hours	2 minutes

Contributing #

Feel free to head over to the GitHub repository and dive in! Open an issue or submit PRs.
