— Peer-reviewed work

Publications

ProbeLab team members regularly publish in world-class academic venues. Explore our articles below.

2024
EthResearch Technical Report ·

Bandwidth Availability in Ethereum: Regional Differences and Network Impacts

Mikel Cortes · Yiannis Psaras

This comprehensive bandwidth availability study by Mikel Cortes and Yiannis Psaras from ProbeLab measures the capacity of Ethereum nodes to handle increased blob propagation by analyzing data from 9,179 nodes across four geographical regions (California, Virginia, Frankfurt, and Sydney) over six days in November 2024. Using a custom tool called "net-probe" that saturates nodes' uplinks through block-by-range RPC calls, the researchers found significant geographical disparities in available bandwidth: 60% of nodes in Europe and North America exceed 20 Mbps upload capacity, while only 20% of Australian nodes reach this threshold, because 65% of Ethereum's nodes are located in the US and Europe. The study demonstrates that bandwidth availability drops 9-13% during the first four seconds of each slot, when blocks and blobs are broadcast, yet nodes still maintain 18-23 Mbps of mean available capacity during these critical periods. The analysis of blob distribution during the measurement period shows that 35% of slots contained no blobs and 42% contained 5-6 blobs, while cloud-hosted nodes had approximately 5 Mbps more available bandwidth than non-cloud deployments. The authors conclude that the proposed increase in blob target and maximum values from 3/6 to 6/9 is a reasonable adjustment given current network capacity, though they emphasize the importance of implementing bandwidth-saving improvements such as PeerDAS and modified Gossipsub protocols for future scalability.

Read more
BIOTC '24 Conference Paper ·

Scalability Limitations of Kademlia DHTs when Enabling Data Availability Sampling in Ethereum

Mikel Cortes-Goicoechea · Csaba Kiraly · Dmitriy Ryajov · Jose Luis Muñoz-Tapia · Leonardo Bautista-Gomez

Scalability in blockchain remains a significant challenge, especially when prioritizing decentralization and security. The Ethereum community has proposed comprehensive data-sharding techniques to overcome storage, computational, and network processing limitations. In this context, the propagation and availability of large blocks become the subject of research to achieve scalable data-sharding. This paper provides insights after exploring the usage of a Kademlia-based Distributed Hash Table (DHT) to enable Data Availability Sampling (DAS) in Ethereum. It presents a DAS-DHT simulator to study this problem and validates the results of the simulator with experiments in a real DHT network, the InterPlanetary File System (IPFS). Our results help us understand which parts of DAS can be achieved based on existing Kademlia DHT solutions and which cannot. We discuss the limitations of DHT solutions and consider alternatives.

Read more
BIOTC '24 Conference Paper ·

An Analytical Study of Large Blocks on Ethereum

Patrick Ocheja · Mikel Cortes-Goicoechea · Tarun Mohandas-Daryanani · Brendan Flanagan · Hiroaki Ogata · Jose Luis Munoz · Leonardo Bautista-Gomez

Ethereum's application layer data, especially the analysis of large blocks, holds critical insights into its network dynamics and future developments. In this study, we systematically analyze various components of this data, including blocks, transactions, gas prices, and address interactions. This study places a significant emphasis on systematically analyzing Ethereum blocks of size 500KB and above, a crucial element considering the impending Ethereum Improvement Proposals (EIPs) like EIP4844. EIP4844 proposes the introduction of a new type of data, known as "blob data", specifically designed to be used in rollups, which are off-chain aggregations of transactions within a single on-chain transaction. Our analysis includes blocks, transactions, gas prices, and address interactions, with a special focus on the characteristics and implications of large block sizes. We observe notable trends such as block fullness, transaction count fluctuations, and gas price variations. Importantly, our findings reveal that a substantial number of Ethereum blocks exceed the expected size of 1.875MB, reaching up to 2MB. This is particularly relevant in the context of EIP4844, as many of these large blocks might be related to rollup data, which is a cornerstone of Ethereum's scalability strategy. We also discovered a moderate negative correlation between block sizes and the number of transactions contained in them. Similarly, average daily gas prices tend to decrease with an increase in block sizes. These insights are invaluable for the blockchain community, offering guidance to developers and users for optimizing transaction strategies and managing costs in anticipation of future network changes. Our study not only contributes to a deeper understanding of Ethereum's current state but also provides a foundational analysis for assessing the impact of rollups and other scalability solutions on Ethereum's evolving ecosystem.

Read more
University Other ·

A Deep Dive into Ethereum's PoS Transition: Protocol Design Choices and their Empirical Unexpected Limitations

Mikel Cortes-Goicoechea

The advent of the internet, marked by pivotal developments such as the launch of Arpanet and the standardization of HTTP, has irrevocably changed the fabric of modern society. Centralized platforms like Microsoft, Google, Apple, and Amazon have dominated this digital landscape, offering many services ranging from cloud computing to online storage. However, the centralized nature of these services has raised significant concerns regarding user privacy, data integrity, and the potential for censorship. In response to these issues, the open-source community has explored peer-to-peer alternatives, notably in the realm of distributed file systems, ledgers, and blockchain technology. Blockchains, popularized by the emergence of Bitcoin, promote a democratized service model that challenges the centralized status quo. Yet, they are not without their own challenges, including decentralization, security, privacy, and performance. This thesis delves into the nuances of blockchain technology, focusing on Ethereum's transition from Proof of Work (PoW) to Proof of Stake (PoS) and its implications on network hardware requirements, topology, and overall performance. The development of Ethereum serves as a small-scale reflection of the broader ambitions and challenges in transitioning to Decentralized Finance (DeFi) platforms. Despite significant theoretical advancements in consensus mechanisms and scalability solutions, real-world implementations and experimental validations remain sparse. This work aims to bridge this gap by comprehensively analysing Ethereum's PoS transition by examining the interlaced relationships between software logic, hardware configurations, and network dynamics. Through novel measurement models and tools, this thesis contributes to a deeper understanding of how Ethereum's architectural changes impact its ecosystem and its participants' behaviours. 
Lastly, the research presented in this thesis illustrates the technical and operational challenges facing Ethereum and similar blockchain platforms and proposes a series of contributions that advance the field. This work empirically analyses future enhancements in blockchain technology, from the implications of the network and its topology, to the viability of decentralized validation processes, to the potential of scaling solutions like Data Availability Sampling. The open-source tools and methodologies developed within the thesis scope represent a commitment to transparency and collaboration, in the spirit of the decentralized communities the thesis seeks to serve. Through a mix of theoretical exploration and empirical research, this thesis aims to provide a deeper and more detailed understanding of Ethereum PoS's design choices, its capabilities, and the limitations they impose on future steps and upgrades, leading the way for more resilient, scalable, and decentralized digital infrastructures.

Read more
Usenix Sec '24 Conference Paper ·

Guardians of the Galaxy: Content Moderation in the InterPlanetary File System

Saidu Sokoto · Leonhard Balduf · Dennis Trautwein · Yiluo Wei · Gareth Tyson · Ignacio Castro · Onur Ascigil · George Pavlou · Maciej Korczyński · Björn Scheuermann · Michał Król

The InterPlanetary File System (IPFS) is one of the largest platforms in the growing "Decentralized Web". The increasing popularity of IPFS has attracted large volumes of users and content. Unfortunately, some of this content could be considered "problematic". Content moderation is always hard. With a completely decentralized infrastructure and administration, content moderation in IPFS is even more difficult. In this paper, we examine this challenge. We identify, characterize, and measure the presence of problematic content in IPFS (e.g., content subject to takedown notices). Our analysis covers 368,762 files. We analyze the complete content moderation process, including how these files are flagged and who hosts and retrieves them. We also measure the efficacy of the process. We analyze content submitted to the denylist, showing that notable volumes of problematic content are served, and that the lack of a centralized approach facilitates its spread. While we identify fast reactions to takedown requests, we also test the resilience of multiple gateways and show that existing means to filter problematic content can be circumvented. We end by proposing improvements to content moderation that result in a 227% increase in the detection of phishing content and reduce the average time to filter such content by 43%.

Read more
EuroS&P '24 Conference Paper ·

DISC-NG: Robust Service Discovery in the Ethereum Global Network

Michał Król · Onur Ascigil · Sergi Rene · Alberto Sonnino · Matthieu Pigaglio · Ramin Sadre

The Ethereum Global Network (EGN) hosts a complete ecosystem of decentralized services, including blockchains such as Ethereum mainnet but also exchange markets, content delivery networks, and many more. Service discovery is a fundamental mechanism in the EGN, allowing new nodes to look up and connect to other nodes already participating in one of these services. The current service discovery of the EGN, DISCv5, is not scalable and efficient enough to support the current and future needs of the ecosystem. We present DISC-NG, a novel service discovery protocol for the EGN that is scalable, efficient, and secure. DISC-NG leverages the EGN-wide DHT to allow service participation advertisements to meet service discovery requests. DISC-NG compensates for the imbalance in service popularity and minimizes the potential for abuse by malicious nodes. We implement DISC-NG in devp2p, the network stack used by the majority of clients connecting to the EGN, as well as in a large-scale simulator. DISC-NG can discover services in the EGN faster than DISCv5 while being more robust to malicious nodes. DISC-NG is now in a staging phase and scheduled for deployment as an improvement to DISCv5.

Read more
EthResearch Technical Report ·

Gossipsub Message Propagation Latency

Mikel Cortes · Yiannis Psaras

This message propagation latency study by Yiannis Psaras and Mikel Cortes from ProbeLab investigates how quickly GossipSub delivers blocks across Ethereum's peer-to-peer network using data from three days of measurements (June 14-16) from the Ethereum Foundation's Xatu monitoring nodes deployed across three continents (Europe, North America, and Australia) running all major consensus clients. The analysis reveals that 98% of beacon blocks arrive within the critical 4-second propagation window required to prevent network forks, though a small fraction of outliers arrive as late as 12 seconds. Examination of per-client performance shows distinct patterns, with Teku and Prysm nodes receiving messages fastest while Lodestar exhibits the longest arrival times and highest variance, though the authors note these differences may relate to how different implementations timestamp message arrivals in their validation logic. Geographic analysis demonstrates that European nodes enjoy a modest ~0.6-second latency advantage over North American and Oceanian nodes, highlighting a slight centralization incentive toward lower-latency network core regions, though current differences remain within acceptable safety margins. The study finds no significant correlation between block size (mostly 50-150 KB) and arrival time, and concludes that despite these minor geographic and client-based differences, message propagation latency is generally well-controlled and sufficient to maintain network stability.

Read more
EthResearch Technical Report ·

Ethereum Node Message Propagation Bandwidth Consumption

Mikel Cortes · Yiannis Psaras

This bandwidth consumption analysis by Yiannis Psaras and Mikel Cortes from ProbeLab investigates the GossipSub protocol components responsible for message propagation bandwidth usage in Ethereum's peer-to-peer network using the Hermes monitoring tool. Analyzing a 3.5-hour trace of GossipSub traffic, the study reveals that sent messages (SENT_MSG) consume the largest share at 53% of total bandwidth (69% of outbound), followed surprisingly by control messages, with SENT_IHAVE messages accounting for 23.4% of total bandwidth and 30% of outbound traffic, and received IHAVE messages representing 10% of total bandwidth and 42% of inbound traffic. The analysis shows that IHAVE and IHAVE-related control traffic consumes approximately 400 KB/s collectively, representing a major optimization opportunity, while received duplicates account for 7.3% of total bandwidth compared to only 3.6% for original messages. The study validates findings from prior research on IHAVE/IWANT effectiveness and emphasizes that duplicates collectively represent approximately 42% of total bandwidth consumption. The authors project that standard Ethereum nodes consume approximately 386 KB/s inbound and 580 KB/s outbound (including execution layer), and strongly recommend adoption of GossipSub 1.2 to eliminate duplicate message bandwidth, as this would provide significant network-wide efficiency gains while remaining a small fraction of typical household bandwidth availability.

Read more
EthResearch Technical Report ·

Number Duplicate Messages in Ethereum’s Gossipsub Network

Mikel Cortes · Yiannis Psaras

This duplicate message analysis by Yiannis Psaras from ProbeLab investigates the prevalence and characteristics of duplicate messages in Ethereum's GossipSub network using the Hermes monitoring tool over a 3.5-hour period on Holesky testnet. The study establishes a theoretical framework predicting that with a mesh degree (k) of 8, nodes should receive approximately 3 duplicates per message (calculated as (k-2)/2), and confirms this for certain message types while revealing significant variations across different topics: beacon_block messages show almost no non-duplicated instances (1-2%) with 54% receiving the expected 3 duplicates but outliers reaching 34-40 copies, while smaller, more frequent messages like beacon_aggregate_and_proof show 32-45% with no duplicates and 50% with fewer than 2 duplicates. The analysis finds no correlation between message size and duplicate count, and reveals a critical temporal pattern where 50% of duplicate arrivals occur within 73 milliseconds of the original message, enabling potential optimization. The study identifies that numerous duplicates originate from IWANT messages sent milliseconds before the same message arrives through mesh propagation, leading to two key recommendations: implementing a concurrency limiter on IWANT messages (similar to Kademlia's alpha parameter) and increasing the heartbeat interval from 0.7 to 1.0 seconds to reduce excessive IHAVE messages and duplicate-generating race conditions. The authors conclude that GossipSub 1.2's proposed IDONTWANT control message would be valuable for preventing the majority of duplicates.
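
The (k-2)/2 estimate quoted in the summary above can be checked with a few lines of Python. This is only an illustrative sketch of the arithmetic: the function name `expected_duplicates` is hypothetical and not taken from the study, and the extra mesh degrees 6 and 12 are included purely for comparison.

```python
def expected_duplicates(mesh_degree: int) -> float:
    """Theoretical duplicates per message for a given mesh degree k,
    using the (k - 2) / 2 estimate quoted in the summary."""
    return (mesh_degree - 2) / 2


if __name__ == "__main__":
    # k = 8 is the mesh degree discussed in the study; 6 and 12 are
    # shown only to illustrate how the estimate scales with k.
    for k in (6, 8, 12):
        print(f"k={k}: ~{expected_duplicates(k)} duplicates per message")
```

For k = 8 this gives 3 duplicates per message, matching the figure in the summary.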

Read more
EthResearch Conference Paper ·

Gossipsub Network Dynamicity through GRAFTs and PRUNEs

Mikel Cortes · Yiannis Psaras

This network dynamicity study by Yiannis Psaras from ProbeLab analyzes GossipSub's mesh stability by investigating GRAFT and PRUNE message frequencies, session durations, and network stability through a 3.5-hour trace using the Hermes monitoring tool on Holesky testnet. The analysis reveals that despite increased dynamicity observed through GRAFT and PRUNE events, GossipSub successfully maintains stable mesh topology with the number of mesh peers per topic consistently remaining between the DLow (6) and DHigh (12) thresholds around the target of 8 peers. The study finds that 80% of peer connections are relatively short (dropping within seconds), while 10% persist for approximately 4 minutes and the remaining 10% maintain connections from 5 minutes to 1.6 hours; notably, Lodestar and Nimbus clients maintain significantly longer connection durations compared to Teku nodes, which consistently disconnect almost immediately. A notable spike in GRAFT and PRUNE events occurs during the final hour of the measurement period, with subsequent analysis identifying this as stemming from GossipSub peer scoring mechanisms responding to peers forwarding invalid messages on the voluntary_exit topic. The authors conclude that while the Hermes node observes elevated mesh connectivity dynamics, GossipSub maintains a healthy mesh structure with stable peer degrees despite these fluctuations, and the observed anomalies do not appear to impact broader network operation.
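
To make the role of the DLow, target and DHigh thresholds concrete, the sketch below shows a heavily simplified version of the mesh maintenance that triggers GRAFT and PRUNE events. It assumes the values quoted in the summary (6, 8, 12). The function `maintain_mesh` and its random peer selection are illustrative assumptions; it omits the peer scoring, backoff timers and outbound quotas that real GossipSub implementations apply.

```python
import random

# Thresholds as quoted in the summary: target 8, DLow 6, DHigh 12.
D, D_LOW, D_HIGH = 8, 6, 12


def maintain_mesh(mesh: set, candidates: set, rng: random.Random):
    """Simplified heartbeat step (hypothetical helper).

    Returns (grafted, pruned): peers to GRAFT when the mesh is below
    D_LOW, or peers to PRUNE when it is above D_HIGH. In both cases the
    mesh is brought back towards the target degree D.
    """
    grafted, pruned = set(), set()
    if len(mesh) < D_LOW:
        pool = sorted(candidates - mesh)          # peers not yet in the mesh
        need = min(D - len(mesh), len(pool))
        grafted = set(rng.sample(pool, need))
    elif len(mesh) > D_HIGH:
        excess = len(mesh) - D
        pruned = set(rng.sample(sorted(mesh), excess))
    return grafted, pruned


if __name__ == "__main__":
    rng = random.Random(42)
    known = {f"peer{i}" for i in range(20)}
    small_mesh = {f"peer{i}" for i in range(4)}
    grafted, pruned = maintain_mesh(small_mesh, known, rng)
    print(len(grafted), "peers to GRAFT,", len(pruned), "peers to PRUNE")
```

A mesh that stays between the two thresholds produces no GRAFT or PRUNE from this step, which is why a stable per-topic peer count is compatible with short-lived individual connections.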

Read more
EthResearch Technical Report ·

Gossip IWANT/IHAVE Effectiveness in Ethereum’s Gossipsub Network

Mikel Cortes · Yiannis Psaras

This study by Yiannis Psaras from ProbeLab examines the efficiency of Gossipsub's IHAVE/IWANT control message mechanism in Ethereum's peer-to-peer network using the custom-built Hermes monitoring tool, which traces all GossipSub protocol interactions. Based on a 3.5-hour trace of network activity, the research reveals that the gossip mechanism is substantially inefficient, with ratios of sent IHAVE message IDs to received IWANT message IDs reaching approximately 1:100 for beacon blocks, 1:10 for beacon aggregate and proofs, and 1:6 for sync committee contributions, indicating that over 10 times more bandwidth is consumed by control messages than is actually needed. The analysis shows that IHAVE and IWANT messages serve different purposes depending on message type: less frequent large messages (like beacon blocks) rely primarily on mesh propagation and receive minimal benefit from gossip, while very frequent small messages incur substantial control message overhead. The study identifies implementation anomalies in Teku nodes that send IHAVE and IWANT messages with empty topics, and proposes three optimization directions: replacing message ID lists with bloom filters, reducing the GossipsubHistoryGossip parameter from 3 to 2 heartbeats, and implementing adaptive GossipFactor per topic to balance bandwidth efficiency with network robustness. The authors emphasize that while IHAVE/IWANT mechanisms are essential for network resilience during attacks and anomalies, significant optimization potential exists under normal operating conditions.

Read more
INFOCOM '24 Conference Paper ·

IPFS in the Fast Lane: Accelerating Record Storage with Optimistic Provide

Dennis Trautwein · Yiluo Wei · Yiannis Psaras · Moritz Schubotz · Ignacio Castro · Bela Gipp

The centralization of web services has raised concerns about critical single points of failure, such as content hosting, name resolution, and certification. To address these issues, the "Decentralized Web" movement advocates for decentralized alternatives. Distributed Hash Tables (DHTs) have emerged as a key component facilitating this movement, as they offer efficient key/value indexing. The InterPlanetary File System (IPFS) exemplifies this approach by leveraging DHTs for data indexing and distribution. A critical finding of previous studies is that DHT PUT performance for record storage is unacceptably slow, sometimes taking minutes to complete and hindering the adoption of delay-intolerant applications. To address this challenge, this research paper presents three significant contributions. First, we present the design of Optimistic Provide, an approach to accelerate DHT PUT operations in Kademlia-based IPFS networks while maintaining full backward compatibility. Second, we implement and deploy the mechanism and see its usage in the de facto IPFS deployment, Kubo. Third, we evaluate its effectiveness in the IPFS and Filecoin DHTs. We confirm that we enable sub-second record storage from North America and Europe for 90% of PUT operations while reducing networking overhead by over 40% and maintaining record availability.

Read more
NSDI '24 Conference Paper ·

The Eternal Tussle: Exploring the Role of Centralization in IPFS

Yiluo Wei · Dennis Trautwein · Yiannis Psaras · Ignacio Castro · Will Scott · Aravindh Raman · Gareth Tyson

Web centralization and consolidation have created potential single points of failure, e.g., in areas such as content hosting, name resolution, and certification. The "Decentralized Web", led by open-source software implementations, attempts to build decentralized alternatives. The InterPlanetary File System (IPFS) is part of this effort and attempts to provide a decentralized layer for object storage and retrieval. This comes with challenges, though: decentralization can increase complexity and overhead, as well as compromise performance and scalability. As the core maintainers of IPFS, we have therefore begun to explore more hybrid approaches. This paper reports on our experiences building three centralized components within IPFS: (i) InterPlanetary Network Indexers, which provide an alternative centralized method for content indexing; (ii) Hydra Boosters, which are strategic DHT nodes that assist IPFS in content routing; and (iii) HTTP Gateways, which are a public access point for users to retrieve IPFS-hosted content. Through this approach, we trade off the level of decentralization within IPFS in an attempt to gain certain benefits of centralization. We evaluate the performance of these components and demonstrate their ability to successfully address the challenges that IPFS faces.

Read more
NDSS '24 Conference Paper ·

Content Censorship in the InterPlanetary File System

Srivatsan Sridhar · Onur Ascigil · Navin Keizer · François Genon · Sébastien Pierre · Yiannis Psaras · Etienne Rivière · Michał Król

The InterPlanetary File System (IPFS) is currently the largest decentralized storage solution in operation, with thousands of active participants and millions of daily content transfers. IPFS is used as remote data storage for numerous blockchain-based smart contracts, Non-Fungible Tokens (NFTs), and decentralized applications. We present a content censorship attack that can be executed with minimal effort and cost, and that prevents the retrieval of any chosen content in the IPFS network. The attack exploits a conceptual issue in a core component of IPFS, the Kademlia Distributed Hash Table (DHT), which is used to resolve content IDs to peer addresses. We provide efficient detection and mitigation mechanisms for this vulnerability. Our mechanisms achieve a 99.6% detection rate and mitigate 100% of the detected attacks with minimal signaling and computational overhead. We followed responsible disclosure procedures, and our countermeasures are scheduled for deployment in future versions of IPFS.

Read more