Abstract
The Internet's transformation from a decentralized network of networks into a landscape dominated by centralized platforms has created systemic risks, including infrastructure fragility, censorship vulnerabilities, and concentrated control. While decentralized peer-to-peer systems offer architectural alternatives, they suffer from a fundamental architectural knowledge imbalance: centralized platforms observe all user interactions, enabling sophisticated optimization, whereas peer-to-peer networks lack unified observability, hindering both performance analysis and targeted improvements. This dissertation addresses this challenge through comprehensive characterization and optimization of the InterPlanetary File System (IPFS), a prominent peer-to-peer storage network. IPFS's hybrid architecture, which combines peer-to-peer networking with strategic centralized components, creates fragmented visibility in which different subsystems offer complementary but partial vantage points. The research develops measurement methodologies that synthesize these fragmented observations into a comprehensive understanding of the network. It then leverages the resulting empirical insights for a targeted protocol optimization. The investigation spans three critical measurement domains. First, characterization of network topology and content routing performance identified Distributed Hash Table (DHT) publication latency as the critical bottleneck, with median latencies of 27.7 seconds from Europe. Second, measurements of peer connectivity and Network Address Translation (NAT) traversal, collected through a novel honeypot methodology, established a 70% success rate for decentralized hole-punching across 4.4 million measurements from 167 countries and 85k networks. Third, analysis of usage patterns and content governance revealed a concentration-replication duality: single peers hosted up to 63% of denylist content, yet widespread replication ensured that only 0.1% of content remained uniquely hosted, complicating coordinated takedown efforts.
This systematic characterization of IPFS enabled the design and deployment of Optimistic Provide, a backward-compatible protocol optimization that achieves order-of-magnitude performance improvements: sub-second publication latency for 90% of operations while reducing network overhead by 40%. The research demonstrates that systematic synthesis of partial observations can overcome the architectural knowledge imbalance, enabling evidence-based optimizations that enhance the viability of peer-to-peer systems as alternatives to centralized architectures.