Retrieval Check

Dealbot Methodology

This document is the source of truth for how dealbot's Retrieval check works.

Source code links throughout this document point to the current implementation.

For event and metric definitions used by the dashboard, see Dealbot Events & Metrics.

Overview

The Retrieval check tests that previously stored pieces from data storage checks remain retrievable over time. It runs on a separate schedule from data storage checks.

This is distinct from the inline retrieval verification in the data storage check, which confirms an SP can serve data immediately after indexing. The Retrieval check answers a different question: does the SP continue to serve data correctly after the initial storage operation?

Definition of Successful Retrieval

A retrieval counts as successful only if ALL of the following steps complete:

  1. Randomly select a previously stored test piece from a dealbot-managed dataset.
  2. Verify that the root CID is discoverable via IPNI and that the SP is listed as a provider.
  3. Perform /ipfs retrieval with the SP (steps 4–6).
  4. Fetch the root CID from the SP and confirm the response is HTTP 2xx.
  5. Verify the root block hashes to its CID.
  6. Walk the DAG and repeat steps 4–5 for each child block.

Failure occurs if ANY required check fails (IPNI verification, download, or content verification) or the retrieval exceeds its max allowed time.

Operational timeouts exist to prevent jobs from running indefinitely. If a retrieval exceeds the configured limit (RETRIEVAL_JOB_TIMEOUT_SECONDS), it is marked as failed.
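
The timeout rule above can be sketched as a wrapper that races a job against a timer. This is a minimal illustration, not dealbot's implementation: `runWithJobTimeout` and `JobOutcome` are hypothetical names, and the sketch only stops waiting for the job rather than cancelling it.

```typescript
// Hypothetical outcome type: a job either finishes or is marked failed.
type JobOutcome<T> =
  | { status: "success"; value: T }
  | { status: "failed"; reason: string };

// Runs `job` and marks it failed if it does not settle within
// `timeoutSeconds` (the role RETRIEVAL_JOB_TIMEOUT_SECONDS plays in the text).
// Note: the underlying work is not cancelled here; a real job would also
// need an abort mechanism to release sockets and other resources.
function runWithJobTimeout<T>(
  job: () => Promise<T>,
  timeoutSeconds: number,
): Promise<JobOutcome<T>> {
  return new Promise<JobOutcome<T>>((resolve) => {
    const timer = setTimeout(
      () => resolve({ status: "failed", reason: "timeout" }),
      timeoutSeconds * 1000,
    );
    // Promise.resolve().then(job) also captures synchronous throws from `job`.
    Promise.resolve()
      .then(job)
      .then(
        (value) => {
          clearTimeout(timer);
          resolve({ status: "success", value });
        },
        (err: unknown) => {
          clearTimeout(timer);
          resolve({ status: "failed", reason: String(err) });
        },
      );
  });
}
```

Whichever side settles first wins; later calls to `resolve` are ignored, so a job that finishes after the deadline cannot flip a failed result back to success.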

Note on location: Retrieval latency varies by dealbot-to-SP distance. Measurements reflect dealbot's probe location, not absolute SP performance. This check tests retrievability, not latency.

What Happens Each Cycle

The scheduler triggers retrieval testing on a configurable interval.

flowchart TD
  SelectPiece["Select random test piece for SP under test"] --> IPNIVerify["IPNI verification"]
  IPNIVerify --> RecordResult["Record result (success/failure)"]
  SelectPiece --> DownloadPiece["Download via SP IPFS gateway<br/>/ipfs/{rootCid}"]
  DownloadPiece --> ValidateData["Validate downloaded data"]
  ValidateData --> RecordResult
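
The flow in the chart can be sketched as two stages that start together and feed one recorded result. This is an illustrative sketch only: `runRetrievalCheck`, `verifyIpni`, `downloadAndValidate`, and `overallStatus` are hypothetical names, while `discoverabilityStatus` and `retrievalStatus` are the sub-status names used later in this document.

```typescript
type StageStatus = "success" | "failed";

interface CheckResult {
  discoverabilityStatus: StageStatus; // IPNI verification branch
  retrievalStatus: StageStatus;       // download + validation branch
  overallStatus: StageStatus;         // hypothetical name for the combined result
}

// Maps a stage promise to a status without letting a rejection escape.
function toStatus(stage: Promise<unknown>): Promise<StageStatus> {
  return stage.then(
    (): StageStatus => "success",
    (): StageStatus => "failed",
  );
}

// Starts both branches at once and records a result once both have settled.
// Both callbacks are assumptions: each resolves on success and rejects on failure.
async function runRetrievalCheck(
  verifyIpni: () => Promise<void>,
  downloadAndValidate: () => Promise<void>,
): Promise<CheckResult> {
  const [discoverabilityStatus, retrievalStatus] = await Promise.all([
    toStatus(verifyIpni()),
    toStatus(downloadAndValidate()),
  ]);
  const overallStatus: StageStatus =
    discoverabilityStatus === "success" && retrievalStatus === "success"
      ? "success"
      : "failed";
  return { discoverabilityStatus, retrievalStatus, overallStatus };
}
```

Converting each branch to a status before awaiting means a failure in one branch does not hide the outcome of the other, which keeps the two sub-statuses independently reportable.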

Piece Selection

Dealbot randomly selects one piece per SP for each scheduled retrieval job. The per‑SP job frequency is controlled by RETRIEVALS_PER_SP_PER_HOUR. Selection follows these constraints:

  • Only pieces from deals created by the data storage check whose overall status is success (saved in the DB as DEAL_CREATED).
  • Only pieces with IPNI metadata enabled and a root CID.
  • Only pieces whose original size is one of RANDOM_PIECE_SIZES (matched against metadata.ipfs_pin.originalSize).

Source: retrieval.service.ts (selectRandomDealsForRetrieval)
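
The selection constraints above could be expressed as a filter followed by a uniform random pick. The sketch below is an assumption about shape, not the code in retrieval.service.ts: `CandidateDeal` and its field names are hypothetical, with `originalSize` standing in for metadata.ipfs_pin.originalSize.

```typescript
// Hypothetical, flattened view of a stored deal.
interface CandidateDeal {
  id: string;
  spAddress: string;
  status: string;        // "DEAL_CREATED" marks a successful data storage check
  ipniEnabled: boolean;  // IPNI metadata enabled
  rootCid?: string;
  originalSize?: number; // stands in for metadata.ipfs_pin.originalSize
}

// Picks one eligible deal for the SP under test, or undefined if none qualify.
// `random` is injectable so the choice can be made deterministic in tests.
function selectRandomDeal(
  deals: CandidateDeal[],
  spAddress: string,
  allowedSizes: number[],
  random: () => number = Math.random,
): CandidateDeal | undefined {
  const eligible = deals.filter(
    (d) =>
      d.spAddress === spAddress &&
      d.status === "DEAL_CREATED" &&
      d.ipniEnabled &&
      d.rootCid !== undefined &&
      d.rootCid !== "" &&
      d.originalSize !== undefined &&
      allowedSizes.indexOf(d.originalSize) !== -1,
  );
  if (eligible.length === 0) return undefined;
  return eligible[Math.floor(random() * eligible.length)];
}
```

In a real deployment the filtering would more likely happen in the database query; the in-memory filter here only makes the three constraints explicit.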

Retrieval Checks

For each selected piece, dealbot performs the following in parallel:

IPNI Verification

IPNI verification queries only the IPNI indexer to confirm that the SP is listed as a provider for the root CID; it does not poll the SP for piece status. The polling interval and timeout are controlled by IPNI_VERIFICATION_POLLING_MS and IPNI_VERIFICATION_TIMEOUT_MS.
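
The polling behavior can be sketched as a loop that retries a lookup at a fixed interval until it succeeds or the timeout would be exceeded. This is a sketch under assumptions: `pollUntilListed` and `isListed` are hypothetical names, the actual indexer query is left to the caller, and treating lookup errors as "not yet listed" is a choice made here, not something the text specifies.

```typescript
interface PollResult {
  verified: boolean;
  attempts: number;
  elapsedMs: number;
}

// Polls `isListed` every `pollingMs` until it returns true or another wait
// would exceed `timeoutMs`. `sleep` and `now` are injectable for testing.
async function pollUntilListed(
  isListed: () => Promise<boolean>, // hypothetical IPNI provider lookup
  pollingMs: number,                // role of IPNI_VERIFICATION_POLLING_MS
  timeoutMs: number,                // role of IPNI_VERIFICATION_TIMEOUT_MS
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
  now: () => number = Date.now,
): Promise<PollResult> {
  const start = now();
  let attempts = 0;
  while (true) {
    attempts += 1;
    // Assumption: a failed lookup is treated the same as "not yet listed".
    const listed = await isListed().catch(() => false);
    if (listed) {
      return { verified: true, attempts, elapsedMs: now() - start };
    }
    if (now() - start + pollingMs > timeoutMs) {
      return { verified: false, attempts, elapsedMs: now() - start };
    }
    await sleep(pollingMs);
  }
}
```

There is no attempt cap in this sketch; the number of lookups is bounded only by the timeout divided by the polling interval.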

/ipfs Retrieval

Dealbot retrieves the content by traversing the DAG rooted at the IPFS Root CID and fetching each block from the SP IPFS gateway.

  • Root URL: {serviceURL}/ipfs/{rootCID}?format=raw
  • Block URL: {serviceURL}/ipfs/{cid}?format=raw
  • Request: HTTP/2 with Accept: application/vnd.ipld.raw
  • Applicable when: Piece has IPNI metadata enabled with a root CID
  • What this tests: The SP can serve the root CID and all linked blocks in its DAG via the IPFS gateway

Source: apps/backend/src/retrieval-addons/strategies/ipfs-block.strategy.ts
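
The traversal described above can be sketched as a queue of CIDs, where each block is fetched, checked for a 2xx status, hash-verified, and then expanded into its child links. This is not the code in ipfs-block.strategy.ts: `TraversalDeps`, `retrieveDag`, and `blockUrl` are hypothetical, the HTTP client, hash check, and link decoding are injected by the caller, and blocks are fetched one at a time rather than in parallel.

```typescript
interface BlockResponse {
  status: number;
  bytes: Uint8Array;
}

// Caller-supplied pieces (all hypothetical): HTTP fetch, CID hash check,
// and decoding of child CIDs from a block.
interface TraversalDeps {
  fetchBlock(url: string): Promise<BlockResponse>;
  verifyBlock(cid: string, bytes: Uint8Array): Promise<boolean>;
  decodeLinks(cid: string, bytes: Uint8Array): string[];
}

interface TraversalResult {
  ok: boolean;
  blocks: number;
  bytes: number;
  error?: string;
}

// Builds {serviceURL}/ipfs/{cid}?format=raw, tolerating a trailing slash.
function blockUrl(serviceURL: string, cid: string): string {
  return `${serviceURL.replace(/\/+$/, "")}/ipfs/${cid}?format=raw`;
}

// Breadth-first walk from the root CID; stops at the first failed block.
async function retrieveDag(
  serviceURL: string,
  rootCid: string,
  deps: TraversalDeps,
): Promise<TraversalResult> {
  const seen = new Set<string>();
  const queue: string[] = [rootCid];
  let blocks = 0;
  let bytes = 0;
  while (queue.length > 0) {
    const cid = queue.shift();
    if (cid === undefined || seen.has(cid)) continue; // skip repeated links
    seen.add(cid);
    const res = await deps.fetchBlock(blockUrl(serviceURL, cid));
    if (res.status < 200 || res.status >= 300) {
      return { ok: false, blocks, bytes, error: `HTTP ${res.status} for ${cid}` };
    }
    if (!(await deps.verifyBlock(cid, res.bytes))) {
      return { ok: false, blocks, bytes, error: `hash mismatch for ${cid}` };
    }
    blocks += 1;
    bytes += res.bytes.length;
    for (const child of deps.decodeLinks(cid, res.bytes)) queue.push(child);
  }
  return { ok: true, blocks, bytes };
}
```

A concurrent variant would bound the number of in-flight fetches, which is presumably the role of IPFS_BLOCK_FETCH_CONCURRENCY; that is left out here to keep the control flow easy to follow.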

What Gets Asserted

For each retrieval attempt:

| # | Assertion | How It's Checked | Sub Status Affected | Retries | Relevant Metric for Setting a Max Duration | Implemented? |
|---|---|---|---|---|---|---|
| 1 | Valid provider record from filecoinpin.contact | IPNI query for root CID returns a result that includes the SP as a provider | Discoverability (discoverabilityStatus) | unlimited polling with delay until timeout | ipniVerifyMs | Yes |
| 2 | IPFS content is retrievable | All DAG block requests return 2xx status | Retrieval (retrievalStatus) | 0. Failure to establish a connection or getting a 5xx response marks the retrieval as failed. There is no retry. | ipfsRetrievalLastByteMs | Yes |
| 3 | Content integrity via CID | Each fetched block is hash-verified against its CID during DAG traversal | Retrieval (retrievalStatus) | none - if we receive non-matching bytes it's a failure | n/a (client-side) | Yes |
| 4 | All checks pass | Check is not marked successful until all assertions pass within window | All sub-statuses above feed dataStorageStatus | n/a | retrievalCheckMs | Yes |

Retrieval Result Recording

Each retrieval step (post IPNI validation) creates a Retrieval entity in the database:

| Field | Description |
|---|---|
| dealId | Which deal was tested |
| retrievalMethod | Only sp_ipfs is currently supported; sp_piece or cdn are possible future methods |
| retrievalEndpoint | URL used for the download |
| status | success or failed |
| responseCode | HTTP status code |
| bytesRetrieved | Actual bytes downloaded |
| latencyMs | Total download time |
| ttfbMs | Time to first byte |
| throughputBps | Download throughput in bytes per second |
| errorMessage | Error details (if failed) |
| retryCount | Number of retry attempts (0 means the first attempt succeeded) |

Source: retrieval.entity.ts
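
As a rough illustration of the record shape, the fields above could be typed as follows. The field names come from the table; the TypeScript types, the nullability, and the throughput formula (bytes divided by total download time) are assumptions, not a copy of retrieval.entity.ts.

```typescript
// Assumed shape; types and nullability are illustrative only.
interface RetrievalRecord {
  dealId: string;
  retrievalMethod: "sp_ipfs";
  retrievalEndpoint: string;
  status: "success" | "failed";
  responseCode: number | null;
  bytesRetrieved: number | null;
  latencyMs: number | null;
  ttfbMs: number | null;
  throughputBps: number | null;
  errorMessage: string | null;
  retryCount: number;
}

// Assumed formula: bytes per second over the total download time.
// Returns null when the duration is zero or negative.
function throughputBps(bytesRetrieved: number, latencyMs: number): number | null {
  if (latencyMs <= 0) return null;
  return Math.round(bytesRetrieved / (latencyMs / 1000));
}

// Builds a record for a successful download from raw measurements.
function toSuccessRecord(
  dealId: string,
  retrievalEndpoint: string,
  responseCode: number,
  bytesRetrieved: number,
  latencyMs: number,
  ttfbMs: number,
): RetrievalRecord {
  return {
    dealId,
    retrievalMethod: "sp_ipfs",
    retrievalEndpoint,
    status: "success",
    responseCode,
    bytesRetrieved,
    latencyMs,
    ttfbMs,
    throughputBps: throughputBps(bytesRetrieved, latencyMs),
    errorMessage: null,
    retryCount: 0,
  };
}
```

For example, 1,000,000 bytes downloaded in 500 ms works out to 2,000,000 bytes per second under this formula.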

Source: retrieval-addons.service.ts, cdn.strategy.ts

Source: apps/backend/src/config/app.config.ts

Metrics Recorded

Metric definitions (including Prometheus metrics) live in Dealbot Events & Metrics.

retrievalStatus counts the /ipfs transport stage only (assertions 2 and 3). The IPNI assertion (1) is counted on discoverabilityStatus. There is no composite "retrieval check" counter; overall success comes from dataStorageStatus, which is success only when all sub-statuses succeed. See Deal Status Progression.

skipped.piece_missing

Emitted when a retrieval pre-flight probe to ${serviceUrl}/pdp/piece/:pieceCid/status returns 404. The deal is marked cleaned_up=true and removed from future retrieval candidate selection. This is not a failure of the SP's transport surface, but a signal that the piece no longer exists on the SP while the dataset is still live (for example, the SP scheduled a piece removal via PDP, or the piece dropped without an on-chain notification). The probe runs before IPNI verification and transport, so a 30s IPNI timeout is avoided on stale candidates. Search logs for retrieval_skipped_piece_missing to correlate.
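
The pre-flight decision can be sketched as a pure function over the probe's HTTP status. The names `pieceStatusUrl`, `decidePreflight`, and `PreflightDecision` are hypothetical, and letting every non-404 status proceed to the normal checks is an assumption; the text only specifies the 404 case.

```typescript
type PreflightDecision =
  | { action: "proceed" }
  | { action: "skip"; reason: "piece_missing"; cleanedUp: true };

// Builds {serviceUrl}/pdp/piece/{pieceCid}/status, tolerating a trailing slash.
function pieceStatusUrl(serviceUrl: string, pieceCid: string): string {
  return `${serviceUrl.replace(/\/+$/, "")}/pdp/piece/${pieceCid}/status`;
}

// 404 means the SP no longer has the piece: skip the check and flag the deal
// as cleaned up so it is excluded from future candidate selection.
// Assumption: any other status falls through to IPNI verification and transport.
function decidePreflight(probeStatus: number): PreflightDecision {
  if (probeStatus === 404) {
    return { action: "skip", reason: "piece_missing", cleanedUp: true };
  }
  return { action: "proceed" };
}
```

Keeping the decision separate from the HTTP call makes it easy to see that a skip is recorded as neither a success nor a transport failure.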

Configuration

Key environment variables that control retrieval testing:

| Variable | Description |
|---|---|
| RETRIEVAL_INTERVAL_SECONDS | Retrieval schedule interval in cron mode. |
| RETRIEVALS_PER_SP_PER_HOUR | Retrieval rate per SP in pg-boss mode. |
| RETRIEVAL_JOB_TIMEOUT_SECONDS | Max end-to-end retrieval job runtime before abort. |
| CONNECT_TIMEOUT_MS | Connection/header timeout for HTTP requests. |
| HTTP_REQUEST_TIMEOUT_MS | Total timeout for HTTP/1.1 retrieval requests. |
| HTTP2_REQUEST_TIMEOUT_MS | Total timeout for HTTP/2 retrieval requests. |
| IPNI_VERIFICATION_TIMEOUT_MS | Max time to wait for IPNI provider verification. |
| IPNI_VERIFICATION_POLLING_MS | Poll interval between IPNI verification attempts. |
| IPFS_BLOCK_FETCH_CONCURRENCY | Parallel block fetches during DAG traversal validation. |
| RANDOM_PIECE_SIZES | Eligible original content sizes for random retrieval selection. |

See also: docs/environment-variables.md for the full configuration reference.
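
To make the variables more concrete, here is a sketch of reading a subset of them from an environment map. It is not the loader in app.config.ts: the function names are hypothetical, the fallback values are placeholders rather than dealbot's defaults, and the comma-separated format for RANDOM_PIECE_SIZES is an assumption.

```typescript
type Env = Record<string, string | undefined>;

interface RetrievalConfig {
  retrievalsPerSpPerHour: number;
  jobTimeoutSeconds: number;
  ipniVerificationTimeoutMs: number;
  ipniVerificationPollingMs: number;
  blockFetchConcurrency: number;
  randomPieceSizes: number[];
}

// Reads a positive number, using `fallback` when the variable is unset.
function positiveNumber(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!isFinite(value) || value <= 0) {
    throw new Error(`${key} must be a positive number, got "${raw}"`);
  }
  return value;
}

// Assumption: RANDOM_PIECE_SIZES is a comma-separated list of byte sizes.
function parseSizes(raw: string | undefined): number[] {
  if (raw === undefined) return [];
  const sizes = raw
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s !== "")
    .map(Number);
  if (sizes.some((n) => !isFinite(n) || n <= 0)) {
    throw new Error(`RANDOM_PIECE_SIZES contains an invalid size: "${raw}"`);
  }
  return sizes;
}

// All fallback values below are placeholders, not dealbot's real defaults.
function loadRetrievalConfig(env: Env): RetrievalConfig {
  return {
    retrievalsPerSpPerHour: positiveNumber(env, "RETRIEVALS_PER_SP_PER_HOUR", 1),
    jobTimeoutSeconds: positiveNumber(env, "RETRIEVAL_JOB_TIMEOUT_SECONDS", 60),
    ipniVerificationTimeoutMs: positiveNumber(env, "IPNI_VERIFICATION_TIMEOUT_MS", 30000),
    ipniVerificationPollingMs: positiveNumber(env, "IPNI_VERIFICATION_POLLING_MS", 2000),
    blockFetchConcurrency: positiveNumber(env, "IPFS_BLOCK_FETCH_CONCURRENCY", 6),
    randomPieceSizes: parseSizes(env["RANDOM_PIECE_SIZES"]),
  };
}

// An hourly rate implies an average spacing between jobs for each SP.
function secondsBetweenJobs(retrievalsPerSpPerHour: number): number {
  return 3600 / retrievalsPerSpPerHour;
}
```

With RETRIEVALS_PER_SP_PER_HOUR set to 4, for instance, each SP would be tested on average once every 900 seconds.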
