Dealbot Methodology
This document is the source of truth for how dealbot's Retrieval check works.
Source code links throughout this document point to the current implementation.
For event and metric definitions used by the dashboard, see Dealbot Events & Metrics.
The Retrieval check tests that previously stored pieces from data storage checks remain retrievable over time. It runs on a separate schedule from data storage checks.
This is distinct from the inline retrieval verification in the data storage check, which confirms an SP can serve data immediately after indexing. The Retrieval check answers a different question: does the SP continue to serve data correctly after the initial storage operation?
A successful retrieval requires ALL of:
/ipfs retrieval with the SP.
Failure occurs if ANY required check fails (IPNI verification, download, or content verification) or the retrieval exceeds its max allowed time.
Operational timeouts exist to prevent jobs from running indefinitely. If a retrieval exceeds the configured limit (RETRIEVAL_JOB_TIMEOUT_SECONDS), it is marked as failed.
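The timeout rule can be sketched as follows. This is a minimal illustration, not dealbot's implementation: `withTimeout` and `runRetrievalJob` are hypothetical names, and the sketch only stops waiting for the job rather than aborting the underlying requests.

```typescript
// Hypothetical helper: rejects if `work` does not settle within `limitMs`.
function withTimeout<T>(work: Promise<T>, limitMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`retrieval exceeded ${limitMs} ms`)),
      limitMs,
    );
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

type JobOutcome = "success" | "failed";

// A job that throws or runs past the limit is recorded as failed.
async function runRetrievalJob(
  job: () => Promise<void>,
  timeoutSeconds: number, // e.g. the value of RETRIEVAL_JOB_TIMEOUT_SECONDS
): Promise<JobOutcome> {
  try {
    await withTimeout(job(), timeoutSeconds * 1000);
    return "success";
  } catch {
    return "failed";
  }
}
```

A production version would likely also cancel in-flight HTTP requests when the limit is reached, which this sketch leaves out.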
Note on location: Retrieval latency varies by dealbot-to-SP distance. Measurements reflect dealbot's probe location, not absolute SP performance. This check tests retrievability, not latency.
The scheduler triggers retrieval testing on a configurable interval.
flowchart TD
SelectPiece["Select random test piece for SP under test"] --> IPNIVerify["IPNI verification"]
IPNIVerify --> RecordResult["Record result (success/failure)"]
SelectPiece --> DownloadPiece["Download via SP IPFS gateway<br/>/ipfs/{rootCid}"]
DownloadPiece --> ValidateData["Validate downloaded data"]
ValidateData --> RecordResult
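The two branches of the flow above can be sketched as two independent steps that run concurrently and are recorded separately. The function names and the shape of the result are assumptions made for illustration, not taken from the dealbot source.

```typescript
type SubStatus = "success" | "failed";

// Converts a step that may throw into a recorded sub-status.
async function settle(step: () => Promise<void>): Promise<SubStatus> {
  try {
    await step();
    return "success";
  } catch {
    return "failed";
  }
}

// Runs IPNI verification and download-plus-validation side by side,
// so a failure in one branch does not prevent the other from being recorded.
async function runChecks(
  verifyIpni: () => Promise<void>, // hypothetical: throws if the SP is not listed
  downloadAndValidate: () => Promise<void>, // hypothetical: throws on fetch or hash failure
): Promise<{ discoverabilityStatus: SubStatus; retrievalStatus: SubStatus }> {
  const [discoverabilityStatus, retrievalStatus] = await Promise.all([
    settle(verifyIpni),
    settle(downloadAndValidate),
  ]);
  return { discoverabilityStatus, retrievalStatus };
}
```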
Dealbot randomly selects one piece per SP for each scheduled retrieval job. The per‑SP job frequency is controlled by RETRIEVALS_PER_SP_PER_HOUR. Selection follows these constraints:
DEAL_CREATED).
RANDOM_PIECE_SIZES (matched against metadata.ipfs_pin.originalSize).
Source: retrieval.service.ts (selectRandomDealsForRetrieval)
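A rough sketch of this kind of selection is shown below. It is not the body of selectRandomDealsForRetrieval: the `CandidateDeal` shape, the `spAddress` field, the numeric type of `originalSize`, and the reading of DEAL_CREATED as a required deal status are all assumptions for illustration.

```typescript
// Hypothetical record shape; only the fields this sketch needs.
interface CandidateDeal {
  id: string;
  spAddress: string;
  status: string;
  metadata?: { ipfs_pin?: { originalSize?: number } };
}

function selectRandomDeal(
  deals: CandidateDeal[],
  spAddress: string,
  allowedSizes: number[], // assumed to be parsed from RANDOM_PIECE_SIZES
  random: () => number = Math.random, // injectable for deterministic tests
): CandidateDeal | undefined {
  const eligible = deals.filter((deal) => {
    const size = deal.metadata?.ipfs_pin?.originalSize;
    return (
      deal.spAddress === spAddress &&
      deal.status === "DEAL_CREATED" && // assumption: only created deals qualify
      size !== undefined &&
      allowedSizes.includes(size)
    );
  });
  if (eligible.length === 0) return undefined; // nothing to test for this SP
  return eligible[Math.floor(random() * eligible.length)];
}
```

Passing `random` as a parameter keeps the function deterministic under test while defaulting to `Math.random` in normal use.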
For each selected piece, dealbot performs the following in parallel:
Retrieval checks only query the IPNI indexer to confirm the SP is listed as a provider for the root CID. We do not poll the SP for piece status in retrieval checks. The polling interval and timeout are controlled by IPNI_VERIFICATION_POLLING_MS and IPNI_VERIFICATION_TIMEOUT_MS.
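The polling behaviour described above can be sketched as a loop bounded by a deadline. This is an illustration under assumptions: `ProviderLookup` is a hypothetical wrapper around the indexer query, and treating a failed query as a miss is a choice made for the sketch.

```typescript
// Hypothetical lookup: resolves to the provider IDs the indexer lists for a CID.
type ProviderLookup = (rootCid: string) => Promise<string[]>;

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function waitForIpniListing(
  lookup: ProviderLookup,
  rootCid: string,
  spProviderId: string,
  pollingMs: number, // IPNI_VERIFICATION_POLLING_MS
  timeoutMs: number, // IPNI_VERIFICATION_TIMEOUT_MS
): Promise<{ verified: boolean; attempts: number; ipniVerifyMs: number }> {
  const started = Date.now();
  let attempts = 0;
  while (true) {
    attempts += 1;
    try {
      const providers = await lookup(rootCid);
      if (providers.includes(spProviderId)) {
        return { verified: true, attempts, ipniVerifyMs: Date.now() - started };
      }
    } catch {
      // Assumption: a failed query counts as a miss and polling continues.
    }
    // Stop once another poll would land past the deadline.
    if (Date.now() - started + pollingMs > timeoutMs) {
      return { verified: false, attempts, ipniVerifyMs: Date.now() - started };
    }
    await sleep(pollingMs);
  }
}
```

The number of attempts is not capped; only the deadline ends the loop.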
/ipfs Retrieval
Dealbot retrieves the content by traversing the DAG rooted at the IPFS Root CID and fetching each block from the SP IPFS gateway.
{serviceURL}/ipfs/{rootCID}?format=raw
{serviceURL}/ipfs/{cid}?format=raw
Accept: application/vnd.ipld.raw
Source: apps/backend/src/retrieval-addons/strategies/ipfs-block.strategy.ts
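The block-by-block traversal can be sketched as below. This is not the code in ipfs-block.strategy.ts: the `DagDeps` functions are hypothetical stand-ins for the HTTP client, the hash check, and the link decoder, and the sketch fetches blocks one at a time for clarity.

```typescript
interface BlockRequest {
  url: string;
  headers: Record<string, string>;
}

// Builds the raw-block request for one CID.
function blockRequest(serviceURL: string, cid: string): BlockRequest {
  const base = serviceURL.replace(/\/+$/, ""); // tolerate a trailing slash
  return {
    url: `${base}/ipfs/${cid}?format=raw`,
    headers: { Accept: "application/vnd.ipld.raw" },
  };
}

// Hypothetical injected dependencies.
interface DagDeps {
  fetchBlock: (req: BlockRequest) => Promise<Uint8Array>; // rejects on connection error or non-2xx
  matchesCid: (cid: string, bytes: Uint8Array) => Promise<boolean>; // hash check against the CID
  linksOf: (cid: string, bytes: Uint8Array) => string[]; // child CIDs of a decoded block
}

// Breadth-first traversal from the root CID; any failed fetch or hash mismatch rejects.
async function retrieveDag(
  serviceURL: string,
  rootCid: string,
  deps: DagDeps,
): Promise<{ blocks: number; bytes: number }> {
  const seen = new Set<string>([rootCid]);
  const queue: string[] = [rootCid];
  let blocks = 0;
  let bytes = 0;
  while (queue.length > 0) {
    const cid = queue.shift() as string;
    const data = await deps.fetchBlock(blockRequest(serviceURL, cid));
    if (!(await deps.matchesCid(cid, data))) {
      throw new Error(`block ${cid} does not match its CID`);
    }
    blocks += 1;
    bytes += data.byteLength;
    for (const child of deps.linksOf(cid, data)) {
      if (!seen.has(child)) {
        seen.add(child);
        queue.push(child);
      }
    }
  }
  return { blocks, bytes };
}
```

A concurrent variant would fetch several queued blocks at once, bounded by a setting such as IPFS_BLOCK_FETCH_CONCURRENCY.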
For each retrieval attempt:
| # | Assertion | How It's Checked | Sub Status Affected | Retries | Relevant Metric for Setting a Max Duration | Implemented? |
|---|---|---|---|---|---|---|
| 1 | Valid | IPNI query for root CID returns a result that includes the SP as a provider | Discoverability (discoverabilityStatus) | unlimited polling with delay until timeout | ipniVerifyMs | Yes |
| 2 | IPFS content is retrievable | All DAG block requests return 2xx status | Retrieval (retrievalStatus) | 0. Failure to establish a connection or getting a 5xx response marks the retrieval as failed. There is no retry. | ipfsRetrievalLastByteMs | Yes |
| 3 | Content integrity via CID | Each fetched block is hash-verified against its CID during DAG traversal | Retrieval (retrievalStatus) | none - if we receive non-matching bytes it's a failure | n/a (client-side) | Yes |
| 4 | All checks pass | Check is not marked successful until all assertions pass within window | All sub-statuses above feed dataStorageStatus | n/a | retrievalCheckMs | Yes |
Each retrieval step (post IPNI validation) creates a Retrieval entity in the database:
| Field | Description |
|---|---|
| dealId | Which deal was tested |
| retrievalMethod | Only sp_ipfs is currently supported; sp_piece or cdn could be added in the future |
| retrievalEndpoint | URL used for the download |
| status | success or failed |
| responseCode | HTTP status code |
| bytesRetrieved | Actual bytes downloaded |
| latencyMs | Total download time |
| ttfbMs | Time to first byte |
| throughputBps | Download throughput in bytes per second |
| errorMessage | Error details (if failed) |
| retryCount | Number of retry attempts (0 means the first attempt succeeded) |
Source: retrieval.entity.ts
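For orientation, the fields above could be modelled roughly as follows. The TypeScript types, the optionality of each field, and the throughput formula (bytes divided by total download time) are assumptions for this sketch; retrieval.entity.ts is the authoritative definition.

```typescript
// Assumed shape mirroring the field table; types and optionality are guesses.
interface RetrievalRecord {
  dealId: string;
  retrievalMethod: "sp_ipfs";
  retrievalEndpoint: string;
  status: "success" | "failed";
  responseCode?: number;
  bytesRetrieved?: number;
  latencyMs?: number;
  ttfbMs?: number;
  throughputBps?: number;
  errorMessage?: string;
  retryCount: number;
}

// Assumed derivation: bytes per second over the whole download.
function throughputBps(bytesRetrieved: number, latencyMs: number): number | undefined {
  if (latencyMs <= 0) return undefined; // avoid dividing by zero
  return Math.round(bytesRetrieved / (latencyMs / 1000));
}

// Example: 1,000,000 bytes downloaded in 2,000 ms.
const example: RetrievalRecord = {
  dealId: "deal-1",
  retrievalMethod: "sp_ipfs",
  retrievalEndpoint: "https://sp.example/ipfs/bafyroot",
  status: "success",
  responseCode: 200,
  bytesRetrieved: 1_000_000,
  latencyMs: 2000,
  ttfbMs: 150,
  throughputBps: throughputBps(1_000_000, 2000),
  retryCount: 0,
};
```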
Source: retrieval-addons.service.ts, cdn.strategy.ts
Source: apps/backend/src/config/app.config.ts
Metric definitions (including Prometheus metrics) live in Dealbot Events & Metrics.
retrievalStatus counts the /ipfs transport stage only (assertions 2 and 3). The IPNI assertion (1) is counted on discoverabilityStatus. There is no composite "retrieval check" counter; overall success comes from dataStorageStatus, which is success only when all sub-statuses succeed. See Deal Status Progression.
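The mapping from assertions to sub-statuses can be sketched as below. The type and function names are hypothetical, and the sketch covers only the two sub-statuses named in this document; dataStorageStatus may depend on further sub-statuses defined elsewhere.

```typescript
type SubStatus = "success" | "failed";

interface AssertionResults {
  ipniListsProvider: boolean; // assertion 1
  allBlocksReturned2xx: boolean; // assertion 2
  allBlocksMatchCid: boolean; // assertion 3
}

interface SubStatuses {
  discoverabilityStatus: SubStatus;
  retrievalStatus: SubStatus;
}

// Assertion 1 feeds discoverabilityStatus; assertions 2 and 3 feed retrievalStatus.
function toSubStatuses(results: AssertionResults): SubStatuses {
  return {
    discoverabilityStatus: results.ipniListsProvider ? "success" : "failed",
    retrievalStatus:
      results.allBlocksReturned2xx && results.allBlocksMatchCid ? "success" : "failed",
  };
}

// Overall success requires every sub-status to succeed.
function overallStatus(sub: SubStatuses): SubStatus {
  return sub.discoverabilityStatus === "success" && sub.retrievalStatus === "success"
    ? "success"
    : "failed";
}
```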
skipped.piece_missing
Emitted when a retrieval pre-flight probe to ${serviceUrl}/pdp/piece/:pieceCid/status returns 404. The deal is marked cleaned_up=true and removed from future retrieval candidate selection. This is not a failure of the SP's transport surface, but a signal that the piece no longer exists on the SP while the dataset is still live (for example, the SP scheduled a piece removal via PDP, or the piece dropped without an on-chain notification). The probe runs before IPNI verification and transport, so a 30s IPNI timeout is avoided on stale candidates. Search logs for retrieval_skipped_piece_missing to correlate.
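The pre-flight step can be sketched as follows. The `ProbeStatus` function and `ProbedDeal` shape are hypothetical, and letting every non-404 response proceed to the normal checks is an assumption, since only the 404 case is described here.

```typescript
// Hypothetical probe: resolves to the HTTP status code returned for a URL.
type ProbeStatus = (url: string) => Promise<number>;

interface ProbedDeal {
  pieceCid: string;
  cleaned_up: boolean;
}

type PreflightResult = "proceed" | "skipped.piece_missing";

async function preflight(
  serviceUrl: string,
  deal: ProbedDeal,
  probe: ProbeStatus,
): Promise<PreflightResult> {
  const base = serviceUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  const status = await probe(`${base}/pdp/piece/${deal.pieceCid}/status`);
  if (status === 404) {
    // The piece is gone: mark the deal so it is not selected again,
    // and skip IPNI verification and transport for this run.
    deal.cleaned_up = true;
    return "skipped.piece_missing";
  }
  return "proceed"; // assumption: any other status continues with the normal checks
}
```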
Key environment variables that control retrieval testing:
| Variable | Description |
|---|---|
| RETRIEVAL_INTERVAL_SECONDS | Retrieval schedule interval in cron mode. |
| RETRIEVALS_PER_SP_PER_HOUR | Retrieval rate per SP in pg-boss mode. |
| RETRIEVAL_JOB_TIMEOUT_SECONDS | Max end-to-end retrieval job runtime before abort. |
| CONNECT_TIMEOUT_MS | Connection/header timeout for HTTP requests. |
| HTTP_REQUEST_TIMEOUT_MS | Total timeout for HTTP/1.1 retrieval requests. |
| HTTP2_REQUEST_TIMEOUT_MS | Total timeout for HTTP/2 retrieval requests. |
| IPNI_VERIFICATION_TIMEOUT_MS | Max time to wait for IPNI provider verification. |
| IPNI_VERIFICATION_POLLING_MS | Poll interval between IPNI verification attempts. |
| IPFS_BLOCK_FETCH_CONCURRENCY | Parallel block fetches during DAG traversal validation. |
| RANDOM_PIECE_SIZES | Eligible original content sizes for random retrieval selection. |
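As an illustration of how a subset of these variables might be read and validated, consider the sketch below. It is not the loader in app.config.ts: the comma-separated format for RANDOM_PIECE_SIZES, the absence of defaults, and the even spacing implied by `meanSecondsBetweenJobs` are all assumptions.

```typescript
type Env = Record<string, string | undefined>;

// Reads a required positive number; no defaults are assumed in this sketch.
function positiveNumber(env: Env, name: string): number {
  const raw = env[name];
  const value = raw === undefined || raw.trim() === "" ? NaN : Number(raw);
  if (!Number.isFinite(value) || value <= 0) {
    throw new Error(`${name} must be set to a positive number`);
  }
  return value;
}

// Assumption: sizes are given as a comma-separated list of byte counts.
function sizeList(env: Env, name: string): number[] {
  const sizes = (env[name] ?? "")
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part !== "")
    .map(Number);
  if (sizes.length === 0 || sizes.some((s) => !Number.isFinite(s) || s <= 0)) {
    throw new Error(`${name} must list at least one positive size`);
  }
  return sizes;
}

interface RetrievalConfig {
  jobTimeoutSeconds: number;
  retrievalsPerSpPerHour: number;
  ipniTimeoutMs: number;
  ipniPollingMs: number;
  blockFetchConcurrency: number;
  randomPieceSizes: number[];
}

function loadRetrievalConfig(env: Env): RetrievalConfig {
  return {
    jobTimeoutSeconds: positiveNumber(env, "RETRIEVAL_JOB_TIMEOUT_SECONDS"),
    retrievalsPerSpPerHour: positiveNumber(env, "RETRIEVALS_PER_SP_PER_HOUR"),
    ipniTimeoutMs: positiveNumber(env, "IPNI_VERIFICATION_TIMEOUT_MS"),
    ipniPollingMs: positiveNumber(env, "IPNI_VERIFICATION_POLLING_MS"),
    blockFetchConcurrency: positiveNumber(env, "IPFS_BLOCK_FETCH_CONCURRENCY"),
    randomPieceSizes: sizeList(env, "RANDOM_PIECE_SIZES"),
  };
}

// Assumption: jobs are spread evenly, so the mean gap is one hour divided by the rate.
function meanSecondsBetweenJobs(retrievalsPerSpPerHour: number): number {
  return 3600 / retrievalsPerSpPerHour;
}
```

Under the even-spacing assumption, a rate of 4 retrievals per SP per hour corresponds to one job roughly every 900 seconds.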
See also: docs/environment-variables.md for the full configuration reference.