
Data Storage Check

Dealbot Methodology

This document is the source of truth for how dealbot's Data Storage check works.

Source code links throughout this document point to the current implementation.

For event and metric definitions used by the dashboard, see Dealbot Events & Metrics.


Overview

A "data storage check" is dealbot's end-to-end test of uploading a piece to a storage provider (SP) and confirming the uploaded data is publicly discoverable and retrievable. ("Deal" is a synonym for "data storage check".)

During every data storage check, dealbot:

  1. Generates a random data file.
  2. Converts it to CAR format.
  3. Uploads the CAR to a testable SP as a new piece in one of the dealbot-managed datasets.
  4. Waits for:
     • Onchain confirmation: the SP sends a message adding the piece to the dataset, and dealbot confirms it onchain.
     • IPNI discoverability: the SP indexes the CAR and announces the index to IPNI, and dealbot confirms that IPNI has the index.
  5. Runs retrieval checks as defined in Retrieval Check.

A successful operation requires all assertions in the table below to pass.

Failure occurs if any step fails or the deal exceeds its max allowed time. There are no timing-based quality assertions. Operational timeouts exist to prevent jobs from running indefinitely, but they are not quality assertions.

What Gets Asserted

Each deal asserts the following for every SP:

| # | Assertion | How It's Checked | Sub Status Affected | Retries | Relevant Metric for Setting a Max Duration | Implemented? |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | SP accepts piece upload | Upload completes without error (HTTP 200); piece CID is returned | Upload | 1 | ingestMs | Yes |
| 2 | Piece submission recorded on-chain | Synapse piecesAdded progress event fires with a transaction hash | Onchain | n/a | pieceAddedOnChainMs | Yes |
| 3 | Piece is confirmed on-chain | Synapse piecesConfirmed progress event fires | Onchain | n/a | pieceConfirmedOnChainMs | Yes |
| 4 | SP indexes piece locally | PDP server reports indexed: true | Discoverability | n/a | spIndexLocallyMs | Yes |
| 5 | Content is discoverable on filecoinpin.contact | IPNI index returns a provider record on filecoinpin.contact. Drives the Discoverability sub-status. | Discoverability (indexer=filecoinpin.contact) | Polling with delay until timeout | ipniVerifyMs | Yes |
| 5b | Content is discoverable on cid.contact (observational cross-check) | IPNI index returns a provider record on cid.contact. Only attempted when step 5 succeeds; does not affect deal success/failure. | cid.contact Verification | Polling with delay until timeout | ipniVerifyMs | Yes |
| 6 | Content is retrievable | See Retrieval Check for specific assertions | Retrieval | 0 | ipfsRetrievalLastByteMs | Yes |
| 7 | All checks pass | Deal is not marked successful until all assertions pass within the allowed time | All four | n/a | dataStorageCheckMs | Yes |

Deal Lifecycle

The dealbot scheduler triggers data storage check jobs at a configurable rate.

```mermaid
flowchart TD
  CreateCar --> SelectDataSet["Select a dataset for data storage check"]
  SelectDataSet --> Upload["Upload CAR as piece to SP"]
  Upload --> Chain["Wait for on-chain piece creation confirmation"]
  Upload --> LocalIndex["Wait for SP local indexing"]
  LocalIndex --> IpniAnnouncement["Wait for SP to announce local index to IPNI"]
  IpniAnnouncement --> IpniVerification["IPNI verification"]
  LocalIndex --> IpfsRetrieval["SP /ipfs Retrieval Check"]
  Chain --> CheckResults["Mark data storage check successful if all steps pass"]
  IpniVerification --> CheckResults
  IpfsRetrieval --> CheckResults
```
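The dependency shape in the lifecycle flowchart can be sketched as promise composition: upload gates everything, then the onchain branch and the indexing branch run concurrently, and the indexing branch fans out into IPNI verification and /ipfs retrieval. This is a minimal sketch under stated assumptions, not dealbot's deal.service.ts: every step function in the hypothetical DealSteps interface is an illustrative stand-in, and each is assumed to reject on failure.

```typescript
// Hypothetical step functions; dealbot's real services are structured differently.
interface DealSteps {
  createCar(): Promise<string>; // resolves to some CAR handle
  selectDataSet(): Promise<string>; // resolves to a dataset ID
  upload(car: string, dataSetId: string): Promise<string>; // resolves to the piece CID
  waitForOnchain(pieceCid: string): Promise<void>;
  waitForLocalIndex(pieceCid: string): Promise<void>;
  verifyIpni(pieceCid: string): Promise<void>; // waits for the IPNI announcement, then verifies
  checkIpfsRetrieval(pieceCid: string): Promise<void>;
}

// Mirrors the lifecycle flowchart. Resolves with the piece CID only if every branch succeeds.
async function runDataStorageCheck(steps: DealSteps): Promise<string> {
  const car = await steps.createCar();
  const dataSetId = await steps.selectDataSet();
  const pieceCid = await steps.upload(car, dataSetId); // nothing else starts until upload succeeds

  // Branch 1: onchain confirmation.
  const onchain = steps.waitForOnchain(pieceCid);

  // Branch 2: local indexing, then IPNI verification and /ipfs retrieval in parallel.
  const indexing = steps
    .waitForLocalIndex(pieceCid)
    .then(() => Promise.all([steps.verifyIpni(pieceCid), steps.checkIpfsRetrieval(pieceCid)]));

  // Promise.all rejects as soon as any branch fails.
  await Promise.all([onchain, indexing]);
  return pieceCid;
}
```

Promise.all rejects on the first failure but does not cancel branches that are still running; stopping them would require something like an abort signal threaded through each step.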

1. Generate Random Data

Dealbot generates a random binary file with a unique name and embedded markers (prefix/suffix with timestamp and unique ID).

  • File name format: random-{timestamp}-{uniqueId}.bin
  • Possible sizes: Configurable via RANDOM_PIECE_SIZES (default: 10 MiB)

Source: dataSource.service.ts
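A minimal sketch of this step, assuming a simple ASCII marker layout. The marker text (DEALBOT-START / DEALBOT-END) and the function names are hypothetical; the source only says the prefix and suffix embed a timestamp and a unique ID, so dataSource.service.ts remains the authority on the real layout.

```typescript
// Hypothetical shape of a generated test file.
interface RandomFile {
  name: string;
  data: Uint8Array;
}

// Encodes an ASCII-only string to bytes (the markers are plain ASCII).
function asciiBytes(s: string): Uint8Array {
  const out = new Uint8Array(s.length);
  for (let i = 0; i < s.length; i++) out[i] = s.charCodeAt(i) & 0x7f;
  return out;
}

// Builds a file of exactly `size` bytes: prefix marker, random filler, suffix marker.
// Math.random is not cryptographically strong, which is acceptable for throwaway test payloads.
function generateRandomFile(
  size: number,
  timestamp: number = Date.now(),
  uniqueId: string = Math.random().toString(36).slice(2, 10),
): RandomFile {
  const prefix = asciiBytes(`DEALBOT-START:${timestamp}:${uniqueId}\n`);
  const suffix = asciiBytes(`\nDEALBOT-END:${timestamp}:${uniqueId}`);
  if (size < prefix.length + suffix.length) {
    throw new Error(`size ${size} is too small to hold the markers`);
  }
  const data = new Uint8Array(size);
  data.set(prefix, 0);
  for (let i = prefix.length; i < size - suffix.length; i++) {
    data[i] = Math.floor(Math.random() * 256);
  }
  data.set(suffix, size - suffix.length);
  return { name: `random-${timestamp}-${uniqueId}.bin`, data };
}
```

With the documented default size, a call would look like generateRandomFile(10 * 1024 * 1024).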

2. Convert to CAR Format

The raw data is converted to a CAR (Content Addressable Archive) file through the filecoin-pin integration. See https://github.com/filecoin-project/filecoin-pin/blob/master/documentation/behind-the-scenes-of-adding-a-file.md#create-car for more details.

Source: ipni.strategy.ts (convertToCar)

3. Upload to the SP

  1. Dealbot selects a previously created dataset for this data storage check.
  2. Dealbot uploads the CAR file to the SP, adding a piece to the selected dataset. Callbacks track progress:
     • stored: the SP confirms receipt (HTTP 2xx), and dealbot records the piece CID.

Source: deal.service.ts (createDeal)

4. Wait for Onchain Confirmation

After upload completes, dealbot waits for the piece to be confirmed onchain via Synapse executeUpload(...).onProgress events:

  • piecesAdded: the piece submission is recorded onchain, as reported by the SP (transaction hash available).
  • piecesConfirmed: the piece is confirmed onchain by querying the chain RPC endpoint.

filecoin-pin and synapse-sdk do this work under the hood.
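The progress callbacks from the upload and onchain steps can be consumed by a small recorder that timestamps each event and derives durations. This sketch is hypothetical: the ProgressEvent shape is an assumption, not the Synapse or filecoin-pin callback signature, and measuring pieceAddedOnChainMs and pieceConfirmedOnChainMs from the stored event is also an assumption; Dealbot Events & Metrics holds the actual metric definitions.

```typescript
// Assumed event shape; the real onProgress payloads differ.
type ProgressEvent =
  | { type: "stored"; pieceCid: string }
  | { type: "piecesAdded"; txHash: string }
  | { type: "piecesConfirmed" };

interface ProgressTimes {
  pieceCid?: string;
  txHash?: string;
  storedAt?: number;
  addedAt?: number;
  confirmedAt?: number;
}

interface OnchainDurations {
  pieceAddedOnChainMs?: number;
  pieceConfirmedOnChainMs?: number;
}

// Returns an onProgress-style callback plus the record it fills in.
function createProgressRecorder(now: () => number = Date.now) {
  const times: ProgressTimes = {};
  const onProgress = (event: ProgressEvent): void => {
    switch (event.type) {
      case "stored":
        times.pieceCid = event.pieceCid;
        times.storedAt = now();
        break;
      case "piecesAdded":
        // Assertion 2 requires a transaction hash on this event.
        if (!event.txHash) throw new Error("piecesAdded without a transaction hash");
        times.txHash = event.txHash;
        times.addedAt = now();
        break;
      case "piecesConfirmed":
        times.confirmedAt = now();
        break;
    }
  };
  return { times, onProgress };
}

// Durations in ms, measured from the `stored` event (an assumption of this sketch).
function onchainDurations(t: ProgressTimes): OnchainDurations {
  if (t.storedAt === undefined) return {};
  return {
    pieceAddedOnChainMs: t.addedAt !== undefined ? t.addedAt - t.storedAt : undefined,
    pieceConfirmedOnChainMs: t.confirmedAt !== undefined ? t.confirmedAt - t.storedAt : undefined,
  };
}
```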

5. Wait for SP to Index and Announce Index to IPNI

After upload completes, dealbot polls the SP's PDP server to track the piece through its indexing lifecycle:

  • sp_indexed: the SP has indexed the piece locally. Any CID in the CAR is now retrievable via /ipfs/$CID, but it may not yet be discoverable by the rest of the network. Direct SP retrieval checking can begin.
  • sp_advertised: the SP has announced the piece index to IPNI (in IPNI terminology, an "advertisement announcement"; see docs). IPNI indexing verification can begin.
  • Poll interval: 2.5 seconds (hardcoded as POLLING_INTERVAL_MS in ipni.strategy.ts).
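The polling loop can be sketched with an injected clock and sleep so it runs without real timers. This is a minimal sketch, not monitorPieceStatus: fetchPieceStatus is a hypothetical stand-in for the PDP server request, and the response fields indexed and advertised are assumptions (only indexed: true is named in this document).

```typescript
// Assumed PDP status response.
interface PieceStatus {
  indexed: boolean;
  advertised: boolean;
}

interface PollOptions {
  fetchPieceStatus: () => Promise<PieceStatus>; // hypothetical PDP server call
  now: () => number;
  sleep: (ms: number) => Promise<void>;
  intervalMs?: number; // dealbot hardcodes 2.5 s (POLLING_INTERVAL_MS)
  timeoutMs: number;
}

// Dealbot-side times at which each lifecycle stage was first observed.
interface Observed {
  spIndexedAt?: number;
  spAdvertisedAt?: number;
}

async function pollPieceStatus(opts: PollOptions): Promise<Observed> {
  const { fetchPieceStatus, now, sleep, timeoutMs, intervalMs = 2500 } = opts;
  const start = now();
  const observed: Observed = {};
  while (now() - start <= timeoutMs) {
    const status = await fetchPieceStatus();
    // Assumption: an advertised piece is also indexed, so advertised implies sp_indexed.
    if ((status.indexed || status.advertised) && observed.spIndexedAt === undefined) {
      observed.spIndexedAt = now();
    }
    if (status.advertised) {
      observed.spAdvertisedAt = now();
      return observed;
    }
    await sleep(intervalMs);
  }
  throw new Error(`piece not advertised within ${timeoutMs} ms`);
}
```

Injecting now and sleep keeps the loop deterministic under test; in production these would be Date.now and a timer-based delay.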

When the SP returns indexedAt or advertisedAt, dealbot uses those provider-side timestamps for spIndexLocallyMs, spAnnounceAdvertisementMs, and data-storage ipniVerifyMs. If those fields are absent or unusable, dealbot falls back to the time it observed the status while polling.

Source: ipni.strategy.ts (monitorPieceStatus)
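The timestamp fallback amounts to preferring the provider's clock when it looks plausible. In this sketch, "usable" is an assumption: the provider timestamp must parse as ISO 8601, must not precede the upload start, and must not be more than a hypothetical 60 s ahead of dealbot's own observation. The function name, parameters, and skew constant are illustrative, not dealbot's.

```typescript
// Hypothetical clock-skew allowance for provider timestamps.
const MAX_FUTURE_SKEW_MS = 60000;

// Returns the provider-reported time if it looks plausible, else the time dealbot observed.
function pickStageTime(
  providerTimestamp: string | undefined, // indexedAt or advertisedAt, assumed ISO 8601
  observedAt: number, // when dealbot's polling first saw the stage (ms since epoch)
  uploadStartedAt: number, // lower bound: a stage cannot precede the upload
): { at: number; source: "provider" | "observed" } {
  if (providerTimestamp !== undefined) {
    const parsed = Date.parse(providerTimestamp);
    const plausible =
      !isNaN(parsed) && parsed >= uploadStartedAt && parsed <= observedAt + MAX_FUTURE_SKEW_MS;
    if (plausible) return { at: parsed, source: "provider" };
  }
  return { at: observedAt, source: "observed" };
}
```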

6. Verify IPNI Indexing

After the SP announces the piece index to IPNI, dealbot ensures the uploaded piece can be discovered by others with standard IPFS tooling. It does this in two sequential stages using the waitForIpniProviderResults function from the filecoin-pin library, passing an explicit ipniIndexerUrl for each call:

  1. filecoinpin.contact check: Polls filecoinpin.contact for a valid provider record. This result drives the Discoverability sub-status. If the CID is not confirmed here, the cid.contact check is skipped.
  2. cid.contact check: Only attempted when the filecoinpin.contact check succeeds. Polls cid.contact for the same provider record. The outcome is recorded in cidContactVerification but does not affect the Discoverability sub-status.

Note: running the cid.contact check only after the filecoinpin.contact check succeeds is intentional, because cid.contact caches negative ("not found") results. See "Why do we rely on filecoinpin.contact rather than cid.contact?" in the FAQ for more details.
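The two-stage ordering can be sketched as follows. This is not the filecoin-pin API: verifyOnIndexer is a hypothetical wrapper standing in for one waitForIpniProviderResults call against a single ipniIndexerUrl, assumed to resolve when the provider record is found and to reject otherwise, with an Error named "TimeoutError" on timeout (also an assumption). The indexer URL forms are assumed, and the result values mirror the sub-status tables in this document.

```typescript
type VerificationStatus = "success" | "skipped" | "failure.timedout" | "failure.other";

interface IpniResult {
  discoverability: VerificationStatus; // drives the deal outcome
  cidContactVerification: VerificationStatus; // observational only
}

// Assumed convention: timeouts reject with an Error whose name is "TimeoutError".
function toFailureStatus(err: unknown): VerificationStatus {
  return err instanceof Error && err.name === "TimeoutError" ? "failure.timedout" : "failure.other";
}

async function verifyIpniSequentially(
  verifyOnIndexer: (ipniIndexerUrl: string) => Promise<void>, // hypothetical wrapper
): Promise<IpniResult> {
  try {
    await verifyOnIndexer("https://filecoinpin.contact");
  } catch (err) {
    // Never query cid.contact before filecoinpin.contact confirms the record (negative caching).
    return { discoverability: toFailureStatus(err), cidContactVerification: "skipped" };
  }
  try {
    await verifyOnIndexer("https://cid.contact");
    return { discoverability: "success", cidContactVerification: "success" };
  } catch (err) {
    // A cid.contact failure is recorded but leaves Discoverability at success.
    return { discoverability: "success", cidContactVerification: toFailureStatus(err) };
  }
}
```

The skipped Discoverability case (missing or unparseable rootCID) happens before this stage and is out of scope for the sketch.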

Additional notes:

  • Polling interval: 2 seconds (configurable via IPNI_VERIFICATION_POLLING_MS).
  • The ipniVerifyMs observation with indexer=filecoinpin.contact is measured from the SP's advertisedAt timestamp to the end of filecoinpin.contact verification, when the SP provides a sane timestamp. This attributes the full "announced to visible on filecoinpin.contact" window, not only dealbot's local polling window.
  • The ipniVerifyMs observation with indexer=cid.contact is measured from the completion of filecoinpin.contact verification to the completion of cid.contact verification. This captures the incremental propagation gap between the two indexers.

Source: ipni.strategy.ts (monitorAndVerifyIPNI)
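The two ipniVerifyMs observations start from different points, which is easiest to see as plain timestamp differences. A minimal sketch with hypothetical field names: the source specifies only the start and end of each window, so emitting the cid.contact observation only when that check succeeds is an assumption of this sketch.

```typescript
interface IpniTimes {
  advertisedAt: number; // SP-provided announcement time (or dealbot's fallback), ms since epoch
  filecoinPinVerifiedAt: number; // end of filecoinpin.contact verification
  cidContactVerifiedAt?: number; // end of cid.contact verification, if it succeeded
}

// One observation per indexer label, mirroring ipniVerifyMs{indexer=...}.
function ipniVerifyObservations(t: IpniTimes): Record<string, number> {
  const obs: Record<string, number> = {
    // Full "announced to visible on filecoinpin.contact" window.
    "filecoinpin.contact": t.filecoinPinVerifiedAt - t.advertisedAt,
  };
  if (t.cidContactVerifiedAt !== undefined) {
    // Incremental propagation gap between the two indexers.
    obs["cid.contact"] = t.cidContactVerifiedAt - t.filecoinPinVerifiedAt;
  }
  return obs;
}
```

For example, an SP that advertises at t=1000 ms, becomes visible on filecoinpin.contact at 9000 ms and on cid.contact at 15000 ms yields 8000 ms and 6000 ms respectively.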

7. Retrieve and Verify Content

See Retrieval Check for the specifics of retrieving the content and verifying that the returned bytes match the CID.

Deal Status Progression

A deal's overall status is a function of four sub-statuses: Upload, Onchain, Discoverability, and Retrieval. The deal succeeds only if all four report success; if any one fails, the overall deal is a failure. The flow is sequential at the start, then branches:

  1. Upload must succeed first.
  2. After upload succeeds, Onchain and Discoverability run in parallel (two branches).
  3. Retrieval runs as soon as Discoverability reaches sp_indexed.
  4. cid.contact Verification runs as soon as Discoverability succeeds. It does not affect the overall deal status, but its outcome is recorded in the metrics for cid.contact visibility.

```mermaid
flowchart TD
  U["Upload Status"]
  O["Onchain Status"]
  D["Discoverability Status"]
  CV["cid.contact Verification"]
  R["Retrieval Status"]
  OK["Data Storage Check success"]
  FAIL["Data Storage Check failure"]

  U -->|failure| FAIL
  U -->|success| O
  U -->|success| D
  D -->|sp_indexed| R

  O -->|failure| FAIL
  D -->|failure| FAIL
  R -->|failure| FAIL

  O -->|success| OK
  D -->|success| OK
  D -->|success| CV
  R -->|success| OK
```

Each data storage check also stores an overall status for easy querying:

| Overall Status | Meaning |
| --- | --- |
| pending | Upload Status = pending (i.e., the piece upload to the SP hasn't started). |
| inProgress | Data Storage check is running. |
| success | All sub-statuses are success. |
| failure.timedout | Any sub-status is failure.timedout. |
| failure.other | Any sub-status is failure.other. |
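The overall-status table can be expressed as a pure function of the four sub-statuses. This sketch makes assumptions the table leaves open: cid.contact Verification is excluded because it never affects the outcome, failure.timedout wins when one sub-status timed out and another failed for other reasons, and the skipped Discoverability value is left out because the table does not say how it maps. The type and function names are hypothetical.

```typescript
type Failure = "failure.timedout" | "failure.other";
type UploadStatus = "pending" | "success" | Failure;
type OnchainStatus = "pending" | "success" | Failure;
type DiscoverabilityStatus = "pending" | "sp_indexed" | "sp_announced_advertisement" | "success" | Failure;
type RetrievalStatus = "pending" | "success" | Failure;
type OverallStatus = "pending" | "inProgress" | "success" | Failure;

interface SubStatuses {
  upload: UploadStatus;
  onchain: OnchainStatus;
  discoverability: DiscoverabilityStatus;
  retrieval: RetrievalStatus;
}

function overallStatus(s: SubStatuses): OverallStatus {
  const all: string[] = [s.upload, s.onchain, s.discoverability, s.retrieval];
  // Assumption: a timeout anywhere takes precedence over other failures.
  if (all.indexOf("failure.timedout") !== -1) return "failure.timedout";
  if (all.indexOf("failure.other") !== -1) return "failure.other";
  if (all.every((x) => x === "success")) return "success";
  if (s.upload === "pending") return "pending";
  return "inProgress";
}
```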

Sub-status meanings

| Upload Status | Meaning |
| --- | --- |
| pending | Piece upload to the SP hasn't started. |
| success | SP confirmed receipt of the piece. |
| failure.timedout | Failed to upload within the allotted time. |
| failure.other | Failed to upload for other reasons. |

| Onchain Status | Meaning |
| --- | --- |
| pending | Onchain verification hasn't started yet; it is waiting for a successful upload. |
| success | Piece confirmed on-chain (transaction hash recorded). |
| failure.timedout | Failed to confirm the piece onchain within the allotted time. |
| failure.other | Failed to confirm the piece onchain for other reasons. |

| Discoverability Status | Meaning |
| --- | --- |
| pending | Discoverability verification hasn't started yet; it is waiting for a successful upload. |
| sp_indexed | SP indexed the piece locally. |
| sp_announced_advertisement | SP announced the local index to IPNI so IPNI can pull it from the SP. |
| success | Root CID is discoverable via IPNI, and the SP is listed as a provider in the IPNI response. |
| skipped | IPNI verification was not attempted because rootCID/blockCIDs are absent from deal metadata or rootCID cannot be parsed as a valid CID. |
| failure.timedout | Dealbot failed to confirm the provider record within the allotted time. |
| failure.other | Dealbot failed to confirm the provider record for other reasons. |

| Retrieval Status | Meaning |
| --- | --- |
| pending | Retrieval checking hasn't started yet because Discoverability verification hasn't reached sp_indexed. |
| success | Piece was retrieved and verified with standard IPFS tooling. |
| failure.timedout | Piece wasn't retrieved and verified within the allotted time. |
| failure.other | Piece wasn't retrieved and verified for other reasons. |

| cid.contact Verification Status | Meaning |
| --- | --- |
| success | Root CID is discoverable via cid.contact, and the SP is listed as a provider in the cid.contact response. |
| skipped | cid.contact verification was not attempted because Discoverability Status is skipped or failure.*. |
| failure.timedout | Dealbot started but failed to verify the provider record within the allotted time. |
| failure.other | Dealbot started but failed to confirm the provider record for other reasons. |

Sources:

  • types.ts (DealStatus)
  • types.ts (IpniStatus)

Metrics Recorded

Metric definitions live in Dealbot Events & Metrics.

Configuration

Key environment variables that control deal creation behavior:

| Variable | Description |
| --- | --- |
| RANDOM_PIECE_SIZES | Possible random file sizes in bytes for data-storage checks. See docs/environment-variables.md#random_piece_sizes for defaults and examples. |

Source: apps/backend/src/config/app.config.ts

See also: docs/environment-variables.md for the source-of-truth configuration reference.
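A minimal sketch of how a size list like RANDOM_PIECE_SIZES might be parsed and sampled. The comma-separated format and both function names are assumptions for illustration only; docs/environment-variables.md#random_piece_sizes is the authority on the accepted format. The 10 MiB fallback matches the default stated in step 1.

```typescript
const DEFAULT_PIECE_SIZE = 10 * 1024 * 1024; // 10 MiB, the documented default

// Parses an assumed comma-separated list of byte sizes, e.g. "1048576,10485760".
function parsePieceSizes(raw: string | undefined): number[] {
  if (raw === undefined || raw.trim() === "") return [DEFAULT_PIECE_SIZE];
  return raw.split(",").map((part) => {
    const n = Number(part.trim());
    // Reject non-numeric, fractional, zero, and negative sizes.
    if (!isFinite(n) || Math.floor(n) !== n || n <= 0) {
      throw new Error(`invalid piece size: "${part}"`);
    }
    return n;
  });
}

// Picks one configured size uniformly at random for the next check.
function pickPieceSize(sizes: number[], random: () => number = Math.random): number {
  return sizes[Math.floor(random() * sizes.length)];
}
```

Failing fast on a malformed value surfaces configuration mistakes at startup rather than mid-check.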

FAQ

Why do we rely on filecoinpin.contact rather than cid.contact?

See https://github.com/filecoin-project/filecoin-pin/blob/master/documentation/content-routing-faq.md#why-is-there-filecoinpincontact-and-cidcontact

Implementation History

The items below were previously TBD and are now implemented. Tracking issue: https://github.com/FilOzone/dealbot/issues/280.

| Item | Status |
| --- | --- |
| Inline retrieval verification | Done: deal.service.ts runs testAllRetrievalMethods inline; the deal throws on failure. |
| CID-based content verification | Done: ipfs-block.strategy.ts traverses the DAG and uses createBlock({ bytes, cid, hasher: sha256 }), which throws on hash mismatch (per-block CID integrity). |
| Per-deal max time limit | Done: DEAL_JOB_TIMEOUT_SECONDS triggers an AbortController in jobs.service.ts; on abort, the deal is set to DealStatus.FAILED and failure-status metrics are emitted. |
| Deal gated on all checks | Done: the deal only reaches DealStatus.DEAL_CREATED after upload, onchain, IPNI, and retrieval all succeed. |
| Status model update | Done: DealStatus includes PIECE_CONFIRMED, DEAL_CREATED, and FAILED; IpniStatus includes SP_INDEXED, SP_ADVERTISED, VERIFIED, and FAILED; a RetrievalStatus enum exists. |
| piecesConfirmed progress event tracking | Done: piecesConfirmedTime is recorded, the pieceConfirmedOnChainMs histogram is emitted, and the DealStatus.PIECE_CONFIRMED state exists. |
| IPFS gateway retrieval verification | Done: inline retrieval runs after sp_indexed. |
| filecoin-pin CAR conversion | Done: car-utils.ts uses createCarFromPath from filecoin-pin/core/unixfs; deal.service.ts imports executeUpload from filecoin-pin. |