# CSV Import

Fetch a CSV from a URL, split it into chunks, and process each chunk in a separate step.

`api.csv` lets a workflow fetch a CSV from a public URL, split it into fixed-size chunks in temporary storage, and process each chunk in a separate step. This is the standard pattern for large CSV imports that would otherwise risk execution limits.

## Methods

```js
// Phase 1: fetch and split (call once in the start step)
const meta = await api.csv.importCSV(url, options);

// Phase 2: read one chunk (call in each fan-out step)
const rows = await api.csv.readCSVChunk(key, chunkIndex);

// Phase 3: cleanup (call once after all chunks are processed)
await api.csv.deleteCSV(key);
```

### `api.csv.importCSV(url, options?)`

Fetches the CSV from `url`, parses it, and writes the rows into chunked temporary storage. Returns metadata used to schedule the fan-out steps.

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `url` | `string` | required | Public URL of the CSV file. It must be accessible without auth |
| `options.chunkSize` | `number` | `500` | Rows per chunk. Lower values mean shorter per-step processing time; higher values mean fewer scheduled steps |
| `options.delimiter` | `string` | `','` | Field separator character. Use `'\t'` for TSV files |

**Returns** `{ key, totalRows, chunkCount, headers }`:

| Field | Description |
| --- | --- |
| `key` | Unique import ID. Pass this to every `readCSVChunk` and `deleteCSV` call |
| `totalRows` | Total row count, excluding the header row |
| `chunkCount` | Number of chunks written. Schedule this many fan-out steps |
| `headers` | Column names from the first row |

### `api.csv.readCSVChunk(key, chunkIndex)`

Reads one chunk of a previously imported CSV. Returns an array of row objects (one object per row, keys are column names), or `null` if the chunk does not exist. Always null-guard the result.

### `api.csv.deleteCSV(key)`

Deletes all stored chunks for a CSV import. Call this in the final step after all chunks have been processed to release temporary storage immediately rather than waiting for the automatic 7-day expiry.

## Standard fan-out pattern

```js
export class Workflow {
  async start(data, headers, api) {
    const meta = await api.csv.importCSV('https://example.com/products.csv', { chunkSize: 500 });
    console.log(`Imported ${meta.totalRows} rows across ${meta.chunkCount} chunks`);

    for (let i = 0; i < meta.chunkCount; i++) {
      await api.scheduleNextStep({
        delay: 10 + i * 5,  // stagger chunks to avoid burst API calls
        action: 'processChunk',
        payload: { key: meta.key, chunkIndex: i, totalChunks: meta.chunkCount },
      });
    }
  }

  async processChunk({ key, chunkIndex, totalChunks }, headers, api) {
    const rows = await api.csv.readCSVChunk(key, chunkIndex);
    if (!rows) {
      console.log(`Chunk ${chunkIndex} not found; skipping`);
      return;
    }

    for (const row of rows) {
      console.log(`Row: ${JSON.stringify(row)}`);
      // process each row, for example create or update a Shopify resource
    }

    // Clean up only after all chunk steps have finished.
    const finished = await api.runStore.increment(`csv:${key}:chunksFinished`);
    if (finished === totalChunks) {
      await api.csv.deleteCSV(key);
      await api.runStore.delete(`csv:${key}:chunksFinished`);
    }
  }
}
```

Using `chunkIndex === totalChunks - 1` as the cleanup trigger is not safe in retry-heavy or out-of-order fan-out flows. Count completed chunks instead.

## Using a public Google Sheet as source

Google Sheets can serve as a CSV source without OAuth credentials if the sheet is shared as **Anyone with the link can view**. Convert the share URL to a CSV export URL:

```js
const csvUrl = SHEET_URL.replace(/\/edit.*$/, '/export?format=csv');
// For a specific tab, append: + '&gid=123456789'  (copy gid from the sheet URL bar)
const meta = await api.csv.importCSV(csvUrl, { chunkSize: 500 });
```

## Limitations

| Limitation | Detail |
| --- | --- |
| **Public URL required** | `importCSV` uses a plain `fetch(url)` with no auth. URLs that require login or redirect to an auth page will fail |
| **UTF-8 encoding only** | Other encodings produce garbled text. Google Sheets and Shopify CSV exports are UTF-8 |
| **Header row required** | The first row is always treated as column headers. Header-less CSVs are not supported |
| **Delimiter is not auto-detected** | Defaults to `,`. Pass `{ delimiter: '\t' }` explicitly for TSV files |
| **7-day chunk expiry** | Chunk files are automatically deleted from temporary storage after 7 days. Do not schedule `processChunk` steps more than 7 days after `importCSV` |
| **Waits are limited by chunk TTL** | If you pause between `importCSV` and chunk processing with `api.waitForEvent()`, resume within 7 days or the chunk data may already be gone |
| **Import blocks before processing** | The entire file is fetched and split before `importCSV` returns. There is no way to begin processing chunks while the import is still running |
| **Snapshot at import time** | `importCSV` reads the source once. Later changes to the file are not reflected in already-created chunks |
| **Not available in lifecycle hooks** | `api.csv` is not available inside `onWorkflowComplete` or `onWorkflowError` |