# Pagination Patterns
NoETL supports multiple HTTP pagination patterns for efficiently fetching large datasets from APIs.
Complete, tested pagination playbooks are available in the repository.

## Overview

When fetching paginated data, use the `loop.pagination` block on HTTP steps:
```yaml
- step: fetch_all_data
  tool: http
  method: GET
  endpoint: "https://api.example.com/data"
  params:
    page: 1
  loop:
    pagination:
      type: response_based
      continue_while: "{{ response.data.hasMore }}"
      next_page:
        params:
          page: "{{ (response.data.page | int) + 1 }}"
      merge_strategy: append
      merge_path: data.items
      max_iterations: 100
```
## Pagination Types

### Page-Number Pagination

The most common pattern uses `page` and `pageSize` parameters:
```yaml
- step: fetch_assessments
  tool: http
  method: GET
  endpoint: "https://api.example.com/assessments"
  params:
    page: 1
    pageSize: 10
  loop:
    pagination:
      type: response_based
      continue_while: "{{ response.data.paging.hasMore }}"
      next_page:
        params:
          page: "{{ (response.data.paging.page | int) + 1 }}"
      merge_strategy: append
      merge_path: data.data
      max_iterations: 50
```
Use Case: Most REST APIs, user-facing pagination
API Response Example:
```json
{
  "data": [
    {"id": 1, "name": "Item 1"},
    {"id": 2, "name": "Item 2"}
  ],
  "paging": {
    "page": 1,
    "pageSize": 10,
    "hasMore": true,
    "total": 35
  }
}
```
### Offset-Based Pagination

SQL-style `offset` and `limit` parameters:
```yaml
- step: fetch_users
  tool: http
  method: GET
  endpoint: "https://api.example.com/users"
  params:
    offset: 0
    limit: 10
  loop:
    pagination:
      type: response_based
      continue_while: "{{ response.data.has_more }}"
      next_page:
        params:
          offset: "{{ (response.data.offset | int) + (response.data.limit | int) }}"
      merge_strategy: append
      merge_path: data.data
      max_iterations: 100
```
Use Case: SQL-backed APIs, direct database pagination
API Response Example:
```json
{
  "data": [...],
  "offset": 0,
  "limit": 10,
  "total": 100,
  "has_more": true
}
```
### Cursor-Based Pagination

Opaque cursor tokens for stateless navigation:
```yaml
- step: fetch_events
  tool: http
  method: GET
  endpoint: "https://api.example.com/events"
  params:
    cursor: ""
    limit: 10
  loop:
    pagination:
      type: response_based
      continue_while: "{{ response.data.next_cursor is defined and response.data.next_cursor }}"
      next_page:
        params:
          cursor: "{{ response.data.next_cursor }}"
      merge_strategy: append
      merge_path: data.events
      max_iterations: 100
```
Use Case: GraphQL APIs, cloud services (AWS, GCP), large datasets
API Response Example:
```json
{
  "events": [...],
  "next_cursor": "eyJpZCI6MTAwfQ==",
  "has_more": true
}
```
## Configuration Options

### Pagination Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `type` | string | Yes | Pagination type: `response_based` |
| `continue_while` | string | Yes | Jinja2 condition evaluated against the latest response; pagination continues while it is truthy |
| `next_page` | object | Yes | Request parameters for the next page |
| `merge_strategy` | string | Yes | How to merge results: `append` or `replace` |
| `merge_path` | string | Yes | JSON path to the items to extract from the response (e.g. `data.items`) |
| `max_iterations` | int | No | Safety limit on the number of pages fetched (default: 100) |
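For reference, the `continue_while` expressions used by the three patterns above show the common forms this condition takes:

```yaml
# Boolean flag in the response body (page-number pagination)
continue_while: "{{ response.data.paging.hasMore }}"

# Snake-case flag (offset-based pagination)
continue_while: "{{ response.data.has_more }}"

# Stop when the cursor is missing or empty (cursor-based pagination)
continue_while: "{{ response.data.next_cursor is defined and response.data.next_cursor }}"
```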
### Merge Strategies

`append` accumulates items from all pages:

```yaml
merge_strategy: append
merge_path: data.items
# Result: [page1_items..., page2_items..., page3_items...]
```

`replace` keeps only the last page (useful for summaries):

```yaml
merge_strategy: replace
merge_path: data
# Result: {last_page_data}
```
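As an illustration, a step along the following lines (the endpoint and response fields are hypothetical) could page through a report and keep only the final page, which carries the accumulated totals:

```yaml
- step: fetch_report_summary
  tool: http
  method: GET
  endpoint: "https://api.example.com/report"  # hypothetical endpoint
  params:
    page: 1
  loop:
    pagination:
      type: response_based
      continue_while: "{{ response.data.hasMore }}"
      next_page:
        params:
          page: "{{ (response.data.page | int) + 1 }}"
      merge_strategy: replace  # keep only the last page's body
      merge_path: data
      max_iterations: 100
```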
## Advanced Patterns

### Pagination with Retry

Handle transient failures during pagination:
```yaml
- step: fetch_with_retry
  tool: http
  method: GET
  endpoint: "https://api.example.com/flaky-endpoint"
  params:
    page: 1
  retry:
    max_attempts: 3
    initial_delay: 1.0
    backoff_multiplier: 2.0
    retryable_status_codes: [429, 500, 502, 503, 504]
  loop:
    pagination:
      type: response_based
      continue_while: "{{ response.data.hasMore }}"
      next_page:
        params:
          page: "{{ (response.data.page | int) + 1 }}"
      merge_strategy: append
      merge_path: data.items
```
### Safety Limits

Prevent infinite loops with `max_iterations`:
```yaml
- step: limited_fetch
  tool: http
  method: GET
  endpoint: "https://api.example.com/large-dataset"
  params:
    page: 1
  loop:
    pagination:
      type: response_based
      continue_while: "{{ response.data.hasMore }}"
      next_page:
        params:
          page: "{{ (response.data.page | int) + 1 }}"
      merge_strategy: append
      merge_path: data.items
      max_iterations: 5  # Only fetch the first 5 pages
```
Use Case: Data sampling, cost control, time-constrained jobs
### Loop with Pagination

Process multiple endpoints, each with its own pagination:
```yaml
workflow:
  - step: start
    next:
      - step: fetch_endpoint_data

  - step: fetch_endpoint_data
    tool:
      kind: http
      method: GET
      url: "{{ current_endpoint.url }}"
      params:
        page: 1
    loop:
      in: "{{ workload.api_endpoints }}"
      iterator: current_endpoint
      mode: sequential
      pagination:
        type: response_based
        continue_while: "{{ response.data.hasMore }}"
        next_page:
          params:
            page: "{{ (response.data.page | int) + 1 }}"
        merge_strategy: append
        merge_path: data.items
    vars:
      fetched_items: "{{ result.data }}"
    next:
      - step: process_data

  - step: process_data
    tool:
      kind: python
      libs: {}
      args:
        items: "{{ vars.fetched_items }}"
        endpoint_name: "{{ current_endpoint.name }}"
      code: |
        result = {
            "endpoint": endpoint_name,
            "item_count": len(items),
            "status": "complete"
        }
    next:
      - step: end

  - step: end
```
## Complete Example

Fetch all users and save them to PostgreSQL:
```yaml
apiVersion: noetl.io/v2
kind: Playbook
metadata:
  name: paginated_data_sync
  path: examples/pagination/sync_users
workload:
  api_base_url: "https://api.example.com"
workflow:
  - step: start
    next:
      - step: fetch_all_users

  - step: fetch_all_users
    tool: http
    method: GET
    endpoint: "{{ workload.api_base_url }}/users"
    params:
      page: 1
      per_page: 100
    loop:
      pagination:
        type: response_based
        continue_while: "{{ response.data.meta.has_more }}"
        next_page:
          params:
            page: "{{ (response.data.meta.page | int) + 1 }}"
        merge_strategy: append
        merge_path: data.users
        max_iterations: 100
    vars:
      all_users: "{{ result.data }}"
    next:
      - step: save_to_database

  - step: save_to_database
    tool: postgres
    auth:
      type: postgres
      credential: app_database
    query: |
      INSERT INTO users (id, email, name, created_at)
      SELECT
        (u->>'id')::int,
        u->>'email',
        u->>'name',
        (u->>'created_at')::timestamp
      FROM jsonb_array_elements('{{ vars.all_users | tojson }}'::jsonb) u
      ON CONFLICT (id) DO UPDATE SET
        email = EXCLUDED.email,
        name = EXCLUDED.name
    vars:
      rows_affected: "{{ result.data.command_1[0].count }}"
    next:
      - step: log_complete

  - step: log_complete
    tool: python
    code: |
      def main(user_count, rows_affected):
          return {
              "status": "complete",
              "users_fetched": user_count,
              "rows_affected": rows_affected
          }
    args:
      user_count: "{{ vars.all_users | length }}"
      rows_affected: "{{ vars.rows_affected }}"
    next:
      - step: end

  - step: end
```
## Response Format

After pagination completes, the step result contains the merged data:
```json
{
  "id": "task-uuid",
  "status": "success",
  "data": [
    // All items from all pages merged together
    {"id": 1, "name": "User 1"},
    {"id": 2, "name": "User 2"},
    // ... up to max_iterations * page_size items
  ]
}
```
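Downstream steps typically capture the merged array through `vars` on the paginated step, as the complete example above does; for instance (variable names are illustrative):

```yaml
vars:
  all_items: "{{ result.data }}"            # the merged array
  item_count: "{{ result.data | length }}"  # total item count
```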
## Best Practices

- Always set `max_iterations`: prevent runaway pagination
- Use appropriate page sizes: balance the number of requests against payload size
- Handle rate limits: add retry configuration for 429 responses (see the sketch after this list)
- Log progress: use Python steps to track pagination progress
- Test with small limits first: validate the logic before a full fetch
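A minimal rate-limit sketch, combining the retry block from Pagination with Retry above with the pagination settings already shown (the endpoint and tuning values are illustrative):

```yaml
- step: polite_fetch
  tool: http
  method: GET
  endpoint: "https://api.example.com/data"  # hypothetical endpoint
  params:
    page: 1
  retry:
    max_attempts: 5
    initial_delay: 2.0
    backoff_multiplier: 2.0
    retryable_status_codes: [429]  # back off when rate limited
  loop:
    pagination:
      type: response_based
      continue_while: "{{ response.data.hasMore }}"
      next_page:
        params:
          page: "{{ (response.data.page | int) + 1 }}"
      merge_strategy: append
      merge_path: data.items
      max_iterations: 100
```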