HTTP Pagination
NoETL provides automatic pagination support for HTTP actions, allowing you to fetch all pages of data with declarative configuration.
Overview
Many REST APIs return data in pages to limit response sizes. NoETL's pagination feature automatically:
- Makes sequential HTTP requests while data remains
- Merges results across pages
- Handles retry for failed requests
- Prevents infinite loops with safety limits
Quick Start
- step: fetch_all_data
  tool: http
  url: "{{ api_url }}/data"
  params:
    page: 1
    pageSize: 100
  loop:
    pagination:
      type: response_based
      continue_while: "{{ response.data.paging.hasMore == true }}"
      next_page:
        params:
          page: "{{ (response.data.paging.page | int) + 1 }}"
      merge_strategy: append
      merge_path: data.data
This configuration:
- Starts with page=1
- Fetches data from the API
- Checks whether response.data.paging.hasMore is true (note: response.data accesses the API response)
- If true, increments the page number and repeats
- Merges all data arrays into a single result (using data.data to extract from the wrapper, then the API structure)
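The steps above can be sketched in plain Python. This is only an illustration of the control flow, not NoETL's implementation: fake_api and paginate are hypothetical names, and fake_api stands in for the HTTP executor by returning the wrapped {id, status, data} shape described in this document.

```python
from typing import Any, Callable, Dict, List


def fake_api(params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for the HTTP executor (not a NoETL API)."""
    total = 250  # pretend the API holds 250 records
    page, size = int(params["page"]), int(params["pageSize"])
    start = (page - 1) * size
    items = [{"id": i} for i in range(start + 1, min(start + size, total) + 1)]
    api_response = {
        "data": items,
        "paging": {"hasMore": start + size < total, "page": page, "pageSize": size},
    }
    # Wrapper shape described in this document: {id, status, data: <api_response>}
    return {"id": "task-1", "status": "success", "data": api_response}


def paginate(
    fetch: Callable[[Dict[str, Any]], Dict[str, Any]],
    params: Dict[str, Any],
    max_iterations: int = 1000,
) -> List[Dict[str, Any]]:
    """Fetch pages until hasMore is false or the safety limit is reached."""
    accumulated: List[Dict[str, Any]] = []
    params = dict(params)  # do not mutate the caller's dict
    for _ in range(max_iterations):
        response = fetch(params)
        # merge_strategy: append with merge_path: data.data
        accumulated.extend(response["data"]["data"])
        # continue_while: response.data.paging.hasMore == true
        if response["data"]["paging"]["hasMore"] is not True:
            break
        # next_page.params.page: (response.data.paging.page | int) + 1
        params["page"] = int(response["data"]["paging"]["page"]) + 1
    return accumulated


records = paginate(fake_api, {"page": 1, "pageSize": 100})
print(len(records))
```

With the fake API above, three requests are made (pages 1 to 3) before hasMore turns false. Lowering max_iterations stops the loop early and returns only what was collected so far.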
Pagination Patterns
Page Number Pagination
Most common pattern using page numbers:
- step: fetch_assessments
  tool: http
  url: "{{ api_url }}/assessments"
  params:
    page: 1
    pageSize: 100
  loop:
    pagination:
      type: response_based
      continue_while: "{{ response.data.paging.hasMore == true }}"
      next_page:
        params:
          page: "{{ (response.data.paging.page | int) + 1 }}"
          pageSize: "{{ response.data.paging.pageSize }}"
      merge_strategy: append
      merge_path: data.data
      max_iterations: 50
API Response Format:
{
  "data": [{"id": 1}, {"id": 2}, ...],
  "paging": {
    "hasMore": true,
    "page": 1,
    "pageSize": 100,
    "total": 350
  }
}
Note: The HTTP executor wraps responses as {id, status, data: <api_response>}, which is why we use:
- response.data.paging.hasMore (not response.paging.hasMore)
- merge_path: data.data (not merge_path: data)
Offset-Based Pagination
Uses offset and limit parameters:
- step: fetch_users
  tool: http
  url: "{{ api_url }}/users"
  params:
    offset: 0
    limit: 100
  loop:
    pagination:
      type: response_based
      continue_while: "{{ response.data.has_more == true }}"
      next_page:
        params:
          offset: "{{ (response.data.offset | int) + (response.data.limit | int) }}"
          limit: "{{ response.data.limit }}"
      merge_strategy: append
      merge_path: data.users
API Response Format:
{
  "users": [{"id": 1}, {"id": 2}, ...],
  "has_more": true,
  "offset": 0,
  "limit": 100,
  "total": 850
}
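To make the offset arithmetic concrete, the following sketch mirrors the next_page.params expressions in Python and lists the offsets that would be requested for the sample response (total 850, limit 100). The function names are hypothetical, and the has_more rule (offset + limit < total) is an assumption about how such an API might behave.

```python
from typing import Any, Dict, List


def next_offset_params(api_response: Dict[str, Any]) -> Dict[str, Any]:
    """Mirror of next_page.params: offset = offset + limit, limit unchanged."""
    return {
        "offset": int(api_response["offset"]) + int(api_response["limit"]),
        "limit": api_response["limit"],
    }


def offsets_visited(total: int, limit: int) -> List[int]:
    """Simulate which offsets are requested for a collection of `total` items."""
    offsets: List[int] = []
    offset = 0
    while True:
        offsets.append(offset)
        # Assumed server rule: more data exists if this page did not reach the end
        has_more = offset + limit < total
        api_response = {"has_more": has_more, "offset": offset, "limit": limit}
        if not has_more:
            break
        offset = next_offset_params(api_response)["offset"]
    return offsets


print(offsets_visited(850, 100))
```

For 850 items and a limit of 100 this yields nine requests, the last one at offset 800.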
Cursor-Based Pagination
Uses opaque continuation tokens:
- step: fetch_events
  tool: http
  url: "{{ api_url }}/events"
  params:
    limit: 100
  loop:
    pagination:
      type: response_based
      continue_while: "{{ response.data.next_cursor is not none and response.data.next_cursor != '' }}"
      next_page:
        params:
          cursor: "{{ response.data.next_cursor }}"
          limit: "{{ response.data.limit }}"
      merge_strategy: append
      merge_path: data.events
API Response Format:
{
  "events": [{"id": 1}, {"id": 2}, ...],
  "next_cursor": "eyJpZCI6MTAwfQ==",
  "limit": 100
}
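The sketch below illustrates the cursor flow end to end with a hypothetical in-memory server. The client treats the cursor as opaque and only passes it back. The server side is an assumption made for illustration: it encodes the last returned id as base64 JSON, which happens to reproduce the sample cursor shown above. Real APIs may encode cursors in any way.

```python
import base64
import json
from typing import Any, Dict, List, Tuple

EVENTS = [{"id": i} for i in range(1, 251)]  # hypothetical data set


def encode_cursor(last_id: int) -> str:
    """Assumed server-side encoding: base64 of compact JSON {"id": last_id}."""
    raw = json.dumps({"id": last_id}, separators=(",", ":")).encode()
    return base64.b64encode(raw).decode()


def fake_events_api(params: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for the HTTP executor calling /events."""
    limit = int(params["limit"])
    last_id = 0
    if params.get("cursor"):
        last_id = json.loads(base64.b64decode(params["cursor"]))["id"]
    page = [e for e in EVENTS if e["id"] > last_id][:limit]
    more = bool(page) and page[-1]["id"] < EVENTS[-1]["id"]
    api_response = {
        "events": page,
        "next_cursor": encode_cursor(page[-1]["id"]) if more else None,
        "limit": limit,
    }
    return {"id": "task-1", "status": "success", "data": api_response}


def fetch_all_events(limit: int = 100, max_iterations: int = 1000) -> Tuple[List[dict], List[str]]:
    """Follow next_cursor until it is None or empty; return events and cursors seen."""
    params: Dict[str, Any] = {"limit": limit}
    events: List[dict] = []
    cursors: List[str] = []
    for _ in range(max_iterations):
        body = fake_events_api(params)["data"]
        events.extend(body["events"])  # merge_path: data.events, append
        cursor = body["next_cursor"]
        if cursor is None or cursor == "":  # continue_while condition is false
            break
        cursors.append(cursor)
        params = {"cursor": cursor, "limit": body["limit"]}
    return events, cursors


events, cursors = fetch_all_events()
print(len(events), cursors)
```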
Configuration Reference
loop.pagination Block
type (string, required)
Pagination type identifier. Currently supported:
- response_based - Continuation based on response inspection
continue_while (expression, required)
Jinja2 expression evaluated after each request to determine if pagination should continue.
Available Variables:
- {{ response }} - Full HTTP executor result with structure {id, status, data: <api_response>}
- {{ iteration }} - Current iteration number (0-based)
- {{ accumulated }} - Merged results so far
Important: HTTP responses are wrapped by the executor. Access API response fields via response.data.*:
Examples:
# Boolean flag - note response.data.paging
continue_while: "{{ response.data.paging.hasMore == true }}"
# Next cursor exists - note response.data.next_cursor
continue_while: "{{ response.data.next_cursor is not none }}"
# Combined conditions
continue_while: "{{ response.data.has_more and iteration < 100 }}"
next_page (object, required)
Defines how to update the HTTP request for the next page.
Subfields:
next_page.params (object, optional)
Query parameters to update/add for next request. Access API response via response.data.*:
next_page:
  params:
    page: "{{ (response.data.paging.page | int) + 1 }}"
    cursor: "{{ response.data.next_page_token }}"
next_page.body (object, optional)
Request body modifications for POST/PUT requests. Access API response via response.data.*:
next_page:
  body:
    offset: "{{ (response.data.offset | int) + (response.data.limit | int) }}"
next_page.headers (object, optional)
Header modifications (useful for cursor tokens). Access API response via response.data.*:
next_page:
  headers:
    X-Continuation-Token: "{{ response.data.continuation_token }}"
merge_strategy (string, required)
How to combine results across pages:
- append - Concatenate arrays at merge_path
    # Page 1: [1, 2, 3]
    # Page 2: [4, 5, 6]
    # Result: [1, 2, 3, 4, 5, 6]
- extend - Flatten nested arrays
    # Page 1: [[1, 2], [3, 4]]
    # Page 2: [[5, 6], [7, 8]]
    # Result: [1, 2, 3, 4, 5, 6, 7, 8]
- replace - Keep only last response
    # Page 1: {"data": [1, 2, 3]}
    # Page 2: {"data": [4, 5, 6]}
    # Result: {"data": [4, 5, 6]}
- collect - Store all responses as array
    # Page 1: {"data": [1, 2], "page": 1}
    # Page 2: {"data": [3, 4], "page": 2}
    # Result: [
    #   {"data": [1, 2], "page": 1},
    #   {"data": [3, 4], "page": 2}
    # ]
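The four strategies can be expressed as a small Python function. This is a sketch of the semantics as described in the examples above, not NoETL's code: merge_pages is a hypothetical name, and it receives the per-page values that were already extracted at merge_path.

```python
from typing import Any, List


def merge_pages(pages: List[Any], strategy: str) -> Any:
    """Combine per-page values according to the described merge strategies."""
    if strategy == "append":
        # Concatenate the arrays from each page
        result: List[Any] = []
        for page in pages:
            result.extend(page)
        return result
    if strategy == "extend":
        # Each page is an array of arrays; flatten one level while concatenating
        result = []
        for page in pages:
            for inner in page:
                result.extend(inner)
        return result
    if strategy == "replace":
        # Keep only the last page's value
        return pages[-1] if pages else None
    if strategy == "collect":
        # Keep every page's value as one list entry
        return list(pages)
    raise ValueError(f"unknown merge_strategy: {strategy}")


print(merge_pages([[1, 2, 3], [4, 5, 6]], "append"))
print(merge_pages([[[1, 2], [3, 4]], [[5, 6], [7, 8]]], "extend"))
```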
merge_path (string, optional)
Dot-notation path to the data array in the response. The path navigates through the HTTP executor wrapper first and then through the API response structure.
Important: HTTP responses are wrapped as {id, status, data: <api_response>}, so paths must account for this:
merge_path: "data.data" # response.data['data'] - API response has "data" field
merge_path: "data.result.items" # response.data['result']['items']
merge_path: "data.users" # response.data['users']
If omitted, merges entire response including wrapper.
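A minimal sketch of how a dot-notation path could be resolved against the wrapped response is shown below. extract_path is a hypothetical helper, and the error text is only modelled on the "Failed to extract merge_path" message mentioned under Troubleshooting. It assumes every path segment is a dictionary key.

```python
from typing import Any, Dict, Optional


def extract_path(wrapped: Dict[str, Any], path: Optional[str] = None) -> Any:
    """Walk a dot-notation path through the executor wrapper and API response."""
    if not path:
        # No merge_path: the whole wrapped response is used
        return wrapped
    current: Any = wrapped
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            raise KeyError(f"Failed to extract merge_path '{path}' at '{key}'")
        current = current[key]
    return current


wrapped = {
    "id": "uuid-task-id",
    "status": "success",
    "data": {"data": [{"id": 1}, {"id": 2}], "result": {"items": [10, 20]}},
}

print(extract_path(wrapped, "data.data"))
print(extract_path(wrapped, "data.result.items"))
```

Note that extract_path(wrapped, "data") returns the whole API response object rather than the array, which is the usual cause of the data versus data.data confusion.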
max_iterations (integer, optional)
Safety limit to prevent infinite loops. Default: 1000
max_iterations: 50 # Stop after 50 pages
loop.pagination.retry Block (optional)
Retry configuration applied to each HTTP request:
max_attempts (integer, optional)
Number of retry attempts per request. Default: 1 (no retry)
retry:
  max_attempts: 3  # Try up to 3 times
backoff (string, optional)
Backoff strategy: fixed or exponential. Default: fixed
retry:
  backoff: exponential  # 1s, 2s, 4s, 8s...
initial_delay (number, optional)
Seconds to wait before first retry. Default: 1
retry:
  initial_delay: 0.5  # Wait 500ms
max_delay (number, optional)
Maximum backoff delay in seconds. Default: 60
retry:
  max_delay: 10  # Cap at 10 seconds
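The way these four settings interact can be sketched as a delay schedule. The function below is an illustration under stated assumptions, not the executor's code: retry_delays is a hypothetical name, max_attempts is assumed to count the first try (so there are max_attempts - 1 waits), exponential backoff is assumed to double each time as in the comment above, and fixed backoff is assumed to reuse initial_delay.

```python
from typing import List


def retry_delays(
    max_attempts: int = 1,
    backoff: str = "fixed",
    initial_delay: float = 1.0,
    max_delay: float = 60.0,
) -> List[float]:
    """Return the wait (in seconds) before each retry, capped at max_delay."""
    delays: List[float] = []
    for retry_index in range(max_attempts - 1):
        if backoff == "exponential":
            delay = initial_delay * (2 ** retry_index)  # 1x, 2x, 4x, ...
        elif backoff == "fixed":
            delay = initial_delay
        else:
            raise ValueError(f"unknown backoff: {backoff}")
        delays.append(min(delay, max_delay))
    return delays


print(retry_delays(max_attempts=6, backoff="exponential", initial_delay=1, max_delay=10))
```

Under these assumptions, the defaults produce no waits at all, since a single attempt means no retry.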
Complete Example
apiVersion: noetl.io/v1
kind: Playbook
metadata:
  name: paginated_data_fetch
  path: examples/pagination
workload:
  api_url: https://api.example.com
  api_key: "{{ secret.API_KEY }}"
workflow:
  - step: start
    next:
      - step: fetch_all_records
  - step: fetch_all_records
    desc: Fetch all records with automatic pagination
    tool: http
    url: "{{ workload.api_url }}/v1/records"
    method: GET
    headers:
      Authorization: "Bearer {{ workload.api_key }}"
    params:
      page: 1
      pageSize: 100
      sortBy: created_at
    loop:
      pagination:
        type: response_based
        continue_while: "{{ response.data.pagination.hasMore == true }}"
        next_page:
          params:
            page: "{{ (response.data.pagination.page | int) + 1 }}"
            pageSize: "{{ response.data.pagination.pageSize }}"
        merge_strategy: append
        merge_path: data.data
        max_iterations: 100
        retry:
          max_attempts: 3
          backoff: exponential
          initial_delay: 1
          max_delay: 30
    next:
      - step: save_to_database
  - step: save_to_database
    desc: Save all fetched records
    tool: postgres
    auth:
      type: postgres
      credential: prod_db
    command: |
      INSERT INTO records (id, name, value, created_at)
      SELECT
        (data->>'id')::int,
        data->>'name',
        data->>'value',
        (data->>'created_at')::timestamp
      FROM jsonb_array_elements('{{ fetch_all_records | tojson }}'::jsonb) AS data
      ON CONFLICT (id) DO UPDATE SET
        name = EXCLUDED.name,
        value = EXCLUDED.value,
        created_at = EXCLUDED.created_at
    next:
      - step: end
  - step: end
    desc: Workflow complete
Best Practices
1. Always Set max_iterations
Prevent infinite loops from API bugs:
pagination:
  max_iterations: 100  # Reasonable limit
2. Use Retry for Production
APIs can fail transiently; retrying failed requests improves reliability:
pagination:
  retry:
    max_attempts: 3
    backoff: exponential
3. Choose Appropriate Page Sizes
Balance between performance and memory:
params:
  pageSize: 100  # Not too large, not too small
4. Validate Merged Results
Add validation step after pagination:
- step: validate_data
  tool: python
  code: |
    def main(input_data):
        data = input_data['fetch_all_records']
        assert len(data) > 0, "No data fetched"
        assert len(data) < 10000, "Suspiciously large dataset"
        return {'count': len(data)}
5. Handle Empty Results
Check for data existence, accounting for HTTP wrapper:
continue_while: "{{ response.data.data is defined and response.data.data | length > 0 and response.data.hasMore }}"
HTTP Response Structure
Critical: The HTTP executor wraps all responses with metadata before passing them to pagination logic:
# HTTP Executor Output
{
  "id": "uuid-task-id",
  "status": "success",
  "data": {
    # Your actual API response here
    "data": [...],
    "paging": {...}
  }
}
This means:
- Access API fields via response.data.* in templates
- Use merge_path: data.data to extract arrays (the first data is the executor wrapper, the second is the API field)
- Write templates like {{ response.data.paging.hasMore }} (not {{ response.paging.hasMore }})
Example mapping:
| API Response Field | Template Path |
|---|---|
| response.data | response.data.data |
| response.paging.hasMore | response.data.paging.hasMore |
| response.users | response.data.users |
| response.next_cursor | response.data.next_cursor |
Troubleshooting
Pagination Stops Early
Problem: Fetching fewer items than expected
Solutions:
- Check the continue_while expression logic; remember to use response.data.*
- Verify the API response structure matches expectations
- Verify merge_path correctly navigates through the HTTP wrapper (e.g., data.data, not just data)
- Check logs for evaluation errors
Infinite Loop
Problem: Pagination never stops
Solutions:
- Ensure continue_while eventually becomes false
- Set a reasonable max_iterations limit
- Check if the API hasMore flag is accurate
- Verify you're accessing the correct path: response.data.hasMore, not response.hasMore
Merge Errors
Problem: "Failed to extract merge_path"
Solutions:
- Remember HTTP responses are wrapped: use data.fieldName, not just fieldName
- Verify merge_path matches the response structure
- Use correct dot notation (data.items, not data/items)
- Check if the path exists in all responses
- Common issue: using merge_path: data when you need merge_path: data.data
Retry Not Working
Problem: Failures not retried
Solutions:
- Ensure the retry block is inside the pagination block
- Check that max_attempts > 1
- Verify HTTP errors are retry-able (5xx, timeouts)
Implementation Details
Async HTTP Execution
The pagination feature uses asynchronous HTTP calls for better performance:
- Each HTTP request is executed with asyncio for non-blocking I/O
- Worker threads create dedicated event loops using asyncio.new_event_loop()
- Retry delays use await asyncio.sleep() for efficient waiting
- Multiple pagination steps can run concurrently in different workers
Performance Benefits:
- Non-blocking HTTP calls during retry delays
- Better resource utilization in high-throughput scenarios
- Scalable to multiple concurrent pagination workflows
Thread Safety
The pagination executor manages event loops carefully:
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
try:
    return loop.run_until_complete(async_function())
finally:
    loop.close()
    asyncio.set_event_loop(None)
This ensures each worker thread has its own event loop, preventing conflicts in multi-threaded environments.
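The fragment above can be exercised in a self-contained script. The sketch below assumes nothing about NoETL's worker code: fetch_page and run_in_worker are hypothetical names, and asyncio.sleep stands in for a real HTTP call. It shows three threads each creating, using, and closing their own event loop.

```python
import asyncio
import threading
from typing import Dict


async def fetch_page(page: int) -> Dict[str, int]:
    """Stand-in for an async HTTP request (hypothetical)."""
    await asyncio.sleep(0.01)
    return {"page": page}


def run_in_worker(page: int, results: Dict[int, Dict[str, int]]) -> None:
    """Run one coroutine to completion on a loop owned by this thread."""
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        results[page] = loop.run_until_complete(fetch_page(page))
    finally:
        # Always release the loop so the thread leaves no loop state behind
        loop.close()
        asyncio.set_event_loop(None)


results: Dict[int, Dict[str, int]] = {}
threads = [threading.Thread(target=run_in_worker, args=(p, results)) for p in (1, 2, 3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))
```

Because each thread writes to a distinct key, the shared results dictionary needs no extra locking in this example.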
See Also
- Variables Feature - Using pagination results in downstream steps
- HTTP Tool Reference - HTTP action configuration
- Iterator Feature - Loop over collections vs pagination