Iceberg API Reference
The Iceberg backend implements the VersioningBackend protocol. Install with pip install portolan-cli[iceberg].
Loading the Backend
from portolan_cli.backends import get_backend
backend = get_backend("iceberg")
IcebergBackend
Enterprise versioning backend using Apache Iceberg.
Implements the VersioningBackend protocol from portolan-cli. Discovered via get_backend("iceberg") when the [iceberg] extra is installed.
Data is stored natively in Iceberg tables (copy-on-write). Version metadata is stored in snapshot summary properties.
get_current_version(collection)
Get the current (latest) version of a collection.
list_versions(collection)
List all versions of a collection, oldest first.
publish(collection, assets, schema, breaking, message, removed=None, version=None)
Publish a new version of a collection.
Reads actual Parquet data from asset files and writes it into the Iceberg table. Version metadata is stored in snapshot properties.
rollback(collection, target_version)
Roll back to a previous version.
Uses Iceberg's native snapshot management to set the current snapshot pointer back to the target version. No data is copied, so the operation is effectively instant.
prune(collection, keep, dry_run)
Remove old versions, keeping the N most recent.
get_stac_metadata(collection)
Generate combined STAC metadata for a collection.
Returns a dict with table: (Layer 1) and iceberg: (Layer 2) fields. This method is an Iceberg-specific extension and is not part of the VersioningBackend protocol.
check_drift(collection)
Check for drift between local and remote state.
on_post_add(context)
Post-add hook: update STAC extensions and upload metadata to remote.
Called by portolan-cli's finalize_datasets() after versioning completes. Receives batch context with all items in the collection.
pull(remote_url, local_root, collection, *, dry_run=False)
Pull files from remote using Iceberg version info.
Queries get_current_version() for asset info, then downloads each asset from {remote_url}/{href} to {local_root}/{href}.
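The href mapping described above can be sketched as a small planning function. This is an illustration only: `plan_downloads` is a hypothetical helper, not part of portolan-cli, and stripping a trailing slash from the remote URL is an assumption rather than documented behaviour.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def plan_downloads(
    remote_url: str, local_root: str, hrefs: Iterable[str]
) -> List[Tuple[str, Path]]:
    """Pair each asset href with its remote source and local destination.

    Hypothetical helper. Assumes hrefs are relative paths; an absolute href
    would escape local_root when joined with pathlib.
    """
    base = remote_url.rstrip("/")  # assumption: avoid a double slash
    root = Path(local_root)
    # {remote_url}/{href} -> {local_root}/{href}
    return [(f"{base}/{href}", root / href) for href in hrefs]


plan = plan_downloads(
    "s3://bucket/catalog/", "/tmp/catalog", ["demographics/data.parquet"]
)
```

Each tuple holds the source URL and the destination path; the actual download step is left out because the source does not describe how it is performed.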
supports_push()
The Iceberg backend does not support push, because add already uploads.
push_blocked_message(remote)
Return a human-readable message explaining why push is blocked.
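A caller might combine the two push-related methods into a guard before attempting a push. The sketch below assumes that `supports_push()` returns a bool and that `push_blocked_message(remote)` returns a string; `ensure_push_allowed` is a hypothetical helper, not part of portolan-cli.

```python
from typing import Any


def ensure_push_allowed(backend: Any, remote: str) -> None:
    """Raise with the backend's own explanation if push is unsupported.

    Hypothetical helper. `backend` is any object exposing supports_push()
    and push_blocked_message(remote) as listed above.
    """
    if not backend.supports_push():
        # Assumption: the message is a plain string suitable for display.
        raise RuntimeError(backend.push_blocked_message(remote))
```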
Methods
get_current_version
Get the current (latest) version of a collection.
version = backend.get_current_version("demographics")
print(version.version) # "2.1.0"
print(version.breaking) # False
print(version.message) # "Updated population data"
Raises: FileNotFoundError if the collection has no versions.
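Callers that treat "not yet published" as a normal state may want to wrap the call. A minimal sketch, relying only on the `get_current_version` call and the exception documented above; `latest_version_or_none` is a hypothetical helper, not part of portolan-cli.

```python
from typing import Any, Optional


def latest_version_or_none(backend: Any, collection: str) -> Optional[Any]:
    """Return the current version of `collection`, or None if it has none.

    Hypothetical helper. `backend` is any object exposing
    get_current_version(collection) as documented above.
    """
    try:
        return backend.get_current_version(collection)
    except FileNotFoundError:
        # Documented behaviour: raised when the collection has no versions.
        return None
```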
list_versions
List all versions of a collection, ordered oldest to newest.
versions = backend.list_versions("demographics")
for v in versions:
    print(f"{v.version} ({v.created}): {v.message}")
Returns: list[Version]; an empty list if the collection doesn't exist.
publish
Publish a new version. Creates the Iceberg table on first publish.
version = backend.publish(
    collection="demographics",
    assets={"data.parquet": "/path/to/data.parquet"},
    schema={"columns": ["id", "geom"], "types": {"id": "int64"}, "hash": "abc123"},
    breaking=False,
    message="Updated population estimates",
)
print(version.version) # "1.1.0" (minor bump)
Versioning rules:
| Scenario | Version |
|---|---|
| First version | 1.0.0 |
| Non-breaking change | Minor bump (1.0.0 -> 1.1.0) |
| Breaking change | Major bump (1.2.3 -> 2.0.0) |
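The rules in the table can be sketched as a pure function. This is an illustration of the table, not the backend's actual implementation: `next_version` is a hypothetical name, and resetting the patch component on a minor bump is an assumption the table does not confirm.

```python
from typing import Optional


def next_version(current: Optional[str], breaking: bool) -> str:
    """Compute the next version label from the rules in the table above.

    Hypothetical helper; `current` is None when nothing has been published.
    """
    if current is None:
        return "1.0.0"  # first version
    major, minor, _patch = (int(part) for part in current.split("."))
    if breaking:
        return f"{major + 1}.0.0"  # major bump, e.g. 1.2.3 -> 2.0.0
    # Assumption: the patch component resets to 0 on a minor bump.
    return f"{major}.{minor + 1}.0"


print(next_version(None, breaking=False))     # 1.0.0
print(next_version("1.0.0", breaking=False))  # 1.1.0
print(next_version("1.2.3", breaking=True))   # 2.0.0
```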
rollback
Roll back to a previous version. Uses Iceberg's native snapshot management, so the operation is effectively instant and copies no data.
rolled = backend.rollback("demographics", "1.0.0")
print(rolled.version) # "1.0.0"
Raises: ValueError if the target version doesn't exist.
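To illustrate why a pointer-based rollback needs no data copy, here is a toy in-memory model. It is neither Iceberg nor portolan-cli code: `ToyTable` and its methods are hypothetical, and the real backend delegates this work to Iceberg's snapshot management.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyTable:
    """Toy model of a snapshot-versioned table (not Iceberg, not portolan-cli)."""

    # version label -> data files referenced by that snapshot
    snapshots: Dict[str, List[str]] = field(default_factory=dict)
    current: str = ""

    def commit(self, version: str, files: List[str]) -> None:
        """Record a new snapshot and make it current."""
        self.snapshots[version] = list(files)
        self.current = version

    def rollback(self, target: str) -> str:
        """Point `current` at an existing snapshot without touching any files."""
        if target not in self.snapshots:
            raise ValueError(f"unknown version: {target}")
        self.current = target
        return self.current


table = ToyTable()
table.commit("1.0.0", ["part-0.parquet"])
table.commit("1.1.0", ["part-0.parquet", "part-1.parquet"])
table.rollback("1.0.0")
print(table.current)            # 1.0.0
print(sorted(table.snapshots))  # ['1.0.0', '1.1.0']
```

In this model the newer snapshot remains recorded after the rollback; only the `current` pointer changes.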
prune
Remove old versions, keeping the N most recent.
# Preview: list the versions that would be pruned
prunable = backend.prune("demographics", keep=5, dry_run=True)
print(f"Would prune {len(prunable)} versions")
# Execute: prune everything except the 5 most recent versions
pruned = backend.prune("demographics", keep=5, dry_run=False)
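The "keep the N most recent" selection can be sketched over an oldest-first list, such as the one `list_versions` returns. `select_prunable` is a hypothetical helper for illustration; how the backend treats edge cases such as `keep=0` is not documented here, so the handling below is an assumption.

```python
from typing import List


def select_prunable(versions: List[str], keep: int) -> List[str]:
    """Return the versions that fall outside the `keep` most recent.

    Hypothetical helper. `versions` must be ordered oldest first.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    if keep == 0:
        # versions[:-0] would be empty, so handle this case explicitly.
        return list(versions)
    return versions[:-keep]


print(select_prunable(["1.0.0", "1.1.0", "2.0.0"], keep=2))  # ['1.0.0']
```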
check_drift
Check for drift between local and remote state. Currently a stub.
report = backend.check_drift("demographics")
print(report["has_drift"]) # False