Iceberg API Reference

The Iceberg backend implements the VersioningBackend protocol. Install with pip install portolan-cli[iceberg].

Loading the Backend

from portolan_cli.backends import get_backend

backend = get_backend("iceberg")

IcebergBackend

Enterprise versioning backend using Apache Iceberg.

Implements the VersioningBackend protocol from portolan-cli. Discovered via get_backend("iceberg") when the [iceberg] extra is installed.

Data is stored natively in Iceberg tables (copy-on-write). Version metadata is stored in snapshot summary properties.

get_current_version(collection)

Get the current (latest) version of a collection.

list_versions(collection)

List all versions of a collection, oldest first.

publish(collection, assets, schema, breaking, message, removed=None, version=None)

Publish a new version of a collection.

Reads actual Parquet data from asset files and writes it into the Iceberg table. Version metadata is stored in snapshot properties.

rollback(collection, target_version)

Roll back to a previous version.

Uses Iceberg's native snapshot management to set the current snapshot pointer back to the target version. Because no data is copied, the rollback is a metadata-only operation.

prune(collection, keep, dry_run)

Remove old versions, keeping the N most recent.

get_stac_metadata(collection)

Generate combined STAC metadata for a collection.

Returns a dict with table: (Layer 1) and iceberg: (Layer 2) fields. This method is an Iceberg-specific extension and is not part of the VersioningBackend protocol.

check_drift(collection)

Check for drift between local and remote state.

on_post_add(context)

Post-add hook: update STAC extensions and upload metadata to remote.

Called by portolan-cli's finalize_datasets() after versioning completes. Receives batch context with all items in the collection.

pull(remote_url, local_root, collection, *, dry_run=False)

Pull files from remote using Iceberg version info.

Queries get_current_version() for asset info, then downloads each asset from {remote_url}/{href} to {local_root}/{href}.
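The href mapping described above can be sketched as a small planning step. This is a minimal illustration, not the backend's implementation: plan_pull is a hypothetical helper, and the bucket URL and local path are made-up values.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def plan_pull(remote_url: str, local_root: str, hrefs: Iterable[str]) -> List[Tuple[str, Path]]:
    """Pair each asset href with its source URL and local destination path."""
    base = remote_url.rstrip("/")  # avoid a double slash when joining
    root = Path(local_root)
    # {remote_url}/{href} -> {local_root}/{href}
    return [(f"{base}/{href}", root / href) for href in hrefs]


# Hypothetical remote and local roots, for illustration only.
plan = plan_pull("s3://bucket/catalog/", "/data/catalog", ["demographics/data.parquet"])
for source, destination in plan:
    print(source, "->", destination.as_posix())
```

A dry run could print such a plan without downloading anything; how the real dry_run flag behaves is not specified here.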

supports_push()

The Iceberg backend does not support push because add already uploads.

push_blocked_message(remote)

Return a human-readable message explaining why push is blocked.

Methods

get_current_version

Get the current (latest) version of a collection.

version = backend.get_current_version("demographics")
print(version.version)   # "2.1.0"
print(version.breaking)  # False
print(version.message)   # "Updated population data"

Raises: FileNotFoundError if the collection has no versions.


list_versions

List all versions of a collection, ordered oldest to newest.

versions = backend.list_versions("demographics")
for v in versions:
    print(f"{v.version} ({v.created}): {v.message}")

Returns: list[Version]; an empty list if the collection doesn't exist.


publish

Publish a new version. Creates the Iceberg table on first publish.

version = backend.publish(
    collection="demographics",
    assets={"data.parquet": "/path/to/data.parquet"},
    schema={"columns": ["id", "geom"], "types": {"id": "int64"}, "hash": "abc123"},
    breaking=False,
    message="Updated population estimates",
)
print(version.version)  # e.g. "1.1.0" (minor bump, assuming the previous version was 1.0.0)

Versioning rules:

Scenario             Version
First version        1.0.0
Non-breaking change  Minor bump (1.0.0 -> 1.1.0)
Breaking change      Major bump (1.2.3 -> 2.0.0)
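The versioning rules above can be expressed as a small function. This is a sketch under assumptions: next_version is a hypothetical helper, not part of the backend API, and resetting the patch component on a minor bump is a guess that the rules do not confirm. It also ignores the explicit version argument that publish accepts.

```python
from typing import Optional


def next_version(current: Optional[str], breaking: bool) -> str:
    """Compute the next version string from the versioning rules."""
    if current is None:
        return "1.0.0"  # first publish
    major, minor, _patch = (int(part) for part in current.split("."))
    if breaking:
        return f"{major + 1}.0.0"  # major bump resets minor and patch
    return f"{major}.{minor + 1}.0"  # minor bump; patch reset is an assumption


print(next_version(None, breaking=False))     # 1.0.0
print(next_version("1.0.0", breaking=False))  # 1.1.0
print(next_version("1.2.3", breaking=True))   # 2.0.0
```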

rollback

Roll back to a previous version. Uses Iceberg's native snapshot management, so the rollback is a metadata-only operation with no data copy.

rolled = backend.rollback("demographics", "1.0.0")
print(rolled.version)  # "1.0.0"

Raises: ValueError if the target version doesn't exist.
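To illustrate why a pointer-based rollback needs no data copy, here is a toy model of a table whose snapshots are kept and whose current version is only a reference. ToyTable is purely illustrative and does not use Iceberg or the backend; whether the real backend still lists later versions after a rollback is not stated here.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyTable:
    """Toy model: retained snapshots plus a pointer to the current one."""

    snapshots: Dict[str, int] = field(default_factory=dict)  # version -> snapshot id
    current: Optional[str] = None

    def commit(self, version: str) -> None:
        self.snapshots[version] = len(self.snapshots) + 1
        self.current = version

    def rollback(self, target_version: str) -> str:
        if target_version not in self.snapshots:
            raise ValueError(f"unknown version: {target_version}")
        self.current = target_version  # only the pointer moves; snapshots are untouched
        return self.current


table = ToyTable()
table.commit("1.0.0")
table.commit("1.1.0")
print(table.rollback("1.0.0"))  # 1.0.0
print(sorted(table.snapshots))  # ['1.0.0', '1.1.0']
```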


prune

Remove old versions, keeping the N most recent.

# Preview
prunable = backend.prune("demographics", keep=5, dry_run=True)
print(f"Would prune {len(prunable)} versions")

# Execute
pruned = backend.prune("demographics", keep=5, dry_run=False)
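The "keep the N most recent" selection can be sketched over a version list ordered oldest first, as list_versions returns it. select_prunable is a hypothetical helper for illustration; the backend's actual selection logic and return type are not shown in this reference.

```python
from typing import List, Sequence


def select_prunable(versions: Sequence[str], keep: int) -> List[str]:
    """Given versions ordered oldest first, return those outside the newest `keep`."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    cutoff = max(len(versions) - keep, 0)  # nothing to prune when keep >= len
    return list(versions[:cutoff])


history = ["1.0.0", "1.1.0", "2.0.0", "2.1.0"]
print(select_prunable(history, keep=3))  # ['1.0.0']
print(select_prunable(history, keep=5))  # []
```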

check_drift

Check for drift between local and remote state. Currently a stub.

report = backend.check_drift("demographics")
print(report["has_drift"])  # False