
Maxar ARD Python SDK - ARD Collections

An ARDCollection connects you to a storage location (cloud or local) and provides high-level abstractions of ARD tiles. The ARDCollection object scans the ARD tiles in a location and can filter them down to more specific groups using spatial or temporal filters. It also provides logical groupings of tiles by source (an Acquisition) or by location (a Stack of tiles in a grid cell). ARD tiles also have their own high-level object to make data access simple and convenient.

This example uses Maxar sample data stored in the public S3 bucket maxar-ard-samples in the prefix v4/sample-001. If you've read the Selecting ARD tutorial already, you'll see that ARDCollections work similarly to SelectResult objects.

Passing public=True tells the SDK to skip authentication since this is a public bucket. See the end of this document for more information about authenticating with the different cloud providers.

from max_ard import ARDCollection

collection = ARDCollection('s3://maxar-ard-samples/v4/sample-001/', public=True)

An ARDCollection can tell you which acquisitions (by catalog ID) it contains:

collection.acquisitions
[<Acquisition of 104001002124FA00 [<ARDTile at Z10-120020223023>]>,
 <Acquisition of 1040010022712A00 [<ARDTile at Z10-120020223023>, <ARDTile at Z10-120020223032>]>,
 <Acquisition of 103001005C2E5E00 [<ARDTile at Z10-120020223023>, <ARDTile at Z10-120020223032>]>,
 <Acquisition of 103001005D31F500 [<ARDTile at Z10-120020223032>]>]

An ARDCollection can also tell you which ARD grid cells it covers. A Stack is a container of all the tiles that cover a grid cell.

collection.stacks
[<Stack at Z10-120020223023 [<ARDTile of 104001002124FA00>, <ARDTile of 1040010022712A00>, <ARDTile of 103001005C2E5E00>]>,
 <Stack at Z10-120020223032 [<ARDTile of 1040010022712A00>, <ARDTile of 103001005C2E5E00>, <ARDTile of 103001005D31F500>]>]

Date properties

print(collection.dates)
print(collection.start_date)
print(collection.end_date)
['2016-09-22', '2016-09-23', '2016-09-30', '2016-10-08']
2016-09-22
2016-10-08

Zone properties

print(collection.zones)
[10]

You can modify the attributes of an ARDCollection that match keywords in its call signature:

  • This resets the collection but keeps the backing file cache
  • Properties derived from the data itself, such as end_date, are read-only

from max_ard import ARDCollection

start = '2016-01-01'
middle = '2019-10-01'
end = '2021-02-01'

c = ARDCollection('s3://maxar-analytics-data/ard_orders', earliest_date=start, latest_date=middle)
for tile in c.tiles:
    pass  # do something with the tile

# Change the dates on the collection;
# this resets the collection automatically
c.earliest_date = middle
c.latest_date = end

# c.end_date = end  # not allowed: end_date is read-only and returns the end date of the actual data

for tile in c.tiles:
    pass  # do something with the tile

If you need to reset the file cache:

c.clear_filesystem_cache()
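The reset-on-assignment behaviour described above can be illustrated with ordinary Python properties. This is a toy sketch, not the max_ard implementation: the class name FilteredCollection is hypothetical, and a sorted list of date strings stands in for both the tiles and the backing file cache.

```python
class FilteredCollection:
    """Hypothetical toy collection: filter attributes reset cached results.

    Each "tile" is represented only by its date string (YYYY-MM-DD).
    """

    def __init__(self, dates, earliest_date=None, latest_date=None):
        self._all_dates = sorted(dates)   # stands in for the backing file cache
        self._earliest_date = earliest_date
        self._latest_date = latest_date
        self._tiles = None                # cached filter result

    def _reset(self):
        # Drop the cached selection but keep the backing "file cache"
        self._tiles = None

    @property
    def earliest_date(self):
        return self._earliest_date

    @earliest_date.setter
    def earliest_date(self, value):
        self._earliest_date = value
        self._reset()

    @property
    def latest_date(self):
        return self._latest_date

    @latest_date.setter
    def latest_date(self, value):
        self._latest_date = value
        self._reset()

    @property
    def tiles(self):
        # Recompute lazily after a reset; ISO date strings compare correctly as text
        if self._tiles is None:
            lo = self._earliest_date or self._all_dates[0]
            hi = self._latest_date or self._all_dates[-1]
            self._tiles = [d for d in self._all_dates if lo <= d <= hi]
        return self._tiles

    @property
    def end_date(self):
        # Read-only: there is no setter, so assignment raises AttributeError
        return self.tiles[-1] if self.tiles else None


toy = FilteredCollection(['2016-09-22', '2016-09-23', '2016-09-30', '2016-10-08'],
                         latest_date='2016-09-30')
print(toy.tiles)                  # ['2016-09-22', '2016-09-23', '2016-09-30']
toy.earliest_date = '2016-09-23'  # resets the cached selection
print(toy.tiles)                  # ['2016-09-23', '2016-09-30']
print(toy.end_date)               # 2016-09-30
```

Because end_date has no setter, assigning to it fails, which mirrors the read-only behaviour noted in the comment of the example above.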

ARD Tiles

Collections expose ARDTile objects representing the tiles. A tile is considered to be all of the data products from one acquisition within a specific grid cell.

collection.tiles
[<ARDTile of 104001002124FA00 at z10-120020223023>,
 <ARDTile of 1040010022712A00 at z10-120020223023>,
 <ARDTile of 103001005C2E5E00 at z10-120020223023>,
 <ARDTile of 1040010022712A00 at z10-120020223032>,
 <ARDTile of 103001005C2E5E00 at z10-120020223032>,
 <ARDTile of 103001005D31F500 at z10-120020223032>]
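Because every tile is identified by an (acquisition, grid cell) pair, Acquisitions and Stacks are two ways of grouping the same tiles. This plain-Python sketch (not SDK code; group_tiles is a hypothetical helper) groups the pairs listed above and reproduces the groupings shown earlier by collection.acquisitions and collection.stacks.

```python
from collections import defaultdict

# (catalog ID, grid cell) pairs copied from the collection.tiles listing above
tile_keys = [
    ("104001002124FA00", "z10-120020223023"),
    ("1040010022712A00", "z10-120020223023"),
    ("103001005C2E5E00", "z10-120020223023"),
    ("1040010022712A00", "z10-120020223032"),
    ("103001005C2E5E00", "z10-120020223032"),
    ("103001005D31F500", "z10-120020223032"),
]


def group_tiles(keys):
    """Group (catalog_id, cell) pairs by acquisition and by grid cell."""
    acquisitions = defaultdict(list)  # catalog ID -> cells it covers
    stacks = defaultdict(list)        # cell -> catalog IDs covering it
    for cat_id, cell in keys:
        acquisitions[cat_id].append(cell)
        stacks[cell].append(cat_id)
    return dict(acquisitions), dict(stacks)


acquisitions, stacks = group_tiles(tile_keys)
for cat_id, cells in acquisitions.items():
    print(cat_id, cells)
for cell, cat_ids in stacks.items():
    print(cell, len(cat_ids), "tiles")
```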

Let's look at a tile:

tile = collection.get_tile('1040010022712A00', 'z10-120020223032')
print('Tile:', tile)
print('Cell:', tile.cell)
print('Properties:')
for k,v in tile.properties.items():
    print(f'  {k}: {v}')
Tile: <ARDTile of 1040010022712A00 at z10-120020223032>
Cell: <Cell Z10-120020223032>
Properties:
  datetime: 2016-09-23 19:36:58Z
  platform: WV03
  gsd: 0.38
  ard_metadata_version: 0.0.1
  catalog_id: 1040010022712A00
  utm_zone: 10
  quadkey: 120020223032
  view:off_nadir: 28.0
  view:azimuth: 290.3
  view:incidence_angle: 59.0
  view:sun_azimuth: 170.0
  view:sun_elevation: 51.5
  proj:epsg: 32610
  proj:geometry: {'type': 'Polygon', 'coordinates': [[[549843.75, 4185156.25], [549843.75, 4179843.75], [555156.25, 4179843.75], [555156.25, 4185156.25], [549843.75, 4185156.25]]]}
  proj:bbox: [549843.75, 4179843.75, 555156.25, 4185156.25]
  tile:data_area: 28.2
  tile:clouds_area: 0.0
  tile:clouds_percent: 0
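The proj:bbox values are in the coordinate system given by proj:epsg (EPSG:32610 is WGS 84 / UTM zone 10N, in metres), so simple arithmetic gives the cell size. The sketch below assumes tile:data_area is reported in square kilometres; under that assumption the full bbox area of about 28.2 km² matches it, which would mean this cell is completely filled with data. bbox_size is a hypothetical helper.

```python
# proj:bbox from the tile properties above: [minx, miny, maxx, maxy] in EPSG:32610 metres
bbox = [549843.75, 4179843.75, 555156.25, 4185156.25]


def bbox_size(box):
    """Return (width, height) of a [minx, miny, maxx, maxy] box in CRS units."""
    minx, miny, maxx, maxy = box
    return maxx - minx, maxy - miny


width_m, height_m = bbox_size(bbox)
area_km2 = width_m * height_m / 1_000_000  # square metres -> square kilometres
print(width_m, height_m)   # 5312.5 5312.5
print(round(area_km2, 1))  # 28.2
```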

Tiles have Shapely geometry properties. For the full tile, use the geometry of a tile's cell:

from shapely.geometry import shape
shape(tile.cell)

A tile's data_mask is the geometry outlining filled pixels.

tile.data_mask

ARD V4 deliverables also include the following vector mask assets:

  • Cloud mask
  • Cloud shadow mask
  • Terrain shadow mask
  • Water mask
  • Healthy vegetation mask
  • Saturated multispectral pixel mask
  • Pan band flare pixel mask

These assets are available through "magic" accessors which return Shapely geometries:

  • .cloud_mask
  • .cloud_shadow_mask
  • .terrain_shadow_mask
  • .water_mask
  • .healthy_vegetation_mask
  • .ms_saturation_mask
  • .pan_flare_mask

tile.water_mask

Data masks can also be accessed "inverted" by prefixing no_ to the accessor name. For example, to get the geometry not covered by water, use .no_water_mask:

tile.no_water_mask
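One way prefix-based "magic" accessors like these can be implemented is Python's __getattr__ hook, which runs only when normal attribute lookup fails. This is a toy sketch, not the SDK's code: MaskedTile is hypothetical, sets of (row, col) pixels stand in for Shapely geometries, and the inverted mask is assumed to be the tile footprint minus the mask.

```python
class MaskedTile:
    """Hypothetical toy tile whose masks are sets of (row, col) pixels."""

    def __init__(self, footprint, masks):
        self.footprint = set(footprint)  # every pixel in the toy tile
        self._masks = dict(masks)        # e.g. {"water_mask": {...}}

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails
        masks = self.__dict__.get("_masks", {})
        if name in masks:
            return masks[name]
        if name.startswith("no_") and name[3:] in masks:
            # Inverted mask: footprint pixels not covered by the mask
            return self.footprint - masks[name[3:]]
        raise AttributeError(name)


footprint = {(r, c) for r in range(2) for c in range(2)}
toy_tile = MaskedTile(footprint, {"water_mask": {(0, 0), (0, 1)}})
print(sorted(toy_tile.no_water_mask))  # [(1, 0), (1, 1)]
```

With real geometries the set difference would become a geometry difference, but how the SDK computes its inverted masks is not described here.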

ARDTile objects return Rasterio readers for the following raster assets:

  • visual (RGB)
  • pan_analytic
  • ms_analytic
  • cloud_mask_raster (includes clouds and cloud shadows)
  • terrain_shadow_mask_raster
  • water_mask_raster
  • healthy_vegetation_mask_raster
  • ms_saturation_mask_raster
  • pan_flare_mask_raster

Like the data masks, these are available as attributes on tile objects. We'll show a thumbnail of the visual image by reading from the COG's overviews. You can see that the water matches the water_mask geometry.

See the ARD Order Deliveries section of the User Guide for more details about the asset files.

%matplotlib inline
from rasterio.plot import show


with tile.visual as src:
    # read from overviews
    arr = src.read(out_shape=(3, int(src.height / 64), int(src.width / 64)))
    show(arr)
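Dividing by a fixed factor of 64 gives thumbnails whose size depends on the raster's dimensions. If you would rather cap the thumbnail size, a small helper can compute out_shape instead. thumbnail_shape and its max_dim parameter are hypothetical, and the raster sizes in the example are made up.

```python
def thumbnail_shape(count, height, width, max_dim=256):
    """Return an out_shape tuple that downsamples so the longer side is about max_dim."""
    factor = max(1, max(height, width) // max_dim)  # integer decimation factor, at least 1
    return (count, max(1, height // factor), max(1, width // factor))


# Hypothetical raster dimensions, not taken from the sample data
print(thumbnail_shape(3, 16384, 16384))  # (3, 256, 256)
print(thumbnail_shape(3, 17408, 8704))   # (3, 256, 128)
```

With Rasterio this could be used as src.read(out_shape=thumbnail_shape(src.count, src.height, src.width)), since read() accepts an out_shape of (bands, rows, cols).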


ARDCollections can be subset with the following keywords:

  • aoi: almost any geometry object or string representation
  • cat_id_in: catalog ID list
  • zone_in: zone number list
  • earliest_date: start date (inclusive) as YYYY-MM-DD (for now)
  • latest_date: end date (inclusive) as YYYY-MM-DD (for now)

print('Full collection:', len(collection.tiles))

aoi = 'POINT(-122.446 37.793)' # a point over the western half of San Francisco

aoi_coll  = ARDCollection('s3://maxar-ard-samples/v4/sample-001/', aoi=aoi, public=True)
print("AOI'ed collection:", len(aoi_coll.tiles))
Full collection: 6
AOI'ed collection: 3
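Both date keywords in the list above are inclusive. The sketch below shows what that means for the acquisition dates of the sample collection; in_date_range is a hypothetical helper, not an SDK function, and the SDK may apply the filter differently internally.

```python
from datetime import date


def in_date_range(day, earliest=None, latest=None):
    """True if day (YYYY-MM-DD) falls within [earliest, latest]; both bounds inclusive."""
    d = date.fromisoformat(day)
    if earliest is not None and d < date.fromisoformat(earliest):
        return False
    if latest is not None and d > date.fromisoformat(latest):
        return False
    return True


# Acquisition dates from collection.dates above
dates = ['2016-09-22', '2016-09-23', '2016-09-30', '2016-10-08']
print([d for d in dates if in_date_range(d, '2016-09-23', '2016-09-30')])
# ['2016-09-23', '2016-09-30']
```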

Example: plot thumbnails of all tiles in an AOI-filtered collection

from rasterio.plot import show
for tile in aoi_coll.tiles:
    with tile.visual as src:
        # read overview
        arr = src.read(out_shape=(3, int(src.height / 64), int(src.width / 64)))
        show(arr)


You can use idiomatic Python, such as a generator expression:

other_collection = ARDCollection('s3://maxar-ard-samples/v4/sample-002/', aoi=aoi, public=True)
# generator expression: lazily yields only the WorldView-3 (WV03) tiles
wv03_tiles = (tile for tile in other_collection.tiles if tile.properties['platform'] == 'WV03')
for tile in wv03_tiles:
    pass  # do something with the tile

Acquisitions and Stacks

As shown above, ARDCollections can also return Acquisitions and Stacks. Both are list-like containers of ARDTile objects and have 'getter' methods. See the ARD Select tutorial for more detail about Stacks and Acquisitions.

acquisition = collection.acquisitions[0]
acquisition = collection.get_acquisition('1040010022712A00')
print(acquisition)
print(acquisition.properties)
<Acquisition at 1040010022712A00 (2 tiles)>
{'catalog_id': '1040010022712A00', 'platform': 'WV03', 'datetime': '2016-09-23 19:36:58Z'}
acquisition.get_tile_from_cell('z10-120020223023')
<ARDTile of 1040010022712A00 at z10-120020223023>

An acquisition can also be read with Rasterio. Acquisition.open_acquisition() returns a reader that treats the tiles of an acquisition as one large raster. The reader attempts to color balance the source images so they appear seamless, using the tile with the largest overlap with the AOI as the reference image.

bounds = (-122.44897283332135, 37.79138422369229, -122.44808031104897, 37.7920925019562)

# Convert the bounding box to a geometry object
from shapely.geometry import box
aoi = box(*bounds)

# Read the "visual" asset pixels from the acquisition without having to access individual tiles
reader = acquisition.open_acquisition()
data = reader.read(aoi, 'visual')
show(data)
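The choice of reference image described above can be sketched with bounding boxes: compute each tile's overlap with the AOI and pick the largest. This is a simplification, not the SDK's code: pick_reference and overlap_area are hypothetical helpers, real tiles and AOIs are arbitrary geometries rather than axis-aligned boxes, and the color balancing itself is not shown.

```python
def overlap_area(a, b):
    """Area of intersection of two (minx, miny, maxx, maxy) boxes; 0 if they don't overlap."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, width) * max(0.0, height)


def pick_reference(tile_bounds, aoi_bounds):
    """Return the key of the tile whose bounds overlap the AOI the most."""
    return max(tile_bounds, key=lambda k: overlap_area(tile_bounds[k], aoi_bounds))


# Hypothetical bounds in arbitrary projected units
tile_bounds_by_cell = {
    "cell-a": (0, 0, 10, 10),
    "cell-b": (10, 0, 20, 10),
}
aoi_bounds = (7, 2, 12, 8)
print(pick_reference(tile_bounds_by_cell, aoi_bounds))  # cell-a
```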


Stacks are a collection of ARD tiles from different acquisitions that cover the same ARD grid cell.

stack = collection.stacks[0]
stack = collection.get_stack('z10-120020223023')
print(stack)
for tile in stack:
    print(tile.date, tile.acq_id)
<Stack at Z10-120020223023 (3 tiles)>
2016-09-22 104001002124FA00
2016-09-23 1040010022712A00
2016-09-30 103001005C2E5E00
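Since a Stack holds tiles from different acquisitions over the same cell, it is a natural input for time-series work. As a simple example, this sketch computes the number of days between consecutive acquisitions using the dates printed above; revisit_gaps is a hypothetical helper, and the dates are copied as strings rather than read from tile.date.

```python
from datetime import date

# Tile dates printed for the stack above
stack_dates = ["2016-09-22", "2016-09-23", "2016-09-30"]


def revisit_gaps(days):
    """Return the number of days between consecutive acquisitions, in date order."""
    parsed = sorted(date.fromisoformat(d) for d in days)
    return [(later - earlier).days for earlier, later in zip(parsed, parsed[1:])]


print(revisit_gaps(stack_dates))  # [1, 7]
```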

Local files

You can copy a collection's tiles locally. This will only copy the tiles that match any provided filters. The ARD folder structure is preserved:

from max_ard.ard_collection import copy

copy(aoi_coll, './my_tiles')
# ./my_tiles/ard-demo/38/033133232332/2014-05-30/103001003151DE00-visual.tif
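Judging from the example path above, the nested layout appears to be <zone>/<quadkey>/<date>/<catalog ID>-<asset> below a top-level folder. Assuming that layout, a path can be split back into its parts with pathlib; parse_ard_path is a hypothetical helper, not part of max_ard.

```python
from pathlib import PurePosixPath


def parse_ard_path(path):
    """Split a nested ARD asset path into its parts (layout assumed from the example path)."""
    p = PurePosixPath(path)
    zone, quadkey, acq_date = p.parts[-4:-1]   # the three folders above the file
    cat_id, asset = p.stem.split("-", 1)       # catalog ID, then the asset name
    return {"zone": zone, "quadkey": quadkey, "date": acq_date,
            "catalog_id": cat_id, "asset": asset}


info = parse_ard_path("./my_tiles/ard-demo/38/033133232332/2014-05-30/103001003151DE00-visual.tif")
print(info)
```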

A local ARDCollection of structured data can be accessed in the same way:

c = ARDCollection('./my_tiles')

Tiles from the same acquisition have the same file name. If local copies are needed in a flat structure, the copy tool can rename them to have flat file names instead of being arranged in folders.

copy(aoi_coll, './my_flat_tiles', flat=True)
# ./my_flat_tiles/ard-demo/38-033133232332-2014-05-30-103001003151DE00-visual.tif
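Comparing the nested and flat example paths, the flat name appears to be the zone, quadkey and date folder names plus the file name, joined with hyphens. A sketch of that mapping (flat_name is hypothetical and not the copy tool's code):

```python
from pathlib import PurePosixPath


def flat_name(relative_path):
    """Join the folder names and file name of a relative path with '-'."""
    return "-".join(PurePosixPath(relative_path).parts)


print(flat_name("38/033133232332/2014-05-30/103001003151DE00-visual.tif"))
# 38-033133232332-2014-05-30-103001003151DE00-visual.tif
```

Hyphens also appear inside the date, so a flat name cannot be split back into its parts with a plain split('-').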

Note: flat files cannot be read by an ARDCollection at this time.

c = ARDCollection('./my_flat_tiles')
# not supported: this does not work with flat files

Authentication for cloud providers

There are multiple ways to supply credentials to the SDK to read ARD tiles. Credentials are needed to accomplish two things: interacting with the files using fsspec and reading raster or vector files using a GDAL virtual file system.

Amazon S3

S3 file systems are read using s3fs (https://github.com/fsspec/s3fs), which builds on top of botocore. Raster and vector assets are read using GDAL's /vsis3/ virtual file system. These two systems largely overlap in how they find S3 credentials, so the methods are interchangeable. The same methods also work with the AWS CLI, so if your credentials work with aws s3 ls <path>, they are also likely to work with the ARD SDK.

As Maxar is a substantial AWS S3 user, we recommend either using a credentials file or setting the AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID environment variables.
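To check whether the environment-variable method is in place before creating a collection, you could use a small helper like the following. missing_aws_env_vars is hypothetical; an empty result only means the variables are set, not that the keys are valid, and botocore can also find credentials elsewhere, such as the shared credentials file.

```python
import os

AWS_KEY_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_env_vars(environ=os.environ):
    """Return the names of AWS key variables that are unset or empty in environ."""
    return [name for name in AWS_KEY_VARS if not environ.get(name)]


missing = missing_aws_env_vars()
if missing:
    print("Not set:", ", ".join(missing))
```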

For public S3 buckets, passing public=True when creating an ARDCollection will skip authentication.

Azure Blob Storage

The Azure package for fsspec is not installed by default. You will need to install adlfs separately.

For Azure, while there are several ways to provide credentials, we have found the simplest is to set the AZURE_STORAGE_CONNECTION_STRING environment variable. You can create connection strings from the Container Blade settings menu.

For other options, see adlfs and /vsiaz/.

For public Azure containers, in addition to passing public=True, you must set the AZURE_STORAGE_ACCOUNT_NAME environment variable to the name of the storage account that owns the container.

Google Cloud Storage

The GCS package for fsspec is not installed by default. You will need to install gcsfs separately.

See /vsigs/ for additional access information.
