Maxar ARD Python SDK - ARD Collections¶
ARDCollections connect you to a storage location (cloud or local) and provide high-level abstractions of ARD tiles. The ARDCollection object scans ARD tiles in a location and can filter them down to more specific groups based on spatial or temporal filters. It also provides logical groupings of tiles by source (an Acquisition) or by location (a Stack of tiles in a grid cell). ARD tiles also have their own high-level object to make data access simple and convenient.
This example uses Maxar sample data stored in the public S3 bucket maxar-ard-samples in the prefix v4/sample-001. If you've read the Selecting ARD tutorial already, you'll see that ARDCollections work similarly to SelectResult objects.
Passing public=True tells the SDK to skip authentication since this is a public bucket. See the end of this document for more information about authenticating with the different cloud providers.
from max_ard import ARDCollection
collection = ARDCollection('s3://maxar-ard-samples/v4/sample-001/', public=True)
An ARDCollection can tell you which acquisitions it contains, identified by catalog ID:
collection.acquisitions
[<Acquisition of 104001002124FA00 [<ARDTile at Z10-120020223023>]>,
<Acquisition of 1040010022712A00 [<ARDTile at Z10-120020223023>, <ARDTile at Z10-120020223032>]>,
<Acquisition of 103001005C2E5E00 [<ARDTile at Z10-120020223023>, <ARDTile at Z10-120020223032>]>,
<Acquisition of 103001005D31F500 [<ARDTile at Z10-120020223032>]>]
An ARDCollection can also tell you which ARD grid cells it covers. A Stack is a container of all the tiles that cover a grid cell.
collection.stacks
[<Stack at Z10-120020223023 [<ARDTile of 104001002124FA00>, <ARDTile of 1040010022712A00>, <ARDTile of 103001005C2E5E00>]>,
<Stack at Z10-120020223032 [<ARDTile of 1040010022712A00>, <ARDTile of 103001005C2E5E00>, <ARDTile of 103001005D31F500>]>]
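The two groupings are views of the same tiles: each tile belongs to exactly one acquisition and one grid cell. A minimal sketch of that relationship in plain Python, using the catalog ID and cell pairs from the output above as stand-in tuples (an illustration only, not how the SDK builds Acquisition or Stack objects):

```python
from collections import defaultdict

# (catalog ID, cell ID) pairs copied from the sample output above.
tiles = [
    ("104001002124FA00", "Z10-120020223023"),
    ("1040010022712A00", "Z10-120020223023"),
    ("1040010022712A00", "Z10-120020223032"),
    ("103001005C2E5E00", "Z10-120020223023"),
    ("103001005C2E5E00", "Z10-120020223032"),
    ("103001005D31F500", "Z10-120020223032"),
]

by_acquisition = defaultdict(list)  # catalog ID -> cells it covers
by_cell = defaultdict(list)         # cell ID -> catalog IDs covering it

for cat_id, cell in tiles:
    by_acquisition[cat_id].append(cell)
    by_cell[cell].append(cat_id)

print(len(by_acquisition), "acquisitions,", len(by_cell), "cells")
```

Grouping by the first element mirrors collection.acquisitions, and grouping by the second mirrors collection.stacks.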
Date properties
print(collection.dates)
print(collection.start_date)
print(collection.end_date)
['2016-09-22', '2016-09-23', '2016-09-30', '2016-10-08']
2016-09-22
2016-10-08
Zone properties
print(collection.zones)
[10]
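The cell IDs shown earlier (for example Z10-120020223023) appear to combine a UTM zone number with a quadkey, which is consistent with the zone value printed here. A small sketch of splitting such an ID, assuming that "Z<zone>-<quadkey>" layout holds; parse_cell_id is a hypothetical helper and not part of the SDK:

```python
def parse_cell_id(cell_id):
    """Split an ID like 'Z10-120020223023' into (zone, quadkey).

    Hypothetical helper: assumes the 'Z<zone>-<quadkey>' layout seen in
    the sample output, accepting either 'Z' or 'z' as the prefix.
    """
    prefix, quadkey = cell_id.split("-", 1)
    if prefix[:1].lower() != "z" or not prefix[1:].isdigit():
        raise ValueError(f"unexpected cell ID: {cell_id!r}")
    return int(prefix[1:]), quadkey

zone, quadkey = parse_cell_id("z10-120020223032")
print(zone, quadkey)
```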
- You can modify an ARDCollection's attributes that match the keywords in the call signature
- This resets the collection but keeps the backing file cache
- Properties derived from the data, such as start_date and end_date, are read-only
start = '2016-01-01'
middle = '2019-10-01'
end = '2021-02-01'
c = ARDCollection('s3://maxar-analytics-data/ard_orders', earliest_date=start, latest_date=middle)
for tile in c.tiles:
    pass  # do something with the tile
# Change the dates on the collection;
# this resets the collection automatically
c.earliest_date = middle
c.latest_date = end
# c.end_date = end  # this is a read-only property! It returns the end date of the actual data
for tile in c.tiles:
    pass  # do something with the tile
If you need to reset the file cache:
c.clear_filesystem_cache()
ARD Tiles¶
Collections expose ARDTile objects representing the tiles. A tile is considered to be all of the data products from one acquisition within a specific grid cell.
collection.tiles
[<ARDTile of 104001002124FA00 at z10-120020223023>,
<ARDTile of 1040010022712A00 at z10-120020223023>,
<ARDTile of 103001005C2E5E00 at z10-120020223023>,
<ARDTile of 1040010022712A00 at z10-120020223032>,
<ARDTile of 103001005C2E5E00 at z10-120020223032>,
<ARDTile of 103001005D31F500 at z10-120020223032>]
Let's look at a tile:
tile = collection.get_tile('1040010022712A00', 'z10-120020223032')
print('Tile:', tile)
print('Cell:', tile.cell)
print('Properties:')
for k, v in tile.properties.items():
    print(f' {k}: {v}')
Tile: <ARDTile of 1040010022712A00 at z10-120020223032>
Cell: <Cell Z10-120020223032>
Properties:
datetime: 2016-09-23 19:36:58Z
platform: WV03
gsd: 0.38
ard_metadata_version: 0.0.1
catalog_id: 1040010022712A00
utm_zone: 10
quadkey: 120020223032
view:off_nadir: 28.0
view:azimuth: 290.3
view:incidence_angle: 59.0
view:sun_azimuth: 170.0
view:sun_elevation: 51.5
proj:epsg: 32610
proj:geometry: {'type': 'Polygon', 'coordinates': [[[549843.75, 4185156.25], [549843.75, 4179843.75], [555156.25, 4179843.75], [555156.25, 4185156.25], [549843.75, 4185156.25]]]}
proj:bbox: [549843.75, 4179843.75, 555156.25, 4185156.25]
tile:data_area: 28.2
tile:clouds_area: 0.0
tile:clouds_percent: 0
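The proj:bbox values are in the tile's projected coordinate system (EPSG:32610, a UTM zone, so the units are metres). A quick check of the tile footprint, assuming the bbox order is [min_x, min_y, max_x, max_y]:

```python
# proj:bbox from the tile properties above: [min_x, min_y, max_x, max_y]
bbox = [549843.75, 4179843.75, 555156.25, 4185156.25]

min_x, min_y, max_x, max_y = bbox
width_m = max_x - min_x    # east-west extent in metres
height_m = max_y - min_y   # north-south extent in metres
area_km2 = (width_m * height_m) / 1_000_000

print(width_m, height_m, round(area_km2, 1))
```

The footprint works out to about 28.2 square kilometres, which lines up with the tile:data_area value above. That suggests the property is reported in square kilometres and that this tile is fully covered by data, though the units are an inference from the numbers rather than something stated here.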
Tiles have Shapely geometry properties. For the full tile, use the geometry of a tile's cell:
from shapely.geometry import shape
shape(tile.cell)
A tile's data_mask is the geometry outlining filled pixels.
tile.data_mask
ARD V4 deliverables also include the following vector mask assets:
- Cloud mask
- Cloud shadow mask
- Terrain shadow mask
- Water mask
- Healthy vegetation mask
- Saturated multispectral pixel mask
- Pan band flare pixel mask
These assets are available through "magic" accessors which return Shapely geometries:
- .cloud_mask
- .cloud_shadow_mask
- .terrain_shadow_mask
- .water_mask
- .healthy_vegetation_mask
- .ms_saturation_mask
- .pan_flare_mask
tile.water_mask
Data masks can also be accessed "inverted" by prefixing no_ to the accessor name. For example, to get the geometry not covered by water, use .no_water_mask:
tile.no_water_mask
ARDTiles return Rasterio readers for raster assets:
- visual (RGB)
- pan_analytic
- ms_analytic
- cloud_mask_raster (includes clouds and cloud shadows)
- terrain_shadow_mask_raster
- water_mask_raster
- healthy_vegetation_mask_raster
- ms_saturation_mask_raster
- pan_flare_mask_raster
Like the data masks, these are available as attributes on tile objects. We'll show a thumbnail of the visual image by reading from the COG's overviews. You can see the water matches the water_mask geometry.
See the ARD Order Deliveries section of the User Guide for more details about the asset files.
%matplotlib inline
from rasterio.plot import show
with tile.visual as src:
    # read from overviews
    arr = src.read(out_shape=(3, int(src.height / 64), int(src.width / 64)))
    show(arr)
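Passing a smaller out_shape to Rasterio's read() requests a decimated read, which GDAL can satisfy from a COG's overviews instead of the full-resolution pixels. A sketch of how the shape in the call above is derived; the 17408-pixel raster size is a made-up value for illustration, and thumbnail_shape is a hypothetical helper:

```python
def thumbnail_shape(height, width, bands=3, factor=64):
    """Return a (bands, rows, cols) out_shape reduced by `factor`.

    Mirrors the int(src.height / 64) arithmetic used above; max(1, ...)
    guards against a zero-sized dimension for very small rasters.
    """
    return (bands, max(1, int(height / factor)), max(1, int(width / factor)))

# Hypothetical raster dimensions, chosen only to show the arithmetic.
print(thumbnail_shape(17408, 17408))
```

A larger factor gives a smaller thumbnail and a faster read, at the cost of detail.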
ARDCollections can be subset with the following keywords:
- aoi: almost any geometry object or string representation
- cat_id_in: catalog ID list
- zone_in: zone number list
- earliest_date: start date (inclusive) as YYYY-MM-DD (for now)
- latest_date: end date (inclusive) as YYYY-MM-DD (for now)
print('Full collection:', len(collection.tiles))
aoi = 'POINT(-122.446 37.793)' # a point over the western half of San Francisco
aoi_coll = ARDCollection('s3://maxar-ard-samples/v4/sample-001/', aoi=aoi, public=True)
print("AOI'ed collection:", len(aoi_coll.tiles))
Full collection: 6
AOI'ed collection: 3
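The earliest_date and latest_date keywords are both inclusive. A plain-Python sketch of that behavior, using the collection's dates printed earlier as sample data (in_date_range is a hypothetical helper, not an SDK function):

```python
from datetime import date

def in_date_range(day, earliest=None, latest=None):
    """Inclusive range test on YYYY-MM-DD strings; None means unbounded."""
    d = date.fromisoformat(day)
    if earliest is not None and d < date.fromisoformat(earliest):
        return False
    if latest is not None and d > date.fromisoformat(latest):
        return False
    return True

dates = ['2016-09-22', '2016-09-23', '2016-09-30', '2016-10-08']
kept = [d for d in dates if in_date_range(d, '2016-09-23', '2016-09-30')]
print(kept)
```

Both boundary dates survive the filter, which is what "inclusive" means for the two keywords.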
Example: plot a thumbnail for every tile in an AOI-filtered collection
from rasterio.plot import show
for tile in aoi_coll.tiles:
    with tile.visual as src:
        # read overview
        arr = src.read(out_shape=(3, int(src.height / 64), int(src.width / 64)))
        show(arr)
You can run idiomatic Python like:
collection = ARDCollection('s3://maxar-ard-samples/v4/sample-002/', aoi=aoi)
# generator expression: keep only WorldView-3 tiles
filtered = (tile for tile in collection.tiles if tile.properties['platform'] == 'WV03')
for tile in filtered:
    pass  # do something with the tile
Acquisitions and Stacks¶
As shown above, ARDCollections can also return Acquisitions and Stacks. Both are list-like containers of ARDTile objects and have 'getter' methods. See the ARD Select tutorial for more detail about Stacks and Acquisitions.
# get an acquisition by position...
acquisition = collection.acquisitions[0]
# ...or by catalog ID
acquisition = collection.get_acquisition('1040010022712A00')
print(acquisition)
print(acquisition.properties)
<Acquisition at 1040010022712A00 (2 tiles)>
{'catalog_id': '1040010022712A00', 'platform': 'WV03', 'datetime': '2016-09-23 19:36:58Z'}
acquisition.get_tile_from_cell('z10-120020223023')
<ARDTile of 1040010022712A00 at z10-120020223023>
An acquisition can also be read by Rasterio. Acquisition.open_acquisition() returns a reader that treats the tiles of an acquisition as one large raster. The reader will attempt to color balance the source images to appear seamless, using the tile that has the largest overlap with the AOI as the reference image.
bounds = (-122.44897283332135, 37.79138422369229, -122.44808031104897, 37.7920925019562)
# Convert the bounding box to a geometry object
from shapely.geometry import box
aoi = box(*bounds)
# Read the "visual" asset pixels from the acquisition without having to access individual tiles
reader = acquisition.open_acquisition()
data = reader.read(aoi, 'visual')
show(data)
<AxesSubplot:>
A Stack is a collection of ARD tiles from different acquisitions that cover the same ARD grid cell.
# get a stack by position...
stack = collection.stacks[0]
# ...or by cell ID
stack = collection.get_stack('z10-120020223023')
print(stack)
for tile in stack:
    print(tile.date, tile.acq_id)
<Stack at Z10-120020223023 (3 tiles)>
2016-09-22 104001002124FA00
2016-09-23 1040010022712A00
2016-09-30 103001005C2E5E00
Local files¶
You can copy a collection's tiles locally. This will only copy the tiles that match any provided filters. The ARD folder structure is preserved:
from max_ard.ard_collection import copy
copy(aoi_coll, './my_tiles')
# ./my_tiles/ard-demo/38/033133232332/2014-05-30/103001003151DE00-visual.tif
A local ARDCollection of structured data can be accessed in the same way:
c = ARDCollection('./my_tiles')
Tiles from the same acquisition have the same file name. If local copies are needed in a flat structure, the copy tool can give them flat filenames instead of arranging them in a folder structure.
copy(aoi_coll, './my_flat_tiles', flat=True)
# ./my_tiles/ard-demo/38-033133232332-2014-05-30-103001003151DE00-visual.tif
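The flat layout folds a tile's directory levels into its file name. A sketch of that mapping for the two example paths above, assuming the last four path components (zone, quadkey, date, file name) are joined with hyphens; flat_name is a hypothetical helper and not the SDK's copy logic:

```python
from pathlib import PurePosixPath

def flat_name(structured_path, levels=4):
    """Join the last `levels` path components with '-'."""
    parts = PurePosixPath(structured_path).parts
    return "-".join(parts[-levels:])

path = "./my_tiles/ard-demo/38/033133232332/2014-05-30/103001003151DE00-visual.tif"
print(flat_name(path))
```

Because the zone, quadkey, and date become part of the name, files from different cells and dates no longer collide in a single directory.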
Note: flat files cannot be accessed by an ARDCollection at this time.
c = ARDCollection('./my_flat_tiles')
# Not supported: a flat directory cannot be read as an ARDCollection
Authentication for cloud providers¶
There are multiple ways to supply credentials to the SDK to read ARD tiles. Credentials are needed for two things: interacting with the files using fsspec, and reading raster or vector files using a GDAL virtual file system.
Amazon S3¶
S3 file systems are read using s3fs (https://github.com/fsspec/s3fs), which builds on top of botocore. Raster and vector assets are read using GDAL's /vsis3/ virtual file system. These two systems largely overlap in how they find S3 credentials, so the same methods work for both. They also extend to the AWS CLI tool, so if your method of storing credentials works with aws s3 ls <path>, it is also likely to work with the ARD SDK.
As Maxar is a substantial AWS S3 user, we recommend either using a credentials file or setting the AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID environment variables.
For public S3 buckets, passing public=True when creating an ARDCollection will skip authentication.
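Before creating a collection on a private bucket, it can help to confirm that the two environment variables are visible to the Python process. A minimal pre-flight sketch; missing_aws_env is a hypothetical helper, and it does not detect credentials supplied another way, such as a credentials file:

```python
import os

REQUIRED = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")

def missing_aws_env(env=None):
    """Return the names of required AWS variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]

missing = missing_aws_env()
if missing:
    print("Not set in environment:", ", ".join(missing))
```

An empty list only means the variables are present, not that the keys are valid for the bucket.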
Azure Blob Storage¶
The Azure package for fsspec is not installed by default. You will need to install adlfs separately.
For Azure, while there are several ways to provide credentials, we have found the simplest is to set the AZURE_STORAGE_CONNECTION_STRING environment variable. You can create connection strings from the Container Blade settings menu.
For other options, see adlfs and /vsiaz/.
Public Azure containers require an environment variable to be set in addition to passing public=True. The name of the account that owns the container must be set in the variable AZURE_STORAGE_ACCOUNT_NAME.
Google Cloud Storage¶
The GCS package for fsspec is not installed by default. You will need to install gcsfs separately.
See /vsigs/ for additional access information.
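Because the Azure and Google Cloud backends are optional, one way to see which fsspec file system packages are importable in the current environment is to look them up without importing them. A sketch using only the standard library; the short labels in BACKENDS are arbitrary names chosen for this example:

```python
from importlib.util import find_spec

# fsspec backend packages for the three providers in this section.
BACKENDS = {"s3": "s3fs", "azure": "adlfs", "gcs": "gcsfs"}

# find_spec returns None when a top-level package cannot be found.
available = {name: find_spec(pkg) is not None for name, pkg in BACKENDS.items()}
for name, ok in available.items():
    print(f"{name}: {'installed' if ok else 'missing'}")
```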