Skip to content

Data Access

Data scientists know that one of the biggest challenges they face is managing data. By delivering imagery that is sorted and structured ARD reduces the time needed to organize data. It also allows access tools to work in an automatic manner.

ARDCollections have methods that can read (and then also write) windows of data. The software handles finding the intersecting images and reading the pixels for you so you don't have to spend time iterating through image tiles yourself.

Deforestation Example

One of the Maxar ARD sample datasets covers deforestation over time near Rio. Looking at one of the tiles in QGIS revealed this dam creating a pond:

rio_feature.png

We'd like to leverage ARD to see when this dam first appeared in the ARD tiles.

In QGIS we drew the red AOI and exported it as GeoJSON to use as bounds for our image chips.

First, let's import some libraries:

# ARDCollection to work with data
from max_ard import ARDCollection

# Fiona and shapely to load our AOI geometry from a GeoJSON file
import fiona

# Modules for displaying and writing images
import os
from rasterio.plot import show
import matplotlib.pyplot as plt
%matplotlib inline

To get started, we'll create an ARDCollection of the Rio sample data.

Note: because the sample data is in a public AWS S3 bucket, for a simpler setup we'll pass public=True to disable authentication.

We can see that there are 24 tiles, and they all cover the same ARD Grid tile: z19-300022033202.

collection = ARDCollection('s3://maxar-ard-samples/v4/rio_deforestation', public=True)
collection.tiles
[<ARDTile of 105041000218BB00 at z19-300022033202>,
 <ARDTile of 1050410001B2E100 at z19-300022033202>,
 <ARDTile of 103001000AC5B000 at z19-300022033202>,
 <ARDTile of 103001000CB3FD00 at z19-300022033202>,
 <ARDTile of 103001001826FC00 at z19-300022033202>,
 <ARDTile of 1050410000818900 at z19-300022033202>,
 <ARDTile of 1050410002473100 at z19-300022033202>,
 <ARDTile of 10300100253F7900 at z19-300022033202>,
 <ARDTile of 1030010026C30800 at z19-300022033202>,
 <ARDTile of 10300100296ED400 at z19-300022033202>,
 <ARDTile of 10504100105F9E00 at z19-300022033202>,
 <ARDTile of 1030010044098C00 at z19-300022033202>,
 <ARDTile of 105001000A224100 at z19-300022033202>,
 <ARDTile of 105001000A3F3700 at z19-300022033202>,
 <ARDTile of 10500100108C6000 at z19-300022033202>,
 <ARDTile of 1050010010B6CE00 at z19-300022033202>,
 <ARDTile of 105001001583E900 at z19-300022033202>,
 <ARDTile of 103001009A654D00 at z19-300022033202>,
 <ARDTile of 103001009D8E0A00 at z19-300022033202>,
 <ARDTile of 103001009E4FAD00 at z19-300022033202>,
 <ARDTile of 105001001BD5A600 at z19-300022033202>,
 <ARDTile of 10300100AC6F6800 at z19-300022033202>,
 <ARDTile of 10300100ABB61700 at z19-300022033202>,
 <ARDTile of 1050010020C7EE00 at z19-300022033202>]

In this case we have selected an AOI that is already in this tile, but what if this collection had tiles all over the globe? How would we know which tiles to open, and how do we read just the window we want?

ARDCollection objects have a method which does this all for you.

(If you run the next cell in a notebook you can read the docstring, but here it is for reference:)

ARDCollection.read_windows(geoms, src_proj=4326)
A generator for reading windows from overlapping tiles in an ARDCollection

Args:
    geoms: A Shapely geometry object, or a list of geometries
    src_proj: An EPSG identifier parseable by Proj4, default is 4326 (WGS84 Geographic)

Yields:
    Tuple of:
        - ARDTile of tile
        - Shapely polygon representing the bounds of the input geometry
            in the tile's UTM coordinate system
        - a Reader function:

            reader(asset)

                Args:
                    asset(str): asset to read: `visual`, `pan`, or `ms`

                Returns:
                    numpy array of data covered by the returned geometry

Example:

geoms = [...]
for tile, geom, reader in read_windows(geoms):
    # can check tile.properties, or fetch tile.clouds
    # can check if `geom` intersects with clouds
    data = reader('visual')
# shows the docstring in a notebook
?ARDCollection.read_windows

We'll need the AOI geometry from the GeoJSON file, so we'll open the file with fiona. read_windows() can work directly with a Fiona dataset reader so there's no need to open the file and parse out geometries. Note that while we've named this chips, there's only one polygon in this file.

chips = fiona.open('rio-chip.geojson')

read_windows() returns a generator that will yield a tuple of the following for every intersecting ARD tile in the ARDCollection:

  • the intersecting ARDTile object
  • the input geometry, reprojected to the tile's UTM coordinate system
  • a reader function: reader(<asset name>)

The reader function, when called with a name of the raster asset (visual, pan, or ms), will return a Numpy array of the window covering the bounds of the input geometry.

So in a very simple example, we'll loop through windows with read_windows() and the features read by Fiona. Then we'll use Rasterio's show method to plot the image:

for tile, geom, reader in collection.read_windows(chips):
    # reader is a function, so you call it with the asset name
    show(reader('visual'))

png

png

png

png

png

png

png

png

png

png

png

png

png

png

png

png

png

png

png

png

png

png

png

png

If you scroll down through the tiles, you'll notice some of them are cut off or even empty of data, and some are covered in clouds.

Since read_chips also returns the projected chip geometry and an ARDTile object, we can do some checks to see if the chip intersects any of the vector ARD masks that are exposed as ARDtile attributes. If they do, we'll skip that tile.

for tile, geom, reader in collection.read_windows(chips):
    # skip if the chip geometry intersects any clouds:
    if geom.intersects(tile.cloud_mask):
        continue
    # skip if the chip geometry intersects any nodata:
    if geom.intersects(tile.no_data_mask):
        continue
    show(reader('visual'))

png

png

png

png

png

png

png

png

png

png

png

png

png

png

png

png

png

While show() is an easy way to view the images, it would be much nicer to see them in grid labeled by date and without the axes labeled.

Let's create a nicer Matplotlib figure:

# set up a new figure
fig = plt.figure(figsize=(12,24))

# Loop through the windows as before
index = 1
for tile, geom, reader in collection.read_windows(chips):
    if geom.intersects(tile.cloud_mask):
        continue
    if geom.intersects(tile.no_data_mask):
        continue

    # Create a subplot
    ax = fig.add_subplot(6,3,index)
    ax.set_axis_off() # turn off the axes
    show(reader('visual'), ax=ax, title=tile.date) # add a title using the tile's date
    index += 1

# makes a nicer looking plot    
plt.tight_layout()

png

Like read_windows(), there is also a matching write_windows that works in a similar way. Instead of a function you can call to read data, the generator returns a function that can be called to write the data.

writer(asset, path, **kwargs)

    Args:
        asset(str): asset to read: `visual`, `pan`, or `ms`
        path(str): path and name to write the file to.

    Kwargs:
        driver(str): default is 'GTIFF', but can be 'JPEG' or any other support GDAL driver
        Additional keywords are passed to Rasterio's `open` method to create the output file.

    Returns:
        None
try:
    os.mkdir('./chips')
except:
    pass

for tile, geom, writer in collection.write_windows(chips):
    # filter
    if geom.intersects(tile.cloud_mask):
        continue
    if geom.intersects(tile.no_data_mask):
        continue

    # generate a name and path for the chip
    name = f'./chips/{tile.acq_id}-chip.jpg'

    # Call the writer function with the source asset name and the path for the chip.
    # Default behavior is to use the GTIFF driver but we'll write JPEGs instead

    writer('visual', name, driver='JPEG')

os.listdir('./chips')
['105001000A3F3700-chip.jpg',
 '105041000218BB00-chip.jpg',
 '103001000AC5B000-chip.jpg.aux.xml',
 '103001009A654D00-chip.jpg.aux.xml',
 '103001000AC5B000-chip.jpg',
 '105001000A3F3700-chip.jpg.aux.xml',
 '103001000CB3FD00-chip.jpg.aux.xml',
 '10300100AC6F6800-chip.jpg',
 '10300100253F7900-chip.jpg',
 '103001009D8E0A00-chip.jpg.aux.xml',
 '1030010044098C00-chip.jpg',
 '103001000CB3FD00-chip.jpg',
 '103001009D8E0A00-chip.jpg',
 '1050010010B6CE00-chip.jpg.aux.xml',
 '105041000218BB00-chip.jpg.aux.xml',
 '10300100296ED400-chip.jpg.aux.xml',
 '105001000A224100-chip.jpg',
 '1050410002473100-chip.jpg',
 '10300100ABB61700-chip.jpg',
 '103001009A654D00-chip.jpg',
 '10300100296ED400-chip.jpg',
 '105001001583E900-chip.jpg',
 '105001001583E900-chip.jpg.aux.xml',
 '105001000A224100-chip.jpg.aux.xml',
 '1030010044098C00-chip.jpg.aux.xml',
 '10500100108C6000-chip.jpg.aux.xml',
 '10500100108C6000-chip.jpg',
 '10300100ABB61700-chip.jpg.aux.xml',
 '1050010010B6CE00-chip.jpg',
 '1050410002473100-chip.jpg.aux.xml',
 '103001001826FC00-chip.jpg.aux.xml',
 '1050010020C7EE00-chip.jpg.aux.xml',
 '10300100253F7900-chip.jpg.aux.xml',
 '103001001826FC00-chip.jpg',
 '10300100AC6F6800-chip.jpg.aux.xml',
 '1050010020C7EE00-chip.jpg']

Putting this all together, a minimalist chipping example that writes chips with incrementally numbered names is:

collection = ARDCollection('path/to/data')
chips = fiona.open('path/to/geojson')
for index, (tile, geom, writer) in enumerate(collection.write_windows(chips)):
    name = f'./chips/chip-{index}.tif'
    writer('visual', name)

The window functions can handle a single geometry or an iterable of geometries for input. If you need a reference to the original geometry object, you can loop over the geometries outside the generator and feed them individually.

collection = ARDCollection('path/to/data')
chips = fiona.open('path/to/geojson')
for chip in chips:
    for index, (tile, geom, writer) in enumerate(collection.write_windows(chip['geometry'])):
        name = f'./chips/chip-{chip["properties"]["id"]}-{tile.acq_id}.tif'
        writer('visual', name)

Reading outside tiles

Since ARD tiles are gridded, what happens when reading a window that spans across tile boundaries?

read_windows and write_windows can handle a window extending beyond the extent of an ARD tile. In this example over San Francisco we read a window that extends beyond the image apron of the tile on the left.

sf-overlap.png

When we plot the windows, we can see that they only contain data and the window has been clipped to the data extents.

collection = ARDCollection('s3://maxar-ard-samples/v4/sample-001', public=True)
collection.tiles
[<ARDTile of 104001002124FA00 at z10-120020223023>,
 <ARDTile of 1040010022712A00 at z10-120020223023>,
 <ARDTile of 103001005C2E5E00 at z10-120020223023>,
 <ARDTile of 1040010022712A00 at z10-120020223032>,
 <ARDTile of 103001005C2E5E00 at z10-120020223032>,
 <ARDTile of 103001005D31F500 at z10-120020223032>]
chips = fiona.open('outside-sf.geojson')
for tile, geom, reader in collection.read_windows(chips):
    # reader is a function, so you call it with the asset name
    show(reader('visual'))

png

png

png

png

png

png

Reading across tiles

As shown above, ARD tiles have an apron of pixels that can help read pixels from windows that aren't completely within the ARD grid cell. However, if you have a window that is larger than the overlapped area, a single tile can't fill the whole window.

It is possible to take ARD tiles from the same acquisition and composite them together as a GDAL VRT (Virtual Raster). However the dynamic range of ARD tiles is adjusted per tile so the stack of tiles over a given grid cell look more consistent over time. This means that in certain cases two neighboring tiles might have slightly different dynamic ranges and a seam would be visible. VRTs also have a set stacking order so there is no logic to pick the best pixel source when merging image tiles together.

To address some of these limitations with VRTs, the SDK lets you open a reader on Acquisition objects and read seamless windows from the acquisition's tiles. The component pixels are histogram matched to the source that covers the largest portion of the window to minimize visible seams.

Acquisition.open_acquisition() returns an AcquisitionReader object. The reader object then has a read() method:

AcquisitionReader.read(aoi, asset, src_proj=4326, match=True) -> NDArray

  • aoi: The AOI to read. This can be almost any geometry object, and if not rectilinear its bounds will be used
  • asset: The asset to read, such as visual
  • src_proj: Optionally an EPSG code of the AOI's projection (default is 4326)
  • match: An optional argument to enable/disable color matching (default is true)

Like rasterio and the window readers, this method returns a Numpy NDArray.

Note: this function does not support reading across UTM zones at this time.

For example, using the ARDcollection above, we can read a window from each acquisition:

# there are four acquisitions in this collection, two have adjacent tiles
collection.acquisitions
[<Acquisition of 104001002124FA00 [<ARDTile at Z10-120020223023>]>,
 <Acquisition of 1040010022712A00 [<ARDTile at Z10-120020223023>, <ARDTile at Z10-120020223032>]>,
 <Acquisition of 103001005C2E5E00 [<ARDTile at Z10-120020223023>, <ARDTile at Z10-120020223032>]>,
 <Acquisition of 103001005D31F500 [<ARDTile at Z10-120020223032>]>]
for acquisition in collection.acquisitions:
    # get a reader at the acquisition level
    reader = acquisition.open_acquisition()
    # loop through the chips (though there's only one polygon in this example)
    for chip in chips:
        show(reader.read(chip, 'visual'))

png

png

png

png

Reading at the acquisition levels returns four windows:

  • One that is incomplete because there is only one tile in the acquisition and it doesn't cover the AOI
  • Two that are completely filled by loading data from adjacent tiles
  • One that is completely filled because the one tile in the acquisition covers the AOI

Back to top