Data Access
Data scientists know that managing data is one of their biggest challenges. By delivering imagery that is already sorted and structured, ARD reduces the time needed to organize data and lets access tools work automatically.
ARDCollections have methods that can read (and also write) windows of data. The software finds the intersecting images and reads the pixels for you, so you don't have to iterate through image tiles yourself.
Deforestation Example
One of the Maxar ARD sample datasets covers deforestation over time near Rio. Looking at one of the tiles in QGIS revealed this dam creating a pond:
We'd like to leverage ARD to see when this dam first appeared in the ARD tiles.
In QGIS we drew the red AOI and exported it as GeoJSON to use as bounds for our image chips.
First, let's import some libraries:
# ARDCollection to work with data
from max_ard import ARDCollection
# Fiona to load our AOI geometry from a GeoJSON file
import fiona
# Modules for displaying and writing images
import os
from rasterio.plot import show
import matplotlib.pyplot as plt
%matplotlib inline
To get started, we'll create an ARDCollection of the Rio sample data.
Note: because the sample data is in a public AWS S3 bucket, for a simpler setup we'll pass public=True to disable authentication.
We can see that there are 24 tiles, and they all cover the same ARD Grid tile: z19-300022033202.
collection = ARDCollection('s3://maxar-ard-samples/v4/rio_deforestation', public=True)
collection.tiles
[<ARDTile of 105041000218BB00 at z19-300022033202>,
<ARDTile of 1050410001B2E100 at z19-300022033202>,
<ARDTile of 103001000AC5B000 at z19-300022033202>,
<ARDTile of 103001000CB3FD00 at z19-300022033202>,
<ARDTile of 103001001826FC00 at z19-300022033202>,
<ARDTile of 1050410000818900 at z19-300022033202>,
<ARDTile of 1050410002473100 at z19-300022033202>,
<ARDTile of 10300100253F7900 at z19-300022033202>,
<ARDTile of 1030010026C30800 at z19-300022033202>,
<ARDTile of 10300100296ED400 at z19-300022033202>,
<ARDTile of 10504100105F9E00 at z19-300022033202>,
<ARDTile of 1030010044098C00 at z19-300022033202>,
<ARDTile of 105001000A224100 at z19-300022033202>,
<ARDTile of 105001000A3F3700 at z19-300022033202>,
<ARDTile of 10500100108C6000 at z19-300022033202>,
<ARDTile of 1050010010B6CE00 at z19-300022033202>,
<ARDTile of 105001001583E900 at z19-300022033202>,
<ARDTile of 103001009A654D00 at z19-300022033202>,
<ARDTile of 103001009D8E0A00 at z19-300022033202>,
<ARDTile of 103001009E4FAD00 at z19-300022033202>,
<ARDTile of 105001001BD5A600 at z19-300022033202>,
<ARDTile of 10300100AC6F6800 at z19-300022033202>,
<ARDTile of 10300100ABB61700 at z19-300022033202>,
<ARDTile of 1050010020C7EE00 at z19-300022033202>]
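The observation that every tile sits in the same grid cell can also be checked from the cell IDs themselves. The following is a minimal sketch that assumes the IDs follow a `z<UTM zone>-<quadkey>` layout, as the listing above suggests; `parse_cell_id` is a hypothetical helper written for this illustration, not part of max_ard.

```python
from collections import Counter


def parse_cell_id(cell_id):
    """Split a cell ID like 'z19-300022033202' into (zone, quadkey).

    Assumes the 'z<UTM zone>-<quadkey>' layout suggested by the listing above.
    """
    zone_part, quadkey = cell_id.split("-", 1)
    if not zone_part.startswith("z"):
        raise ValueError(f"unexpected cell ID: {cell_id!r}")
    return int(zone_part[1:]), quadkey


# Cell IDs copied from the first three entries of the listing above
cell_ids = ["z19-300022033202", "z19-300022033202", "z19-300022033202"]

# One distinct key means every tile covers the same grid cell
counts = Counter(cell_ids)
zone, quadkey = parse_cell_id(cell_ids[0])
print(len(counts), zone, quadkey)
```

A collection spread across many cells would produce more than one key in `counts`, which is the situation the next paragraphs address.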
In this case we have selected an AOI that is already in this tile, but what if this collection had tiles all over the globe? How would we know which tiles to open, and how do we read just the window we want?
ARDCollection objects have a method which does this all for you.
(If you run the next cell in a notebook you can read the docstring, but here it is for reference:)
ARDCollection.read_windows(geoms, src_proj=4326)
    A generator for reading windows from overlapping tiles in an ARDCollection
    Args:
        geoms: A Shapely geometry object, or a list of geometries
        src_proj: An EPSG identifier parseable by Proj4, default is 4326 (WGS84 Geographic)
    Yields:
        Tuple of:
            - ARDTile of tile
            - Shapely polygon representing the bounds of the input geometry
              in the tile's UTM coordinate system
            - a Reader function:
                reader(asset)
                Args:
                    asset(str): asset to read: `visual`, `pan`, or `ms`
                Returns:
                    numpy array of data covered by the returned geometry
    Example:
        geoms = [...]
        for tile, geom, reader in read_windows(geoms):
            # can check tile.properties, or fetch tile.clouds
            # can check if `geom` intersects with clouds
            data = reader('visual')
# shows the docstring in a notebook
?ARDCollection.read_windows
We'll need the AOI geometry from the GeoJSON file, so we'll open the file with fiona. read_windows() can work directly with a Fiona dataset reader, so there's no need to parse the geometries out of the file yourself. Note that while we've named this chips, there's only one polygon in this file.
chips = fiona.open('rio-chip.geojson')
read_windows() returns a generator that yields a tuple of the following for every intersecting ARD tile in the ARDCollection:
- the intersecting ARDTile object
- the input geometry, reprojected to the tile's UTM coordinate system
- a reader function: reader(<asset name>)
The reader function, when called with the name of a raster asset (visual, pan, or ms), returns a NumPy array of the window covering the bounds of the input geometry.
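To make "the window covering the bounds" concrete, here is a rough sketch of the arithmetic that turns map-coordinate bounds into a pixel window on a north-up raster. This is not max_ard's implementation; `bounds_to_window` is a hypothetical helper, and the origin, pixel size, and AOI bounds are made-up values chosen for easy arithmetic.

```python
import math


def bounds_to_window(bounds, origin_x, origin_y, pixel_size):
    """Convert (minx, miny, maxx, maxy) bounds to a pixel window.

    Assumes a north-up raster whose top-left corner is at (origin_x, origin_y)
    with square pixels of side `pixel_size` in map units.
    Returns (row_off, col_off, height, width), rounded outward so the
    window fully covers the bounds.
    """
    minx, miny, maxx, maxy = bounds
    col_start = math.floor((minx - origin_x) / pixel_size)
    col_stop = math.ceil((maxx - origin_x) / pixel_size)
    # Rows count downward from the top edge, so maxy gives the first row
    row_start = math.floor((origin_y - maxy) / pixel_size)
    row_stop = math.ceil((origin_y - miny) / pixel_size)
    return row_start, col_start, row_stop - row_start, col_stop - col_start


# Hypothetical values: a 100 m x 50 m AOI on a raster with 0.5 m pixels
window = bounds_to_window(
    (500100.0, 8899900.0, 500200.0, 8899950.0),
    origin_x=500000.0,
    origin_y=8900000.0,
    pixel_size=0.5,
)
print(window)
```

With read_windows() none of this bookkeeping is needed, since the reader function already returns only the pixels inside the window.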
So in a very simple example, we'll loop through the windows that read_windows() yields for the features read by Fiona, then use Rasterio's show function to plot each image:
for tile, geom, reader in collection.read_windows(chips):
    # reader is a function, so you call it with the asset name
    show(reader('visual'))
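Since the goal is to find when the dam first appeared, it helps to view the chips in chronological order. The sketch below works on any iterable of (tile, geom, reader) tuples, which is the shape read_windows() yields. It assumes each tile exposes its acquisition date as `tile.date`; that attribute name is an assumption to check against the ARDTile documentation, and `chips_by_date` and `fake_window` are hypothetical helpers. Stand-in tiles are used so the example runs without S3 access.

```python
from datetime import date
from types import SimpleNamespace


def chips_by_date(windows, asset="visual", date_of=lambda tile: tile.date):
    """Read one chip per window and return (date, tile, data) oldest first.

    `date_of` assumes the tile has a `date` attribute (an assumption);
    pass a different accessor if the real attribute differs.
    """
    results = [(date_of(tile), tile, reader(asset)) for tile, _geom, reader in windows]
    return sorted(results, key=lambda item: item[0])


def fake_window(acq_id, acq_date):
    """Build a stand-in (tile, geom, reader) tuple for demonstration only."""
    tile = SimpleNamespace(acq_id=acq_id, date=acq_date)
    return tile, None, lambda asset: f"{asset} pixels for {acq_id}"


# Real code would pass collection.read_windows(chips) here instead
windows = [fake_window("B", date(2019, 6, 1)), fake_window("A", date(2011, 3, 15))]
ordered = chips_by_date(windows)
print([tile.acq_id for _date, tile, _data in ordered])
```

Plotting the sorted chips one after another with show would then make it possible to spot the first image in which the pond is visible.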