Opening and Exploring the Geo Images

1) Install dependencies

pip install geopandas shapely rioxarray rasterio xarray "dask[complete]" pandas zarr netCDF4

2) Load the GeoJSON (AOIs / footprints)

import geopandas as gpd

gdf = gpd.read_file("aoi.geojson")   # FeatureCollection

# If there are multiple features, pick one:
aoi = gdf.geometry.iloc[0]
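
When the file holds several features, it can help to list them before choosing an index. The sketch below uses only the standard library and assumes the GeoJSON is a FeatureCollection of Polygon or MultiPolygon features. The helper names (iter_positions, feature_bounds, summarize) and the sample coordinates are illustrative and not part of geopandas or the original data.

```python
import json


def iter_positions(coords):
    """Yield (x, y) pairs from arbitrarily nested GeoJSON coordinate arrays."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords[0], coords[1]
    else:
        for part in coords:
            yield from iter_positions(part)


def feature_bounds(feature):
    """Return (minx, miny, maxx, maxy) for one GeoJSON feature."""
    xs, ys = zip(*iter_positions(feature["geometry"]["coordinates"]))
    return min(xs), min(ys), max(xs), max(ys)


def summarize(geojson_text):
    """Return a list of (index, geometry type, bounds), one per feature."""
    collection = json.loads(geojson_text)
    return [
        (i, f["geometry"]["type"], feature_bounds(f))
        for i, f in enumerate(collection["features"])
    ]


# Tiny inline FeatureCollection (made-up coordinates) standing in for aoi.geojson.
sample = json.dumps({
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {},
        "geometry": {
            "type": "Polygon",
            "coordinates": [[[0, 0], [2, 0], [2, 1], [0, 1], [0, 0]]],
        },
    }],
})

for row in summarize(sample):
    print(row)
```

Once the file is loaded with geopandas, gdf.bounds reports the same per-feature extents, so this is mainly useful for a quick look before installing the full stack.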

3) Prepare a list of image paths (local or S3 COGs)

If you have downloaded the files, use their local paths. To read COGs directly from the public S3 bucket through GDAL, use /vsis3/ paths and set the environment variables for unsigned access:

import os

os.environ["AWS_NO_SIGN_REQUEST"] = "YES"
os.environ["AWS_REGION"] = "us-west-2"

Example paths:

cog_paths = [
    "local/path/to/image1.tif",
    "local/path/to/image2.tif",
    # or S3 (GDAL):
    # "/vsis3/capella-open-data/data/<prefix>/image1.tif",
]
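
GDAL's /vsis3/ paths follow the pattern /vsis3/<bucket>/<key>, so an s3:// URL can be converted mechanically. A minimal sketch: to_gdal_path is a hypothetical helper, and the bucket and key in the example are placeholders rather than real object names.

```python
from urllib.parse import urlparse


def to_gdal_path(path):
    """Map s3://bucket/key to /vsis3/bucket/key; return other paths unchanged."""
    parsed = urlparse(path)
    if parsed.scheme == "s3":
        return f"/vsis3/{parsed.netloc}{parsed.path}"
    return path


# Placeholder inputs for illustration only.
examples = [
    "s3://example-bucket/some/prefix/image1.tif",
    "local/path/to/image2.tif",
]
print([to_gdal_path(p) for p in examples])
```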

4) Open, clip to the GeoJSON AOI, align grids, and stack

This version uses rioxarray (xarray-friendly) and clips each image to the AOI, then reprojects/resamples everything to match the first image before stacking along a time dimension.

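import rioxarray as rxr
import xarray as xr
import pandas as pd

# One timestamp per entry in cog_paths, in the same order:
times = pd.to_datetime([
    "2026-01-05T12:00:00Z",
    "2026-02-02T12:00:00Z",
], utc=True).tz_convert(None)  # naive UTC; xarray time coordinates are timezone-naive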

arrays = []
ref = None

for path, t in zip(cog_paths, times):
    da = rxr.open_rasterio(path, masked=True, chunks={"x": 2048, "y": 2048})  # dims: band,y,x
    da = da.squeeze("band", drop=True)  # if single-band; keep band if you need it

    # Ensure AOI is in the same CRS as the raster
    aoi_in_crs = gpd.GeoSeries([aoi], crs=gdf.crs).to_crs(da.rio.crs).iloc[0]

    # Clip to AOI (by default keeps pixels whose centers fall inside the polygon)
    da = da.rio.clip([aoi_in_crs], da.rio.crs, drop=True)

    # Align to a common grid
    if ref is None:
        ref = da
    else:
        da = da.rio.reproject_match(ref)

    da = da.assign_coords(time=t).expand_dims("time")
    arrays.append(da)

stack = xr.concat(arrays, dim="time").sortby("time")  # dims: time,y,x

print(stack)
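
Note that zip stops at the shorter of its inputs, so a mismatch between cog_paths and times would silently drop images from the stack. A small guard along these lines could run before the loop. This is only a sketch: pair_paths_with_times is a hypothetical helper name, and the file names are placeholders.

```python
from datetime import datetime, timezone


def pair_paths_with_times(paths, timestamps):
    """Pair each path with a parsed UTC timestamp, failing loudly on a mismatch."""
    if len(paths) != len(timestamps):
        raise ValueError(
            f"{len(paths)} paths but {len(timestamps)} timestamps"
        )
    parsed = [
        # Older Python versions of fromisoformat() reject a trailing "Z".
        datetime.fromisoformat(ts.replace("Z", "+00:00")).astimezone(timezone.utc)
        for ts in timestamps
    ]
    return list(zip(paths, parsed))


pairs = pair_paths_with_times(
    ["image1.tif", "image2.tif"],
    ["2026-01-05T12:00:00Z", "2026-02-02T12:00:00Z"],
)
for path, t in pairs:
    print(path, t.isoformat())
```

On Python 3.10 or later, zip(cog_paths, times, strict=True) gives the same length check directly in the loop header.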

5) Save the stack for fast reuse

For time-series work, Zarr is usually the most convenient:

stack.to_dataset(name="backscatter").to_zarr("sar_stack.zarr", mode="w")

# Or NetCDF:

stack.to_netcdf(“sar_stack.nc”)