
preprocess_toolbox.dataset.cli


reproject()

Reproject a dataset from one CRS to another CRS.

Example usage:
    preprocess_reproject -v -c ./reproject.era5.day.north.json --workers 8 -ps train \
    -sn train,val,test -ss 2023-1-1,2024-2-1,2024-12-1 -se 2023-12-31,2024-2-14,2024-12-1 \
    -sh 4 -st 1 --source-crs 'EPSG:4326' --target-crs 'EPSG:6931' --shape 500 \
    --ease2 data.aws.day.north.json proc.aws

This command reprojects an ERA5 lat/lon grid (EPSG:4326) to an EASE-Grid 2.0 grid
(EPSG:6931) with an output shape of (500, 500). Only dates within the defined
splits are processed: train (2023-1-1 to 2023-12-31), val (2024-2-1 to 2024-2-14)
and test (2024-12-1 to 2024-12-1), as given by `-sn`, `-ss` and `-se`.
The `-sh 4` and `-st 1` flags add 4 days before the start and 1 day after the end.
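The effect of the head and tail padding on the example dates can be worked out with a short sketch. It assumes that `-ss` and `-se` give each split's start and end date, and that `-sh`/`-st` widen every split individually; `pad_split` is a hypothetical helper written for illustration, not a function of preprocess_toolbox.

```python
from datetime import date, timedelta

# Split boundaries read from the example command (-sn / -ss / -se).
splits = {
    "train": (date(2023, 1, 1), date(2023, 12, 31)),
    "val": (date(2024, 2, 1), date(2024, 2, 14)),
    "test": (date(2024, 12, 1), date(2024, 12, 1)),
}


def pad_split(start, end, head_days, tail_days):
    """Hypothetical helper: widen a date range by head/tail days."""
    return start - timedelta(days=head_days), end + timedelta(days=tail_days)


# -sh 4 and -st 1 from the example command (assumed to apply per split).
padded = {
    name: pad_split(start, end, head_days=4, tail_days=1)
    for name, (start, end) in splits.items()
}

for name, (start, end) in padded.items():
    print(name, start.isoformat(), end.isoformat())
# train 2022-12-28 2024-01-01
# val 2024-01-28 2024-02-15
# test 2024-11-27 2024-12-02
```

Under these assumptions the source data must cover dates from 2022-12-28 onwards for the train split to be processed in full.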
Source code in preprocess_toolbox/dataset/cli.py
def reproject():
    """
    Reproject a dataset from one CRS to another CRS.

    Example usage:
        preprocess_reproject -v -c ./reproject.era5.day.north.json --workers 8 -ps train \
        -sn train,val,test -ss 2023-1-1,2024-2-1,2024-12-1 -se 2023-12-31,2024-2-14,2024-12-1 \
        -sh 4 -st 1 --source-crs 'EPSG:4326' --target-crs 'EPSG:6931' --shape 500 \
        --ease2 data.aws.day.north.json proc.aws

        This command reprojects an ERA5 lat/lon grid (EPSG:4326) to an EASE-Grid 2.0 grid
        (EPSG:6931) with an output shape of (500, 500). Only dates within the defined
        splits are processed: train (2023-1-1 to 2023-12-31), val (2024-2-1 to 2024-2-14)
        and test (2024-12-1 to 2024-12-1), as given by `-sn`, `-ss` and `-se`.
        The `-sh 4` and `-st 1` flags add 4 days before the start and 1 day after the end.
    """
    args = (
        ProcessingArgParser()
        .add_destination()
        .add_splits()
        .add_extra_args(
            [
                (("-w", "--workers"), dict(default=1, type=int)),
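                # NOTE: argparse never applies `default` when `required=True`,
                # so the default below only documents the expected value.
                (("-sc", "--source-crs"), dict(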
                        default="EPSG:4326",
                        type=str,
                        required=True,
                        help="Source dataset CRS definition: EPSG code (e.g., `EPSG:4326`)",
                )),
                (("-tc", "--target-crs"), dict(
                        default="EPSG:6931",
                        type=str,
                        required=False,
                        help="Target dataset CRS definition: Full cartopy.crs expression (e.g., `EPSG:6931`)",
                )),
                (("-r", "--resolution"), dict(
                        default=None,
                        type=float,
                        required=False,
                        help="Resolution of output grid (in meters or degrees). Can only specify either `--resolution` or `--shape`, not both",
                )),
                (("-s", "--shape"), dict(
                        default="720,720",
                        type=str,
                        required=False,
                        help="Shape of output grid (in pixels, e.g. '720,720'). Can only specify either `--resolution` or `--shape`, not both",
                )),
                (("-e", "--ease2"), dict(
                        action="store_true",
                        help="Enable to output an EASE-Grid 2.0 conforming grid",
                )),
                (("-cn", "--coarsen"), dict(
                        default=1,
                        type=int,
                        help="Coarsen the output grid by this integer factor.",
                )),
                (("-in", "--interpolate-nans"), dict(
                        action="store_true",
                        help="Enable nearest neighbour interpolation to fill in missing areas.",
                )),
            ]
        )
        .parse_args()
    )
    # Initially copy across the source data from `./data/` to the destination
    # `./processed_data/`
    ds, ds_config = init_dataset(args)
    # Reproject and overwrite the copied data
    reproject_datasets_from_config(
        ds_config,
        source_crs=args.source_crs,
        target_crs=args.target_crs,
        resolution=args.resolution,
        shape=args.shape,
        ease2=args.ease2,
        coarsen=args.coarsen,
        interpolate_nans=args.interpolate_nans,
        workers=args.workers,
    )
    ds_config.save_config()
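
The `--shape` option is declared as a string, and the example passes `--shape 500` to obtain a (500, 500) grid while the default is `'720,720'`. How the string is converted is not shown in this listing; the sketch below is one plausible way to do it. `parse_shape` is a hypothetical name, not part of preprocess_toolbox, and treating a single value as a square grid is an assumption based on the example.

```python
def parse_shape(value: str) -> tuple[int, int]:
    """Hypothetical helper: turn '500' or '720,720' into a 2-tuple of ints.

    Assumes a single value means a square grid, as the example suggests.
    """
    parts = [int(part) for part in value.split(",")]
    if any(part <= 0 for part in parts):
        raise ValueError(f"Shape values must be positive: {value!r}")
    if len(parts) == 1:
        return parts[0], parts[0]
    if len(parts) == 2:
        return parts[0], parts[1]
    raise ValueError(f"Expected one or two comma-separated integers: {value!r}")


print(parse_shape("500"))      # (500, 500)
print(parse_shape("720,720"))  # (720, 720)
```

Raising `ValueError` keeps malformed input such as `'1,2,3'` from reaching the reprojection step silently.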