Downloads PDS GPS tracks and applies a statistical model to predict fishing activity, uploading results to cloud storage. Implements version-aware incremental processing: trips already predicted with the current model version are skipped, while files from outdated model versions are deleted and reprocessed.
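The version-aware skip/delete/reprocess decision described above can be sketched in a few lines of base R. This is only an illustration, not the package's implementation: the helper `plan_incremental()` and the file naming scheme `<trip>__<version>.parquet` are assumptions made for the example.

```r
# Hypothetical naming scheme: "<trip_id>__<model_version>.parquet".
# The real storage layout used by the package may differ.
plan_incremental <- function(trip_ids, existing_files, current_version) {
  # Recover trip id and model version from each stored file name
  stems <- sub("\\.parquet$", "", existing_files)
  parts <- strsplit(stems, "__", fixed = TRUE)
  file_trip <- vapply(parts, `[`, character(1), 1)
  file_version <- vapply(parts, `[`, character(1), 2)

  is_current <- file_version == current_version
  list(
    # Already predicted with the current model version: nothing to do
    skip = intersect(trip_ids, file_trip[is_current]),
    # Written by an older model version: remove before reprocessing
    delete = existing_files[!is_current],
    # New trips plus trips whose only output is outdated
    process = setdiff(trip_ids, file_trip[is_current])
  )
}

plan <- plan_incremental(
  trip_ids = c("101", "102", "103"),
  existing_files = c("101__v2.parquet", "102__v1.parquet"),
  current_version = "v2"
)
```

Here trip `101` is skipped, the outdated file for trip `102` is marked for deletion, and trips `102` and `103` are queued for processing.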
Usage
predict_pds_tracks(
log_threshold = logger::DEBUG,
date_from = "2023-01-01",
n_workers = NULL,
batch_size = 500L,
max_trip_days = 5,
package = "coasts"
)
Arguments
- log_threshold
The logging threshold to use. Default is `logger::DEBUG`.
- date_from
Character. Start date for trip retrieval in "YYYY-MM-DD" format. Default is `"2023-01-01"`.
- n_workers
Integer or NULL. Number of parallel workers for the fetch stage. If `NULL` (the default), uses `parallel::detectCores() - 1`.
- batch_size
Integer. Number of tracks to process per prediction batch. Default is 500.
- max_trip_days
Numeric or NULL. Trips whose timestamp range exceeds this number of days are skipped before prediction (they are recorded as `"skipped_too_long"` in the summary). This guards against corrupted or never-reset device tracks that would cause extreme memory and CPU usage. Default is `5`. Set to `NULL` to disable the filter.
- package
Name of the package whose `inst/conf.yml` to read. Defaults to `"coasts"`. Pass your own package name when calling from a downstream package with a compatible configuration.
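As a rough illustration of the `max_trip_days` filter, the sketch below computes each trip's timestamp range and flags trips that exceed the limit. The helper `flag_long_trips()`, the column names `trip` and `time`, and the `"kept"` status label are assumptions for the example; only `"skipped_too_long"` comes from the documentation above.

```r
# Hypothetical helper: flag trips whose timestamp range exceeds max_trip_days.
flag_long_trips <- function(points, max_trip_days = 5) {
  secs <- as.numeric(points$time)  # seconds since the epoch
  # Span between first and last point of each trip, in days
  span_days <- tapply(secs, points$trip, function(x) (max(x) - min(x)) / 86400)

  too_long <- if (is.null(max_trip_days)) {
    rep(FALSE, length(span_days))  # NULL disables the filter
  } else {
    as.vector(span_days > max_trip_days)
  }

  data.frame(
    trip = names(span_days),
    # "kept" is a placeholder label, not a status documented by the package
    status = ifelse(too_long, "skipped_too_long", "kept"),
    stringsAsFactors = FALSE
  )
}

points <- data.frame(
  trip = c("A", "A", "B", "B"),
  time = as.POSIXct(
    c("2023-01-01 06:00:00", "2023-01-01 18:00:00",
      "2023-01-01 06:00:00", "2023-02-15 06:00:00"),
    tz = "UTC"
  )
)

result <- flag_long_trips(points, max_trip_days = 5)
```

Trip `A` spans half a day and is kept, while trip `B` spans 45 days and is flagged, which is the kind of never-reset device track the filter is meant to catch.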
Value
Invisibly returns a data frame summarising the outcome for each trip (columns: `trip`, `status`).
Details
The pipeline runs in two stages:
1. **Parallel fetch** (I/O-bound): Downloads raw track points for all new trips using multiple workers.
2. **Sequential predict + upload** (Python-bound): Applies the `ssfaitk` statistical model and uploads fishing-only points as parquet files to the PDS storage bucket.
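The two-stage shape can be sketched as follows. This is a minimal outline under stated assumptions, not the function's actual code: `fetch_track()`, `predict_and_upload()`, and `run_pipeline()` are hypothetical stand-ins, and `parallel::mclapply()` is used here only as one possible parallel backend.

```r
library(parallel)

# Hypothetical stand-in for downloading one trip's raw track points
fetch_track <- function(trip_id) {
  data.frame(trip = trip_id, lon = c(39.1, 39.2), lat = c(-6.1, -6.2))
}

# Hypothetical stand-in for model prediction and parquet upload
predict_and_upload <- function(track) {
  "processed"
}

run_pipeline <- function(trip_ids, n_workers = NULL, batch_size = 500L) {
  if (is.null(n_workers)) {
    cores <- parallel::detectCores()
    n_workers <- if (is.na(cores)) 1L else max(1L, cores - 1L)
  }
  # mclapply forks, which is unavailable on Windows, so fall back to one core
  if (.Platform$OS.type == "windows") n_workers <- 1L

  # Stage 1: I/O-bound fetch, spread across workers
  tracks <- parallel::mclapply(trip_ids, fetch_track, mc.cores = n_workers)

  # Stage 2: sequential prediction and upload, one batch at a time
  batches <- split(tracks, ceiling(seq_along(tracks) / batch_size))
  status <- character(0)
  for (batch in batches) {
    status <- c(status, unname(vapply(batch, predict_and_upload, character(1))))
  }

  invisible(data.frame(trip = trip_ids, status = status, stringsAsFactors = FALSE))
}

res <- run_pipeline(c("t1", "t2", "t3"), n_workers = 2L, batch_size = 2L)
```

Keeping the second stage sequential means the Python-bound model runs in a single process, while the download stage, which mostly waits on the network, is the part that benefits from extra workers.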
Requires the `ssfaitk` package for fishing activity classification and a working Python environment accessible via `reticulate`. Set the `RETICULATE_PYTHON` environment variable to specify the Python interpreter path when running in CI or container environments.
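One way to set the interpreter from within R is shown below. The path is a placeholder and `check_python()` is a hypothetical helper; the variable must be set before `reticulate` initialises Python, otherwise it has no effect for the current session.

```r
# Placeholder path: replace with the interpreter that has the model installed
python_path <- "/usr/bin/python3"
Sys.setenv(RETICULATE_PYTHON = python_path)

# Hypothetical helper: report which Python reticulate will bind to.
# Call it only where reticulate is installed and the path above exists.
check_python <- function() {
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    stop("The reticulate package is not installed")
  }
  reticulate::py_config()
}
```

In CI or container setups the same variable can instead be exported in the job or image environment, so that no R code has to run before the pipeline starts.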