
Downloads PDS GPS tracks and applies a statistical model to predict fishing activity, uploading results to cloud storage. Implements version-aware incremental processing: trips already predicted with the current model version are skipped, while files from outdated model versions are deleted and reprocessed.
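The version-aware selection described above can be illustrated with a minimal sketch. The inventory data frame, its column names, and the version labels are all hypothetical; the way the package actually records model versions is not documented here.

```r
# Hypothetical inventory of prediction files already in storage.
existing <- data.frame(
  trip = c("t1", "t2", "t3"),
  model_version = c("v2", "v1", "v2"),
  stringsAsFactors = FALSE
)
current_version <- "v2"  # assumed label for the current model version
candidate_trips <- c("t1", "t2", "t3", "t4")

# Trips already predicted with the current version can be skipped.
up_to_date <- existing$trip[existing$model_version == current_version]

# Files from any other version are stale: they would be deleted and redone.
stale <- existing$trip[existing$model_version != current_version]

# Everything that is not up to date still needs (re)processing.
to_process <- setdiff(candidate_trips, up_to_date)
```

In this toy case `t2` is stale and `t4` is new, so both end up in `to_process`, while `t1` and `t3` are skipped.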

Usage

predict_pds_tracks(
  log_threshold = logger::DEBUG,
  date_from = "2023-01-01",
  n_workers = NULL,
  batch_size = 500L,
  max_trip_days = 5,
  package = "coasts"
)

Arguments

log_threshold

The logging threshold to use. Default is `logger::DEBUG`.

date_from

Character. Start date for trip retrieval in "YYYY-MM-DD" format. Default is `"2023-01-01"`.

n_workers

Integer or NULL. Number of parallel workers for the fetch stage. If `NULL` (the default), `parallel::detectCores() - 1` workers are used.

batch_size

Integer. Number of tracks to process per prediction batch. Default is 500.

max_trip_days

Numeric or NULL. Trips whose timestamp range exceeds this number of days are skipped before prediction (they are recorded as `"skipped_too_long"` in the summary). This guards against corrupted or never-reset device tracks that would cause extreme memory and CPU usage. Default is `5`. Set to `NULL` to disable the filter.
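As a rough illustration of this kind of duration filter, the sketch below computes each trip's timestamp range in days and flags the trips that exceed the threshold. The `points` data frame and its `trip` and `time` columns are assumptions made for the example, not the package's internal representation.

```r
# Hypothetical track points; the column names are assumptions.
points <- data.frame(
  trip = c("a", "a", "b", "b"),
  time = as.POSIXct(
    c("2023-01-01 06:00:00", "2023-01-01 18:00:00",
      "2023-01-01 06:00:00", "2023-01-20 06:00:00"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)
max_trip_days <- 5

# Span of each trip in days, from its first to its last timestamp.
span_days <- tapply(
  as.numeric(points$time), points$trip,
  function(x) diff(range(x)) / 86400
)

# NULL disables the filter; otherwise flag trips longer than the threshold.
too_long <- if (is.null(max_trip_days)) {
  character(0)
} else {
  names(span_days)[span_days > max_trip_days]
}
```

Here trip `a` spans half a day and is kept, while trip `b` spans 19 days and would be reported as `"skipped_too_long"`.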

package

Name of the package whose `inst/conf.yml` to read. Defaults to `"coasts"`. Pass your own package name when calling from a downstream package with a compatible configuration.

Value

Invisibly returns a data frame summarising the outcome for each trip (columns: `trip`, `status`).
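Because the result is returned invisibly, it has to be assigned to be inspected. The sketch below works on a mock data frame with the documented columns. Only `"skipped_too_long"` is a documented status value; the other labels are placeholders.

```r
# Mock of the returned summary. "ok" and "failed" are placeholder labels.
summary_df <- data.frame(
  trip = c("t1", "t2", "t3", "t4"),
  status = c("ok", "ok", "skipped_too_long", "failed"),
  stringsAsFactors = FALSE
)

# Count trips per outcome.
status_counts <- table(summary_df$status)

# List the trips dropped by the max_trip_days filter.
skipped <- summary_df$trip[summary_df$status == "skipped_too_long"]
```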

Details

The pipeline runs in two stages:

1. **Parallel fetch** (I/O-bound): Downloads raw track points for all new trips using multiple workers.
2. **Sequential predict + upload** (Python-bound): Applies the `ssfaitk` statistical model and uploads fishing-only points as parquet files to the PDS storage bucket.
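The two-stage shape can be sketched with base R's `parallel` package. This is not the package's implementation: `fetch_trip()` and `predict_batch()` are hypothetical stand-ins for the real download and predict/upload steps, and the use of `mclapply()` is an assumption.

```r
library(parallel)

# Hypothetical stand-ins for the real fetch and predict + upload steps.
fetch_trip <- function(trip_id) {
  data.frame(trip = trip_id, n_points = 10L, stringsAsFactors = FALSE)
}
predict_batch <- function(tracks) {
  data.frame(
    trip = vapply(tracks, function(x) x$trip, character(1)),
    status = "processed",
    stringsAsFactors = FALSE
  )
}

trip_ids <- sprintf("trip_%03d", 1:7)
batch_size <- 3L
n_workers <- NULL

# Resolve the NULL default to "all cores but one", with a floor of 1.
if (is.null(n_workers)) {
  n_workers <- max(1L, detectCores() - 1L, na.rm = TRUE)
}
# mclapply() forks, which is unavailable on Windows, so fall back to 1 core.
if (.Platform$OS.type == "windows") n_workers <- 1L

# Stage 1: I/O-bound fetch, spread across workers.
tracks <- mclapply(trip_ids, fetch_trip, mc.cores = n_workers)

# Stage 2: sequential prediction, batch_size tracks at a time.
batch_id <- ceiling(seq_along(tracks) / batch_size)
results <- lapply(split(tracks, batch_id), predict_batch)
summary_df <- do.call(rbind, results)
```

Keeping the second stage sequential matches the "Python-bound" description: the work goes through a single Python session, so only the I/O-bound fetch is spread across workers.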

Requires the `ssfaitk` package for fishing activity classification and a working Python environment accessible via `reticulate`. Set the `RETICULATE_PYTHON` environment variable to specify the Python interpreter path when running in CI or container environments.
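One way to set the interpreter from within R is shown below. The path is a placeholder and should be replaced with the interpreter that has `ssfaitk` installed.

```r
# Placeholder path: substitute the Python interpreter for your environment.
Sys.setenv(RETICULATE_PYTHON = "/usr/bin/python3")

# reticulate reads this variable when it initialises Python, so set it
# before the first reticulate call in the session.
python_path <- Sys.getenv("RETICULATE_PYTHON")
```

The variable can also be set outside R, for example in `.Renviron` or in the CI or container configuration, so that it is already in place when `predict_pds_tracks()` is called.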

See also

`get_trips()`, `get_trip_points()`, `resolve_storage_opts()`