
Merge Survey and GPS Trip Data
Source: merge_trips.Rd
End-to-end workflow for matching Kenyan KEFS catch surveys to PDS GPS trips. It loads the device registry, validated surveys, and GPS trips, performs fuzzy matching, merges all records (matched and unmatched), and uploads the result to cloud storage.
Arguments
- conf
Configuration list from read_config() containing:
metadata$airtable$assets: Path to device registry
surveys$kefs$v2$validated$file_prefix: Path to validated surveys
surveys$kefs$v2$merged: Output path for merged data
pds$pds_trips$file_prefix: Path to preprocessed PDS trips data
pds$pds_trips$version: Version of PDS trips data to use
storage$google: Cloud storage settings (key, options, options_coasts)
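The conf fields listed above can be pictured as a nested R list. A minimal sketch, assuming conf is a plain list of the shape read_config() returns; every value below is a hypothetical placeholder, and has_field() is an illustrative helper, not part of the package.

```r
# Hypothetical placeholder values: only the nesting mirrors the documented fields.
conf <- list(
  metadata = list(airtable = list(assets = "airtable/assets")),
  surveys = list(kefs = list(v2 = list(
    validated = list(file_prefix = "kefs/v2/validated"),
    merged = "kefs/v2/merged"
  ))),
  pds = list(pds_trips = list(file_prefix = "pds/trips", version = "latest")),
  storage = list(google = list(key = "gcs", options = list(), options_coasts = list()))
)

# Fields merge_trips() reads, written as "$"-separated paths.
required_fields <- c(
  "metadata$airtable$assets",
  "surveys$kefs$v2$validated$file_prefix",
  "surveys$kefs$v2$merged",
  "pds$pds_trips$file_prefix",
  "pds$pds_trips$version",
  "storage$google$key"
)

# Hypothetical helper: walk the path and report whether a non-NULL value exists.
has_field <- function(x, path) {
  for (key in strsplit(path, "$", fixed = TRUE)[[1]]) {
    if (!is.list(x) || is.null(x[[key]])) return(FALSE)
    x <- x[[key]]
  }
  TRUE
}

# Paths that are absent from conf (empty when the config is complete).
missing_fields <- required_fields[
  !vapply(required_fields, has_field, logical(1), x = conf)
]
```

Checking the paths up front like this would surface a misconfigured file prefix before any download starts; whether the real pipeline does so is not stated on this page.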
Value
Returns NULL invisibly. The function uploads a parquet file to cloud storage containing all merged records with the following structure:
submission_id: Survey identifier (NA for unmatched trips)
landing_date: Landing date
imei: Device IMEI
n_fields_used, n_fields_ok, match_ok: Match quality indicators
trip: GPS trip identifier (NA for unmatched surveys)
started, ended: Trip timestamps
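registration_number_survey, registration_number_trip: Registration number as recorded in the survey and in the trip record, respectively
boat_name_survey, boat_name_trip: Boat name as recorded in the survey and in the trip record, respectively
fisher_name_survey, fisher_name_trip: Fisher name as recorded in the survey and in the trip record, respectively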
Additional survey and trip metadata
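This page does not define how n_fields_used, n_fields_ok, and match_ok are computed. One plausible reading, shown purely as a hypothetical sketch and not the logic of match_surveys_to_gps_trips(), is: count the identifying fields present on both sides (used), count those whose fuzzy similarity passes a threshold (ok), and accept the pair when enough fields agree. The threshold of 0.8, the min_ok of 2, and the helper names are assumptions.

```r
# HYPOTHETICAL sketch: not the package's matching logic.
# Lower-case and drop non-alphanumerics so "KE-123" and "ke123" compare equal.
normalise <- function(x) gsub("[^a-z0-9]", "", tolower(x))

# Similarity in [0, 1] derived from the Levenshtein distance (utils::adist).
similarity <- function(a, b) {
  a <- normalise(a)
  b <- normalise(b)
  d <- utils::adist(a, b)[1, 1]
  1 - d / max(nchar(a), nchar(b), 1)
}

# Score one survey/trip pair on the three identifying fields.
score_pair <- function(survey, trip, threshold = 0.8, min_ok = 2) {
  fields <- c("registration_number", "boat_name", "fisher_name")
  used <- 0L
  ok <- 0L
  for (f in fields) {
    a <- survey[[f]]
    b <- trip[[f]]
    # Skip a field that is missing on either side.
    if (is.null(a) || is.null(b) || is.na(a) || is.na(b)) next
    used <- used + 1L
    if (similarity(a, b) >= threshold) ok <- ok + 1L
  }
  list(n_fields_used = used, n_fields_ok = ok, match_ok = ok >= min_ok)
}

# Toy records with hypothetical values.
survey <- list(registration_number = "KE-123", boat_name = "Baraka", fisher_name = NA)
trip <- list(registration_number = "ke123", boat_name = "Barakaa", fisher_name = "Ali")
res <- score_pair(survey, trip)
```

Here the fisher name is skipped because the survey lacks it, so the pair is judged on two fields, both of which pass the threshold.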
The uploaded dataset includes:
Matched survey-trip pairs (where both submission_id and trip are non-NA)
Unmatched surveys (trip = NA)
Unmatched trips (submission_id = NA)
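The three record types can be told apart from the submission_id and trip columns alone. A minimal sketch in base R; classify_records() is a hypothetical helper and the toy rows are made up.

```r
# Hypothetical helper: label each merged row by its match status.
classify_records <- function(merged) {
  has_survey <- !is.na(merged$submission_id)
  has_trip <- !is.na(merged$trip)
  ifelse(has_survey & has_trip, "matched",
         ifelse(has_survey, "unmatched_survey",
                ifelse(has_trip, "unmatched_trip", "empty")))
}

# Toy data (hypothetical values) with one row of each kind.
merged <- data.frame(
  submission_id = c("s1", "s2", NA),
  trip = c("t1", NA, "t3"),
  stringsAsFactors = FALSE
)
status <- classify_records(merged)
table(status)
```

The "empty" label should never occur in real output, since every row comes from at least a survey or a trip; it is there only so the function is total.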
Details
The function executes the following pipeline:
Load device registry: Downloads Airtable assets from cloud storage and filters for Kenya devices (WorldFish - Kenya, Kenya, Kenya AABS)
Load validated surveys: Downloads preprocessed and validated KEFS v2 surveys from cloud storage
Load GPS trips: Downloads preprocessed PDS trips data from cloud storage using configured file prefix and version
Filter PDS surveys: Selects only surveys where pds = "yes" or "pds"
Match surveys to trips: Runs match_surveys_to_gps_trips() with two-step fuzzy matching (surveys -> registry -> trips)
Merge with full datasets: Combines the matched subset with all unmatched surveys and trips
Upload to cloud storage: Saves merged data as versioned parquet file
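The filtering and merging steps above can be pictured with base R on toy data. A minimal sketch, assuming simple data frames keyed by submission_id and trip; the matcher's output is faked with one hard-coded pair, and this is not the package's own implementation.

```r
# Hypothetical toy inputs; column names follow the Value section.
surveys <- data.frame(
  submission_id = c("s1", "s2", "s3"),
  pds = c("yes", "no", "pds"),
  landing_date = as.Date(c("2024-01-05", "2024-01-05", "2024-01-06")),
  stringsAsFactors = FALSE
)
trips <- data.frame(
  trip = c("t1", "t2"),
  imei = c("359000000000001", "359000000000002"),
  stringsAsFactors = FALSE
)

# Filter PDS surveys: only these would be passed to the matcher.
pds_surveys <- surveys[surveys$pds %in% c("yes", "pds"), ]

# Stand-in for the matcher's output (hypothetical): s1 paired with t1.
matched <- data.frame(submission_id = "s1", trip = "t1", stringsAsFactors = FALSE)

# Merge with full datasets: full outer joins keep every survey and every trip.
merged <- merge(matched, surveys, by = "submission_id", all = TRUE)
# incomparables = NA stops rows with a missing trip from joining each other.
merged <- merge(merged, trips, by = "trip", all = TRUE, incomparables = NA)
```

The result holds one matched pair (s1/t1), two unmatched surveys (s2, which is not a PDS survey, and s3), and one unmatched trip (t2), mirroring the three record types listed under Value.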
Logging
The function logs progress at each step:
Loading device registry
Loading validated surveys
Loading GPS trips from cloud storage
Number of PDS surveys being matched
Merging with full datasets
Final counts (total records and matched pairs)
Uploading merged data to cloud storage
Pipeline Integration
This function is typically run after:
ingest_kefs_surveys_v2() - Downloads raw data from Kobo
preprocess_kefs_surveys_v2() - Cleans and standardizes surveys
validate_kefs_surveys_v2() - Validates catch data
ingest_pds_trips() and preprocess_pds_tracks() - Preprocessed PDS trips data must be available in cloud storage
The output can be downloaded using:
merged_data <- download_parquet_from_cloud(
prefix = conf$surveys$kefs$v2$merged,
provider = conf$storage$google$key,
options = conf$storage$google$options
)
Examples
if (FALSE) { # \dontrun{
# Standard usage - merges and uploads to cloud
conf <- read_config()
merge_trips(conf)
# Download the merged data to analyze
merged_data <- download_parquet_from_cloud(
prefix = conf$surveys$kefs$v2$merged,
provider = conf$storage$google$key,
options = conf$storage$google$options
)
# Count matched vs unmatched
table(
survey = !is.na(merged_data$submission_id),
trip = !is.na(merged_data$trip)
)
} # }