
Preprocess KEFS (CATCH ASSESSMENT QUESTIONNAIRE) Survey Data
preprocess_kefs_surveys_v2.Rd
This function preprocesses raw KEFS (CATCH ASSESSMENT QUESTIONNAIRE) survey data from Google Cloud Storage. It performs data cleaning, transformation, standardization of field names, type conversions, and mapping to standardized taxonomic and gear names using Airtable reference tables.
Usage
preprocess_kefs_surveys_v2(log_threshold = logger::DEBUG)
Value
No return value. Function processes the data and uploads the result as a Parquet file to Google Cloud Storage.
Details
The function performs the following main operations:
Fetches metadata assets: Retrieves taxonomic, gear, vessel, and landing site mappings from Airtable based on the KEFS Kobo form asset ID
Downloads raw data: Retrieves raw survey data from Google Cloud Storage
Extracts trip information: Selects and renames relevant trip-level fields including:
Landing details (date, site, district, BMU)
Fishing ground and JCMA (Joint Co-Management Area) information
Vessel details (type, name, registration, motorization, horsepower)
Trip details (crew size, start/end times, gear, mesh size, fuel)
Catch outcome indicators
Reshapes catch data: Transforms catch details from wide to long format using reshape_priority_species() and reshape_overall_sample()
Type conversions and calculations:
Converts date/time fields to proper datetime format
Calculates trip duration in hours from start and end times
Converts numeric fields (hp, fishers, mesh size, fuel) to appropriate types
Joins trip and catch data: Combines trip information with catch records using full join on submission_id
Standardizes names: Maps survey labels to standardized names using map_surveys():
Taxonomic names to scientific names and alpha3 codes
Gear types to standardized gear names
Vessel types to standardized vessel categories
Landing site codes to full site names
Uploads processed data: Saves preprocessed data as a Parquet file to Google Cloud Storage
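The reshape, duration, and join steps above can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy data frames, the species_1/weight_1 style column names, and the loop-based reshape are assumptions made for illustration, standing in for whatever reshape_priority_species() and reshape_overall_sample() do internally. Only submission_id, fishing_trip_start, fishing_trip_end, trip_duration, and total_catch_weight come from the field names documented here.

```r
# Hypothetical trip-level records (values invented for illustration)
trips <- data.frame(
  submission_id = c("a1", "a2", "a3"),
  fishing_trip_start = c("2024-03-01 05:00:00", "2024-03-01 18:30:00", "2024-03-02 06:00:00"),
  fishing_trip_end = c("2024-03-01 11:30:00", "2024-03-02 02:30:00", "2024-03-02 09:15:00"),
  stringsAsFactors = FALSE
)

# Convert the text timestamps to datetimes, then derive trip duration in hours
trips$fishing_trip_start <- as.POSIXct(trips$fishing_trip_start, tz = "UTC")
trips$fishing_trip_end <- as.POSIXct(trips$fishing_trip_end, tz = "UTC")
trips$trip_duration <- as.numeric(
  difftime(trips$fishing_trip_end, trips$fishing_trip_start, units = "hours")
)

# Hypothetical wide catch table: one column pair per catch slot (assumed layout)
catch_wide <- data.frame(
  submission_id = c("a1", "a2", "a4"),
  species_1 = c("tuna", "sardine", "snapper"),
  weight_1 = c(12.5, 30, 4),
  species_2 = c("snapper", NA, NA),
  weight_2 = c(3, NA, NA),
  stringsAsFactors = FALSE
)

# Wide to long: stack each species/weight pair, then drop empty slots
catch_long <- do.call(rbind, lapply(1:2, function(i) {
  data.frame(
    submission_id = catch_wide$submission_id,
    species = catch_wide[[paste0("species_", i)]],
    total_catch_weight = catch_wide[[paste0("weight_", i)]],
    stringsAsFactors = FALSE
  )
}))
catch_long <- catch_long[!is.na(catch_long$species), ]

# Full join on submission_id: unmatched rows from either side are kept
joined <- merge(trips, catch_long, by = "submission_id", all = TRUE)
```

With a full join, a trip without catch records (a3 here) and a catch record without trip information (a4 here) both survive with missing values in the other table's columns, so nothing is dropped silently before later checks.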
Data Structure
The preprocessed output includes the following key fields:
Trip identifiers: submission_id
Temporal: landing_date, fishing_trip_start, fishing_trip_end, trip_duration
Spatial: district, BMU, landing_site, fishing_ground, jcma, jcma_site
Vessel: vessel_type, boat_name, vessel_reg_number, motorized, hp
Crew: captain_name, no_of_fishers
Gear: gear, mesh_size
Catch: scientific_name, alpha3_code, total_catch_weight, price_per_kg, total_value
Operations: fuel, catch_outcome, catch_shark
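The scientific_name and alpha3_code fields listed above result from the name-standardization step. The sketch below shows one way such a lookup could work in base R. It is an illustration only and does not reproduce map_surveys(): the taxa_lookup table stands in for the Airtable reference table, and the survey_label and catch_taxon column names are hypothetical.

```r
# Hypothetical reference table standing in for the Airtable taxonomic mapping
taxa_lookup <- data.frame(
  survey_label = c("yellowfin_tuna", "skipjack_tuna"),
  scientific_name = c("Thunnus albacares", "Katsuwonus pelamis"),
  alpha3_code = c("YFT", "SKJ"),
  stringsAsFactors = FALSE
)

# Hypothetical long-format catch records carrying raw survey labels
catch <- data.frame(
  submission_id = c("a1", "a1", "a2"),
  catch_taxon = c("yellowfin_tuna", "unknown_fish", "skipjack_tuna"),
  stringsAsFactors = FALSE
)

# Look up each survey label; match() keeps the original row order
idx <- match(catch$catch_taxon, taxa_lookup$survey_label)
catch$scientific_name <- taxa_lookup$scientific_name[idx]
catch$alpha3_code <- taxa_lookup$alpha3_code[idx]

# Labels with no entry in the reference table are left as NA and can be reported
unmapped <- unique(catch$catch_taxon[is.na(idx)])
```

Leaving unmatched labels as NA, rather than dropping the rows, keeps the catch record available for review while making gaps in the reference table easy to list.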
Pipeline Integration
This function is part of the KEFS data pipeline sequence:
ingest_kefs_surveys_v2() - Downloads raw data from Kobo
preprocess_kefs_surveys_v2() - Cleans and standardizes data (this function)
Validation step (to be implemented)
Export step (to be implemented)
Examples
if (FALSE) { # \dontrun{
# Preprocess KEFS survey data
preprocess_kefs_surveys_v2()
# Run with custom logging level
preprocess_kefs_surveys_v2(log_threshold = logger::INFO)
} # }