Skip to contents

Data Pipeline Workflow

Core functions that execute each step in the data pipeline, from data ingestion through validation, analysis, and export to MongoDB and cloud storage.

export_api_raw()
Export Raw API-Ready Trip Data
export_api_validated()
Export Validated API-Ready Trip Data
export_validation_flags()
Export Validation Flags to MongoDB
export_wf_data()
Export WorldFish Summary Data to MongoDB
get_validation_status()
Get Validation Status from KoboToolbox
ingest_wcs_surveys()
Ingest WCS Catch Survey Data
ingest_wf_surveys()
Ingest WF Catch Survey Data
match_surveys_to_gps_trips()
Match Catch Surveys to GPS Trips
merge_trips()
Merge Survey and GPS Trip Data
preprocess_ba_surveys()
Pre-process Blue Alliance Surveys
preprocess_wcs_surveys()
Pre-process Zanzibar WCS Surveys
preprocess_wf_surveys()
Pre-process and Combine WorldFish Surveys - Both Versions
sync_validation_submissions()
Synchronize Validation Statuses with KoboToolbox
update_validation_status()
Update Validation Status in KoboToolbox
validate_ba_surveys()
Validate Blue Alliance (BA) Surveys Data
validate_wcs_surveys()
Validate WCS Surveys Data
validate_wf_surveys()
Validate worldfish Survey Data

Data Ingestion

Functions for pulling data from external sources (KoboToolbox, Pelagic Data Systems) and transforming it into standardized formats.

ingest_wcs_surveys()
Ingest WCS Catch Survey Data
ingest_wf_surveys()
Ingest WF Catch Survey Data

Cloud Storage Management

Functions for interacting with cloud storage providers (Google Cloud Storage, MongoDB), uploading, downloading, and managing data files in various formats.

Data Preprocessing

Functions for cleaning, transforming, and structuring raw data into standardized formats ready for analysis, including data nesting, reshaping, and trip processing.

calculate_catch()
Calculate Catch Weight from Length-Weight Relationships or Bucket Measurements
getLWCoeffs()
Get Length-Weight Coefficients and Morphological Data for Species
get_airtable_form_id()
Get Airtable Form ID from KoBoToolbox Asset ID
get_fao_groups()
Extract and Format FAO Taxonomic Groups
get_length_weight_batch()
Get Length-Weight and Morphological Parameters for Species (Batch Version)
get_species_areas_batch()
Get FAO Areas for Species (Batch Version)
load_taxa_databases()
Load Taxa Data from FishBase and SeaLifeBase
map_surveys()
Map Survey Labels to Standardized Taxa, Gear, and Vessel Names
match_species_from_taxa()
Match Species from Taxa Databases
preprocess_ba_surveys()
Pre-process Blue Alliance Surveys
preprocess_wcs_surveys()
Pre-process Zanzibar WCS Surveys
preprocess_wf_surveys()
Pre-process and Combine WorldFish Surveys - Both Versions
process_species_list()
Process Species List with Taxonomic Information
reshape_catch_data()
Reshape Catch Data with Length Groupings
reshape_catch_data_v2()
Reshape Catch Data with Length Groupings - Version 2
reshape_species_groups()
Reshape Species Groups from Wide to Long Format

Data Mining & Summarization

Functions for enriching fisheries data with scientific information, taxonomic classification, biological parameters, and creating summary datasets for analysis.

calculate_catch()
Calculate Catch Weight from Length-Weight Relationships or Bucket Measurements
expand_taxa()
Expand Taxonomic Vectors into a Data Frame
getLWCoeffs()
Get Length-Weight Coefficients and Morphological Data for Species
get_fao_groups()
Extract and Format FAO Taxonomic Groups
get_length_weight_batch()
Get Length-Weight and Morphological Parameters for Species (Batch Version)
get_species_areas_batch()
Get FAO Areas for Species (Batch Version)
load_taxa_databases()
Load Taxa Data from FishBase and SeaLifeBase
match_species_from_taxa()
Match Species from Taxa Databases
process_species_list()
Process Species List with Taxonomic Information

Data Modeling & Analysis

Functions for statistical modeling, fleet activity estimation, and scaling sample-based GPS data to fleet-wide estimates using boat registry information.

Data Validation

Functions for validating fisheries data through quality checks, statistical outlier detection, and applying domain-specific validation rules.

add_validation_flags()
Add validation flags to catch data
aggregate_survey_data()
Aggregate survey data and calculate metrics
calculate_catch_revenue()
Calculate catch revenue from validated data
export_validation_flags()
Export Validation Flags to MongoDB
extract_trips_info()
Extract trip information from preprocessed surveys
get_catch_bounds()
Get catch bounds for survey data
get_length_bounds()
Get length bounds for survey data
get_validation_status()
Get Validation Status from KoboToolbox
process_catch_data()
Process catch data from surveys
sync_validation_submissions()
Synchronize Validation Statuses with KoboToolbox
update_validation_status()
Update Validation Status in KoboToolbox
validate_ba_surveys()
Validate Blue Alliance (BA) Surveys Data
validate_catches()
Validate catches using quality flags
validate_prices()
Validate market prices
validate_wcs_surveys()
Validate WCS Surveys Data
validate_wf_surveys()
Validate worldfish Survey Data

Data Export & Visualization

Functions for exporting processed data to MongoDB collections, creating geographic visualizations, and preparing data for portals and reporting.

export_api_raw()
Export Raw API-Ready Trip Data
export_api_validated()
Export Validated API-Ready Trip Data
export_wf_data()
Export WorldFish Summary Data to MongoDB
kepler_mapper()
Generate a Kepler.gl map

Pipeline Orchestration

High-level functions that orchestrate complete analysis pipelines, combining multiple processing steps into integrated workflows.

Helper Functions

Utility functions that support the main pipeline operations, providing common data manipulation and processing capabilities.

add_version()
Add timestamp and sha string to a file name
get_airtable_form_id()
Get Airtable Form ID from KoBoToolbox Asset ID
map_surveys()
Map Survey Labels to Standardized Taxa, Gear, and Vessel Names
read_config()
Read configuration file

Airtable Integration

Functions for interacting with Airtable API.

airtable_to_df()
Get All Records from Airtable with Pagination
get_writable_fields()
Get Writable Fields from Airtable Table
update_airtable_record()
Update Single Airtable Record
bulk_update_airtable()
Bulk Update Multiple Airtable Records
df_to_airtable()
Create New Airtable Records
device_sync()
Sync Data with Airtable (Update + Create)