Data Pipeline Workflow

Core functions that execute each step in the data pipeline, from data ingestion to validation and export.

export_data()
Export Processed Fisheries Data to MongoDB
export_wf_data()
Export WorldFish Survey Data
get_validation_status()
Get Validation Status from KoboToolbox
ingest_pds_tracks()
Ingest Pelagic Data Systems (PDS) Track Data
ingest_pds_trips()
Ingest Pelagic Data Systems (PDS) Trip Data
ingest_surveys()
Ingest WCS and WF Catch Survey Data
preprocess_ba_surveys()
Pre-process Blue Alliance Surveys
preprocess_pds_tracks()
Preprocess Pelagic Data Systems (PDS) Track Data
preprocess_wcs_surveys()
Pre-process Zanzibar WCS Surveys
preprocess_wf_surveys()
Pre-process WorldFish Surveys
sync_validation_submissions()
Synchronize Validation Statuses with KoboToolbox
update_validation_status()
Update Validation Status in KoboToolbox
validate_ba_surveys()
Validate Blue Alliance (BA) Surveys Data
validate_wcs_surveys()
Validate WCS Surveys Data
validate_wf_surveys()
Validate WorldFish Survey Data

Data Ingestion

Functions for pulling data from external sources (KoboToolbox, Pelagic Data Systems) and transforming it into standardized formats.

get_trip_points()
Get Trip Points from Pelagic Data Systems API
get_trips()
Retrieve Trip Details from Pelagic Data API
ingest_pds_tracks()
Ingest Pelagic Data Systems (PDS) Track Data
ingest_pds_trips()
Ingest Pelagic Data Systems (PDS) Trip Data
ingest_surveys()
Ingest WCS and WF Catch Survey Data
retrieve_surveys()
Retrieve Surveys from KoboToolbox

Cloud Storage Management

Functions for interacting with cloud storage and database services (Google Cloud Storage, MongoDB): uploading, downloading, and managing data files in various formats.

cloud_object_name()
Retrieve Full Name of Versioned Cloud Object
cloud_storage_authenticate()
Authenticate to a Cloud Storage Provider
download_cloud_file()
Download Object from Cloud Storage
download_parquet_from_cloud()
Download Parquet File from Cloud Storage
get_metadata()
Get metadata tables
get_preprocessed_surveys()
Download Preprocessed Surveys
get_validated_surveys()
Download Validated Surveys
mdb_collection_pull()
Retrieve Data from MongoDB
mdb_collection_push()
Upload Data to MongoDB and Overwrite Existing Content
upload_cloud_file()
Upload File to Cloud Storage
upload_parquet_to_cloud()
Upload Processed Data to Cloud Storage

Data Preprocessing

Functions for cleaning, transforming, and structuring raw data into standardized formats ready for analysis, including data nesting and reshaping.

calculate_catch()
Calculate Catch Weight from Length-Weight Relationships or Bucket Measurements
generate_track_summaries()
Generate Grid Summaries for Track Data
getLWCoeffs()
Get Length-Weight Coefficients and Morphological Data for Species
get_fao_groups()
Extract and Format FAO Taxonomic Groups
get_length_weight_batch()
Get Length-Weight and Morphological Parameters for Species (Batch Version)
get_species_areas_batch()
Get FAO Areas for Species (Batch Version)
load_taxa_databases()
Load Taxa Data from FishBase and SeaLifeBase
match_species_from_taxa()
Match Species from Taxa Databases
preprocess_ba_surveys()
Pre-process Blue Alliance Surveys
preprocess_pds_tracks()
Preprocess Pelagic Data Systems (PDS) Track Data
preprocess_track_data()
Preprocess Track Data into Spatial Grid Summary
preprocess_wcs_surveys()
Pre-process Zanzibar WCS Surveys
preprocess_wf_surveys()
Pre-process WorldFish Surveys
process_species_list()
Process Species List with Taxonomic Information
reshape_catch_data()
Reshape Catch Data with Length Groupings
reshape_species_groups()
Reshape Species Groups from Wide to Long Format

Data Mining & Enrichment

Functions for enriching fisheries data with scientific information, taxonomic classification, and biological parameters (length-weight relationships).

calculate_catch()
Calculate Catch Weight from Length-Weight Relationships or Bucket Measurements
expand_taxa()
Expand Taxonomic Vectors into a Data Frame
getLWCoeffs()
Get Length-Weight Coefficients and Morphological Data for Species
get_fao_groups()
Extract and Format FAO Taxonomic Groups
get_length_weight_batch()
Get Length-Weight and Morphological Parameters for Species (Batch Version)
get_species_areas_batch()
Get FAO Areas for Species (Batch Version)
load_taxa_databases()
Load Taxa Data from FishBase and SeaLifeBase
match_species_from_taxa()
Match Species from Taxa Databases
process_species_list()
Process Species List with Taxonomic Information
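
As a rough illustration of the length-weight relationships mentioned above, the sketch below applies the standard allometric form W = a * L^b in base R. The helper `lw_weight_g()` and the coefficient values are hypothetical and chosen only for illustration; they are not taken from FishBase or from this package, and `calculate_catch()` may work differently.

```r
# Hypothetical helper (not part of the package): estimate individual weight
# in grams from length in centimetres using W = a * L^b.
lw_weight_g <- function(length_cm, a, b) {
  a * length_cm^b
}

# Illustrative coefficients only; real values are species-specific.
lengths <- c(10, 20, 30)
weights <- lw_weight_g(lengths, a = 0.01, b = 3)

# Total catch weight in kilograms for the measured individuals.
total_kg <- sum(weights) / 1000
```

In practice the coefficients `a` and `b` would be looked up per species, for example from the tables returned by the length-weight functions listed in this section.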

Data Validation

Functions for validating fisheries data through quality checks, statistical outlier detection, and applying domain-specific validation rules.

add_validation_flags()
Add validation flags to catch data
aggregate_survey_data()
Aggregate survey data and calculate metrics
calculate_catch_revenue()
Calculate catch revenue from validated data
extract_trips_info()
Extract trip information from preprocessed surveys
get_catch_bounds()
Get catch bounds for survey data
get_length_bounds()
Get length bounds for survey data
get_validation_status()
Get Validation Status from KoboToolbox
process_catch_data()
Process catch data from surveys
sync_validation_submissions()
Synchronize Validation Statuses with KoboToolbox
update_validation_status()
Update Validation Status in KoboToolbox
validate_ba_surveys()
Validate Blue Alliance (BA) Surveys Data
validate_catches()
Validate catches using quality flags
validate_prices()
Validate market prices
validate_wcs_surveys()
Validate WCS Surveys Data
validate_wf_surveys()
Validate WorldFish Survey Data
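
To make the idea of statistical outlier detection concrete, here is a minimal base R sketch that flags values above an upper bound of median + k * MAD. The function `flag_outliers()`, the threshold `k = 3`, and the sample data are assumptions for illustration; the bounds actually computed by `get_catch_bounds()` and `get_length_bounds()` may be derived differently.

```r
# Hypothetical helper (not part of the package): flag values that exceed
# an upper bound of median + k * MAD. Missing values are never flagged.
flag_outliers <- function(x, k = 3) {
  centre <- stats::median(x, na.rm = TRUE)
  spread <- stats::mad(x, na.rm = TRUE)
  upper <- centre + k * spread
  !is.na(x) & x > upper
}

# Made-up catch weights (kg); the last value is implausibly large.
catch_kg <- c(12, 15, 14, 13, 16, 250)
flags <- flag_outliers(catch_kg)
```

A robust centre and spread (median and MAD) are used here rather than mean and standard deviation so that a single extreme record does not inflate the bound it is tested against.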

Data Export & Visualization

Functions for exporting processed data to various formats and creating visualizations for reporting and analysis.

create_geos()
Generate Geographic Regional Summaries of Fishery Data
export_data()
Export Processed Fisheries Data to MongoDB
export_wf_data()
Export WorldFish Survey Data
kepler_mapper()
Generate a Kepler.gl map

Helper Functions

Utility functions that support the main pipeline operations, providing common data manipulation and processing capabilities.

add_version()
Add timestamp and sha string to a file name
read_config()
Read configuration file
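
The sketch below shows one way a timestamp and sha string could be added to a file name. The function `versioned_name()`, its arguments, and the `__<timestamp>_<sha>__` layout are hypothetical; the format produced by `add_version()` is not documented on this page and may differ.

```r
# Hypothetical helper (not part of the package): insert a UTC timestamp and
# a short sha between the file stem and its extension.
versioned_name <- function(file_name, sha, time = Sys.time()) {
  stamp <- format(time, "%Y%m%d%H%M%S", tz = "UTC")
  ext <- tools::file_ext(file_name)
  stem <- tools::file_path_sans_ext(file_name)
  versioned <- paste0(stem, "__", stamp, "_", sha, "__")
  # Re-attach the extension only when the original name had one.
  if (nzchar(ext)) paste0(versioned, ".", ext) else versioned
}

# A fixed time keeps the example reproducible.
fixed_time <- as.POSIXct("2024-01-02 03:04:05", tz = "UTC")
result <- versioned_name("surveys.parquet", sha = "abc1234", time = fixed_time)
```

Passing the time and sha as arguments, rather than reading them inside the function, keeps the helper easy to test.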