Package index
Data Pipeline Workflow
Core functions that execute each step in the data pipeline, from data ingestion through validation, analysis, and export to MongoDB and cloud storage.
-
calculate_district_totals()
- Calculate District-Level Total Catch and Revenue
-
calculate_monthly_trip_stats()
- Calculate Monthly Trip Statistics by District
-
estimate_fleet_activity()
- Estimate Fleet-Wide Activity from Sample Data
-
export_wf_data()
- Export WorldFish Summary Data to MongoDB
-
generate_fleet_analysis()
- Generate Complete Fleet Activity Analysis Pipeline
-
get_validation_status()
- Get Validation Status from KoboToolbox
-
ingest_pds_tracks()
- Ingest Pelagic Data Systems (PDS) Track Data
-
ingest_pds_trips()
- Ingest Pelagic Data Systems (PDS) Trip Data
-
ingest_surveys()
- Ingest WCS and WF Catch Survey Data
-
prepare_boat_registry()
- Prepare Boat Registry Data from Metadata
-
preprocess_ba_surveys()
- Pre-process Blue Alliance Surveys
-
preprocess_pds_tracks()
- Preprocess Pelagic Data Systems (PDS) Track Data
-
preprocess_wcs_surveys()
- Pre-process Zanzibar WCS Surveys
-
preprocess_wf_surveys()
- Pre-process WorldFish Surveys
-
process_trip_data()
- Process Trip Data with District Information
-
summarize_data()
- Summarize WorldFish Survey Data
-
sync_validation_submissions()
- Synchronize Validation Statuses with KoboToolbox
-
update_validation_status()
- Update Validation Status in KoboToolbox
-
validate_ba_surveys()
- Validate Blue Alliance (BA) Surveys Data
-
validate_wcs_surveys()
- Validate WCS Surveys Data
-
validate_wf_surveys()
- Validate WorldFish (WF) Survey Data
Data Ingestion
Functions for pulling data from external sources (KoboToolbox, Pelagic Data Systems) and transforming it into standardized formats.
-
get_trip_points()
- Get Trip Points from Pelagic Data Systems API
-
get_trips()
- Retrieve Trip Details from Pelagic Data API
-
ingest_pds_tracks()
- Ingest Pelagic Data Systems (PDS) Track Data
-
ingest_pds_trips()
- Ingest Pelagic Data Systems (PDS) Trip Data
-
ingest_surveys()
- Ingest WCS and WF Catch Survey Data
-
retrieve_surveys()
- Retrieve Surveys from KoboToolbox
Cloud Storage Management
Functions for interacting with Google Cloud Storage and MongoDB: authenticating, uploading, downloading, and managing data files in various formats.
-
cloud_object_name()
- Retrieve Full Name of Versioned Cloud Object
-
cloud_storage_authenticate()
- Authenticate to a Cloud Storage Provider
-
download_cloud_file()
- Download Object from Cloud Storage
-
download_parquet_from_cloud()
- Download Parquet File from Cloud Storage
-
get_metadata()
- Get metadata tables
-
get_preprocessed_surveys()
- Download Preprocessed Surveys
-
get_validated_surveys()
- Download Validated Surveys
-
mdb_collection_pull()
- Retrieve Data from MongoDB
-
mdb_collection_push()
- Upload Data to MongoDB and Overwrite Existing Content
-
upload_cloud_file()
- Upload File to Cloud Storage
-
upload_parquet_to_cloud()
- Upload Processed Data to Cloud Storage
Data Preprocessing
Functions for cleaning, transforming, and structuring raw data into standardized formats ready for analysis, including data nesting, reshaping, and trip processing.
-
calculate_catch()
- Calculate Catch Weight from Length-Weight Relationships or Bucket Measurements
-
generate_track_summaries()
- Generate Grid Summaries for Track Data
-
getLWCoeffs()
- Get Length-Weight Coefficients and Morphological Data for Species
-
get_fao_groups()
- Extract and Format FAO Taxonomic Groups
-
get_length_weight_batch()
- Get Length-Weight and Morphological Parameters for Species (Batch Version)
-
get_species_areas_batch()
- Get FAO Areas for Species (Batch Version)
-
load_taxa_databases()
- Load Taxa Data from FishBase and SeaLifeBase
-
match_species_from_taxa()
- Match Species from Taxa Databases
-
prepare_boat_registry()
- Prepare Boat Registry Data from Metadata
-
preprocess_ba_surveys()
- Pre-process Blue Alliance Surveys
-
preprocess_pds_tracks()
- Preprocess Pelagic Data Systems (PDS) Track Data
-
preprocess_track_data()
- Preprocess Track Data into Spatial Grid Summary
-
preprocess_wcs_surveys()
- Pre-process Zanzibar WCS Surveys
-
preprocess_wf_surveys()
- Pre-process WorldFish Surveys
-
process_species_list()
- Process Species List with Taxonomic Information
-
process_trip_data()
- Process Trip Data with District Information
-
reshape_catch_data()
- Reshape Catch Data with Length Groupings
-
reshape_species_groups()
- Reshape Species Groups from Wide to Long Format
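Several entries above derive catch weight from length-weight relationships. As a rough illustration of that idea only, the sketch below applies the standard allometric form W = a * L^b in base R. The helper `lw_weight_g()`, the column names, and the coefficient values are hypothetical and are not taken from the package; `calculate_catch()` and `getLWCoeffs()` may work differently.

```r
# Hypothetical helper: weight in grams from length in cm via W = a * L^b.
# `a` and `b` are species-specific coefficients; the values used below are
# illustrative only and do not describe any real species.
lw_weight_g <- function(length_cm, a, b) {
  stopifnot(is.numeric(length_cm), all(length_cm >= 0, na.rm = TRUE))
  a * length_cm^b
}

# Made-up catch records: number of fish counted in each length class.
catch <- data.frame(
  length_cm = c(10, 20, 30),
  n_fish    = c(5, 3, 1)
)

# Weight of a single fish per length class, then the total catch in kg.
catch$weight_g <- lw_weight_g(catch$length_cm, a = 0.01, b = 3)
total_kg <- sum(catch$weight_g * catch$n_fish) / 1000
print(total_kg)
```

In practice the coefficients would come from a taxonomic database lookup rather than being hard-coded as they are here.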
Data Mining & Summarization
Functions for enriching fisheries data with scientific information, taxonomic classification, biological parameters, and creating summary datasets for analysis.
-
calculate_catch()
- Calculate Catch Weight from Length-Weight Relationships or Bucket Measurements
-
expand_taxa()
- Expand Taxonomic Vectors into a Data Frame
-
getLWCoeffs()
- Get Length-Weight Coefficients and Morphological Data for Species
-
get_fao_groups()
- Extract and Format FAO Taxonomic Groups
-
get_length_weight_batch()
- Get Length-Weight and Morphological Parameters for Species (Batch Version)
-
get_species_areas_batch()
- Get FAO Areas for Species (Batch Version)
-
load_taxa_databases()
- Load Taxa Data from FishBase and SeaLifeBase
-
match_species_from_taxa()
- Match Species from Taxa Databases
-
process_species_list()
- Process Species List with Taxonomic Information
-
summarize_data()
- Summarize WorldFish Survey Data
Data Modeling & Analysis
Functions for statistical modeling, fleet activity estimation, and scaling sample-based GPS data to fleet-wide estimates using boat registry information.
-
estimate_fleet_activity()
- Estimate Fleet-Wide Activity from Sample Data
-
calculate_district_totals()
- Calculate District-Level Total Catch and Revenue
-
calculate_monthly_trip_stats()
- Calculate Monthly Trip Statistics by District
-
generate_fleet_analysis()
- Generate Complete Fleet Activity Analysis Pipeline
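To make the idea of scaling sample-based data to fleet-wide estimates more concrete, here is a minimal base R sketch. It assumes a simple approach: the mean number of trips per tracked boat in each district, multiplied by the number of registered boats in that district. The data frames, column names, and the method itself are assumptions for illustration; `estimate_fleet_activity()` may use a different procedure.

```r
# Made-up sample: monthly trip counts for GPS-tracked boats, by district.
sample_activity <- data.frame(
  district       = c("A", "A", "B", "B", "B"),
  trips_in_month = c(10, 14, 8, 6, 10)
)

# Made-up boat registry: total registered boats per district.
registry <- data.frame(
  district = c("A", "B"),
  n_boats  = c(50, 120)
)

# Mean trips per tracked boat in each district.
avg_trips <- aggregate(trips_in_month ~ district, data = sample_activity, FUN = mean)

# Scale the per-boat mean up to the registered fleet size.
est <- merge(avg_trips, registry, by = "district")
est$estimated_fleet_trips <- est$trips_in_month * est$n_boats
print(est)
```

This kind of scaling assumes tracked boats are representative of the registered fleet in each district, which is worth checking before relying on the totals.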
Data Validation
Functions for validating fisheries data through quality checks, statistical outlier detection, and applying domain-specific validation rules.
-
add_validation_flags()
- Add validation flags to catch data
-
aggregate_survey_data()
- Aggregate survey data and calculate metrics
-
calculate_catch_revenue()
- Calculate catch revenue from validated data
-
extract_trips_info()
- Extract trip information from preprocessed surveys
-
get_catch_bounds()
- Get catch bounds for survey data
-
get_length_bounds()
- Get length bounds for survey data
-
get_validation_status()
- Get Validation Status from KoboToolbox
-
process_catch_data()
- Process catch data from surveys
-
sync_validation_submissions()
- Synchronize Validation Statuses with KoboToolbox
-
update_validation_status()
- Update Validation Status in KoboToolbox
-
validate_ba_surveys()
- Validate Blue Alliance (BA) Surveys Data
-
validate_catches()
- Validate catches using quality flags
-
validate_prices()
- Validate market prices
-
validate_wcs_surveys()
- Validate WCS Surveys Data
-
validate_wf_surveys()
- Validate WorldFish (WF) Survey Data
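The section description mentions statistical outlier detection. One common way to do this, shown here only as a sketch, is to flag values that fall outside the median plus or minus k scaled median absolute deviations. The function `flag_outliers()`, the threshold `k = 3`, and the sample values are hypothetical; the bounds computed by `get_catch_bounds()` and `get_length_bounds()` may be derived differently.

```r
# Hypothetical helper: flag values further than k scaled MADs from the median.
flag_outliers <- function(x, k = 3) {
  centre <- median(x, na.rm = TRUE)
  spread <- mad(x, na.rm = TRUE)  # stats::mad() applies the 1.4826 scaling constant
  lower  <- centre - k * spread
  upper  <- centre + k * spread
  # Missing values are never flagged.
  !is.na(x) & (x < lower | x > upper)
}

# Made-up catch weights in kg; the last value is an implausible entry.
catch_kg <- c(12, 15, 14, 13, 16, 250)
flags <- flag_outliers(catch_kg)
print(data.frame(catch_kg, flagged = flags))
```

A median-based rule is used in this sketch because a single extreme value barely shifts the median or the MAD, whereas it would inflate a mean and standard deviation.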
Data Export & Visualization
Functions for exporting processed data to MongoDB collections, creating geographic visualizations, and preparing data for portals and reporting.
-
create_geos()
- Generate Geographic Regional Summaries of Fishery Data
-
create_geos_v1()
- Generate Geographic Regional Summaries of Fishery Data (Version 1)
-
export_wf_data()
- Export WorldFish Summary Data to MongoDB
-
kepler_mapper()
- Generate a Kepler.gl map
Pipeline Orchestration
High-level functions that orchestrate complete analysis pipelines, combining multiple processing steps into integrated workflows.
-
generate_fleet_analysis()
- Generate Complete Fleet Activity Analysis Pipeline
-
summarize_data()
- Summarize WorldFish Survey Data
Helper Functions
Utility functions that support the main pipeline operations, providing common data manipulation and processing capabilities.
-
add_version()
- Add timestamp and SHA string to a file name
-
read_config()
- Read configuration file
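As an illustration of what versioning a file name with a timestamp and a commit SHA can look like, the base R sketch below builds such a name. The function `version_file_name()`, its arguments, and the resulting name layout are hypothetical; they are not the signature or output format of `add_version()`.

```r
# Hypothetical helper: build "<prefix>_<UTC timestamp>_<sha>.<extension>".
# The `sha` value would normally come from the current git commit; it is
# passed in explicitly here to keep the sketch self-contained.
version_file_name <- function(prefix, extension, sha, time = Sys.time()) {
  stamp <- format(time, "%Y%m%d%H%M%S", tz = "UTC")
  paste0(prefix, "_", stamp, "_", sha, ".", extension)
}

# Fixed time and made-up SHA so the result is reproducible.
fixed_time <- as.POSIXct("2024-01-02 03:04:05", tz = "UTC")
file_name  <- version_file_name("surveys", "parquet", sha = "abc1234", time = fixed_time)
print(file_name)
```

Embedding both values in the name lets a stored object be traced back to when it was produced and which code version produced it.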