Skip to contents

peskas.mozambique.data.pipeline 1.0.0

Major Changes

  • GPS Tracking Integration with Pelagic Data Systems (PDS): Full support for vessel tracking data ingestion and preprocessing.
  • Airtable Integration Module: Complete suite of functions for two-way synchronization with Airtable.
  • Comprehensive Data Validation Framework: Implemented multi-stage validation adapted from Peskas Zanzibar pipeline.
    • Redesigned validate_landings() with 10 validation flags across two stages
    • Stage 1: Basic data quality checks (form completeness, catch info, length validation, bucket/individual counts)
    • Stage 2: Composite economic indicators (price per kg, CPUE, RPUE) following Zanzibar thresholds
    • Created modular validation functions: validate_catch_taxa(), validate_price(), validate_total_catch()
    • Validation results exclude flagged submissions from final dataset while preserving flags for monitoring
  • Taxa Modeling and Species Intelligence: New module for automated species identification and biological data enrichment.

Improvements

  • Storage System Enhancements:
  • Configuration Management:
    • Switched to dotenv package for environment variable management
    • Added load_dotenv() function with configurable .env file paths
    • Updated read_config() to automatically load environment variables
    • Expanded configuration schema to support PDS, Airtable, and multi-cloud storage
    • Added support for separate storage buckets for different data types (surveys vs. tracks)
  • Data Preprocessing Pipeline:
    • Enhanced preprocess_landings() with metadata table joins (landing sites, boats, enumerators)
    • Implemented process_species_group() for handling species group disaggregation
    • Added species validation and enrichment with FishBase/SeaLifeBase data
    • Integrated length-weight conversion using local coefficient database
    • Added habitat information from species area data
    • Improved catch weight calculation with multiple estimation methods
  • Export Functionality:
    • Expanded export_landings() to generate multiple analytical outputs
    • Added calculate_fishery_metrics() for aggregated statistics
    • Created MongoDB portal collections for dashboard integration
    • Implemented trip-level summarization for GPS track data
    • Enhanced data transformation for consumption by visualization tools
  • Workflow Automation:
    • Added GitHub Actions workflow for automated releases (release.yaml)
    • Updated data pipeline workflow with improved error handling and notifications
    • Integrated cloud authentication in CI/CD pipeline
    • Added support for scheduled and manual workflow triggers

Bug Fixes

  • Fixed price validation logic that was incorrectly flagging valid entries (#PR/issue reference if applicable)
  • Corrected global variable bindings in validation functions to prevent R CMD check warnings
  • Removed invalid geo parameter from mdb_collection_push() function call
  • Fixed customer_name and submission_id variable scoping issues using .data$ notation

Infrastructure & Dependencies

  • Added new package dependencies: furrr, future, glue, readr for enhanced functionality
  • Updated .Rbuildignore to exclude development files (.env, .claude, CLAUDE.md)
  • Package now passes R CMD check with no warnings or notes
  • Improved documentation coverage with 34 new exported functions
  • Enhanced type safety and code consistency throughout codebase

peskas.mozambique.data.pipeline 0.2.0

New features

  • Updates data ingestion and preprocessing workflows
  • Renames ingest_surveys to ingest_landings
  • Adds new metadata joins and data transformations in preprocess_landings
  • Introduces calculate_catch function for catch weight estimation
  • Updates configuration to include Google Cloud Storage and additional metadata tables

peskas.mozambique.data.pipeline 0.1.0

  • Initial CRAN submission.