Changelog
peskas.zanzibar.data.pipeline 4.3.0
New Features
- Survey-GPS Trip Matching Pipeline: Added a comprehensive fuzzy matching system to link catch survey records with GPS trip data
  - New `merge_trips()` workflow function for Kenya and Zanzibar sites
  - `match_surveys_to_gps_trips()`: universal two-step matching (surveys → registry → trips)
  - Fuzzy string matching using Levenshtein distance on registration numbers, boat names, and fisher names
  - Conservative one-trip-per-day constraint to avoid ambiguous matches
  - Support for both explicit device registries (Kenya) and implicit registries built from trip data (Zanzibar)
  - Configurable matching thresholds (registration: 15%, names: 25% difference allowed)
  - Exports merged dataset with matched pairs plus all unmatched surveys and trips
  - Helper functions: `standardize_column_names()`, `clean_matching_fields()`, `build_registry_from_trips()`
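The relative-distance rule described above can be sketched in base R with `adist()`, which computes Levenshtein distance. This is an illustrative sketch under stated assumptions, not the package's implementation; the function name `fuzzy_match` and the example strings are hypothetical.

```r
# Sketch of a relative Levenshtein-distance matching rule (not the package
# implementation). adist() computes edit distance; dividing by the longer
# string's length gives a "percent difference" to compare against the
# configurable thresholds (15% for registration numbers, 25% for names).
fuzzy_match <- function(a, b, max_diff) {
  if (is.na(a) || is.na(b)) return(FALSE)
  dist <- adist(tolower(a), tolower(b))[1, 1]
  rel_diff <- dist / max(nchar(a), nchar(b))
  rel_diff <= max_diff
}

fuzzy_match("TZ-123-A", "TZ-123A", max_diff = 0.15)  # minor typo: matches
fuzzy_match("Mashua", "Ngalawa", max_diff = 0.25)    # different boats: no match
```

With a relative rather than absolute threshold, longer registration numbers tolerate proportionally more typos, which is why the two fields can use different percentages.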
Improvements
- PDS Data Ingestion:
  - Updated `ingest_pds_trips()` to load the device registry from cloud storage instead of Airtable metadata
  - Added device info retrieval and proper filtering for Zanzibar devices
  - Improved configuration variable naming (`pars` → `conf`)
- GitHub Actions Workflow:
  - Added new `merge-trips` job to the automated pipeline
  - Runs after survey preprocessing and before summarization
  - Integrated with production environment configuration
peskas.zanzibar.data.pipeline 4.2.0
New Features
- API Data Export Pipeline: Added new `export_api_raw()` function to export raw preprocessed survey data in an API-friendly format
  - Exports raw/preprocessed trip data (before validation) to cloud storage
  - Part of a two-stage API export pipeline (raw and validated exports)
  - Transforms nested survey data into a flat structure with standardized trip-level records
  - Generates unique trip IDs using the xxhash64 algorithm
  - Integrates with Airtable metadata for form-specific asset lookups
  - Exports versioned parquet files to the `zanzibar/raw/` path for external API consumption
  - Includes a comprehensive output schema with 14 standardized fields (trip_id, landing_date, gear, catch metrics, etc.)
- Airtable Integration: New helper functions for managing Airtable metadata and form configurations
  - `get_airtable_form_id()`: Retrieves Airtable record IDs from KoBoToolbox asset IDs
  - `airtable_to_df()`: Downloads complete Airtable tables with automatic pagination handling
  - `get_writable_fields()`: Identifies updatable fields in Airtable tables (excludes computed fields)
  - `update_airtable_record()`: Updates individual records with field validation
  - `bulk_update_airtable()`: Batch-updates multiple records efficiently (up to 10 records per request)
  - `device_sync()`: Synchronizes GPS device metadata between Airtable and MongoDB
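The 10-records-per-request cap comes from Airtable's update endpoint, so batch updates reduce to a chunking problem. A minimal sketch of that logic in base R; `chunk_records` and `send_batch` are hypothetical names, not the internals of `bulk_update_airtable()`.

```r
# Minimal sketch of batching records for an API that accepts at most 10
# records per request (as Airtable's update endpoint does). The chunking
# is plain base R; `send_batch` stands in for the actual HTTP call.
chunk_records <- function(records, size = 10) {
  split(records, ceiling(seq_along(records) / size))
}

send_all <- function(records, send_batch) {
  batches <- chunk_records(records)
  lapply(batches, send_batch)  # one request per chunk of <= 10 records
}

# 23 records -> 3 requests of sizes 10, 10, 3
batches <- chunk_records(as.list(1:23))
lengths(batches)
```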
Improvements
- Configuration Enhancements:
  - Added `api` configuration section for trip data exports with separate raw/validated paths
  - Configured cloud storage paths for API exports (`zanzibar/raw`, `zanzibar/validated`)
  - Added Airtable base ID and token configuration for metadata management
  - Enhanced `options_api` storage configuration for the peskas-coasts bucket
- GitHub Actions Workflow:
  - Added new `export-api-data` job to the automated pipeline workflow
  - Integrated Airtable authentication with GitHub Secrets (AIRTABLE_TOKEN, AIRTABLE_BASE_ID_FRAME, AIRTABLE_BASE_ID_ASSETS)
  - Configured the API export job to run after the survey preprocessing step
  - Added production environment configuration for API data exports
- Code Quality:
  - Improved documentation with comprehensive roxygen2 comments for all new functions
  - Added detailed examples and cross-references in function documentation
  - Enhanced error handling and input validation in Airtable operations
  - Implemented proper cleanup of temporary local files after cloud uploads
peskas.zanzibar.data.pipeline 4.1.1
Major Changes
- Streamlined Validation Workflow: Replaced KoboToolbox API updates with direct MongoDB storage to improve performance.
  - New `export_validation_flags()` function exports validation flags directly to MongoDB
  - Validation status queries now only identify manually edited submissions, rather than updating them
  - Disabled `sync_validation_submissions()` workflow steps in GitHub Actions
  - Significantly reduced pipeline execution time by avoiding slow KoboToolbox API calls
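The shape of this change — one bulk database write instead of many per-submission API calls — can be sketched with the mongolite package. This is a hypothetical sketch only: the collection name, field layout, and function name are assumptions, not the source of `export_validation_flags()`.

```r
# Hypothetical sketch (not the package source): writing validation flags
# straight to a MongoDB collection with the mongolite package, rather than
# pushing each status back through the KoboToolbox API one call at a time.
library(mongolite)

export_flags_sketch <- function(flags_df, connection_string, db = "pipeline") {
  con <- mongo(collection = "validation_flags", db = db, url = connection_string)
  con$drop()              # replace the previous export wholesale
  con$insert(flags_df)    # one bulk write instead of per-submission calls
  con$disconnect()
  invisible(nrow(flags_df))
}
```

The performance win follows from the round-trip count: N submissions cost N KoboToolbox API calls before, but a single bulk insert after.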
Improvements
- Validation System:
  - Validation functions now preserve manual human approvals while updating system-generated statuses
- Code Quality:
  - Fixed SeaLifeBase API calls by pinning to version 24.07 to avoid server errors
  - Standardized function parameter formatting across validation and preprocessing modules
peskas.zanzibar.data.pipeline 4.1.0
Major Changes
- Integration of New KoBoToolbox Survey Form Version: Added support for a new version of the WorldFish survey form (`wf_surveys_v2`) alongside the existing form (`wf_surveys_v1`). Data from both survey versions is now processed together in the preprocessing pipeline and handled properly throughout the validation workflow.
Improvements
- Multi-Asset Validation Support:
  - Updated the validation system to query approval statuses from both survey form versions
  - Enhanced `validate_wf_surveys()` and `sync_validation_submissions()` to handle submissions from multiple KoBoToolbox assets
  - Ensured manually approved submissions from either form version are protected from automated flagging
- Configuration Updates:
  - Added configuration for the new survey form version with shared credentials
  - Cleaned up redundant configuration entries
  - Updated code references to use versioned asset configurations
peskas.zanzibar.data.pipeline 4.0.0
Major Changes
- Fleet Activity Analysis Pipeline: Introduced a comprehensive pipeline for estimating and analyzing fishing fleet activity using GPS-tracked boats and boat registry data. This includes new functions for preparing boat registries, processing trip data, calculating monthly trip statistics, estimating fleet-wide activity, and calculating district-level total catch and revenue.
- New Modeling and Summarization Functions:
  - `prepare_boat_registry()`: Summarizes boat registry data by district.
  - `process_trip_data()`: Processes trip data with district information and filters outliers.
  - `calculate_monthly_trip_stats()`: Computes monthly fishing activity statistics by district.
  - `estimate_fleet_activity()`: Scales up sample-based trip statistics to fleet-wide estimates.
  - `calculate_district_totals()`: Combines fleet activity and catch data for district-level totals.
  - `generate_fleet_analysis()`: Orchestrates the full analysis pipeline and uploads results.
  - `summarize_data()`: Generates and uploads summary datasets (monthly, taxa, district, gear, grid) for WorldFish survey data.
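Scaling sample-based trip statistics to the whole fleet is, at its core, a ratio estimate: trips per tracked boat, extrapolated to all registered boats. A sketch of that idea in base R; the function name and numbers are illustrative, not the logic of `estimate_fleet_activity()` itself.

```r
# Illustrative ratio-estimator logic behind scaling sample trip counts to
# fleet-wide estimates (a sketch, not the package implementation): mean
# trips per GPS-tracked boat, multiplied by the registered fleet size.
estimate_fleet_trips <- function(trips_per_tracked_boat, n_registered_boats) {
  mean_trips <- mean(trips_per_tracked_boat, na.rm = TRUE)
  mean_trips * n_registered_boats
}

# 4 tracked boats made 12, 8, 10, 10 trips this month; 120 boats registered
estimate_fleet_trips(c(12, 8, 10, 10), 120)  # -> 1200 estimated fleet trips
```

This implicitly assumes tracked boats are representative of the fleet, which is why the pipeline filters outliers and stratifies by district before scaling.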
- Enhanced Data Export and Integration:
  - `export_wf_data()`: Exports summarized WorldFish survey data and modeled estimates to MongoDB, including new geographic regional summaries.
  - `create_geos()`: Generates geospatial regional summaries and exports them as GeoJSON for spatial visualization.
- Expanded Documentation: New and updated Rd files for all major new functions, with improved examples and cross-references.
Improvements
- Consistent Time Series and Grouping: All summary tables (taxa, districts, gear) now include a ‘date’ (monthly) column and are grouped by month, with missing months filled as NA for consistent time series exports.
- Parallel Processing: Improved use of parallelization (via `future` and `furrr`) for validation and summarization steps, enhancing performance for large datasets.
- Data Quality and Validation:
  - Enhanced filtering and validation of survey data before summarization and export.
  - Improved handling of flagged/invalid submissions.
peskas.zanzibar.data.pipeline 3.3.0
Major Changes
- All summary tables (taxa, districts, gear) now include a ‘date’ (monthly) column and are grouped by month. Missing months are filled as NA for consistent time series exports.
peskas.zanzibar.data.pipeline 3.1.0
New Features
- Added `create_geos()` function to generate geospatial regional summaries of fishery data
- Added support for GPS track data visualization through new grid-based analytics
- Added `generate_track_summaries()` function to process GPS tracks into 1 km grid cells
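Binning GPS points into ~1 km cells commonly means snapping coordinates to a fixed grid. A sketch of one such approach, assuming a simple degree-based grid (about 1/111 of a degree per km); `generate_track_summaries()` may use a different grid or projection.

```r
# Sketch of snapping GPS points to ~1 km grid cells (not necessarily the
# package's exact method). Near the equator 1 km is roughly 1/111 degree;
# flooring coordinates to that cell size yields a grid-cell ID per point,
# which tracks can then be aggregated over.
assign_grid_cell <- function(lon, lat, cell_km = 1) {
  cell_deg <- cell_km / 111  # approximate degrees per km
  cell_lon <- floor(lon / cell_deg) * cell_deg
  cell_lat <- floor(lat / cell_deg) * cell_deg
  paste(round(cell_lon, 5), round(cell_lat, 5), sep = "_")
}

# Two nearby pings share a cell; a point ~5 km away does not
assign_grid_cell(39.20001, -6.16001)
assign_grid_cell(39.25, -6.16001)
```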
peskas.zanzibar.data.pipeline 2.6.0
peskas.zanzibar.data.pipeline 2.5.0
Major Changes
- Enhanced validation workflow with KoboToolbox integration:
  - Added `update_validation_status()` function to update submission status via the API
  - Added `sync_validation_submissions()` for parallel processing of validation flags
  - Updated the Kobo URL endpoint from kf.kobotoolbox.org to eu.kobotoolbox.org
New Features
- Implemented parallel processing for validation operations using future/furrr packages
- Added progress reporting during validation operations via progressr package
- Enhanced validation status synchronization between local system and KoboToolbox
Improvements
- Updated data preprocessing to handle flying fish estimates and taxa corrections (TUN→TUS, SKH→CVX)
- Updated export workflow to use validation status instead of flags for data filtering
- Added taxa information to catch export data
- Added Zanzibar SSF report template with visualization examples
- Improved package documentation structure with better categorization
peskas.zanzibar.data.pipeline 2.4.0
Major Changes
- Implemented support for multiple survey data sources:
  - Refactored `get_validated_surveys()` to handle WCS, WF, and BA sources
  - Added a `source` parameter to specify which datasets to retrieve
  - Improved handling of data sources with different column structures
New Features
- Added `export_wf_data()` function for WorldFish-specific data export
- Enhanced validation with additional composite metrics:
  - Price per kg validation
  - CPUE (Catch Per Unit Effort) validation
  - RPUE (Revenue Per Unit Effort) validation
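The composite metrics named above divide a catch quantity by a measure of effort. A sketch of how they might be computed per trip; the column names (`catch_kg`, `fishers`, `trip_hours`, etc.) are assumptions for illustration, not the package's actual schema.

```r
# Composite metric sketch with assumed column names (not the package's
# validation code). CPUE = catch per unit effort; RPUE = revenue per unit
# effort; effort here is fisher-hours per trip.
trips <- data.frame(
  catch_kg    = c(25, 40, 0),
  catch_price = c(50, 100, 0),
  fishers     = c(2, 4, 3),
  trip_hours  = c(5, 8, 6)
)

effort <- trips$fishers * trips$trip_hours    # fisher-hours per trip
trips$cpue <- trips$catch_kg / effort         # kg per fisher-hour
trips$rpue <- trips$catch_price / effort      # revenue per fisher-hour
trips$price_per_kg <- ifelse(trips$catch_kg > 0,
                             trips$catch_price / trips$catch_kg, NA)
```

Guarding the price-per-kg division keeps zero-catch trips (a legitimate outcome, per 2.4.0's improvements) from producing infinite or undefined values that would trip the outlier checks.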
Improvements
- Added min_length parameter for better length validation thresholds
- Updated LW coefficient filtering logic in model-taxa.R
- Enhanced alert flag handling with combined flags from different validation steps
- Improved catch price and catch weight handling for zero-catch outcomes
- Enhanced data preprocessing with better field type conversion
peskas.zanzibar.data.pipeline 2.3.0
Major Changes
- Enhanced KoboToolbox integration:
  - Implemented new validation status retrieval from the KoboToolbox API
  - Updated the validation workflow to incorporate submission validation status
  - Improved the data validation process through direct API integration
New Features
- New KoboToolbox interaction functions:
  - `get_validation_status()`: Retrieves submission validation status from the KoboToolbox API
peskas.zanzibar.data.pipeline 2.2.0
Major Changes
- Completely restructured taxonomic data processing:
  - Introduced new modular functions for taxa handling in model-taxa.R
  - Added efficient batch processing for species matching
  - Implemented an optimized FAO area retrieval system
  - Streamlined length-weight coefficient calculations
  - Enhanced integration with FishBase and SeaLifeBase
New Features
- New taxonomic processing functions:
  - `load_taxa_databases()`: Unified database loading from FishBase and SeaLifeBase
  - `process_species_list()`: Enhanced species list processing with taxonomic ranks
  - `match_species_from_taxa()`: Improved species matching across databases
  - `get_species_areas_batch()`: Efficient FAO area retrieval
  - `get_length_weight_batch()`: Optimized length-weight parameter retrieval
Improvements
- Enhanced performance through batch processing
- Reduced API calls to external databases
- Better error handling and input validation
- More comprehensive documentation
- Improved code organization and modularity
peskas.zanzibar.data.pipeline 2.1.0
Major Changes
- Enhanced taxonomic and catch data processing capabilities:
  - Added comprehensive functions for species and catch data processing
  - Implemented length-weight coefficient retrieval from FishBase and SeaLifeBase
  - Created functions for calculating catch weights using multiple methods
  - Added new data reshaping utilities for species and catch information
  - Extended WorldFish (WF) survey validation with detailed quality checks
  - Updated cloud storage and data download/upload functions
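Catch weight calculation from length-weight coefficients rests on the standard fisheries relationship W = a·L^b. A sketch of that arithmetic; the `a` and `b` values below are illustrative placeholders, not real FishBase parameters for any species.

```r
# The standard length-weight relationship used with coefficients like those
# retrieved from FishBase/SeaLifeBase: W = a * L^b, with length L in cm and
# weight W in grams. The a and b values here are illustrative only.
length_to_weight <- function(length_cm, a, b) {
  a * length_cm^b
}

# Total catch weight from measured lengths of one (hypothetical) species
lengths_cm <- c(18, 22, 25)
sum(length_to_weight(lengths_cm, a = 0.01, b = 3))  # grams
```

Because b is typically near 3, small errors in measured length inflate cubically in estimated weight, which is one reason the validation steps bound plausible lengths before computing weights.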
peskas.zanzibar.data.pipeline 2.0.0
Major Changes
- Complete overhaul of the data pipeline architecture
- Added PDS (Pelagic Data Systems) integration:
  - New trip ingestion and preprocessing functionality
  - GPS track data processing capabilities
- Implemented MongoDB export and storage functions
- Removed renv dependency management for improved reliability
- Updated Docker configuration for more robust builds
peskas.zanzibar.data.pipeline 1.0.0
peskas.zanzibar.data.pipeline 0.2.0
New features
Added the validation step and updated the preprocessing step for WCS Kobo survey data; see the `preprocess_wcs_surveys()` and `validate_wcs_surveys()` functions. Currently, validation for catch weight, length, and market value uses the median absolute deviation (MAD) method, leveraging the k parameter of the `univOutl::LocScaleB` function.
To spot outliers accurately, validation is performed by gear type and species.
N.B. VALIDATION PARAMETERS ARE NOT YET TUNED
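The MAD rule with a tunable k can be sketched in base R. This is the underlying idea only, not the exact implementation of `univOutl::LocScaleB`, and the example weights are made up.

```r
# Base R sketch of the MAD outlier rule underlying univOutl::LocScaleB (not
# that function's exact implementation): values further than k robust
# standard deviations from the median are flagged. mad() already applies
# the 1.4826 consistency constant for normally distributed data.
flag_outliers_mad <- function(x, k = 3) {
  center <- median(x, na.rm = TRUE)
  scale  <- mad(x, na.rm = TRUE)
  abs(x - center) > k * scale
}

weights <- c(1.2, 1.5, 1.4, 1.3, 1.6, 12.0)  # one implausible catch weight
flag_outliers_mad(weights, k = 3)            # only the 12.0 kg entry flags
```

Tuning k per gear type and species (as the note above says is still pending) trades sensitivity for false positives: smaller k flags more records.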
peskas.zanzibar.data.pipeline 0.1.0
Drop parent repository code (`peskas.timor.pipeline`); add infrastructure to download WCS survey data and upload it to cloud storage providers.