# coasts 4.3.0
- FIX `export_pds_spatial()` per-cell metrics — `avg_hours_per_day` and `avg_visits_per_day` were divided by the whole study period (`n_total_days`, ~850+ days), producing values near zero. They now divide by `n_active_days` (the number of days the cell was actually visited), so the metric matches its name: average fishing hours / trips on days the cell was active. `constancy` (the fraction of the study period the cell was active) still uses `n_total_days` and is unchanged. The same fix is applied to the `derive_fishing_grounds()` per-cell metrics.
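The corrected denominators can be sketched as follows. This is a minimal illustration only: the toy table and the column names `cell`, `date`, and `fishing_hours` are assumptions, not the package's actual internals.

```r
library(dplyr)

n_total_days <- 850  # length of the whole study period, in days

# Toy per-cell daily effort: cell "a" was visited on 3 of the 850 days
cell_days <- tibble::tibble(
  cell          = c("a", "a", "a", "b"),
  date          = as.Date(c("2023-01-01", "2023-01-02", "2023-01-05", "2023-01-01")),
  fishing_hours = c(4, 6, 2, 8)
)

metrics <- cell_days |>
  group_by(cell) |>
  summarise(
    n_active_days = n_distinct(date),
    # Fixed: divide by the days the cell was actually visited,
    # not by the ~850-day study period
    avg_hours_per_day = sum(fishing_hours) / n_distinct(date),
    # Unchanged: constancy keeps the whole-period denominator
    constancy = n_distinct(date) / n_total_days,
    .groups = "drop"
  )
# Cell "a": avg_hours_per_day = 12 / 3 = 4, not 12 / 850 (near zero)
```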
- FIX `aggregate_pds_effort()` — `n_active_days` was double-counted on incremental merge whenever the same calendar day was visited by trips from different aggregation runs (common: many boats fishing the same cell daily produce one parquet per trip, batched separately). The grid now stores `active_dates` as a list-column of `Date` per cell-year; merges take the unique union, and `n_active_days` is recomputed from it. `derive_fishing_grounds()` applies the same union semantics when collapsing years, rolling up to coarser resolutions, and aggregating cells into ground polygons.

# coasts 4.2.1

- IMPROVEMENT Filter out `NA`s in the countries taxa summary (in `export_portal()`) to save storage space and loading time
# coasts 4.1.0
- IMPROVEMENT Kenya matched trips now combine surveys from all Kenyan sources, not just KEFS — giving a more complete picture of fishing activity in the country.
- FIX Restored the fishing-effort aggregation step of the automated pipeline, which had stopped running on the server due to a missing system component.
# coasts 4.0.0

## Spatial CPUE Model Pipeline
- NEW `model_cpue()` - Estimates spatial Catch Per Unit Effort (CPUE) by joining matched survey trips with predicted PDS tracks. Supports two estimation methods: `"weighted"` (direct catch-to-effort ratio, robust for sparse data) and `"nnls"` (non-negative least squares, for denser datasets). Uploads results as a versioned parquet to cloud storage.
- NEW `run_weighted_cpue()` - Computes CPUE as `sum(catch_kg) / sum(fishing_hours)` per H3 cell and country.
- NEW `run_nnls_cpue()` - Solves a non-negative least squares system `min ||Xq - y||² s.t. q ≥ 0` across all H3 cells simultaneously.
- NEW `join_effort_catch()` - Builds the effort-catch matrix linking per-trip H3 effort vectors with catch records.
- NEW `load_matched_trips()` - Downloads the `trips-matched` parquet and returns validated catch records for matched PDS trips.
- NEW `download_predicted_tracks()` - Downloads predicted track files for a set of matched trip IDs from the PDS bucket.
- NEW `prepare_tracks_for_effort()` - Projects predicted fishing points into an H3 effort matrix (fishing hours and pings per cell).
- NEW `get_combined_tbl()` - Combines effort and catch into a single analysis table for CPUE modelling.
- NEW `build_catch_wide()` - Pivots catch records to a wide matrix (trips × species) for the NNLS solver.
- NEW `.finalise_cpue()` - Post-processes raw CPUE estimates: adds centroid coordinates, filters cells below `min_trips`, and attaches country labels.
- NEW `.top_species()` - Selects the top-N species by total catch weight to focus CPUE estimation.
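As a concrete sketch of the `"weighted"` method (a ratio of sums per cell; the toy table and its column names are illustrative assumptions, not the package's real schema):

```r
library(dplyr)

# Hypothetical per-trip catch and effort already assigned to H3 cells
trip_effort <- tibble::tibble(
  h3_cell       = c("8a1", "8a1", "8a2"),
  catch_kg      = c(10, 20, 6),
  fishing_hours = c(2, 3, 4)
)

weighted_cpue <- trip_effort |>
  group_by(h3_cell) |>
  summarise(
    # Ratio of sums, not mean of per-trip ratios: more robust when
    # individual trips contribute very few fishing hours
    cpue_kg_per_hour = sum(catch_kg) / sum(fishing_hours),
    .groups = "drop"
  )
# Cell "8a1": (10 + 20) / (2 + 3) = 6 kg per fishing hour
```

The `"nnls"` method instead solves `min ||Xq - y||²` subject to `q ≥ 0` over all cells at once, where `X` is the trips × cells effort matrix and `y` the per-trip catch vector, yielding one non-negative CPUE estimate per cell.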
## Web-Ready Spatial Export

- NEW `export_pds_spatial()` - Reads H3 effort grid and CPUE parquet files from cloud storage, derives fishing grounds, and uploads three web-ready files for the DeckGL portal: H3 effort JSON, CPUE JSON, and fishing grounds GeoJSON.
- NEW `derive_fishing_grounds()` - Converts an H3 effort grid to a GeoJSON `FeatureCollection` of discrete fishing ground polygons, enriched with area, constancy, and activity metrics.
- NEW `aggregate_trip_effort()` - Aggregates per-trip H3 effort vectors into a cumulative effort grid across all trips.
- NEW `plot_effort_map()` / `plot_cpue_map()` - Interactive Leaflet maps for visualising effort and CPUE grids during exploratory analysis.

## Taxa Enrichment

- NEW `enrich_taxa()` - Augments catch records with FishBase and SeaLifeBase taxonomic backbone data (class, order, family, genus) for all species in the matched trips dataset.
- NEW `get_taxa_backbone()` - Queries the GBIF taxonomic backbone to resolve species names to canonical taxonomy.
- NEW `expand_taxonomic_info()` - Expands the taxa lookup table with full higher classification.
## Bug Fixes

- FIX `aggregate_pds_effort()` - The manifest was silently uploaded to a temp-dir GCS path instead of the correct `{grid_prefix}/aggregated_manifest.rds` key, causing incremental processing to always rebuild the entire grid from scratch. Fixed by passing `name = manifest_name` explicitly to `upload_cloud_file()` in both the main and early-return paths.
- FIX `model_cpue()` - Removed dead code left from an earlier refactor (the `map_effort`, `map_cpue`, `out_dir` block) that caused an R error at runtime: "object 'map_effort' not found".
- FIX `export_pds_spatial()` - No longer crashes with a cryptic 404 when the effort grid parquet does not yet exist in GCS (e.g. first run or after manual deletion). The function now logs a warning and returns early, matching the existing behaviour for the CPUE file.
- FIX HTTP/2 `PROTOCOL_ERROR` failures on GCS uploads in CI - `upload_cloud_file()` now calls `cloud_storage_authenticate(force = TRUE)` unconditionally before every upload. Service-account tokens expire after 1 hour; long upstream jobs (e.g. `predict_pds_tracks`) can exhaust this window, causing `gargle` (which uses `httr2`) to attempt a mid-flight token refresh over a stale HTTP/2 connection. Forcing a fresh re-auth before the upload avoids this path entirely.
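The call-order fix can be sketched with stubs. The stub bodies below only record calls for illustration; the real `cloud_storage_authenticate()` and `upload_cloud_file()` are package internals, and `upload_with_fresh_token()` is a hypothetical wrapper name, not a package function.

```r
calls <- character()

# Stubs standing in for the package's storage helpers
cloud_storage_authenticate <- function(force = FALSE) {
  calls <<- c(calls, sprintf("auth(force=%s)", force))
}
upload_cloud_file <- function(file, name) {
  calls <<- c(calls, sprintf("upload(%s)", name))
}

upload_with_fresh_token <- function(file, name) {
  # Re-authenticate unconditionally: a service-account token minted at
  # job start may be past its 1-hour lifetime by the time a long
  # upstream job reaches the upload step
  cloud_storage_authenticate(force = TRUE)
  upload_cloud_file(file = file, name = name)
}

upload_with_fresh_token("grid.parquet", "grid.parquet")
# calls: "auth(force=TRUE)" then "upload(grid.parquet)"
```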
## CI / Workflow

- Merged the `predict-pds-tracks` and `aggregate-pds-effort` pipeline jobs into a single job — they are always sequential, and sharing a container saves startup overhead.
- Deleted the superseded `model-tracks.yaml` workflow (its steps are fully covered by `data-pipeline.yaml`).
- Fixed the `pkgdown.yaml` deploy step: added the required `environment: name: github-pages` block (needed by `actions/deploy-pages@v4`); bumped `actions/upload-pages-artifact` to `@v4` (native Node 24 support); added the Changelog to the pkgdown navbar.
## Naming & Versioning Coherence

- CPUE parquet files are now stored under `pds-cpue_r{h3_res}` (e.g. `pds-cpue_r9`) to match the effort grid naming convention (`predicted-pds-h3_grid_r9`). This ensures that running the pipeline at different H3 resolutions never silently mixes effort and CPUE data from different resolutions.
- Portal CPUE JSON files follow the same pattern: `pds-cpue-r{h3_res}__timestamp__json`.
- `inst/conf.yml` `portal.cpue.file_prefix` updated from `pds-cpue` to `pds-cpue-r`.
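The resolution-suffixed prefixes can be built from a simple template, sketched here; the actual key construction inside the package may differ.

```r
h3_res <- 9

# Effort grid and CPUE files share the same resolution suffix, so a run
# at one H3 resolution can never silently read data from another
grid_prefix <- sprintf("predicted-pds-h3_grid_r%d", h3_res)
cpue_prefix <- sprintf("pds-cpue_r%d", h3_res)
# grid_prefix: "predicted-pds-h3_grid_r9"; cpue_prefix: "pds-cpue_r9"
```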
## Documentation & Website

- Vignettes (`pipeline.Rmd`, `metrics-and-models.Rmd`) moved from the project root to `vignettes/` so pkgdown can discover them correctly.
- pkgdown CI workflow (`pkgdown.yaml`) fixed: system dependencies (GDAL, GEOS, PROJ, udunits2) are now installed before `r-lib/actions/setup-r-dependencies@v2`.
- `_pkgdown.yml` articles section re-enabled now that the vignettes are in the correct location.
# coasts 3.0.0

## Fishing Activity Prediction Pipeline

A new end-to-end pipeline for classifying GPS boat tracks into fishing and non-fishing activity using the `ssfaitk` statistical model, and aggregating the results into spatial effort maps.

## New Workflow Functions

- NEW `predict_pds_tracks()` - Downloads GPS tracks for all active vessels, applies the `ssfaitk` fishing activity model to each trip, and uploads fishing-only point files to cloud storage. Implements version-aware incremental processing: trips already classified with the current model version are skipped, and files from outdated model versions are automatically replaced when the model is updated.
- NEW `aggregate_pds_effort()` - Consolidates all classified fishing tracks into a single H3 hexagonal grid representing cumulative fishing effort across the fleet. Counts fishing pings and unique trips per cell and uploads the grid as a versioned parquet file ready for portal consumption.
## New Spatial Analysis Utilities

- NEW `assign_h3_indices()` - Maps GPS coordinates to H3 hexagon cell IDs at any resolution
- NEW `aggregate_h3_effort()` - Summarises fishing pings and unique vessel counts per H3 cell
- NEW `rollup_h3_resolution()` - Re-aggregates effort from a fine H3 resolution to any coarser level for multi-scale analysis
- NEW `create_spatial_grid()` - Converts an H3 effort table to an `sf` polygon object for mapping
- NEW `prep_fishing_points()` - Projects raw GPS coordinates to a metric CRS for distance-based spatial operations
- NEW `create_reference_grid()` - Generates a deterministic square or hexagonal reference grid over a study area
- NEW `aggregate_daily_effort()` - Counts fishing pings per reference grid cell via spatial join
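The rollup step can be sketched as a grouped sum over parent cells. Here the parent IDs are hard-coded stand-ins for a real H3 parent lookup, and the cell IDs and column names are made up for illustration.

```r
library(dplyr)

# Fine-resolution effort with an assumed precomputed H3 parent column
fine_grid <- tibble::tibble(
  h3_r9     = c("r9_a1", "r9_a2", "r9_b1"),
  parent_r7 = c("r7_a", "r7_a", "r7_b"),
  n_pings   = c(100, 50, 30)
)

# Re-aggregate fine-resolution effort into the coarser parent cells
coarse_grid <- fine_grid |>
  group_by(h3_cell = parent_r7) |>
  summarise(n_pings = sum(n_pings), .groups = "drop")
# "r7_a": 100 + 50 = 150 pings; "r7_b": 30 pings
```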
## Automated Pipeline

- NEW GitHub Actions workflow (`model-tracks.yaml`) - Runs the full fishing activity prediction and effort aggregation pipeline every two days. Always fetches the latest `ssfaitk` model version at runtime, so improvements to the underlying model are picked up automatically without rebuilding the Docker image.
# coasts 2.2.5

- FIX Clarify `resolve_storage_opts()` arguments
# coasts 2.2.4

- FIX Export `resolve_storage_opts()`
# coasts 2.2.2

- NEW Upgraded and optimized all functions related to PDS ingestion and preprocessing to be compatible with the current country pipelines. Country data flows now follow the same processing steps, improving manageability and data consistency.
# coasts 2.2.1

## Minor fix

- Added a `version` argument to `download_parquet_from_cloud()`
# coasts 2.2.0

## New Features

- NEW Upgraded and optimized all storage-related functions (Google Cloud and MongoDB auth, download, and upload). These replace the existing related functions across all country pipelines for improved manageability and centralization of common processes.

# coasts 2.1.0
## New Features

- NEW `get_kobo_data()` - Upgraded function to pull data from KoboToolbox following Kobo API changes. The new function replaces the existing data-pulling process across all pipelines for improved manageability and centralization of common processes.
- BUG FIX Fixed a bug related to the automatic generation of credentials for the Peskas Tracks App.
# coasts 1.5.0

## New Features

### Survey & Fleet Analysis Pipeline
- NEW `summarize_data()` - End-to-end summarization of WorldFish survey data into five output tables (monthly, taxa, district, gear, grid summaries) uploaded to cloud storage as versioned parquet files
- NEW `calculate_fishery_metrics()` - Transforms catch-level records into normalized fishery indicators (site-level CPUE/RPUE, predominant gear, species composition) in long format for portal consumption
- NEW `generate_fleet_analysis()` - Orchestrates the full fleet activity estimation pipeline and uploads aggregated results to cloud storage
- NEW `prepare_boat_registry()` - Constructs a boat registry from asset metadata for scaling GPS-tracked data to fleet-wide estimates
- NEW `process_trip_data()` - Processes PDS API trip records by device IMEI into per-trip summaries
- NEW `calculate_monthly_trip_stats()` - Aggregates trip data to monthly statistics per district
- NEW `estimate_fleet_activity()` - Scales GPS-sampled trips to fleet-wide activity estimates using boat registry sampling rates
- NEW `calculate_district_totals()` - Joins fleet estimates with survey summaries to produce district-level catch and revenue totals
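One way to read the scaling step: if the boat registry says only a fraction of the fleet carries trackers, trips observed from tracked boats are divided by that sampling rate. This is a sketch under that assumption; the numbers and variable names are made up, and the package's actual estimator may be more involved.

```r
n_boats_total   <- 200  # boats in the registry for a district
n_boats_tracked <- 40   # boats carrying GPS trackers
trips_observed  <- 120  # trips recorded by tracked boats in a month

# Fraction of the fleet that is GPS-sampled
sampling_rate <- n_boats_tracked / n_boats_total  # 0.2

# Scale sampled trips up to a fleet-wide activity estimate
trips_fleet_estimate <- trips_observed / sampling_rate  # 600
```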
### Data Export

- NEW `export_portal()` - Downloads WorldFish summary datasets from cloud storage, joins modelled aggregate estimates, pivots to long format, and uploads all tables to MongoDB portal collections
## Enhancements

### Multi-Package Architecture

- ENHANCED `read_config()` - Added a `package` argument (default `"coasts"`). Downstream packages that ship their own `inst/conf.yml` can now call `read_config(package = "mypackage")` to load their own configuration instead of the `coasts` defaults
- ENHANCED All 12 top-level pipeline functions now accept a `package` argument threaded through to `read_config()`: `ingest_pds_trips()`, `ingest_pds_tracks()`, `backup_tracks()`, `ingest_assets()`, `preprocess_pds_tracks()`, `merge_survey_trips()`, `get_metadata()`, `summarize_data()`, `export_geos()`, `export_fishers_stats()`, `export_portal()`, `generate_fleet_analysis()`
### Package Infrastructure

- ENHANCED `DESCRIPTION` - Migrated the `Author`/`Maintainer` fields to the `Authors@R: person(...)` format (fixes an R CMD check WARNING); removed the spurious `LazyData: true` (no `data/` directory); added `URL` and `BugReports` fields pointing to the GitHub repository
- ENHANCED `_pkgdown.yml` - Added a new "Survey & Fleet Analysis" reference section; added `export_portal()` to "Data Export & Storage"; removed two non-exported internal helpers that would have caused build errors
# coasts 1.4.0

## New Features

### GPS-Survey Trip Matching

- NEW `merge_survey_trips()` - Downloads matched GPS and survey trip data across regions, harmonizes columns, and combines them into a single dataset
### Multi-Bucket Regional Storage

- ENHANCED `download_parquet_from_cloud()` and `upload_parquet_to_cloud()` - Added a `bucket_name` parameter to download from and upload to regional buckets (Kenya, Mozambique, Zanzibar)
- ENHANCED `inst/conf.yml` - Regional bucket configuration with environment-specific bucket names (dev vs prod)
# coasts 1.3.0

## Fisher Performance Analytics

- NEW `export_fishers_stats()` - Comprehensive fisher performance analysis and export
  - Integrates catch events from tracks-app with GPS tracking data from the PDS API
  - Matches fisher-reported landings with automated trip tracking by date and device
  - Calculates fishing efficiency metrics: CPUE (kg/hour, kg/km) and search efficiency ratios
  - Estimates fuel consumption and catch-per-liter efficiency
  - Categorizes trips by distance (nearshore, mid-range, offshore)
  - Exports aggregated fisher statistics and trip-level performance metrics to MongoDB
## Automated Workflows

- ENHANCED GitHub Actions data pipeline workflow
  - Added an `export-fishers-stats` job to the automated pipeline
  - Runs after track preprocessing to ensure data availability
  - Automatically exports fisher performance data on every pipeline run
## Development Experience

- NEW `.Rprofile` - Interactive environment switching for local development
  - Added helper functions: `use_prod()`, `use_local()`, `use_default()`
  - Visual environment indicator on R session startup
  - Quick commands reference displayed in interactive sessions
  - Simplified testing across different configuration profiles
# coasts 1.2.0

## Breaking Changes

### Configuration System Migration

- BREAKING CHANGE - Migrated from the auth folder to `.env`-based credentials management
  - Removed the `local` configuration profile from `inst/conf.yml`
  - All environments now use environment variables loaded via a `.env` file in local development
  - Added the `dotenv` package dependency for automatic `.env` file loading
  - Created a `.env.example` template with all required environment variables
  - Updated `.gitignore` to properly handle `.env` files while tracking `.env.example`
  - Migration Guide: Copy `.env.example` to `.env` and fill in the credentials (see the updated README)
## New Features

### Asset Management

- ENHANCED `ingest_assets()` - Comprehensive fisheries asset metadata ingestion
  - Added a `log_threshold` parameter for configurable logging
  - Now includes PDS device metadata from Airtable (`pds_devices` table)
  - Retrieves 6 asset types: taxa, gear, vessels, landing sites, forms, and devices
  - Changed the output format from parquet to RDS for better R object serialization
  - Added complete roxygen documentation following package standards
### Data Ingestion Improvements

- ENHANCED `ingest_pds_trips()` - Improved trip data ingestion workflow
  - Now downloads device metadata from cloud storage instead of Google Sheets
  - Filters devices by `last_seen` date (>= 2023-01-01) to keep active devices only
  - Enhanced PDS API calls with the `deviceInfo` and `withLastSeen` parameters
  - Client-side IMEI filtering for reliable data retrieval
  - Updated documentation with detailed configuration examples and notes
## Documentation

### Package Documentation

- ENHANCED README with `.env`-based configuration instructions
  - Added a step-by-step local development setup guide
  - Documented all required environment variables with descriptions
  - Separated local and production deployment instructions
- UPDATED CLAUDE.md with the new configuration system details
  - Revised the Configuration System section to explain the `.env` approach
  - Updated the Configuration Requirements with clear setup steps
  - Added `dotenv` to the Key Dependencies section
- NEW `.env.example` - Template file for local development credentials
  - Includes all 12 required environment variables with helpful comments
  - Proper formatting examples for complex values (JSON keys, connection strings)
### Function Documentation

- ENHANCED `ingest_assets()` with comprehensive roxygen documentation
  - Detailed `@description` with step-by-step operations
  - Complete `@details` section with the YAML configuration structure
  - Asset type descriptions (Taxa, Gear, Vessels, Landing Sites, Forms, Devices)
  - Added `@examples`, `@seealso`, and `@keywords` tags
## Technical Improvements

### Configuration Loading

- ENHANCED the `read_config()` function in `R/utils.R`
  - Automatic detection and loading of a `.env` file if present
  - Seamless integration with the existing `config::get()` workflow
  - Informative logging when a `.env` file is loaded
# coasts 1.1.0

- NEW - Integrate (Beta) Cabo Delgado (Mozambique) estimates
- NEW - Developing code to integrate catch-event records from tracks-app
# coasts 1.0.0

## Major New Features

### Airtable Integration System

- NEW `airtable_to_df()` - Convert Airtable tables to R data frames with pagination support
- NEW `df_to_airtable()` - Create new records in Airtable tables with batch processing
- NEW `bulk_update_airtable()` - Update multiple Airtable records efficiently (10-record batches)
- NEW `update_airtable_record()` - Update individual Airtable records
- NEW `get_writable_fields()` - Identify writable fields in Airtable tables (excludes computed fields)
- NEW `device_sync()` - Comprehensive sync function for device data (updates existing records, creates new ones)
- NEW `ingest_pelagic_boats()` - Complete workflow for PDS boat data ingestion and Airtable sync
- NEW `sync_device_users()` - Sync device users to MongoDB with password generation and Airtable updates
### Enhanced PDS API Integration

- NEW `pelagic_auth()` - Authentication with the Pelagic Analytics API
- NEW `pelagic_refresh_token()` - Token refresh functionality for sustained API access
- NEW `get_pelagic_boats()` - Retrieve boat information with server-side filtering and column selection
- NEW `get_pelagic_devices()` - Retrieve device information with advanced filtering capabilities
- Enhanced `ingest_pds_tracks()` with improved error handling and parallel processing
### Automated Workflows

- NEW GitHub Actions workflow: `ingest-pelagic-boats.yaml` (runs every 15 days)
- NEW GitHub Actions workflow: `sync-device-users.yaml` (runs every 10 days)
- Enhanced the main data pipeline workflow with improved container management
### Configuration System Improvements

- BREAKING CHANGE Restructured the MongoDB configuration to support dual databases:
  - `mongodb.coasts_portal` - For the main coasts geospatial data
  - `mongodb.tracks_app` - For tracks application user data
- BREAKING CHANGE Enhanced the Airtable configuration with separate base IDs:
  - `airtable.frame` - For device and country metadata
  - `airtable.tracks_app` - For user management
- Updated environment variable requirements for production deployments
### Documentation and Development
- NEW Professional pkgdown website with enhanced theming and navigation
- Enhanced README with status badges and improved structure
- Fixed pkgdown configuration issues with pipe operators and tidy evaluation functions
- Updated function documentation with detailed examples and use cases
## Bug Fixes and Improvements

### Data Processing
- Fixed KES-to-USD conversion units in `export_geos()`
- Improved MongoDB collection references to use the new dual-database configuration
- Enhanced error handling in data ingestion functions
- Better logging and progress tracking across all functions
### Technical Improvements
- Password generation system for new users with reproducible seeding
- Comprehensive data validation and duplicate handling
- Enhanced country mapping for global fisheries data (13 countries supported)
- Improved spatial data processing with WGS84 coordinate system standardization
- Advanced MongoDB operations with geospatial indexing (2dsphere)
# coasts 0.1.0
- Initial release of the coastal fisheries data pipeline for Western Indian Ocean region.
## New Features

### Data Ingestion

- `ingest_pds_trips()` - Automated ingestion of GPS boat trip data from the Pelagic Data Systems (PDS) API
- `ingest_pds_tracks()` - Parallel processing of detailed GPS track data with batch processing capabilities
- `get_metadata()` - Retrieval of fishery metadata from Google Sheets
### Data Preprocessing

- `preprocess_pds_tracks()` - Spatial gridding and summarization of fishing activity patterns
  - Multi-scale spatial analysis support (100 m, 250 m, 500 m, and 1000 m grid cells)
  - Parallel processing for efficient handling of large datasets
- `preprocess_track_data()` - Core function for converting GPS tracks to spatial grid summaries
### Data Export and Storage

- `export_geos()` - Comprehensive export of geospatial data and regional metrics to MongoDB
  - MongoDB integration with 2dsphere geospatial indexing
  - Currency conversion for Kenya (KES to USD) and Zanzibar (TZS to USD) economic indicators
  - Support for regional boundary data and time-series metrics
### Cloud Storage Integration

- `upload_cloud_file()` and `download_cloud_file()` - Google Cloud Storage integration
- `cloud_object_name()` - Versioned object naming and retrieval
- `upload_parquet_to_cloud()` and `download_parquet_from_cloud()` - Optimized parquet file handling
  - Automatic file compression using the LZ4 algorithm
### Database Operations

- `mdb_collection_push()` and `mdb_collection_pull()` - MongoDB collection management
  - Geospatial indexing support for spatial queries
  - Bulk data operations with error handling
### API Integration

- `get_trips()` - PDS API integration for trip data retrieval
- `get_trip_points()` - Detailed GPS point data from the PDS API
  - Authentication and token management for external APIs