# coasts 4.3.0
- FIX `export_pds_spatial()` per-cell metrics — `avg_hours_per_day` and `avg_visits_per_day` were divided by the whole study period (`n_total_days`, ~850+ days), producing values near zero. They now divide by `n_active_days` (the number of days the cell was actually visited), so the metric matches its name: average fishing hours / trips on days the cell was active. `constancy` (the fraction of the study period the cell was active) still uses `n_total_days` and is unchanged. The same fix is applied to the `derive_fishing_grounds()` per-cell metrics.
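The corrected denominators can be sketched as follows. This is a minimal illustration only: the toy table and the column names `cell`, `date`, and `fishing_hours` are assumptions, not the package's actual internals.

```r
library(dplyr)

n_total_days <- 850  # length of the whole study period, in days

# Toy per-cell daily effort: cell "a" was visited on 3 of the 850 days
cell_days <- tibble::tibble(
  cell          = c("a", "a", "a", "b"),
  date          = as.Date(c("2023-01-01", "2023-01-02", "2023-01-05", "2023-01-01")),
  fishing_hours = c(4, 6, 2, 8)
)

metrics <- cell_days |>
  group_by(cell) |>
  summarise(
    n_active_days = n_distinct(date),
    # Fixed: divide by the days the cell was actually visited,
    # not by the ~850-day study period
    avg_hours_per_day = sum(fishing_hours) / n_distinct(date),
    # Unchanged: constancy keeps the whole-period denominator
    constancy = n_distinct(date) / n_total_days,
    .groups = "drop"
  )
# Cell "a": avg_hours_per_day = 12 / 3 = 4, not 12 / 850 (near zero)
```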
- FIX `aggregate_pds_effort()` — `n_active_days` was double-counted on incremental merge whenever the same calendar day was visited by trips from different aggregation runs (common: many boats fishing the same cell daily produce one parquet per trip, batched separately). The grid now stores `active_dates` as a list-column of `Date` per cell-year; merges take the unique union, and `n_active_days` is recomputed from it. `derive_fishing_grounds()` applies the same union semantics when collapsing years, rolling up to coarser resolutions, and aggregating cells into ground polygons.

# coasts 4.2.1

- IMPROVEMENT Filter out `NA`s in the countries taxa summary (in `export_portal()`) to save storage space and loading time
# coasts 4.1.0
- IMPROVEMENT Kenya matched trips now combine surveys from all Kenyan sources, not just KEFS — giving a more complete picture of fishing activity in the country.
- FIX Restored the fishing-effort aggregation step of the automated pipeline, which had stopped running on the server due to a missing system component.
# coasts 4.0.0

## Spatial CPUE Model Pipeline
- NEW `model_cpue()` - Estimates spatial Catch Per Unit Effort (CPUE) by joining matched survey trips with predicted PDS tracks. Supports two estimation methods: `"weighted"` (direct catch-to-effort ratio, robust for sparse data) and `"nnls"` (non-negative least squares, for denser datasets). Uploads results as a versioned parquet to cloud storage.
- NEW `run_weighted_cpue()` - Computes CPUE as `sum(catch_kg) / sum(fishing_hours)` per H3 cell and country.
- NEW `run_nnls_cpue()` - Solves a non-negative least squares system `min ||Xq - y||² s.t. q ≥ 0` across all H3 cells simultaneously.
- NEW `join_effort_catch()` - Builds the effort-catch matrix linking per-trip H3 effort vectors with catch records.
- NEW `load_matched_trips()` - Downloads the `trips-matched` parquet and returns validated catch records for matched PDS trips.
- NEW `download_predicted_tracks()` - Downloads predicted track files for a set of matched trip IDs from the PDS bucket.
- NEW `prepare_tracks_for_effort()` - Projects predicted fishing points into an H3 effort matrix (fishing hours and pings per cell).
- NEW `get_combined_tbl()` - Combines effort and catch into a single analysis table for CPUE modelling.
- NEW `build_catch_wide()` - Pivots catch records to a wide matrix (trips × species) for the NNLS solver.
- NEW `.finalise_cpue()` - Post-processes raw CPUE estimates: adds centroid coordinates, filters cells below `min_trips`, and attaches country labels.
- NEW `.top_species()` - Selects the top-N species by total catch weight to focus CPUE estimation.
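As a concrete sketch of the `"weighted"` method (a ratio of sums per cell; the toy table and its column names are illustrative assumptions, not the package's real schema):

```r
library(dplyr)

# Hypothetical per-trip catch and effort already assigned to H3 cells
trip_effort <- tibble::tibble(
  h3_cell       = c("8a1", "8a1", "8a2"),
  catch_kg      = c(10, 20, 6),
  fishing_hours = c(2, 3, 4)
)

weighted_cpue <- trip_effort |>
  group_by(h3_cell) |>
  summarise(
    # Ratio of sums, not mean of per-trip ratios: more robust when
    # individual trips contribute very few fishing hours
    cpue_kg_per_hour = sum(catch_kg) / sum(fishing_hours),
    .groups = "drop"
  )
# Cell "8a1": (10 + 20) / (2 + 3) = 6 kg per fishing hour
```

The `"nnls"` method instead solves `min ||Xq - y||²` subject to `q ≥ 0` over all cells at once, where `X` is the trips × cells effort matrix and `y` the per-trip catch vector, yielding one non-negative CPUE estimate per cell.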
## Web-Ready Spatial Export

- NEW `export_pds_spatial()` - Reads H3 effort grid and CPUE parquet files from cloud storage, derives fishing grounds, and uploads three web-ready files for the DeckGL portal: H3 effort JSON, CPUE JSON, and fishing grounds GeoJSON.
- NEW `derive_fishing_grounds()` - Converts an H3 effort grid to a GeoJSON `FeatureCollection` of discrete fishing ground polygons, enriched with area, constancy, and activity metrics.
- NEW `aggregate_trip_effort()` - Aggregates per-trip H3 effort vectors into a cumulative effort grid across all trips.
- NEW `plot_effort_map()` / `plot_cpue_map()` - Interactive Leaflet maps for visualising effort and CPUE grids during exploratory analysis.

## Taxa Enrichment

- NEW `enrich_taxa()` - Augments catch records with FishBase and SeaLifeBase taxonomic backbone data (class, order, family, genus) for all species in the matched trips dataset.
- NEW `get_taxa_backbone()` - Queries the GBIF taxonomic backbone to resolve species names to canonical taxonomy.
- NEW `expand_taxonomic_info()` - Expands the taxa lookup table with full higher classification.
## Bug Fixes

- FIX `aggregate_pds_effort()` - The manifest was silently uploaded to a temp-dir GCS path instead of the correct `{grid_prefix}/aggregated_manifest.rds` key, causing incremental processing to always rebuild the entire grid from scratch. Fixed by passing `name = manifest_name` explicitly to `upload_cloud_file()` in both the main and early-return paths.
- FIX `model_cpue()` - Removed dead code left from an earlier refactor (the `map_effort`, `map_cpue`, `out_dir` block) that caused an R error at runtime: "object 'map_effort' not found".
- FIX `export_pds_spatial()` - No longer crashes with a cryptic 404 when the effort grid parquet does not yet exist in GCS (e.g. first run or after manual deletion). The function now logs a warning and returns early, matching the existing behaviour for the CPUE file.
- FIX HTTP/2 `PROTOCOL_ERROR` failures on GCS uploads in CI - `upload_cloud_file()` now calls `cloud_storage_authenticate(force = TRUE)` unconditionally before every upload. Service-account tokens expire after 1 hour; long upstream jobs (e.g. `predict_pds_tracks`) can exhaust this window, causing `gargle` (which uses `httr2`) to attempt a mid-flight token refresh over a stale HTTP/2 connection. Forcing a fresh re-auth before the upload avoids this path entirely.
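The call-order fix can be sketched with stubs. The stub bodies below only record calls for illustration; the real `cloud_storage_authenticate()` and `upload_cloud_file()` are package internals, and `upload_with_fresh_token()` is a hypothetical wrapper name, not a package function.

```r
calls <- character()

# Stubs standing in for the package's storage helpers
cloud_storage_authenticate <- function(force = FALSE) {
  calls <<- c(calls, sprintf("auth(force=%s)", force))
}
upload_cloud_file <- function(file, name) {
  calls <<- c(calls, sprintf("upload(%s)", name))
}

upload_with_fresh_token <- function(file, name) {
  # Re-authenticate unconditionally: a service-account token minted at
  # job start may be past its 1-hour lifetime by the time a long
  # upstream job reaches the upload step
  cloud_storage_authenticate(force = TRUE)
  upload_cloud_file(file = file, name = name)
}

upload_with_fresh_token("grid.parquet", "grid.parquet")
# calls: "auth(force=TRUE)" then "upload(grid.parquet)"
```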
## CI / Workflow

- Merged the `predict-pds-tracks` and `aggregate-pds-effort` pipeline jobs into a single job — they are always sequential, and sharing a container saves startup overhead.
- Deleted the superseded `model-tracks.yaml` workflow (its steps are fully covered by `data-pipeline.yaml`).
- Fixed the `pkgdown.yaml` deploy step: added the required `environment: name: github-pages` block (needed by `actions/deploy-pages@v4`); bumped `actions/upload-pages-artifact` to `@v4` (native Node 24 support); added the Changelog to the pkgdown navbar.
## Naming & Versioning Coherence

- CPUE parquet files are now stored under `pds-cpue_r{h3_res}` (e.g. `pds-cpue_r9`) to match the effort grid naming convention (`predicted-pds-h3_grid_r9`). This ensures that running the pipeline at different H3 resolutions never silently mixes effort and CPUE data from different resolutions.
- Portal CPUE JSON files follow the same pattern: `pds-cpue-r{h3_res}__timestamp__json`.
- `inst/conf.yml` `portal.cpue.file_prefix` updated from `pds-cpue` to `pds-cpue-r`.
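The resolution-suffixed prefixes can be built from a simple template, sketched here; the actual key construction inside the package may differ.

```r
h3_res <- 9

# Effort grid and CPUE files share the same resolution suffix, so a run
# at one H3 resolution can never silently read data from another
grid_prefix <- sprintf("predicted-pds-h3_grid_r%d", h3_res)
cpue_prefix <- sprintf("pds-cpue_r%d", h3_res)
# grid_prefix: "predicted-pds-h3_grid_r9"; cpue_prefix: "pds-cpue_r9"
```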
## Documentation & Website

- Vignettes (`pipeline.Rmd`, `metrics-and-models.Rmd`) moved from the project root to `vignettes/` so pkgdown can discover them correctly.
- pkgdown CI workflow (`pkgdown.yaml`) fixed: system dependencies (GDAL, GEOS, PROJ, udunits2) are now installed before `r-lib/actions/setup-r-dependencies@v2`.
- `_pkgdown.yml` articles section re-enabled now that the vignettes are in the correct location.
# coasts 3.0.0

## Fishing Activity Prediction Pipeline

A new end-to-end pipeline for classifying GPS boat tracks into fishing and non-fishing activity using the `ssfaitk` statistical model, and aggregating the results into spatial effort maps.

## New Workflow Functions

- NEW `predict_pds_tracks()` - Downloads GPS tracks for all active vessels, applies the `ssfaitk` fishing activity model to each trip, and uploads fishing-only point files to cloud storage. Implements version-aware incremental processing: trips already classified with the current model version are skipped, and files from outdated model versions are automatically replaced when the model is updated.
- NEW `aggregate_pds_effort()` - Consolidates all classified fishing tracks into a single H3 hexagonal grid representing cumulative fishing effort across the fleet. Counts fishing pings and unique trips per cell and uploads the grid as a versioned parquet file ready for portal consumption.
## New Spatial Analysis Utilities

- NEW `assign_h3_indices()` - Maps GPS coordinates to H3 hexagon cell IDs at any resolution
- NEW `aggregate_h3_effort()` - Summarises fishing pings and unique vessel counts per H3 cell
- NEW `rollup_h3_resolution()` - Re-aggregates effort from a fine H3 resolution to any coarser level for multi-scale analysis
- NEW `create_spatial_grid()` - Converts an H3 effort table to an `sf` polygon object for mapping
- NEW `prep_fishing_points()` - Projects raw GPS coordinates to a metric CRS for distance-based spatial operations
- NEW `create_reference_grid()` - Generates a deterministic square or hexagonal reference grid over a study area
- NEW `aggregate_daily_effort()` - Counts fishing pings per reference grid cell via spatial join
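The rollup step can be sketched as a grouped sum over parent cells. Here the parent IDs are hard-coded stand-ins for a real H3 parent lookup, and the cell IDs and column names are made up for illustration.

```r
library(dplyr)

# Fine-resolution effort with an assumed precomputed H3 parent column
fine_grid <- tibble::tibble(
  h3_r9     = c("r9_a1", "r9_a2", "r9_b1"),
  parent_r7 = c("r7_a", "r7_a", "r7_b"),
  n_pings   = c(100, 50, 30)
)

# Re-aggregate fine-resolution effort into the coarser parent cells
coarse_grid <- fine_grid |>
  group_by(h3_cell = parent_r7) |>
  summarise(n_pings = sum(n_pings), .groups = "drop")
# "r7_a": 100 + 50 = 150 pings; "r7_b": 30 pings
```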
## Automated Pipeline

- NEW GitHub Actions workflow (`model-tracks.yaml`) - Runs the full fishing activity prediction and effort aggregation pipeline every two days. Always fetches the latest `ssfaitk` model version at runtime, so improvements to the underlying model are picked up automatically without rebuilding the Docker image.
# coasts 2.2.5

- FIX Clarify `resolve_storage_opts()` arguments
# coasts 2.2.4

- FIX Export `resolve_storage_opts()`
# coasts 2.2.2

- NEW Upgraded and optimized all functions related to PDS ingestion and preprocessing to be compatible with the current country pipelines. Country data flows now follow the same processing steps, improving manageability and data consistency.
# coasts 2.2.1

## Minor fix

- Added a `version` argument to `download_parquet_from_cloud()`
# coasts 2.2.0

## New Features

- NEW Upgraded and optimized all storage-related functions (Google Cloud and MongoDB auth, download, and upload). These replace the existing related functions across all country pipelines for improved manageability and centralization of common processes.

# coasts 2.1.0
## New Features

- NEW `get_kobo_data()` - Upgraded function to pull data from KoboToolbox following Kobo API changes. The new function replaces the existing data-pulling process across all pipelines for improved manageability and centralization of common processes.
- BUG FIX Fixed a bug related to the automatic generation of credentials for the Peskas Tracks App.
# coasts 1.5.0

## New Features

### Survey & Fleet Analysis Pipeline
- NEW `summarize_data()` - End-to-end summarization of WorldFish survey data into five output tables (monthly, taxa, district, gear, grid summaries) uploaded to cloud storage as versioned parquet files
- NEW `calculate_fishery_metrics()` - Transforms catch-level records into normalized fishery indicators (site-level CPUE/RPUE, predominant gear, species composition) in long format for portal consumption
- NEW `generate_fleet_analysis()` - Orchestrates the full fleet activity estimation pipeline and uploads aggregated results to cloud storage
- NEW `prepare_boat_registry()` - Constructs a boat registry from asset metadata for scaling GPS-tracked data to fleet-wide estimates
- NEW `process_trip_data()` - Processes PDS API trip records by device IMEI into per-trip summaries
- NEW `calculate_monthly_trip_stats()` - Aggregates trip data to monthly statistics per district
- NEW `estimate_fleet_activity()` - Scales GPS-sampled trips to fleet-wide activity estimates using boat registry sampling rates
- NEW `calculate_district_totals()` - Joins fleet estimates with survey summaries to produce district-level catch and revenue totals
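One way to read the scaling step: if the boat registry says only a fraction of the fleet carries trackers, trips observed from tracked boats are divided by that sampling rate. This is a sketch under that assumption; the numbers and variable names are made up, and the package's actual estimator may be more involved.

```r
n_boats_total   <- 200  # boats in the registry for a district
n_boats_tracked <- 40   # boats carrying GPS trackers
trips_observed  <- 120  # trips recorded by tracked boats in a month

# Fraction of the fleet that is GPS-sampled
sampling_rate <- n_boats_tracked / n_boats_total  # 0.2

# Scale sampled trips up to a fleet-wide activity estimate
trips_fleet_estimate <- trips_observed / sampling_rate  # 600
```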
### Data Export

- NEW `export_portal()` - Downloads WorldFish summary datasets from cloud storage, joins modelled aggregate estimates, pivots to long format, and uploads all tables to MongoDB portal collections
## Enhancements

### Multi-Package Architecture

- ENHANCED `read_config()` - Added a `package` argument (default `"coasts"`). Downstream packages that ship their own `inst/conf.yml` can now call `read_config(package = "mypackage")` to load their own configuration instead of the `coasts` defaults
- ENHANCED All 12 top-level pipeline functions now accept a `package` argument threaded through to `read_config()`: `ingest_pds_trips()`, `ingest_pds_tracks()`, `backup_tracks()`, `ingest_assets()`, `preprocess_pds_tracks()`, `merge_survey_trips()`, `get_metadata()`, `summarize_data()`, `export_geos()`, `export_fishers_stats()`, `export_portal()`, `generate_fleet_analysis()`
### Package Infrastructure

- ENHANCED `DESCRIPTION` - Migrated the `Author`/`Maintainer` fields to the `Authors@R: person(...)` format (fixes an R CMD check WARNING); removed the spurious `LazyData: true` (no `data/` directory); added `URL` and `BugReports` fields pointing to the GitHub repository
- ENHANCED `_pkgdown.yml` - Added a new "Survey & Fleet Analysis" reference section; added `export_portal()` to "Data Export & Storage"; removed two non-exported internal helpers that would have caused build errors
# coasts 1.4.0

## New Features

### GPS-Survey Trip Matching

- NEW `merge_survey_trips()` - Downloads matched GPS and survey trip data across regions, harmonizes columns, and combines them into a single dataset
### Multi-Bucket Regional Storage

- ENHANCED `download_parquet_from_cloud()` and `upload_parquet_to_cloud()` - Added a `bucket_name` parameter to download from and upload to regional buckets (Kenya, Mozambique, Zanzibar)
- ENHANCED `inst/conf.yml` - Regional bucket configuration with environment-specific bucket names (dev vs prod)
# coasts 1.3.0

## Fisher Performance Analytics

- NEW `export_fishers_stats()` - Comprehensive fisher performance analysis and export
  - Integrates catch events from tracks-app with GPS tracking data from the PDS API
  - Matches fisher-reported landings with automated trip tracking by date and device
  - Calculates fishing efficiency metrics: CPUE (kg/hour, kg/km) and search efficiency ratios
  - Estimates fuel consumption and catch-per-liter efficiency
  - Categorizes trips by distance (nearshore, mid-range, offshore)
  - Exports aggregated fisher statistics and trip-level performance metrics to MongoDB
## Automated Workflows

- ENHANCED GitHub Actions data pipeline workflow
  - Added an `export-fishers-stats` job to the automated pipeline
  - Runs after track preprocessing to ensure data availability
  - Automatically exports fisher performance data on every pipeline run
## Development Experience

- NEW `.Rprofile` - Interactive environment switching for local development
  - Added helper functions: `use_prod()`, `use_local()`, `use_default()`
  - Visual environment indicator on R session startup
  - Quick commands reference displayed in interactive sessions
  - Simplified testing across different configuration profiles
# coasts 1.2.0

## Breaking Changes

### Configuration System Migration

- BREAKING CHANGE - Migrated from the auth folder to `.env`-based credentials management
  - Removed the `local` configuration profile from `inst/conf.yml`
  - All environments now use environment variables loaded via a `.env` file in local development
  - Added the `dotenv` package dependency for automatic `.env` file loading
  - Created a `.env.example` template with all required environment variables
  - Updated `.gitignore` to properly handle `.env` files while tracking `.env.example`
  - Migration Guide: Copy `.env.example` to `.env` and fill in the credentials (see the updated README)
## New Features

### Asset Management

- ENHANCED `ingest_assets()` - Comprehensive fisheries asset metadata ingestion
  - Added a `log_threshold` parameter for configurable logging
  - Now includes PDS device metadata from Airtable (`pds_devices` table)
  - Retrieves 6 asset types: taxa, gear, vessels, landing sites, forms, and devices
  - Changed the output format from parquet to RDS for better R object serialization
  - Added complete roxygen documentation following package standards
### Data Ingestion Improvements

- ENHANCED `ingest_pds_trips()` - Improved trip data ingestion workflow
  - Now downloads device metadata from cloud storage instead of Google Sheets
  - Filters devices by `last_seen` date (>= 2023-01-01) to keep active devices only
  - Enhanced PDS API calls with the `deviceInfo` and `withLastSeen` parameters
  - Client-side IMEI filtering for reliable data retrieval
  - Updated documentation with detailed configuration examples and notes
## Documentation

### Package Documentation

- ENHANCED README with `.env`-based configuration instructions
  - Added a step-by-step local development setup guide
  - Documented all required environment variables with descriptions
  - Separated local and production deployment instructions
- UPDATED CLAUDE.md with the new configuration system details
  - Revised the Configuration System section to explain the `.env` approach
  - Updated the Configuration Requirements with clear setup steps
  - Added `dotenv` to the Key Dependencies section
- NEW `.env.example` - Template file for local development credentials
  - Includes all 12 required environment variables with helpful comments
  - Proper formatting examples for complex values (JSON keys, connection strings)
### Function Documentation

- ENHANCED `ingest_assets()` with comprehensive roxygen documentation
  - Detailed `@description` with step-by-step operations
  - Complete `@details` section with the YAML configuration structure
  - Asset type descriptions (Taxa, Gear, Vessels, Landing Sites, Forms, Devices)
  - Added `@examples`, `@seealso`, and `@keywords` tags
## Technical Improvements

### Configuration Loading

- ENHANCED the `read_config()` function in `R/utils.R`
  - Automatic detection and loading of a `.env` file if present
  - Seamless integration with the existing `config::get()` workflow
  - Informative logging when a `.env` file is loaded
# coasts 1.1.0

- NEW - Integrate (Beta) Cabo Delgado (Mozambique) estimates
- NEW - Developing code to integrate catch-event records from tracks-app
# coasts 1.0.0

## Major New Features

### Airtable Integration System

- NEW `airtable_to_df()` - Convert Airtable tables to R data frames with pagination support
- NEW `df_to_airtable()` - Create new records in Airtable tables with batch processing
- NEW `bulk_update_airtable()` - Update multiple Airtable records efficiently (10-record batches)
- NEW `update_airtable_record()` - Update individual Airtable records
- NEW `get_writable_fields()` - Identify writable fields in Airtable tables (excludes computed fields)
- NEW `device_sync()` - Comprehensive sync function for device data (updates existing records, creates new ones)
- NEW `ingest_pelagic_boats()` - Complete workflow for PDS boat data ingestion and Airtable sync
- NEW `sync_device_users()` - Sync device users to MongoDB with password generation and Airtable updates
### Enhanced PDS API Integration

- NEW `pelagic_auth()` - Authentication with the Pelagic Analytics API
- NEW `pelagic_refresh_token()` - Token refresh functionality for sustained API access
- NEW `get_pelagic_boats()` - Retrieve boat information with server-side filtering and column selection
- NEW `get_pelagic_devices()` - Retrieve device information with advanced filtering capabilities
- Enhanced `ingest_pds_tracks()` with improved error handling and parallel processing
### Automated Workflows

- NEW GitHub Actions workflow: `ingest-pelagic-boats.yaml` (runs every 15 days)
- NEW GitHub Actions workflow: `sync-device-users.yaml` (runs every 10 days)
- Enhanced the main data pipeline workflow with improved container management
### Configuration System Improvements

- BREAKING CHANGE Restructured the MongoDB configuration to support dual databases:
  - `mongodb.coasts_portal` - For the main coasts geospatial data
  - `mongodb.tracks_app` - For tracks application user data
- BREAKING CHANGE Enhanced the Airtable configuration with separate base IDs:
  - `airtable.frame` - For device and country metadata
  - `airtable.tracks_app` - For user management
- Updated environment variable requirements for production deployments
### Documentation and Development
- NEW Professional pkgdown website with enhanced theming and navigation
- Enhanced README with status badges and improved structure
- Fixed pkgdown configuration issues with pipe operators and tidy evaluation functions
- Updated function documentation with detailed examples and use cases
## Bug Fixes and Improvements

### Data Processing
- Fixed KES-to-USD conversion units in `export_geos()`
- Improved MongoDB collection references to use the new dual-database configuration
- Enhanced error handling in data ingestion functions
- Better logging and progress tracking across all functions
### Technical Improvements
- Password generation system for new users with reproducible seeding
- Comprehensive data validation and duplicate handling
- Enhanced country mapping for global fisheries data (13 countries supported)
- Improved spatial data processing with WGS84 coordinate system standardization
- Advanced MongoDB operations with geospatial indexing (2dsphere)
# coasts 0.1.0
- Initial release of the coastal fisheries data pipeline for Western Indian Ocean region.
## New Features

### Data Ingestion

- `ingest_pds_trips()` - Automated ingestion of GPS boat trip data from the Pelagic Data Systems (PDS) API
- `ingest_pds_tracks()` - Parallel processing of detailed GPS track data with batch processing capabilities
- `get_metadata()` - Retrieval of fishery metadata from Google Sheets
### Data Preprocessing

- `preprocess_pds_tracks()` - Spatial gridding and summarization of fishing activity patterns
  - Multi-scale spatial analysis support (100 m, 250 m, 500 m, and 1000 m grid cells)
  - Parallel processing for efficient handling of large datasets
- `preprocess_track_data()` - Core function for converting GPS tracks to spatial grid summaries
### Data Export and Storage

- `export_geos()` - Comprehensive export of geospatial data and regional metrics to MongoDB
  - MongoDB integration with 2dsphere geospatial indexing
  - Currency conversion for Kenya (KES to USD) and Zanzibar (TZS to USD) economic indicators
  - Support for regional boundary data and time-series metrics
### Cloud Storage Integration

- `upload_cloud_file()` and `download_cloud_file()` - Google Cloud Storage integration
- `cloud_object_name()` - Versioned object naming and retrieval
- `upload_parquet_to_cloud()` and `download_parquet_from_cloud()` - Optimized parquet file handling
  - Automatic file compression using the LZ4 algorithm
### Database Operations

- `mdb_collection_push()` and `mdb_collection_pull()` - MongoDB collection management
  - Geospatial indexing support for spatial queries
  - Bulk data operations with error handling
### API Integration

- `get_trips()` - PDS API integration for trip data retrieval
- `get_trip_points()` - Detailed GPS point data from the PDS API
  - Authentication and token management for external APIs