Downloads a Parquet file from cloud storage and loads it into memory. The local file is automatically cleaned up after reading.
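The read-then-clean-up behaviour described above can be sketched in a few lines of R. This is only an illustration, not the package's actual implementation: `read_then_cleanup()` and its `reader` argument are hypothetical names, and using `arrow::read_parquet()` as the default reader is an assumption.

```r
# Hypothetical helper: read a downloaded file, then delete the local copy.
# The default reader, arrow::read_parquet(), is an assumption about how the
# file is parsed; it is not taken from the package source.
read_then_cleanup <- function(path, reader = arrow::read_parquet) {
  # on.exit() fires when the function exits, including on error,
  # so the local file is removed even if reading fails.
  on.exit(unlink(path), add = TRUE)
  reader(path)
}
```

Registering the cleanup with `on.exit()` rather than calling `unlink()` after the read means a failed read does not leave a stale file on disk.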

Usage

download_parquet_from_cloud(
  prefix,
  provider,
  options,
  version = "latest",
  bucket_name = NULL
)

Arguments

prefix

A character string specifying the file prefix path in cloud storage.

provider

A character string specifying the cloud storage provider key.

options

A named list of cloud storage provider options.

version

A character string specifying the version to retrieve. Default is "latest", which returns the most recently updated object.
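One way to picture how `version` might be resolved is the sketch below. It is not the package's own logic: `resolve_version()` is a hypothetical helper, the listing is assumed to be a data frame with a `name` column and a POSIXct `updated` column, and the object names in the example are invented for illustration.

```r
# Hypothetical helper: choose one object name from a bucket listing.
# `objects` is assumed to have a character column `name` and a POSIXct
# column `updated`.
resolve_version <- function(objects, version = "latest") {
  if (nrow(objects) == 0) stop("No objects found for this prefix.")
  if (identical(version, "latest")) {
    # "latest" means the most recently updated object
    return(objects$name[which.max(as.numeric(objects$updated))])
  }
  # Otherwise keep the names that contain the version string literally
  hits <- objects$name[grepl(version, objects$name, fixed = TRUE)]
  if (length(hits) != 1) {
    stop("Expected exactly one object for version '", version,
         "', found ", length(hits), ".")
  }
  hits
}

# Invented listing, for illustration only
listing <- data.frame(
  name = c("raw-data/survey-data__20250101T120000.parquet",
           "raw-data/survey-data__20250215T093000.parquet"),
  updated = as.POSIXct(c("2025-01-01 12:00:00", "2025-02-15 09:30:00"),
                       tz = "UTC"),
  stringsAsFactors = FALSE
)
latest_name <- resolve_version(listing)
pinned_name <- resolve_version(listing, version = "20250101T120000")
```

Failing loudly when a version string matches zero or several objects avoids silently loading the wrong file.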

bucket_name

Optional character string specifying the GCS bucket name. If provided, it overrides the bucket defined in `options`. If `NULL` (the default), the bucket defined in `options` is used.
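The override rule for `bucket_name` can be written as a small sketch. `resolve_bucket()` is a hypothetical helper, and the assumption that the bucket is stored under `options$bucket` is not confirmed by this page.

```r
# Hypothetical helper: apply the bucket_name override to an options list.
# Assumes the bucket lives under options$bucket.
resolve_bucket <- function(options, bucket_name = NULL) {
  if (!is.null(bucket_name)) {
    # An explicit bucket_name takes precedence over the configured bucket
    options$bucket <- bucket_name
  }
  options
}

opts <- list(bucket = "default-bucket")
kept <- resolve_bucket(opts)
overridden <- resolve_bucket(opts, bucket_name = "other-bucket")
```

Returning a modified copy leaves the caller's original `options` list untouched, which follows from R's copy-on-modify semantics.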

Value

A tibble containing the data from the Parquet file.

Examples

if (FALSE) { # \dontrun{
# Download latest version from default bucket
data <- download_parquet_from_cloud(
  prefix = "raw-data/survey-data",
  provider = conf$storage$google$key,
  options = conf$storage$google$options
)

# Download a specific version
data <- download_parquet_from_cloud(
  prefix = "raw-data/survey-data",
  provider = conf$storage$google$key,
  options = conf$storage$google$options,
  version = "20250101T120000"
)

# Download from a specific bucket
data <- download_parquet_from_cloud(
  prefix = "pds-trips",
  provider = conf$storage$google$key,
  options = conf$storage$google$options,
  bucket_name = conf$storage$google$buckets$mozambique
)
} # }