3.5 Data Visualization in R: Common Plot Types for Fisheries Data

Categories: R, Visualization, ggplot2

Author: Lorenzo Longobardi

Published: 2024-11-06

Why Visualize Data? 🎯

Before diving into complex data preprocessing and analysis, understanding your data through visualization is crucial. Think of it like preparing for a fishing trip - just as you would survey the waters, check weather conditions, and observe fish behavior before casting your nets, visualizing your data helps you:

  • Understand Patterns: Just as ripples on water might indicate fish movement, patterns in your data can reveal important trends
  • Identify Issues: Like spotting damaged gear before fishing, visualization helps detect data problems early
  • Generate Insights: Similar to how experienced fishers read the water, good visualization helps you “read” your data
  • Communicate Findings: Visual representations make it easier to share insights with colleagues and stakeholders

In Small-Scale Fisheries (SSF) research, visualization is particularly valuable because:

  1. Complex Data Structure
    • Multiple species caught simultaneously
    • Various gear types used
    • Different fishing grounds
    • Seasonal variations
  2. Quality Control
    • Field data collection errors
    • Missing or incorrect values
    • Outliers and unusual patterns
    • Inconsistent measurements
  3. Communication Needs
    • Sharing findings with fishers
    • Reporting to management
    • Contributing to research
    • Informing policy decisions
Learning Objectives

By the end of this tutorial, you will:

  • Understand the principles of effective data visualization
  • Master the fundamentals of ggplot2 syntax and structure
  • Create basic to advanced plots for exploring fisheries data
  • Learn to customize visualizations for different audiences
  • Use plots to identify data quality issues
  • Develop skills for creating publication-ready graphics

Understanding ggplot2: The Grammar of Graphics 📊

What is ggplot2?

ggplot2 is not just another plotting package - it’s a complete system for creating visualizations based on the “Grammar of Graphics.” This approach breaks down any visualization into fundamental components, similar to how we break down language into grammar components.

Think of it like constructing a sentence:

  • A sentence needs a subject, verb, and object
  • A ggplot2 visualization needs data, aesthetics, and geometries

The Basic Components

Let’s understand each component using a fishing analogy:

  1. Data: Your dataset
    • Like your catch records or survey data
    • Works best in a tidy format (each variable in its own column, each observation in its own row)
  2. Aesthetics (aes): How variables map to visual elements
    • Like deciding how to sort fish (by size, species, etc.)
    • Maps variables to visual properties (position, color, size)
  3. Geometries (geom): The type of plot
    • Like choosing how to display fish (in boxes, on a scale, etc.)
    • Determines how data is visually represented (points, lines, bars)

Let’s see this in practice with some sample data:
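As a minimal sketch, the sample data could be a small data frame of daily catch records. The object name `catch_data`, its column names, and all values are invented for illustration; they are not real survey data.

```r
# Hypothetical sample data: ten days of catch records.
# Names and values are assumptions made for this example only.
catch_data <- data.frame(
  date     = seq(as.Date("2024-01-01"), by = "day", length.out = 10),
  catch_kg = c(12.5, 15.0, 9.8, 14.2, 18.6, 11.3, 16.9, 13.4, 17.8, 20.1)
)

# Tidy layout: one row per day, one column per variable
str(catch_data)
```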

Now, let’s build our first plot step by step:
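The three steps might look like the sketch below. It assumes a small hypothetical data frame called `catch_data` (invented names and values), recreated here so the block runs on its own.

```r
library(ggplot2)

# Hypothetical data, recreated so this block is self-contained
catch_data <- data.frame(
  date     = seq(as.Date("2024-01-01"), by = "day", length.out = 10),
  catch_kg = c(12.5, 15.0, 9.8, 14.2, 18.6, 11.3, 16.9, 13.4, 17.8, 20.1)
)

# Step 1: data only, which draws a blank canvas
step1 <- ggplot(data = catch_data)

# Step 2: add aesthetic mappings; axes appear, but nothing is drawn yet
step2 <- ggplot(data = catch_data, mapping = aes(x = date, y = catch_kg))

# Step 3: add a geometry; one point per day
step3 <- step2 + geom_point()

step3
```

Printing `step1` and `step2` in turn shows how the plot changes at each stage.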

Let’s break down what happened:

  1. Step 1: Created a blank canvas
    • Like preparing your workspace
    • No visual elements yet
  2. Step 2: Added aesthetic mappings
    • Defined what goes on x and y axes
    • Set up the coordinate system
    • Still no visible elements
  3. Step 3: Added geometric elements
    • Points appear on the plot
    • Each point represents one day’s catch

Understanding Layers

One of ggplot2’s most powerful features is its layering system. You can add multiple layers to create complex visualizations:
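A layered plot of this kind could be sketched as follows. The data frame `catch_data` is hypothetical, and the linear trend line is just one reasonable choice for the smoothing layer.

```r
library(ggplot2)

# Hypothetical data, recreated so this block is self-contained
catch_data <- data.frame(
  date     = seq(as.Date("2024-01-01"), by = "day", length.out = 10),
  catch_kg = c(12.5, 15.0, 9.8, 14.2, 18.6, 11.3, 16.9, 13.4, 17.8, 20.1)
)

layered_plot <- ggplot(catch_data, aes(x = date, y = catch_kg)) +
  geom_point() +                                             # individual observations
  geom_line() +                                              # sequence between days
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # overall linear trend
  labs(
    title = "Daily catch (illustrative data)",
    x = "Date",
    y = "Catch (kg)"
  ) +                                                        # labels for context
  theme_minimal()                                            # overall appearance

layered_plot
```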

Each layer adds new information:

  1. Points show individual observations
  2. Line connects points to show sequence
  3. Trend line shows overall pattern
  4. Labels provide context
  5. Theme controls overall appearance

Essential Aesthetic Mappings

Let’s explore the main types of aesthetic mappings:
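As a rough illustration, one scatter plot can carry several aesthetic mappings at once. The data frame `trips` and its columns are hypothetical, and the viridis color scale is one option for color-blind-friendly output, not the only one.

```r
library(ggplot2)

# Hypothetical trip-level data (names and values invented for illustration)
trips <- data.frame(
  effort_hours = c(2, 3, 4, 5, 6, 7, 8, 4, 6, 5),
  catch_kg     = c(8, 11, 15, 17, 22, 24, 30, 12, 19, 16),
  gear         = rep(c("Gillnet", "Handline"), times = 5),
  n_boats      = c(1, 2, 2, 3, 3, 4, 5, 2, 3, 2)
)

aes_plot <- ggplot(
  trips,
  aes(
    x     = effort_hours,  # position: continuous
    y     = catch_kg,      # position: continuous
    color = gear,          # color: categorical
    shape = gear,          # shape: categorical, reinforces color
    size  = n_boats        # size: magnitude
  )
) +
  geom_point() +
  scale_color_viridis_d(end = 0.8) +
  labs(x = "Effort (hours)", y = "Catch (kg)",
       color = "Gear", shape = "Gear", size = "Boats")

aes_plot
```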

Key aesthetic mappings include:

  1. Position (x, y)
    • Most fundamental mapping
    • Determines location of elements
    • Often maps to continuous variables, though categorical variables also work (as in bar and box plots)
  2. Color
    • Great for categorical variables
    • Can show differences between groups
    • Use carefully for color-blind accessibility
  3. Size
    • Works well for continuous variables
    • Can show importance or magnitude
    • Don’t use for too many different values
  4. Shape
    • Best for categorical variables
    • Keep to about 6 categories: the default shape scale supports at most 6 discrete values, and more become hard to tell apart
    • Combines well with color

Practice Exercise: Creating Basic Plots

Let’s practice what we’ve learned with some exercises:

Tips for Effective Basic Plots
  1. Always include clear axis labels
  2. Choose appropriate scales for your data
  3. Use colors meaningfully, not just for decoration
  4. Consider your audience when adding complexity
  5. Start simple and add layers as needed

This concludes Part 1 of our tutorial on data visualization with ggplot2. In the next part, we’ll explore different types of plots commonly used in fisheries data analysis, along with more advanced customization options.

Essential Plot Types for Fisheries Data 📊

Different aspects of fisheries data require different visualization approaches. Let’s explore the most common and useful plot types you’ll need in your work. We’ll start by creating a realistic dataset that represents typical small-scale fisheries data:
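A simulated dataset along these lines might be built as in the sketch below. Everything in it is an assumption made for illustration: the object name `fisheries_data`, the site and species labels, the column names, and the distributions used to generate the values.

```r
set.seed(123)  # make the simulated values reproducible

# Hypothetical dataset: one record per day for 2024 (366 days)
dates       <- seq(as.Date("2024-01-01"), as.Date("2024-12-31"), by = "day")
n           <- length(dates)
day_of_year <- as.numeric(format(dates, "%j"))

fisheries_data <- data.frame(
  date         = dates,
  site         = sample(c("Site A", "Site B", "Site C"), n, replace = TRUE),
  species      = sample(c("Snapper", "Grouper", "Mackerel"), n, replace = TRUE),
  effort_hours = round(runif(n, min = 2, max = 10), 1)
)

# Catch rate (kg per hour): mild seasonal cycle times random variation
rate <- (1 + 0.3 * sin(2 * pi * day_of_year / 365)) * rgamma(n, shape = 5, rate = 2)

fisheries_data$catch_kg     <- round(fisheries_data$effort_hours * rate, 1)
fisheries_data$cpue         <- fisheries_data$catch_kg / fisheries_data$effort_hours
fisheries_data$price_per_kg <- round(runif(n, min = 2, max = 8), 2)

head(fisheries_data)
```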

1. Time Series Plots: Tracking Temporal Patterns 📈

Time series plots are essential for understanding patterns over time, such as:

  • Daily/monthly catch trends
  • Seasonal variations
  • Long-term changes in CPUE
  • Price fluctuations

Let’s create several types of time series visualizations:
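Two common variants are sketched here: a line-and-point plot of daily catch, and the same plot with a smoothed trend. The data frame `daily` is simulated for illustration, and the loess span of 0.3 is an arbitrary choice.

```r
library(ggplot2)
set.seed(1)

# Hypothetical daily catch series with a slow cycle plus noise
daily <- data.frame(
  date = seq(as.Date("2024-01-01"), by = "day", length.out = 120)
)
daily$catch_kg <- round(20 + 8 * sin(seq_len(120) / 19) + rnorm(120, sd = 4), 1)

# Variant 1: line for continuity, points for actual observations
p_line <- ggplot(daily, aes(x = date, y = catch_kg)) +
  geom_line(color = "steelblue") +
  geom_point(alpha = 0.5, size = 1) +
  scale_x_date(date_breaks = "1 month", date_labels = "%b %Y") +
  labs(x = "Date", y = "Catch (kg)")

# Variant 2: add a loess smoother with its confidence band
p_trend <- p_line +
  geom_smooth(method = "loess", formula = y ~ x, span = 0.3, se = TRUE)

p_trend
```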

For monthly patterns, we can aggregate the data:
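One way to do this is sketched below with base R's `aggregate()`, which keeps the example free of extra packages; a `dplyr` pipeline with `group_by()` and `summarise()` would give the same result. The data frame `daily` is simulated for illustration.

```r
library(ggplot2)
set.seed(1)

# Hypothetical daily catch series (120 days: 1 January to 29 April 2024)
daily <- data.frame(
  date = seq(as.Date("2024-01-01"), by = "day", length.out = 120)
)
daily$catch_kg <- round(20 + 8 * sin(seq_len(120) / 19) + rnorm(120, sd = 4), 1)

# Label each record with the first day of its month, then sum within months
daily$month <- as.Date(format(daily$date, "%Y-%m-01"))
monthly <- aggregate(catch_kg ~ month, data = daily, FUN = sum)

p_monthly <- ggplot(monthly, aes(x = month, y = catch_kg)) +
  geom_col(fill = "steelblue") +
  scale_x_date(date_labels = "%b %Y") +
  labs(x = "Month", y = "Total catch (kg)")

p_monthly
```

Note that the last month here is incomplete, so its total is not directly comparable with the others; a mean per fishing day avoids that problem.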

Time Series Visualization Tips
  1. Use lines to show continuity in data
  2. Add points to show actual observations
  3. Consider smoothing for trend visualization
  4. Group by relevant time periods (day, week, month)
  5. Account for seasonal patterns

2. Distribution Plots: Understanding Variation 📊

Distribution plots help us understand how values are spread out, which is crucial for:

  • Catch size distributions
  • Price variations
  • Effort patterns
  • Environmental variables

Let’s explore different ways to visualize distributions:
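The sketch below draws the same simulated catches four ways. The data frame `catches`, the species names, and the gamma parameters are assumptions chosen only to give each group a different shape.

```r
library(ggplot2)
set.seed(7)

# Hypothetical catch weights for three species (60 records each)
catches <- data.frame(
  species  = rep(c("Snapper", "Grouper", "Mackerel"), each = 60),
  catch_kg = c(
    rgamma(60, shape = 4, scale = 3),
    rgamma(60, shape = 6, scale = 3),
    rgamma(60, shape = 3, scale = 2)
  )
)

# Histogram: counts of records in 2 kg bins
p_hist <- ggplot(catches, aes(x = catch_kg)) +
  geom_histogram(binwidth = 2, fill = "steelblue", color = "white")

# Density: smoothed distribution, one curve per species
p_density <- ggplot(catches, aes(x = catch_kg, fill = species)) +
  geom_density(alpha = 0.4)

# Box plot: median, quartiles, and outliers per species
p_box <- ggplot(catches, aes(x = species, y = catch_kg)) +
  geom_boxplot()

# Violin plot with the raw points overlaid
p_violin <- ggplot(catches, aes(x = species, y = catch_kg)) +
  geom_violin(fill = "grey90") +
  geom_jitter(width = 0.1, height = 0, alpha = 0.4)

p_violin
```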

Each type of distribution plot serves a different purpose:

  • Histograms: Show the frequency of values in bins
  • Density plots: Give a smoothed representation of the distribution
  • Box plots: Show the median, quartiles, and outliers
  • Violin plots: Show a mirrored density for each group, often combined with a box plot
  • Point overlays: Show the actual data points

3. Relationship Plots: Understanding Connections 🔄

Understanding relationships between variables is crucial in fisheries research. Common relationships to explore include:

  • Catch vs. effort
  • Price vs. quantity
  • Environmental factors vs. catch
  • Temporal patterns vs. catch

Let’s create various relationship plots:
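A catch-versus-effort example could look like this sketch. The data frame `trips` is simulated, with catch made to rise with effort by construction, so the positive trend is a property of the fake data and not a finding.

```r
library(ggplot2)
set.seed(11)

# Hypothetical trips: catch increases with effort, plus noise
trips <- data.frame(
  effort_hours = round(runif(80, min = 2, max = 10), 1),
  gear         = sample(c("Gillnet", "Handline"), 80, replace = TRUE)
)
trips$catch_kg <- round(pmax(0, 3 * trips$effort_hours + rnorm(80, sd = 5)), 1)

# Scatter plot with a linear trend and its confidence band
p_scatter <- ggplot(trips, aes(x = effort_hours, y = catch_kg)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(x = "Effort (hours)", y = "Catch (kg)")

# Same relationship split by gear type, one panel per gear
p_by_gear <- ggplot(trips, aes(x = effort_hours, y = catch_kg, color = gear)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ gear) +
  labs(x = "Effort (hours)", y = "Catch (kg)", color = "Gear")

p_scatter
```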

4. Compositional Plots: Understanding Proportions 📊

Compositional plots are valuable for showing:

  • Species composition of catches
  • Proportion of catch by site
  • Value distribution across species
  • Effort allocation

Let’s explore different ways to show composition:
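Three options are sketched here: stacked bars, 100% stacked bars, and a pie chart built from a bar in polar coordinates. The data frame `composition` and its totals are invented for illustration.

```r
library(ggplot2)

# Hypothetical total catch (kg) by site and species
composition <- data.frame(
  site     = rep(c("Site A", "Site B", "Site C"), each = 3),
  species  = rep(c("Snapper", "Grouper", "Mackerel"), times = 3),
  catch_kg = c(120, 80, 40, 60, 150, 30, 90, 70, 110)
)

# Stacked bars: absolute catch per site, split by species
p_stack <- ggplot(composition, aes(x = site, y = catch_kg, fill = species)) +
  geom_col() +
  labs(x = "Site", y = "Catch (kg)", fill = "Species")

# 100% stacked bars: each site rescaled to proportions
p_fill <- ggplot(composition, aes(x = site, y = catch_kg, fill = species)) +
  geom_col(position = "fill") +
  labs(x = "Site", y = "Proportion of catch", fill = "Species")

# Pie chart: a single stacked bar wrapped into polar coordinates
totals <- aggregate(catch_kg ~ species, data = composition, FUN = sum)
p_pie <- ggplot(totals, aes(x = "", y = catch_kg, fill = species)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  theme_void() +
  labs(fill = "Species")

p_fill
```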

Notes on Pie Charts

While pie charts can be visually appealing, use them with care:

  1. Slices are harder to compare than bars
  2. Limit them to 2-6 categories
  3. Consider alternatives such as stacked bars
  4. Reserve them for part-to-whole relationships

Practice Exercise: Creating Common Plots 💪

Let’s practice creating these plot types with our fisheries data:


Key Points to Remember 🗝️

When choosing plot types for fisheries data:

  1. Time Series Plots
    • Best for showing trends over time
    • Consider seasonality
    • Use appropriate time aggregation
    • Include confidence intervals when relevant
  2. Distribution Plots
    • Use histograms for single variables
    • Box plots for group comparisons
    • Violin plots for detailed distributions
    • Consider sample size when choosing
  3. Relationship Plots
    • Scatter plots for two continuous variables
    • Add trend lines to show patterns
    • Use facets for additional grouping
    • Consider adding uncertainty
  4. Compositional Plots
    • Stacked bars for absolute values
    • 100% stacks for proportions
    • Use pie charts sparingly
    • Consider the number of categories

Applying ggplot2 to Fisheries Analysis 📊

Now that we understand how to build plots using layers and aesthetic mappings, let’s apply these concepts to real fisheries analysis. While our previous examples helped us learn the basics, actual fisheries data involves multiple interrelated variables that tell important stories about fishing patterns, catch composition, and temporal trends.

To explore these relationships effectively, let’s first expand our dataset to better represent the complexity of small-scale fisheries data:
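An expanded dataset of this kind could be simulated roughly as follows. All names (`fishing_data`, the sites, the species, the columns) and all distributions are assumptions. The moon phase in particular is a crude stand-in that assumes a new moon on the first date and a 29.53-day cycle; it is not astronomically accurate.

```r
set.seed(2024)

# Hypothetical design: every date x site x species combination
dates <- seq(as.Date("2024-01-01"), as.Date("2024-03-31"), by = "day")  # 91 days
fishing_data <- expand.grid(
  date    = dates,
  site    = c("North Bay", "Reef Edge", "River Mouth"),
  species = c("Snapper", "Grouper", "Mackerel"),
  stringsAsFactors = FALSE
)
n         <- nrow(fishing_data)  # 91 days x 3 sites x 3 species = 819 rows
day_index <- as.numeric(fishing_data$date - min(dates))

# Effort metrics
fishing_data$n_boats    <- sample(2:8, n, replace = TRUE)
fishing_data$trip_hours <- round(runif(n, min = 3, max = 9), 1)

# Environmental conditions (simplified, see assumptions above)
fishing_data$water_temp <- round(24 + 2 * sin(day_index / 15) + rnorm(n, sd = 0.5), 1)
fishing_data$moon_phase <- cut(
  day_index %% 29.53,
  breaks = c(0, 7.38, 14.77, 22.15, 29.53),
  labels = c("Waxing crescent", "Waxing gibbous", "Waning gibbous", "Waning crescent"),
  include.lowest = TRUE, right = FALSE
)

# Catch, CPUE (kg per boat-hour), and price
effort <- fishing_data$n_boats * fishing_data$trip_hours
fishing_data$catch_kg     <- round(effort * rgamma(n, shape = 3, rate = 2), 1)
fishing_data$cpue         <- fishing_data$catch_kg / effort
fishing_data$price_per_kg <- round(runif(n, min = 3, max = 9), 2)

str(fishing_data)
```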

This dataset represents three months of daily fishing activities across different sites, including:

  • Temporal information (dates)
  • Spatial information (fishing sites)
  • Catch composition (species)
  • Effort metrics (number of boats, trip duration)
  • Economic data (prices)
  • Environmental conditions (water temperature, moon phase)

Using our understanding of aesthetic mappings and layers, we can now explore different aspects of this data through various visualization approaches.

Understanding Temporal Patterns with Multiple Layers

Remember how we used geom_point() and geom_line() in our basic examples? We can combine these with additional aesthetics and geometries to reveal rich temporal patterns in our fishing data:
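A sketch of such a plot follows, using a simulated data frame `catch_ts` with one record per day and species over three months. The names, values, and loess smoother are illustrative choices.

```r
library(ggplot2)
set.seed(5)

# Hypothetical daily catch per species, January to March 2024
dates <- seq(as.Date("2024-01-01"), as.Date("2024-03-31"), by = "day")
catch_ts <- expand.grid(
  date    = dates,
  species = c("Snapper", "Grouper", "Mackerel"),
  stringsAsFactors = FALSE
)
day_index <- as.numeric(catch_ts$date - min(dates))
catch_ts$catch_kg <- round(
  pmax(0, 20 + 6 * sin(day_index / 14) + rnorm(nrow(catch_ts), sd = 4)), 1
)

temporal_plot <- ggplot(catch_ts, aes(x = date, y = catch_kg, color = species)) +
  geom_line(alpha = 0.4) +                                      # continuity over time
  geom_point(size = 1) +                                        # daily observations
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +  # trend per species
  scale_x_date(date_labels = "%b %Y") +
  labs(x = "Date", y = "Catch (kg)", color = "Species") +
  theme_minimal()

temporal_plot
```

Because `color` is mapped inside `ggplot()`, every layer inherits it, so each species gets its own line, points, and trend.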

This plot combines multiple layers we learned about earlier:

  1. geom_line(): Shows the continuous nature of time series data
  2. geom_point(): Marks actual daily observations
  3. geom_smooth(): Reveals underlying trends
  4. Color aesthetic: Distinguishes between species

Adding Complexity: Multiple Variables and Faceting

Now let’s build on our understanding of aesthetics to explore relationships between multiple variables. We’ll use faceting to create small multiples, a powerful technique for comparing patterns across groups:
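The sketch below plots CPUE against water temperature, colored by species, with one panel per site. The data frame `cpue_data` is simulated so that CPUE rises with temperature; that relationship is built into the fake data and should not be read as a result.

```r
library(ggplot2)
set.seed(9)

# Hypothetical records: 90 per site, three sites
n <- 270
cpue_data <- data.frame(
  site       = rep(c("North Bay", "Reef Edge", "River Mouth"), each = 90),
  species    = sample(c("Snapper", "Grouper", "Mackerel"), n, replace = TRUE),
  water_temp = round(runif(n, min = 22, max = 29), 1)
)
cpue_data$cpue <- round(
  pmax(0.1, 0.4 * (cpue_data$water_temp - 20) + rnorm(n, sd = 0.8)), 2
)

facet_plot <- ggplot(cpue_data, aes(x = water_temp, y = cpue, color = species)) +
  geom_point(alpha = 0.5) +                                  # transparency eases overplotting
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # trend per species
  facet_wrap(~ site) +                                       # one panel per site
  labs(x = "Water temperature (°C)", y = "CPUE (kg per boat-hour)", color = "Species") +
  theme_minimal() +
  theme(panel.spacing = unit(1, "lines"))                    # breathing room between panels

facet_plot
```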

This visualization demonstrates several advanced concepts:

  1. Multiple variable relationships (CPUE, temperature, species)
  2. Faceting to compare across sites
  3. Trend lines to show relationships
  4. Thoughtful use of transparency and spacing

Exploring Distributions with Combined Geometries

Let’s extend our knowledge of geometries to create more sophisticated distribution visualizations:
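One possible version stacks a violin, a narrow box plot, and jittered points, then facets by site. The data frame `site_catches` and its values are simulated for illustration.

```r
library(ggplot2)
set.seed(21)

# Hypothetical catch weights: 40 records per species at each of three sites
site_catches <- expand.grid(
  rep     = 1:40,
  species = c("Snapper", "Grouper", "Mackerel"),
  site    = c("North Bay", "Reef Edge", "River Mouth"),
  stringsAsFactors = FALSE
)
site_catches$catch_kg <- round(rgamma(nrow(site_catches), shape = 4, scale = 4), 1)

dist_plot <- ggplot(site_catches, aes(x = species, y = catch_kg, fill = species)) +
  geom_violin(alpha = 0.4) +                            # full distribution shape
  geom_boxplot(width = 0.15, outlier.shape = NA,
               fill = "white") +                        # summary statistics
  geom_jitter(width = 0.1, height = 0,
              alpha = 0.3, size = 0.8) +                # individual observations
  facet_wrap(~ site) +                                  # compare across sites
  labs(x = "Species", y = "Catch (kg)") +
  theme_minimal() +
  theme(legend.position = "none")

dist_plot
```

Outliers are hidden in the box plot layer (`outlier.shape = NA`) because the jittered points already show every observation.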

This plot combines multiple geometries to show different aspects of the data:

  1. geom_violin(): Shows the full distribution shape
  2. geom_boxplot(): Displays summary statistics
  3. geom_jitter(): Reveals individual observations
  4. Faceting: Allows comparison across sites
