---
title: "Analyzing discharge at Kings Creek"
vignette: >
  %\VignetteIndexEntry{Analyzing discharge at Kings Creek}
  %\VignetteEngine{quarto::html}
  %\VignetteEncoding{UTF-8}
editor: visual
---

```{r}
#| label: setup
#| include: false

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Introduction

This vignette demonstrates several preMetabolizer utilities for working with daily streamflow records. We use `kings_discharge`, a built-in dataset that contains daily USGS discharge and water temperature for Kings Creek (monitoring location `USGS-06879650`) at the Konza Prairie Biological Station in northeastern Kansas during water year 2025 (2024-10-01 through 2025-09-30).

Kings Creek is a well-studied tallgrass prairie stream with a highly episodic flow regime: many days record zero discharge, with brief but intense pulses following rainfall events.

```{r}
#| label: libraries

library(preMetabolizer)
library(dplyr)
library(ggplot2)
```

## Load and reshape the data

```{r}
#| label: load-data

data(kings_discharge)
glimpse(kings_discharge)
```

The data are in long format with one row per date × parameter combination. We pivot to wide format so each row represents a single day with discharge (`00060`, cfs) and water temperature (`00010`, °C) as separate columns.

```{r}
#| label: pivot-wide

kings <- kings_discharge |>
  tidyr::pivot_wider(
    id_cols = time,
    names_from = parameter_code,
    values_from = value
  ) |>
  rename(discharge_cfs = `00060`, temp_C = `00010`)

glimpse(kings)
```

## Convert units

USGS reports discharge in cubic feet per second (cfs). `convert_flow()` converts between cfs, cubic meters per second (cms), and liters per second (lps). For comparison with international literature we add a cms column.

```{r}
#| label: convert-flow

kings <- kings |>
  mutate(discharge_cms = convert_flow(discharge_cfs, from = "cfs", to = "cms"))

summary(kings[, c("discharge_cfs", "discharge_cms")])
```

## Add season

`get_season()` assigns an astronomical season (Spring, Summer, Autumn, Winter) to each date and returns an ordered factor, which is useful for stratified summaries and plots.

```{r}
#| label: seasons

kings <- kings |>
  mutate(season = get_season(time))

count(kings, season)
```

## Time series overview

```{r}
#| label: plot-timeseries
#| fig-alt: "Time series of daily discharge and water temperature at Kings Creek during water year 2025."

kings |>
  tidyr::pivot_longer(
    c(discharge_cms, temp_C),
    names_to = "variable",
    values_to = "value"
  ) |>
  mutate(variable = recode(
    variable,
    discharge_cms = "Discharge (m³/s)",
    temp_C = "Temperature (°C)"
  )) |>
  ggplot(aes(time, value)) +
  geom_line(linewidth = 0.4) +
  facet_wrap(~variable, ncol = 1, scales = "free_y") +
  labs(x = NULL, y = NULL) +
  theme_bw()
```

## Flag anomalous values

`flag_z()` uses a moving-window robust Z-score to identify values that deviate unusually from their local neighborhood. Here we apply it to water temperature to check for sensor spikes or data-entry errors.

```{r}
#| label: flag-z

z_result <- flag_z(kings$temp_C, width = 7, threshold = 3, return_z = TRUE)

kings <- kings |>
  mutate(
    temp_z = z_result$z,
    temp_flag = z_result$flag
  )

cat(
  "Flagged temperature values:",
  sum(!is.na(kings$temp_flag)),
  "\n"
)
```

## Flow duration curve

A flow duration curve (FDC) shows what fraction of time discharge equals or exceeds a given value. `calc_exceedance_prob()` implements the Weibull plotting-position formula. Setting `rm.zero = TRUE` excludes days with zero discharge from the ranking, so the curve reflects the non-zero portion of the flow regime.

```{r}
#| label: fdc

kings <- kings |>
  mutate(exceed_prob = calc_exceedance_prob(discharge_cms, rm.zero = TRUE))

zero_pct <- mean(kings$discharge_cms == 0, na.rm = TRUE) * 100
cat(sprintf("Days with zero discharge: %.1f%%\n", zero_pct))
```

```{r}
#| label: plot-fdc
#| fig-alt: "Flow duration curve for Kings Creek. The x-axis shows exceedance probability and the y-axis shows discharge on a log scale."

kings |>
  filter(!is.na(exceed_prob)) |>
  ggplot(aes(exceed_prob, discharge_cms, color = season)) +
  geom_point(size = 1.5, alpha = 0.8) +
  scale_x_continuous(
    labels = \(x) paste0(round(x * 100), "%"),
    limits = c(0, 1)
  ) +
  scale_y_log10() +
  scale_color_manual(values = c(
    Winter = "#4575b4",
    Spring = "#74add1",
    Summer = "#fdae61",
    Autumn = "#d73027"
  )) +
  labs(
    x     = "Exceedance probability",
    y     = "Discharge (m³/s)",
    color = "Season"
  ) +
  theme_bw()
```

## Discharge variability by season

`calc_cv()` computes the coefficient of variation (CV), a normalized measure of variability. A high CV indicates a flashy, episodic regime; a low CV indicates more stable baseflow conditions.

```{r}
#| label: cv-by-season

kings |>
  filter(discharge_cms > 0) |>
  group_by(season) |>
  summarise(
    n_days   = n(),
    mean_cms = mean(discharge_cms, na.rm = TRUE),
    cv_pct   = calc_cv(discharge_cms, na.rm = TRUE, as_percent = TRUE),
    .groups  = "drop"
  )
```

## Discharge histogram

`calc_bin_width()` suggests a histogram bin width using classical rules. The `"fd"` (Freedman-Diaconis) method is robust to outliers and works well for right-skewed flow data.

```{r}
#| label: plot-histogram
#| fig-alt: "Histogram of non-zero daily discharge at Kings Creek, colored by season."

non_zero_q <- kings |> filter(discharge_cms > 0)

bw <- calc_bin_width(non_zero_q$discharge_cms, method = "fd")

ggplot(non_zero_q, aes(discharge_cms, fill = season)) +
  geom_histogram(binwidth = bw, color = "white", linewidth = 0.2) +
  scale_fill_manual(values = c(
    Winter = "#4575b4",
    Spring = "#74add1",
    Summer = "#fdae61",
    Autumn = "#d73027"
  )) +
  labs(
    x    = "Discharge (m³/s)",
    y    = "Days",
    fill = "Season"
  ) +
  theme_bw()
```
