Correlation alternatives / How to go about testing this relationship?

Question

I have a large set of temperature data from upstream and downstream gauges. I am trying to find the influence of dam release on downstream temperatures. To do this, I am comparing the correlation between the tailwater gauge (placed right below the dam) and various downstream sites. Here is my dataset.

> dput(head(TravelTimeAdjustedSaltData))
structure(list(Date = structure(c(1709942400, 1709943300, 1709944200, 
1709945100, 1709946000, 1709946900), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), S1 = c(12.824443359375, 12.824443359375, 12.824443359375, 
12.824443359375, 12.824443359375, 12.78154296875), S2 = c(12.86734375, 
12.86734375, 12.86734375, 12.910244140625, 12.86734375, 12.824443359375
), S3 = c(12.223837890625, 12.223837890625, 12.26673828125, 12.26673828125, 
12.223837890625, 12.26673828125), S4 = c(NA, NA, NA, NA, 7.8908984375, 
7.847998046875), S5 = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_), S6 = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_), S7 = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_), S8 = c(12.309638671875, 12.309638671875, 
12.26673828125, 12.3525390625, 12.3525390625, 12.309638671875
), S9 = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), S10 = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_), S11 = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_), GaugeTemp = c(8.2, 8.2, 8.2, 8.2, 8.2, 8.2), GaugeHeight = c(70.83, 
70.84, 70.84, 70.85, 70.83, 70.83)), row.names = c(NA, 6L), class = c("tbl_df", 
"tbl", "data.frame"))

The data are adjusted for the water's travel time downstream, which is why some columns have NAs in the first few rows. I ran Spearman correlations but found an unexpected pattern, so I suspect the correlation is not testing what I actually want to find out. Sites further downstream (S10 / S11) had a correlation with the tailwater gauge that was higher than, or about the same as, that of the first site downstream (S4). I have included the tests for both S4 (the closest site downstream from the tailwater) and S11 (the furthest from the tailwater) to show what I mean. This should not be the case, as the influence of dam releases on downstream temperature should decrease with distance. This leads me to believe that a correlation test is not the answer to my question.

# cor.test() has no na.rm argument; it drops incomplete pairs on its own.
cor.test(TravelTimeAdjustedSaltData$GaugeTemp, TravelTimeAdjustedSaltData$S11, method = "spearman")

cor.test(TravelTimeAdjustedSaltData$GaugeTemp, TravelTimeAdjustedSaltData$S4, method = "spearman")
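
To compare the gauge against every site at once, rather than one cor.test() call per site, something like the following might help. It is only a sketch on made-up data: the toy data frame, its noise levels, and the number of leading NAs are assumptions for illustration, not values from the real dataset. It also reports how many complete pairs sit behind each coefficient, since the travel-time shift leaves a different number of usable rows per site.

```r
# Toy data (invented for illustration): a gauge series and three sites,
# with leading NAs like those produced by a travel-time shift.
set.seed(1)
n <- 200
gauge <- 8 + cumsum(rnorm(n, sd = 0.05))
toy <- data.frame(
  GaugeTemp = gauge,
  S4  = c(rep(NA, 4),  gauge[5:n]  + rnorm(n - 4,  sd = 0.05)),
  S8  = c(rep(NA, 10), gauge[11:n] + rnorm(n - 10, sd = 0.20)),
  S11 = c(rep(NA, 20), gauge[21:n] + rnorm(n - 20, sd = 0.50))
)

site_cols <- setdiff(names(toy), "GaugeTemp")

# Spearman rho of the gauge against each site, using only the rows
# where both values are present.
rho <- sapply(site_cols, function(s) {
  cor(toy$GaugeTemp, toy[[s]], method = "spearman", use = "complete.obs")
})

# Number of complete pairs behind each coefficient.
n_pairs <- sapply(site_cols, function(s) {
  sum(complete.cases(toy$GaugeTemp, toy[[s]]))
})

print(data.frame(site = site_cols, rho = round(rho, 3), n_pairs = n_pairs))
```

On the real data, toy would be replaced by TravelTimeAdjustedSaltData and site_cols by the S columns of interest.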

I do not know how to test the cause (dam release, i.e. the tailwater gauge temperature readings) against the effect (downstream temperature readings). I am looking into some sort of non-parametric regression (LOESS) between the two, but I am not sure that is the correct approach, and I am not very familiar with local regression analysis. Any help would be very much appreciated. I just want a statistical way to show whether or not the dam release is having an effect downstream. The correlation does not seem to serve that purpose (I am not exactly sure why, though).
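
For reference, a LOESS fit between a gauge series and one downstream site can be sketched in base R as below. The data are simulated and the variable names (gauge_temp, site_temp), the span, and the shape of the relationship are all assumptions chosen for illustration. Note that LOESS only describes the shape of the relationship; it does not by itself establish an effect, and its standard errors assume independent observations, which is unlikely to hold for 15-minute temperature readings.

```r
# Simulated data (hypothetical): a downstream temperature that follows
# the gauge temperature but levels off at both ends.
set.seed(42)
n <- 300
gauge_temp <- runif(n, min = 7, max = 13)
site_temp  <- 9 + 3 * tanh((gauge_temp - 10) / 2) + rnorm(n, sd = 0.3)
toy <- data.frame(gauge_temp, site_temp)

# Local quadratic regression; span controls how much of the data
# is used for each local fit (larger = smoother).
fit <- loess(site_temp ~ gauge_temp, data = toy, span = 0.75, degree = 2)

# Fitted curve and pointwise standard errors on a grid of gauge values
# that lies inside the observed range.
grid <- data.frame(gauge_temp = seq(7.5, 12.5, by = 0.5))
pred <- predict(fit, newdata = grid, se = TRUE)
grid$fitted <- pred$fit
grid$se     <- pred$se.fit

print(grid)
```

A flat fitted curve would suggest the downstream site does not track the gauge; a steep one suggests it does, subject to the caveats above.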

If I get it right, it simply takes some time for the water to flow to S11, so you cannot correlate the data as they are but need to take this into account. Look into cross-correlation. Further, this is repeated measurement data, and you need to consider that. — LulY, Commented Jul 3 at 14:28
I have shifted the dataset to account for the water travel time. Repeated measurement data? Not sure what you mean by that. — Matt Schaaf, Commented Jul 3 at 15:22
repeated measurement: online.stat.psu.edu/stat505/lesson/…. — LulY, Commented Jul 9 at 6:58
So basically setting each time interval as its own column and treating it as its own variable? — Matt Schaaf, Commented Jul 9 at 21:50
Trivial and marginal but FYI the correct spelling is Celsius. More awkward: serial correlation vitiates standard P-value calculations for any kind of correlation here, as successive observations are certainly not independent. — Nick Cox, Commented Jul 10 at 10:21
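
The two points raised in these comments, cross-correlation and serial correlation, can be illustrated with a small base R sketch. Everything here is simulated: the AR(1) coefficient, the noise level, and the travel time of 6 steps (true_lag) are assumptions, not estimates from the river data. The sketch shows that consecutive readings are strongly autocorrelated, and that stats::ccf() can locate the lag between two series; differencing both series first is one simple (not the only) way to reduce the serial correlation before looking at the cross-correlation.

```r
set.seed(123)
n <- 500
true_lag <- 6  # hypothetical travel time, in 15-minute steps

# One autocorrelated temperature signal; the downstream series is the
# same signal arriving true_lag steps later, plus measurement noise.
base <- as.numeric(arima.sim(model = list(ar = 0.9), n = n + true_lag))
upstream   <- base[(true_lag + 1):(n + true_lag)]
downstream <- base[1:n] + rnorm(n, sd = 0.5)

# Lag-1 autocorrelation: successive readings are far from independent,
# so standard P-values for a correlation are not trustworthy here.
r1 <- acf(upstream, plot = FALSE)$acf[2]

# Lag at which the cross-correlation peaks. ccf(x, y) at lag k estimates
# cor(x[t + k], y[t]), so a negative peak lag means x leads y.
peak_lag <- function(x, y, lag.max = 20) {
  cc <- ccf(x, y, lag.max = lag.max, plot = FALSE)
  as.numeric(cc$lag[which.max(cc$acf)])
}

lag_raw  <- peak_lag(upstream, downstream)
lag_diff <- peak_lag(diff(upstream), diff(downstream))

cat("lag-1 autocorrelation:", round(r1, 2), "\n")
cat("peak lag, raw series:", lag_raw, "\n")
cat("peak lag, differenced series:", lag_diff, "\n")
```

On real data the peak lag for each site could be compared with the travel time already used to shift the columns.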

LulY · Accepted Answer · 2024-07-10 07:04:24Z

If I get it right, you measure water temperature at different spots along a river, beginning right behind a dam and repeating the measurement at multiple spots further downstream. This means you have repeated measurements: you measure the water temperature at place 0 (GaugeTemp), then a little later at place 1 (S1), and so on. Therefore, what you could do in R is

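# Assuming your data is named df.
library(dplyr)  # attached so that the %>% pipe is available

# Bring your data into long format: one row per (id, site) combination.
df_long <- df %>%
  dplyr::mutate(id = dplyr::row_number()) %>%  # each original row is one "subject"
  dplyr::rename(S0 = GaugeTemp) %>%            # treat the tailwater gauge as site 0
  tidyr::pivot_longer(cols = -c(Date, id, GaugeHeight),
                      names_to = "time",
                      values_to = "temperature") %>%
  dplyr::mutate(id = as.factor(id),
                # order the levels S0, S1, ..., S11 rather than alphabetically
                time = factor(time, levels = paste0("S", 0:(dplyr::n_distinct(time) - 1))))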

# Calculate repeated measures ANOVA
rstatix::anova_test(data= df_long, dv= temperature, wid= id, within= time)
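
For readers without rstatix installed, a one-way repeated measures ANOVA can also be sketched in base R with aov() and an Error() term. This is a swapped-in alternative, not the call used above, and it runs on simulated, complete data (30 hypothetical subjects at four sites, with invented site means). Unlike rstatix::anova_test(), plain aov() does not report sphericity tests or corrections.

```r
# Simulated long-format data (hypothetical): 30 subjects, 4 sites each.
set.seed(7)
n_id  <- 30
sites <- paste0("S", 0:3)

toy_long <- expand.grid(
  id   = factor(seq_len(n_id)),
  time = factor(sites, levels = sites)
)

subject_effect <- rnorm(n_id, sd = 0.5)                       # per-subject offset
site_mean <- c(S0 = 8.0, S1 = 8.5, S2 = 9.0, S3 = 9.5)        # invented site means

toy_long$temperature <- unname(site_mean[as.character(toy_long$time)]) +
  subject_effect[as.integer(toy_long$id)] +
  rnorm(nrow(toy_long), sd = 0.3)

# Repeated measures ANOVA: time is the within-subject factor, and the
# Error() term tells aov() that observations are nested within id.
fit <- aov(temperature ~ time + Error(id / time), data = toy_long)
print(summary(fit))
```

The within-subject F test for time appears in the id:time stratum of the summary.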

Some points:

You wonder what effect dam release has on the temperature change from the dam down the river, but I see no variable for dam release in the data you provided. You should pass that variable to the between argument of the anova_test() function. After doing so, look at the time*factor interaction.
The approach I suggest here treats each "water part" as a subject (water part A is measured at the dam, then at S1, then at S2, ... A little bit later water part B is measured at the dam, then at S1, then at S2, and so on). This means every row of the data you provided corresponds to a subject. The problem is that the "subjects" are not independent in your data: the temperature at the dam in row 2 was measured 15 minutes after the temperature in row 1 at the very same spot, so the two should correlate. The same is true for all other measurements in S1 to S11. I don't know how this affects the results. So repeated measures ANOVA is not really ideal here, but it is the only thing I know about. Maybe someone else has a better answer.
You write "I have shifted the dataset to account for the water travel time". Is this why you have missing values in your data? If so, my approach does not work on your data as described in the code above. Again, the code above assumes each row to be a subject, i.e. each row should hold the temperature measurements of the same water at different time points. Imagine you are at the dam and you throw a stick in the river, and 15 minutes later you throw a leaf in the river: row i should have all the temperature measurements of the water when the stick passed by, and row i+1 all the measurements of the water when the leaf passed by. There should be no missing values, and each temperature column actually has its own date column (which one can ignore if the time intervals between the measurements are equal).
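
The "stick and leaf" layout can be sketched in base R as follows. The data are deterministic toy values, and the travel times in steps (0, 2 and 5) and the warming offsets (0.5 and 1.2 degrees) are invented for illustration. Each row of the aligned table follows one parcel of water, indexed by its release time at the dam, and rows that cannot be followed all the way downstream are dropped so that no missing values remain. Shifting in the other direction (leading NAs instead of trailing ones) is equally valid; it only changes which timestamp the row is indexed by.

```r
# Toy readings (hypothetical): parcel temperatures p, observed at the dam
# (S0), 2 steps later at S1 (+0.5 degrees) and 5 steps later at S2 (+1.2).
n <- 12
p <- seq(8, by = 0.1, length.out = n + 5)
raw <- data.frame(
  S0 = p[6:(n + 5)],
  S1 = p[4:(n + 3)] + 0.5,
  S2 = p[1:n] + 1.2
)
travel_steps <- c(S0 = 0, S1 = 2, S2 = 5)  # assumed travel times in rows

# Move a column up by k rows, so that row i holds the value observed
# k steps after row i's timestamp; pad the end with NA.
shift_up <- function(x, k) {
  if (k == 0) return(x)
  c(x[-seq_len(k)], rep(NA, k))
}

aligned <- as.data.frame(
  Map(shift_up, raw[names(travel_steps)], travel_steps)
)

# Keep only parcels observed at every site.
aligned <- aligned[complete.cases(aligned), ]

print(aligned)
```

In this toy example every remaining row shows the same parcel warming by 0.5 degrees at S1 and 1.2 degrees at S2, which is the per-row structure the repeated measures code assumes.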

I appreciate your response. I did think about shifting my data to a long format, which seems to be what you are suggesting. I do not think ANOVA is necessarily what I want here because that is a comparison of means, when really what I want is a time-series relationship test. I will try out your method though and let you know what I find. To your last point, I do have missing data because I have shifted the values by the approximate time it takes for the water to travel downstream to the points. This was for correlation purposes, so I could have accurate paired measures. — Matt Schaaf, Commented Jul 10 at 18:40
To your first point, dam release would be indicated by the gauge temperature. The gauge is placed right below the dam, so it serves as the variable that everything should be compared to. I appreciate your help and will try your method, but I am not sure it fully helps me. Either way, it gives me a direction for the data analysis, so thank you. — Matt Schaaf, Commented Jul 10 at 18:43
