Hi again Mcyoungdad,
I understand that you want to explore dual-directional causality. The first thing that comes to mind is Granger causality tests, which assess whether past values of one variable provide information about future values of another. This can help you evaluate the relationship in both directions.
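Here is a minimal sketch of running the test in both directions with statsmodels, assuming your two series are columns of a pandas DataFrame; the file path, the column names "x" and "y", and the lag order are placeholders for your own data:

```python
# Minimal sketch of bidirectional Granger tests (statsmodels).
# Note: the test assumes stationary series; difference them first if needed.
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

df = pd.read_csv("your_data.csv")  # placeholder path

max_lag = 4  # assumed lag order; pick one suited to your data's frequency

# Does X help predict Y? grangercausalitytests puts the *predicted*
# variable in the first column and the candidate *predictor* in the second.
grangercausalitytests(df[["y", "x"]], maxlag=max_lag)

# And the reverse direction: does Y help predict X?
grangercausalitytests(df[["x", "y"]], maxlag=max_lag)
```

If X "Granger-causes" Y but not the reverse (or vice versa), that asymmetry in predictive power is itself informative for your question, though, as discussed below, it is still only a predictive relationship.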
In general, to quantify the extent of dual-directional causality, you might compare the goodness of fit of your polynomial trend model with that of a simpler linear model using metrics such as R-squared or information criteria (AIC, BIC). A significant improvement in fit for the polynomial model could indicate the presence and strength of dual-directional causality. I cannot clearly see the equations and R-squared values in your last slide, but it looks like that's the case: a higher R-squared, or lower information criteria, for the polynomial model suggests a better fit, i.e. that the polynomial model explains more variance in the data.
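A minimal sketch of that comparison, assuming simple numeric arrays; the toy data and the quadratic term are illustrative choices only, not your actual model:

```python
# Compare a linear fit and a polynomial fit via R-squared, AIC and BIC.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)  # toy data for illustration only
x = np.linspace(-3, 3, 100)
y = 0.5 * x + 0.3 * x**2 + rng.normal(scale=0.5, size=x.size)

X_lin = sm.add_constant(x)                             # intercept + x
X_poly = sm.add_constant(np.column_stack([x, x**2]))   # intercept + x + x^2

lin = sm.OLS(y, X_lin).fit()
poly = sm.OLS(y, X_poly).fit()

for name, fit in [("linear", lin), ("polynomial", poly)]:
    print(f"{name}: R2={fit.rsquared:.3f}  AIC={fit.aic:.1f}  BIC={fit.bic:.1f}")
```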
As I mentioned in your other question, having done this for 15+ years, I find that careful consideration of domain knowledge is key, and you seem to have applied it. Interpreting and quantifying causality is even more complex: correlation does not imply causation.
While Granger causality tests and model comparison metrics provide valuable insights, it's essential to be cautious when approaching causation. Correlation and statistical tests can suggest relationships, but establishing true causation usually requires additional evidence and context.
Granger causality tests are statistical methods used in time-series analysis to assess whether one time series can predict future values of another. They are based on the idea that if variable X Granger-causes variable Y, past values of X should contain information that helps predict Y. However, it's important to note that Granger causality doesn't imply true causation in the broader sense; it only detects predictive relationships within the data. While there isn't a direct metric to quantify the "pull" of the polynomial trend line away from the original hypothesis line toward the null hypothesis, you can use various statistical measures and visualization techniques to gain insights, as I mentioned above.
So, you can fit your original hypothesis model (asymmetric line) and the null hypothesis model (linear line) to your data. You can also fit the polynomial trend model. Then evaluate and compare them in terms of goodness-of-fit statistics: generally, lower AIC and BIC values or higher R-squared values indicate a better fit. Plotting and visually inspecting the models can give you a sense of how well each one captures the observed patterns (see the sketch below). You have already done this, and the model fits seem to be OK (not so great)! Now, examining other things becomes more important.
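For the visual inspection, a quick overlay works well. This sketch reuses the `x`, `y`, `lin`, and `poly` objects from the snippet above; `y_hyp` is a hypothetical placeholder for your asymmetric hypothesis model's predictions, which I obviously don't have:

```python
# Overlay the candidate fits on the scatter to see where the polynomial
# trend "sits" relative to the null (linear) line.
import matplotlib.pyplot as plt

order = np.argsort(x)
plt.scatter(x, y, s=10, alpha=0.5, label="data")
plt.plot(x[order], lin.fittedvalues[order], label="null (linear)")
plt.plot(x[order], poly.fittedvalues[order], label="polynomial trend")
# plt.plot(x[order], y_hyp[order], label="asymmetric hypothesis")  # your model
plt.legend()
plt.xlabel("x")
plt.ylabel("y")
plt.show()
```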
You can also examine the residuals for each model. A good model should have residuals that are randomly distributed and exhibit no systematic patterns. You can also conduct hypothesis tests to assess the significance of the coefficients in each model; in the polynomial model, for example, test whether the coefficients of the higher-order terms are significantly different from zero. Finally, you can use cross-validation techniques to assess how well each model generalizes to new data (see the sketch below).
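A sketch of those diagnostics, again reusing the fits from the earlier snippet; the coefficient p-values come straight from the OLS results, and the cross-validation uses scikit-learn with a polynomial pipeline:

```python
# Residual check, coefficient significance, and cross-validated fit.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# 1) Residuals: look for structure (a good model leaves none).
resid = poly.resid
print("residual mean:", resid.mean())  # ~0 by construction for OLS

# 2) Significance of the higher-order term (index 2 = the x^2 coefficient).
print("x^2 coefficient p-value:", poly.pvalues[2])

# 3) Cross-validated R^2 for each model (higher is better).
#    If your data are a time series, prefer TimeSeriesSplit over plain k-fold.
for degree in (1, 2):
    model = make_pipeline(
        PolynomialFeatures(degree, include_bias=False),
        LinearRegression(),
    )
    scores = cross_val_score(model, x.reshape(-1, 1), y, cv=5, scoring="r2")
    print(f"degree {degree}: mean CV R^2 = {scores.mean():.3f}")
```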
And the last and most important thing in all this is "domain knowledge", of which you already possess a good deal! Your deep domain knowledge is the deciding factor here, since all I can see are OK model fits. If one model aligns more closely with your understanding of the system, that could strengthen your argument for a particular causal direction.
No statistical model can prove causation definitively (as Mathe also mentioned)! However, a combination of statistical evidence, theoretical reasoning, and domain expertise can build a compelling case for a causal relationship. It's also crucial to be aware of potential confounding variables that might influence the results.
Again, please note that causation is a complex concept and often involves a combination of statistical evidence, theoretical reasoning, and practical understanding of the subject matter. Always interpret results with caution and seek validation through multiple approaches.
In Statistics, there is either independence between variables or dependence. If you are interested in measuring the form and strength of dependency, you could do a regression analysis. You would still need to argue why one variable affects the other (maybe provide a channel of influence) and check for different assumptions so that your final results are not spurious.
Maybe I am using the wrong term. I guess what I am getting at is causality. I will try to edit to make it clearer.
The bounty is too low for the level of the question.
Ahhhh, that sounds hopeful! I figured the answer would be something simple, basically ending in... "correlation does not equal causality." I will add more money momentarily.
"proof of causality". I don't think you can establish that with your data. Unless you had done a carefully controlled experiment, or unless you made a strong argument using quasi-experiments, or unless you were controlling for many potential mediators, which doesn't seem to be the case, I don't think you can argue about causality or direction of causality. I'm sorry to say this, but I think you may have inappropriate expectations with your question.
I have edited my post again in reaction to your comment. I should not have written "proof of causality." This is because I believe I already have very high confidence in the causality, based on the fact that the trend line is asymmetric (changes direction in negative X territory). The nature of the two variables dictates (based on simple common sense) that the trend would be linear if the current Y were the independent variable (instead of the current X).
Could you elaborate on how you came to this data?
The data is related to my job, about which I would prefer not to disclose much. I can assure you it is high quality data. I can also assure you that I am an expert in the nature of the two variables (20+ years of hands-on experience with how the 2 variables are related in real life). If you would like to talk on the phone, I might be willing.
Instead, what I am after is... more information on how much the two variables have causality in both directions. It may be the case that this is unanswerable with what I have provided. If that is the case, I am open to that answer. However, it seems logical that, since the polynomial trend is "somewhere in between" the two competing hypotheses (for which variable is independent), there is insight to be gained from comparing the polynomial trend line to the other two lines.
I am working on one paper that will claim: Y variable is dependent and X is independent (which as mentioned, I feel confident about). However, a second paper could dive into the question we are discussing... does the Y variable also have some limited influence on the X? Obviously, I would need help from someone better at statistics, but I thought I would gauge the "promise" of the second paper here before I pursue it much.
With numbers alone, this is impossible to answer. Check this website to see examples of how, looking at the numbers alone, it would be impossible to establish causation: https://www.tylervigen.com/spurious-correlations
Also, check these examples to see how a single statistic (the correlation coefficient) can be misleading: https://en.wikipedia.org/wiki/Correlation#/media/File:Correlation_examples2.svg
I also have other variables that I use that confirm that the Y is dependent and the X is independent... but I didn't want to get into that in fear that I would muddy the waters.
If you believe there are other variables at play that could result in changes of Y, and that were not being held constant as X took on different values, your analysis and conclusions would be useless.
When I say other variables... I meant other data sets that are proxies for what I claim the Z score of the relationship between the two variables really "means." If my hypothesis is true (X is independent, Y dependent), then the Z score "means something specific." If my hypothesis is not true (Y is independent, and X is dependent), then the Z score "means something very different." I have confirmed correlation between these other "proxies" and the Z score to confirm the hypothesis. Again, afraid to muddy the waters...
Mathe - Thank you for the link - I will read it. However, please note that I fully understand that correlation coefficients can be misleading. That is why my causation conclusion is based on (1) a deep understanding of the nature of the two variables, (2) the shape of the regression line, and (3) other proxy datasets. I am hopeful that the link will explain why comparing trend lines does not provide more insight into the level of dual-direction causality. Thank you so much!