Combinations of factors not observed, non-full rank design matrix. How to explain to investigator?
I am helping someone with a differential expression analysis. There are only 10 samples and two variables with two levels each, say Sex (M/F) and Age (Old/Young). They originally wanted to fit the model: ~ sex + age + sex*age
However, Sex = F & Age = Young does not exist in the data (no sample with that combination was observed), so the model matrix is not full rank and the DESeq model can't be specified.
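To make the rank problem concrete, here is a minimal sketch in plain Python (standard library only). It assumes treatment coding with F and Old as reference levels, and the split of the 10 samples across the three observed cells is made up for illustration. With no F/Young samples, the Young indicator and the interaction column are identical, so the four-column interaction design has rank 3.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank via exact Gauss-Jordan elimination (no floating-point tolerance needed)."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

# Hypothetical allocation of the 10 samples; the F/Young cell is empty.
samples = [("M", "Old")] * 4 + [("M", "Young")] * 3 + [("F", "Old")] * 3

def interaction_row(sex, age):
    sex_m = int(sex == "M")
    age_young = int(age == "Young")
    # Columns: intercept, sexM, ageYoung, sexM:ageYoung
    return [1, sex_m, age_young, sex_m * age_young]

def additive_row(sex, age):
    # Columns: intercept, sexM, ageYoung
    return interaction_row(sex, age)[:3]

interaction_design = [interaction_row(s, a) for s, a in samples]
additive_design = [additive_row(s, a) for s, a in samples]

interaction_rank = matrix_rank(interaction_design)
additive_rank = matrix_rank(additive_design)

print("interaction model:", len(interaction_design[0]), "columns, rank", interaction_rank)
print("additive model:   ", len(additive_design[0]), "columns, rank", additive_rank)
```

Under these assumptions the additive design (intercept, sex, age) is still full rank; only the interaction term cannot be estimated, because it would require the missing cell.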
I warned them of this, and their solution was to concatenate the Sex and Age variables into a new variable (let's just say V3) and run the model with only ~ V3
I know this technically works (as in, the design matrix is full rank), but I also know it isn't a great idea, basically because we are extrapolating and are now unable to make any claims about M vs. F or Old vs. Young.
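One way to show the investigators what the combined variable can and cannot support is to list every pairwise comparison among its observed levels. The sketch below is plain Python; the level names such as `M_Old` are hypothetical labels for V3, not anything DESeq produces. It classifies each contrast by which of the original variables actually differs between the two groups.

```python
from itertools import combinations

# The three observed Sex/Age cells; F/Young has no samples.
observed_cells = [("M", "Old"), ("M", "Young"), ("F", "Old")]

def label(cell):
    # Hypothetical naming scheme for the levels of the combined variable V3.
    return f"{cell[0]}_{cell[1]}"

def classify(a, b):
    same_sex = a[0] == b[0]
    same_age = a[1] == b[1]
    if same_sex and not same_age:
        return f"age effect within Sex={a[0]}"
    if same_age and not same_sex:
        return f"sex effect within Age={a[1]}"
    return "confounded: sex and age both differ"

contrasts = {
    (label(a), label(b)): classify(a, b)
    for a, b in combinations(observed_cells, 2)
}

for (first, second), meaning in contrasts.items():
    print(f"{first} vs {second}: {meaning}")
```

Under this setup, each available contrast either holds one variable fixed (sex among Old samples only, age among males only) or changes both at once. None of them is an overall M vs. F or Old vs. Young comparison, and generalizing a conditional contrast to the missing F/Young group would be extrapolation.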
Any tips on how to explain this to the investigators?
Answer
Kav10
The answer is accepted.