Combinations of factors not observed, non-full rank design matrix. How to explain to investigator?



I am helping someone with a differential expression analysis. There are only 10 samples and two variables with two levels each, say Sex (M/F) and Age (Old/Young). They originally wanted to fit the model: ~ sex + age + sex*age

However, Sex = F & Age = Young does not exist in the data (no sample with that combination was observed), so the model matrix is not full rank and the DESeq model can't be specified.
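To make the rank problem concrete, here is a minimal sketch in Python with NumPy. The sample layout (4 old males, 3 young males, 3 old females) is a made-up assumption that only mirrors the situation described: 10 samples and no F & Young. It builds the treatment-coded design matrix by hand rather than through any DESeq function.

```python
import numpy as np

# Hypothetical layout of the 10 samples; the (F, Young) cell is empty.
sex = np.array(["M", "M", "M", "M", "M", "M", "M", "F", "F", "F"])
age = np.array(["Old", "Old", "Old", "Old", "Young", "Young", "Young",
                "Old", "Old", "Old"])

# Treatment coding with M and Old as reference levels.
intercept = np.ones(len(sex))
is_f = (sex == "F").astype(float)
is_young = (age == "Young").astype(float)
interaction = is_f * is_young  # 1 only for F & Young samples, so all zeros here

# Columns: intercept, sexF, ageYoung, sexF:ageYoung
X = np.column_stack([intercept, is_f, is_young, interaction])
rank = np.linalg.matrix_rank(X)

print("columns:", X.shape[1], "rank:", rank)  # columns: 4 rank: 3
```

The interaction column is identically zero, so the data carry no information about that coefficient, and the matrix has four columns but rank three.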

I warned them of this, and their solution was to concatenate the Sex and Age variables into a new variable (let's just say V3) and run the model with only ~ V3

I know this technically works (as in, the design matrix is full rank), but I also know it isn't a great idea, basically because we are extrapolating and are now unable to make any claims about M vs. F or Old vs. Young.
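One way to show the investigators which comparisons the three-group model can and cannot support is to check estimability directly. The sketch below assumes the same made-up cell counts (4, 3, 3, 0) and a cell-means coding. The helper `is_estimable` is a hypothetical name, and it uses the standard criterion that a contrast is estimable only if adding it as a row does not raise the rank of the design matrix.

```python
import numpy as np

# Cell-means coding: one column per Sex x Age cell.
cells = ["M_Old", "M_Young", "F_Old", "F_Young"]
counts = {"M_Old": 4, "M_Young": 3, "F_Old": 3, "F_Young": 0}  # hypothetical counts

rows = []
for j, cell in enumerate(cells):
    for _ in range(counts[cell]):
        row = np.zeros(len(cells))
        row[j] = 1.0
        rows.append(row)
X = np.array(rows)  # 10 x 4, with an all-zero column for F_Young


def is_estimable(contrast):
    """A contrast is estimable if it lies in the row space of X."""
    stacked = np.vstack([X, contrast])
    return np.linalg.matrix_rank(stacked) == np.linalg.matrix_rank(X)


sex_within_old = np.array([1.0, 0.0, -1.0, 0.0])    # M vs. F, old samples only
age_within_male = np.array([1.0, -1.0, 0.0, 0.0])   # Old vs. Young, males only
marginal_sex = np.array([0.5, 0.5, -0.5, -0.5])     # M vs. F averaged over age
interaction = np.array([1.0, -1.0, -1.0, 1.0])      # sex-by-age interaction

print(is_estimable(sex_within_old))   # True
print(is_estimable(age_within_male))  # True
print(is_estimable(marginal_sex))     # False
print(is_estimable(interaction))      # False
```

Under these assumptions, only the conditional comparisons (sex among old samples, age among males) are supported by the data. An overall M vs. F or Old vs. Young effect, and the interaction, all depend on the unobserved cell.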

Any tips on how to explain this to the investigators?
  • Savionf
    +3

    The offered bounty is low for the level of the question.

Answer

Kav10
1.4K
The answer is accepted.