How do I compare categorical data with multiple uneven populations?

Question

Is there a difference in lineage occurrence/proportion between study 1 compared to the other 4 studies?

There are 5 populations. We need to compare population 1 to each of the other 4 populations (4 previous studies).
We are comparing the occurrence of lineages (4 levels: A, B, C, D) in population 1 with their occurrence in each of the other 4 populations.

The lineages are categorical (A, B, C, D), and every sample falls into exactly one of them. The sample sizes differ between the studies, so I thought it best to work with proportions.

I thought that I could just use a proportion test or chi-square test but I'm a bit confused since I have the 4 lineages and the 5 studies. Do I just look at one lineage at a time and since it's a proportion, it factors in the other lineages?

For example: LineageB <- prop.test(x = c(198,140,31,35,205), n = c(213,140,31,39,250)); LineageB
p-value = 3.502e-08 (This isn't specific enough; it only tells me there is likely a difference somewhere among the studies.)

Since I'm only concerned with population 1 compared to each of the others, should I only include my study and one other at a time?

Lineages	Pop 1	Pop 2	Pop 3	Pop 4	Pop 5
A	4	0	0	0	0
B	198	140	31	35	205
C	6	0	0	4	18
D	5	0	0	0	27
Total	213	140	31	39	250

Count of Lineage by Study.
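
As a minimal sketch of what "working with proportions" looks like for this table, the counts can be entered as an R matrix and converted to within-study proportions. The object names `counts` and `props` are illustrative assumptions, not part of the original question; the numbers are copied from the table above.

```r
# Counts copied from the table above: rows are lineages, columns are studies.
counts <- matrix(
  c(  4,   0,  0,  0,   0,
    198, 140, 31, 35, 205,
      6,   0,  0,  4,  18,
      5,   0,  0,  0,  27),
  nrow = 4, byrow = TRUE,
  dimnames = list(Lineage = c("A", "B", "C", "D"),
                  Study   = paste("Pop", 1:5))
)

# Column totals should reproduce the "Total" row of the table.
colSums(counts)

# Within-study proportions: margin = 2 divides each column by its own total,
# so uneven sample sizes no longer distort the comparison of lineage shares.
props <- prop.table(counts, margin = 2)
round(props, 3)
```

Each column of `props` sums to 1, which is why a change in one lineage's share is necessarily tied to the shares of the others.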

Statistics Real Analysis

Sizzledee

Answer

The answer is accepted.

1. I would test population 1 against each of the other populations separately. Because that means four tests, you need to correct for multiple comparisons, for example with the Bonferroni correction: at an overall significance level of 0.05, each of the four tests is judged at 0.05/4 = 0.0125.

2. As a rule of thumb, a classic chi-square test requires an expected count of at least 5 in each cell. Several cells in your table fall well short of that, so one typically combines categories (A-B, C-D) to increase the count per cell.

3. If you follow 2., only two categories are left to compare, so the analysis reduces to a simple two-sample proportion test. This is good because it makes the differences between populations easier to interpret.
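
The three steps above could be sketched in R roughly as follows. This is only one possible reading of the answer, not the answerer's own code: it assumes the categories are collapsed into A+B versus C+D, uses `prop.test` for each pairwise comparison with population 1, and applies the Bonferroni correction through `p.adjust`. The names `counts`, `ab`, `p_raw` and `p_bonf` are illustrative.

```r
# Counts from the question: rows are lineages, columns are studies.
counts <- matrix(
  c(  4,   0,  0,  0,   0,
    198, 140, 31, 35, 205,
      6,   0,  0,  4,  18,
      5,   0,  0,  0,  27),
  nrow = 4, byrow = TRUE,
  dimnames = list(Lineage = c("A", "B", "C", "D"),
                  Study   = paste("Pop", 1:5))
)

# Step 2: collapse to two categories, A+B versus C+D.
ab <- counts["A", ] + counts["B", ]  # samples in lineage A or B, per study
n  <- colSums(counts)                # total samples per study

# Steps 1 and 3: two-sample proportion test of Pop 1 against each other study.
# R may warn that the chi-squared approximation is doubtful for sparse pairs.
others <- colnames(counts)[-1]
p_raw <- sapply(others, function(pop) {
  prop.test(x = c(ab["Pop 1"], ab[pop]),
            n = c(n["Pop 1"],  n[pop]))$p.value
})

# Step 1: Bonferroni correction multiplies each p-value by the number of
# tests (here 4) and caps the result at 1.
p_bonf <- p.adjust(p_raw, method = "bonferroni")

data.frame(comparison = paste("Pop 1 vs", others), p_raw, p_bonf)
```

Comparing `p_bonf` with 0.05 is equivalent to comparing `p_raw` with 0.05/4, so either column can be used as long as the threshold matches.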

Mathe
