# Statistical analysis on a two-dimensional data set

Hi all,

I work in the computational biology field, and I was recently given this particular data:
The data itself is generated by identifying and scoring residue interactions of a protein over time. Briefly, if two residues make contacts within a frame (unit time), then this pair is given a score based on the type and the strength of the contact.
An example data set would look like this:

    Frame  Pair  Score
    0      A_B   0.31
    1      B_D   2.16
    2      A_C   5.30
Scores are all positive floats. A pair that does not have a score within the frame will automatically have a score of 0.
The problem given by my PI is to come up with a metric to score every residue (not residue pair) based on this dataset. In other words, I'd like to know a way to score each residue based on the frequency and strength of the contacts that residue makes.

This problem is very difficult for me because each residue has multiple scores (from multiple contact pairs) within a frame, and across multiple frames. In my very primitive understanding, for every residue there are two dimensions that require statistical analysis: its contacting partners and time. My training before this has been purely biological, so I have little knowledge of statistics. I tried using Z scores, but they do not seem to work well with two dimensions.
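To make the two dimensions concrete, here is a minimal sketch (using pandas, which is just my choice of tool; the column name `Residue` is my own) of how the pair scores expand into a residue-by-frame matrix:

```python
import pandas as pd

# Toy version of the dataset: one row per (frame, residue pair) contact.
df = pd.DataFrame({
    "Frame": [0, 1, 2],
    "Pair": ["A_B", "B_D", "A_C"],
    "Score": [0.31, 2.16, 5.30],
})

# Split each pair into its two residues, so every contact contributes
# its score to both partners.
long = df.assign(Residue=df["Pair"].str.split("_")).explode("Residue")

# Residue-by-frame matrix; missing (residue, frame) combinations are
# the implicit zero scores mentioned above.
matrix = long.pivot_table(index="Residue", columns="Frame",
                          values="Score", aggfunc="sum", fill_value=0)
print(matrix)
```

Each row of `matrix` is one residue's score trajectory over time, which is exactly the object I don't know how to reduce to a single number per residue.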
I have attached a portion of the dataset. It contains 25 frames for demonstration purposes. I would like a detailed explanation of which method to use, why that method is appropriate, and how to implement it.

This is my first time posting; I hope the bounty amount is appropriate for such a question.

Thank you very much!
• I am a statistician and I would gladly work on this problem because it seems interesting. It would involve some back-and-forth discussion between the two of us: for example, how many residues, residue pairs, and frames will there be in the real dataset? This is relevant because it tells us how much information is available for these variables, or sources of information. Because of the back-and-forth nature of the work and the expertise it requires, I would also suggest increasing the bounty.

• Should we assume the real dataset looks very much like the sample dataset you provided?