Taking the matrix product of an asymmetrically resolved edge weight matrix and it's transpose to impute a symmetric high-resolution edge weight matrix

Suppose there are 2 regions and each region contains 3 cities (6 cities total, cities are nested within regions, regional identity is exclusive, i.e. cities are in exactly 1 region). I am interested in the number of daily flights between all pairs of cities, but I don't care about the direction of the flights, i.e. I want a city by city weighted undirected graph, or a 6 x 6 symmetric edge weight matrix.

I have a 6 x 2 (city x region) matrix N of flight counts. My intuition is that if I take the matrix product of N and its transpose to yield M (N*N' = M) the result will be a scaled estimate of the city x city matrix. If I normalize M with the scalar ?N/?M (? = sum of all elements), then ?N = ?M, i.e. the total number of flights does not change (which must be true, logically).

For my empirical dataset, the result looks plausible. Is it correct in theory? I also intuit that the accuracy of this "upsampling" procedure is dependent on the ratio of cities to regions. As the number of defined regions increases, the accuracy improves until there is exactly one city per region in which case you are already starting with the ground truth. Also if the regions are not a regular tessellation then the imputed matrix will be biased by the regional definitions in a way not readily apparent to an observer of M with no knowledge of the regions or N. Are these intuitions also correct? I feel like I've stumbled upon something so simple it must have been rigorously examined, but I lack the mathematical vocabulary to look it up properly.

edit: A citation for a reputable paper that address my questions and describes the procedure would be just as valuable as an examination from first principles. 

1. My empirical dataset is a ~10k x ~400 matrix of neural connectivity but I think the air travel network metaphor is apt.
2. I have a hunch that this procedure is related to deconvolution (in a signal processing context).

  • Suppose you only have 3 cities contained in 1 region. Input: a matrix size 3x1, counting the numbers of flights from each city to another city (within the region). Example: (2,1,1), 2 departures from city A, 1 arriving at city B and 1 arriving at city C. Desired output: a symmetric 3 x 3 matrix (with zeros in the diagonal), telling us the number of flights for each pair of cities.

  • Comments: 1. Notice that the number of flights is the sum of the inputs divided by 2. In the example there were only two flights but the sum of the inputs is 4. Every arrival and departure is going to be counted independently in the input. 2. The symmetric matrix has three unknowns (elements above the diagonal). But we only have one equation, the total number of flights. We can't solve the unknown unless we introduce (two) more equations or assumptions.

  • 3. You propose to assume that the number of flights between two cities is proportional to the product of their total flights (needing an appropriate scaling). There is no way to say if an assumption is 'correct'. Consider in the previous example (2,1,1) that the rule (1*1)/2 = 0.5 would tell us there was non-zero flights between city B and C, even though there wasn't any. 0.5 can be approximate to 1 or 0, but the impact of this error will depend on the context, so we cant say if its correct.

  • As the other person says, when you are doing this you are making an explicit assumption about the form of the connectivity data. You can only do two things: (1) Show that this is consistent (ie show for any given "partial" data there is a set of complete data having the form you specify). (2) Show that it is plausible, for example take a bunch of neural networks from the internet of wherever and check whether they asymptotically satisfy the assumption you make.

  • The only way I see that this question can be answered is then one of the following: (1) Show that any partial connectivity data between regions can be induced by a more complete picture satisfying your assumptions. (2) Give some explicit numerical measure that tells you how far away a given (complete) connectivity matrix is from satisfying the assumption you make.