How to find all structures with the most equidistribution for a specific $f$ and $A$?
Suppose $f:A\to\mathbb{R}$ and $A$ is countably infinite subset of the real numbers. We can enumerate $A$ as $\left\{a_n\right\}_{n=1}^{\infty}$ and set $t$ as a natural number but the average $$\lim\limits_{t\to\infty}\frac{f(a_1)+f(a_2)+\cdot\cdot\cdot+f(a_t)}{t}$$ can vary depending on the enumeration. In order, to reduce the number of values the average varies with, I did the following:
Split $A$ into a sequence of finite subsets $F_1,F_2,...$ such that $F_1\subset F_2\subset...$ and $\bigcup\limits_{n=1}^{\infty}F_n=A$. The sequence of subsets will be denoted as a "structure of $A$" and a natural example (such as for $A=\left\{\frac{1}{m}:m\in\mathbb{N}\right\}$) would be $$\left\{F_n\right\}=\left\{\frac{1}{m}:m\in\mathbb{N},m\le n\right\}$$ Following this, I want a structure that has an average and the most equidistribution as defined below.
Section 1: Definition Of Average From Structure
There exists an average or $\text{avg}\left(\left\{F_n\right\},f\right)$, if for every arbitrarily small positive $\epsilon$ there exists a sufficiently large integer $N$ such for all $n\ge N$. $$\left|\frac{1}{|F_n|}\sum\limits_{x\in F_{n}}f(x)-\text{avg}\left(\left\{F_n\right\},f\right)\right|\le \epsilon$$
Depending on the structure we choose, we may get a different average. However, for an intuitive average, we want a structure with most equidistribution as shown below:
Section 2: Steps To Determining The Structure With The Most Equidistriubtion
1. Arrange the values in $F_n$ from least to greatest and take the difference between consecutive elements. Call this $\Delta F_n$. (Note $\Delta F_n$ could be a multi-set.)
Ex 1.1: If $A=\left\{\frac{1}{m}:m\in\mathbb{N}\right\}$ and $\left\{F_n\right\}=\left\{\frac{1}{m}:m\in\mathbb{N},m\le n\right\}$ then $F_4=\left\{1,1/2,1/3,1/4\right\}$, and $\Delta F_4=\left\{1-1/2,1/2-1/3,1/3-1/4\right\}=\left\{1/2,1/6,1/12\right\}$. (While this is not a multi-set, the example below is)
Ex 1.2: If $A=\mathbb{Q}\cap[0,1]$ and $\left\{F_n\right\}=\left\{\frac{j}{k}:j,k\in\mathbb{N},0\le j\le k, k\le n\right\}$ then the elements of $F_4$, arranged from least to greatest is, $F_4=\left\{0,1/4,1/3,1/2,2/3,3/4,1\right\}$ and $\Delta F_4=\left\{1/4-0,1/3-1/4,1/2-1/3,2/3-1/2,3/4-2/3,1-3/4\right\}=$ $\left\{1/4,1/12,1/6,1/6,1/12,1/4\right\}$. Here the elements repeat making this a multiset.
2. Divide $\Delta F_n$ by the sum of all its elements so we get a distribution where all the elements sum to 1. Call this $\Delta F_n/\sum\limits_{x\in\Delta F_n}x$ or "the information probability of the structure"
Ex 2.1: From example 1.1 note $\sum\limits_{x\in\Delta F_3}x=1/2+1/6+1/12=3/4$ and $\Delta F_3/\sum\limits_{x\in\Delta F_3}x=4/3\cdot \Delta F_3=\left\{2/3,2/9,1/9\right\}$. Note the elements in this set sum to $1$ and can hence act as a probability distribution (even though the elements are not acutal probabilties)
3. We now want to calculate the deviation of the information probability from being a discrete uniform distribution to find the structure with the most equidistribution. (The smaller the deviation the greater the structure's equidistribution.). Since elements of the multiset $\Delta F_n/\sum\limits_{x\in\Delta F_n}x$ always sum to $1$ we can calculate the deviation using Entropy (from Information Theory). This can be written as
$$E(F_n)=-\sum\limits_{j\in\Delta F_n/\sum\limits_{x\in \Delta F_n}x}j\log j$$
Ex 3.1: From example 2.1, $E(F_3)$ is the same as $-\sum\limits_{j\in\left\{2/3,2/9,1/9\right\}}j\log j=-\left(2/3\log \left(2/3\right)+2/9\log \left(2/9\right)+1/9\log\left( 1/9\right)\right)\approx .369$
The structure whose information probability gives the highest entropy has the greatest equidistribution. However, with this comes problems.
1. In order for an average to exist, some structures add more than one element as $n$ increases by one. This means when comparing different structures with a defined average, it is hard to trust which structure has the most equidistribution unless the structures have the same number of elements.
In order to fix this suppose $\mathbb{S}(A)$ represents all structures of $A$ with a defined average from function $f$ (see Section 1).
If $s,n,j\in\mathbb{N}$, I wish to find all $\left\{G_s\right\}\in \mathbb{S}(A)$ where (for each one):
$\forall\left(\left\{F_n\right\}\in\mathbb{S}(A)\right)\exists\left(s\in\mathbb{N}\right):\forall\left(j\ge s\right)\exists\left(n\in\mathbb{N}\right)\left(E(F_n)\le E(G_j)\Rightarrow|G_j|\le|F_n|\right)$.
Note structures $G_j$ and $F_n$ are equivelant when for $\left(\forall n\right)\left(\forall j\right) \left(\bigg[ \left|G_n\right|\le\left|F_j\right| \Rightarrow G_n\subseteq F_j \bigg] \lor \bigg[\left|F_j\right|\le\left|G_n\right|\Rightarrow F_j\subseteq G_n \bigg]\right)$We will call the statement above this one Statement A (not to be confused with set $A$).
Here is an example below:
Ex: I believe for $A=\left\{\frac{1}{m}:m\in\mathbb{N}\right\}$ and $$f(x)=\begin{cases} 2 & x\in\left\{\frac{1}{2m+1}:m\in\mathbb{N}\right\} \\ 1 & x\in\left\{\frac{1}{2m}:m\in\mathbb{N}\right\} \end{cases}$$ the only $G_s$ that satisfies Statement A is $\left\{1/\left[2^s/m\right]:1\le m\le 2^s\right\}$ (where $\left[\cdot\right]$ is the nearest integer function) but I'm not sure how to prove this? (If I'm not correct, I need to adjust Statement $A$.)
Questions
1) For $A=\left\{\frac{1}{m}:m\in\mathbb{N}\right\}$ and the $f$ shown above, is the assumption that the only $G_s$ that satisfies Statement A is $\left\{1/\left[2^s/m\right]:1\le m\le 2^s\right\}$, correct? If not, how do we correct Statement A so there is only one $G_s$?
2) For question 1), is there a simpler way to calculate the structure with the most equidistribution that gives the same result as my assumption for $G_s$ without using Entropy?
3) How do we determine $G_s$ for other types of $A$ and $f$?
- closed
- 615 views
- $100.00
Related Questions
- please use statistics to explain spooky phenomenon
- Ridge Regression for Panel Data Analysis
- A lower bound for an exponential series
- How to calculate how much profit a loan of a certain risk level (based on dice rolls) is likely to make?
- Statistics involving a numerical variable and a nonnumerical variable (date)
- joint continuous probability function finding covariance
- Probability question
- Combinatorics questions- can someone please help?
When you say: "In order for an average to exist, some structures add more than one element as t increases by one", I believe you mean "as n increases by one". where n is the index for the sets in the structure.
@Jip333 You’re right. I made the edits.
Correct me if I am misunderstanding, but it appears that G_s is not a structure on A. e.g. G_1 = {1, 2}, and 2 is not an element of A.
@Pxsort I thought G1={1,1/2} since G1={1/[2/1],1/[2/2]} where [] is the nearest integer function.
Oh I see.
You seem to be implicitly assuming a lot of properties about A. Really, A cannot be an arbitrary countable set, since its elements are totally ordered and can be added / subtracted / multiplied / divided. Effectively, this requires that A be a countable subset of the real numbers.
@Pxsort I made edits
Why do you keep asking the same question? What is it that motivates you?
@AlessandroIcari I thought if my question is clearer I would get a different response. Now my question is more mathematical. In most cases such an average would not exist but I do believe for the rationals and nowhere dense countably infinite sets one can find a unique G_s that has the most equidistribution.
I don't think the problem with your question is that it lacks mathematical precision, I think the problem with it is that there is no satisfactory solution. And I think I already showed you enough counterexamples and cases in which your problem can't be solved. Sometimes maths is like this, we don't always get what we hope for.
@AlexandroIsacri Can you apply your counterexample to the mathematics in my post. Show that a unique G_s cannot exist.
@AlexandroIscari I take that back, it’s clear for cases such as n*(-1)^n for integer n that an average cannot exist. But for other functions/their domain (such as the example in my post) if one or more averages exist (depending on the domains structure) does their exist a unique G_s?
I think the answer is "no" unless you add a lot of hypotheses. But we've already had a similar discussion, here the problem is essentially the same, the differences are not relevant enough to change the answer.