A methodology to determine the maximum value of weighted Gini–Simpson index

July 21, 2016

Sequential forward procedure

Next, a sequential forward procedure is outlined to reach the maximum point (maximizer) of the index. First, one sorts the weights in decreasing order: \(w_{(1)} \ge w_{(2)} \ge \cdots \ge w_{(m)}\). From Eq. (7), combined with the evaluation of limits and partial derivatives, the three highest weighted components are guaranteed to be in the optimal solution with strictly positive proportions, so the fourth highest weighted component is the first candidate to have a null value.

One then computes \(\alpha_{4}^{*} = 2/\left( \sum\nolimits_{i = 1}^{4} 1/w_{(i)} \right)\). If \(w_{(4)} \le \alpha_{4}^{*}\), stop: \(p_{(4)}^{*} = \cdots = p_{(m)}^{*} = 0\) and the number of vertices is set to m′ = 3. Otherwise \(w_{(4)} > \alpha_{4}^{*}\), and one proceeds to compute \(\alpha_{5}^{*} = 3/\left( \sum\nolimits_{i = 1}^{5} 1/w_{(i)} \right)\). If \(w_{(5)} \le \alpha_{5}^{*}\), stop and set \(p_{(5)}^{*} = \cdots = p_{(m)}^{*} = 0\) with m′ = 4. Otherwise \(w_{(5)} > \alpha_{5}^{*}\), and one continues in the same way, with \(\alpha_{k}^{*} = (k - 2)/\left( \sum\nolimits_{i = 1}^{k} 1/w_{(i)} \right)\), until reaching the first k for which \(w_{(k)} \le \alpha_{k}^{*}\); then stop, setting \(p_{(k)}^{*} = \cdots = p_{(m)}^{*} = 0\), hence m′ = k − 1. If no such k is reached, all m components are retained. In any case, the maximizer is located in an m′-face of the original (m − 1)-simplex.
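The stopping rule above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the source: the function name `sequential_forward` is hypothetical, the weights are assumed to be strictly positive, and at least three weights are assumed so that the guaranteed first three components exist.

```python
def sequential_forward(weights):
    """Return (m_prime, alpha) for the sequential forward procedure.

    m_prime is the number of components kept with positive proportion;
    alpha is the multiplier computed from the retained weights only.
    Hypothetical helper: assumes strictly positive weights, len >= 3.
    """
    w = sorted(weights, reverse=True)      # w_(1) >= w_(2) >= ... >= w_(m)
    if len(w) < 3 or w[-1] <= 0:
        raise ValueError("need at least three strictly positive weights")

    inv_sum = sum(1.0 / x for x in w[:3])  # first three are always retained
    m_prime = 3
    for k in range(4, len(w) + 1):
        candidate_sum = inv_sum + 1.0 / w[k - 1]
        alpha_k = (k - 2) / candidate_sum  # alpha_k^* from the text
        if w[k - 1] <= alpha_k:            # stopping condition: p_(k) = 0
            break
        inv_sum = candidate_sum            # component k is retained
        m_prime = k

    alpha = (m_prime - 2) / inv_sum
    return m_prime, alpha


print(sequential_forward([5, 4, 3, 2, 1])[0])     # 4
print(sequential_forward([50, 40, 30, 2, 1])[0])  # 3
```

Because the weights are sorted, once one candidate fails the test every later, smaller weight is discarded as well, so a single pass suffices.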

Now, formulas (4) and (5) may be used, replacing m by m′, to calculate the optimal proportions and the maximum value of the index \(D_{w}\) with the corresponding set of weights; all the remaining optimal proportions are null and the respective weights are discarded from the evaluation.
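Formulas (4) and (5) are not reproduced in this section. As a hedged reconstruction, assuming the index has the usual form \(D_{w} = \sum_{i} w_{i} p_{i} (1 - p_{i})\) and that only the m′ highest weights are retained, the first-order Lagrange conditions lead to the expressions below. They reproduce the multipliers and numerical results quoted in the worked examples of this section, but whether they coincide term by term with the source's numbered formulas is an assumption.

\[
\alpha^{*} = \frac{m' - 2}{\sum_{i = 1}^{m'} 1/w_{(i)}}, \qquad
p_{(i)}^{*} = \frac{1}{2}\left( 1 - \frac{\alpha^{*}}{w_{(i)}} \right) \quad (i = 1, \ldots, m'), \qquad
D_{w}^{*} = \frac{1}{4}\left( \sum_{i = 1}^{m'} w_{(i)} - \frac{(m' - 2)^{2}}{\sum_{i = 1}^{m'} 1/w_{(i)}} \right)
\]

The middle expression makes the stopping rule visible: a proportion is strictly positive exactly when its weight exceeds the multiplier.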

As an example, take a relatively small dimension, m = 5, which allows the lower bound of a pseudo-optimal coordinate to be −1, as was shown in the calculation of the limits in the previous section. If the weights are \(w_{(1)} = 5\), \(w_{(2)} = 4\), \(w_{(3)} = 3\), \(w_{(4)} = 2\) and \(w_{(5)} = 1\), then the Lagrange multiplier computed with (3) and all weights (m = 5) is \(\alpha^{*} = 1.3139\), implying that \(w_{(5)} < \alpha^{*}\). Using the sequential procedure, one calculates \(\alpha_{4}^{*} = 1.5585\) and, as \(w_{(4)} > \alpha_{4}^{*}\), the fourth component is retained; the next step gives \(\alpha_{5}^{*} = 1.3139\), the same value as the multiplier computed with all weights, so \(w_{(5)} \le \alpha_{5}^{*}\). Hence \(p_{(5)}^{*} = 0\), and formulas (4) and (5) may be applied with m′ = 4, discarding \(w_{(5)} = 1\) from the calculations. The resulting optimal proportions are \(p_{(1)}^{*} = 0.3441\), \(p_{(2)}^{*} = 0.3052\), \(p_{(3)}^{*} = 0.2403\), \(p_{(4)}^{*} = 0.1104\) and \(p_{(5)}^{*} = 0\). The maximum of the index in this case evaluates to \(D_{w}^{*} = 2.7208\).
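The figures of this example can be checked numerically. The sketch below is not taken from the source: it assumes the index has the form \(D_{w} = \sum_{i} w_{i} p_{i} (1 - p_{i})\) and that each retained proportion equals \((1 - \alpha^{*}/w_{(i)})/2\), and the helper names `optimal_proportions` and `weighted_gini_simpson` are hypothetical.

```python
def optimal_proportions(retained):
    """Proportions for the retained weights (assumed closed form)."""
    m_prime = len(retained)
    alpha = (m_prime - 2) / sum(1.0 / w for w in retained)
    return [(1.0 - alpha / w) / 2.0 for w in retained]


def weighted_gini_simpson(weights, proportions):
    """Evaluate the index directly as sum of w * p * (1 - p) (assumed form)."""
    return sum(w * p * (1.0 - p) for w, p in zip(weights, proportions))


retained = [5, 4, 3, 2]                 # w_(5) = 1 has been discarded
p = optimal_proportions(retained)
d_max = weighted_gini_simpson(retained, p)

print([round(x, 4) for x in p])
print(round(sum(p), 4))                 # proportions should sum to one
print(round(d_max, 4))
```

Evaluating the index directly from its definition, rather than from a closed-form maximum, gives an independent check; the printed values should agree with those quoted above to about three decimal places.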

Changing the weights to \(w_{(1)} = 50\), \(w_{(2)} = 40\), \(w_{(3)} = 30\), \(w_{(4)} = 2\) and \(w_{(5)} = 1\), and using the sequential forward procedure, one computes \(\alpha_{4}^{*} = 3.4582\) and verifies that \(w_{(4)} < \alpha_{4}^{*}\). Hence one sets \(p_{(4)}^{*} = p_{(5)}^{*} = 0\) and m′ = 3, discarding \(w_{(4)}\) and \(w_{(5)}\), and evaluates the non-null coordinates with Eq. (7), obtaining \(p_{(1)}^{*} = 0.3724\), \(p_{(2)}^{*} = 0.3404\) and \(p_{(3)}^{*} = 0.2872\). In this case, the maximum value is \(D_{w}^{*} = 26.808\). If formula (5) were used blindly with all the original weights (m = 5), one would obtain the wrong pseudo-maximum value of 29.324, which overestimates the true value by about 10 %. When the dimension of the simplex increases and the weights are disparate, this type of error could get worse, in a kind of curse of dimensionality.
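The gap between the blind and the correct evaluation can be reproduced with a short sketch. Formula (5) is not shown in this section, so the closed form used here, \(\left( \sum w - (m - 2)^{2}/\sum 1/w \right)/4\), is an assumption derived from the index form \(\sum_{i} w_{i} p_{i} (1 - p_{i})\); the function names are hypothetical.

```python
def multiplier(weights):
    """Multiplier (m - 2) / sum(1/w) over the weights supplied."""
    return (len(weights) - 2) / sum(1.0 / w for w in weights)


def closed_form_value(weights):
    """Assumed closed form for the stationary value of the index."""
    m = len(weights)
    return (sum(weights) - (m - 2) ** 2 / sum(1.0 / w for w in weights)) / 4.0


weights = [50, 40, 30, 2, 1]

# Blind use: all five weights, ignoring the non-negativity constraints.
alpha_all = multiplier(weights)
pseudo_p5 = (1.0 - alpha_all / weights[-1]) / 2.0  # infeasible coordinate
blind = closed_form_value(weights)

# Correct use: only the three weights kept by the sequential procedure.
true_max = closed_form_value(weights[:3])
rel_error = (blind - true_max) / true_max

print(round(pseudo_p5, 4))   # negative, so the blind solution is infeasible
print(round(blind, 3), round(true_max, 3), round(rel_error, 3))
```

The negative pseudo-coordinate is what signals the problem: the blind stationary point lies outside the simplex, so its value is not attainable by any valid vector of proportions.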
