## How Do I Consolidate and Spread-Out Values Evenly in a Matrix?

Maple

Problem:

Suppose you have a bunch of 2D data points which:

1. May include points with the same x-value but different y-values; and
2. May be unevenly-spaced with respect to the x-values.

How do you clean up the data so that, for instance, you are free to construct a connected data plot, or perform a Discrete Fourier Transform? Please note that Curve Fitting and the Lomb–Scargle Method, respectively, are more effective techniques for these particular applications. Let's start with a simple example for illustration. Consider this Matrix:

`A := < 2, 5; 5, 8; 2, 1; 7, 8; 10, 10; 5, 7 >;`

Consolidate:

First, sort the rows of the Matrix by the first column, and extract the sorted columns separately:

```P := sort( A[..,1], output=permutation ); # permutation to sort rows by the values in the first column
U := A[P,1]; # sorted column 1
V := A[P,2]; # sorted column 2```

We can regard the sorted bunches of distinct values in U as a step in a stair case, and the goal is replace each step with the average of the y-values in V located on each step.

Second, determine the indices for the first occurrences of values in U, by selecting the indices which give a jump in x-value:

```m := numelems( U );
K := [ 1, op( select( i -> ( U[i] > U[i-1] ), [ seq( j, j=2..m ) ] ) ), m+1 ];
n := numelems( K );```

The element m+1 is appended for later convenience. Here, we can quickly define the first column of the consolidated Matrix:

`X1 := U[K[1..-2]];`

Finally, to define the second column of the consolidated Matrix, we take the average of the values in each step, using the indices in K to tell us the ranges of values to consider:

`Y1 := Vector[column]( n-1, i -> add( V[ K[i]..K[i+1]-1 ] ) / ( K[i+1] - K[i] ) );`

Thus, the consolidated Matrix is given by:

`B := < X1 | Y1 >;`

Spread Evenly:

To spread-out the x-values, we can use a sequence with fixed step size:

`X2 := evalf( Vector[column]( [ seq( X1[1]..X1[-1], (X1[-1]-X1[1])/(m-1) ) ] ) );`

For the y-values, we will interpolate:

`Y2 := CurveFitting:-ArrayInterpolation( X1, Y1, X2, method=linear );`

This gives us a new Matrix, which has both evenly-spaced x-values and consolidated y-values:

`C := < X2 | Y2 >;`

Plot:

```plots:-display( Array( [
plots:-pointplot( A, view=[0..10,0..10], color=green, symbol=solidcircle, symbolsize=15, title="Original Data", font=[Verdana,15] ),
plots:-pointplot( B, view=[0..10,0..10], color=red, symbol=solidcircle, symbolsize=15, title="Consolidated Data", font=[Verdana,15] ),
plots:-pointplot( C, view=[0..10,0..10], color=blue, symbol=solidcircle, symbolsize=15, title="Spread-Out Data", font=[Verdana,15] )
] ) );```

Sample Data with Noise:

For another example, let’s take data points from a logistic curve, and add some noise:

```# Noise generators
f := 0.5 * rand( -1..1 ):
g := ( 100 - rand( -15..15 ) ) / 100:

# Actual x-values
X := [ seq( i/2, i=1..20 ) ];

# Actual y-values
Y := evalf( map( x -> 4 / ( 1 + 3 * exp(-x) ), X ) );

# Matrix of points with noise
A := Matrix( zip( (x,y) -> [x,y], map( x -> x + f(), X ), map( y -> g() * y, Y ) ) );```

Using the method outlined above, and the general procedures defined below, define:

```B := ConsolidatedMatrix( A );
C := EquallySpaced( B, 21, method=linear );```

Visually:

```plots:-display( Array( [
plots:-pointplot( A, view=[0..10,0..5], symbol=solidcircle, symbolsize=15, color=green, title="Original Data", font=[Verdana,15] ),
plots:-pointplot( B, view=[0..10,0..5], symbol=solidcircle, symbolsize=15, color=red, title="Consolidated Data", font=[Verdana,15]  ),
plots:-pointplot( C, view=[0..10,0..5], symbol=solidcircle, symbolsize=15, color=blue, title="Spread-Out Data", font=[Verdana,15] )
] ) );```

Generalization:

Below are more generalized custom procedures, which are used in the above example. These also account for special cases.

```# Takes a matrix with two columns, and returns a new matrix where the new x-values are unique and sorted,
# and each new y-value is the average of the old y-values corresponding to the x-value.
ConsolidatedMatrix := proc( A :: 'Matrix'(..,2), \$ )

local i, j, K, m, n, P, r, U, V, X, Y:

# The number of rows in the original matrix.
r := LinearAlgebra:-RowDimension( A ):

# Return the original matrix should it only have one row.
if r = 1 then
return A:
end if:

# Permutation to sort first column of A.
P := sort( A[..,1], ':-output'=permutation ):

# Sorted first column of A.
U := A[P,1]:

# Corresponding new second column of A.
V := A[P,2]:

# Return the sorted matrix should all the x-values be distinct.
if numelems( convert( U, ':-set' ) ) = r then
return < U | V >:
end if:

# Indices of first occurrences for values in U. The element m+1 is appended for convenience.
m := numelems( U ):
K := [ 1, op( select( i -> ( U[i] > U[i-1] ), [ seq( j, j=2..m ) ] ) ), m+1 ]:
n := numelems( K ):

# Consolidated first column.
X := U[K[1..-2]]:

# Determine the consolidated second column, using the average y-value.
Y := Vector[':-column']( n-1, i -> add( V[ K[i]..K[i+1]-1 ] ) / ( K[i+1] - K[i] ) ):

return < X | Y >:

end proc:

# Procedure which takes a matrix with two columns, and returns a new matrix of specified number of rows
# with equally-spaced x-values, and interpolated y-values.
# It accepts options that can be passed to ArrayInterpolation().
EquallySpaced := proc( M :: 'Matrix'(..,2), m :: posint )

local A, i, r, U, V, X, Y:

# Consolidated matrix, the corresponding number of rows, and the columns.
A := ConsolidatedMatrix( M ):
r := LinearAlgebra:-RowDimension( A ):
U, V := evalf( A[..,1] ), evalf( A[..,2] ):

# If the consolidated matrix has only one row, return it.
if r = 1 then
return A:
end if:

# If m = 1, i.e. only one equally-spaced point is requested, then return a matrix of the averages.
if m = 1 then
return 1/r * Matrix( [ [ add( U ), add( V ) ] ] ):
end if:

# Equally-spaced x-values.
X := Vector[':-column']( [ seq( U[1]..U[-1], (U[-1]-U[1])/(m-1), i=1..m ) ] ):

# Interpolated y-values.
Y := CurveFitting:-ArrayInterpolation( U, V, X, _rest ):

return < X | Y >:

end proc:```

Worth Checking Out:

﻿