Carl Love

Wayland, Massachusetts, United States
My name was formerly Carl Devore.

MaplePrimes Activity


These are replies submitted by Carl Love

@Carl Love Here's an updated worksheet. I made the procedure faster, and it uses less memory. I also added several utility procedures, mostly related to plotting the clusters.

Fuzzy C-Means Clustering

restart:

FuzzyCMeans:= proc(
     X::Matrix(datatype= float[8]),   #The rows of X are the data points.
     #number of clusters and number of means:
     {clusters::And(posint, satisfies(x-> x > 1)):= 2},
     {fuzziness::satisfies(x-> x > 1):= 2},   #fuzziness factor
     {epsilon::positive:= 1e-4},   #convergence criterion
     {max_iters::posint:= 999},   #max number of iterations
     {Norm::procedure:= proc(V) local x; sqrt(add(x^2, x= V)) end proc}
)
option `Written by Carl J Love, 2016-Apr-13`;
description
    "Fuzzy C-Means clustering algorithm.
     See http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/cmeans.html
     and https://en.wikipedia.org/wiki/Fuzzy_clustering
    "
;
uses LA= LinearAlgebra;
local
     c:= clusters,     #number of clusters and number of means
     m:= fuzziness,    #fuzziness factor
     n:= op([1,1], X), #number of points
     d:= op([1,2], X), #dimension
     N:= Norm,         #d-dimensional vector norm
     #U[i,j] will be the degree (on a 0..1 scale) to which point i belongs to cluster j.
     U:= Matrix((n,c), datatype= float[8]),
     UM:= Matrix((n,c), datatype= float[8]), # U^~m
     #U1 is just an update copy of U.
     U1:= LA:-RandomMatrix((n,c), generator= 0..1, datatype= float[8]),
     C:= Matrix((c,d), datatype= float[8]), #The rows of C will be the cluster centers.
     e:= 2/(m-1), #fuzzy exponent used to update U
     err:= epsilon,  #max matrix difference on current iteration
     i, j, k, jj #loop indices
;
     for jj to max_iters while err >= epsilon do
          ArrayTools:-Copy(U1, U);
          ArrayTools:-Copy(U, UM);
          map[evalhf, inplace](`^`, UM, m);
          for j to c do
               C[j,..]:= add(UM[i,j]*X[i,..], i= 1..n) / add(UM[i,j], i= 1..n)
          end do;
          for i to n do  for j to c do
               U1[i,j]:= 1/add((N(X[i,..]-C[j,..])/N(X[i,..]-C[k,..]))^e, k= 1..c)
          end do  end do;
          LA:-MatrixAdd(U,U1,1,-1,inplace); #U:= U-U1
          err:= max(map[evalhf, inplace](abs, U))
     end do;
     if err >= epsilon then  #jj > max_iters would misfire if convergence occurred on the final iteration
          error "Didn't converge; error = %1. Increase max_iters or epsilon.", err
     end if;
     userinfo(1, FuzzyCMeans, "Used", jj-1, "iterations.");      
     (C,U1)                
end proc:
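For readers without Maple, here is a minimal Python/NumPy sketch of the same update equations (the fuzzified-membership center update and the 2/(m-1)-exponent membership update). Names and the zero-distance guard are my additions, not part of the Maple code above:

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, eps=1e-4, max_iters=999, rng=None):
    """Fuzzy C-Means: returns (centers C, membership matrix U).

    X is (n, d); U is (n, c) with each row summing to 1."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)              # normalize memberships
    for _ in range(max_iters):
        UM = U ** m                                # fuzzified memberships U^m
        C = (UM.T @ X) / UM.sum(axis=0)[:, None]   # weighted cluster centers
        # distance from each point to each center, shape (n, c)
        D = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
        D = np.maximum(D, 1e-12)                   # guard against a point on a center
        # U[i,j] = 1 / sum_k (D[i,j]/D[i,k])^(2/(m-1))
        U_new = 1.0 / ((D[:, :, None] / D[:, None, :]) ** (2/(m-1))).sum(axis=2)
        if np.abs(U_new - U).max() < eps:          # same convergence criterion
            U = U_new
            break
        U = U_new
    return C, U
```

As in the Maple version, convergence is measured by the largest elementwise change in the membership matrix between iterations.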

Here are some utility procedures for working with the clusters. The first produces a list which gives for each point the cluster to which it most strongly belongs.

AssignToClusters:= (U::Matrix)->
     map(proc(i) local p; member(max(U[i,..]), U[i,..], p); p end, [$1..op([1,1], U)])
:
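In NumPy terms, this hard assignment is just a row-wise argmax over the membership matrix (like Maple's member, argmax returns the first position of the maximum on ties); a sketch, with the function name being my own:

```python
import numpy as np

def assign_to_clusters(U):
    """Hard assignment: index of each point's largest membership (0-based)."""
    return U.argmax(axis=1)
```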
     

The next three procedures are for plotting 2D clusters.

PlotClusters2D:= proc(X::Matrix, U::Matrix)
local A:= AssignToClusters(U), k, J:= [$1..op([1,1], U)], c:= op([1,2], U);
     plot(
          [seq(X[select(i-> A[i]=k, J), ..], k= 1..c)],
          color= [seq(COLOR(HUE, k/c), k= 1..c)],
          style= point, symbol= diagonalcross, symbolsize= 16, _rest
     )
end proc:

PlotClusterMeans2D:= proc(C::Matrix)
local c:= op([1,1], C), k;
     plot(
          `[]`~(convert(C, listlist)),
          color= [seq(COLOR(HUE, k/c), k= 1..c)],
          style= point, symbol= circle, symbolsize= 32, _rest
     )
end proc:

PlotClustersAndMeans2D:= (X::Matrix, C::Matrix, U::Matrix)->
     plots:-display(
          [PlotClusters2D(X,U), PlotClusterMeans2D(C)],
          gridlines= false, _rest
     )
:
 

#Test data:
X:=
     [[5.37,19.50],[5.73,19.44],[4.66,20.03],[5.66,20.38],[4.22,21.97],
      [5.08,20.67],[5.08,19.08],[4.54,20.06],[4.35,19.82],[5.19,20.72],
      [4.48,19.95],[5.76,19.81],[4.15,18.68],[6.37,20.60],[5.58,21.13],
      [5.76,19.91],[5.85,19.02],[5.00,19.71],[5.42,20.31],[4.77,21.45],
      [8.61,19.48],[10.70,20.31],[10.99,20.28],[11.68,21.28],[9.12,20.77],
      [10.30,20.07],[10.40,21.62],[10.95,20.34],[9.79,20.29],[9.69,20.86],
      [10.02,21.45],[11.05,20.19],[9.20,17.96],[10.49,19.88],[9.61,19.49],
      [10.33,19.59],[9.29,20.94],[10.17,19.64],[10.97,20.32],[10.08,19.16]
     ]
:
XM:= Matrix(X, datatype= float[8]):

infolevel[FuzzyCMeans]:= 1:

(C,U):= FuzzyCMeans(
     XM,
     #All options are set to their default values here:
     clusters= 2, fuzziness= 2, epsilon= 1e-4, max_iters= 999,
     Norm= proc(V) local x; sqrt(add(x^2, x= V)) end proc
);

FuzzyCMeans: Used 8 iterations.

 

C, U := Matrix(2, 2, {(1, 1) = 5.17622947731172, (1, 2) = 20.0938362193081, (2, 1) = 10.2161565054282, (2, 2) = 20.2331200701138}), Vector(4, {(1) = ` 40 x 2 `*Matrix, (2) = `Data Type: `*float[8], (3) = `Storage: `*rectangular, (4) = `Order: `*Fortran_order})


PlotClustersAndMeans2D(XM, C, U);

 

The two clusters are obvious in this case; consequently, the number of iterations was low. Let's ask for three clusters:

(C,U):= CodeTools:-Usage(FuzzyCMeans(XM, clusters= 3));

FuzzyCMeans: Used 60 iterations.

memory used=60.99MiB, alloc change=0 bytes, cpu time=530.00ms, real time=534.00ms

 

C, U := Matrix(3, 2, {(1, 1) = 9.59964568674007, (1, 2) = 19.8238795074764, (2, 1) = 10.6838151702604, (2, 2) = 20.4543667502028, (3, 1) = 5.14884499714405, (3, 2) = 20.0909363818171}), Vector(4, {(1) = ` 40 x 3 `*Matrix, (2) = `Data Type: `*float[8], (3) = `Storage: `*rectangular, (4) = `Order: `*Fortran_order})


PlotClustersAndMeans2D(XM, C, U);

 

I had expected that the right cluster would be divided in two, but we see that it instead converged on a division of the left cluster. The algorithm only claims to converge to a local minimum. The initial guess is randomized, so let's try again:

 

(C,U):= CodeTools:-Usage(FuzzyCMeans(XM, clusters= 3));

FuzzyCMeans: Used 47 iterations.
memory used=47.79MiB, alloc change=0 bytes, cpu time=483.00ms, real time=481.00ms

 

C, U := Matrix(3, 2, {(1, 1) = 5.27087202309135, (1, 2) = 19.5727941736791, (2, 1) = 5.20533446031621, (2, 2) = 20.7234689370591, (3, 1) = 10.2698266410416, (3, 2) = 20.2497388751727}), Vector(4, {(1) = ` 40 x 3 `*Matrix, (2) = `Data Type: `*float[8], (3) = `Storage: `*rectangular, (4) = `Order: `*Fortran_order})


PlotClustersAndMeans2D(XM, C, U);

 

 

 

Download Fuzzy_clusters.mw

@AndrewG Having read the link that you gave, k-means clustering seems to me like just a crude first attempt at C-means clustering. So, perhaps you can tell me when and why anyone would use k-means clustering given that they have access to C-means?
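One way to see the relationship: k-means behaves like the hard-assignment limit of fuzzy c-means, where each membership collapses to 0 or 1 (the m -> 1 limit). A minimal Lloyd's-algorithm sketch, assuming Euclidean distance (the function and its empty-cluster guard are my illustration, not code from this thread):

```python
import numpy as np

def k_means(X, k, max_iters=100, rng=None):
    """Lloyd's algorithm: hard assignments instead of fuzzy memberships."""
    rng = np.random.default_rng(rng)
    C = X[rng.choice(len(X), k, replace=False)]    # k data points as initial centers
    for _ in range(max_iters):
        D = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
        labels = D.argmin(axis=1)                  # hard 0/1 "membership"
        C_new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
            for j in range(k)                      # keep old center if cluster empties
        ])
        if np.allclose(C_new, C):
            break
        C = C_new
    return C, labels
```

The per-iteration work is cheaper than fuzzy c-means (no fractional memberships to maintain), which may be one practical answer to the question above.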

You wrote the above as if we were familiar with your assignment, as if we were your classmates who've already been working on it. But it's pretty much unintelligible without the bigger picture. How about posting the assignment itself?

Please confirm via an attached worksheet that your Maple gives you a general solution. My Maple 16 doesn't.

@vv I've not been able to do it. Have you?

@moses Thank you, Moses. The acknowledgement is much appreciated.

@AndrewG I'll have to research that. I had never heard of c-means or k-means clustering until you mentioned them above. My entire knowledge of them consists of the two websites I cite in the code and the MATLAB help page that you cited. Please provide a link to the post to which you refer. I'll be happy to code in Maple any useful algorithm that I can find sufficient references for.

@Markiyan Hirnyk With your correction, I give it a vote up. It should be noted, however, that in the case where X and Y are independent with variance 1, the above code generates the same thing as did the OP's original suggestion.

@tomleslie The plain map recreates it once; in other words, it creates an output Array and leaves the input Array intact. If you use map[inplace], then the input is replaced with the output; in other words, the input itself becomes the output, and there's no need to assign the result with :=. The inplace option is only available for rtables.

Regarding which is better practice, I'd say, in general, the list. Lists are more convenient for a variety of operations. Also, reading through the Maple library code, I don't recall many instances of a list being converted to an Array when a list operation would suffice.
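The same copy-versus-in-place distinction exists in other array libraries; a NumPy analogy (an illustration only, not Maple semantics):

```python
import numpy as np

a = np.array([-1.0, 2.0, -3.0])
b = np.abs(a)        # plain map: returns a new array, input left intact
np.abs(a, out=a)     # in-place: the input itself becomes the output
```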

@Markiyan Hirnyk 

The Wikipedia article that you cited says that the final step to generate a sample point is mu + A.z. Your code is doing, essentially, z + mu + A.z.
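The standard recipe: draw z ~ N(0, I), factor the covariance as Sigma = A.Transpose(A) (e.g. by Cholesky), and return mu + A.z, with no extra z term. A Python sketch (function name mine):

```python
import numpy as np

def sample_mvn(mu, Sigma, n, rng=None):
    """Draw n samples from N(mu, Sigma) via the Cholesky factor."""
    rng = np.random.default_rng(rng)
    A = np.linalg.cholesky(Sigma)          # Sigma = A @ A.T
    z = rng.standard_normal((n, len(mu)))  # z ~ N(0, I), one row per sample
    return mu + z @ A.T                    # each row is mu + A.z
```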

@tomleslie It is only repeatedly recreating the list that is inefficient. Kitonum's first solution, using map, only recreates it once.

@Oak Staff It would take me writing a book to explain things at the level that you need. I suggest that you start with a beginner's book on computer programming in Python. Pay particular attention to any material dealing with procedures, functions, or recursion. IMO, Python is the freely available language that is most like Maple. Then read the Maple help pages such as ?proc, ?Statistics,Tally, ?table, ?nops, ?add, ?remember, ?elementwise, ?thisproc, ?list, ?local, ?if, and ?indices. Then move on to the Maple Programming Guide. The whole book is available through Maple's onboard help system.

@Kitonum Your second method, the modification of the OP's code, involves recreating the whole list for each distinct negative entry. That'll be quite inefficient for a long list.
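The cost difference is easy to demonstrate in any language: rebuilding the whole list once per negative entry is O(n^2), while a single map-style pass is O(n). A Python sketch (not the original Maple; assuming the task is replacing negative entries with 0):

```python
def zero_negatives_quadratic(L):
    """Rebuild the entire list once per distinct negative entry: O(n^2)."""
    for x in [v for v in L if v < 0]:
        L = [0 if v == x else v for v in L]   # full rebuild each time
    return L

def zero_negatives_linear(L):
    """One map-style pass over the list: O(n)."""
    return [0 if v < 0 else v for v in L]
```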

@adel-00 I only translated the code. I don't have any overall understanding of it.

@adel-00 Yes, I can translate this into Maple. It'll be a few hours though.
