From: <tjk1@ri...>  2009-05-29 19:57:39

Gurus,

I am implementing some simple Principal Component Analysis (PCA) in Python, but I have run into trouble with the graphical output. I have calculated my scores and my loadings (just matrices of mean-centered, univariate values), and I want to scatter-plot them. To make the graph more useful, I also want to label each dot in the scatter plot and color it. I am using Matplotlib, Pylab, and SciPy.

For example, given a 3x3 matrix of scores called T, I want to do this:

    from pylab import scatter, legend, grid, show

    # PCA_svd is my own routine; T holds the scores, P the loadings
    T, P, E = PCA_svd(X, standardize=True)
    t1, t2 = T[:, 0], T[:, 1]
    properties = dict(alpha=0.75, c=some_colors)  # some_colors: one color per point
    s1 = scatter(t1, t2, s=50, **properties)
    legend()
    grid(True)
    show()

The result should show three dots of different colors, with a legend describing each color and a data label (say a two-character code, like AA, BB, CC) next to each data point.

I understand that the objects returned by pylab.scatter do not work with the pylab.legend command, and I was wondering whether a patch has been written for this yet. I use Python 2.5.3.

I have found one workaround for the legend that plots each group in its own color and then fakes the legend entries with Rectangle objects:

    from pylab import scatter, legend, grid, show
    from matplotlib.patches import Rectangle

    # t1, t2 as above; p1, p2 are the first two loading columns, P[:, 0] and P[:, 1]
    props = dict(alpha=0.75, faceted=False)
    Scores = scatter(t1, t2, c='red', s=50, **props)
    Loadings = scatter(p1, p2, c='blue', s=50, **props)
    # Dummy patches that only serve as legend handles
    redp = Rectangle((0, 0), 1, 1, facecolor='red')
    bluep = Rectangle((0, 0), 1, 1, facecolor='blue')
    legend((redp, bluep), ('Scores', 'Loadings'))
    grid(True)
    show()

This works for varying colors across two groups of points, but it doesn't work for single data points (it raises "ValueError: First argument must be a sequence"), and it also does not let me label each data point with a two-character code.

Any shoves in the right direction would be very much appreciated, and links to online examples and source code especially so.

Timothy Kinney
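A minimal sketch of one way to get both the per-group legend and the per-point text labels asked about above, assuming a reasonably recent Matplotlib in which scatter() accepts a label= keyword and legend() picks up the resulting collections (this may not hold for the Matplotlib that shipped alongside Python 2.5.3). The function name labeled_scatter and its groups/labels/colors parameters are hypothetical, not part of Matplotlib. It makes one scatter() call per group, so each group gets exactly one legend entry, and uses annotate() to place a short text code next to every point. Because each group is selected with a boolean mask over NumPy arrays, a group containing a single point is still passed to scatter() as a length-1 array rather than a bare scalar.

```python
import numpy as np
import matplotlib.pyplot as plt


def labeled_scatter(x, y, groups, labels, colors=None, ax=None):
    """Hypothetical helper: scatter (x, y) with one colored series per group
    and a short text label next to every point.

    x, y   : 1-D sequences of coordinates (e.g. two PCA score columns)
    groups : group name for each point; each distinct name gets one legend entry
    labels : text drawn next to each point (e.g. 'AA', 'BB', 'CC')
    colors : optional dict mapping group name -> Matplotlib color
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    if ax is None:
        ax = plt.gca()
    # One scatter() call per group, in first-seen order; label= is what
    # legend() later uses as the text for that group's entry.
    for name in dict.fromkeys(groups.tolist()):
        mask = groups == name
        kwargs = {"label": str(name), "s": 50, "alpha": 0.75}
        if colors is not None and name in colors:
            kwargs["color"] = colors[name]
        ax.scatter(x[mask], y[mask], **kwargs)
    # Put a text label beside every individual point, offset by 5 points
    for xi, yi, text in zip(x, y, labels):
        ax.annotate(text, (xi, yi), xytext=(5, 5), textcoords="offset points")
    ax.legend()
    ax.grid(True)
    return ax
```

With the names from the post, a call such as labeled_scatter(t1, t2, ['Scores', 'Scores', 'Loadings'], ['AA', 'BB', 'CC'], colors={'Scores': 'red', 'Loadings': 'blue'}) followed by plt.show() should draw three labeled dots and a two-entry legend; the group assignment here is made up for illustration.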