From: Chris B. <Chr...@no...> - 2002-06-27 22:06:09

Norman Davis wrote:
> How is a non-contiguous array created?

By slicing an array. Since slicing creates a "view" into the same data,
the view may not represent a contiguous portion of memory. Example:

  >>> from Numeric import *
  >>> a = ones((3,4))
  >>> a
  array([[1, 1, 1, 1],
         [1, 1, 1, 1],
         [1, 1, 1, 1]])
  >>> a.iscontiguous()
  1                    # a newly created array will always be contiguous
  >>> b = a[1:3,:]
  >>> b.iscontiguous()
  1                    # sliced this way, you get a contiguous array
  >>> c = a[:,1:3]
  >>> c.iscontiguous()
  0                    # but sliced another way you don't

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
NOAA/OR&R/HAZMAT         (206) 526-6959 voice
7600 Sand Point Way NE   (206) 526-6329 fax
Seattle, WA 98115        (206) 526-6317 main reception
Chr...@no...

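
To tie this to the ravel/flat question that started the thread, here is a
brief sketch of how the two differ on such a non-contiguous slice, based
on Numeric's rule that .flat needs contiguous storage while ravel()
copies when it must:

  from Numeric import *

  a = ones((3, 4))
  c = a[:, 1:3]        # a non-contiguous view, as above

  flat = ravel(c)      # fine: ravel copies a non-contiguous array
  print flat           # -> array([1, 1, 1, 1, 1, 1])
  # c.flat             # .flat exposes the raw buffer, so on a
  #                    # non-contiguous array it raises an error instead
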
From: Norman D. <nd...@sp...> - 2002-06-27 21:07:19

Hi All,

In the "Copy on demand" discussion, the differences between ravel and
flat were discussed with regards to contiguous/non-contiguous arrays. I
want to experiment, but after looking/researching I can't figure it out:
how is a non-contiguous array created?

Thanks.

Norman Davis
Space Data Corporation

From: Bernard F. <fra...@er...> - 2002-06-27 00:50:35

Bernard Frankpitt <fra...@er...> writes:

>> My preference would be
>>
>> Copy semantics for a=b
>> View semantics for a=b.view (or some other explicit syntax)

And Alexander Schmolck replies:

> Although I have been arguing for copy semantics for a=b[c:d], what
> you want is not really possible (a=b creates and always will create
> an alias in python --

Yes, you are right. In my haste I left out the slice notation.

Bernie

From: Eric M. <e.m...@po...> - 2002-06-26 18:47:05

Hello Perry,

On Wednesday 26 June 2002 19:29, Perry Greenfield wrote:
> ...
> > 2. Because I'm running two versions of Python (because Zope
> >    and a lot of Zope/C products depend on a particular version)
> >    the 'development' Python is installed in /usr/local/bin
> >    (whereas SuSE's python is in /usr/bin).
> >    It probably wouldn't do any harm if the manual would include
> >    a hint at the '--prefix' option and mention an alternative
> >    Python installation like:
> >
> >    /usr/local/bin/python ./setup.py install --prefix=/usr/local
>
> Good idea.

And perhaps another suggestion: no mention is made of the 'setupall.py'
script... and setup.py does _not_ install the LinearAlgebra2 (including
our favorite SVD ;-), Convolve, RandomArray2 and FFT2 packages. I
successfully installed them with:

  python ./setupall.py install

Other minor notes:

#1: No FFT2.pth file is generated (the others are ok). It should just
include the string 'FFT2'.

#2: While RandomArray2 etc. nicely stay away from a concurrently
imported Numeric.RandomArray, shouldn't Convolve, for orthogonality, be
named Convolve2? (cuz who knows, numarray's Convolve may be backported
to Numeric in the future, for comparative testing etc.). Of course in
the end, when numarray is to replace Numeric, the '2' could be dropped
altogether (breaking some programs then ;-)

#3: LinearAlgebra2, RandomArray2 and Convolve have empty __doc__ 's.
FFT and these 3 have no __version__ attributes, either (like numarray
itself, too). Module sys uses a tuple 'version_info':

  >>> sys.version_info
  (2, 2, 1, 'final', 0)

allowing fine-grained version testing and e.g. conditional importing
based on that. This may be a good idea for numarray, where interfaces
may change and you could thus allow your code to support multiple (or
rather, evolving) versions of numarray. Btw: imho __versioninfo__ or
just __version__ would be a better standard attribute (for all modules)
allowing a standard way of testing for major/minor version numbers,
e.g. 'if __version__[0] >= 2:' etc. Ideally, numarray's sub-packages'
numbers would be in sync with that of numarray itself. Numeric's
__version__ is a string, which is not so handy, either.

#4: It is very helpful that there are a large number of self-tests of
the packages, together with expected values. E.g.:

  Average of 10000 chi squared random numbers with 11 degrees of
  freedom (should be about 11 ): 11.0404176623
  Variance of those random numbers (should be about 22 ): 21.6517761217
  Skewness of those random numbers (should be about 0.852802865422 ):
  0.718573002875

But sometimes you wonder (e.g. 0.85 / 0.71) if deviations are not too
serious. Perhaps a 95% confidence interval or std. dev. could be added?

> >...
> Thanks very much for the feedback.
>
> Perry

You're welcome, they're just minor things one notices in the beginning
and tends to ignore later; please say so if this kind of feedback should
be postponed for later.

Bye-bye, Eric

--
Eric Maryniak <e.m...@po...>
WWW homepage: http://pobox.com/~e.maryniak/
Mobile phone: +31 6 52047532, or (06) 520 475 32 in NL.
Puzzle: what's another word for synonym?

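
A quick illustration of Eric's tuple-versus-string point (a hypothetical
helper, not part of Numeric or numarray): version strings compare
lexicographically and can mislead, while tuples compare field by field:

  def version_tuple(version_string):
      # '21.3' -> (21, 3): numeric fields then compare the way versions
      # should (a sketch; assumes purely numeric dot-separated fields).
      return tuple(map(int, version_string.split('.')))

  assert '21.3' < '3.0'                  # string comparison misleads
  assert version_tuple('21.3') > version_tuple('3.0')
  assert version_tuple('0.3.4') >= (0, 3, 0)   # fine-grained testing
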
From: Perry G. <pe...@st...> - 2002-06-26 17:29:19

Hi Eric,

Todd Miller should answer these but he is away for a few days.

> 1. When running 'python setup.py' and 'python setup.py --help'
>    I was surprised to see that already source generation
>    took place:
>
>    Using EXTRA_COMPILE_ARGS = []
>    generating new version of Src/_convmodule.c
>    ...
>    generating new version of Src/_ufuncComplex64module.c
>
>    Normally, you would expect that at build/install time.

Yes, it looks like it does the code generation regardless of the option.
We should change that.

> 2. Because I'm running two versions of Python (because Zope
>    and a lot of Zope/C products depend on a particular version)
>    the 'development' Python is installed in /usr/local/bin
>    (whereas SuSE's python is in /usr/bin).
>    It probably wouldn't do any harm if the manual would include
>    a hint at the '--prefix' option and mention an alternative
>    Python installation like:
>
>    /usr/local/bin/python ./setup.py install --prefix=/usr/local

Good idea.

> 3. After installation, I usually test the success of a library's
>    import by looking at version info (especially with multiple
>    installations, see [2]). However, numarray does not seem to
>    have version info? :
>
>    # python
>    Python 2.2.1 (#1, Jun 25 2002, 20:45:02)
>    [GCC 2.95.3 20010315 (SuSE)] on linux2
>    Type "help", "copyright", "credits" or "license" for more information.
>    >>> import sys
>    >>> sys.version
>    '2.2.1 (#1, Jun 25 2002, 20:45:02) \n[GCC 2.95.3 20010315 (SuSE)]'
>    >>> sys.version_info
>    (2, 2, 1, 'final', 0)
>
>    >>> import Numeric
>    >>> Numeric.__version__
>    '21.3'
>
>    >>> import numarray
>    >>> numarray.__version__
>    Traceback (most recent call last):
>      File "<stdin>", line 1, in ?
>    AttributeError: 'module' object has no attribute '__version__'
>    >>> numarray.version
>    Traceback (most recent call last):
>      File "<stdin>", line 1, in ?
>    AttributeError: 'module' object has no attribute 'version'
>
>    The __doc__ string:
>    'numarray: The big enchilada numeric module\n\n
>     $Id: numarray.py,v 1.36 2002/06/17 14:00:20 jaytmiller Exp $\n'
>    does not seem to give a hint at the version (i.c. 0.3.4), either.

Well, I remember putting this on the to do list and thought it had been
done, but obviously not. I'm sure Todd will take care of these.

Thanks very much for the feedback.

Perry

From: Eric M. <e.m...@po...> - 2002-06-26 16:33:40

Dear crunchers,

Please excuse me for dropping a feature request here as I'm new to the
list and don't have the 'feel' of this list yet. Should feature requests
be submitted to the bug tracker?

Anyways, I installed Numarray on a SuSE/Linux box, following the
Numarray PDF manual's directions. Having installed Python packages
(like, ehm, Numeric) before, here are a few impressions:

1. When running 'python setup.py' and 'python setup.py --help'
   I was surprised to see that already source generation took place:

   Using EXTRA_COMPILE_ARGS = []
   generating new version of Src/_convmodule.c
   ...
   generating new version of Src/_ufuncComplex64module.c

   Normally, you would expect that at build/install time.

2. Because I'm running two versions of Python (because Zope
   and a lot of Zope/C products depend on a particular version)
   the 'development' Python is installed in /usr/local/bin
   (whereas SuSE's python is in /usr/bin).
   It probably wouldn't do any harm if the manual would include
   a hint at the '--prefix' option and mention an alternative
   Python installation like:

   /usr/local/bin/python ./setup.py install --prefix=/usr/local

3. After installation, I usually test the success of a library's
   import by looking at version info (especially with multiple
   installations, see [2]). However, numarray does not seem to
   have version info? :

   # python
   Python 2.2.1 (#1, Jun 25 2002, 20:45:02)
   [GCC 2.95.3 20010315 (SuSE)] on linux2
   Type "help", "copyright", "credits" or "license" for more information.
   >>> import sys
   >>> sys.version
   '2.2.1 (#1, Jun 25 2002, 20:45:02) \n[GCC 2.95.3 20010315 (SuSE)]'
   >>> sys.version_info
   (2, 2, 1, 'final', 0)

   >>> import Numeric
   >>> Numeric.__version__
   '21.3'

   >>> import numarray
   >>> numarray.__version__
   Traceback (most recent call last):
     File "<stdin>", line 1, in ?
   AttributeError: 'module' object has no attribute '__version__'
   >>> numarray.version
   Traceback (most recent call last):
     File "<stdin>", line 1, in ?
   AttributeError: 'module' object has no attribute 'version'

   The __doc__ string:
   'numarray: The big enchilada numeric module\n\n
    $Id: numarray.py,v 1.36 2002/06/17 14:00:20 jaytmiller Exp $\n'
   does not seem to give a hint at the version (i.c. 0.3.4), either.

Well, enough nitpicking for now I guess. Thanks to the Numarray
developers for this project, it's much appreciated.

Bye-bye, Eric

--
Eric Maryniak <e.m...@po...>
WWW homepage: http://pobox.com/~e.maryniak/
Mobile phone: +31 6 52047532, or (06) 520 475 32 in NL.
An error in the premise will appear in the conclusion.

From: Alexander S. <a.s...@gm...> - 2002-06-26 13:29:46

Bernard Frankpitt <fra...@er...> writes:

> My preference would be
>
> Copy semantics for a=b
> View semantics for a=b.view (or some other explicit syntax)

Although I have been arguing for copy semantics for a=b[c:d], what you
want is not really possible (a=b creates and always will create an
alias in python -- and this is really a good design decision; just
compare it to other languages that do different things depending on
what you are assigning).

alex

--
Alexander Schmolck     Postgraduate Research Student
                       Department of Computer Science
                       University of Exeter
A.S...@gm...           http://www.dcs.ex.ac.uk/people/aschmolc/

From: Bernard F. <fra...@er...> - 2002-06-26 12:05:44

My preference would be

  Copy semantics for a=b
  View semantics for a=b.view (or some other explicit syntax)

Bernie

From: Magnus L. H. <ma...@he...> - 2002-06-25 21:00:46

Thanks for the input on k-means clustering, but the main question was
actually this... If I have the following:

  for i in xrange(k):
      w[i] = average(compress(C == i, V, 0))

... can that be expressed without the Python for loop? (I.e. without
using compress etc.) I want w[i] to be the average of the vectors in
V[x] for which C[x] == i...

--
Magnus Lie Hetland        The Anygui Project
http://hetland.org        http://anygui.org

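
For the record, one loop-free way to get those cluster means, in the
spirit of the one-hot trick used in the kmeans code later in this
thread (a sketch in Numeric; V, C and k as in the question above, and
it assumes every cluster 0..k-1 is non-empty so the division is safe):

  from Numeric import equal, arange, transpose, dot, sum, Float, NewAxis

  x = equal.outer(C, arange(k)).astype(Float)    # (n, k) membership matrix
  w = dot(transpose(x), V) / sum(x)[:, NewAxis]  # per-cluster means, (k, d)
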
From: Travis N. V. <tr...@en...> - 2002-06-25 19:24:37

----------------------------------------
Python for Scientific Computing Workshop
----------------------------------------

CalTech, Pasadena, CA
September 5-6, 2002
http://www.scipy.org/site_content/scipy02

This workshop provides a unique opportunity to learn and affect what is
happening in the realm of scientific computing with Python. Attendees
will have the opportunity to review the available tools and how they
apply to specific problems. By providing a forum for developers to share
their Python expertise with the wider industrial, academic, and research
communities, this workshop will foster collaboration and facilitate the
sharing of software components, techniques and a vision for high level
language use in scientific computing.

The two-day workshop will be a mix of invited talks and training
sessions in the morning. The afternoons will be breakout sessions with
the intent of getting standardization of tools and interfaces.

The cost of the workshop is $50.00 and includes 2 breakfasts and 2
lunches on Sept. 5th and 6th, one dinner on Sept. 5th, and snacks during
breaks. There is a limit of 50 attendees. Should we exceed the limit of
50 registrants, the 50 persons selected to attend will be invited
individually by the organizers.

Discussion about the conference may be directed to the SciPy-user
mailing list:

  mailto:sci...@sc...
  http://www.scipy.org/MailList

-------------
Co-Hosted By:
-------------

The National Biomedical Computation Resource (NBCR, SDSC, San Diego, CA)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
http://nbcr.sdsc.edu

The mission of the National Biomedical Computation Resource at the San
Diego Supercomputer Center is to conduct, catalyze, and enable
biomedical research by harnessing advanced computational technology.

The Center for Advanced Computing Research (CACR, CalTech, Pasadena, CA)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
http://www.cacr.caltech.edu

CACR is dedicated to the pursuit of excellence in the field of
high-performance computing, communication, and data engineering. Major
activities include carrying out large-scale scientific and engineering
applications on parallel supercomputers and coordinating collaborative
research projects on high-speed network technologies, distributed
computing and database methodologies, and related topics. Our goal is
to help further the state of the art in scientific computing.

Enthought, Inc. (Austin, TX)
^^^^^^^^^^^^^^^
http://enthought.com

Enthought, Inc. provides business and scientific computing solutions
through software development, consulting and training. Enthought also
fosters the development of SciPy (http://scipy.org), an open source
library of scientific tools for Python.

From: Konrad H. <hi...@cn...> - 2002-06-25 13:42:48

> that is certainly more black and white than reality. I am one
> of those 100 users and I would (will) certainly go through the
> code that I use on a daily basis (and the other code that I use

I certainly appreciate any help, but this is not just a matter of
amount of time, but also of risk, the risk of introducing bugs. The
package that you are using, Scientific Python, is the lesser of my
worries, as the individual parts are very independent. My other
package, MMTK, is not only bigger, but also consists of many tightly
coupled modules. Moreover, I am not aware of any user except for myself
who knows the code well enough to be able to work on such an update
project.

Finally, this is not just my personal problem, there is lots of NumPy
code out there, publicly released or not, whose developers would face
the same difficulties.

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen                             | E-Mail: hi...@cn...
Centre de Biophysique Moleculaire (CNRS)  | Tel.: +33-2.38.25.56.24
Rue Charles Sadron                        | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                     | Deutsch/Esperanto/English/
France                                    | Nederlands/Francais
-------------------------------------------------------------------------------

From: Magnus L. H. <ma...@he...> - 2002-06-25 13:29:27

Janne Sinkkonen <Jan...@hu...>:
> [snip]
>
> Maybe this helps (old code, may contain some suboptimal or otherwise
> weird things):

Thanks :)

--
Magnus Lie Hetland        The Anygui Project
http://hetland.org        http://anygui.org

From: Janne S. <Jan...@hu...> - 2002-06-25 12:03:45

> Using argmin it should be relatively easy to assign each vector to the
> cluster with the closest representative (using sum((x-y)**2) as the
> distance measure), but how do I calculate the new representatives
> effectively? (The representative of a cluster, e.g., 10, should be the
> average of all vectors currently assigned to that cluster.) I could
> always use a loop and then compress() the data based on cluster
> number, but I'm looking for a way of calculating all the averages
> "simultaneously", to avoid using a Python loop... I'm sure there's a
> simple solution -- I just haven't been able to think of it yet. Any
> ideas?

Maybe this helps (old code, may contain some suboptimal or otherwise
weird things):

from Numeric import *
from RandomArray import randint
import sys

def squared_distances(X, Y):
    return add.outer(sum(X*X, -1), sum(Y*Y, -1)) - 2*dot(X, transpose(Y))

def kmeans(data, M, wegstein=0.2, r_convergence=0.001,
           epsilon=0.001, debug=0, minit=20):
    """Computes kmeans for DATA with M centers until convergence in the
    sense that relative change of the quantization error is less than
    the optional RCONV (3rd param). WEGSTEIN (2nd param), by default .2
    but always between 0 and 1, stabilizes the convergence process.
    EPSILON is used to guarantee centers are initially all different.
    DEBUG causes some intermediate output to appear to stderr.

    Returns centers and the average (squared) quantization error.
    """
    N, D = data.shape
    # Selecting the initial centers has to be done carefully.
    # We have to ensure all of them are different, otherwise the
    # algorithm below will produce empty classes.
    centers = []
    if debug:
        sys.stderr.write("kmeans: Picking centers.\n")
    while len(centers) < M:
        # Pick one data item randomly
        candidate = data[randint(N)]
        if len(centers) > 0:
            d = minimum.reduce(squared_distances(array(centers),
                                                 candidate))
        else:
            d = 2*epsilon
        if d > epsilon:
            centers.append(candidate)
    if debug:
        sys.stderr.write("kmeans: Iterating.\n")
    centers = array(centers)
    qerror, old_qerror, counter = None, None, 0
    while (counter < minit or old_qerror == None or
           (old_qerror - qerror)/qerror > r_convergence):
        # Initialize -- but not like this, you get doubles:
        #   centers = take(data, randint(0, N, (M,)))
        # Iterate:
        # Squared distances from data to centers (all pairs)
        distances = squared_distances(data, centers)
        # Matrix telling which data item is closest to which center
        x = equal.outer(argmin(distances),
                        arange(centers.shape[0])).astype(Float32)
        # Compute new centers
        centers = ((wegstein)*(dot(transpose(x), data)/sum(x)[..., NewAxis])
                   + (1.0 - wegstein)*centers)
        # Quantization error
        old_qerror = qerror
        qerror = sum(minimum.reduce(distances, 1))/N
        counter = counter + 1
        if debug:
            try:
                sys.stderr.write("%f %f %i\n" % (qerror, old_qerror,
                                                 counter))
            except TypeError:
                sys.stderr.write("%f None %i\n" % (qerror, counter))
    return centers, qerror

--
Janne

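
A hedged usage sketch for the function above (synthetic data; the
parameter choices are illustrative only):

  from RandomArray import normal

  data = normal(0.0, 1.0, (1000, 2))   # 1000 random 2-d points
  centers, qerror = kmeans(data, 5)
  print centers.shape, qerror          # -> (5, 2) and the mean sq. error
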
From: Scott R. <ra...@ph...> - 2002-06-25 00:05:54

Hi Konrad,

On Sun, Jun 23, 2002 at 10:20:35AM +0200, Konrad Hinsen wrote:
> > be even more inefficient. If I really had large amounts of code that
> > needed that conversion, I'd be tempted to write such a function with
> > an additional twist: have it monitor the input argument type
> > whenever the program is run and
>
> I have large amounts of code that would need conversion. However, it
> is code that myself and about 100 other users rely on for their daily
> work, so it won't be the subject of empirical fixing of any kind.
> Either there will be an automatic procedure that is guaranteed to keep
> the code working, or there won't be any update.

I think you are painting an overly bleak picture -- and one that is
certainly more black and white than reality. I am one of those 100
users and I would (will) certainly go through the code that I use on a
daily basis (and the other code that I use less frequently) -- just as
I have every time there is an update to the Python core or your code.
Hell, some of those 30000 lines of "your" code are actually _my_ code.
And out of those 100 other users, I'd be willing to bet a beer or three
that at least a couple would help to track down incompatibilities as
well. Many (perhaps even most) of the problems will be able to be
spotted by simply running the test codes provided with the individual
modules.

By generously releasing your code, you have made it possible for your
code to become part of my -- and many others' -- "standard library".
And it is a part that I don't want to get rid of. I truly hope that
this incompatibility (i.e. copy vs view) and the time that it will take
to update older code will not cause many potentially beneficial (or at
least requested) features/changes to be dropped.

Scott

--
Scott M. Ransom              Address: McGill Univ. Physics Dept.
Phone: (514) 398-6492                 3600 University St., Rm 338
email: ra...@ph...                    Montreal, QC Canada H3A 2T8
GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989

From: Alexander S. <a.s...@gm...> - 2002-06-24 19:27:58

Aureli Soria Frisch <Aur...@ip...> writes:

> Has anyone had similar experiences when 'pickling' arrays? Could it
> be a problem of the different computers running versions of Python
> from 2.0 to 2.2.1? Or a problem of different versions of NumPy?

Yes -- pickling isn't meant to work across different python versions
(it might to some extent, but I wouldn't try it unless there is no way
around it).

Using netcdf as a data format instead of pickling might also be a
solution (if intermediate storage on the disk is not too inefficient,
but your original approach involved that anyway). Konrad Hinsen has
written a nice wrapper for python that is quite easy to use:
http://starship.python.net/crew/hinsen/scientific.html.

alex

--
Alexander Schmolck     Postgraduate Research Student
                       Department of Computer Science
                       University of Exeter
A.S...@gm...           http://www.dcs.ex.ac.uk/people/aschmolc/

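
In the same spirit, a version-neutral workaround sketch (an editorial
illustration, not from the thread): skip pickle and write the raw bytes
plus just enough metadata to rebuild the array. This keeps the on-disk
format independent of the Python/NumPy version, though not of byte
order across heterogeneous machines:

  from Numeric import fromstring, reshape

  def dump_array(a, f):
      # typecode;shape;raw-data -- all three pieces are version-neutral
      f.write('%s;%s;' % (a.typecode(), ','.join(map(str, a.shape))))
      f.write(a.tostring())

  def load_array(f):
      header = ''
      while header.count(';') < 2:     # read up to the second ';'
          header = header + f.read(1)
      typecode, shape = header.split(';')[:2]
      shape = tuple(map(int, shape.split(',')))
      return reshape(fromstring(f.read(), typecode), shape)
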
From: Tim C. <tc...@op...> - 2002-06-24 19:14:04

Aureli Soria Frisch wrote:
>
> Hi all,
>
> I am trying to run a numerical computation (with arrays) on different
> computers simultaneously (in parallel). The computation is done under
> Linux.
>
> For that purpose a master organizes the process and sends rexec
> (remote execute) commands to the different slaves via the python
> command spawnlp. The slaves execute the script specified through
> rexec.
>
> Inside this script the slaves open a file with the arguments of the
> process, which were serialized via pickle, then do the numerical
> computation, and write the result (a NumPy array) again via pickle to
> a file. This file is opened by the master, which uses the different
> results.
>
> The problem I am having is that the master sometimes (the problem
> does not happen always!!!) opens the result and loads an object of
> <type 'instance'> instead of the expected object of <type 'array'>
> (which then produces an error). I have tested the type of the objects
> in the slaves and it is always 'array'.
>
> Has anyone had similar experiences when 'pickling' arrays? Could it
> be a problem of the different computers running versions of Python
> from 2.0 to 2.2.1? Or a problem of different versions of NumPy?
>
> Is there any other way of doing such a parallel computation?

I am not sure what is causing the unpickling problem you are seeing,
but I suggest that you consider MPI for what you are doing. There are a
number of Python MPI interfaces around, but I can personally recommend
PyPar by Ole Nielsen at the Australian National University. You can use
PyPar with LAM/MPI, which runs in user mode and is very easy to
install, and PyPar itself does not require any modifications to the
Python interpreter. PyPar will automatically serialise Python objects
for you (and deserialise them at the destination) but also has methods
to send NumPy arrays directly which is very efficient.

See http://datamining.anu.edu.au/~ole/pypar/ for more details.

Tim C

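
For flavour, a minimal master/worker sketch in the PyPar style Tim
describes. The rank/size/send/receive/finalize calls are written from
memory of PyPar's documented interface, so treat the exact signatures
as an assumption to check against the PyPar docs:

  import pypar
  from Numeric import ones

  myid = pypar.rank()      # this process' id, 0 .. numproc-1
  numproc = pypar.size()   # total number of MPI processes

  if myid == 0:
      # Master: collect one result array from every worker.
      for source in range(1, numproc):
          result = pypar.receive(source)
          print 'master got', result.shape, 'from', source
  else:
      # Worker: compute something and send the NumPy array back.
      pypar.send(ones((3, 4)) * myid, 0)

  pypar.finalize()
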
From: Aureli S. F. <Aur...@ip...> - 2002-06-24 18:11:28

Hi all,

I am trying to run a numerical computation (with arrays) on different
computers simultaneously (in parallel). The computation is done under
Linux.

For that purpose a master organizes the process and sends rexec (remote
execute) commands to the different slaves via the python command
spawnlp. The slaves execute the script specified through rexec.

Inside this script the slaves open a file with the arguments of the
process, which were serialized via pickle, then do the numerical
computation, and write the result (a NumPy array) again via pickle to a
file. This file is opened by the master, which uses the different
results.

The problem I am having is that the master sometimes (the problem does
not happen always!!!) opens the result and loads an object of <type
'instance'> instead of the expected object of <type 'array'> (which
then produces an error). I have tested the type of the objects in the
slaves and it is always 'array'.

Has anyone had similar experiences when 'pickling' arrays? Could it be
a problem of the different computers running versions of Python from
2.0 to 2.2.1? Or a problem of different versions of NumPy?

Is there any other way of doing such a parallel computation?

Thanks for the time... Regards,

Aureli

--
#################################
Aureli Soria Frisch
Fraunhofer IPK
Dept. Pattern Recognition

post: Pascalstr. 8-9, 10587 Berlin, Germany
e-mail: au...@ip...
fon: +49 30 39006-143
fax: +49 30 3917517
web: http://vision.fhg.de/~aureli/web-aureli_en.html
#################################

From: Magnus L. H. <ma...@he...> - 2002-06-24 13:55:19

Hi!

I've been looking for an implementation of k-means clustering in
Python, and haven't really found anything I could use... I believe
there is one in SciPy, but I'd rather keep the required number of
packages as low as possible (already using Numeric/numarray), and
Orange seems a bit hard to install in UNIX... So, I've fiddled with
using Numeric/numarray for the purpose. Has anyone else done something
like this (or some other clustering algorithm for that matter)?

The approach I've been using (but am not completely finished with) is
to use a two-dimensional multiarray for the data (i.e. a "set" of
vectors) and a one-dimensional array with a cluster assignment for each
vector. E.g.

  >>> data[42]
  array([1, 2, 3, 4, 5])
  >>> cluster[42]
  10
  >>> reps[10]
  array([1, 2, 4, 5, 4])

Here reps is the representative of the cluster. Using argmin it should
be relatively easy to assign each vector to the cluster with the
closest representative (using sum((x-y)**2) as the distance measure),
but how do I calculate the new representatives effectively? (The
representative of a cluster, e.g., 10, should be the average of all
vectors currently assigned to that cluster.) I could always use a loop
and then compress() the data based on cluster number, but I'm looking
for a way of calculating all the averages "simultaneously", to avoid
using a Python loop... I'm sure there's a simple solution -- I just
haven't been able to think of it yet. Any ideas?

--
Magnus Lie Hetland        The Anygui Project
http://hetland.org        http://anygui.org

From: Konrad H. <hi...@cn...> - 2002-06-23 08:23:33

> If you can tell at a glance for most instances in your code whether
> the ``foo`` in ``foo[a:b]`` is an array, then running a query replace
> isn't that much

How could I? Moreover, even if I could, that's not enough. I need a
program to spot those places for me, as I won't go through 30000 lines
of code by hand.

> trouble. Of course this might not be true. But the question really
> is: to what extent would it be more difficult to tell than what you
> need to find out already in all the other situations where code
> needs changing because of the incompatibilities numarray already

What are those? In general, changes related to NumPy functions or
attributes of array objects are relatively easy to deal with, as one
can use a text editor to search for the name and thereby capture most
locations (not all though). Changes related to generic operations that
many other types share are the worst.

> If the answer is "not much", then you would have to regard these

I am not aware of any other incompatibility in the "worst" category.
If there is one, I will probably never use Numarray.

> > A further challenge for your code convertor:
> >
> > f(a[0], b[2:3], c[-1, 1])
> >
> > That makes eight type combination cases.
>
> I'd say 4 (since c[-1,1] can't be a list) but that is beside the
> point. This

c[-1,1] can't be a list, but it needn't be an array. Any class can
implement multiple-dimension indexing. My netCDF array objects do, for
example.

> be even more inefficient. If I really had large amounts of code that
> needed that conversion, I'd be tempted to write such a function with
> an additional twist: have it monitor the input argument type whenever
> the program is run and

I have large amounts of code that would need conversion. However, it
is code that myself and about 100 other users rely on for their daily
work, so it won't be the subject of empirical fixing of any kind.
Either there will be an automatic procedure that is guaranteed to keep
the code working, or there won't be any update.

> just means more bugs and less clear and general code. But language
> warts are more like tumours, they grow over the years and become
> increasingly difficult to excise (just look what tremendous redesign

I don't see any evidence for this in NumPy.

> now...")). Among the incompatible changes that I would strongly
> assume *were* documented before and after are: exceptions (strings ->
> classes), automatic

String exceptions still work. I am not aware of any code that was
broken by the fact that the standard exceptions are now classes.

> conversion of ints to longs (instead of an exception) and the new
> division rules whose stepwise introduction has already started. There
> are also quite a

The division rules are the only case of serious incompatibilities I
know of, and I am in fact against them; although I agree that the
proposed new rules are much better. On the other hand, the proposed
transition procedure provides much more help for updating code than we
would get from Numarray.

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen                             | E-Mail: hi...@cn...
Centre de Biophysique Moleculaire (CNRS)  | Tel.: +33-2.38.25.56.24
Rue Charles Sadron                        | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                     | Deutsch/Esperanto/English/
France                                    | Nederlands/Francais
-------------------------------------------------------------------------------

From: Alexander S. <a.s...@gm...> - 2002-06-21 23:41:10

[sorry for replying so late, an almost finished email got lost in a
computer accident and I was rather busy.]

Konrad Hinsen <hi...@cn...> writes:

> > Wouldn't an (almost) automatic solution be to simply replace
> > (almost) all instances of a[b:c] with a.view[b:c] in your legacy
> > code? Even for unusual
>
> That would convert all slicing operations, even those working on
> strings, lists, and user-defined sequence-type objects.

Well that's where the "(almost)" comes in ;) If you can tell at a
glance for most instances in your code whether the ``foo`` in
``foo[a:b]`` is an array, then running a query replace isn't that much
trouble. Of course this might not be true. But the question really is:
to what extent would it be more difficult to tell than what you need to
find out already in all the other situations where code needs changing
because of the incompatibilities numarray already introduces? (I think
I have for example already found a slicing-incompatibility --
unfortunately the list of the issues I hit upon so far has disappeared
somewhere, so I'll have to try to reconstruct it sometime...)

If the answer is "not much", then you would have to regard these
incompatibilities as even less acceptable than the introduction of
copy-slicing semantics (because as you've already agreed, these
incompatibilities don't confer the same benefit) or otherwise it would
be difficult to see why copy-slicing shouldn't be introduced as well.

View semantics have always bothered me, but if it weren't for the fact
that numarray is going to cause me not inconsiderable inconvenience
through various incompatibilities anyway, I would have been satisfied
with the status quo. As things are, however, I must admit I feel a
strong temptation to get this fixed as well, especially as most of the
other laudable improvements of numarray wouldn't seem to be of great
importance to me personally at the moment (much nicer C code base,
better handling of byteswapped data and very large arrays etc.). So I
fully admit to a selfish desire for either more gain or less pain
(incompatibility) or maybe even a bit of both. Of course I don't think
these subjective desires of mine are a good standard to go by, but I am
convinced that offering attractive improvements or few compatibility
problems (or both) to the widest possible audience of current Numeric
users is important in order to replace Numeric, quickly and cleanly,
without any splitting.

> > autoconvert by inserting ``if type(foo) == ArrayType:...``, although
>
> typechecks for every slicing or indexing operation (a[0] generates a
> view as well for a multidimensional array). Guaranteed to render most
> code unreadable, and of course slow down execution.
>
> A further challenge for your code convertor:
>
> f(a[0], b[2:3], c[-1, 1])
>
> That makes eight type combination cases.

I'd say 4 (since c[-1,1] can't be a list) but that is beside the point.
This was mainly intended as a demonstration that you *can* do it
automatically, if you really need to. A function call would help the
readability but obviously be even more inefficient. If I really had
large amounts of code that needed that conversion, I'd be tempted to
write such a function with an additional twist: have it monitor the
input argument type whenever the program is run and if it isn't an
array, the wrapping in this particular line can be discarded (with less
confidence, if it always seems to be an array it could be converted
into ``a.view[b:c]``, but that might need additional checking). In code
that isn't reached, the wrapper just stays forever. I've always been
looking for an excuse to write some self-modifying code :)

> > Well, AFAIK there are actually three mutable sequence types in
> > python core and all have copy-slicing behavior: list, UserList and
> > array:
>
> UserList is not an independent type, it is merely a subclassable
> wrapper around lists. As for the array module, I haven't seen any
> code that uses it.

It is AFAIK the only way to work efficiently with large strings, so I
guess it is important, although I agree that it is not that often used.

> > I would suppose that in the grand scheme of things numarray.array
> > is intended as an eventual replacement for array.array, or not?
>
> In the interest of those who rely on the current array module, I hope
> not.

As long as array is kept around for backwards-compatibility, why not?

[...]

> > But reliability to me also includes the ability for growth -- I not
> > only want my old code to work in a couple of years, I also want the
> > tool I wrote it in to remain competitive and this can conflict with
> > backwards-compatibility. I
>
> In what way does the current slicing behaviour render your code
> non-competitive?

A single design decision obviously doesn't have such an immediate huge
negative impact that it immediately renders all your code
non-competitive; unless it was a *really* bad design decision it just
means more bugs and less clear and general code. But language warts are
more like tumours, they grow over the years and become increasingly
difficult to excise (just look what tremendous redesign effort the perl
people go through at the moment). The closer warts come to the core
language the worse, and since numarray aims for inclusion I think it
must be measured to a higher standard than other modules that don't.

> > like the balance python strikes here so far -- the language has
>
> Me too. But there haven't been any incompatible changes in the
> documented core language, and only very few in the standard library
> (the to-be-abandoned re module comes to mind - anything else?).

I don't think this is true (and the documented core language is not
necessarily a good standard to go by as far as python is concerned,
because not quite everything one has to rely upon is actually
documented (instead one can find things like: "XXX Can't be bothered
to spell this out right now...")). Among the incompatible changes that
I would strongly assume *were* documented before and after are:
exceptions (strings -> classes), automatic conversion of ints to longs
(instead of an exception) and the new division rules whose stepwise
introduction has already started. There are also quite a few things
that used to work for all classes, but that now no longer work with
new-style classes, some of which can be quite annoying (you lose quite
a bit of introspective and interactive power), but I'm not sure to
which extent they were documented.

> For a bad example, see the Python XML package(s). Lots of changes,
> incompatibilities between parsers, etc. The one decision I really
> regret is to have chosen an XML-based solution for documentation. Now
> I spend two days at every new release of my stuff to adapt the XML
> code to the fashion of the day.

I didn't do much xml processing, but as far as I can remember I was
happy with 4suite: http://4suite.org/index.xhtml.

> It is almost ironic that I appear here as the great anti-change
> advocate, since on many other occasions I have argued for improvement
> over excessive compatibility. Basically I favour motivated
> incompatible

I don't think a particularly conservative character is necessary to
fill that role :) You've got a big code base, which automatically
reduces the desire for incompatibilities because you have to pay a
hefty cost that is difficult to offset by potential advantages for
future code. But that side of the argument is clearly important and I
think even if you don't like to be an anti-change advocate you still
often make valuable points against changes you perceive as uncalled
for.

alex

--
Alexander Schmolck     Postgraduate Research Student
                       Department of Computer Science
                       University of Exeter
A.S...@gm...           http://www.dcs.ex.ac.uk/people/aschmolc/

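
A sketch of the conversion shim discussed above, for concreteness. It is
entirely hypothetical: it presumes a future numarray in which plain
slices copy and a .view attribute restores the old aliasing behaviour
(only a proposal in this thread), plus an exported ArrayType to test
against:

  def slice_view(obj, start, stop):
      # A code convertor would rewrite obj[start:stop] into
      # slice_view(obj, start, stop) wherever it cannot infer the type:
      # old Numeric view semantics for arrays, ordinary (copying)
      # slicing for everything else.
      if isinstance(obj, ArrayType):    # ArrayType: assumed export
          return obj.view[start:stop]   # hypothetical .view indexing
      return obj[start:stop]
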
From: Magnus L. H. <ma...@he...> - 2002-06-21 11:37:17

One quick question: Why does the MA module have an average function,
but not Numeric? And what is the equivalent in numarray?

--
Magnus Lie Hetland        The Anygui Project
http://hetland.org        http://anygui.org

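
Until someone answers authoritatively, a stand-in that matches the
simple unweighted case of MA.average (a sketch only; the real
MA.average also handles weights and masked values, which this ignores):

  from Numeric import sum, Float

  def average(a, axis=0):
      # Mean along an axis; cast to Float so integer arrays don't
      # truncate under division.
      return sum(a.astype(Float), axis) / a.shape[axis]
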
From: Konrad H. <hi...@cn...> - 2002-06-20 16:29:14

> Wouldn't an (almost) automatic solution be to simply replace (almost)
> all instances of a[b:c] with a.view[b:c] in your legacy code? Even
> for unusual

That would convert all slicing operations, even those working on
strings, lists, and user-defined sequence-type objects.

> cases (like if you heavily mix arrays and lists) you could still

I do, and I don't consider it that unusual. Anyway, even if some
function gets called only with array arguments, I don't see how a code
analyzer could detect that. So it would be...

> autoconvert by inserting ``if type(foo) == ArrayType:...``, although

typechecks for every slicing or indexing operation (a[0] generates a
view as well for a multidimensional array). Guaranteed to render most
code unreadable, and of course slow down execution.

A further challenge for your code convertor:

  f(a[0], b[2:3], c[-1, 1])

That makes eight type combination cases.

> Well, AFAIK there are actually three mutable sequence types in
> python core and all have copy-slicing behavior: list, UserList and
> array:

UserList is not an independent type, it is merely a subclassable
wrapper around lists. As for the array module, I haven't seen any code
that uses it.

> I would suppose that in the grand scheme of things numarray.array is
> intended as an eventual replacement for array.array, or not?

In the interest of those who rely on the current array module, I hope
not.

> much "lets make it really good (where good is what *I* say) then
> loads of people will adopt it", it was more: "Numeric has a good
> chance to grow considerably in popularity over the next years, so it
> will be much easier to fix things now than later" (for slicing
> behavior, now is likely to be the last chance).

I agree - except that I think it is already too late.

> The fact that matlab users are used to copy-on-demand and the fact
> that many people, (including you if I understand you correctly) think
> that copy-slicing semantics as such (without backward compatibility
> concerns) are preferable,

Yes, assuming that views are somehow available. But my preference is
not so strong that I consider it a sufficient reason to break lots of
code. View semantics is not a catastrophe. All of us continue to use
NumPy in spite of it, and I suspect none of us loses any sleep over it.
I have spent perhaps a few hours in total (over six years of using
NumPy) to track down view-related bugs, which makes it a minor problem
on my personal scale.

> I don't think matlab or similar alternatives make legally binding
> promises about backwards compatibility, or do they? I guess it is
> actually more

Of course not, software providers for the mass market take great care
not to promise anything. But if Matlab did anything as drastic as what
we are discussing, they would lose lots of paying customers.

> But reliability to me also includes the ability for growth -- I not
> only want my old code to work in a couple of years, I also want the
> tool I wrote it in to remain competitive and this can conflict with
> backwards-compatibility. I

In what way does the current slicing behaviour render your code
non-competitive?

> like the balance python strikes here so far -- the language has

Me too. But there haven't been any incompatible changes in the
documented core language, and only very few in the standard library
(the to-be-abandoned re module comes to mind - anything else?).

For a bad example, see the Python XML package(s). Lots of changes,
incompatibilities between parsers, etc. The one decision I really
regret is to have chosen an XML-based solution for documentation. Now I
spend two days at every new release of my stuff to adapt the XML code
to the fashion of the day.

It is almost ironic that I appear here as the great anti-change
advocate, since on many other occasions I have argued for improvement
over excessive compatibility. Basically I favour motivated incompatible
changes, but under the condition that updating of existing code is
manageable. Changing the semantics of a type is about the worst I can
imagine in this respect.

Konrad.
--
-------------------------------------------------------------------------------
Konrad Hinsen                             | E-Mail: hi...@cn...
Centre de Biophysique Moleculaire (CNRS)  | Tel.: +33-2.38.25.56.24
Rue Charles Sadron                        | Fax:  +33-2.38.63.15.17
45071 Orleans Cedex 2                     | Deutsch/Esperanto/English/
France                                    | Nederlands/Francais
-------------------------------------------------------------------------------

From: Alexander S. <a.s...@gm...> - 2002-06-18 22:22:09

Chris Barker <Chr...@no...> writes:

> My conclusion is that nested lists and Arrays simply are different
> beasts so we can't expect complete compatibility. I'm also wondering
> why lists have that weird behavior of a single index returning a
> reference, and a slice returning a copy. Perhaps it has something to
> do with the

This is not weird at all. Slicing and single item indexing are
different conceptually and what I think you have in mind wouldn't
really work.

Think of a real life container, like a box with subcompartments.
Obviously you should be able to take out (or put in) an item from the
box, which is what single indexing does (and the item may happen to be
another box). My understanding is that you'd like the box to return
copies of whatever was put into it on indexing, rather than the real
thing -- this would not only be counterintuitive and inefficient, it
also means that you could exclusively put items with a __copy__-method
in lists, which would rather limit their usefulness.

Slicing on the other hand creates a whole new box but this box is
filled with (references to) the same items (a behavior for which a
real life equivalent is more difficult to find :) :

  >>> l = ['foobar', 'barfoot']
  >>> l2 = l[:]
  >>> l2[0] is l[0]
  1

Because l and l2 are different boxes, however, assigning new items to
l doesn't change l2 and vice versa.

It is true, however, that the situation is somewhat different for
arrays, because "multidimensional" lists are just nested boxes,
whereas multidimensional arrays have a different structure. array[1]
indexes some part of itself according to its .shape (which can be
modified, thus changing what array[1] indexes, without modifying the
actual array contents in memory), whereas list[1] indexes some "real"
object. This may mean that the best behavior for ``array[0]`` would be
to return a copy and ``array[:]`` etc. what would be a "deep copy" if
it were nested lists. I think this is the behavior Paul Dubois' MA
currently has.

> auto-resizing of lists. That being said, I still like the idea of
> slices producing copies, so:
>
> 1) array
> An Array like we have now, but slice-is-copy semantics.
>
> 2) array[0]
> An Array of rank one less than array, sharing data with array
>
> 3) array.view
> An object that can do nothing but create other Arrays that share
> data with array. I don't know if this is possible but I'd be just as
> happy if array.view returned None, and array.view[slice] returned an
> Array that

No it is not possible.

> shared data with array. Perhaps there is some other notation that
> could do this.
>
> 4) array.view[0]
> Same as 2)

I can't see why single-item indexing views would be needed at all if
``array[0]`` doesn't copy as you suggest above.

> To add a few:
>
> 5) array[0:1]
> An Array with a copy of the data in array[0]

(I suppose you'd also want array[0:1] and array[0] to have different
shape?)

> 6) array.view[0:1]
> An Array sharing data with array
>
> As I write this, I am starting to think that this is all a bit
> strange. Even though lists treat slices and indexes differently,
> perhaps Arrays should not. They really are different beasts. I also
> see why it was done the way it was in the first place!
>
> -Chris

Yes, arrays and lists are indeed different beasts and a different
indexing behavior (creating copies) for arrays might well be
preferable (since array indexing doesn't refer to "real" objects).

alex

--
Alexander Schmolck     Postgraduate Research Student
                       Department of Computer Science
                       University of Exeter
A.S...@gm...           http://www.dcs.ex.ac.uk/people/aschmolc/

From: Chris B. <Chr...@no...> - 2002-06-17 22:48:29

Konrad Hinsen wrote:
> Let's make this explicit. Given the following four expressions,
> 1) array
> 2) array[0]
> 3) array.view
> 4) array.view[0]

I thought I had a clear idea of what I wanted here, which was the
non-view stuff being the same as Python lists, but I discovered
something: Python lists provide slices that are copies, but they are
shallow copies, so nested lists, which are sort-of the equivalent of
multidimensional arrays, act a lot like the view behavior of NumPy
arrays.

Make a "2-d" list:

  >>> l = [[i, 1+5] for i in range(5)]
  >>> l
  [[0, 6], [1, 6], [2, 6], [3, 6], [4, 6]]

Make an array that is the same:

  >>> a = array(l)
  >>> a
  array([[0, 6],
         [1, 6],
         [2, 6],
         [3, 6],
         [4, 6]])

Assign a new binding to the first element:

  >>> b = a[0]
  >>> m = l[0]

Change something in it:

  >>> b[0] = 30
  >>> a
  array([[30, 6],
         [ 1, 6],
         [ 2, 6],
         [ 3, 6],
         [ 4, 6]])

The first array is changed. Change something in the first element of
the list:

  >>> m[0] = 30
  >>> l
  [[30, 6], [1, 6], [2, 6], [3, 6], [4, 6]]

The first list is changed too. Now try slices instead:

  >>> b = a[2:4]

Change an element in the slice:

  >>> b[1,0] = 55
  >>> a
  array([[30, 6],
         [ 1, 6],
         [ 2, 6],
         [55, 6],
         [ 4, 6]])

The first array is changed. Now with the list:

  >>> m = l[2:4]
  >>> m
  [[2, 6], [3, 6]]

This is a copy, but it is a shallow copy, so change an element:

  >>> m[1][0] = 45
  >>> l
  [[30, 6], [1, 6], [2, 6], [45, 6], [4, 6]]

The list is changed, but:

  >>> m[0] = [56,65]
  >>> l
  [[30, 6], [1, 6], [2, 6], [45, 6], [4, 6]]

The list doesn't change, where:

  >>> b[0] = [56,65]
  >>> a
  array([[30, 6],
         [ 1, 6],
         [56, 65],
         [55, 6],
         [ 4, 6]])

The array does change.

My conclusion is that nested lists and Arrays simply are different
beasts so we can't expect complete compatibility. I'm also wondering
why lists have that weird behavior of a single index returning a
reference, and a slice returning a copy. Perhaps it has something to do
with the auto-resizing of lists. That being said, I still like the idea
of slices producing copies, so:

> 1) array
An Array like we have now, but slice-is-copy semantics.

> 2) array[0]
An Array of rank one less than array, sharing data with array

> 3) array.view
An object that can do nothing but create other Arrays that share data
with array. I don't know if this is possible but I'd be just as happy
if array.view returned None, and array.view[slice] returned an Array
that shared data with array. Perhaps there is some other notation that
could do this.

> 4) array.view[0]
Same as 2)

To add a few:

5) array[0:1]
An Array with a copy of the data in array[0]

6) array.view[0:1]
An Array sharing data with array

As I write this, I am starting to think that this is all a bit strange.
Even though lists treat slices and indexes differently, perhaps Arrays
should not. They really are different beasts. I also see why it was
done the way it was in the first place!

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
NOAA/OR&R/HAZMAT         (206) 526-6959 voice
7600 Sand Point Way NE   (206) 526-6329 fax
Seattle, WA 98115        (206) 526-6317 main reception
Chr...@no...

From: Mark F. <fa...@uv...> - 2002-06-17 20:48:06

Dear Numpy-Discussion,

It is good to see that Numeric Python inspires such confidence in
people all around the world, especially when subjected to due
deliberation. I hope that this invoiced contract entitlement will not
be set to zero once we obtain a view of it.

I would like to propose a further elaboration of the Sharing Partern.
Eric, Travis, Konrad, Scott, Paul, and Perry will each get 2% of the
total based on their contributions to the mailing list traffic so far
(I am blissfully ignorant of who has written actual code), and the
rest of the 20% will go to the first individual to deliver a finished
working Numarray. With copy semantics only, please.

best regards,
Mark Fardal

> Dear Sir,
> I am the Chairman Contract Review Committee of
> National Electric Power Authority (NEPA).
> Although this proposal might come to you as a surprise
> since it is coming from someone you do not know or
> ever seen before, but after due deliberation with my
> colleagues, I decided to contact you based on Intuition.
> We are soliciting for your humble and confidential
> assistance to take custody of Seventy One Million,
> Five Hundred Thousand United States Dollars. {US$71,500,000.00}.
> This sum (US$71.5M) is an over invoiced contract sum
> which is currently in offshore payment account of the
> Central Bank of Nigeria as an unclaimed contract
> entitlement which can easily be withdrawn or drafted
> or paid to any recommended beneficiary by my committee.
> On this note, you will be presented as a contractor to
> NEPA who has executed a contract to a tune of the
> above sum and has not been paid.
> Proposed Sharing Partern (%):
> 1. 70% for me and my colleagues.
> 2. 20% for you as a partner/fronting for us.
> 3. 10% for expenses that may be incure by both parties
> during the cause of this transacton.
> Our law prohibits a civil servant from operating a
> foreign account, hence we are contacting you.
> If this proposal satisfies you, do response as soon as
> possible with the following information:
> 1. The name you wish to use as the beneficiary of the fund.
> 2. Your Confidential Phone and Fax Numbers.
> Further discussion will be centered on how the fund
> shall be transferred and full details on how to accomplish
> this great opportunity of ours.
> Thank you and God bless.
>
> Best regards,
>
> victor ichaka nabia
