From: Andy D. <adu...@co...> - 2000-04-24 20:27:59
On Fri, 14 Apr 2000, Tim Churches wrote:

> Andy Dustman wrote:
> >
> Yes, but the problem with mysql_store_result() is the large amount of
> memory required to store the result set. Couldn't the user be
> responsible for predetermining the size of the array via a query such as
> "select count(*) from sometable where...." and then pass this value as a
> parameter to the executeNumPy() method? In MySQL at least such count(*)
> queries are resolved very quickly so such an approach wouldn't take
> twice the time. Then mysql_use_result() could be used to populate the
> initialised NumPy array with data row, so there is only ever one
> complete copy of the data in memory, and that copy is in the NumPy
> array.

After some more thought on this subject, and some poking around at NumPy,
I came to the following conclusions:

Since NumPy arrays are fixed-size, but otherwise sequences (in the
multi-dimensional case, sequences of sequences), the best approach would
be for the user to pass in a pre-sized array (e.g. from zeros(), and btw,
the docstring for zeros is way wrong), and _mysql would simply access it
through the Sequence object protocol, and update as many values as it
could: If you passed a 100-row array, it would fill 100 rows or as many as
were in the result set, whichever is less. Since this requires no special
knowledge of NumPy, it could be a standard addition (no conditional
compilation required).

This method (tentatively _mysql.fetch_rows_into_array(array)) would return
the array argument as the result. IndexError would likely be raised if the
array was too narrow (too many columns in result set). Probably this would
not be a MySQLdb.Cursor method, but perhaps I can have a separate module
with a cursor subclass which returns NumPy arrays.

> > Question: Would it be adequate to put all columns returned into the array?
> > If label columns need to be returned, this could pose a problem. They may
> > have to be returned as a separate query. Or else non-numeric columns would
> > be excluded and returned in a list of tuples (this would be harder).
>
> Yes, more thought needed here - my initial thought was one NumPy array
> per column, particularly since NumPy arrays must be homogenous wrt data
> type. Each NumPy array could be named the same as the column from which
> it is derived.

Okay, I think I know what you mean here. You are wanting to return each
column as a (vertical) vector, whereas I am thinking along the lines of
returning the result set as a matrix. Is that correct? Since it appears
you can efficiently slice out column vectors as a[:,n], is my idea
acceptable? i.e.

>>> import Numeric
>>> a=Numeric.multiarray.zeros( (2,2),'d')
>>> a[1,1]=2
>>> a[0,1]=-1
>>> a[1,0]=-3
>>> a
array([[ 0., -1.],
       [-3.,  2.]])
>>> a[:,0]
array([ 0., -3.])
>>> a[:,1]
array([-1.,  2.])

--
andy dustman | programmer/analyst | comstar.net, inc.
telephone: 770.485.6025 / 706.549.7689 | icq: 32922760 | pgp: 0xc72f3f1d
"Therefore, sweet knights, if you may doubt your strength or courage,
come no further, for death awaits you all, with nasty, big, pointy teeth!"
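
For illustration only, here is a minimal pure-Python sketch of the
fill-in-place idea described in the message above. It is not the _mysql
implementation (which would be written in C). The function name follows the
tentative one in the message, while the `rows` parameter is an assumption
standing in for rows streamed from the server via mysql_use_result(). Plain
nested lists are used instead of a NumPy array to show that only the
sequence protocol is needed.

```python
def fetch_rows_into_array(array, rows):
    """Fill a pre-sized sequence of sequences in place.

    Hypothetical stand-in for the tentative
    _mysql.fetch_rows_into_array(array); `rows` is an assumed iterable of
    row tuples, not part of any real API.
    """
    # zip() stops at whichever runs out first: the rows of the target
    # array or the rows of the result set.
    for i, row in zip(range(len(array)), rows):
        target = array[i]
        for j, value in enumerate(row):
            # Item assignment raises IndexError if the target row is
            # narrower than the result row.
            target[j] = value
    return array


if __name__ == "__main__":
    a = [[0.0, 0.0] for _ in range(3)]
    print(fetch_rows_into_array(a, [(1.0, 2.0), (3.0, 4.0)]))
```

With a three-row target and a two-row result, the third row is left
untouched, matching the "whichever is less" behaviour described above.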