From: Anders P. <and...@op...> - 2004-04-21 15:31:05
There are a couple of things you can try...

1) Basically what you want to do is solve a linear equation system with far more rows than columns: Ax=B. Is your current strategy to calculate A.transpose() and multiply both sides of the equation by it? There is another way. Simply call A.solve(B). Not sure I tested this, but it should produce the same result! Provided you use the same A for each of the 10000 iterations, the heavy "processing" is cached for you (A internally stores its QR decomposition).

2) Perhaps you can reduce the number of BigDecimal calls. When you're doing matrix multiplication, do the matrices involved have a lot of zeros? If they do, then implementing multiplication code in one or more of the MatrixStore subclasses may speed things up.

Have to leave the office now - live in a different time zone,

/Anders

On 2004-04-21, at 17.06, Inger, Matthew wrote:

> Yes, that kind of pattern for factories is very commonly
> used. You could still have the implementation you supplied,
> which is a generic reflection-based strategy, but when you're
> dealing with thousands, or millions, of records, every little
> bit counts.
>
> 1. I'm processing 10,000 records
> 2. The X values are the same for every iteration
>
> Using this, I created a Solver class which caches the X matrix,
> along with its transpose, and a couple of other computed matrices
> (such as Xt * X) which do not involve the Y values. And I'm also
> caching the unit matrix.
>
> So each iteration through the loop, only the Y values have to be set,
> and I've minimized the number of matrix operations required to compute
> the coefficients.
>
> After running this, I trimmed run time from 8s to 5s, and noticed that
> 75% of the time is spent in the BigDecimal class, with about 1/3 of that
> spent in the constructor, and 1/3 in the "timesTenToThe" method. So
> there's really nothing further I can do to speed it up.
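For context, the regression being timed here can be written as a small self-contained sketch. This uses the closed-form least-squares solution with plain doubles rather than ojAlgo's BigMatrix, and caches the X-only sums across iterations the same way Matthew's Solver caches the X-derived matrices. The class name LinearFit is mine, not part of either library.

```java
// Closed-form least-squares fit for y = b0 + b1*x, mirroring the
// caching pattern described above: everything that depends only on
// the X values is computed once and reused for every row.
public class LinearFit {

    private final double[] x;
    private final int n;
    private final double sumX, sumXX;   // cached: depend only on X

    public LinearFit(double[] x) {
        this.x = x;
        this.n = x.length;
        double sx = 0.0, sxx = 0.0;
        for (double xi : x) {
            sx += xi;
            sxx += xi * xi;
        }
        this.sumX = sx;
        this.sumXX = sxx;
    }

    /** Returns {b0, b1} minimising the sum of squared residuals. */
    public double[] fit(double[] y) {
        double sumY = 0.0, sumXY = 0.0;
        for (int i = 0; i < n; i++) {
            sumY += y[i];
            sumXY += x[i] * y[i];
        }
        double b1 = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
        double b0 = (sumY - b1 * sumX) / n;
        return new double[] { b0, b1 };
    }

    public static void main(String[] args) {
        // months 1..4, values on the line y = 1 + 2x
        LinearFit fit = new LinearFit(new double[] { 1, 2, 3, 4 });
        double[] b = fit.fit(new double[] { 3, 5, 7, 9 });
        System.out.println("b0=" + b[0] + " b1=" + b[1]);
    }
}
```

Only the Y-dependent sums are recomputed per row, which is the same idea as caching Xt * X and calling A.solve(B) with a stored QR decomposition.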
>
> -----Original Message-----
> From: Anders Peterson [mailto:and...@op...]
> Sent: Wednesday, April 21, 2004 11:01 AM
> To: Anders Peterson
> Cc: Inger, Matthew; 'oja...@li...'
> Subject: Re: [ojAlgo-user] Questions
>
>
> This is done! It's in CVS.
>
> Please let me know if it is faster.
>
> /Anders
>
> On 2004-04-21, at 11.41, Anders Peterson wrote:
>
>> In my previous mail it should read "It will speed things up..." ;-)
>>
>> On 2004-04-20, at 14.57, Inger, Matthew wrote:
>>
>>> The basic slowdown is that the constructors are not cached.
>>> Finding the constructor each time is what makes that factory
>>> slow.
>>
>> I understand! Do you know of any reason why MatrixFactory should not
>> cache the constructors?
>>
>>> If MatrixFactory was an interface, one could just call
>>> the appropriate constructors directly without reflection.
>>>
>>> public interface MatrixFactory {
>>>     public BasicMatrix create(double[][] values);
>>> }
>>>
>>> public class BigMatrixFactory implements MatrixFactory {
>>>     public BasicMatrix create(double[][] values) {
>>>         return new BigMatrix(values);
>>>     }
>>> }
>>>
>>> You could still have the original version, but it would
>>> be named something different, and implement this new interface.
>>
>> OK, I see what you mean. Is that a commonly used design pattern for
>> factories?
>>
>>> -----Original Message-----
>>> From: Anders Peterson [mailto:and...@op...]
>>> Sent: Tuesday, April 20, 2004 4:40 AM
>>> To: Inger, Matthew
>>> Cc: 'oja...@li...'
>>> Subject: Re: [ojAlgo-user] Questions
>>>
>>>
>>> On 2004-04-19, at 22.39, Inger, Matthew wrote:
>>>
>>>> FYI: Using MatrixFactory is also extremely slow.
>>>> If I directly instantiate JamaMatrix for 100,000
>>>> regressions, it takes ~2 seconds or so. If I use
>>>> MatrixFactory to instantiate, it goes up to 8 seconds.
>>>
>>> MatrixFactory uses reflection to call the appropriate constructor.
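The constructor caching Matthew asks about could look roughly like this. It is a sketch, not the actual ojAlgo MatrixFactory: the reflective Constructor lookup is done once per class and stored in a map, so repeated create() calls skip the expensive search. DemoMatrix is a hypothetical stand-in for a matrix class with a double[][] constructor.

```java
import java.lang.reflect.Constructor;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a reflection-based factory that caches constructors.
// The getConstructor() search runs once per class; after that,
// create() only pays for newInstance().
public class CachingFactory {

    private static final Map<Class<?>, Constructor<?>> CACHE =
            new ConcurrentHashMap<>();

    public static <T> T create(Class<T> type, double[][] values) {
        try {
            Constructor<?> ctor = CACHE.get(type);
            if (ctor == null) {
                // expensive reflective lookup - done once per class
                ctor = type.getConstructor(double[][].class);
                CACHE.put(type, ctor);
            }
            return type.cast(ctor.newInstance((Object) values));
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                    "No public double[][] constructor on " + type, e);
        }
    }

    /** Hypothetical stand-in for a matrix class (not ojAlgo's). */
    public static class DemoMatrix {
        final double[][] data;
        public DemoMatrix(double[][] data) { this.data = data; }
    }

    public static void main(String[] args) {
        DemoMatrix m = create(DemoMatrix.class, new double[][] { { 1, 2 }, { 3, 4 } });
        System.out.println(m.data[1][1]);
    }
}
```

This keeps the one-line-to-switch-implementations property of a reflective factory while removing the per-call lookup cost.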
>>> Using it centralises matrix creation and allows you to switch
>>> implementation by changing one line of code - the call to the
>>> MatrixFactory constructor. (Make sure you re-use the factory
>>> instances.)
>>>
>>> This will naturally take longer than calling the constructor directly,
>>> but I am surprised it is that much slower.
>>>
>>> If this is a problem, call the constructors directly. Perhaps you
>>> should implement your own static factory method.
>>>
>>>> I'm a little curious why MatrixFactory is not an interface,
>>>> with implementations per different type of Matrix.
>>>
>>> Interfaces can't specify constructors.
>>>
>>> /Anders
>>>
>>>> -----Original Message-----
>>>> From: Inger, Matthew
>>>> Sent: Monday, April 19, 2004 3:04 PM
>>>> To: 'Anders Peterson'; Inger, Matthew
>>>> Subject: RE: [ojAlgo-user] Questions
>>>>
>>>>
>>>> There are a small number of variables. In most cases, we
>>>> have a single-variable equation, and up to 24 points with
>>>> which to fit the linear regression.
>>>>
>>>> Then, we do that 10,000 times.
>>>>
>>>> Basically, each row in the database contains time-elapsed
>>>> data (typically one value per month). We use the # of the
>>>> month as the x values, and the data as the y value, and try
>>>> to fit a line to it to determine short-term trends.
>>>>
>>>> So for each row, we calculate 1 set of coefficients (b0 and b1,
>>>> aka intercept & slope), and make 1 or more predictions.
>>>>
>>>> However, there are potentially millions of rows in the database.
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Anders Peterson [mailto:and...@op...]
>>>> Sent: Monday, April 19, 2004 2:32 PM
>>>> To: Inger, Matthew
>>>> Subject: Re: [ojAlgo-user] Questions
>>>>
>>>>
>>>> This is just a naive first attempt...
>>>>
>>>> On 2004-04-19, at 20.06, Inger, Matthew wrote:
>>>>
>>>>> PS: As far as solving the equations, I basically have two
>>>>> matrices, X and Y. Any kind of code snippet would help here.
>>>>> Keep in mind, equations are of the form
>>>>>
>>>>> y = b0 + b1*x
>>>>
>>>> 1*b0 + x*b1 = y
>>>>
>>>> A*X = B
>>>>
>>>> where
>>>>
>>>> A:
>>>> 1 x1
>>>> 1 x2
>>>> 1 x3
>>>> ... ...
>>>>
>>>> B:
>>>> y1
>>>> y2
>>>> y3
>>>> ...
>>>>
>>>> X:
>>>> b0
>>>> b1
>>>>
>>>> X = A.solve(B);
>>>>
>>>> Are you doing this 10 000 times with a relatively small number of x-
>>>> and y-values, or do you have 10 000 x- and y-values for each
>>>> calculation?
>>>>
>>>> /Anders
>>>>
>>>>> -----Original Message-----
>>>>> From: Anders Peterson [mailto:and...@op...]
>>>>> Sent: Monday, April 19, 2004 2:02 PM
>>>>> To: Inger, Matthew
>>>>> Cc: 'oja...@li...'
>>>>> Subject: Re: [ojAlgo-user] Questions
>>>>>
>>>>>
>>>>> On 2004-04-19, at 18.33, Inger, Matthew wrote:
>>>>>
>>>>>> 1. Have any performance tests been run with BigMatrix?
>>>>>
>>>>> Not really...
>>>>>
>>>>>> I'm using BigMatrix to do a LinearRegression algorithm. I had
>>>>>> previously been using Jama directly, but our FD department
>>>>>> noticed a difference between the results given by Java and the
>>>>>> LINEST function in Excel. I have determined that the difference
>>>>>> is DEFINITELY due to rounding limitations in the Java double
>>>>>> primitive type.
>>>>>>
>>>>>> These issues go away when I use the ojAlgo library,
>>>>>
>>>>> That's great news to me - that is the reason I created ojAlgo!
>>>>>
>>>>>> however, the time required to do the calculations increases to
>>>>>> about 12-13x what it was using regular double calculations.
>>>>>>
>>>>>> Doing 10,000 linear regressions with Jama took about .781s
>>>>>> (according to JUnit), and with BigMatrix, took 10.345s.
>>>>>
>>>>> I wasn't aware the difference was that big. ;-)
>>>>>
>>>>>> Any idea why such a huge disparity?
>>>>>
>>>>> Try altering the scale. A larger scale means better results, but it
>>>>> takes longer.
>>>>> You can change a matrix's scale whenever you want - the
>>>>> new scale will be used from then on. Elements are not rounded
>>>>> unless you call enforceScale().
>>>>>
>>>>> Internally BigMatrix uses various decompositions for some of the
>>>>> more complex calculations. These decompositions inherit a scale
>>>>> from the parent matrix. It is however not the same - it's bigger.
>>>>> I believe the formula is 2 * (3 + max(9, matrixScale)), and this
>>>>> formula is evaluated when the decomposition is created. After the
>>>>> decomposition is created, its scale won't change even if you change
>>>>> the matrix's scale. The scale of a decomposition is never smaller
>>>>> than 24 (according to the above formula).
>>>>>
>>>>> I'm not happy with this, and have been meaning to change it.
>>>>> Decompositions should have the same scale as their parent matrix,
>>>>> and when it is changed, be updated "everywhere".
>>>>>
>>>>> Would you prefer that?
>>>>>
>>>>>> 2. Is there a way to use the solvers in ojAlgo to do the linear
>>>>>> regression? Or is that not something that is provided?
>>>>>
>>>>> How are you currently doing it?
>>>>>
>>>>> I imagine the best way would be to simply build an over-determined
>>>>> set of linear equations - Ax=B - and call A.solve(B). Internally
>>>>> BigMatrix would then use the QR decomposition to give a least-squares
>>>>> estimation of x. That would be my first attempt.
>>>>>
>>>>> You could use the QP solver to do the same thing, but I don't see
>>>>> how that would simplify or improve anything.
>>>>>
>>>>> What version of ojAlgo are you using - v1, v2 or CVS?
>>>>>
>>>>> I recommend working from CVS. Unfortunately I have moved things
>>>>> around a bit between versions...
>>>>>
>>>>> I don't have access to a profiling tool, but I know they can be
>>>>> very useful in finding bottlenecks.
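The scale trade-off Anders describes can be seen with java.math.BigDecimal directly (this is plain JDK code, not ojAlgo's internal scale handling): a larger scale keeps more digits of an inexact quotient like 1/3, but every subsequent operation then works on longer numbers and costs more.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Illustrates how the chosen scale bounds both precision and cost
// when dividing with BigDecimal.
public class ScaleDemo {

    /** 1/3 rounded to the given number of decimal places. */
    public static BigDecimal oneThird(int scale) {
        return BigDecimal.ONE.divide(
                BigDecimal.valueOf(3), scale, RoundingMode.HALF_EVEN);
    }

    public static void main(String[] args) {
        System.out.println(oneThird(9));    // 9 digits after the point
        System.out.println(oneThird(24));   // 24 digits - the minimum
                                            // decomposition scale mentioned above
    }
}
```

With doubles the equivalent rounding happens silently at ~15-17 significant digits, which is the source of the Excel/LINEST discrepancy discussed earlier in the thread.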
>>>>> If you can identify a problem, and/or suggest a specific
>>>>> improvement, I'll be happy to help you implement it.
>>>>>
>>>>> /Anders
>>>>>
>>>>>> ----------------------
>>>>>> Matthew Inger
>>>>>> Design Architect
>>>>>> Synygy, Inc
>>>>>> 610-664-7433 x7770
>>>>>>
>>>>>> _______________________________________________
>>>>>> ojAlgo-user mailing list
>>>>>> ojA...@li...
>>>>>> https://lists.sourceforge.net/lists/listinfo/ojalgo-user