From: Walter Landry <landry@ph...>  2000-05-29 23:57:33

Hello everyone,

A few months ago, I posted in blitz-support a problem that I was having with optimizing TinyVectors. Well, I have a solution, but it implies some things about how blitz is designed, so I am posting to blitz-dev.

The original problem was a program like

  #include <iostream.h>
  #include <blitz/tinyvec.h>

  using namespace blitz;

  int main()
  {
    TinyVector<double,3> x(0,1,2), n(2,3,4), y(3,4,5);
    for(int i=1;i<1000000000;i++)
      {
        y=x+n;
        x=y-n;
        n=(y-x)*(y-x)*(y-x)/(n*n);
      }
    cout << y << endl;
  }

I also had a hand-optimized version

  #include <iostream.h>

  int main()
  {
    double x[3], y[3], n[3];
    x[0]=0; x[1]=1; x[2]=2;
    n[0]=2; n[1]=3; n[2]=4;
    y[0]=3; y[1]=4; y[2]=5;
    for(int i=1;i<1000000000;i++)
      {
        y[0]=(x[0]+n[0]);
        y[1]=(x[1]+n[1]);
        y[2]=(x[2]+n[2]);
        x[0]=(y[0]-n[0]);
        x[1]=(y[1]-n[1]);
        x[2]=(y[2]-n[2]);
        n[0]=(y[0]-x[0])*(y[0]-x[0])*(y[0]-x[0])/(n[0]*n[0]);
        n[1]=(y[1]-x[1])*(y[1]-x[1])*(y[1]-x[1])/(n[1]*n[1]);
        n[2]=(y[2]-x[2])*(y[2]-x[2])*(y[2]-x[2])/(n[2]*n[2]);
      }
    cout << y[0] << " " << y[1] << " " << y[2] << endl;
  }

I found that once the expressions got too complicated, the compiler (KCC) couldn't optimize the expression anymore. Since KCC seemed to be the best C++ compiler for this, I thought I was hosed. In any case, to figure out what was going on, I wrote my own small vector class:

  class Tensor1
  {
    double data0, data1, data2;
  public:
    Tensor1(double d0, double d1, double d2): data0(d0), data1(d1), data2(d2) {}
    Tensor1() {}
    double & operator()(const int N)
    {
      return N==0 ? data0 : (N==1 ? data1 : data2);
    }
    double operator()(const int N) const
    {
      return N==0 ? data0 : (N==1 ? data1 : data2);
    }
    template<char i>
    Tensor1_Expr<Tensor1,i> operator()(const Index<i> index)
    {
      return Tensor1_Expr<Tensor1,i>(*this);
    }
  };

  /* A wrapper class to add the two Tensor1's */

  template<class A, class B, char i>
  class Tensor1_plus_Tensor1
  {
    const Tensor1_Expr<A,i> iterA;
    const Tensor1_Expr<B,i> iterB;
  public:
    double operator()(const int N) const
    {
      return iterA(N)+iterB(N);
    }
    Tensor1_plus_Tensor1(const Tensor1_Expr<A,i> &a,
                         const Tensor1_Expr<B,i> &b):
      iterA(a), iterB(b) {}
  };

  template<class A, class B, char i>
  inline Tensor1_Expr<const Tensor1_plus_Tensor1<const Tensor1_Expr<A,i>,
                                                 const Tensor1_Expr<B,i>,i>,i>
  operator+(const Tensor1_Expr<A,i> &a, const Tensor1_Expr<B,i> &b)
  {
    typedef const Tensor1_plus_Tensor1<const Tensor1_Expr<A,i>,
                                       const Tensor1_Expr<B,i>,i> TensorExpr;
    return Tensor1_Expr<TensorExpr,i>(TensorExpr(a,b));
  }

(Similar wrapper classes for subtraction, multiplication, and division. I've omitted them to save space.)

  template <class A, char i>
  class Tensor1_Expr
  {
    A iter;
  public:
    Tensor1_Expr(A &a): iter(a) {}
    double operator()(const int N) const
    {
      return iter(N);
    }
  };

  /* A Tensor1 Expression wrapper class. */

  template<char i>
  class Tensor1_Expr<Tensor1, i>
  {
    Tensor1 &iter;
  public:
    Tensor1_Expr(Tensor1 &a): iter(a) {}
    double & operator()(const int N)
    {
      return iter(N);
    }
    double operator()(const int N) const
    {
      return iter(N);
    }
    template<class B>
    const Tensor1_Expr<Tensor1,i> & operator=(const Tensor1_Expr<B,i> &result)
    {
      iter(0)=result(0);
      iter(1)=result(1);
      iter(2)=result(2);
      return *this;
    }
    const Tensor1_Expr<Tensor1,i> &
    operator=(const Tensor1_Expr<Tensor1,i> &result)
    {
      return operator=<Tensor1>(result);
    }
  };

  /* Finally, the actual program. */

  int main()
  {
    Tensor1 y(0,1,2);
    Tensor1 x(2,3,4);
    Tensor1 n(5,6,7);
    const Index<'i'> i;
    for(int j=0;j<100000;j++)
      {
        y(i)=x(i)+n(i);
        x(i)=y(i)-n(i);
        n(i)=(y(i)-x(i))*(y(i)-x(i))/(n(i));
        // n(i)=(y(i)-x(i))*(y(i)-x(i))*(y(i)-x(i))*(y(i)-x(i))*(y(i)-x(i))*(y(i)-x(i))*(y(i)-x(i))*(y(i)-x(i))*(y(i)-x(i))*(y(i)-x(i))*(y(i)-x(i))*(y(i)-x(i))*(y(i)-x(i))*(y(i)-x(i))*(y(i)-x(i))*(y(i)-x(i))/(n(i)*n(i)*n(i)*n(i)*n(i)*n(i)*n(i)*n(i)*n(i)*n(i)*n(i)*n(i)*n(i)*n(i)*n(i));
      }
    cout << y(0) << " " << y(1) << " " << y(2) << endl;
  }

I compiled it on an Origin2000 with

  KCC little.C -o little -ITensor1_Expr +K3
      --inline_auto_space_time=750000000000
      --inline_implicit_space_time=200000000000
      --inline_generated_space_time=40000000000.0
      --inline_auto_space_time=100000000000.0
      --no_exceptions

(the --no_exceptions option is very important) and it was able to optimize even very complicated expressions. I stopped at 19 (y-x)/n terms because I figured my expressions couldn't get any worse than that.

This seems to imply that there is something in TinyVectors that makes them more complicated than they should be. The only things that spring out at me are that TinyVectors take their length and type as template parameters, and that the various operators (e.g. operator+(TinyVector d1, TinyVector d2)) construct template expressions using d1.begin() and d2.begin(), instead of just copying the whole thing as in my example. I don't know whether any of these things is what is keeping TinyVectors from being properly optimized, but there is definitely a problem.

In any case, I know that my method works, because I've written some template expression code for tensors. I was party to a discussion about this a year or so ago on blitz-dev, and now I've finally gotten around to doing it. Its biggest claim to fame is that it handles implicit summation.
Thus you can write

  A(i,j) = B(i,k)*C(k,j)

instead of having to write

  A = sum(B(i,k)*C(k,j),k)

Also, the result is strongly typed by the indices, so you can't write

  A(i,k) = B(i,k)*C(k,j)

or even

  A = B(i,k)*C(k,j)

It has Tensor0, Tensor1, Tensor2, Tensor2_symmetric, Tensor3_dg (symmetric on the first two indices), Tensor3_christof (symmetric on the last two indices), Tensor4_ddg (symmetric on the first two, and last two, indices), and Tensor4_Riemann (antisymmetric on the first two, and last two, indices, and symmetric under cyclic permutation of the last three indices). I wrote this for a General Relativity code, so that is why I implemented this particular choice of tensors.

It can also handle pointers to doubles. So you could write

  double a0[10000], a1[10000], a2[10000],
         b0[10000], b1[10000], b2[10000],
         c0[10000], c1[10000], c2[10000];
  Tensor1 A(a0,a1,a2), B(b0,b1,b2), C(c0,c1,c2);
  Index<'i'> i;
  for(int a=0;a<10000;a++)
    {
      A(i)=B(i)+C(i);
      ++A;
      ++B;
      ++C;
    }

This is a different approach from the Blitz way of doing things. Blitz optimizes one expression at a time. So, for example, if you want to invert a symmetric 3x3 matrix, you can write it like

  det= a(0,0)*a(1,1)*a(2,2) + a(1,0)*a(2,1)*a(0,2) + a(2,0)*a(0,1)*a(1,2)
     - a(0,0)*a(2,1)*a(1,2) - a(1,0)*a(0,1)*a(2,2) - a(2,0)*a(1,1)*a(0,2);
  inverse(0,0)= (a(1,1)*a(2,2) - a(1,2)*a(1,2))/det;
  inverse(0,1)= (a(0,2)*a(1,2) - a(0,1)*a(2,2))/det;
  inverse(0,2)= (a(0,1)*a(1,2) - a(0,2)*a(1,1))/det;
  inverse(1,1)= (a(0,0)*a(2,2) - a(0,2)*a(0,2))/det;
  inverse(1,2)= (a(0,2)*a(0,1) - a(0,0)*a(1,2))/det;
  inverse(2,2)= (a(1,1)*a(0,0) - a(1,0)*a(1,0))/det;

However, det is just going to be thrown away at the end. We don't need to store it for all (10000 or whatever) points. We just need to compute it for one point, use it in six expressions, and forget it. The blitz method makes you ship the memory for det in and out of the cache six times. A better way to do this is to put the whole inversion into one loop. I've seen a factor of 4 improvement doing it this way.
The disadvantages, which are all too real, are that you have to manually start the loop, and you have to remember to increment the variables. In the case of the inversion, it ends up looking like

  double det;
  for(int i=0;i<10000;i++)
    {
      det= a(0,0)*a(1,1)*a(2,2) + a(1,0)*a(2,1)*a(0,2) + a(2,0)*a(0,1)*a(1,2)
         - a(0,0)*a(2,1)*a(1,2) - a(1,0)*a(0,1)*a(2,2) - a(2,0)*a(1,1)*a(0,2);
      inverse(0,0)= (a(1,1)*a(2,2) - a(1,2)*a(1,2))/det;
      inverse(0,1)= (a(0,2)*a(1,2) - a(0,1)*a(2,2))/det;
      inverse(0,2)= (a(0,1)*a(1,2) - a(0,2)*a(1,1))/det;
      inverse(1,1)= (a(0,0)*a(2,2) - a(0,2)*a(0,2))/det;
      inverse(1,2)= (a(0,2)*a(0,1) - a(0,0)*a(1,2))/det;
      inverse(2,2)= (a(1,1)*a(0,0) - a(1,0)*a(1,0))/det;
      ++a;
      ++inverse;
    }

Forgetting to put in the ++ operators could result in subtle bugs. You could also have problems if you put in more than one loop:

  for(int i=0;i<10000;i++)
    {
      a(i,j)=...
      ++a;
    }
  for(int i=0;i<10000;i++)
    {
      a(i,j)+=...
      ++a;
    }

This will end up writing off the end of a.

Furthermore, I use the restrict keyword, so you might get some weird problems if you try to alias things. You might want to #define the restrict away. I found that it actually decreased performance for extremely complicated expressions.

Basically, you're giving up some expressive power. Also, if your compiler can't optimize the expressions, then you end up with something MUCH slower than blitz's way.

It can handle quite complex expressions. As a real-life example:

  K_new(i,j)=Lapse*(R(i,j) + Trace_K*K(i,j) - (2*K(i,k)^K_mix(k,j))
                    - 0.5*matter_ADM*g(i,j) - S_ADM(i,j))
             + Shift_up(k)*dK(i,j,k)
             + (K(i,k)*dShift_up(k,j) - K(j,k)*dShift_up(k,i))
             - ddLapse(i,j) + (dLapse(k)*christof(k,i,j));

K_new is symmetric, the ^ operator means "contract to make a Tensor2_symmetric", and the - in the dShift_up term means "add to make a Tensor2_symmetric" (it is not a symmetrizer, so it doesn't divide by 2). I had to use these operators (instead of * and +) to keep the compilers from making a Tensor2.
You can't assign a Tensor2 to a Tensor2_symmetric, so you have to explicitly request the symmetrized result. KCC was able to optimize the entire expression.

I don't know if any other compilers can optimize these things well. gcc doesn't (though that isn't surprising), and I couldn't get SGI's CC compiler to do it either. xlC can't compile it (and though I might be able to make xlC compile it, I don't think it would be worth it), and I haven't tried The Portland Group's compiler. I don't have access to anything else.

The other problems are that it can't handle anything but tensors of doubles, and it only works in 3 dimensions (so people wanting 2, 4, 5, etc. dimensions have to write their own). Also, not all possible operations are supported. For example, you can't write

  A(i,j)=B(j,i)

but why would you want to do that? ;) If you did want to do that, it is not hard to add it in. A somewhat more useful extension might be antisymmetric rank 2 tensors. I have a way of doing that, but it isn't particularly nice. I won't describe it here, except to say that it is unsafe.

In any case, I've put up a gzipped tar ball at

  http://www.physics.utah.edu/~landry/FTensor.tar.gz

It has a small sample program, little.C. If you want a more complicated use of it, you can download neutron_star.tar.gz and look at step.C, hydro_step.C, update_data.C, and the files that they include. They use the functions d() and dd() to take first and second derivatives, returning the appropriate derivatives. To actually compile neutron_star.tar.gz, you need many more things. If you're interested, send me an email.

Walter Landry
landry@...

--
blitz-dev list
* To subscribe/unsubscribe: use the handy web form at
  http://oonumerics.org/blitz/lists.html