From: Bob F <citibob@co...>  2008-04-17 15:13:39

Hello,

It looks like there's some interest in this, so I wanted to say a bit more about where I'm coming from and what I've found so far.

The good thing about MATLAB is that it's really easy to start messing around with matrices and try things out. This interactivity is really good, especially when you're exploring a dataset and you don't know what you'll find. In particular, MATLAB has some very convenient syntax, the importance of which should NOT be overlooked. Good syntax makes things easy in ways they would not be with bad syntax. MATLAB also has a relatively well-constructed set of numerical libraries.

The bad things about MATLAB are legion:

1. MATLAB has no static type checking and is therefore not suitable for large, complex projects.

2. MATLAB is missing many other features, which also makes it ill-suited for large software projects.

3. The MATLAB language is slow, and there's no easy way to speed it up. Sure, the matrix operations are lightning fast, but a wimpy for loop can be deathly slow. This distorts the way the programmer writes code.

With this in mind, the obvious thing is to prototype your numerical routines in MATLAB, then insert those routines into a larger framework in a language more suitable for software development (such as Java or C# or something). But this also does not work because:

4. You cannot embed a MATLAB subroutine in anything else (at least not easily). MathWorks has a "way" to do this, but it involves inter-process communication to a MATLAB process that is inherently single-threaded. Not the basis for a fast, robust system in my view.

What I WANT is a MATLAB-like system with the following properties:

1. Retains the good features of MATLAB. Easy to prototype stuff in, responsive on the command line.

2. Once a subroutine is developed, it should easily and naturally integrate into a larger project, preferably a Java project.

Some thoughts on what's been put out there so far:

1. As far as I know, no existing language is "best" for this task; they all fall short in terms of specialized matrix syntax. I've not looked closely at SciPy; it might do better. But from what I've seen so far, its syntax is still significantly more encumbered than MATLAB's.

2. It would be death to build a nice system like this and use a substandard matrix library. The matrix library one uses defines the rest of the system, since the matrix is such a basic data type. After trying them all, I've settled on MTJ (Matrix Toolkit for Java) as my base matrix library. It is based on the LAPACK/BLAS subroutines. You can use either LAPACK/BLAS converted to Java, or a native version (called through JNI). MTJ avoids many mistakes that I've seen so often in Java matrix code:

   a) MTJ does not try to do generic matrices. One can certainly write a matrix library that does matrices of ints or fields or something "just" as easily as doubles. But it will be too slow to use. MTJ supports matrices of doubles, and it does so at full speed.

   b) MTJ does not use arrays of arrays. MTJ uses FORTRAN-style column-major matrices. Some packages use row-major instead; I can live with that. But arrays of arrays are not really acceptable as a matrix representation for numerics, and they don't allow for many of the "tricks" that are needed either.

   c) MTJ does not build a Complex class, then tell us that we can have complex matrices by building a matrix of Complex instances. That would be at least an order of magnitude slower than traditional FORTRAN-style complexes, as well as using 3 times as much memory. Supporting complex in Java is tricky. But at the base of it all, complex matrices basically need to be 2D arrays with twice as many elements in them as similar double matrices.

3. When I looked at Groovy, it seemed to be heavily oriented toward non-numerical, highly generic, highly runtime-typed computation. I was not comfortable with it as a platform for high-efficiency numerical computation.

4. Whatever is done, I need it to integrate seamlessly with Java. Groovy's approach of compiling code at runtime is very good in this respect, and it allows Java's JIT to get into the action and really make things fast. This will be a good approach in the end, since it will minimize the speed differential between Java code and our MATLAB-like code (and thereby avoid distorting the way the programmer produces code).

BeanShell is known to be pretty slow in general, so that's a big reason NOT to use it for this purpose; one would start building up some routines and then discover that the interpreter is not fast enough to run them. Then you're kind of stuck.

The SciPy approach looks promising; I would want to see that it integrates well with Java when running under Jython. However, I still have the reservation that it might be too slow in the end for more complex subroutines.

Hope that helps!

On a slightly different note: I was very excited when I first learned about Java scripting languages. They seemed like an obvious good way to easily add extensibility to my applications. But more and more, I've ended up not using them, opting for extensibility via plain old compiled Java instead. The reasons for this are:

1. Plain old Java integrates the most seamlessly, and it operates with a well-defined security model.

2. Plain old Java is very well supported, with millions of dollars of ongoing investment.

3. NetBeans offers a lot of syntax aids for writing plain old Java. It always helps you look up which methods you can call, what the arguments are, etc. It is a big boon. When I leave plain old Java, I cannot make use of these features any more.

Therefore, I've gotten good extensibility results so far by expecting that the person doing the extensions will write them in Java and load a Jar file. Every project is different, of course, but it's worked well for me so far.
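
To make the "write it in Java and load a Jar file" idea concrete, here is a minimal sketch of one way it could be wired up using the standard java.util.ServiceLoader and URLClassLoader. The Extension interface and the JarExtensions class are hypothetical names invented for this illustration, not something from an existing project, and the sketch assumes the extension Jar lists its implementations in a META-INF/services entry.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class JarExtensions {

    /** Hypothetical extension point; a real application would define its own. */
    public interface Extension {
        String name();
        double[] apply(double[] input);
    }

    /**
     * Loads every Extension implementation that the given Jar declares in
     * META-INF/services/JarExtensions$Extension (one class name per line).
     * A Jar with no such entry, or a missing file, simply yields an empty list.
     */
    public static List<Extension> load(Path jar) {
        URL url;
        try {
            url = jar.toUri().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a usable Jar path: " + jar, e);
        }
        // The loader is deliberately left open: extension classes may still be
        // loaded lazily from the Jar after this method returns.
        URLClassLoader loader =
                new URLClassLoader(new URL[] { url }, JarExtensions.class.getClassLoader());
        List<Extension> found = new ArrayList<>();
        for (Extension extension : ServiceLoader.load(Extension.class, loader)) {
            found.add(extension);
        }
        return found;
    }
}
```

The person writing an extension then only needs the interface on the compile classpath, which keeps all the IDE support mentioned above.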
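
Going back to points 2(b) and 2(c) about matrix storage: the following is a rough sketch of what FORTRAN-style column-major storage and interleaved complex storage look like in plain Java. This is NOT MTJ's API; the class and method names are made up for illustration, and the only thing it is meant to show is the memory layout.

```java
public class ColumnMajorSketch {

    /** Real matrix in one flat array, stored column by column (FORTRAN style). */
    public static final class DenseMatrix {
        public final int rows;
        public final int cols;
        public final double[] data; // element (i, j) lives at data[i + j * rows]

        public DenseMatrix(int rows, int cols) {
            this.rows = rows;
            this.cols = cols;
            this.data = new double[rows * cols];
        }

        public double get(int i, int j) { return data[i + j * rows]; }

        public void set(int i, int j, double v) { data[i + j * rows] = v; }
    }

    /** Complex matrix: same layout, but each element is an adjacent (re, im) pair. */
    public static final class ComplexMatrix {
        public final int rows;
        public final int cols;
        public final double[] data; // 2 * rows * cols doubles, no per-element objects

        public ComplexMatrix(int rows, int cols) {
            this.rows = rows;
            this.cols = cols;
            this.data = new double[2 * rows * cols];
        }

        private int offset(int i, int j) { return 2 * (i + j * rows); }

        public double re(int i, int j) { return data[offset(i, j)]; }

        public double im(int i, int j) { return data[offset(i, j) + 1]; }

        public void set(int i, int j, double re, double im) {
            data[offset(i, j)] = re;
            data[offset(i, j) + 1] = im;
        }
    }

    /** y = A * x, walking one column at a time so the inner loop reads memory sequentially. */
    public static double[] multiply(DenseMatrix a, double[] x) {
        if (x.length != a.cols) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double[] y = new double[a.rows];
        for (int j = 0; j < a.cols; j++) {
            double xj = x[j];
            int base = j * a.rows; // start of column j in the flat array
            for (int i = 0; i < a.rows; i++) {
                y[i] += a.data[base + i] * xj;
            }
        }
        return y;
    }
}
```

A flat array in this layout is also what the LAPACK/BLAS routines expect, which is part of why arrays of arrays are a poor fit.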