From: Böszörményi Z. <zb...@pr...> - 2022-04-24 15:53:35
Hi,

this has been brewing for about 3 years, but I am happy to announce the first pre-release of OpenCReports, my take on re-implementing RLIB from scratch.

https://github.com/zboszor/OpenCReports
https://github.com/zboszor/OpenCReports/releases/tag/v0.1

I don't have an ETA for actually finishing it, though.

FYI, the name comes from the fact that it's written in C and developed in the open.

THIS PRE-RELEASE DOESN'T HAVE ANY OUTPUT DRIVER. AS SUCH, IT'S NOT USEFUL FOR END-USERS YET.

Having said that, it's quite full-featured in the data handling department.

I apologize in advance for the RLIB bashing, but I know quite a lot about its internals since I am its current maintainer.

OpenCReports started out as an adventure in Flex and Bison, mostly because RLIB parsed expressions with a home-grown parser that had quite a few bugs. For one, it was forgiving about syntax errors in corner cases, e.g. a missing closing parenthesis at the end of the expression string was accepted. OpenCReports, on the other hand, is not forgiving: it throws an error in this case, i.e. the expression result will be an error message.

The grammar code is quite bulletproof, in that it doesn't leak memory and doesn't have use-after-free bugs. In general, the code is always compiled with ASAN and UBSAN during development.

The grammar handles:

* Arithmetic operators, including the famous Facebook challenge about implicit multiplication. This means that the two expressions below are not the same. Controversial, but correct in academic environments:

    1/(1+1)(2+2) equals 1/8
    1/(1+1)*(2+2) equals 2

* Binary operators
* Logic operators
* Unary operators
* Function calls

One ambiguous operator is "^". By default, "x^y" means "x XOR y" (since I like C operators), but it can be switched to mean pow(x, y) for better compatibility with RLIB.

Expressions can be (and are) optimized after parsing. This is done to reduce the amount of work during dataset traversal.
Fully constant expressions, no matter how complex, are pre-computed by the optimizer.

There are four data types in OpenCReports: string, error, number and datetime.

Strings are UTF-8 through and through.

Errors are actually strings behind the scenes; they just contain an error message. But if they are used in other expressions, the error message and error type are propagated upward to the parent expression.

In RLIB, numbers were handled as fixed-point values stored in a 64-bit integer with 7 decimal digits: values were multiplied by 10 million and stored in the 64-bit representation. This had its drawbacks:

* The constant multiplication and division by 10 million always rounded down. In some cases, small percentages that added up to 100.0% on paper didn't add up to 100.0% in an RLIB report.
* Relatively small numbers could overflow the 64-bit integer when processed further, e.g. in variables.

In OpenCReports, on the other hand, numbers are handled by MPFR. The precision is selectable, but it's 256 bits by default. Since there is no constant adjustment for a fixed precision and there is always surplus precision, processing numbers doesn't suffer from the same bugs as in RLIB.

Using MPFR is certainly slower than 64-bit fixed-point storage, but RLIB doesn't have an expression optimizer, and the optimizer in OpenCReports already makes up for most of the speed loss. Being numerically correct is worth the change.

Datetime is four data types in one:

* datetime (timestamp) with valid date and time
* date
* time
* interval

Expressions may be "delayed", i.e. their result will show the final value of the expression in the dataset. This is also a feature of RLIB.

RLIB used separate functions to parse the different datetime variants. In OpenCReports, all of them are aliases of stodt(). There is also a separate interval() function to parse or create an interval value.

All values may be NULL.

Data traversal is done a little differently.
RLIB needs to go back one record in the dataset to detect breaks. Some data sources don't allow going backwards, but do allow restarting the dataset from the first row. Because of this, RLIB needed to cache all the rows regardless of the data source, be it PostgreSQL, MySQL or ODBC.

OpenCReports, on the other hand, separates the data source from the row traversal in a way that the dataset pointer never needs to go backward. OpenCReports caches the last 2 rows of the dataset, with one row of lookahead to detect the end. This allows OpenCReports to avoid extra caching of rows.

According to the original developers of RLIB, follower queries should work like this:

* 1:1 followers are laid out side by side (record by record) with the main query. The dataset lasts as long as the main query does: the 1:1 followers are either cut if they contain more rows, or their fields are empty (NULL) if they contain fewer rows than the main query.
* N:1 followers should work exactly like LEFT OUTER JOIN in SQL.

The RLIB implementation of N:1 follower queries is not correct and doesn't produce the same result as a LEFT OUTER JOIN. This is fixed in OpenCReports.

Breaks are implemented in OpenCReports.

All of the RLIB variable types (and more) are implemented in OpenCReports. In RLIB, variables are special entities. In OpenCReports, they reuse expression handling with a twist: recursive expressions were added specifically to implement variables. But recursive expressions (referencing "r.self") are an integral part of expression handling in OpenCReports and can be used in user expressions, too. In fact, it's on my TODO list to allow creating custom variables by specifying the base type, base expression, initial value, two intermediate expressions and the result expression.

OpenCReports supports all the basic variable types of RLIB: count, expression, sum, average, lowest and highest. Some of them have variants that don't ignore NULLs from the dataset: "countall" and "averageall".
For "countall" and "averageall", where NULLs are not ignored, all rows are counted and NULLs are replaced with 0 when averaging.

Variables may have a "resetonbreak" setting, like in RLIB.

Variables may also be "precalculated", like in RLIB. With a resetonbreak setting, they show the value of the last row in the break. Without resetonbreak, they show the value of the last row in the dataset.

The dataset is processed twice if there are delayed expressions or precalculated variables.

OpenCReports allows mixing delayed and non-delayed subexpressions and precalculated variables in the same expression. AFAIK, this was not possible in RLIB.

Almost all of the RLIB functions are implemented in OpenCReports. The two missing ones are format() and dtosf(). Many other functions supported by MPFR are also implemented.

The C API of OpenCReports is extensive. There are quite a few unit tests that exercise various aspects of the API.

There is initial documentation in SGML from which a PDF is generated during the build. It's far from complete and doesn't even cover the current state of the code.

The original XML DTD didn't cover everything that was possible in RLIB's report XML. I reconstructed it from the source code and extended it with the rest of what RLIB supports, plus some new additions. E.g. "delayed" and "precalculate" are now aliases in variables.

Currently, OpenCReports only handles the XML tags related to the report data processing described above. The output-related ones, i.e. <Output>, <Detail> and <NoData>, are not handled.

There is one extension to the RLIB DTD. If the report XML's top node is <OpenCReport>, then further XML nodes are available: <Datasources> and <Queries>. This will allow describing practically everything in XML with minimal programming.

An RLIB wrapper is on my TODO list. As I described above, OpenCReports isn't and won't be bug-for-bug compatible with RLIB.

Comments are welcome.

Best regards,
Zoltán Böszörményi