From: Böszörményi Z. <zb...@pr...> - 2022-04-24 16:03:28
On 2022-04-24 at 17:40, Böszörményi Zoltán via Rlib-users wrote:
> Hi,
>
> this has been brewing for about 3 years now but I am happy
> to announce the first pre-release of OpenCReports,
> my take on re-implementing RLIB from scratch.
>
> https://github.com/zboszor/OpenCReports
> https://github.com/zboszor/OpenCReports/releases/tag/v0.1
>
> I don't have any ETA for actually finishing it, though.
>
> FYI, the name comes from the fact that it's written in C
> and it's developed in the open.
>
> THIS PRE-RELEASE DOESN'T HAVE ANY OUTPUT DRIVER.
> AS SUCH, IT'S NOT USEFUL FOR END-USERS YET.
>
> Having said that, it's quite full-featured in the
> data handling department.
>
> I apologize in advance for the RLIB bashing, but I know
> quite a lot about its internals since I am its current
> maintainer.
>
> OpenCReports started out as an adventure in Flex and
> Bison, mostly because expressions in RLIB used a home-grown
> parser and it had quite a few bugs. For one, it was
> forgiving about syntax errors in corner cases.
> E.g. a missing closing parenthesis at the end of the
> expression string was allowed.
>
> On the other hand, OpenCReports is not forgiving.
> It throws an error in this case, i.e. the expression
> result will be an error message.
>
> The grammar code is quite bulletproof, as in it doesn't
> leak memory and doesn't have use-after-free bugs.
> In general, the code is always compiled with ASAN and
> UBSAN during development.
>
> The grammar handles:
> * Arithmetic operators, including the famous Facebook
>   challenge about implicit multiplication.
>   This means that the two expressions below are not the same.
>   Controversial, but correct in academic environments.
>
>     1/(1+1)(2+2) equals 1/8
>     1/(1+1)*(2+2) equals 2
>
> * Binary operators
> * Logic operators
> * Unary operators
> * Function calls
>
> One ambiguous operator is "^". By default "x^y" is
> "x XOR y" (since I like C operators) but it can be switched
> to mean pow(x, y) to be more compatible with RLIB.
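
To make the implicit multiplication rule quoted above concrete, here is a minimal sketch of a recursive descent evaluator. It is written in Python for brevity and is not OpenCReports' Flex/Bison grammar. The names `tokenize` and `evaluate` are made up for this illustration, and the sketch assumes that juxtaposition such as `(1+1)(2+2)` binds tighter than `*` and `/`, which is what the two quoted results imply. It also rejects a missing closing parenthesis instead of tolerating it.

```python
from fractions import Fraction
import re

# Hypothetical toy grammar: integers, + - * /, parentheses, juxtaposition.
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text):
    """Split the expression into number, operator and parenthesis tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"syntax error at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def evaluate(text):
    """Evaluate the expression exactly, returning a Fraction."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def primary():
        tok = take()
        if tok is None:
            raise ValueError("unexpected end of expression")
        if tok == "(":
            value = additive()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        if tok.isdigit():
            return Fraction(int(tok))
        raise ValueError(f"unexpected token {tok!r}")

    def implicit():
        # Assumption: juxtaposition binds tighter than explicit * and /.
        value = primary()
        while peek() == "(":
            value *= primary()
        return value

    def multiplicative():
        value = implicit()
        while peek() in ("*", "/"):
            op = take()
            rhs = implicit()
            value = value * rhs if op == "*" else value / rhs
        return value

    def additive():
        value = multiplicative()
        while peek() in ("+", "-"):
            op = take()
            rhs = multiplicative()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = additive()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return result
```

With these rules, `evaluate("1/(1+1)(2+2)")` gives 1/8 and `evaluate("1/(1+1)*(2+2)")` gives 2, while `evaluate("(1+2")` raises `ValueError`.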
>
> Expressions can be (and are) optimized after parsing.
> This is done to reduce the amount of work during dataset
> traversal. Fully constant expressions, no matter how
> complex they are, are pre-computed by the optimizer.
>
> There are four data types in OpenCReports: string,
> error, number and datetime.
>
> Strings are UTF-8 through-and-through.
>
> Errors are actually strings behind the scenes, they just
> contain an error message. But if they are used in other
> expressions, the error message and error type are propagated
> upward to the parent expression.
>
> In RLIB, numbers were handled as fixed point values stored
> in a 64-bit integer with 7 decimal digits. Integers were
> multiplied by 10 million and stored in the 64-bit
> representation. It had its drawbacks:
> * The constant multiplication and division by 10 million
>   always rounded down. In some cases, adding small
>   percentages that added up to 100.0% on paper didn't add up
>   to 100.0% in an RLIB report.
> * Relatively small numbers could overflow the
>   64-bit integer if processed further, e.g. in variables.
>
> On the other hand, numbers are handled by MPFR in
> OpenCReports. The precision is selectable but by default
> it's 256 bits. Since there is no constant adjustment for
> the fixed precision and there is always surplus precision,
> processing numbers doesn't suffer from the same bugs as RLIB.
>
> Using MPFR may sound slower than using 64-bit storage
> and fixed precision (it certainly is), but OpenCReports has
> an expression optimizer, which RLIB lacks, and this already
> covers most of the speed loss. The fact that it is actually
> numerically correct is worth the change.
>
> Datetime is four data types in one:
> * datetime (timestamp) with valid date and time
> * date
> * time
> * interval
>
> RLIB separated parsing these into different functions.
> In OpenCReports, all of them are aliases to stodt().
>
> There is also a separate interval() function to parse or
> create an interval value.
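
Going back to the fixed-point rounding issue described above, here is a small sketch of how truncating at 7 decimal digits can make shares miss 100%. This is not RLIB's actual code; the function names are hypothetical. Python's `decimal` module stands in for MPFR purely as an analogy (MPFR is a binary floating-point library), and 77 decimal digits is used because it is roughly what 256 bits of precision provides.

```python
from decimal import Decimal, localcontext

SCALE = 10_000_000  # 7 decimal digits, as described for RLIB

def fixed_point_total(total, parts):
    """Split `total` into equal shares in scaled integers, then re-add them."""
    scaled_total = total * SCALE
    share = scaled_total // parts  # integer division always rounds down
    summed = share * parts
    return f"{summed // SCALE}.{summed % SCALE:07d}"

def surplus_precision_total(total, parts, digits=77):
    """Same computation with surplus precision, rounded only for display."""
    with localcontext() as ctx:
        ctx.prec = digits  # assumption: about 256 bits worth of digits
        share = Decimal(total) / Decimal(parts)
        summed = share * parts
        # Round to 7 decimals once, at the very end.
        return str(summed.quantize(Decimal("0.0000001")))
```

Splitting 100 into three equal shares, `fixed_point_total(100, 3)` returns `"99.9999999"`, while `surplus_precision_total(100, 3)` returns `"100.0000000"`. The intermediate value is not exact in either case; the difference is that the surplus digits absorb the error before the final rounding.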
>
> All values may be NULL.
>
> Expressions may be "delayed", i.e. their result will show
> the last value of the expression in the dataset. This is also
> a feature of RLIB.

As a clarification, I moved the above paragraph to the intended location and fixed the missing word so it reads "last value".

> Data traversal is done a little differently.
> E.g. RLIB needs to go back one record in the dataset to
> detect breaks. Some data sources don't allow going backwards
> but allow restarting the dataset from the first row.
> Because of this, RLIB needed to cache all the rows regardless
> of the data source, be it PostgreSQL, MySQL or ODBC.
>
> On the other hand, OpenCReports separated the datasource
> from the row traversal in a way that the dataset pointer
> doesn't need to go backward. OpenCReports caches the last 2 rows
> from the dataset with one row lookahead to detect the end.
> This allows OpenCReports to avoid extra caching of rows.
>
> According to the original developers of RLIB, the follower
> queries should work like this:
> * 1:1 followers are laid out side by side (record by record)
>   along with the main query. The dataset lasts while the
>   main query lasts; the 1:1 followers are either cut if they
>   contain more rows, or their fields are empty (NULL) if
>   they contain fewer rows than the main query.
> * N:1 followers should work exactly like LEFT OUTER JOIN in SQL
>
> The RLIB implementation of N:1 follower queries is not correct
> and doesn't produce the same result as a LEFT OUTER JOIN.
> It's fixed in OpenCReports.
>
> Breaks are implemented in OpenCReports.
>
> All of the RLIB variable types (and more) are implemented
> in OpenCReports.
>
> In RLIB, variables are special entities.
>
> In OpenCReports, they reuse expression handling with a
> twist: recursive expressions were added exactly for
> satisfying variables.
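
The forward-only traversal with one row of lookahead described above can be sketched roughly as follows. This is an illustration in Python, not OpenCReports' C implementation, and the message does not show how the row cache is actually laid out. The names `traverse` and `break_starts` are hypothetical. The sketch holds on to the previous row, the current row and one lookahead row, and never asks the source to step backward.

```python
def traverse(rows):
    """Yield (previous, current, is_last) from a forward-only iterator."""
    it = iter(rows)
    try:
        lookahead = next(it)
    except StopIteration:
        return  # empty dataset
    previous = None
    while True:
        current = lookahead
        try:
            lookahead = next(it)  # one row of lookahead detects the end
            is_last = False
        except StopIteration:
            is_last = True
        yield previous, current, is_last
        if is_last:
            return
        previous = current

def break_starts(rows, key):
    """Return the rows at which the break field changes value."""
    starts = []
    for previous, current, _ in traverse(rows):
        # A break starts on the first row and whenever the key changes.
        if previous is None or previous[key] != current[key]:
            starts.append(current)
    return starts
```

Because only the previous row is compared with the current one, break detection works on any source that can be read once from start to end, and the lookahead row tells the caller when the current row is the last one.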
>
> But recursive expressions (referencing "r.self") are an
> integral part of expression handling in OpenCReports and
> can be used by user expressions. In fact, it's on my TODO
> list to allow creating custom variables by specifying
> the base type, base expression, initial value, two
> intermediate expressions and the result expression.
>
> OpenCReports supports all the basic variable types of RLIB:
> count, expression, sum, average, lowest and highest.
>
> There are also variable variants that don't ignore NULLs
> from the dataset: "countall" and "averageall".
> When NULLs are not ignored, rows are counted and NULLs are
> replaced with 0 when averaging.
>
> Variables may have a "resetonbreak" setting, like in RLIB.
>
> Variables may also be "precalculated", like in RLIB.
> If they have a resetonbreak setting, they will show the value
> of the last row in the break. Without resetonbreak, they will
> show the value of the last row in the dataset.
>
> The dataset is processed twice if there are delayed
> expressions or precalculated variables.
>
> OpenCReports allows mixing delayed, non-delayed subexpressions
> and precalculated variables in the same expression.
> AFAIK, this was not possible in RLIB.
>
> Almost all of the RLIB functions are implemented in
> OpenCReports. The two missing ones are format() and dtosf().
> Many other functions supported by MPFR are also implemented.
>
> The C API of OpenCReports is extensive.
> There are quite a few unit tests that exercise
> certain aspects of the API.
>
> There is initial documentation in SGML from which
> a PDF is generated during the build. It's far from
> complete and it doesn't even cover the current state of
> the code.
>
> The original XML DTD didn't cover everything that was
> possible with RLIB's report XML. I reconstructed it from
> the source code and extended it with everything RLIB
> actually supports plus some new additions. E.g. "delayed" and
> "precalculate" are now aliases in variables.
>
> Currently, OpenCReports only handles the XML tags related
> to the report data processing described above. The output
> related ones, i.e. <Output>, <Detail>, <NoData> are not
> handled.
>
> There is one extension to the RLIB DTD. If the report XML's
> top node is <OpenCReport> then further XML nodes are available:
> <Datasources> and <Queries>. This will allow describing
> practically everything in XML with minimal programming.
>
> An RLIB wrapper is on my TODO list.
>
> As I described above, OpenCReports isn't and won't be
> bug-for-bug compatible with RLIB.
>
> Comments are welcome.
>
> Best regards,
> Zoltán Böszörményi
>
>
> _______________________________________________
> Rlib-users mailing list
> Rli...@li...
> https://lists.sourceforge.net/lists/listinfo/rlib-users