Re: [Rlib-users] OpenCReports 0.1 released

On 2022-04-24 at 17:40, Böszörményi Zoltán via Rlib-users wrote:
> Hi,
> 
> this has been brewing for about 3 years now, but I am happy
> to announce the first pre-release of OpenCReports,
> my take on re-implementing RLIB from scratch.
> 
> https://github.com/zboszor/OpenCReports
> https://github.com/zboszor/OpenCReports/releases/tag/v0.1
> 
> I don't have any ETA for actually finishing it, though.
> 
> FYI, the name comes from the fact that it's written in C
> and it's developed in the open.
> 
> THIS PRE-RELEASE DOESN'T HAVE ANY OUTPUT DRIVER.
> AS SUCH, IT'S NOT USEFUL FOR END-USERS YET.
> 
> Having said that, it's quite full featured in the
> data handling department.
> 
> I apologize in advance about RLIB bashing, but I know
> quite a lot about its internals since I am its current
> maintainer.
> 
> OpenCReports started out as an adventure in Flex and
> Bison, mostly because expressions in RLIB used a home
> grown parser and it had quite some bugs. For one, it was
> forgiving about syntax errors in corner cases.
> E.g. a missing closing parenthesis at the end of the
> expression string was allowed.
> 
> On the other hand, OpenCReports is not forgiving.
> It throws an error in this case, i.e. the expression
> result will be an error message.
> 
> The grammar code is quite bulletproof, as in it doesn't
> leak memory and doesn't have use-after-free bugs.
> In general, the code is always compiled with ASAN and
> UBSAN during development.
> 
> The grammar handles:
> * Arithmetic operators, including the famous Facebook
>    challenge about implicit multiplication.
>    This means that these below are not the same.
>    Controversial, but correct in academic environments.
> 
>    1/(1+1)(2+2) equals 1/8
>    1/(1+1)*(2+2) equals 2
> 
> * Binary operators
> * Logic operators
> * Unary operators
> * Function calls
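
To illustrate the implicit multiplication rule, here is a minimal Python sketch of a recursive descent parser in which juxtaposition binds tighter than "*" and "/". It is not the Flex/Bison grammar of OpenCReports; the Parser class and its method names are made up for this example, and it only handles non-negative integers, the four arithmetic operators and parentheses. Like the real grammar, it refuses a missing closing parenthesis.

```python
from fractions import Fraction
import re

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split the expression into number and operator tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """Hypothetical parser: implicit multiplication has its own level."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse(self):
        value = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return value

    def expr(self):  # lowest precedence: + and -
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):  # explicit * and /
        value = self.implicit()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.implicit()
            else:
                value /= self.implicit()
        return value

    def implicit(self):  # juxtaposition such as (1+1)(2+2)
        value = self.atom()
        while self.peek() == "(":
            value *= self.atom()
        return value

    def atom(self):
        token = self.take()
        if token == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("missing closing parenthesis")
            return value
        if token is not None and token.isdigit():
            return Fraction(token)
        raise SyntaxError(f"unexpected token {token!r}")
```

With this layering, Parser("1/(1+1)(2+2)").parse() divides 1 by the whole implicit product, while Parser("1/(1+1)*(2+2)").parse() is evaluated left to right.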
> 
> One ambiguous operator is "^". By default "x^y" is
> "x XOR y" (since I like C operators) but it's selectable
> to be pow(x, y) to be more compatible with RLIB.
> 
> Expressions can be (and are) optimized after parsing.
> This is done to reduce the amount of work during dataset
> traversal. Fully constant expressions, no matter how
> complex they are, are pre-computed by the optimizer.
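
As an illustration of pre-computing constant subexpressions, here is a small Python sketch of constant folding over an expression tree. The node classes (Const, Field, BinOp), the fold() function and the "q.price" field name are assumptions made for this example and do not mirror the internal data structures of OpenCReports.

```python
from dataclasses import dataclass
import operator


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Field:
    name: str  # value is only known while traversing the dataset


@dataclass(frozen=True)
class BinOp:
    op: str
    left: object
    right: object


OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(node):
    """Return an equivalent tree with constant subtrees pre-computed."""
    if isinstance(node, BinOp):
        left, right = fold(node.left), fold(node.right)
        if isinstance(left, Const) and isinstance(right, Const):
            # Both operands are known now, so compute the result once.
            return Const(OPS[node.op](left.value, right.value))
        return BinOp(node.op, left, right)
    return node


# q.price + 2 * (3 + 4) becomes q.price + 14 before any row is read.
tree = BinOp("+", Field("q.price"),
             BinOp("*", Const(2), BinOp("+", Const(3), Const(4))))
folded = fold(tree)
```

The folded tree has a single constant where the original had a three-node subtree, so that part is not evaluated again for every row.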
> 
> There are four data types in OpenCReports: string,
> error, number and datetime.
> 
> Strings are UTF-8 through-and-through.
> 
> Errors are actually strings behind the scenes, they just
> contain an error message. But if they are used in other
> expressions, the error message and error type are propagated
> upward to the parent expression.
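
The propagation of errors to the parent expression can be pictured with the following Python sketch. The Error class, the lookup() and add() helpers and the "unknown field" message are hypothetical and only serve the illustration; they are not the OpenCReports API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Error:
    message: str  # an error is a string carrying a message


def lookup(row, name):
    """Return the field value, or an Error value if the field is unknown."""
    if name not in row:
        return Error(f"unknown field {name}")
    return row[name]


def add(a, b):
    """Add two operands; the first error operand becomes the result."""
    for operand in (a, b):
        if isinstance(operand, Error):
            return operand
    return a + b


row = {"a": 1, "b": 2}
ok = add(lookup(row, "a"), lookup(row, "b"))
# The error from the inner lookup surfaces as the result of the outer add.
failed = add(ok, add(lookup(row, "a"), lookup(row, "missing")))
```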
> 
> In RLIB, numbers were handled as fixed point values stored
> in a 64-bit integer with 7 decimal digits. Integers were
> multiplied by 10 million and stored in the 64-bit
> representation. It had its drawbacks:
> * The constant multiplication and division by 10 million
>    always rounded down. In some cases, adding small
>    percentages that added up to 100.0% on paper didn't add up
>    to 100.0% in an RLIB report.
> * Relatively small numbers could overflow the
>    64-bit integer if processed further, e.g. in variables.
> 
> On the other hand, numbers are handled by MPFR in
> OpenCReports. The precision is selectable but by default
> it's 256 bits. Since there is no constant adjustment for
> the fixed precision and there is always surplus precision,
> processing numbers doesn't suffer from the same bugs as RLIB.
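
The percentage problem can be reproduced with a few lines of Python. This is a sketch under assumptions: fixed_div() models the truncating fixed point division as described above and is not RLIB's code, and Python's decimal module with 77 significant digits (about 256 bits) stands in for MPFR.

```python
from decimal import Decimal, localcontext

SCALE = 10_000_000  # 7 decimal digits


def fixed_div(a, b):
    """Divide two scaled integers; integer division rounds down."""
    return a * SCALE // b


def fixed_percent_total(parts):
    """Split 100% into equal shares and add the shares up again."""
    share = fixed_div(100 * SCALE, parts * SCALE)
    return share * parts


def format_fixed(value):
    """Show a scaled integer with one decimal digit, truncating the rest."""
    tenths = value // (SCALE // 10)
    return f"{tenths // 10}.{tenths % 10}"


def precise_percent_total(parts):
    """Same computation with surplus precision, rounded only for display."""
    with localcontext() as ctx:
        ctx.prec = 77  # roughly 256 bits
        share = Decimal(100) / Decimal(parts)
        return str((share * parts).quantize(Decimal("0.1")))
```

For three equal shares, format_fixed(fixed_percent_total(3)) gives "99.9", while precise_percent_total(3) gives "100.0", because the rounding happens once at display time instead of at every operation.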
> 
> Using MPFR may sound slower than using 64-bit storage
> and fixed precision (it certainly is), but RLIB doesn't have
> an expression optimizer, and this already covers most of the
> speed loss. The fact that it is actually numerically correct
> is worth the change.
> 
> Datetime is four data types in one:
> * datetime (timestamp) with valid date and time
> * date
> * time
> * interval
> 
> RLIB separated parsing these into different functions.
> In OpenCReports, all of them are aliases to stodt().
> 
> There is also a separate interval() function to parse or
> create an interval value.
> 
> All values may be NULL.
> 
> Expressions may be "delayed", i.e. their result will show
> the last value of the expression in the dataset. This is also
> a feature of RLIB.

As a clarification, I moved the above paragraph to the
intended location and fixed the missing word so it reads
"last value".

> Data traversal is done a little differently.
> E.g. RLIB needs to go back one record in the dataset to
> detect breaks. Some data sources don't allow going backwards
> but allow restarting the dataset from the first row.
> Because of this, RLIB needed to cache all the rows regardless
> of the data source, be it PostgreSQL, MySQL or ODBC.
> 
> On the other hand, OpenCReports separated the datasource
> from the row traversal in a way that the dataset pointer
> doesn't need to go backward. OpenCReports caches the last 2 rows
> from the dataset with one row lookahead to detect the end.
> This allows OpenCReports to avoid extra caching of rows.
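
The following Python sketch shows one way a forward-only traversal can detect breaks and the end of the dataset while holding just two rows, the current one and the lookahead. It is an illustration under my own assumptions, not the actual traversal code; traverse() and the "dept" field are made-up names.

```python
def traverse(rows, break_field):
    """Yield (row, starts_break, ends_break, is_last), reading forward only."""
    source = iter(rows)
    current = next(source, None)
    starts_break = True  # the first row always starts a break range
    while current is not None:
        lookahead = next(source, None)  # one row ahead, never backward
        is_last = lookahead is None
        ends_break = is_last or lookahead[break_field] != current[break_field]
        yield current, starts_break, ends_break, is_last
        # The row after the end of a break range starts the next one.
        starts_break = ends_break
        current = lookahead


rows = [
    {"dept": "A", "n": 1},
    {"dept": "A", "n": 2},
    {"dept": "B", "n": 3},
]
flags = [(start, end, last) for _, start, end, last in traverse(iter(rows), "dept")]
```

Because the input is consumed through a plain iterator, the same function works for sources that cannot step backward.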
> 
> According to the original developers of RLIB, the follower
> queries should work like this:
> * 1:1 followers are laid out side by side (record by record)
>    along with the main query. The dataset lasts while the
>    main query lasts, the 1:1 followers are either cut if they
>    contain more rows, or their fields are empty (NULL) if
>    they contain fewer rows than the main query.
> * N:1 followers should work exactly like LEFT OUTER JOIN in SQL
> 
> The RLIB implementation of N:1 follower queries is not correct
> and doesn't produce the same result as a LEFT OUTER JOIN.
> It's fixed in OpenCReports.
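
The two follower semantics described above can be sketched in Python as follows. The function names, the matches callback and the field names are hypothetical; the sketch only mirrors the described behaviour and is not the OpenCReports implementation.

```python
def follow_1_to_1(main_rows, follower_rows, follower_fields):
    """Lay the follower side by side with the main query, row by row."""
    follower = iter(follower_rows)
    empty = {name: None for name in follower_fields}
    for row in main_rows:
        # Surplus follower rows are never read; missing ones become NULLs.
        yield {**row, **next(follower, empty)}


def follow_n_to_1(main_rows, follower_rows, follower_fields, matches):
    """Behave like LEFT OUTER JOIN: keep every main row at least once."""
    follower_rows = list(follower_rows)
    empty = {name: None for name in follower_fields}
    for row in main_rows:
        matched = [f for f in follower_rows if matches(row, f)]
        for follower_row in matched or [empty]:
            yield {**row, **follower_row}


main = [{"id": 1}, {"id": 2}, {"id": 3}]
side_by_side = list(follow_1_to_1(main, [{"x": "a"}, {"x": "b"}], ["x"]))
joined = list(follow_n_to_1(
    main,
    [{"fid": 1, "x": "a"}, {"fid": 1, "x": "b"}, {"fid": 3, "x": "c"}],
    ["fid", "x"],
    lambda m, f: m["id"] == f["fid"],
))
```

In the joined result, the main row with id 2 has no match and still appears once, with NULL follower fields.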
> 
> Breaks are implemented in OpenCReports.
> 
> All of the RLIB variable types (and more) are implemented
> in OpenCReports.
> 
> In RLIB, variables are special entities.
> 
> In OpenCReports, they reuse expression handling with a
> twist: recursive expressions were added exactly for
> satisfying variables.
> 
> But recursive expressions (referencing "r.self") are an
> integral part of expression handling in OpenCReports and
> can be used by user expressions. In fact, it's on my TODO
> list to allow creating custom variables by specifying
> the base type, base expression, initial value, two
> intermediate expressions and the result expression.
> 
> OpenCReports supports all the basic variable types of RLIB:
> count, expression, sum, average, lowest and highest.
> 
> There are some variable variants with or without ignoring
> NULLs from the dataset. These are: "countall" and "averageall".
> When NULLs are not ignored, rows are counted and NULLs are
> replaced with 0 when averaging.
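
The difference between the variants can be shown in plain Python. This sketch assumes that "countall" and "averageall" are the variants that do not ignore NULLs; the functions below are illustrations, not the OpenCReports implementation, and None stands for NULL.

```python
def count(values):
    """Count only the non-NULL values."""
    return sum(1 for v in values if v is not None)


def countall(values):
    """Count every row, NULL or not."""
    return len(values)


def average(values):
    """Average of the non-NULL values; NULL if there are none."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


def averageall(values):
    """Average over every row, with NULLs counted as 0."""
    if not values:
        return None
    return sum(0 if v is None else v for v in values) / len(values)


column = [10, None, 20]
```

For this column, count() is 2 and average() is 15.0, while countall() is 3 and averageall() is 10.0.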
> 
> Variables may have a "resetonbreak" setting, like in RLIB.
> 
> Variables may also be "precalculated", like in RLIB.
> If they have a resetonbreak setting, they will show the value
> of the last row in the break. Without resetonbreak, they will
> show the value of the last row in the dataset.
> 
> The dataset is processed twice if there are delayed
> expressions or precalculated variables.
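
Here is a Python sketch of why two passes are needed for a precalculated sum. The first pass computes the final value for each break range (or for the whole dataset without resetonbreak), the second pass shows that final value on every row. The function and field names are made up for this example, and the sketch assumes the dataset can be read twice.

```python
def break_ids(rows, break_field):
    """Number consecutive runs of equal break values: 0, 1, 2, ..."""
    group, previous = -1, object()  # sentinel differs from any value
    for row in rows:
        key = row[break_field] if break_field is not None else None
        if key != previous:
            group += 1
            previous = key
        yield group, row


def precalculated_sum(rows, value_field, break_field=None):
    """Return, for each row, the final sum of its break range."""
    rows = list(rows)
    finals = {}
    # Pass 1: accumulate until the end of each break range.
    for group, row in break_ids(rows, break_field):
        finals[group] = finals.get(group, 0) + row[value_field]
    # Pass 2: every row already sees the value of the last row.
    return [finals[group] for group, _ in break_ids(rows, break_field)]


rows = [
    {"dept": "A", "amount": 1},
    {"dept": "A", "amount": 2},
    {"dept": "B", "amount": 5},
    {"dept": "A", "amount": 4},
]
```

Note that the second run of "A" is a separate break range, so precalculated_sum(rows, "amount", "dept") yields 3, 3, 5, 4, and without a break field every row shows 12.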
> 
> OpenCReports allows mixing delayed, non-delayed subexpressions
> and precalculated variables in the same expression.
> AFAIK, this was not possible in RLIB.
> 
> Almost all of the RLIB functions are implemented in
> OpenCReports. The two missing ones are format() and dtosf().
> Many other functions supported by MPFR are also implemented.
> 
> The C API of OpenCReports is extensive.
> There are quite a few unit tests that exercise certain
> aspects of the API.
> 
> There is initial documentation in SGML from which
> a PDF is generated during the build. It's far from
> complete and it doesn't even cover the current state of
> the code.
> 
> The original XML DTD did not cover everything that was
> possible with RLIB's report XML. I reconstructed it from
> the source code and extended it with the ones supported
> by RLIB and with some new additions. E.g. "delayed" and
> "precalculate" are now aliases in variables.
> 
> Currently, OpenCReports only handles the XML tags related
> to report data processing described above. The output
> related ones, i.e. <Output>, <Detail>, <NoData> are not
> handled.
> 
> There is one extension to the RLIB DTD. If the report XML's
> top node is <OpenCReport> then further XML nodes are available:
> <Datasources> and <Queries>. This will allow describing
> practically everything in XML with minimal programming.
> 
> An RLIB wrapper is on my TODO list.
> 
> As I described above, OpenCReports isn't and won't be
> bug-for-bug compatible with RLIB.
> 
> Comments are welcome.
> 
> Best regards,
> Zoltán Böszörményi