File Date Author Commit
 doc 2020-02-07 Basile Starynkevitch [32c6e3] improved store_rps.cc to consistently use "attr...
 generated 2020-05-09 Basile Starynkevitch [e79cd0] adding Rps_SetObj::the_empty_set(), Rps_PaylCla...
 persistore 2020-05-10 Basile Starynkevitch [003d8d] renamed rpsldpy_class as rpsldpy_classinfo sinc...
 plugins 2019-12-27 Basile Starynkevitch [c26bd2] adding Rps_Dumper::add_code_addr
 .gdbinit 2020-04-27 Basile Starynkevitch [a3a7e3] stop in std::terminate
 .gitignore 2020-04-17 Basile Starynkevitch [32dcb6] better working RPSDEBUG_LOG & RPS_DEBUG_PRINTF ...
 .gitmodules 2019-05-18 Basile Starynkevitch [d7afdf] removed mps/
 .qt-refpersys.ini 2020-05-07 Basile Starynkevitch [ba66fc] adding string decoration
 COPYING 2019-04-28 Abhishek Chakravarti [576f79] Atomic commit of new non-MPS based code
 LICENSE 2019-04-28 Abhishek Chakravarti [576f79] Atomic commit of new non-MPS based code
 Makefile 2020-05-13 Basile Starynkevitch [25984d] adding RPS_SHORTGIT_ID from FLTK branch
 README.md 2020-03-10 Nimesh Neema [75dfba] Fixed a typo in the package name. Changed from ...
 app-refpersys.json 2020-05-04 Basile Starynkevitch [316ec1] improve font for warning, debug, fatal, assert...
 appli_qrps.cc 2020-05-11 Basile Starynkevitch [dae216] The --no-aslr option disables ASLR and redirect...
 backtrace_rps.cc 2020-05-06 Basile Starynkevitch [da5331] use [] not <> in Rps_Backtracer::bt_error_method
 build-temporary-plugin.sh 2020-02-17 Basile Starynkevitch [602823] better temporary plugins
 command_qrps.cc 2020-05-10 Basile Starynkevitch [fc1214] more debug related to Qt-relevant payloads
 garbcoll_rps.cc 2020-01-28 Basile Starynkevitch [7d6e6f] adding garbage collected values in major Qt win...
 generate-gitid.sh 2020-05-13 Basile Starynkevitch [eb06b7] copied from FLTK branch altredump make target a...
 generate-timestamp.sh 2020-02-21 Basile Starynkevitch [74d1fa] generate rps_makefile
 indent-cxx-files.sh 2019-11-26 Basile Starynkevitch [2450fd] omake indent works with indent-cxx-files.sh script
 inline_rps.hh 2020-05-10 Basile Starynkevitch [bbcd9c] Rps_ObjectZone::payload_type_name not inlined, ...
 magicattrs_rps.cc 2020-01-19 Basile Starynkevitch [bcec5a] using real constants outside of comments in mag...
 main_rps.cc 2020-05-11 Basile Starynkevitch [59dce2] adding early --no-aslr option to disable addres...
 morevalues_rps.cc 2020-02-15 Basile Starynkevitch [0cc00f] more verbose Rps_QtPtrZone::val_output
 objects_rps.cc 2020-05-10 Basile Starynkevitch [bbcd9c] Rps_ObjectZone::payload_type_name not inlined, ...
 oid_rps.hh 2020-01-30 Basile Starynkevitch [2ddf46] reindented
 output_qrps.cc 2020-05-10 Basile Starynkevitch [fc1214] more debug related to Qt-relevant payloads
 primes_rps.cc 2019-11-26 Basile Starynkevitch [3fce46] added gitid everywhere, implemented some of Rps...
 qthead_qrps.hh 2020-05-11 Abhishek Chakravarti [6d3ee5] Start work on dialog to display object legend
 refpersys.hh 2020-05-11 Basile Starynkevitch [59dce2] adding early --no-aslr option to disable addres...
 refpersys_logo.svg 2020-05-01 Basile Starynkevitch [f8bf9a] adding logo by Gaëtan Tapon
 rps_manifest.json 2020-05-10 Basile Starynkevitch [003d8d] renamed rpsldpy_class as rpsldpy_classinfo sinc...
 scalar_rps.cc 2020-01-05 Basile Starynkevitch [c43c0b] created `set` class _6JYterg6iAu00cV9Ye and `cl...
 store_rps.cc 2020-05-10 Basile Starynkevitch [003d8d] renamed rpsldpy_class as rpsldpy_classinfo sinc...
 values_rps.cc 2020-05-09 Basile Starynkevitch [e79cd0] adding Rps_SetObj::the_empty_set(), Rps_PaylCla...
 window_qrps.cc 2020-05-11 Abhishek Chakravarti [6d3ee5] Start work on dialog to display object legend

Read Me

refpersys

This project is on https://gitlab.com/bstarynk/refpersys/ and has its
own web site on http://refpersys.org/ where more details are given.

A research project

The Reflective Persistent System language is a research project,
taking many good ideas from Bismon, sharing a lot of goals (except
static source code analysis) with it but avoiding bad ideas from it.

For Linux/x86-64 only. Don't even think of running that on non-Linux
systems, unless you provide patches for that. And we need a 64-bit
processor.

We have multi-threading in mind, but in some limited way. We think of
a pool of a few dozen Pthreads at most (but not of a thousand
Pthreads).

We absolutely want to avoid any GIL (global interpreter lock).

Don't expect anything useful from RefPerSys before at least 2023. But
you could have fun sharing our ideas and experimenting with yours.

We previously considered using the garbage collector from Ravenbrook
MPS. Since that project is now obsolete, we gave up that idea.

Don't expect RefPerSys to be a realistic project. It is not (and
certainly not before 2025).

Some draft design ideas are written in the RefPerSys design draft,
which is very incomplete work in progress.

If you happen to know about any research call for proposals or funding
opportunities in Europe (Euro zone) about this (e.g. related to
artificial general intelligence goals) please mention them to Basile
Starynkevitch (France) by email to basile@starynkevitch.net.

persistent values

Like Bismon, RefPerSys is managing an evolving, persistable heap of
dynamically typed, garbage-collected values (see §2 Data and its
persistence in Bismon of the Bismon draft report ...). The semantics
-but not the syntax- of values is on purpose close to those of Lisp,
Python, Scheme, JavaScript, Go, or even Java, etc.

Most of these RefPerSys values are immutable: for example boxed
strings, sets (with dichotomic search inside them) or tuples of
references to objects, closures, etc. But some of these RefPerSys
values are mutable objects, and by convention every mutable value is
called an object. Each mutable object has its own lock, and any access
or update of mutable data inside objects is generally made under its
lock. By exception, a very few, very often accessed, mutable fields
inside objects (e.g. their class) are atomic pointers, for performance
reasons.

Objects have (exactly like in Bismon) attributes, components, and some
optional payload. An attribute is an association between an object
(called the key of that attribute) and some arbitrary non-nil
RefPerSys value (called the value of that attribute), and each object
has its mutable associative table of attributes. A component is an
arbitrary RefPerSys value, and each object has some mutable vector of
them. The payload is any additional mutable data (e.g. a string
buffer, a mutable vector or hashtable of values, some class metadata,
etc.), owned by the object. So the data model of a RefPerSys object is
as flexible as the data model of JavaScript. However, RefPerSys
objects have a mutable class defining their behavior (not their
fields, which are represented as attributes), which is used for
dynamic message dispatching.
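To make the object data model above more concrete, here is a minimal
C++17 sketch. It is not the actual refpersys.hh code: the names
Object, Value, Payload, put_attr, get_attr and append_comp are
hypothetical, and std::shared_ptr merely stands in for the garbage
collector. It only shows the shape described above: an atomic class
pointer read without locking, and a per-object lock guarding the
attributes, components and payload.

```cpp
#include <atomic>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <variant>
#include <vector>

struct Object;  // hypothetical stand-in for a RefPerSys object

// A value is nil, an immutable boxed string, or a reference to an object.
// (Assumption: shared_ptr replaces the GC in this toy model.)
using Value = std::variant<std::monostate,
                           std::shared_ptr<const std::string>,
                           Object*>;

struct Payload {  // optional extra mutable data owned by the object
  virtual ~Payload() = default;
};

struct Object {
  std::atomic<Object*> cls{nullptr};  // often-read field: atomic, no lock taken
  std::mutex lock;                    // guards every field below
  std::map<Object*, Value> attrs;     // key object -> non-nil value
  std::vector<Value> comps;           // components
  std::unique_ptr<Payload> payload;   // may stay empty

  void put_attr(Object* key, Value val) {
    if (std::holds_alternative<std::monostate>(val))
      return;                         // nil is never stored as an attribute value
    std::lock_guard<std::mutex> guard(lock);
    attrs[key] = std::move(val);
  }

  Value get_attr(Object* key) {
    std::lock_guard<std::mutex> guard(lock);
    auto it = attrs.find(key);
    return it == attrs.end() ? Value{} : it->second;  // nil when absent
  }

  void append_comp(Value val) {
    std::lock_guard<std::mutex> guard(lock);
    comps.push_back(std::move(val));
  }
};
```

In this sketch the class is just another Object reached through an
atomic pointer, so message dispatch could read it without taking the
object's lock.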

Worker threads and agenda of tasklets

RefPerSys will have a small fixed set of worker threads (perhaps a
dozen of them), each running some agenda loop; we would have some
central data structure (called the agenda, like in Bismon; see §1.7 of
the Bismon draft report ...) organizing runnable tasklets (e.g. a few
FIFO queues of them). A tasklet should conceptually run quickly (in a
few milliseconds) and is allowed to add or remove runnable tasklets
(including itself) to the agenda. Each worker thread is looping:
fetching a runnable tasklet from the agenda, then running that
tasklet.

This research project is
GPLv3+ licensed and
copyrighted by the RefPerSys team, currently made of:

Some files might be "borrowed" from other similar GPLv3+ licensed
projects (notably from Bismon...)
and could retain their original copyright owner.

Contributing

Please ask, by email, the above RefPerSys team for C++ coding
conventions before starting non-trivial contributions to the C++
runtime of RefPerSys. If you are contributing to its C++ runtime,
please run make clean after any git pull.

The GPLv3+ license of RefPerSys is unlikely to change before 2025 (and
probably even after).

File conventions

The RefPerSys runtime is implemented in C++17, with hand-written C++
code in *_rps.cc, and has a single C++ header file refpersys.hh.
We don't claim to be C++ gurus. Most C++ experts could write more
genuine C++ code than we do and will find our C++ code pitiful. We
just want our runtime to work, not to serve as an example of well
written C++17 code.

The preferred C++ compiler (in 2020Q1) for RefPerSys is GCC version 8
or 9.

It could be worthwhile to sometimes compile RefPerSys with clang++
(see http://clang.llvm.org/ for more). In practice, run make clean
then make RPS_BUILD_CXX=clang++. The Clang static analyzer could be
useful, but expect a lot of warnings, since C++ doesn't have flexible
array members but we need something similar.
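As an illustration of "something similar" to flexible array members,
one common C++ idiom places a fixed header and a variable number of
slots in a single allocation. The sketch below is only that idiom, not
the code actually used in refpersys.hh; Tuple, Value, make and destroy
are hypothetical names. Static analyzers tend to complain about the
pointer arithmetic past the header.

```cpp
#include <cstddef>
#include <new>

struct Value { int tag; };  // hypothetical stand-in for a GC-ed value

// A header immediately followed, in the same allocation, by len_ pointers.
class Tuple {
  std::size_t len_;

  explicit Tuple(std::size_t len) : len_(len) {}

  // The slots live right after the header object.
  const Value** slots() { return reinterpret_cast<const Value**>(this + 1); }

public:
  static Tuple* make(std::size_t len) {
    void* mem = ::operator new(sizeof(Tuple) + len * sizeof(const Value*));
    Tuple* t = new (mem) Tuple(len);
    for (std::size_t i = 0; i < len; ++i)
      new (t->slots() + i) const Value*(nullptr);  // construct each slot
    return t;
  }

  static void destroy(Tuple* t) {
    t->~Tuple();
    ::operator delete(t);
  }

  std::size_t size() const { return len_; }

  // Out-of-range indexes give nullptr instead of reading past the slots.
  const Value* get(std::size_t i) { return i < len_ ? slots()[i] : nullptr; }

  void set(std::size_t i, const Value* v) {
    if (i < len_)
      slots()[i] = v;
  }
};
```

The constructor is private so that a Tuple can only be created through
make, which is the only place that knows how much memory to reserve.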

RefPerSys may later also use generated C++ code in some _*.cc
file, some generated C code in some _*.c and generated C or C++
headers in some _*.h files. By convention, files starting with an
underscore are generated (but they may or may not be git
versioned). Some generated C++ files which are git add-ed are under
the generated/ subdirectory.

We could later need some C++ generating program (maybe similar in
spirit to Bismon's BM_makeconst.cc). It would then be named rps_* for
the executable, and fit in a single self-sufficient rps_*.cc C++
file. Perhaps we'll later have some rps_makeconst executable to
generate some C++, with its source in some rps_makeconst.cc. So the
convention is that any future C++ generating source code is in some
rps_*.cc C++ file. In commit 65a8f84aeffc9ba4e468 or newer the dumping
facility is scanning hand-written C++ source files to emit
generated/rps-constants.hh.

Building and dependencies.

The build automation tool used here is GNU make since commit
6d56f50660c7cc41b9 (it was omake before).

You should have compiled and installed Ian Taylor's libbacktrace,
e.g. under /usr/local/. You may need to add /usr/local/lib/ in
your /etc/ld.so.conf and run ldconfig -v -a after installation of
that libbacktrace.

The JsonCPP and Qt5 C++ libraries are needed, and also a mail command
in your $PATH.

To install the dependencies on a recent Debian 10 buster or
Ubuntu 19 system, you could run the following steps:

  • sudo apt install libunistring-dev
  • sudo apt install qt5-default libqt5x11extras5-dev libqt5xdg-dev
  • sudo apt install libjsoncpp-dev
  • sudo apt install ccache g++ make build-essential remake gdb automake
  • sudo apt install ttf-unifont ttf-mscorefonts-installer unifont msttcorefonts fonts-ubuntu fonts-tuffy fonts-spleen fonts-roboto fonts-recommended fonts-yanone-kaffeesatz fonts-play fonts-eurofurence fonts-ecolier-court fonts-dejavu fonts-croscore fonts-cegui fonts-inter fonts-inconsolata
  • git clone https://github.com/ianlancetaylor/libbacktrace.git
  • cd libbacktrace
  • ./configure
  • make
  • make install

Build instructions

You need a recent C++17 compiler such as g++ (we use GCC version 9) or
clang++, together with libunistring-dev. Look into, and perhaps
improve, our Makefile. Build using make -j 3 or more.

You also should do a make clean after any git pull.

Garbage collection

RefPerSys is a multi-threaded and garbage-collected system. We are
fully aware that multi-thread friendly and efficient garbage
collection is a very difficult topic.

The reader unaware of garbage collection terminology (precise
vs. conservative GC, tracing garbage collection, copying GC, GC
roots, GC locals, mark and sweep GC, incremental GC, write barrier)
is advised to read the GC handbook and is expected to have read very
carefully the Tracing Garbage Collection wikipage.

We have considered using Ravenbrook MPS. Unfortunately for us, that
very good GC implementation seems unmaintained, and with almost a
hundred thousand lines of code is very difficult to grasp, understand,
and adopt. In the end, using MPS is not reasonable in our eyes.

We also did consider using Boehm GC. That conservative GC is really
simple to use (basically, use GC_MALLOC instead of malloc, etc.) and
is C++ friendly. However, it is rather slow (even for allocations of
GC-ed zones, and we would have many of them) and might be quite
unsuitable for programs having lots of circular references, and
reflexive programs have lots of them.

Garbage collection ideas

So we probably are heading towards developing our own precise and
multi-thread friendly GC (hopefully "better" than Boehm, but worse
than MPS), with the following ideas:

  • local roots in the local frame are explicit, like in Bismon
    (LOCALFRAME_BM macro of bismon/cmacros_BM.h) or OCaml (see its
    §20.5 Living in harmony with the garbage collector and CAMLlocal*
    and CAMLparam* and CAMLreturn* macros). The local call frame is
    conventionally reified as the _ local variable, so an automatic
    variable GC-ed pointer foo is coded _.foo in our C++ runtime. A
    local frame in RefPerSys should be declared in C++ using
    RPS_LOCALFRAME.

  • our garbage collector manages memory zones inside a set of
    mmap-ed memory blocks: either small blocks of a megaword, that
    is 8 megabytes (i.e. RPS_SMALL_BLOCK_SIZE), or large blocks of 8
    megawords (i.e. RPS_LARGE_BLOCK_SIZE). Values are inside such
    memory zones. Mutable objects may contain -perhaps indirectly-
    pointers to quasivalues (notably in their payload), that is to
    garbage collected zones which are not first-class values. A typical
    example of quasivalue could be some bucket in some (fully
    RefPerSys-implemented) array hash table (appearing as the payload of
    some object), in which buckets would be some small and mutable
    dynamic arrays of entries with colliding hashes. Such buckets are
    indeed garbage collected zones, but are not themselves values (since
    they are mutable, but not reified as objects).

  • The GC allocation operations are explicitly given the pointer to the
    local frame (i.e. &_, named RPS_CURFRAME), which is linked to
    the previous call frame and so on. That pointer is passed to every
    routine needing the GC (i.e. allocating or mutating values); only
    functions which don't allocate or mutate (e.g. accessor or getter
    functions) can avoid getting that local frame pointer.

  • The C++ runtime, and any code generated in RefPerSys, should
    explicitly be in A-normal form. So coding z = f(g(x),y) is
    forbidden in C++ (where f and g are C++ functions using the GC).
    Instead, reserve a local slot such as _.tmp1 in the local frame,
    then code _.tmp1 = g(RPS_CURFRAME, _.x); _.z = f(RPS_CURFRAME, _.tmp1, _.y);
    In less pedantic terms, we should do only one call (to GC-aware
    functions) or one allocation per statement; and every such call
    to some allocation primitive, or to a GC-aware function, should
    pass the RPS_CURFRAME and use RPS_LOCALFRAME in the calling
    function.

  • A write barrier should be called after object or quasivalue
    updates, and before any other allocation or update of some other
    object, value, or quasivalue. In practice, code
    _.foo.rps_write_barrier(RPS_CURFRAME) or more simply
    _.foo.RPS_WRITE_BARRIER().

  • Every garbage-collection aware thread (a thread allocating GC-ed
    values, mutating GC-ed quasivalues or objects, or forcibly running
    the GC) should call the Rps_GarbageCollector::maybe_garbcoll
    routine quite often, typically once every few milliseconds. If
    this is not possible (e.g. before a potentially blocking read or
    poll system call), special precautions should be taken. Forgetting
    to call that maybe_garbcoll function often enough could crash the
    system.

  • Consequently, as a rule of thumb, any routine which can directly or
    indirectly allocate GC-ed values or quasi-values, or directly or
    indirectly mutate GC-ed values or quasi-values, should take a
    calling callframe argument. We might need to consider putting that
    specific callframe argument in some global register, using the GCC
    register ... asm extension to define global register variables,
    and compiling with the -ffixed-reg code generation option.
    By coding convention, that calling callframe argument should
    preferably be named callingfra, and should be the first argument of
    every function or method (member function in C++ classes)
    requiring the GC.
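The conventions above (explicit local frames linked to their caller,
a callingfra first argument, A-normal form, and a write barrier after
updates) can be illustrated by the following toy sketch. It is not the
RefPerSys implementation: Zone, CallFrame, LocalFrame, alloc_zone,
write_barrier and count_live_roots are hypothetical names, local roots
are indexed slots rather than named fields declared by RPS_LOCALFRAME,
and the "collector" only counts the roots it could see.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

struct Zone { int payload = 0; bool dirty = false; };  // toy GC-ed zone

// A call frame is linked to the caller's frame and exposes its local roots.
struct CallFrame { CallFrame* prev; std::size_t nbslots; Zone** slots; };

template <std::size_t N>
struct LocalFrame : CallFrame {
  Zone* vals[N] = {};                                  // explicit local roots
  explicit LocalFrame(CallFrame* caller) : CallFrame{caller, N, nullptr} {
    slots = vals;
  }
  LocalFrame(const LocalFrame&) = delete;
};

static std::vector<std::unique_ptr<Zone>> toy_heap;    // never collected here
static std::size_t roots_seen_at_last_alloc = 0;

// Walk the frame chain, as a precise GC would do to find its roots.
std::size_t count_live_roots(const CallFrame* fra) {
  std::size_t n = 0;
  for (; fra != nullptr; fra = fra->prev)
    for (std::size_t i = 0; i < fra->nbslots; ++i)
      if (fra->slots[i] != nullptr) ++n;
  return n;
}

// Every GC-aware routine takes the calling frame as its first argument.
Zone* alloc_zone(CallFrame* callingfra, int payload) {
  roots_seen_at_last_alloc = count_live_roots(callingfra);
  toy_heap.push_back(std::make_unique<Zone>());
  toy_heap.back()->payload = payload;
  return toy_heap.back().get();
}

void write_barrier(CallFrame* callingfra, Zone* z) {
  (void)callingfra;
  z->dirty = true;   // a real barrier would record z for the collector
}

Zone* g(CallFrame* callingfra, Zone* x) {
  return alloc_zone(callingfra, x->payload + 1);
}

Zone* f(CallFrame* callingfra, Zone* a, Zone* b) {
  LocalFrame<3> _(callingfra);
  enum { A, B, RES };
  _.vals[A] = a;
  _.vals[B] = b;
  _.vals[RES] = alloc_zone(&_, _.vals[A]->payload * _.vals[B]->payload);
  return _.vals[RES];
}

// A-normal form: z = f(g(x), y) is split so each statement makes one GC-aware call.
Zone* compute_z(CallFrame* callingfra, Zone* xarg, Zone* yarg) {
  LocalFrame<4> _(callingfra);
  enum { X, Y, TMP1, Z };
  _.vals[X] = xarg;
  _.vals[Y] = yarg;
  _.vals[TMP1] = g(&_, _.vals[X]);
  _.vals[Z] = f(&_, _.vals[TMP1], _.vals[Y]);
  _.vals[Y]->payload = 0;         // mutate an existing zone...
  write_barrier(&_, _.vals[Y]);   // ...then notify the GC before anything else
  return _.vals[Z];
}
```

Because the intermediate result of g is stored in a frame slot before
f is called, a collection triggered inside f would still find it by
walking the frame chain.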

useful references

For Bismon, see http://github.com/bstarynk/bismon and read its draft
Bismon report (updated quite often).

For the C++17 language, see this C++ reference.

For Linux programming, see Advanced Linux Programming and the
syscalls(2) man page.

For GCC, see notably its Invoking GCC chapter.

For garbage collection, read Paul Wilson's old paper Uniprocessor
Garbage Collection Techniques, then read the GC handbook.

useful and relevant libraries

We already need the following libraries:

We may want to use, either soon or within a few years, (usually after 2022) interesting C or C++ libraries such as:

We should list other libraries interesting for us here, just in case (to avoid forgetting them).

past contributors

Thanks to Niklas Rosencrantz (Sweden) for past minor contributions.

See also

https://gitlab.com/abhishekchakravarti/scheme-interpreter-exercise/