Thread: [eclipse-clp-users] BAD_TERMs in the C/C++ interface

[eclipse-clp-users] BAD_TERMs in the C/C++ interface

From: Victor M. <ma...@fh...> - 2019-12-13 18:59:59

Hi,

I'm making extensive use of the C++ interface, and the aggressive garbage 
collection in eclipse-clp is still giving me trouble. So as I understand it, 
any term goes BAD after it's been given to the eclipse engine once.

Now I already have infrastructure in place to deal with that, but in my 
control flow it's becoming a huge issue that I never really know whether a 
term has gone BAD or not. Is there any way to query whether a certain term 
(handle) has already been garbage collected?

Thanks in advance,
Victor

Re: [eclipse-clp-users] BAD_TERMs in the C/C++ interface

From: Joachim S. <jsc...@co...> - 2019-12-14 01:37:01

Hello Victor,

On 13/12/2019 18:46, Victor Mataré wrote:
> I'm making extensive use of the C++ interface, and the aggressive garbage
> collection in eclipse-clp is still giving me trouble. So as I understand it,
> any term goes BAD after it's been given to the eclipse engine once.
> 
> Now I already have infrastructure in place to deal with that, but in my
> control flow it's becoming a huge issue that I never really know whether a
> term has gone BAD or not. Is there any way to query whether a certain term
> (handle) has already been garbage collected?

I'm quite confident that the C++ interface has the functionality to solve your 
problem in a clean way; no strange workarounds should be necessary.

From what you say, I suspect that you may not be using term references where 
they are needed.

To give some background: "terms" are always stored and managed by an ECLiPSe 
engine.  When an engine runs, it may (a) attempt to garbage-collect terms, and 
(b) move terms as a side effect of garbage collection.  For that reason, the 
engine must know about any references to terms that you are keeping in your own 
C++ data structures.

The way to do that is to use the EC_ref/EC_refs class: terms assigned to an 
EC_ref/EC_refs will not be garbage collected, and the references will be 
correctly relocated when the term is moved in memory.  In contrast, the EC_word 
class must be considered volatile: the content of an EC_word becomes invalid 
once you pass control to the engine (the reason EC_words exist at all is that 
they have less overhead, and are good enough to hold subterms temporarily while 
constructing or deconstructing complex terms).
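
To make the two effects above concrete (collection and relocation), here is a toy model in plain C++. It is only an analogy: `ToyEngine`, `make_term`, `make_ref`, `deref`, `peek` and `run` are invented for illustration and are not part of the ECLiPSe API. A slot number stands in for a volatile EC_word, and a registered reference stands in for an EC_ref.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: none of these names belong to the ECLiPSe API.
class ToyEngine {
public:
    // Store a term and return its current slot. The slot number plays the
    // role of a volatile EC_word: it is only meaningful until the next run().
    std::size_t make_term(const std::string& text) {
        heap_.push_back(text);
        return heap_.size() - 1;
    }

    // Register a slot with the engine and return a stable reference id.
    // This plays the role of assigning a term to an EC_ref.
    std::size_t make_ref(std::size_t slot) {
        assert(slot < heap_.size());
        refs_.push_back(slot);
        return refs_.size() - 1;
    }

    // Read a term through a registered reference.
    const std::string& deref(std::size_t ref) const {
        assert(ref < refs_.size());
        return heap_[refs_[ref]];
    }

    // Read a term through a raw slot number (bounds-checked so that the
    // demo stays well defined; a real stale handle gives no such guarantee).
    std::string peek(std::size_t slot) const {
        return slot < heap_.size() ? heap_[slot] : "<out of range>";
    }

    // "Run the engine": keep only the referenced terms, compact them to the
    // front of the heap, and update every registered reference.
    void run() {
        std::vector<std::string> compacted;
        for (std::size_t& slot : refs_) {
            compacted.push_back(heap_[slot]);
            slot = compacted.size() - 1;  // relocate the reference
        }
        heap_.swap(compacted);
    }

private:
    std::vector<std::string> heap_;
    std::vector<std::size_t> refs_;
};

struct DemoResult {
    std::string via_ref;  // what the registered reference sees after run()
    std::string via_raw;  // what the old slot number sees after run()
};

DemoResult demo() {
    ToyEngine engine;
    engine.make_term("foo(1)");                        // slot 0, never referenced
    std::size_t raw_bar = engine.make_term("bar(X)");  // slot 1
    std::size_t ref_bar = engine.make_ref(raw_bar);
    engine.make_ref(engine.make_term("baz"));          // slot 2, also referenced
    engine.run();  // foo(1) is dropped; bar(X) and baz move down one slot
    return {engine.deref(ref_bar), engine.peek(raw_bar)};
}
```

In this toy, the registered reference still yields `bar(X)` after `run()`, while the old slot number silently yields a different term. That is the kind of confusion a stale EC_word can cause, except that a real engine offers no bounds check.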

That's my best guess without knowing your code.  It would also be helpful to 
know whether you observe this problem already in ECLiPSe 6.1, or whether it is 
new in 7.0.

Cheers,
Joachim

Re: [eclipse-clp-users] BAD_TERMs in the C/C++ interface

From: Victor M. <ma...@fh...> - 2019-12-15 04:14:50

On Saturday, 14 December 2019 02:21:33 CET Joachim Schimpf wrote:
> Hello Victor,
> 
> I'm quite confident that the C++ interface has the functionality to solve
> your problem in a clean way, no strange workarounds should be necessary.
> 
> From what you say, I suspect that you may not be using term references
> where they are needed.

It's not that. I don't need persistent term references because I'm not pulling 
information out of the eclipse engine. In the part I'm talking about, I'm 
basically just using compile_term/1 to "compile" a C++ object structure into a 
bunch of prolog clauses. The problem is that some of those C++ objects 
represent variables that are referenced in other objects (i.e. they are not 
singleton variables).

So I can't simply re-initialize the variable terms whenever they're referenced 
because that would make them singletons. I've now resorted to initializing 
these variable terms once before building a term or clause that has variables, 
but that is error prone because I didn't expect to have that problem when I 
designed the control flow.

Now that I think of it, some clearer documentation might have helped me in the 
beginning. There is just this rather vague half-sentence in the Embedding and 
Interfacing Manual:

"terms do not survive the execution of ECLiPSe"

None of these words are really clear. Which terms are affected, and how? What 
does "survive" mean, and what does "execution" mean?

I think the Embedding Manual should have at least a short section about memory 
management that clearly states what happens. In fact, it should have a big fat 
warning along these lines:

==============================================================================
Every EC_word that has been put into post_goal() is invalid after EC_resume(). 
That includes all pieces of complex terms. Putting an invalid EC_word into 
another post_goal() leads to undefined behaviour, which might cause the 
eclipse engine to corrupt data, crash immediately, crash randomly, or to never 
crash.
==============================================================================

At least that is the behaviour I observed so far when I made mistakes with 
this. I assume that the undefined behaviour helps performance, but what would 
also be really useful for debugging these things is a switch that makes it 
crash immediately. Sure, I can design my client application so that it cannot 
happen, but I failed to do that initially because there was neither a fat 
warning nor fail-fast behaviour. It's really a huge pitfall if you're 
designing a larger application.
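
A fail-fast check of this kind could be approximated on the client side. The sketch below is not an ECLiPSe feature: `EngineEpoch` and `Checked` are hypothetical helpers, and it assumes the application resumes the engine from a single place where a counter can be bumped. Each wrapped handle remembers the counter value at its creation and refuses to be used once the counter has moved on.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical client-side helpers; nothing here is part of the ECLiPSe API.
class EngineEpoch {
public:
    // Call this from the one place in the application that resumes the engine.
    void note_resume() { ++epoch_; }
    unsigned long current() const { return epoch_; }

private:
    unsigned long epoch_ = 0;
};

// Wraps any volatile handle type T (in real code this might be an EC_word)
// and records the epoch in which the handle was created.
template <typename T>
class Checked {
public:
    Checked(const EngineEpoch& clock, T value)
        : clock_(&clock), born_(clock.current()), value_(std::move(value)) {}

    // True once the engine has been resumed since this handle was created.
    bool stale() const { return born_ != clock_->current(); }

    // Fail fast instead of handing a stale handle to the engine.
    const T& get() const {
        if (stale())
            throw std::logic_error("volatile handle used after engine resume");
        return value_;
    }

private:
    const EngineEpoch* clock_;
    unsigned long born_;
    T value_;
};

// Usage: a std::string stands in for the real handle type.
bool stale_use_is_detected() {
    EngineEpoch clock;
    Checked<std::string> x(clock, "X");
    assert(!x.stale());
    if (x.get() != "X") return false;  // fresh handle: access is allowed
    clock.note_resume();               // stands in for a call to EC_resume()
    assert(x.stale());
    try {
        x.get();                       // stale handle: must throw
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```

The wrapper also provides a `stale()` query for code that wants to ask before use. It cannot detect a resume that bypasses `note_resume()`.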

Hope I did get the problem across ;-)

Best regards & thanks for your helpful support,
Victor

PS: If you look at e.g. SWI Prolog, they expose a GC frame object which makes 
their (also non-trivial) memory management explicit, controllable and 
therefore obvious. Lacking that, I believe one does need very clear 
documentation.

Re: [eclipse-clp-users] BAD_TERMs in the C/C++ interface

From: Joachim S. <jsc...@co...> - 2019-12-16 14:07:37

On 15/12/2019 03:39, Victor Mataré wrote:
> It's not that. I don't need persistent term references because I'm not pulling
> information out of the eclipse engine.

You don't need them in principle, but you _do_ need them if you want to reuse 
subterms (such as variables) across invocations of EC_resume().  The rules are 
really quite simple:

  - All EC_words must be considered invalid after an EC_resume()

  - EC_refs remain valid after EC_resume() (although the term they refer to
    may have changed)

In C++ parlance, you might think of EC_words as raw pointers, and of EC_refs as 
smart pointers.


In your case, you seem to repeatedly construct terms and invoke EC_resume(). 
Because you don't _need_ to share variables between the constructed terms, you 
have two choices:

  - you can construct fresh terms each time, and assign them either to your
    old invalid EC_words, or to new EC_words.  [I think this is what you are
    doing now]

  - you can choose to reuse old subterms, but then you must assign them to
    EC_refs to carry them across invocations of EC_resume()
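
The first option can still share variables within one clause if the handles are scoped to that clause. The sketch below uses stand-in types only: `ToyVar`, `ClauseScope`, `show` and `build_clause` are hypothetical, and a numbered `ToyVar` takes the place of an EC_word holding a fresh variable. The idea is that every occurrence of a named variable inside one clause gets the same handle, and all handles are dropped before the engine is resumed.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for a volatile term handle; in real code this might be an EC_word
// holding a fresh variable. The id only makes sharing visible in the output.
struct ToyVar {
    int id;
};

// Hypothetical helper: create one instance per clause being built.
class ClauseScope {
public:
    // Return the variable for `name`, creating it on first use so that every
    // later occurrence inside this clause shares the same variable.
    ToyVar var(const std::string& name) {
        auto it = vars_.find(name);
        if (it == vars_.end())
            it = vars_.emplace(name, ToyVar{next_id_++}).first;
        return it->second;
    }

private:
    std::map<std::string, ToyVar> vars_;
    int next_id_ = 0;
};

// Render a toy variable as text.
std::string show(ToyVar v) { return "_G" + std::to_string(v.id); }

// Build the text of one clause in which X occurs twice and Y once.
std::string build_clause() {
    ClauseScope scope;  // fresh handles for this clause only
    std::string head = "p(" + show(scope.var("X")) + ")";
    std::string x_again = show(scope.var("X"));  // same handle as in the head
    std::string y = show(scope.var("Y"));        // first use: new handle
    // The scope ends on return, so no handle outlives the clause.
    return head + " :- q(" + x_again + ", " + y + ")";
}
```

Because a new `ClauseScope` is created for each clause, no handle is carried across a resume, and variables are never shared between clauses by accident. Sharing a variable across invocations of EC_resume() would need the second option instead.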


> In the part I'm talking about, I'm
> basically just using compile_term/1 to "compile" a C++ object structure into a
> bunch of prolog clauses. The problem is that some of those C++ objects
> represent variables that are referenced in other objects (i.e. they are not
> singleton variables).
> 
> So I can't simply re-initialize the variable terms whenever they're referenced
> because that would make them singletons. I've now resorted to initializing
> these variable terms once before building a term or clause that has variables,
> but that is error prone because I didn't expect to have that problem when I
> designed the control flow.
> 
> Now that I think of it, some clearer documentation might have helped me in the
> beginning. There is just this rather vague half-sentence in the Embedding and
> Interfacing Manual:
> 
> "terms do not survive the execution of ECLiPSe"
> 
> None of these words are really clear. What terms are really affected and how,
> what does "survive" mean and what does "execution" mean?

"Term" has its standard meaning in Prolog/ECLiPSe, namely the universal data 
type (everything is a term: constants, structures and lists, even variables). 
Everything you can assign to an EC_word is a term.

"Execution of ECLiPSe" means passing control to ECLiPSe, either by calling 
EC_resume(), or (in case your code was called from ECLiPSe as an external) by 
returning.

"not survive" means your EC_words may contain nonsense.

The meanings were probably crystal clear to the implementer who once wrote them, 
but less so from a user's perspective ;)


> 
> I think the Embedding Manual should have at least a short section about memory
> management that clearly states what happens. In fact, it should have a big fat
> warning along these lines:
> 
> ==============================================================================
> Every EC_word that has been put into post_goal() is invalid after EC_resume().
> That includes all pieces of complex terms. Putting an invalid EC_word into
> another post_goal() leads to undefined behaviour, which might cause the
> eclipse engine to corrupt data, crash immediately, crash randomly, or to never
> crash.
> ==============================================================================

post_goal() has nothing to do with it; only EC_resume() is important.

The C++ interface is a thin layer on top of the C interface.  It is a low-level 
interface involving pointers etc., and thus of course capable of causing crashes. 
One could design a safer, less direct, higher-level interface (like the Java 
or Tcl one), but that's not what we have here.


> ...
> Hope I did get the problem across ;-)

Sure, I hope I could help a little.


-- Joachim