Re: [Jython-dev] Buffer protocol - direct vs. JVM ByteBuffers


<html><head></head><body><div style="font-family: Verdana;font-size: 12.0px;"><div>
<div>
<div>&gt; The object can hand out a second reference to the same PyBuffer. (It&#39;s not required to, but the built-ins do.)</div>

<div>&nbsp;</div>

<div>Good point. Now that you mention it, I think it should be possible to identify the actual backend even if distinct PyBuffer objects sharing the same backend (array or ByteBuffer) were exposed. From this point of view, I agree that the single-threaded case is readily solvable (and should be solved) this way, i.e. with potential copy-back.</div>
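<div>To make the backend-identification point concrete at the level of plain <tt>java.nio</tt> buffers (not the PyBuffer API itself), here is a minimal sketch. <tt>SharedBackend</tt> and <tt>sameBackend</tt> are hypothetical names, and the check assumes heap storage, since <tt>java.nio</tt> offers no portable way to compare the memory behind two direct buffers.</div>

```java
import java.nio.ByteBuffer;

// Hypothetical helper (not part of Jython): decides whether two buffer views
// are known to share the same heap backend.
public class SharedBackend {
    // True only when both views expose an accessible array and it is the same
    // array object. Read-only buffers report hasArray() == false, and direct
    // buffers typically do too, so for them this sketch answers false ("cannot tell").
    public static boolean sameBackend(ByteBuffer a, ByteBuffer b) {
        return a.hasArray() && b.hasArray() && a.array() == b.array();
    }

    public static void main(String[] args) {
        ByteBuffer storage = ByteBuffer.allocate(8);
        ByteBuffer viewA = storage.duplicate(); // distinct object, same array
        ByteBuffer viewB = storage.duplicate();
        viewA.put(0, (byte) 42);                // absolute put through view A
        System.out.println(sameBackend(viewA, viewB));                  // true
        System.out.println(viewB.get(0));                               // 42
        System.out.println(sameBackend(viewA, ByteBuffer.allocate(8))); // false
    }
}
```

<div>Two views obtained with <tt>duplicate()</tt> are distinct objects, yet a write through one is immediately visible through the other, which is the property a copy-back scheme has to preserve.</div>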

<div>&nbsp;</div>

<div><br/>
Thinking of multi-threaded PyBuffer use:</div>

<div>&nbsp;</div>

<div>I wondered how CPython deals with this (e.g. when using the threading module) and tried to find an (official) statement from the CPython world about sharing a buffer between multiple threads, but have had no luck so far. If anyone has resources or an example of such a use case, I&#39;d appreciate a pointer.</div>

<div>&nbsp;</div>

<div>So I suppose the user is fully responsible for synchronizing their buffer transactions.</div>

<div>&nbsp;</div>

<div>Given that a multithreaded BufferProtocol use case is much more natural in Jython, I&#39;d propose that we define a recommended standard procedure, and maybe even an API, for typical tasks in this setting, e.g. locking a buffer (or a section of it) for write access, or for an atomic read/process/write-back transaction.<br/>
Having such a standard would make locking compatible between distinct frameworks sharing a buffer-exposing object.<br/>
Also, behavior in the multithreaded case would become much more predictable and controllable from the JyNI perspective. Last but not least, it would help to avoid errors (deadlocks etc.) in this difficult area, bearing in mind that Python users are often not very experienced with multithreading.</div>
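<div>A possible shape for such a standard, sketched in Java under loud assumptions: <tt>GuardedBuffer</tt>, <tt>read</tt> and <tt>update</tt> are hypothetical names, not an existing Jython or JyNI API, and the sketch locks the whole buffer with a <tt>ReentrantReadWriteLock</tt> rather than individual sections. <tt>update</tt> performs a read/process/write-back transaction while holding the exclusive lock.</div>

```java
import java.nio.ByteBuffer;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.IntUnaryOperator;

// Hypothetical locking wrapper (not part of Jython or JyNI): every participant
// that shares the buffer goes through the same read/write lock.
public class GuardedBuffer {
    private final ByteBuffer storage;
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public GuardedBuffer(ByteBuffer storage) {
        this.storage = storage;
    }

    // Read one byte while holding the shared (read) lock.
    public byte read(int index) {
        lock.readLock().lock();
        try {
            return storage.get(index);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Read, process and write back as one transaction under the exclusive lock.
    public void update(int index, IntUnaryOperator f) {
        lock.writeLock().lock();
        try {
            int old = storage.get(index);
            storage.put(index, (byte) f.applyAsInt(old));
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        GuardedBuffer buf = new GuardedBuffer(ByteBuffer.allocate(4));
        Runnable task = () -> {
            for (int i = 0; i < 50; i++) {
                buf.update(0, v -> v + 1);
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(buf.read(0)); // 100: no increment is lost
    }
}
```

<div>Because both threads use the same lock, neither loses an update; a section-level variant could keep one such lock per region of the buffer.</div>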
</div>

<div>&nbsp;</div>

<div>&nbsp;</div>

<div>-Stefan</div>

<div>&nbsp;</div>

<div>&nbsp;</div>

<div>&nbsp;
<div name="quote" style="margin:10px 5px 5px 10px; padding: 10px 0 10px 10px; border-left:2px solid #C3D9E5; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">
<div style="margin:0 0 10px 0;"><b>Sent:</b>&nbsp;Tuesday, 17 May 2016 at 21:08<br/>
<b>From:</b>&nbsp;&quot;Jeff Allen&quot; &lt;ja...@fa...&gt;<br/>
<b>To:</b>&nbsp;&quot;Stefan Richthofer&quot; &lt;Ste...@gm...&gt;<br/>
<b>Cc:</b>&nbsp;&quot;Jython Developers&quot; &lt;jyt...@li...&gt;<br/>
<b>Subject:</b>&nbsp;Re: [Jython-dev] Buffer protocol - direct vs. JVM ByteBuffers</div>

<div name="quoted-content">
<div style="background-color: rgb(255,255,255);">On 17/05/2016 01:52, Stefan Richthofer wrote:
<blockquote>
<div style="font-family: Verdana;font-size: 12.0px;">
<div>&nbsp;
<div>The thread could create two (or more) PyBuffer views of the same object and hand both to various functions that read and write on them without calling release (and thus triggering copy-back) in between. The extension would expect that, if view &#39;A&#39; was modified, view &#39;B&#39; already reflects this modification when passed to another function.</div>
</div>
</div>
</blockquote>
The object can hand out a second reference to the same PyBuffer. (It&#39;s not required to, but the built-ins do.)<br/>
&nbsp;
<blockquote>
<div style="font-family: Verdana;font-size: 12.0px;">
<div>
<div>&gt; The only way I can imagine an object with Java fields as storage giving you a direct ByteBuffer on demand is to allocate one and copy its state there. [...] effectively a change of implementation on the fly.</div>

<div>&nbsp;</div>

<div>This is what I have in mind. Changing the backend on the fly shouldn&#39;t be much more expensive than creating a copy, but it would then save the cost of the copy-back, and of any copying at all on future calls. Using the bulk put method of ByteBuffer, this should be easy and efficient to do (bulk get to convert in the other direction). The only infeasible situation would be if an AS_DIRECT_NIO buffer was requested while an array-backed buffer is exported and not yet released, or vice versa.</div>
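<div>The bulk transfer methods keep the on-the-fly switch short. A minimal sketch, assuming no export of the old storage is still live; <tt>BackendSwitch</tt>, <tt>toDirect</tt> and <tt>toHeap</tt> are hypothetical names. The bulk methods used are the standard <tt>ByteBuffer.put(ByteBuffer)</tt> and <tt>ByteBuffer.get(byte[])</tt>.</div>

```java
import java.nio.ByteBuffer;

// Hypothetical helper (not Jython code): swaps storage between heap and direct
// memory. Assumes no view of the old storage is still exported.
public class BackendSwitch {
    // Heap -> direct via a single bulk put.
    public static ByteBuffer toDirect(ByteBuffer heap) {
        ByteBuffer src = heap.duplicate(); // leave the caller's position/limit alone
        src.clear();                       // cover the whole capacity
        ByteBuffer direct = ByteBuffer.allocateDirect(src.capacity());
        direct.put(src);
        direct.clear();                    // rewind so the new storage reads from 0
        return direct;
    }

    // Direct -> heap via a single bulk get into a fresh array.
    public static ByteBuffer toHeap(ByteBuffer direct) {
        ByteBuffer src = direct.duplicate();
        src.clear();
        byte[] bytes = new byte[src.capacity()];
        src.get(bytes);
        return ByteBuffer.wrap(bytes);
    }
}
```

<div>Each direction is one allocation plus one bulk copy, which is why the switch should cost about the same as making a copy for C.</div>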
</div>
</div>
</blockquote>
Aye, there&#39;s the rub, if at least one is writable.<br/>
&nbsp;
<blockquote>
<div style="font-family: Verdana;font-size: 12.0px;">
<div>
<div>In this case the request should just fail. For the sake of debugging, I suppose we should add a verbose mode/flag that makes Jython print out the exact reason why a buffer request failed (or append it to the error message in&nbsp;bufferErrorFromSyndrome), so that a user is able to identify the design flaw.</div>
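<div>The fail-with-a-reason behaviour could look roughly like this. <tt>SwitchingExporter</tt>, <tt>export</tt> and <tt>release</tt> are hypothetical names, not Jython&#39;s API; the sketch counts live exports, switches the storage kind only when none are outstanding, and otherwise raises an exception that names the conflict. It ignores the read-only case, where a conflict might be tolerable.</div>

```java
import java.nio.ByteBuffer;

// Hypothetical exporter (not Jython's API): tracks live exports and refuses a
// request for the other storage kind while any export is unreleased.
public class SwitchingExporter {
    private ByteBuffer storage;
    private int liveExports = 0;

    public SwitchingExporter(int size) {
        storage = ByteBuffer.allocate(size);
    }

    public synchronized ByteBuffer export(boolean wantDirect) {
        if (storage.isDirect() != wantDirect) {
            if (liveExports > 0) {
                // Name the exact reason, as suggested for the verbose mode.
                throw new IllegalStateException("cannot export "
                        + (wantDirect ? "direct" : "heap") + " storage: "
                        + liveExports + " " + (wantDirect ? "heap" : "direct")
                        + " export(s) not yet released");
            }
            storage = copyInto(wantDirect
                    ? ByteBuffer.allocateDirect(storage.capacity())
                    : ByteBuffer.allocate(storage.capacity()));
        }
        liveExports++;
        return storage.duplicate();
    }

    public synchronized void release() {
        liveExports--;
    }

    // Copy the current contents into fresh storage with one bulk put.
    private ByteBuffer copyInto(ByteBuffer fresh) {
        ByteBuffer src = storage.duplicate();
        src.clear();
        fresh.put(src);
        fresh.clear();
        return fresh;
    }
}
```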
</div>
</div>
</blockquote>
Exceptions should always be that clear. However, it&#39;s not really a design flaw: holding a memoryview and calling a numpy function on the array is hardly faulty logic.

<blockquote>
<div style="font-family: Verdana;font-size: 12.0px;">
<div>
<div>&gt; This seems to bring us full circle to the behaviour of Get&lt;PrimitiveType&gt;ArrayElements, except that I suppose the object knows it has done it and can handle intervening access via the Java API</div>

<div>&nbsp;</div>

<div>Since, unlike with Get&lt;PrimitiveType&gt;ArrayElements (in the copying case, i.e. without array pinning), actual views of the same memory would be shared with (native) extensions, this would not break the case described above, where multiple PyBuffer views of the same object are in play.</div>

<div>&nbsp;</div>

<div>I&#39;m not sure what you mean by &quot;full circle to the behaviour of Get&lt;PrimitiveType&gt;ArrayElements&quot;,</div>
</div>
</div>
</blockquote>
I meant in the sense that we have made a copy especially for C and may have to copy it back. You&#39;re correct that there are still a number of delicate problems to solve during the period the implementation has changed.<br/>
<br/>
Jeff
<blockquote>
<div style="font-family: Verdana;font-size: 12.0px;">
<div>
<div>
<div style="margin: 10.0px 5.0px 5.0px 10.0px;padding: 10.0px 0 10.0px 10.0px;border-left: 2.0px solid rgb(195,217,229);">
<div style="margin: 0 0 10.0px 0;"><b>Sent:</b>&nbsp;Monday, 16 May 2016 at 22:04<br/>
<b>From:</b>&nbsp;&quot;Jeff Allen&quot; <a class="moz-txt-link-rfc2396E" href="ja...@fa..." target="_parent">&lt;ja...@fa...&gt;</a><br/>
<b>To:</b>&nbsp;&quot;Stefan Richthofer&quot; <a class="moz-txt-link-rfc2396E" href="Ste...@gm..." target="_parent">&lt;Ste...@gm...&gt;</a><br/>
<b>Cc:</b>&nbsp;&quot;Jython Developers&quot; <a class="moz-txt-link-rfc2396E" href="jyt...@li..." target="_parent">&lt;jyt...@li...&gt;</a><br/>
<b>Subject:</b>&nbsp;Re: [Jython-dev] Buffer protocol - direct vs. JVM ByteBuffers</div>

<div>
<div style="background-color: rgb(255,255,255);">
<p>Thanks for that. So, the (possible) copy-back semantics of <tt><span class="cEmphasis">Get&lt;PrimitiveType&gt;ArrayElements</span></tt> effectively make a direct buffer copy of the contents. If the C code runs in the thread of the Java execution that calls it (or, equivalently, the calling Java code is suspended), then to first order the copy causes no problem. But it is not difficult to cook up a scenario where another thread, or a call-back into Java, sees a different state from C.</p>
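<p>The copying case can be simulated in plain Java to show where the two states diverge. This is a simulation, not JNI: <tt>CopyBackSimulation</tt>, <tt>getElements</tt> and <tt>releaseElements</tt> are hypothetical stand-ins for <tt>Get&lt;PrimitiveType&gt;ArrayElements</tt> and <tt>Release&lt;PrimitiveType&gt;ArrayElements</tt>, and the mode constants only mirror the values of JNI&#39;s release modes <tt>0</tt>, <tt>JNI_COMMIT</tt> and <tt>JNI_ABORT</tt>.</p>

```java
import java.util.Arrays;

// Simulation only (not JNI): the copying case of GetByteArrayElements /
// ReleaseByteArrayElements, modelled with plain Java arrays.
public class CopyBackSimulation {
    public static final int MODE_COPY_AND_FREE = 0; // mirrors release mode 0
    public static final int MODE_COMMIT = 1;        // mirrors JNI_COMMIT
    public static final int MODE_ABORT = 2;         // mirrors JNI_ABORT

    // "Get": native code receives a private copy of the Java array.
    public static byte[] getElements(byte[] javaArray) {
        return Arrays.copyOf(javaArray, javaArray.length);
    }

    // "Release": copy back unless aborting. (Freeing the copy has no analogue here.)
    public static void releaseElements(byte[] javaArray, byte[] copy, int mode) {
        if (mode != MODE_ABORT) {
            System.arraycopy(copy, 0, javaArray, 0, javaArray.length);
        }
    }

    public static void main(String[] args) {
        byte[] javaSide = {1, 2, 3};
        byte[] nativeSide = getElements(javaSide);
        nativeSide[0] = 99;              // "C" writes to its copy
        System.out.println(javaSide[0]); // 1: Java still sees the old state
        releaseElements(javaSide, nativeSide, MODE_COPY_AND_FREE);
        System.out.println(javaSide[0]); // 99 after copy-back
    }
}
```

<p>Any reader of the Java array between the native write and the release, whether another thread or a call-back into Java, sees the old value.</p>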

<p>Alternatively, one uses the &quot;critical&quot; methods, and suffers restrictions that are, I expect, unenforceable on arbitrary CPython extension modules, such as being short and not yielding the CPU.</p>

<p>I&#39;m reminded of the relationship in CPython between C code and interpreted code, where the GIL must be held, proving all other threads are &quot;resting&quot; between instructions, and a context switch is only allowed when surrounded by the appropriate magical incantations. I think the Universe is trying to tell us something.</p>

<p>The problem I see with the <tt>DIRECT_NIO</tt> flag is that one cannot expect to choose, at the point of getting a PyBuffer, whether that buffer should be direct or heap. The data that hold the state of an object have a certain implementation in Java, and so the buffer will be a heap buffer. Or one can imagine a <tt>PyObject</tt> whose state is always in a direct <tt>ByteBuffer</tt> (representing an image mapped from disk, say), and then the <tt>PyBuffer</tt> would always be direct. Just possibly, objects whose main purpose is to be native-friendly would have that implementation. Just possibly, this is a thing you get to choose when the object is constructed.</p>
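<p>A sketch of the construction-time choice, assuming a hypothetical exporter class <tt>FixedStorageObject</tt>: the storage kind is fixed in the constructor, so every exported view has the same kind. An object backed by a file mapped with <tt>FileChannel.map</tt> would fall into the direct category, since that method returns a direct <tt>MappedByteBuffer</tt>.</p>

```java
import java.nio.ByteBuffer;

// Hypothetical exporter (not Jython code) whose storage kind is chosen once,
// at construction, rather than per buffer request.
public class FixedStorageObject {
    private final ByteBuffer storage;

    public FixedStorageObject(int size, boolean nativeFriendly) {
        storage = nativeFriendly ? ByteBuffer.allocateDirect(size)
                                 : ByteBuffer.allocate(size);
    }

    // Every exported view shares the one storage, so its kind never changes.
    public ByteBuffer export() {
        return storage.duplicate();
    }
}
```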

<p>The only way I can imagine an object with Java fields as storage giving you a direct <tt>ByteBuffer</tt> on demand is to allocate one and copy its state there. This seems to bring us full circle to the behaviour of <tt><span class="cEmphasis">Get&lt;PrimitiveType&gt;ArrayElements</span></tt>, except that I suppose the object knows it has done it and can handle intervening access via the Java API ... effectively a change of implementation on the fly.</p>

<pre class="moz-signature">Jeff Allen</pre>

<div class="moz-cite-prefix">On 15/05/2016 16:39, Stefan Richthofer wrote:</div>

<blockquote>
<div style="font-family: Verdana;font-size: 12.0px;">
<div>
<div>Just an add-on to my recent post:</div>

<div>&nbsp;</div>

<div>&gt;It may depend on <tt>storage.hasArray()</tt>, but <tt>storage.isDirect()</tt> seems to make no difference.</div>

<div>&nbsp;</div>

<div>I think it is likely that the JVM does not offer a backing array if the buffer is created as direct (i.e. these flags likely exclude each other), because that would imply array pinning and all the restrictions that come with it. I didn&#39;t test it, though, and in any case we cannot rely on either behavior, as the documentation explicitly does not guarantee a backing array for direct buffers, saying this is &quot;unspecified&quot;.</div>

<div>&nbsp;</div>

<div>&nbsp;
<div style="margin: 10.0px 5.0px 5.0px 10.0px;padding: 10.0px 0 10.0px 10.0px;border-left: 2.0px solid rgb(195,217,229);">
<div style="margin: 0 0 10.0px 0;"><b>Gesendet:</b>&nbsp;Sonntag, 15. Mai 2016 um 11:57 Uhr<br/>
<b>Von:</b>&nbsp;&quot;Jeff Allen&quot; <a class="moz-txt-link-rfc2396E" href="ja...@fa..." target="_parent">&lt;ja...@fa...&gt;</a><br/>
<b>An:</b>&nbsp;&quot;Stefan Richthofer&quot; <a class="moz-txt-link-rfc2396E" href="Ste...@gm..." target="_parent">&lt;Ste...@gm...&gt;</a>, &quot;Jython Developers&quot; <a class="moz-txt-link-rfc2396E" href="jyt...@li..." target="_parent">&lt;jyt...@li...&gt;</a><br/>
<b>Betreff:</b>&nbsp;[Jython-dev] Buffer protocol - direct vs. JVM ByteBuffers</div>

<div>
<div style="background-color: rgb(255,255,255);">
<p>Stefan:</p>

<p><a class="moz-txt-link-freetext" href="https://github.com/jythontools/jython/pull/39" target="_blank">https://github.com/jythontools/jython/pull/39</a></p>

<p>What difference does it make in your use case whether a NIO ByteBuffer is direct or non-direct? I can see why a client might want to know which it had been given, but not why it might want an exception raised in one or other case.</p>

<p>Nothing I&#39;m doing seems to depend on what kind of memory the exporting object has, therefore on the implementation type of <tt>ByteBuffer storage</tt>. It may depend on <tt>storage.hasArray()</tt>, but <tt>storage.isDirect()</tt> seems to make no difference.</p>
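<p>A small illustration of why <tt>hasArray()</tt> rather than <tt>isDirect()</tt> decides the access path: a read-only heap buffer is not direct, yet it refuses array access. <tt>ReadBytes.remainingBytes</tt> is a hypothetical helper, not Jython code.</p>

```java
import java.nio.ByteBuffer;

// Hypothetical helper (not Jython code): copies the remaining bytes of any
// ByteBuffer, choosing the fast path by hasArray(), never by isDirect().
public class ReadBytes {
    public static byte[] remainingBytes(ByteBuffer storage) {
        byte[] out = new byte[storage.remaining()];
        if (storage.hasArray()) {
            // Accessible backing array: copy straight out, honouring arrayOffset()
            // so that slices of a larger array are handled correctly.
            System.arraycopy(storage.array(), storage.arrayOffset() + storage.position(),
                             out, 0, out.length);
        } else {
            // Direct or read-only buffer: bulk get on a duplicate, which leaves
            // the original buffer's position unchanged.
            storage.duplicate().get(out);
        }
        return out;
    }
}
```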
Jeff

<pre class="moz-signature">-- 
Jeff Allen</pre>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</div></div></body></html>