Barry Scott <barry@ba...> - 2009-02-21 12:20
> On 18 Feb 2009, at 19:32, William Newbery wrote:
>
> >
> >
> > > On 10 Feb 2009, at 17:15, William Newbery wrote:
> > >
> > > > I want to start using unicode strings.
> > > >
> > > > Looking at Py::String I see the method as_unicodestring, however
> > > > this returns a std::basic_string<Py_UNICODE> and provides no option
> > > > of encoding...
> > > >
> > > > Another method is the encode method, this lets me provide the
> > > > encoding but just returns another Py::String...
> > > >
> > > >
> > > > What exactly do I need to do to go between a python unicode string
> > > > and a std::wstring (where sizeof(wchar_t)==2)in UTF-16 encoding?
> > >
> > > I take it that Py_UNICODE is 4 on your platform.
> > >
> > > You could try encode('utf-16') to get a Py::String that is in utf-16.
> > > Then use as_std_string() to get a std::string, use c_str() to get a
> > > pointer to the contents and cast it to wchar_t.
> > >
> > > Adding a as_std_wstring would be a reasonable thing to add to PyCXX
> > > to make this convenient.
> > > as_std_wstring could look inside the Py_Object and avoid a number of
> > > conversion steps.
> > >
> > > Barry
> >
> > The problem is that's basically a hack and results in several bugs
> > since you're stuffing a double-byte string into a std::string.
> > -Any utf-16 character that has 00 for the first byte will break it.
> > I don't know if there are any such characters in little endian
> > encoding, but for big endian quite a lot will.
> >
> > -std::string only terminates with a single \0 but utf-16 needs \0\0.
> > This means casting the c_str() to a wchar_t won't work because the
> > character after the first \0 is outside the string, and thus
> > could be anything. So you end up having to make yet another copy by
> > allocating a block which is size()+2 bytes and making sure both of the
> > last two bytes are 0...
>
> std::string does not use NUL to terminate strings. Use c_str() to get
> to the data and use size() to find out the length.
>
> >
> >
> > "Adding a as_std_wstring would be a reasonable thing to add to PyCXX
> > to make this convenient." wstring could be, say, ucs-2 or some other
> > wide format as easily as utf-16, and then people may also want
> > ucs-4, etc.
> >
> > Something that can support all the different formats would be good.
> >
> > e.g. maybe:
> > int Py::String::c_encode(const char *format, char *buffer, int
> > buffersize);
> > where if buffer is null it just returns the number of bytes needed
> > to encode in the given format. The user can then allocate the needed
> > buffer and get the string encoded correctly in whatever format,
> > ending with something that is safe to cast to wchar_t or unsigned
> > int or whatever is correct for that format. buffersize should again
> > be in bytes to avoid confusion.
>
> Could you create a patch for this?
>
> Barry

From my limited knowledge of the C API I was able to put this together. There are a few things I would like to do better but don't know how to, namely:
-A way to calculate the required buffer size without actually encoding to a bytes object
-A way to encode directly into a buffer rather than into a Python-created bytes object which must then be copied

Also, I'm not sure how you're checking for and throwing exceptions that originate from Python code, so I've left that out.

        int as_c_string(char *buffer, Py_ssize_t bufferBytes, const char *encoding, const char *errors="strict") const
        {
            const Py_ssize_t nullCnt = 4; // worst case: terminator for a 4-byte encoding
            // New reference. A NULL return (Python exception) is not checked here.
            PyObject *bytes = PyUnicode_AsEncodedString(ptr(), encoding, errors);

            Py_ssize_t sourceBytes = 0;
            char *bytesData = NULL;
            PyBytes_AsStringAndSize(bytes, &bytesData, &sourceBytes);

            const Py_ssize_t neededBytes = sourceBytes + nullCnt;
            if(buffer)
            {
                if(neededBytes > bufferBytes)
                {
                    Py_DECREF(bytes); // do not leak the bytes object when throwing
                    throw RuntimeError("buffer too small for string.");
                }
                memcpy(buffer, bytesData, sourceBytes);
                memset(buffer + sourceBytes, 0, nullCnt); // null terminate
            }
            Py_DECREF(bytes);
            return (int)neededBytes; // bytes required (buffer == NULL) or bytes written
        }
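
For anyone wanting to see how the size-query-then-fill calling pattern would look from the caller's side, here is a rough sketch that runs without Python. Everything in it is an assumption made for illustration: copy_encoded() is a hypothetical stand-in for as_c_string() that takes the already-encoded bytes as a std::string, to_u16string() is a hypothetical helper, and std::u16string (C++11) is used in place of a std::wstring with a 2-byte wchar_t. It also assumes the bytes are in native byte order with no BOM (Python's 'utf-16' codec writes a BOM, while 'utf-16-le' and 'utf-16-be' do not).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the proposed as_c_string(): 'encoded' plays the
// role of the bytes object returned by Python. Not part of PyCXX.
std::size_t copy_encoded(const std::string &encoded, char *buffer, std::size_t bufferBytes)
{
    const std::size_t nullCnt = 4;                  // worst-case terminator width
    const std::size_t needed = encoded.size() + nullCnt;
    if (!buffer)
        return needed;                              // size query only
    if (needed > bufferBytes)
        throw std::runtime_error("buffer too small for string");
    std::memcpy(buffer, encoded.data(), encoded.size());
    std::memset(buffer + encoded.size(), 0, nullCnt);
    return needed;
}

// Two-call pattern: ask for the size, allocate, fill, then build a
// std::u16string from the raw bytes using an explicit length.
std::u16string to_u16string(const std::string &utf16Bytes)
{
    assert(utf16Bytes.size() % 2 == 0);             // whole UTF-16 code units only
    std::vector<char> buffer(copy_encoded(utf16Bytes, nullptr, 0));
    copy_encoded(utf16Bytes, buffer.data(), buffer.size());

    std::u16string result(utf16Bytes.size() / 2, u'\0');
    if (!result.empty())
        std::memcpy(&result[0], buffer.data(), utf16Bytes.size()); // memcpy avoids an unaligned cast
    return result;
}
```

Because the length is always passed explicitly, zero bytes inside the encoded data are copied like any other byte, and the terminator only matters to code that later treats the buffer as a C-style wide string.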

