diacritics in tags

jorgab
2019-08-09
2019-08-10
  • jorgab

    jorgab - 2019-08-09

    Hi all,

    I'm writing a script that imports CSV content into the tags of music files in a folder and then writes these tags to the filenames. The CSV is comma-delimited and UTF-8 encoded.
    Everything works well except that all diacritics are turned into gibberish inside the tags, and therefore in the filenames as well. For example: é (result: A©) or œ (result: A“u). This happens in the GUI as well. I've tried editing the filenameformat and string replacement settings (although I'd rather not replace the characters), but no luck. The job for Kid3 in the script is this:
    kid3-cli -c "cd 'customfolder\New Tracks'" -c "import 'C:\customfolder\csvInput.csv' 'CSV quotedcomma' 2" -c filenameformat -c save
    kid3-cli -c "cd 'customfolder\New Tracks'" -c "select all" -c fromtag -c save

    These are the import settings for 'quotedcomma':
    "?%{title}([^\r\n\t"]*)"?,"?%{artist}([^\r\n\t"]*)"?,"?%{album}([^\r\n\t"]*)"?

    I hope someone can point out what I'm doing wrong.
    Thanks for any help!

    Jor

     
  • Urs Fleisch

    Urs Fleisch - 2019-08-10

    The option "Text Encoding (Export, Playlist)" is only used when writing playlists and by the export functions (including LRC files). For "Import", the default system encoding is used, which on Windows is based on the system locale. So I assume that you are on Windows and that the CSV file has UTF-8 encoding, but your system encoding is different. I will check this on Windows later, but for the moment I think you have three options: change the encoding of your files to the system encoding, change the system encoding to UTF-8, or try to prepend a UTF-8 BOM to your files (this is what is needed to make MS Excel recognize UTF-8 encoding in CSV files).
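
    The mismatch described above can be reproduced outside Kid3. The following is a minimal Python sketch, not Kid3's actual code: it assumes the system encoding is Windows-1252 (Python codec name cp1252), and the helper name misdecode is hypothetical. It encodes a character as UTF-8 and then decodes the bytes with the wrong encoding, which is roughly what happens when a UTF-8 file is read using a different system encoding.

```python
def misdecode(text: str, wrong_encoding: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode the bytes with another encoding.

    cp1252 is an assumed stand-in for the Windows system encoding.
    errors="replace" avoids an exception for the few byte values that
    cp1252 leaves undefined.
    """
    return text.encode("utf-8").decode(wrong_encoding, errors="replace")


for ch in ("é", "œ"):
    # Each character is two bytes in UTF-8, so it shows up as two characters.
    print(ch, "->", misdecode(ch))
# é -> Ã©
# œ -> Å“
```

    Each diacritic becomes two unrelated characters because its two UTF-8 bytes are interpreted as two separate single-byte characters.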

     
  • jorgab

    jorgab - 2019-08-10

    Thank you, Urs! That was indeed the issue. I ticked the box 'Beta: Use Unicode UTF-8 for worldwide language support' under 'Language for non-Unicode programs' in the Region settings of the Windows 10 language settings.

    Works perfectly!
    Thanks again

     
  • Urs Fleisch

    Urs Fleisch - 2019-08-10

    I just checked this on a Windows 10 system. Although Windows 10 should be a modern operating system, its default system encoding still appears to be "Windows-1252" rather than UTF-8. I tried an import with several encodings. With plain UTF-8 (no BOM), I saw the same issues you described. The two encodings that worked on Windows with the default encoding settings were "Windows-1252" and UTF-8 with a BOM. I prepended the BOM using sed -i '1s/^/\xef\xbb\xbf/' file.csv; on Windows you could use Notepad++.
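
    For a scripted workflow on Windows, where sed is usually not available, the BOM could also be prepended with a few lines of Python. This is only a sketch under the assumption that Python 3 is installed; the function name prepend_utf8_bom is hypothetical. It works on raw bytes, so line endings and content are left untouched, and it skips files that already start with a BOM.

```python
import codecs
from pathlib import Path


def prepend_utf8_bom(path) -> bool:
    """Prepend the UTF-8 BOM (bytes EF BB BF) to a file if it is missing.

    Returns True if the BOM was added, False if it was already present.
    """
    file = Path(path)
    data = file.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        return False  # already has a BOM, do not add a second one
    file.write_bytes(codecs.BOM_UTF8 + data)
    return True
```

    It would be called before the kid3-cli import step, for example as prepend_utf8_bom(r"C:\customfolder\csvInput.csv"), using the CSV path from the first post.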