Obviously crashing is not good. This file is not a CSV file; it is of the "merged delimiters" type, where the delimiter is a space. There is an option within Ravel to handle that kind of file. Even worse, however, it is UTF-16 encoded, so Ravel cannot load it. You will have to convert it to UTF-8 using Excel or another spreadsheet program that can read UTF-16. In time we might add UTF-16 handling (a big job, though), but for now we do need to detect the format and display an error message.
In time, people should stop using UTF-16 too. It is an abomination, particularly for Latin-based alphabets!
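A file can be UTF-16 without carrying a byte order mark, so a BOM check alone may not catch it. The point about Latin-based alphabets suggests a cheap heuristic: Latin text in UTF-16 has a zero byte in every other position, while UTF-8 text essentially never contains NUL bytes. The following is a minimal C++ sketch of that idea, not Ravel's actual code; the function name `looksLikeUtf16` and the 0.3 threshold are hypothetical choices for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Ravel): guess whether a byte sample
// from the start of a file is BOM-less UTF-16 by counting zero bytes.
// The threshold is an assumed tuning value, not a known constant.
bool looksLikeUtf16(const std::string& sample, double threshold = 0.3)
{
  if (sample.size() < 2) return false;
  std::size_t evenZeros = 0, oddZeros = 0;
  for (std::size_t i = 0; i < sample.size(); ++i) {
    if (sample[i] != '\0') continue;
    if (i % 2 == 0)
      ++evenZeros;   // zero high byte first: big-endian pattern
    else
      ++oddZeros;    // zero high byte second: little-endian pattern
  }
  const double pairs = sample.size() / 2.0;
  // Either pattern appearing in a large share of byte pairs is suspicious.
  return evenZeros / pairs > threshold || oddZeros / pairs > threshold;
}
```

A caller would read the first few kilobytes of the file, and if `looksLikeUtf16` returns true, show an error asking the user to convert the file to UTF-8 rather than handing the bytes to the parser.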
Found a solution to prevent crashes or infinite loops on bad input data (trying to parse a UTF-16 file as UTF-8 is effectively bad data). I also added UTF-16/32 BOM detection, which throws an error for now. UTF-16/32 support will be raised as a separate feature ticket for consideration later.
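The thread does not show the actual change, so the following is only a minimal C++ sketch of what BOM detection followed by a throw might look like. The names `Encoding`, `detectBOM` and `rejectUnsupportedEncoding` are hypothetical. The BOM byte sequences themselves are the standard ones; note that the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE), so the 4-byte patterns must be tested first.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical encoding tags for this sketch.
enum class Encoding { UTF8, UTF16LE, UTF16BE, UTF32LE, UTF32BE, Unknown };

// Inspect the first bytes of a file for a byte order mark.
Encoding detectBOM(const std::string& head)
{
  auto startsWith = [&](const char* bom, std::size_t n) {
    return head.size() >= n && head.compare(0, n, std::string(bom, n)) == 0;
  };
  // 4-byte UTF-32 BOMs first, since FF FE is a prefix of FF FE 00 00.
  if (startsWith("\x00\x00\xFE\xFF", 4)) return Encoding::UTF32BE;
  if (startsWith("\xFF\xFE\x00\x00", 4)) return Encoding::UTF32LE;
  if (startsWith("\xFE\xFF", 2)) return Encoding::UTF16BE;
  if (startsWith("\xFF\xFE", 2)) return Encoding::UTF16LE;
  if (startsWith("\xEF\xBB\xBF", 3)) return Encoding::UTF8;
  return Encoding::Unknown;  // no BOM; caller assumes UTF-8
}

// Throw for encodings the parser cannot handle yet, instead of letting
// UTF-16/32 bytes reach the UTF-8 parser as "bad data".
void rejectUnsupportedEncoding(const std::string& head)
{
  switch (detectBOM(head)) {
  case Encoding::UTF16LE:
  case Encoding::UTF16BE:
    throw std::runtime_error("UTF-16 files are not supported; please convert to UTF-8");
  case Encoding::UTF32LE:
  case Encoding::UTF32BE:
    throw std::runtime_error("UTF-32 files are not supported; please convert to UTF-8");
  default:
    break;
  }
}
```

In the import path, the caller would catch the exception and display its message, which gives the "detect the format and display an error message" behaviour without attempting to parse the file.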
Data files.
Movies of crashes
CrashOnFileImport20240818A.mp4
https://drive.google.com/file/d/1WcJMo_R4ntjVn7CKRwpJrP117NPaY1Hx/view?usp=drive_web
CrashOnFileImport20240818B.mp4
https://drive.google.com/file/d/1HBVbIryvIZzwt0Sc7NOy6XpS5Newlr6c/view?usp=drive_web