ALTER TABLE lang_constants DROP INDEX cons_name , ADD UNIQUE INDEX cons_name (constant_name ASC);
This is an issue with regards to the database representation of information, and primarily concerns when people add custom/additional translation.
Adding a uniqueness constraint to the database would provide an additional check to make sure that duplicate constant's don't accidentally get defined when definitions are added using the interface in "Administration > Other > Language".
If duplicates entries exist, then the results of the query in translations.inc.php
$sql="SELECT * FROM lang_definitions JOIN lang_constants ON " .
"lang_definitions.cons_id = lang_constants.cons_id WHERE " .
"lang_id=? AND constant_name = ? LIMIT 1";
Could be unpredictable.
However with regards to the spreadsheet, when just the spreadsheet is loaded, all of the constants "should" be unique. However, if for some reason the same constant had two rows in the spreadsheet (error with the perl script somewhere, manual edits that were done incorrectly...) This extra constraint would lead to an error being generated when the spreadsheet was loaded, rather than the spreadsheet being loaded and translations potentially being unpredictable.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I think you might be able to with a couple of messy queries using subqueries. One to update lang_definitions.cons_id to point to just one of the duplicates, and another to remove the duplicated constants. I'd have to play with it for a bit.
Then you might be left with some duplicated (lang_id, cons_id) pairs in the lang_definitions table, for which the same problem of possible duplicates exists.
Except there may also be conflict resolution needed for when multiple lang_definitions exist that might map to the same constant_name with different definitions. Which definition should win?
I am certain that appropriate queries are possible, but to make them "one-size fits all" would be difficult.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
It doesn't really matter which one wins, since the translation lookup will not have a predictable result anyway. Throwing either one away will be better than doing nothing.
Yes it's kinda messy but I suspect do-able. Main thing is to make sure upgrading is robust.
What I understand is that there might be included duplicates in the spreadsheet. As far as I made translations, the only duplicates in a rows are the ones with a " " space at the end of a Constant and so also in Definitions. But translations wise for the used scripts or I can't be of any help. For manual added translations, as long as a sentence/constant is translated there is no need for uniqueness. The translated words represent the Constant and the constant is used for interactions and not the translations. Correct me if I am wrong.
If by manual you mean to xl(..) translation characters as a part of the software this extra constant will be included or not included if needed due to not yet included in the Constant column.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
You having said this, I am out! Please let's hope other developers will jump in to support the idea of uniqueness. I did translate some different constants but were definitely in luck not to encounter problems due to uniqueness.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Pieter, thanks for your interest and desire to help.
The basic issue in my mind is that the table should have been enforcing uniqueness all along. If all that people are using is just the spreadsheet, this is a non-issue, but if someone is trying to use definitions from additional sources, there may be a problem.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
The only manual problem that exists than is your additional translations through Administration => Others -> Translations etc....
But these new translations have to be done in the software ...php ...txt(?) ....sql files, etc. and after the xl(...) is added give it the translation definition. This is like creating a personal local translation-file.
BTW for translation there is used: xl(..) xl t(...) xl a(...) I did not check for completeness, but there seemed to be a reason for this different translation options.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
<script LANGUAGE="JavaScript"> alert('<?php echo addslashes($db_data)); ?>'); // escape potential quotations marks alert('<?php echo xls('Quotes and Apostrophes in translations may cause problems'); ?> // If the return value for a translation contains quotes or apostrophes using just xl // would result in white screen of death. Prefer xls, but addslashes(xl('my string')) // will prevent problems. </script> - See more at: http://www.open-emr.org/wiki/index.php/Codebase_Security#SQL-Injection_and_Cross-Scripting_Prevention
White screen of death... never had reconned this was an option, but might explain some forums on white screen.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I'm guessing that the perl scripts work to maintain uniqueness. Changing the index to unique would provide additional "protection" when manual edits are made
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
For testing:
-Ensure a new install works.
-Will want to test to ensure funny things(ie. especially a crash) don't happen when bringing stuff in from lang_custom if for some reason there are two items with same constant lingering in there.
Indeed this is odd. How many Chars is mediumtext? It could explain this difference in layout of the screens using a translation. Sometimes it is very hard to find a translation with equal or less char than the original text for a translation, but in can't be that a translation is much longer than the original.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
To show an example, here is a entry that is 425 characters long:
"Alternatively, you may use the search page to upload an electronic remittance (X12 835) file that you have obtained from your payer or clearinghouse. You can do this by clicking the Browse button and selecting the file to upload, and then clicking Search to perform the upload and display the corresponding invoices. In this case the other parameters mentioned above do not apply and will be ignored. Uploading saves the file but does not yet process its contents -- that is done separately as described below."
Note that it won't break for users that use the standard translations or work in the standard translation spreadsheet and the first 255 characters are enough to make it unique. But will break for users that do local translations since they will not see the entire constant there. Perhaps we should increase the size of the VARCHAR here (agree should not go to a MEDIUMTEXT since then lose index/performance).
Is there a reason why we don't have a uniqueness index on the constant_name in lang_constants?
It seems like the behavior of xl would be inconsistent if more than one entry exists for the same constant_name.
I have been involved for a long time in Translations for OpenEMR.
Can you give a real world example for uniqueness? I do not see in translations spreadsheet and connections-software related what need to be changed.
I'm proposing something akin to
This is an issue with regards to the database representation of information, and primarily concerns when people add custom/additional translation.
Adding a uniqueness constraint to the database would provide an additional check to make sure that duplicate constant's don't accidentally get defined when definitions are added using the interface in "Administration > Other > Language".
If duplicates entries exist, then the results of the query in translations.inc.php
Could be unpredictable.
However with regards to the spreadsheet, when just the spreadsheet is loaded, all of the constants "should" be unique. However, if for some reason the same constant had two rows in the spreadsheet (error with the perl script somewhere, manual edits that were done incorrectly...) This extra constraint would lead to an error being generated when the spreadsheet was loaded, rather than the spreadsheet being loaded and translations potentially being unpredictable.
Be sure to handle the case of upgrading a database where cons_name has non-unique values for some reason. You don't want the upgrade script to fail.
Rod
http://www.sunsetsystems.com/
Except if there are non-unique constants, then there isn't really a good "automatic" way to resolve conflicts.
I think you might be able to with a couple of messy queries using subqueries. One to update lang_definitions.cons_id to point to just one of the duplicates, and another to remove the duplicated constants. I'd have to play with it for a bit.
Then you might be left with some duplicated (lang_id, cons_id) pairs in the lang_definitions table, for which the same problem of possible duplicates exists.
Nothing is easy. :)
Rod
http://www.sunsetsystems.com/
Except there may also be conflict resolution needed for when multiple lang_definitions exist that might map to the same constant_name with different definitions. Which definition should win?
I am certain that appropriate queries are possible, but to make them "one-size fits all" would be difficult.
It doesn't really matter which one wins, since the translation lookup will not have a predictable result anyway. Throwing either one away will be better than doing nothing.
Yes it's kinda messy but I suspect do-able. Main thing is to make sure upgrading is robust.
Rod
http://www.sunsetsystems.com/
What I understand is that there might be included duplicates in the spreadsheet. As far as I made translations, the only duplicates in a rows are the ones with a " " space at the end of a Constant and so also in Definitions. But translations wise for the used scripts or I can't be of any help. For manual added translations, as long as a sentence/constant is translated there is no need for uniqueness. The translated words represent the Constant and the constant is used for interactions and not the translations. Correct me if I am wrong.
If by manual you mean to xl(..) translation characters as a part of the software this extra constant will be included or not included if needed due to not yet included in the Constant column.
You are wrong.
Both the constant and the definition are used in the lookup process.
You having said this, I am out! Please let's hope other developers will jump in to support the idea of uniqueness. I did translate some different constants but were definitely in luck not to encounter problems due to uniqueness.
Pieter, thanks for your interest and desire to help.
The basic issue in my mind is that the table should have been enforcing uniqueness all along. If all that people are using is just the spreadsheet, this is a non-issue, but if someone is trying to use definitions from additional sources, there may be a problem.
The only manual problem that exists than is your additional translations through Administration => Others -> Translations etc....
But these new translations have to be done in the software ...php ...txt(?) ....sql files, etc. and after the xl(...) is added give it the translation definition. This is like creating a personal local translation-file.
BTW for translation there is used: xl(..) xl t(...) xl a(...) I did not check for completeness, but there seemed to be a reason for this different translation options.
xlt and xla are wrappers for the base xl function.
An explanation of the purpose/function is available on the wiki.
http://www.open-emr.org/wiki/index.php/Codebase_Security#SQL-Injection_and_Cross-Scripting_Prevention
and also
https://github.com/openemr/openemr/blob/master/library/htmlspecialchars.inc.php#L103
<script LANGUAGE="JavaScript"> alert('<?php echo addslashes($db_data)); ?>'); // escape potential quotations marks alert('<?php echo xls('Quotes and Apostrophes in translations may cause problems'); ?> // If the return value for a translation contains quotes or apostrophes using just xl // would result in white screen of death. Prefer xls, but addslashes(xl('my string')) // will prevent problems. </script> - See more at: http://www.open-emr.org/wiki/index.php/Codebase_Security#SQL-Injection_and_Cross-Scripting_Prevention
White screen of death... never had reconned this was an option, but might explain some forums on white screen.
I can't think of one. Uniqueness should certainly be enforced one way or another.
Rod
http://www.sunsetsystems.com/
I'm guessing that the perl scripts work to maintain uniqueness. Changing the index to unique would provide additional "protection" when manual edits are made
Hi,
This is a "unique" situation where you may not need to spend much time worrying about upgrading the table. Could just place it in the database.sql file and then when users upgrade to the next official translation set, can fix it here:
https://github.com/openemr/openemr/blob/master/contrib/util/language_translations/buildLanguageDatabase.pl#L477
For testing:
-Ensure a new install works.
-Will want to test to ensure funny things(ie. especially a crash) don't happen when bringing stuff in from lang_custom if for some reason there are two items with same constant lingering in there.
Also, the perl scripts that build the translation spreadsheet are set up to not allow identical constants:
https://github.com/openemr/openemr/blob/master/contrib/util/language_translations/collectConstants.pl
https://github.com/openemr/openemr/blob/master/contrib/util/language_translations/combineConstantsSpreadsheet.pl
The perl script that builds the database tables from the spreadsheet does not explicitly ensure that(but is uses the spreadsheet built from above scripts and also does confirmations of constants to the official partner constant list):
https://github.com/openemr/openemr/blob/master/contrib/util/language_translations/buildLanguageDatabase.pl
-brady
OpenEMR
A very important item I forget to mention here; the change to database.sql is actually going to do nothing(it's good to do for clarity sake), since when it brings in the translation set on install, tables will get replaced by:
https://github.com/openemr/openemr/blob/master/contrib/util/language_translations/currentLanguage_utf8.sql
So, for testing, remember to change it there.
-brady
Last edit: Brady Miller 2014-04-17
Also,
While looking at the tables, doesn't it seem odd that constant name is limited to 255 while the definition is a mediumtext?
-brady
VARCHAR(255) or even a longer length varchar will have better indexing/lookup performance than MEDIUMTEXT.
Validation that no existing constants are longer than 255 would probably be a good thing to do.
Indeed this is odd. How many Chars is mediumtext? It could explain this difference in layout of the screens using a translation. Sometimes it is very hard to find a translation with equal or less char than the original text for a translation, but in can't be that a translation is much longer than the original.
To show an example, here is a entry that is 425 characters long:
"Alternatively, you may use the search page to upload an electronic remittance (X12 835) file that you have obtained from your payer or clearinghouse. You can do this by clicking the Browse button and selecting the file to upload, and then clicking Search to perform the upload and display the corresponding invoices. In this case the other parameters mentioned above do not apply and will be ignored. Uploading saves the file but does not yet process its contents -- that is done separately as described below."
Note that it won't break for users that use the standard translations or work in the standard translation spreadsheet and the first 255 characters are enough to make it unique. But will break for users that do local translations since they will not see the entire constant there. Perhaps we should increase the size of the VARCHAR here (agree should not go to a MEDIUMTEXT since then lose index/performance).
-brady
OpenEMR