Hi
I represent a very small transcription company. We want to train acoustic models with the TEDLIUM corpus and other datasets, because we cannot afford the paid-for corpora. TEDLIUM and other such datasets carry a CC licence with non-commercial and no-derivatives clauses, so I am unsure whether they can be used for commercial purposes. The data itself would not be used commercially, only the acoustic models trained on it. Can we say that this does not breach the licence terms? Also, since an acoustic model is just a collection of statistical data, can it be considered not to be a derivative work? What is your take on this?
This is not the right place to ask such questions. If you are in doubt, ask a lawyer.