Anonymous, 2011-01-12
I would think that parts of the proposed program are possible: at the simplest level, you can check every word in the subtitles against a database of known bad words.
However, have you thought about movies with no subtitles?
Or, more importantly, how are you going to cover up the sounds when you are only parsing a text file? The only suitable approach would be to write a plugin of some sort for the media player, but then you lose generality, and it doesn't seem to be what you're doing in the first place. A possible workaround would be to have the program play some "beep.wav" when it thinks a bad word is about to be uttered, but the original word would still be audible; at best that is an unprofessional hack that is not likely to amaze anyone.
Lastly, any algorithm used to estimate the timing of individual words within a subtitle would need to be nearly flawless, with error rates perhaps in the 0.0001% range. If you can't design an algorithm that accurate before you start the project, I would reconsider it.
If you have suitable ways to manage all these issues, however, I may help you with this project if my spare time allows.
Hi,
I have thought about this. I think the easiest way would be to simply mute the sound briefly. There are languages, such as AutoIt, that work via Wine on Linux, or natively on Windows. It would be possible to use Perl, or any language, to generate a simple file that tells the program when to mute. The good part about AutoIt is that it can "know" when a program is paused or playing; that could be done simply by checking for windows whose titles contain "paused".
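The "generate a file that tells the program when to mute" step could look something like this. A minimal sketch in Python, assuming `.srt`-style input; the `BAD_WORDS` set, the function names, and the crude tokenizer are my illustrative assumptions, not part of the poster's design:

```python
import re

# Hypothetical word list; a real tool would load this from a database or file.
BAD_WORDS = {"shit"}

# SRT timestamp pair, e.g. "00:01:36,884 --> 00:01:39,428"
TS = re.compile(r"(\d+):(\d+):(\d+),(\d+) --> (\d+):(\d+):(\d+),(\d+)")

def to_seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0

def mute_schedule(srt_text):
    """Return (start, end) pairs, in seconds, for every cue containing a bad word."""
    schedule = []
    for block in srt_text.strip().split("\n\n"):
        m = TS.search(block)
        if not m:
            continue
        # Crude tokenizer: lowercase, keep letters/apostrophes, strip quote marks.
        words = {w.strip("'") for w in re.findall(r"[a-z']+", block[m.end():].lower())}
        if BAD_WORDS & words:
            g = m.groups()
            schedule.append((to_seconds(*g[:4]), to_seconds(*g[4:])))
    return schedule
```

The resulting list of windows is exactly the "simple file" described above: a front-end in any language can write it out, and the mute controller only has to read it back.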
The front-end, as I mentioned, could really be done in any language. Simply direct the user to http://subscene.com/filmsearch.aspx?q=uncle+buck, replacing "uncle+buck" with the title of the movie they want. The other option is writing a native subtitle extractor, which would be extra work and hassle.
Example: a typical cue from a subtitle (.srt) file:

00:01:36,884 --> 00:01:39,428
- You're thinking of ''shit.''
- Right.
Seeing how subtitles generally do not contain long sentences, it would not be hard to mute the word. Here, five words are spoken over about 2.5 seconds, giving each word roughly 0.5 seconds; since ''shit'' is the fourth word, the mute would have to kick in around 00:01:38,410.
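That per-word estimate can be written down directly. A minimal sketch, assuming (as the post does) that the cue's duration is split evenly across its words; the helper name is mine:

```python
def word_mute_window(start, end, words, target):
    """Estimate when `target` is spoken by dividing the cue evenly among its words.

    Returns (mute_start, mute_end) in seconds. Assumes every word takes the
    same amount of time, which is only a rough approximation of real speech.
    """
    per_word = (end - start) / len(words)
    i = words.index(target)
    return start + i * per_word, start + (i + 1) * per_word

# The cue above: 00:01:36,884 --> 00:01:39,428, five words, "shit" is fourth.
window = word_mute_window(96.884, 99.428,
                          ["you're", "thinking", "of", "shit", "right"], "shit")
```

For this cue the window comes out to roughly 98.41–98.92 s, i.e. the mute starts around 00:01:38,410.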
This would be fairly trivial to implement in a GStreamer plugin/pipeline (C, Vala, Python, or others, I'm sure). For example, the playbin2 element is an "all-in-one" media player element that supports lots of formats. It has a "text-sink" property, which you can use to get the text into your code, and an "audio-sink" property, which you can use to control the audio (mute/beep/whatever). GStreamer also works on at least Linux and Windows. My guess is you could write the whole thing, using the method you described, in a few hundred lines of Python code.
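Whichever element ends up controlling the audio, the decision logic itself is backend-neutral: given the current playback position and a schedule derived from the subtitles, decide whether the audio should be silenced right now. A minimal sketch (the names here are mine, not part of any GStreamer API):

```python
def should_mute(position, schedule):
    """True if the playback position (in seconds) falls inside any mute window."""
    return any(start <= position < end for start, end in schedule)

# With a schedule of [(98.41, 98.92)], a tick at 98.5 s mutes; one at 97.0 s does not.
```

The player-side glue then reduces to polling the position on a timer (or a pipeline callback) and setting the backend's mute property from this function's return value.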
If you make this as a GStreamer plugin, you will have the advantage of it working with any player that supports GStreamer as a backend (e.g. Totem, among others), and you won't have to write your own player if you don't want to.
I'd probably be willing to help out a bit if you go this route.
P.S. I'm still not sure you could guarantee finding the word every time using your method. You could also analyze the waveform for the specified time period and maybe count words by peaks and valleys in the audio, or something along those lines. I have no clue whether this would work or not, but you could get the data from the GStreamer pipeline in any case.
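The peak-counting idea can at least be prototyped without any media framework. A crude short-time-energy pass over raw mono samples that counts loud "bursts"; the function name, frame size, and threshold are all illustrative assumptions, and real speech would need something far more robust:

```python
def count_bursts(samples, rate, frame_ms=20, threshold=0.1):
    """Count contiguous loud regions ("bursts") in a mono sample stream.

    A burst begins when a frame's mean absolute amplitude rises above
    `threshold` and ends when it drops back below. On clean audio, bursts
    roughly track words or syllables; on real recordings this is very naive.
    """
    frame = max(1, int(rate * frame_ms / 1000))
    bursts, in_burst = 0, False
    for i in range(0, len(samples), frame):
        energy = sum(abs(s) for s in samples[i:i + frame]) / frame
        if energy > threshold and not in_burst:
            bursts, in_burst = bursts + 1, True
        elif energy <= threshold:
            in_burst = False
    return bursts

# Two loud stretches separated by silence should count as 2 bursts.
signal = [0.5] * 100 + [0.0] * 100 + [0.5] * 100
```

If the burst count for a cue matches its word count, the even-split timing estimate could be replaced with the actual burst boundaries pulled from the pipeline.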
Cheers
http://www.gstreamer.net/data/doc/gstreamer/head/gst-plugins-base-plugins/html/gst-plugins-base-plugins-playbin2.html
http://projects.gnome.org/totem/
http://gstreamer.freedesktop.org/apps/