Hi,
I need to build a system for managing and extracting
scripts from web pages.
The problem is implementing a "cleaning" function: it
should extract the scripts into one or more external
files, strip those scripts from the page, and link the
cleaned HTML page to the external script file. How can
I use the HTML parser to remove the scripts from a page
and get the HTML back without them?
Example:
Page.html (with JavaScript) -----> Page.html
(without JavaScript) + file.js
Thanks,
Francesco
You should be able to subclass ScriptTag so that it both writes the script to disk when doSemanticAction() is called and returns nothing when the page is converted back to HTML with toHtml().
See the documentation for PrototypicalNodeFactory.
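To make the Page.html -> Page.html + file.js transformation concrete, here is a rough, library-free sketch using only java.util.regex. It is not the HTML Parser API and not the ScriptTag subclass approach described above; the class name ScriptExtractor, its Result holder, and the extract() method are hypothetical names made up for illustration. Regular expressions are fragile on real-world HTML (for example a "</script>" inside a JavaScript string will confuse them), so treat this only as an illustration of the intended input and output.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: moves inline script code out of an HTML string. */
final class ScriptExtractor {

    /** Cleaned page plus the script code collected from it. */
    static final class Result {
        final String html;
        final String script;
        Result(String html, String script) {
            this.html = html;
            this.script = script;
        }
    }

    // group 1 = attributes of the opening tag, group 2 = script body
    private static final Pattern SCRIPT = Pattern.compile(
            "<script\\b([^>]*)>(.*?)</script\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Detects a src attribute, i.e. a script that is already external.
    private static final Pattern SRC_ATTR = Pattern.compile(
            "(?:^|\\s)src\\s*=", Pattern.CASE_INSENSITIVE);

    static Result extract(String html, String jsFileName) {
        Matcher m = SCRIPT.matcher(html);
        StringBuffer cleaned = new StringBuffer();
        StringBuilder js = new StringBuilder();
        boolean linked = false;
        while (m.find()) {
            if (SRC_ATTR.matcher(m.group(1)).find()) {
                // Already external: copy the tag through unchanged.
                m.appendReplacement(cleaned, Matcher.quoteReplacement(m.group()));
                continue;
            }
            js.append(m.group(2).trim()).append('\n');
            // The first inline script becomes the link; later ones are dropped.
            String replacement = linked
                    ? ""
                    : "<script src=\"" + jsFileName + "\"></script>";
            linked = true;
            m.appendReplacement(cleaned, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(cleaned);
        return new Result(cleaned.toString(), js.toString());
    }

    public static void main(String[] args) {
        String page = "<html><head><script>var a = 1;</script></head>"
                + "<body><p>Hi</p></body></html>";
        Result r = extract(page, "file.js");
        System.out.println(r.html);   // page with the inline code replaced by a link
        System.out.println(r.script); // code that would be written to file.js
    }
}
```

The caller would still have to write Result.script to file.js and Result.html back to Page.html. With the parser itself, the same two steps would live in the ScriptTag subclass: saving the code in doSemanticAction() and emitting nothing (or a link) from toHtml().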