Name                  Modified    Size       Downloads / Week
Browser Services.zip  2019-02-28  23.1 kB
README.txt            2019-02-22  4.3 kB
browser_interface.py  2019-02-22  11.9 kB
interface_test.py     2019-02-20  959 Bytes
Totals: 4 Items                   40.2 kB    0

This Firefox addon allows an external application like a Python script to control the browser through the command line.
The original intent of this program was to automate "masked" downloads. A masked download is done by requesting a normal page
load, as a user would do, complete with ads. Once the page has been completely downloaded, a second download request is made.
Since the page has been cached, the download can be served from the cache, effectively masking it from the server. Judicious use of
an ad-blocker is suggested as too many ads may slow down the downloading excessively. Since this mimics normal browsing, web
scraping can be done without triggering bot defences.
Native messaging is not used because of the extreme difficulty in debugging applications using it. By contrast, it is very easy to
send a new tab request using Python's webbrowser library. The message is clearly seen in the address bar and since the Python script is
not running as a subprocess of the browser, it is very easy to debug.
A message is sent to the browser by requesting it to open a new tab with a well-formed but known-inoperative URL,
'http://example.com/'. The webRequest API is used to listen for either the error caused by this URL or a match with the URL.
A JSON-structured list of task requests follows the URL. Inside the browser, the URL is stripped off and the JSON elements are saved.
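
The message format described above can be sketched as follows. This is only an illustration: it assumes the JSON list is
appended directly after the marker URL and percent-encoded, which the text does not specify, and the function names
build_message_url, parse_message_url and send_message are hypothetical. The real packet layout is whatever
browser_interface.py and the addon agree on.

```python
import json
import webbrowser
from urllib.parse import quote, unquote

MARKER_URL = "http://example.com/"  # the known-inoperative URL named in the text


def build_message_url(tasks):
    """Append a JSON-encoded task list to the marker URL.

    Percent-encoding the payload is an assumption made here so the URL
    stays well formed; the real addon may expect a different encoding.
    """
    payload = json.dumps(tasks, separators=(",", ":"))
    return MARKER_URL + quote(payload, safe="")


def parse_message_url(url):
    """Mirror of the browser side: strip the marker URL, then decode the JSON."""
    if not url.startswith(MARKER_URL):
        raise ValueError("not a message URL")
    return json.loads(unquote(url[len(MARKER_URL):]))


def send_message(tasks):
    """Ask the default browser to open the message URL in a new tab."""
    return webbrowser.open_new_tab(build_message_url(tasks))
```

Because the message is an ordinary URL, it can be printed or pasted into the address bar by hand while debugging.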
The applications that this program was developed for, insider trading filters for the Canadian and US stock markets, require
thousands of files to be downloaded. The SEC EDGAR site requires that download requests be limited to 10 per second. Any faster
and the user's machine will be blocked. Rather than sending individual requests, a list of requests is sent and, once the list is
complete, the browser steps through it in sequence. This makes it easy to properly time the requests with minimal code.
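
As a rough sketch of the pacing arithmetic, the limit of 10 requests per second translates into a gap of at least 100 ms
between requests. The helper names and the "url" and "delay_ms" keys below are hypothetical placeholders, not the addon's
actual parameter names.

```python
import math


def delay_ms_for_rate(max_per_second):
    """Return the smallest whole-millisecond gap that stays within the rate limit."""
    if max_per_second <= 0:
        raise ValueError("max_per_second must be positive")
    return math.ceil(1000 / max_per_second)


def build_download_list(urls, max_per_second=10):
    """Build one task per URL, each carrying the same pacing delay.

    The keys "url" and "delay_ms" are hypothetical; the real option names
    are defined by the addon.
    """
    delay = delay_ms_for_rate(max_per_second)
    return [{"url": url, "delay_ms": delay} for url in urls]


def minimum_duration_s(task_count, max_per_second=10):
    """Lower bound on how long the browser needs to work through the list."""
    gaps = max(task_count - 1, 0)
    return gaps * delay_ms_for_rate(max_per_second) / 1000
```

Under these assumptions, a list of 5,000 files paced at 10 requests per second takes at least about 500 seconds.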
These insider filtering programs are available at https://sourceforge.net/projects/investortools/
Multiple applications can use the browser simultaneously. This is done by each application sending a unique client ID. This ID
is sent at the beginning of every message and is used for both assembling the list of requests and for internal tracking within
the browser.
Although the primary purpose of this addon is masked downloads, cross origin downloading and same origin downloading are also
available. This completely eliminates the need for an HTTP library for downloading.
Messaging from the browser to the external application is done by requesting the browser to place signal files in the download
folder. All task requests for one client are done synchronously such that tasks don't overlap. If a download request is
immediately followed by a signal write request, the downloaded file will be closed and accessible when the signal file name
appears in the directory.
A simple protocol using two signal files can be used by the external application to determine when all tasks have
been completed without risk of race conditions. If "Signal_1" is present in the directory, the application will direct the
browser to write "Signal_2" when the list of tasks is completed. Because the files are written synchronously, the presence of
"Signal_2" means that all tasks are completed and the "Signal_1" file can be removed and reused as a signal. Later, when
"Signal_1" appears, "Signal_2" can be removed and reused as a signal.
Each request consists of two JSON objects: the client name string and an object containing two elements, the service name as a
string and another object containing parameters associated with the service. The parameter objects will match the options for
the underlying WebExtension API as well as any additional options such as the delay between requests. Python primitives and
objects map almost perfectly onto JSON objects. This makes it possible to use the full range of options for each download
service.
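
To illustrate the shape of a request, here is a minimal sketch. The key names "service" and "params", the service name
"download" and the helper make_request are hypothetical; the text only says that a request pairs the client name with an
object holding a service name and a parameter object. The parameter names shown (url, filename, saveAs, conflictAction)
are options of the WebExtension downloads.download() call.

```python
import json


def make_request(client_id, service, params):
    """Pair a client name with a service object.

    The "service" and "params" key names are placeholders chosen for this
    sketch; the addon defines the real layout.
    """
    return [client_id, {"service": service, "params": params}]


request = make_request(
    "insider_filter_1",  # hypothetical client ID
    "download",          # hypothetical service name
    {
        "url": "https://example.org/filing.txt",
        "filename": "filing.txt",
        "saveAs": False,               # Python False becomes JSON false
        "conflictAction": "overwrite",
    },
)

encoded = json.dumps(request)
```

Python strings, numbers, booleans, lists and dicts serialize directly, which is why the parameter object can be written
as an ordinary dict.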
For convenience, a BrowserInterface class is provided. It has two methods: "sendRequests", which sends a list of requests to the
browser, and "poll", which returns true when all requests are done. The sendRequests method takes a list of requests,
constructs the URL packets, sends them to the browser and handles the signalling protocol.
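
A polling loop around "poll" might look like the sketch below. The helper wait_until_done, its interval and its timeout
are assumptions added for illustration; it accepts any zero-argument callable so that it does not depend on the exact
signature of the BrowserInterface methods, which the text does not give.

```python
import time


def wait_until_done(poll, interval=0.5, timeout=60.0):
    """Call poll() repeatedly until it returns a true value or time runs out.

    Returns True if poll() reported completion, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if poll():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Assuming "poll" takes no arguments, the bound method of a BrowserInterface instance could be passed in directly after
calling sendRequests.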
Source: README.txt, updated 2019-02-22