Name | Modified | Size | Downloads / Week |
---|---|---|---|
Browser Services.zip | 2019-02-28 | 23.1 kB | |
README.txt | 2019-02-22 | 4.3 kB | |
browser_interface.py | 2019-02-22 | 11.9 kB | |
interface_test.py | 2019-02-20 | 959 Bytes | |
Totals: 4 Items | | 40.2 kB | 0 |
This Firefox addon allows an external application, such as a Python script, to control the browser through the command line.

The original intent of this program was to automate "masked" downloads. A masked download starts by requesting a normal page load, as a user would, complete with ads. Once the page has been completely downloaded, a second download request is made. Since the page has been cached, this download can be served from the cache, effectively masking it from the server. Judicious use of an ad-blocker is suggested, as too many ads may slow down the downloading excessively. Since this mimics normal browsing, web scraping can be done without triggering bot defences.

Native messaging is not used because applications that use it are extremely difficult to debug. By contrast, it is very easy to send a new tab request using Python's webbrowser library. The message is clearly visible in the address bar and, since the Python script is not running as a subprocess of the browser, it is very easy to debug.

A message is sent to the browser by requesting it to open a new tab with a well-formed but known inoperative URL, 'http://example.com/'. The webRequest API is used to listen for either the error caused by this URL or a match with the URL. A JSON-structured list of task requests follows the URL. Inside the browser, the URL is stripped off and the JSON elements are saved.

The applications that this program was developed for, insider trading filters for the Canadian and US stock markets, require thousands of files to be downloaded. The SEC EDGAR site requires that download requests be limited to 10 per second. Any faster and the user's machine will be blocked. Rather than sending individual requests, a list of requests is sent and, once the list is complete, the browser sequences through it. This makes it easy to properly time the requests with minimal code.
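The messaging mechanism described above can be sketched in a few lines of Python. This is only an illustration, not the code in browser_interface.py: the function names, the layout of the JSON list (client ID first, then the tasks) and the percent-encoding of the JSON are all assumptions. Only the marker URL and the use of the webbrowser library come from the text.

```python
import json
import webbrowser
from urllib.parse import quote, unquote

BASE_URL = "http://example.com/"  # marker URL named in the text


def build_message_url(client_id, tasks):
    """Return the marker URL followed by a JSON list: [client_id, task, ...].

    Assumption: the JSON is percent-encoded so it survives in a URL.
    """
    payload = json.dumps([client_id, *tasks], separators=(",", ":"))
    return BASE_URL + quote(payload, safe="")


def parse_message_url(url):
    """Mimic the browser side: strip the marker URL and decode the JSON."""
    if not url.startswith(BASE_URL):
        raise ValueError("not a message URL")
    return json.loads(unquote(url[len(BASE_URL):]))


def send_message(client_id, tasks):
    """Ask the default browser to open the message URL in a new tab."""
    return webbrowser.open_new_tab(build_message_url(client_id, tasks))
```

Calling build_message_url and printing the result shows exactly what would appear in the address bar, which is the debugging advantage mentioned above. send_message is the only function that touches the browser.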
The insider filtering programs mentioned above are available at https://sourceforge.net/projects/investortools/

Multiple applications can use the browser simultaneously. Each application sends a unique client ID at the beginning of every message. The ID is used both for assembling the list of requests and for internal tracking within the browser.

Although the primary purpose of this addon is masked downloads, cross-origin and same-origin downloading are also available. This completely eliminates the need for an HTTP library for downloading.

Messaging from the browser to the external application is done by requesting the browser to place signal files in the download folder. All task requests for one client are done synchronously, so that tasks don't overlap. If a download request is immediately followed by a signal write request, the downloaded file will be closed and accessible by the time the signal file name appears in the directory.

A simple protocol using two signal files lets the external application determine when all tasks have been completed without risk of race conditions. If "Signal_1" is present in the directory, the application directs the browser to write "Signal_2" when the list of tasks is completed. Because the files are written synchronously, the presence of "Signal_2" means that all tasks are completed, and the "Signal_1" file can be removed and reused as a signal. Later, when "Signal_1" appears, "Signal_2" can be removed and reused in the same way.

Each request consists of two JSON elements: the client name as a string, and an object containing two elements, the service name as a string and another object containing the parameters associated with the service. The parameter objects match the options of the underlying WebExtension API, plus any additional options such as the delay between requests. Python primitives and objects map almost perfectly onto JSON objects.
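The two-file signalling protocol described above can be sketched from the application's side as follows. This is a minimal illustration: the helper names are hypothetical, and only the file names "Signal_1" and "Signal_2" and the alternation rule come from the text. The request that asks the browser to write the signal file is not shown.

```python
import os

SIGNALS = ("Signal_1", "Signal_2")  # file names taken from the text


def next_signal(download_dir):
    """Pick the signal file the browser should write after the next batch.

    If Signal_1 is already present, ask for Signal_2; otherwise Signal_1.
    """
    if os.path.exists(os.path.join(download_dir, SIGNALS[0])):
        return SIGNALS[1]
    return SIGNALS[0]


def batch_done(download_dir, expected):
    """Return True once `expected` exists, clearing the other signal file.

    Removing the older file frees its name for the batch after this one.
    """
    if not os.path.exists(os.path.join(download_dir, expected)):
        return False
    other = SIGNALS[1] if expected == SIGNALS[0] else SIGNALS[0]
    try:
        os.remove(os.path.join(download_dir, other))
    except FileNotFoundError:
        pass  # first batch: there is no older signal to clear
    return True
```

Because one signal file is always left in place until the next one appears, the application never has to delete a file and then wait for a file of the same name, which is where a race could otherwise occur.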
Because Python primitives and objects map so directly onto JSON, the full range of options for each download service can be used.

For convenience, a BrowserInterface class is provided. It has two methods: "sendRequests", which sends a list of requests to the browser, and "poll", which returns true when all requests are done. The list of requests passed to sendRequests is used to construct URL packets, which are then sent to the browser. The method also handles the signalling protocol.
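A typical calling pattern for such a class might look like the sketch below. It assumes, without having checked browser_interface.py, that sendRequests takes the request list as its only argument and that poll takes no arguments. run_batch and FakeInterface are hypothetical names introduced here, and FakeInterface exists only so the helper can be exercised without a browser.

```python
import time


def run_batch(interface, requests, interval=0.5, timeout=60.0):
    """Send `requests`, then poll until done. Return False on timeout."""
    interface.sendRequests(requests)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if interface.poll():
            return True
        time.sleep(interval)  # avoid a busy loop between polls
    return False


class FakeInterface:
    """Stand-in with the same two method names, for illustration only."""

    def __init__(self, polls_needed):
        self.sent = None
        self.polls_left = polls_needed

    def sendRequests(self, requests):
        self.sent = list(requests)

    def poll(self):
        # Report completion after a fixed number of polls.
        self.polls_left -= 1
        return self.polls_left <= 0
```

With the real class, an instance of BrowserInterface would take the place of FakeInterface. The timeout keeps a script from waiting forever if the browser is closed before the batch finishes.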