Copyright (c) 2012-2014 Kirill Belyaev
* TeleScope - XML Message Stream Broker/Replicator Platform
* This work is licensed under the Creative Commons Attribution-NonCommercial 3.0 Unported License.
* To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/3.0/ or send
* a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA.
"The violent carriage of it
Will clear or end the business: when the oracle,
Thus by Apollo's great divine seal'd up,
Shall the contents discover, something rare
Even then will rush to knowledge. Go: fresh horses!
And gracious be the issue!"
Winter's Tale, Act 3, Scene 1. William Shakespeare
# WHY do you need it? #
The typical TeleScope deployment scenario is consistent or selective replication of an XML message stream from a single source publisher to multiple destinations for ETL (Extract, Transform, Load) purposes. The replication is set up by establishing a network of interconnected TeleScope brokers:
1. publish/distribute remote content from a remote publisher to local subscribers - all subscribers get the same consistent replica, because every subscriber receives every message in the stream (filtered at a single entry point)
2. publish/distribute remote content from a remote publisher to local subscribers - each subscriber can then filter the content it needs (filtered at a single entry point)
3. publish/distribute local XML content to a set of subscribers - each subscriber can then filter the messages it wants to save
Local distribution example:
Applications generate XML data that needs to be replicated to other applications/databases.
For example, a central weather application dumps continental US weather data in XML format, and you need to distribute specific geographic regions covered by that data to different nodes that then run ETL on it.
Run the supplied XMLConverter utility to continuously convert the XML data into the TeleScope XML format, ready for streaming.
Run TeleScope to publish the converted XML data to subscribers in other parts of the network segment.
Each TeleScope subscriber filters and saves only messages relevant to itself for further ETL.
TeleScope is an efficient XML message stream broker for intensive loads, written in C for the Fedora 17-18, Slackware 13-14 and Red Hat Enterprise Linux 6 (RHEL-6) Linux distributions.
The platform is intended to operate on single number or word values and is not meant for full-text XML stream analysis. TeleScope has an internal query language with a set of standard logical operators that allows relatively complex query expressions to be constructed. The platform has a pub-sub architecture and serves up to 128 simultaneously connected XML stream subscribers. The broker features a Continuous Query engine over the XML message stream. TeleScope provides a remote CLI to log in (Cisco-style, via telnet) and change or reset the query transaction on the current stream on the fly, in real time. It also reports query and subscriber statistics via a separate status port.
TeleScope is able to analyze a vast continuous stream of XML messages generated
by the publisher (sensor networks, log files converted into XML, GPS data, GIS or weather
data in XML, or anything else provided that it is valid XML) in real time, and to select and save
the messages that match the specified query pattern.
Besides its effective filtering mechanism, TeleScope scales well with
a large number of concurrent stream subscribers. Its performance has been evaluated under
intensive benchmarking and estimated to be suitable for real-world deployment
under heavy load with a large number of concurrent clients. The system is also able
to distribute the filtering computations among a network of pub-sub nodes and form
data stream meshes of various topologies.
NOTE: The TeleScope system has been compiled and tested ONLY on the latest Fedora 17 - 18, Slackware 13.37.0 - 14.0, Red Hat Enterprise Linux 6 (RHEL-6) Linux distributions.
Requirements: development tools including gcc and the make utility, plus the GNOME libxml2 headers (libxml2-devel)
untar the telescope-version-X.tar and cd into the project directory
type make clean to remove stale .o files
type make all in the current directory to compile TeleScope
the executable is installed in ./bin/telescope
copy the ./bin/telescope executable to the desired directory (typically /usr/local/bin/telescope )
Run-time Command-line options:
-s - turn on server mode
-l - turn logging on to file
-d - turn XML data file reading on to populate the Queue Table from the file - provide data file name
-h - hostname to connect to
-p - port number for the host to connect to
-f - filename to write the captured xml messages (mandatory: name for text data file)
-e - query expression to match in the xml data stream
Run in publisher/subscriber mode - connect to publisher and serve stream to subscribers
./bin/telescope -s -h xmlhub.com -p 8080 -f test.txt -e "type = UPDATE"
./bin/telescope -s -h xmlhub.com -p 8080 -f test.txt -e "AFI = IPV6"
Run in publisher mode - read data from a file and stream to subscribers
./bin/telescope -f /dev/null -e "" -d xmldataS.txt -s &
Run in subscriber mode
./bin/telescope -h volga -p 50000 -f data -e "station_id = 42362" &
./scripts/telescope-stop.sh - stop TeleScope
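The expressions used above ("type = UPDATE", "AFI = IPV6", "station_id = 42362") each compare one name with one value. The Python sketch below illustrates what such a single comparison could mean for one message. It is not TeleScope's query engine: the function names are hypothetical, logical operators are not covered, and the choice to check both element text and attribute values is an assumption drawn from the sample messages in this README, where type is an attribute and AFI and station_id are elements.

```python
import xml.etree.ElementTree as ET


def local_name(qualified):
    """Drop an ElementTree '{namespace}' prefix, if present."""
    return qualified.rsplit("}", 1)[-1]


def parse_simple_expression(expression):
    """Split 'name = value' into (name, value). Hypothetical helper."""
    name, separator, value = expression.partition("=")
    if not separator or not name.strip():
        raise ValueError("expected an expression of the form 'name = value'")
    return name.strip(), value.strip()


def matches(message, name, value):
    """Return True if any element text or attribute called `name` equals `value`.

    Assumption: TeleScope's real matching rules may differ from this.
    """
    root = ET.fromstring(message)
    for element in root.iter():
        if local_name(element.tag) == name and (element.text or "").strip() == value:
            return True
        for attribute, attribute_value in element.attrib.items():
            if local_name(attribute) == name and attribute_value == value:
                return True
    return False
```

A call such as matches(message, *parse_simple_expression("station_id = 42362")) then returns True for any message that carries that station id.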
TeleScope presently has a limited set of command-line options, provided at start-up, that direct its
mode of operation. No configuration file is supported at present, since all the required functionality
directives can be expressed via arguments to the executable. The following arguments are supported:
• -f (mandatory: name of the text data file): Indicates the name of the data file to which the
processed and matching XML messages are written.
• -h (optional: hostname of the publisher for the XML data stream): Indicates the hostname of
the XML data publisher to connect to in client (subscriber) mode.
• -p (optional: port of the XML data publisher): Indicates the XML publisher port to connect
to in client (subscriber) mode when -h is specified.
• -s (optional: publisher (server) mode flag): Instructs TeleScope to go into publisher (server)
mode, listening for incoming client (subscriber) connections on the default server port, which is
hardcoded to 50000.
• -l (optional: log flag): Instructs TeleScope to log its operation to the standard /var/log/telescope.log
location by default. If this option is not specified, or permissions to write into /var/log/ are
insufficient, logging is directed to /dev/null.
• -d (optional: name of the XML data file to read from): Indicates the name of the XML data file to read XML messages from.
• -e (mandatory: pattern matching expression): Indicates the actual XML filtering expression,
given in " " (double quotes).
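When TeleScope instances are started from a script, the rules above (mandatory -f and -e, -p only together with -h) can be checked before launching. The Python sketch below assembles an argument list from the documented flags only. The function name, its parameter names and the default executable path are illustrative assumptions and not part of TeleScope.

```python
def build_telescope_argv(data_file, expression, host=None, port=None,
                         server=False, log=False, xml_input=None,
                         executable="./bin/telescope"):
    """Assemble a TeleScope command line from the documented options.

    Hypothetical helper: it only mirrors the flags listed in this README.
    """
    if data_file is None or expression is None:
        raise ValueError("-f and -e are mandatory")
    if port is not None and host is None:
        raise ValueError("-p is only used together with -h")

    argv = [executable]
    if server:
        argv.append("-s")               # publisher (server) mode
    if log:
        argv.append("-l")               # log to /var/log/telescope.log
    if host is not None:
        argv += ["-h", host]            # publisher to connect to
    if port is not None:
        argv += ["-p", str(port)]       # publisher port
    if xml_input is not None:
        argv += ["-d", xml_input]       # XML data file to read from
    argv += ["-f", data_file, "-e", expression]
    return argv
```

The returned list can be handed directly to subprocess.Popen, which avoids shell quoting problems with the expression.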
A separate Status Thread in TeleScope enables the retrieval of some general information about
the running TeleScope instance. The Status Thread listens on port 50001 and dumps the following
info about the instance to the telnet connection:
• total number of messages processed
• number of messages matching the pattern
• ratio of above
• uptime info
• number of connected clients
• clients’ IP addresses
Example: telnet volga 50001
Connected to volga.
Escape character is '^]'.
STATUS DATA START:
Number of connected clients is:
Client IPs are:
STATUS DATA END:
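The status port can also be read from a script instead of telnet. The Python sketch below connects to the port, reads until the "STATUS DATA END:" marker or until the server closes the connection, and returns the lines between the two markers. The function names are hypothetical, and the sketch assumes nothing about the status lines beyond the two markers shown above.

```python
import socket

START_MARKER = "STATUS DATA START:"
END_MARKER = "STATUS DATA END:"


def extract_status_lines(text):
    """Return the non-empty lines between the START and END markers."""
    lines = [line.strip() for line in text.splitlines()]
    try:
        start = lines.index(START_MARKER)
        end = lines.index(END_MARKER, start + 1)
    except ValueError:
        raise ValueError("status markers not found in response") from None
    return [line for line in lines[start + 1:end] if line]


def fetch_status(host, port=50001, timeout=5.0):
    """Read one status dump from a running TeleScope instance."""
    received = b""
    with socket.create_connection((host, port), timeout=timeout) as connection:
        while END_MARKER.encode() not in received:
            chunk = connection.recv(4096)
            if not chunk:        # server closed the connection
                break
            received += chunk
    return extract_status_lines(received.decode("utf-8", errors="replace"))
```

For the host used in the example above, fetch_status("volga") would return the status lines as a list of strings.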
A separate CLI Thread in TeleScope makes it possible to log in to the running
instance and change the currently active transaction or shut down the running
TeleScope instance. The CLI Thread listens on port 50002. The default access password
telnet volga 50002
Connected to volga.
Escape character is '^]'.
available commands are :
help (h); exit (q); show transaction (st); change transaction (ct); reset transaction (rt); shutdown (sd)
AFI = IPV6
A distinctive feature of every TeleScope instance is its ability to start in one of two
operational modes: server (publisher) mode or client (subscriber) mode, depending on the specified command-line flags. In fact, TeleScope can even operate in dual mode, being
both a client and a server at once. The server mode of operation (-s flag) launches the TeleScope instance
as a server listening for incoming client connections on the specified port (port 50000 by default). The client mode of operation makes TeleScope a client that connects to the specified TeleScope server
instance (or XML aggregator) to receive the XML data stream. Each mode assumes that a specific data pattern
construct is specified on the command line to perform matching on the incoming stream.
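The relationship between flags and modes can be summarized in a few lines of Python. This is only a reading of the usage examples in this README (-s alone, -h alone, or both together); the function name and the returned labels are hypothetical, and the behaviour when neither flag is given is not documented here.

```python
def operating_mode(server_flag, host):
    """Classify a TeleScope invocation by its -s and -h options.

    Sketch based on the documented examples, not on the TeleScope sources.
    """
    if server_flag and host:
        return "publisher/subscriber"   # dual mode: -s together with -h
    if server_flag:
        return "publisher"              # -s only, e.g. streaming from a -d file
    if host:
        return "subscriber"             # -h (and -p) only
    return "unspecified"                # not covered by this README
```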
XML Message Structure:
TeleScope will process any valid XML message as long as it adheres to the following simple specification:
It should start with the following opening tag:
<XML_MESSAGE length="0001406"> - the length attribute indicates the total length of the message.
It should end with the following closing tag:
</XML_MESSAGE>
The XML messages in a data file should be separated by a newline ("\n").
The sample data file (sample-xmldata.txt) is provided in the distribution.
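The envelope described above can be produced and checked with a few lines of Python. The sketch below rests on two assumptions that have not been verified against the TeleScope sources or the XMLConverter utility: that the length attribute counts the bytes of the entire message including the opening and closing tags, and that it is zero-padded to seven digits as in the XML_MESSAGE samples. All function names are hypothetical.

```python
import re

OPEN_PREFIX = '<XML_MESSAGE length="'
CLOSE_TAG = "</XML_MESSAGE>"
LENGTH_DIGITS = 7  # assumption: width copied from the sample messages

_LENGTH_RE = re.compile(r'^<XML_MESSAGE length="(\d+)"')


def wrap_payload(payload):
    """Wrap an XML fragment in an XML_MESSAGE envelope with a length attribute."""
    fixed = len(OPEN_PREFIX) + LENGTH_DIGITS + len('">') + len(CLOSE_TAG)
    total = fixed + len(payload.encode("utf-8"))
    if total >= 10 ** LENGTH_DIGITS:
        raise ValueError("message too long for the length field")
    return f'{OPEN_PREFIX}{total:0{LENGTH_DIGITS}d}">{payload}{CLOSE_TAG}'


def declared_length(message):
    """Return the value of the length attribute of one framed message."""
    match = _LENGTH_RE.match(message)
    if match is None or not message.endswith(CLOSE_TAG):
        raise ValueError("not an XML_MESSAGE envelope")
    return int(match.group(1))


def iter_messages(data):
    """Yield the messages of a data file, one per non-empty line."""
    for line in data.split("\n"):
        if line:
            yield line
```

Comparing declared_length(message) with len(message.encode("utf-8")) for every line of a data file is a cheap sanity check before streaming it.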
That's about it! :)
Happy subscribing and publishing data streams!
Example XML data messages:
<XML_MESSAGE length="0001406"><current_observation version="1.0" xmlns:xsd="http://www.w3.org/2001/XMLSchema"xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://www.weather.gov/view/current_observation.xsd"><credit>NOAA's National Weather Service</credit><credit_URL>http://weather.gov/</credit_URL><image><url>http://weather.gov/images/xml_logo.gif</url><title>NOAA's National Weather Service</title><link>http://weather.gov</link></image><suggested_pickup>15 minutes after the hour</suggested_pickup><suggested_pickup_period>60</suggested_pickup_period><location>Brutus - Green Canyon 158</location><station_id>42362</station_id><latitude>27.8</latitude><longitude>-90.67</longitude><observation_time>Last Updated on Apr 29 2012, 2:30 pm CDT</observation_time><observation_time_rfc822>Sun, 29 Apr 2012 14:30:00 -0500</observation_time_rfc822><temperature_string>72.0 F (22.2 C)</temperature_string><temp_f>72.0</temp_f><temp_c>22.2</temp_c><dewpoint_string>33.1 F (0.6 C)</dewpoint_string><dewpoint_f>33.1</dewpoint_f><dewpoint_c>0.6</dewpoint_c><mean_wave_dir>Southeast</mean_wave_dir><mean_wave_degrees></mean_wave_degrees><disclaimer_url>http://weather.gov/disclaimer.html</disclaimer_url><copyright_url>http://weather.gov/disclaimer.html</copyright_url><privacy_policy_url>http://weather.gov/notice.html</privacy_policy_url></current_observation></XML_MESSAGE>
<BGP_MESSAGE length="00002172" version="0.4" xmlns="urn:ietf:params:xml:ns:xfb-0.4" type_value="2" type="UPDATE"><SEQ id="2128112124" seq_num="825205435"/><TIME timestamp="1337745796" datetime="2012-05-23T04:03:16Z" precision_time="677"/><PEERING as_num_len="4"><SRC_ADDR><ADDRESS>2001:de8:6::6447:1</ADDRESS><AFI value="2">IPV6</AFI></SRC_ADDR><SRC_PORT>179</SRC_PORT><SRC_AS>6447</SRC_AS><DST_ADDR><ADDRESS>2001:de8:6::3:71:1</ADDRESS><AFI value="2">IPV6</AFI></DST_ADDR><DST_PORT>179</DST_PORT><DST_AS>30071</DST_AS><BGPID>0.0.0.0</BGPID></PEERING><ASCII_MSG length="105"><MARKER length="16">FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF</MARKER><UPDATE withdrawn_len="0" path_attr_len="82"><WITHDRAWN count="0"/><PATH_ATTRIBUTES count="5"><ATTRIBUTE length="1"><FLAGS transitive="TRUE"/><TYPE value="1">ORIGIN</TYPE><ORIGIN value="0">IGP</ORIGIN></ATTRIBUTE><ATTRIBUTE length="14"><FLAGS transitive="TRUE"/><TYPE value="2">AS_PATH</TYPE><AS_PATH><AS_SEG type="AS_SEQUENCE" length="3"><AS>30071</AS><AS>3356</AS><AS>26878</AS></AS_SEG></AS_PATH></ATTRIBUTE><ATTRIBUTE length="4"><FLAGS optional="TRUE"/><TYPE value="4">MULTI_EXIT_DISC</TYPE><MULTI_EXIT_DISC>2316</MULTI_EXIT_DISC></ATTRIBUTE><ATTRIBUTE length="4"><FLAGS optional="TRUE" transitive="TRUE"/><TYPE value="8">COMMUNITIES</TYPE><COMMUNITIES><COMMUNITY><AS>30071</AS><VALUE>57042</VALUE></COMMUNITY></COMMUNITIES></ATTRIBUTE><ATTRIBUTE length="44"><FLAGS optional="TRUE"/><TYPE value="14">MP_REACH_NLRI</TYPE><MP_REACH_NLRI><AFI value="2">IPV6</AFI><SAFI value="1">UNICAST</SAFI><NEXT_HOP_LEN>32</NEXT_HOP_LEN><NEXT_HOP><ADDRESS>2001:de8:6::3:71:1</ADDRESS><ADDRESS>fe80::20e:cff:feb1:dd92</ADDRESS></NEXT_HOP><NLRI count="1"><PREFIX label="DANN"><ADDRESS>2604:f400:1::/48</ADDRESS><AFI value="2">IPV6</AFI><SAFI value="1">UNICAST</SAFI></PREFIX></NLRI></MP_REACH_NLRI></ATTRIBUTE></PATH_ATTRIBUTES><NLRI count="0"/></UPDATE></ASCII_MSG><OCTET_MSG><OCTETS length="105">FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF006902000000524001010040020E02030000757700000D1C000068FE8004040000090CC008047577DED2800E2C0002012020010DE8000600000000000300710001FE80000000000000020E0CFFFEB1DD9200302604F4000001</OCTETS></OCTET_MSG></BGP_MESSAGE>
<XML_MESSAGE length="0000245" prop1="gnome is great" prop2="&amp; linux too"><head><title>Welcome to Gnome</title></head><chapter><title>The Linux adventure</title><p>bla bla bla ...</p><image href="linus.gif"/><p>...</p></chapter></XML_MESSAGE>