SonG is a helper application for use along with SoNIA. It generates SoNIA .son format input files by converting multiple text files or by using a set of queries to extract relationships from a MySQL database. It also includes several helpful error checking features, as well as filters for cleaning and debugging slightly messy input data.
SoNG can assemble relational data from text files such as .csv files in which the data values are separated into columns by delimiters of comma, tab, space, or pipe. The first row of the file must a header row with the names of the columns.
When the input source is set to "File" an input file can be selected using the "Open arcs file" button. Each row of the input file is assumed to describe a relationship arc. The source and target of the relationship must be identified by the FromId and ToId columns. Most other columns will be passed through into the .son file as data, unless they set to "ignore" in Input Columns panel. If the Generate nodes from arc file option is checked, SonG will create the rows of the AlphaId column using the set of node ids which appear in the arc file.
If the input arc file includes columns for StartTime and EndTime it is possible to generate the appropriate times along with the nodes. When the Generate node time option is also checked, a node record will be generated for each discrete integer time-step at which that the node id appears in an arc record. The EndTime will be set equal to the start time. Currently this is only appropriate certain underlying time models.
Additional node-level attributes can be attached to the generated node records by checking the Attach node properties option, and setting a node file path in the Input Source pane. SonG will copy the attributes from rows in the node files that have value in the AlphaId column that matches with a node id. Multiple rows with the same id will trigger a warning, or a data problem if the values of multiple matching rows are different.
If the Generate nodes from arc file option is NOT checked, SoNG will assume that the data for node properties and node definitions should be loaded intact from the node data file specified in the Input Source panel.
SoNG will not auto-magically generate networks from your MySQL database. But if can right some simple queries to output the set of relationships and properties you are interested in, it will greatly simplify the process.
The first step is to connect to the database by filling in the appropriate connection settings values in the Input Source pane, and click the Connect to DB button.
The relationship query should return columns with names that match with the .son arc definitions. For example:
If Generate node set from relations is checked, SonG will attempt to create a set of node records that match with all the FromId and ToId values found in the relationship query. If a query is found in the Node Properties Query field, the query will be executed once for each of the node ids and the columns returned will be used to fill in the attributes for each node. SonG will replace the string "$NODEID" in the properties query with each node id before sending the query to the database. So for example:
would produce a query like
and would presumably set a label for each node, and set the width of the node equal to the sum of the contributions from each node_i.
Much like when working from text files (above), SonG can create appropriate node records at each observed ede time point (assuming you are using some kind of discrete time model) if the Also generate times is checked and the StartTime and EndTime columns are included in the relationships query.
If the nodes properties need to vary in time, an appropriate restriction to the query can be added using the $TIME substitution string. So:
It is also possible to construct the node set with a complete independent query by un-checking the Generate node set from relations option. In this case, the rows returned by Node Set Query will need to match with the edge set, but SonG won't be able to do anything to enforce it. In other words it is up to the user to write a query that matches, or use some of the filtering options to detect and remove any non-matches.
For example:
Sometimes when dealing with very large networks it is useful to extract only a connected component instead of the entire graph. However writing SQL queries to do this can be extremely difficult, especially when the depth of the component is not known ahead of time. The Crawl relationships to define network option makes it possible to crudely search and and extract a component. The Seed node set query is used to define the set of node ids that the search will start from.
or
When crawling, the Arc relationship query must include the $NODEID substitution string and will be executed once for each node in the seed set. Any new node ids that are returned by the queries (neighboring nodes) will be added to the seed set and the process will continue until no new nodes are discovered. WARNING: If you don't know the structure before, it is usually impossible to predict how many many steps it will take to finish. In addition, executing one query per node is not a very efficient way of using an SQL database, so it will be fairly slow.
Once the data have been loaded, either by parsing from a text file or querying from a database, the will appear in the Node Data and Arc Data panes of the Data tab. If any non-fatal errors were discovered during loading, they will be listed as Data Problems under the tables. Clicking on a data problem will hilite the appropriate row. Usually it is a good idea to fix the problems in the input data, but if necessary, the values in the tables can be edited like in a spreadsheet to correct errors before continuing.
Some types of errors can be resolved by running a filter to process the data. Select the desired filters in the Filters pane, and click Run Filter. After being processed by the filter(s) the data will be re-validated to check for new problem rows. Resolving some problems may create new ones, so some times it is necessary to run various filters multiple times.
If you write a filter than you find useful (or need one written) please communicate with the SoNIA user community.
Existing filters:
The Preview pane shows an editable text version of the .son file exactly how it will be written out. The Validate .son button will run the file through the .son parser (without launching SoNIA) to check for possible parsing errors. If errors are found, the appropriate row will be highlighted.
The Save button does pretty much what you would expect, it brings up a save dialog box so you can specify where you would like the output file to be written. The Launch in SoNIA button will write the .son data to a temp file and load it directly into SoNIA. However, quitting SoNIA will then also quit SonG.