Menu

Using_SonG_to_create_input_files

Skye Bender-deMoll
Attachments
SonG_MySQL_screen.png (62762 bytes)
SonG_textFile_screen.png (60219 bytes)

SonG (Sonia Getter) helper application

SonG is a helper application for use along with SoNIA. It generates SoNIA .son format input files by converting multiple text files or by using a set of queries to extract relationships from a MySQL database. It also includes several helpful error checking features, as well as filters for cleaning and debugging slightly messy input data.

Using SonG with text input files

SoNG can assemble relational data from text files such as .csv files in which the data values are separated into columns by delimiters of comma, tab, space, or pipe. The first row of the file must a header row with the names of the columns.

Generating a node set for a relationship file

When the input source is set to "File" an input file can be selected using the "Open arcs file" button. Each row of the input file is assumed to describe a relationship arc. The source and target of the relationship must be identified by the FromId and ToId columns. Most other columns will be passed through into the .son file as data, unless they set to "ignore" in Input Columns panel. If the Generate nodes from arc file option is checked, SonG will create the rows of the AlphaId column using the set of node ids which appear in the arc file.

Generating a node set for a relationship file with times

If the input arc file includes columns for StartTime and EndTime it is possible to generate the appropriate times along with the nodes. When the Generate node time option is also checked, a node record will be generated for each discrete integer time-step at which that the node id appears in an arc record. The EndTime will be set equal to the start time. Currently this is only appropriate certain underlying time models.

Importing node properties for a generated node set

Additional node-level attributes can be attached to the generated node records by checking the Attach node properties option, and setting a node file path in the Input Source pane. SonG will copy the attributes from rows in the node files that have value in the AlphaId column that matches with a node id. Multiple rows with the same id will trigger a warning, or a data problem if the values of multiple matching rows are different.

Merging an imported node set with a relationship file

If the Generate nodes from arc file option is NOT checked, SoNG will assume that the data for node properties and node definitions should be loaded intact from the node data file specified in the Input Source panel.

Using SonG with MySQL

SoNG will not auto-magically generate networks from your MySQL database. But if can right some simple queries to output the set of relationships and properties you are interested in, it will greatly simplify the process.

The first step is to connect to the database by filling in the appropriate connection settings values in the Input Source pane, and click the Connect to DB button.

Writing a relationships query

The relationship query should return columns with names that match with the .son arc definitions. For example:

  • SELECT node_i AS FromId, node_j AS ToId, cash AS ArcWeight , contribution_date AS StartTime FROM myContibutions;

Generating a node set and node properties from relationships query

If Generate node set from relations is checked, SonG will attempt to create a set of node records that match with all the FromId and ToId values found in the relationship query. If a query is found in the Node Properties Query field, the query will be executed once for each of the node ids and the columns returned will be used to fill in the attributes for each node. SonG will replace the string "$NODEID" in the properties query with each node id before sending the query to the database. So for example:

  • SELECT node_i AS AlphaId, contributor AS Label, sum(contribution) FROM myContributions WHERE node_i = "$NODEID" GROUP BY node_i;

would produce a query like

  • SELECT node_i AS AlphaId, contributor AS Label, sum(contribution) AS NodeSize FROM myContributions WHERE node_i = "node_12" GROUP BY node_i;

and would presumably set a label for each node, and set the width of the node equal to the sum of the contributions from each node_i.

Generating a node set and times from relationship query

Much like when working from text files (above), SonG can create appropriate node records at each observed ede time point (assuming you are using some kind of discrete time model) if the Also generate times is checked and the StartTime and EndTime columns are included in the relationships query.

If the nodes properties need to vary in time, an appropriate restriction to the query can be added using the $TIME substitution string. So:

  • ...WHERE node_i = "$NODEID" AND contribution_date = $TIME;

Merging a node set and a relationship query

It is also possible to construct the node set with a complete independent query by un-checking the Generate node set from relations option. In this case, the rows returned by Node Set Query will need to match with the edge set, but SonG won't be able to do anything to enforce it. In other words it is up to the user to write a query that matches, or use some of the filtering options to detect and remove any non-matches.

For example:

  • SELECT contributor_id AS AlphaId, contributor_name AS Label, special_ranking AS NodeSize FROM mySpecialContributorNodes;

"Crawling" a network in a database

Sometimes when dealing with very large networks it is useful to extract only a connected component instead of the entire graph. However writing SQL queries to do this can be extremely difficult, especially when the depth of the component is not known ahead of time. The Crawl relationships to define network option makes it possible to crudely search and and extract a component. The Seed node set query is used to define the set of node ids that the search will start from.

  • SELECT 391;

or

  • SELECT node_i FROM myContributions WHERE contribution_amount > 2300;

When crawling, the Arc relationship query must include the $NODEID substitution string and will be executed once for each node in the seed set. Any new node ids that are returned by the queries (neighboring nodes) will be added to the seed set and the process will continue until no new nodes are discovered. WARNING: If you don't know the structure before, it is usually impossible to predict how many many steps it will take to finish. In addition, executing one query per node is not a very efficient way of using an SQL database, so it will be fairly slow.

  • SELECT node_i AS FromId, node_j AS ToId, cash AS ArcWeight , contribution_date AS StartTime FROM myContibutions WHERE node_i = "$NODEID";

Checking and editing the loaded data

Once the data have been loaded, either by parsing from a text file or querying from a database, the will appear in the Node Data and Arc Data panes of the Data tab. If any non-fatal errors were discovered during loading, they will be listed as Data Problems under the tables. Clicking on a data problem will hilite the appropriate row. Usually it is a good idea to fix the problems in the input data, but if necessary, the values in the tables can be edited like in a spreadsheet to correct errors before continuing.

Filtering the data to resolve problems

Some types of errors can be resolved by running a filter to process the data. Select the desired filters in the Filters pane, and click Run Filter. After being processed by the filter(s) the data will be re-validated to check for new problem rows. Resolving some problems may create new ones, so some times it is necessary to run various filters multiple times.

If you write a filter than you find useful (or need one written) please communicate with the SoNIA user community.

Existing filters:

  • Remove all problem rows : simply deletes any row that has been flagged with a problem. Can create inconsistencies, but often useful when trying to get a quick network to look at dirty data.
  • SQL_date to decimal : converts MySQL formatted date strings ("2010-12-31") to decimal numbers appropriate for SoNIA.
  • Remove arcs with non-matching node times : removes any relationship that links to a node id that is not active at the time that the arc connects. Only useful if the nodes have not been generated from the arc times. Only works for discrete time?

Previewing and validating the file

The Preview pane shows an editable text version of the .son file exactly how it will be written out. The Validate .son button will run the file through the .son parser (without launching SoNIA) to check for possible parsing errors. If errors are found, the appropriate row will be highlighted.

Saving the .son file and launching in SoNIA

The Save button does pretty much what you would expect, it brings up a save dialog box so you can specify where you would like the output file to be written. The Launch in SoNIA button will write the .son data to a temp file and load it directly into SoNIA. However, quitting SoNIA will then also quit SonG.

Other notes and features

  • Most SonG properties and query strings (except DB password) are saved to a local properties file when you quit the application so they should still be there for your next session.

Related

Wiki: Main_Page
Wiki: Son_format

Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.