Son_format

Skye Bender-deMoll

The .son Input Format

The .son format is intended to address some of the limitations of the .net format and to make it easier to store and import attribute-rich network event data. It is also designed to make it as easy as possible to write export scripts from other applications or to adapt spreadsheet data. The underlying concept is an arc-list format (as opposed to a matrix format) with separate sections for node and arc records. The entries in each record are tab-delimited and are identified by column headings rather than inline tags, so the order of the columns is not important. Most attributes are optional and can be omitted. The only required attributes are a unique identifying tag and time coordinates for the event, which makes the format quite flexible.

The .son parser reads an edge-list format that is "column based" rather than "token based" like the .net parser. This means that it is much easier to write scripts or translate spreadsheets of attribute data into the .son format, but it is more cumbersome than .net for hand coding. SoNIA will eventually be able to save files in the .son format.

The input file works as follows:

The first several lines of the file can contain comments if they begin with double slashes (//); these comments will show up in SoNIA's log file. Next comes a row of tab-delimited column headings indicating the node attribute categories. The first item in this row must be "NodeId", as this is how the parser finds the row. The data must therefore contain a NodeId column: it is each node's unique identifier and is used to reference the "to" and "from" nodes in the arc data. NodeIds must form a complete set of integers, with no gaps. Alternatively, it is possible to use "AlphaId", in which case node ids and the arc "to" and "from" entries are strings. If an entry in a node row cannot be parsed to the type specified by its column heading, an error will be thrown. The remaining headings can be in any order and can be omitted, in which case a default value is used. However, if a column heading for an attribute is included, every record must contain an entry for that attribute. Blanks are not allowed, as they will cause the columns to misalign.

The following is a list of valid node column headings and values:

  • NodeId - must be an integer. Values can be used more than once (to specify changes in a node's attributes over time) but MUST FORM A CONTINUOUS SEQUENCE. If you try to leave out numbers, the parser will throw an error, as this would mess up the matrix references.
  • AlphaId - any string. use instead of NodeId for string ids. Strings are mapped to ids in the order they are parsed, one for each unique string.
  • Label - text to be displayed as nodes' label.
  • X - real-valued number expressing the node's position in pixels; the origin is at the upper left corner of the window.
  • Y - as for X.
  • ColorName - text specifying a color for the node, one of: Black, DarkGray, LightGray, White, Cyan, Green, Magenta, Orange, Pink, Red, Yellow, Blue. Alternatively, you can specify the color using a "Red-Green-Blue" color model with the following three column headings:
  • RedRGB - real number between 0 and 1 specifying the red component
  • GreenRGB - real number between 0 and 1 specifying the green component
  • BlueRGB - real number between 0 and 1 specifying the blue component
  • NodeShape - text specifying the shape for the node; currently can only be "ellipse" or "rect"
  • NodeSize - positive real number specifying the size of the node in pixels
  • LabelColor - the color for the label text, must be a color name
  • LabelSize - size of label in points, default is 10
  • BorderColor - the color for the node's border, must be a color name
  • IconURL - URL of a JPG image to load and use as the node's icon.
  • BorderWidth - real number for the width of the node's border
  • StartTime - real value specifying the start time for the node
  • EndTime - real value specifying the end time for the node.
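The two id rules in the list above (NodeIds must form a continuous sequence, and AlphaId strings are mapped to ids in the order they are parsed) can be checked in a script before a file is handed to SoNIA. The following is only a minimal sketch, not SoNIA's own code: the function names are hypothetical, and it assumes that numbering starts at 1, as in the example data on this page.

```python
def check_continuous_ids(node_ids):
    """Return the integers missing from the sequence 1..max(node_ids).

    Repeated ids are fine (they describe attribute changes over time).
    Starting the sequence at 1 is an assumption, not a documented rule.
    """
    unique = set(node_ids)
    if not unique:
        return []
    expected = set(range(1, max(unique) + 1))
    return sorted(expected - unique)


def map_alpha_ids(alpha_ids):
    """Assign integer ids to strings in first-seen order, one per unique string."""
    mapping = {}
    for name in alpha_ids:
        if name not in mapping:
            mapping[name] = len(mapping) + 1  # assumed to start at 1
    return mapping


print(check_continuous_ids([1, 2, 2, 3, 5]))        # [4]
print(map_alpha_ids(["ann", "bob", "ann", "cy"]))   # {'ann': 1, 'bob': 2, 'cy': 3}
```

An empty list from check_continuous_ids means no gaps were found.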

After the node records should be a row of column headings for the arc records. This line must begin with "FromId", as this is how the parser knows that the end of the node records has been reached. The rest of the entries for the arc column headings can be in any order.

  • FromId - integer (or string, if AlphaId is used) indicating the source node; must match a node id
  • ToId - integer (or string, if AlphaId is used) indicating the destination node; must match a node id
  • ArcWeight - real value indicating the strength of the relation
  • ArcWidth - real value indicating how wide to draw the arrow
  • ColorName, RedRGB, GreenRGB, BlueRGB - see node colors
  • StartTime - real value indicating the arc's start
  • EndTime - real value indicating the arc's termination
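Because each section is introduced by a heading row with a known first entry, a .son file can be split into records with very little code. The sketch below is an illustration of that idea only and does not reproduce SoNIA's parser: the function name read_son and the returned dictionary layout are hypothetical, all values are left as strings, and it assumes that a file using AlphaId starts its node heading row with "AlphaId".

```python
# First entry of a heading row -> section it introduces.
# "AlphaId" as a row starter is an assumption; "ClusterId" rows are optional.
SECTION_STARTS = {"NodeId": "nodes", "AlphaId": "nodes",
                  "FromId": "arcs", "ClusterId": "clusters"}


def read_son(text):
    """Split .son text into comments and per-section lists of records.

    Each record is a dict keyed by column heading, so column order is irrelevant.
    """
    result = {"comments": [], "nodes": [], "arcs": [], "clusters": []}
    headings, section = None, None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        if headings is None and line.startswith("//"):
            result["comments"].append(line[2:].strip())
            continue
        fields = [f.strip() for f in line.split("\t")]
        if fields[0] in SECTION_STARTS:
            # A heading row switches to a new section.
            headings, section = fields, SECTION_STARTS[fields[0]]
        elif headings is None:
            raise ValueError("data row found before any heading row: " + line)
        elif len(fields) != len(headings):
            # A blank or missing entry shifts the columns out of alignment.
            raise ValueError("expected %d entries, got %d: %s"
                             % (len(headings), len(fields), line))
        else:
            result[section].append(dict(zip(headings, fields)))
    return result
```

Calling read_son on the contents of a file returns the node rows under "nodes" and the arc rows under "arcs"; converting entries to integers or real numbers according to their column heading is left out of this sketch.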

Example:

//optional comments at the start of the file, preceded by double slash

NodeId    Label    StartTime    EndTime    NodeSize    NodeShape    ColorName    BorderWidth    BorderColor 
1    129473    0.0    42.0    5.0    ellipse    lightGray    1.5    black
2    129047    0.0    42.0    5.0    rect    gray    1.5    black 
3    132996    0.0    42.0    5.0    ellipse    lightGray    1.5    black 
4    145242    0.0    42.0    5.0     ellipse    gray    1.5    black
5    127535    0.0    42.0    5.0    ellipse    lightGray    1.5    black
6    127319    0.0    42.0    5.0    rect    lightGray    1.5    black
7    129801    0.0    42.0    5.0    ellipse    darkGray    1.5    black
8    104456    0.0    42.0    5.0    ellipse    lightGray    1.5    black
FromId    ToId    StartTime    EndTime    ArcWeight    ArcWidth    ColorName
24    1    0.135    0.135    0.2    1.6    black
24    2    0.135    0.135    0.2    1.6    black
24    4    0.135    0.135    0.2    1.6    black
24    3    0.135    0.135    0.2    1.6    black
24    6    0.135    0.135    0.2    1.6    black
24    5    0.135    0.135    0.2    1.6    black
24    7    0.135    0.135    0.2    1.6    black
24    9    0.135    0.135    0.2    1.6    black
24    8    0.135    0.135    0.2    1.6    black
24    11    0.135    0.135    0.2    1.6    black
24    10    0.135    0.135    0.2    1.6    black
24    12    0.135    0.135    0.2    1.6    black
16    25    3.514    3.514    0.2    1.6    black
16    26    3.514    3.514    0.2    1.6    black
16    27    3.514    3.514    0.2    1.6    black
24    1    3.649    3.649    0.2    1.6    black
24    2    3.649    3.649    0.2    1.6    black
24    4    3.649    3.649    0.2    1.6    black
24    3    3.649    3.649    0.2    1.6    black
24    6    3.649    3.649    0.2    1.6    black
24    5    3.649    3.649    0.2    1.6    black
24    7    3.649    3.649    0.2    1.6    black
24    9    3.649    3.649    0.2    1.6    black
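Since the format is meant to be easy to generate from other applications, an export script mostly needs to print a heading row followed by tab-delimited records. The following sketch shows one way this might look. The names write_son and format_section are hypothetical, and the records are assumed to be dictionaries that all share the keys of the first record.

```python
def format_section(first_heading, records):
    """Return the heading row and data rows for one section as text lines."""
    if not records:
        return []
    # The section's identifying column goes first; the rest may be in any order.
    headings = [first_heading] + [k for k in records[0] if k != first_heading]
    lines = ["\t".join(headings)]
    for rec in records:
        row = [str(rec[h]) for h in headings]  # KeyError if a record lacks a column
        if any(v == "" or "\t" in v for v in row):
            raise ValueError("blank entries and embedded tabs are not allowed: %r" % rec)
        lines.append("\t".join(row))
    return lines


def write_son(nodes, arcs, comments=()):
    """Build the text of a .son file from node and arc records."""
    lines = ["//" + c for c in comments]
    lines += format_section("NodeId", nodes)
    lines += format_section("FromId", arcs)
    return "\n".join(lines) + "\n"


nodes = [{"NodeId": 1, "Label": "a", "StartTime": 0.0, "EndTime": 42.0},
         {"NodeId": 2, "Label": "b", "StartTime": 0.0, "EndTime": 42.0}]
arcs = [{"FromId": 1, "ToId": 2, "StartTime": 0.135, "EndTime": 0.135}]
print(write_son(nodes, arcs, comments=["exported by a script"]))
```

Rejecting empty strings up front mirrors the rule that blank entries cause the columns to misalign.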

Column remapping

If your data are in a similar format but do not have the correct column names, SoNIA can remap the column names in the input file to the various functions using a dialog box. The left side of the window specifies which column name in the input file is matched to each network property. For example, if the input file contains a numeric column named "Wealth", the Unrecognized Column Names dialog will appear when the file is parsed, and the wealth values can be used to control the node label size by setting the column name for NODE_LABEL_SIZE to "Wealth".

Arbitrary node attributes

As of version 1.1.5, additional columns in the .son input file can be assigned to the nodes as User Data attributes by selecting them in the right-hand "User Data" side of the Unrecognized Column Names dialog. These properties can be viewed in SoNIA using the [Node_inspector_panel], and mapped to colors using the [Color_mapping] functions.

Clusters

As of version 1.2.0, SoNIA can draw [Clusters] around groups of nodes that are defined in the input file.

The column definitions for clusters must begin with "ClusterId". Valid cluster properties are:

  • ClusterId - Alphanumeric string used to identify the cluster so it can be referred to
  • NodeIds - Comma-delimited list of node ids corresponding to nodes that should be included in the cluster. Should be enclosed by {} for clarity.
  • ClusterWeight - recorded, but not currently used.
  • Parent - Id of the cluster "above" this cluster in a hierarchy (the parent includes this cluster's nodes and children). "{}" indicates null, or no parent. A cluster may have only one parent.
  • Children - Comma-delimited list of child clusters (below in the hierarchy). When rendered, a cluster will include all nodes of its children. The list may be enclosed in {} for clarity. "{}" (curly braces with no contents) indicates null or no children.
  • Color - a color name to shade the area inside the cluster
  • BorderColor - a color name to shade the border of the cluster (clusters are drawn with dashed borders)
  • StartTime - the starting time point when the cluster goes into effect.
  • EndTime - the ending time point for the cluster.

A basic example (from Sampson) with no parent-child relations:

//...arc definitions above here
ClusterId    NodeIds    Color    BorderColor    StartTime    EndTime
loyal    {1,2,3,4,5,6,7}    blue    blue    0    3
turks    {8,9,10,11,12,13,14}    green    green    0    3
outcasts    {15,16,17,18}    orange    orange    0    3

Hierarchical relationships between clusters can be defined using the "Parent" and "Children" columns. Each cluster may have 0 or 1 parents, and multiple children. Clusters will be drawn so that each cluster includes all of the nodes included in its children, descending recursively down the tree. A more complex example, using hierarchical cluster relationships:

//...arc definitions above here
ClusterId    ClusterWeight    Parent    Children    NodeIds    Color    BorderColor    StartTime    EndTime
Root    5.22    {}    100016,100005    {}    lightgray    blue    0    1
100016    2.83    Root    {100015,100008}    {}    lightgray    blue    0    1
100015    1.50    100016    100013,100014    {}    lightgray    blue    0    1
100013    0.00    100015    100012    16    lightgray    blue    0    1
100012    0.00    100013    100011    15    lightgray    blue    0    1
100011    0.00    100012    100010    13    lightgray    blue    0    1
100010    0.00    100011    100009    {12}    lightgray    blue    0    1
100009    0.00    100010    {}    3,8    lightgray    blue    0    1
100014    0.00    100015    {}    {14,18}    lightgray    blue    0    1
100008    0.00    100016    100007    10    lightgray    blue    0    1
100007    0.00    100008    100006    6    lightgray    blue    0    1
100006    0.00    100007    {}    2,5    lightgray    blue    0    1
100005    0.00    Root    100004    17    lightgray    blue    0    1
100004    0.00    100005    100003    11    lightgray    blue    0    1
100003    0.00    100004    100002    9    lightgray    blue    0    1
100002    0.00    100003    100001    7    lightgray    blue    0    1
100001    0.00    100002    {}    1,4    lightgray    blue    0    1
Root2    5.89    {}    {200016,200005}    {}    lightgray    blue    1    2
200016    2.67    Root2    {200015,200013}    {}    lightgray    blue    1    2
200015    0.67    200016    {200008,200014}    {}    lightgray    blue    1    2
200008    0.00    200015    200007    6    lightgray    blue    1    2
200007    0.00    200008    200006    5    lightgray    blue    1    2
200006    0.00    200007    {}    2,3    lightgray    blue    1    2
200014    0.00    200015    {}    {10,14}    lightgray    blue    1    2
200013    0.00    200016    200012    18    lightgray    blue    1    2
200012    0.00    200013    200011    15    lightgray    blue    1    2
200011    0.00    200012    200010    12    lightgray    blue    1    2
200010    0.00    200011    200009    11    lightgray    blue    1    2
200009    0.00    200010    {}    {7,8}    lightgray    blue    1    2
200005    0.00    Root2    200004    17    lightgray    blue    1    2
200004    0.00    200005    200003    16    lightgray    blue    1    2
200003    0.00    200004    200002    13    lightgray    blue    1    2
200002    0.00    200003    200001    9    lightgray    blue    1    2
200001    0.00    200002    {}    1,4    lightgray    blue    1    2
Root3    5.42    {}    300016,300015    {}    lightgray    blue    2    3
300016    3.45    Root3    {300013,300008}    {}    lightgray    blue    2    3
300013    0.00    300016    300012    18    lightgray    blue    2    3
300012    0.00    300013    300011    16    lightgray    blue    2    3
300011    0.00    300012    300010    12    lightgray    blue    2    3
300010    0.00    300011    300009    11    lightgray    blue    2    3
300009    0.00    300010    {}    6,10    lightgray    blue    2    3
300008    0.00    300016    300007    14    lightgray    blue    2    3
300007    0.00    300008    300006    8    lightgray    blue    2    3
300006    0.00    300007    300005    5    lightgray    blue    2    3
300005    0.00    300006    {}    2,3    lightgray    blue    2    3
300015    0.57    Root3    300004,300014    {}    lightgray    blue    2    3
300004    0.00    300015    300003    17    lightgray    blue    2    3
300003    0.00    300004    300002    9    lightgray    blue    2    3
300002    0.00    300003    300001    7    lightgray    blue    2    3
300001    0.00    300002    {}    1,4    lightgray    blue    2    3
300014    0.00    300015    {}    13,15    lightgray    blue    2    3

Note that parent and child relationships are defined using the cluster ids. Null sets (either for children or nodes) can be indicated using {}. SoNIA will validate that all cluster ids refer to defined clusters, that all node ids refer to nodes, and that parent-child relations are mutually defined, and will throw an error if any of these checks fails.
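When cluster rows are generated by a script, it can be useful to run the same kind of consistency checks before loading the file. The sketch below is an assumption about how such checks could be written, not SoNIA's actual validation code. The names parse_list and check_clusters are hypothetical, and each cluster is assumed to be a dictionary keyed by column heading with string values.

```python
def parse_list(field):
    """Parse '{1,2,3}', '1,2,3' or '{}' into a list of strings."""
    inner = field.strip().strip("{}")
    return [item.strip() for item in inner.split(",") if item.strip()]


def check_clusters(clusters, node_ids):
    """Return a list of human-readable problems (an empty list means consistent)."""
    by_id = {c["ClusterId"]: c for c in clusters}
    problems = []
    for cid, c in by_id.items():
        for n in parse_list(c.get("NodeIds", "{}")):
            if n not in node_ids:
                problems.append("%s: unknown node id %s" % (cid, n))
        parents = parse_list(c.get("Parent", "{}"))
        if len(parents) > 1:
            problems.append("%s: more than one parent" % cid)
        for p in parents:
            if p not in by_id:
                problems.append("%s: unknown parent %s" % (cid, p))
            elif cid not in parse_list(by_id[p].get("Children", "{}")):
                problems.append("%s: parent %s does not list it as a child" % (cid, p))
        for ch in parse_list(c.get("Children", "{}")):
            if ch not in by_id:
                problems.append("%s: unknown child %s" % (cid, ch))
            elif parse_list(by_id[ch].get("Parent", "{}")) != [cid]:
                problems.append("%s: child %s does not name it as parent" % (cid, ch))
    return problems


clusters = [
    {"ClusterId": "Root", "Parent": "{}", "Children": "{a,b}", "NodeIds": "{}"},
    {"ClusterId": "a", "Parent": "Root", "Children": "{}", "NodeIds": "{1,2}"},
    {"ClusterId": "b", "Parent": "Root", "Children": "{}", "NodeIds": "3"},
]
print(check_clusters(clusters, {"1", "2", "3"}))  # []
```

Node ids are compared as strings here because that is how they appear in the file; a caller working with integer ids would need to convert them first.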


Related

Wiki: Clusters
Wiki: Color_mapping
Wiki: Features
Wiki: Node_inspector_panel
Wiki: Sonia_Data_Formats
Wiki: User_data_attributes
Wiki: Using_SonG_to_create_input_files