# Code Snippets

## Prerequisites

• Make sure you add the JInsect libraries to the project libraries, if you are using an IDE (e.g., NetBeans).
• Optional: Make sure you have read Chapters 3 and 4 of my PhD thesis or any of my publications on n-gram graphs to get a basic understanding of what n-gram graphs are about.

## Creating our first graph

We will use the DocumentNGramSymWinGraph class, which provides an implementation of the symmetric n-gram graph set.

We will create an n-gram graph with the default options. The (minimum n-gram rank, maximum n-gram rank, neighbourhood distance) parameters are set to (3,3,3) by default.

```
import gr.demokritos.iit.jinsect.documentModel.representations.DocumentNGramSymWinGraph;

public class FirstSteps {
    public static void main(String[] args) {
        // Default options: ranks (3,3) and neighbourhood distance 3
        DocumentNGramSymWinGraph dgFirstGraph = new DocumentNGramSymWinGraph();
    }
}
```

If we want to create an n-gram graph set, with varying n-gram sizes from 1 to 5 and a neighbourhood distance of 6, then we use the second constructor:

```
DocumentNGramSymWinGraph dgFirstGraph = new DocumentNGramSymWinGraph(1, 5, 6);
```

Now, let's make the graph represent a "Hello graph!" text:

```
dgFirstGraph.setDataString("Hello graph!");
```

And, voilà, we have managed to create this graph:

I know it looks uglier than expected... but it is full of useful information.
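To get a feel for what the rank and neighbourhood distance parameters control, here is a simplified, stand-alone sketch. It is not JInsect code and does not reproduce the library's exact windowing rules: the class `NGramGraphSketch` and the method `buildEdges` are hypothetical, and the sketch assumes that an n-gram is linked to every n-gram starting up to `window` positions after it, with the edge weight counting how often that happens.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified illustration only; NOT how JInsect builds its graphs internally.
class NGramGraphSketch {

    /**
     * Builds edge weights for the character n-grams of a single rank.
     * Assumption: an n-gram is linked to each n-gram that starts 1..window
     * positions after it, and the weight counts the number of such co-occurrences.
     */
    static Map<String, Integer> buildEdges(String text, int rank, int window) {
        Map<String, Integer> edges = new LinkedHashMap<>();
        int last = text.length() - rank; // last valid start index of an n-gram
        for (int i = 0; i <= last; i++) {
            String current = text.substring(i, i + rank);
            for (int j = i + 1; j <= Math.min(i + window, last); j++) {
                String neighbour = text.substring(j, j + rank);
                // Increase the weight of the edge between the two n-grams
                edges.merge(current + " -> " + neighbour, 1, Integer::sum);
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        // Rank 3 and window 3 mirror the default (3,3,3) setting mentioned above
        buildEdges("Hello graph!", 3, 3)
                .forEach((edge, weight) -> System.out.println(edge + " : " + weight));
    }
}
```

Even in this reduced form, each vertex is an n-gram and each weighted edge records how often two n-grams appear close to each other, which is the kind of information the graph stores.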

## Creating a graph from a text file

The loadDataStringFromFile method does this job: it reads the text of a file into the graph.

```
try {
    dgFirstGraph.loadDataStringFromFile("path/to/file/myfile.txt");
} catch (FileNotFoundException ex) {
    ...;
} catch (IOException ex) {
    ...;
}
```

## Comparing graphs

Let us create a second graph and try to compare it with our first one.

```
DocumentNGramSymWinGraph dgSecondGraph = new DocumentNGramSymWinGraph();
dgSecondGraph.setDataString("Another hello graph!");

// Compare the two graphs and print the resulting similarity
NGramCachedGraphComparator ngc = new NGramCachedGraphComparator();
GraphSimilarity gs = ngc.getSimilarityBetween(dgFirstGraph, dgSecondGraph);
System.out.println(gs.toString());
```

The result is something like:

`Value: 43.75% Containment: 87.50% Size: 50.00%`

In the above we see the NGramCachedGraphComparator class, which knows how to efficiently compare two n-gram graphs. It returns a GraphSimilarity object, which contains information on the Value Similarity, Containment (also termed Co-occurrence) Similarity and Size Similarity.
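To make the three measures more concrete, here is a small stand-alone sketch over plain edge-weight maps. It is not the comparator's code: the class `SimilaritySketch` and its methods are hypothetical, and the formulas are an assumption based on the usual definitions in the n-gram graph literature (size similarity as the ratio of the smaller to the larger edge count, containment as the share of common edges relative to the smaller graph, value similarity as the summed weight ratios of common edges relative to the larger graph).

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: a graph is reduced to a map from edge label to weight.
class SimilaritySketch {

    // Assumed definition: min(|G1|, |G2|) / max(|G1|, |G2|)
    static double sizeSimilarity(Map<String, Double> g1, Map<String, Double> g2) {
        int max = Math.max(g1.size(), g2.size());
        return max == 0 ? 0.0 : (double) Math.min(g1.size(), g2.size()) / max;
    }

    // Assumed definition: number of common edges / min(|G1|, |G2|)
    static double containmentSimilarity(Map<String, Double> g1, Map<String, Double> g2) {
        int min = Math.min(g1.size(), g2.size());
        if (min == 0) return 0.0;
        long shared = g1.keySet().stream().filter(g2::containsKey).count();
        return (double) shared / min;
    }

    // Assumed definition: sum over common edges of (smaller weight / larger weight),
    // divided by max(|G1|, |G2|)
    static double valueSimilarity(Map<String, Double> g1, Map<String, Double> g2) {
        int max = Math.max(g1.size(), g2.size());
        if (max == 0) return 0.0;
        double sum = 0.0;
        for (Map.Entry<String, Double> e : g1.entrySet()) {
            Double other = g2.get(e.getKey());
            if (other == null) continue; // edge is not shared
            double hi = Math.max(e.getValue(), other);
            if (hi > 0.0) sum += Math.min(e.getValue(), other) / hi;
        }
        return sum / max;
    }

    public static void main(String[] args) {
        // Made-up toy graphs, not the graphs of the example texts
        Map<String, Double> g1 = new HashMap<>();
        g1.put("a-b", 1.0);
        g1.put("b-c", 2.0);
        Map<String, Double> g2 = new HashMap<>(g1);
        g2.put("b-c", 4.0);
        g2.put("c-d", 1.0);
        g2.put("d-e", 1.0);
        System.out.println("Value: " + valueSimilarity(g1, g2));
        System.out.println("Containment: " + containmentSimilarity(g1, g2));
        System.out.println("Size: " + sizeSimilarity(g1, g2));
    }
}
```

Under these assumed definitions, value similarity can never exceed size similarity, which is why dividing one by the other gives a normalized figure.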

The Normalized Value Similarity can be calculated as follows:

```
double NVS = (gs.SizeSimilarity == 0.0) ? 0.0 : gs.ValueSimilarity / gs.SizeSimilarity;
```

taking into account the case where at least one of the graphs is of zero size.

If we print out the NVS of our examples, we get a value around:

`0.8750000000000004`
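As a quick sanity check of the arithmetic, the same formula can be applied to the rounded percentages printed earlier (43.75% and 50.00%). The sketch below is stand-alone and uses plain doubles instead of a GraphSimilarity object; the class and method names are hypothetical.

```java
// Stand-alone check of the NVS formula on plain numbers.
class NormalizedValueSimilaritySketch {

    // Same guard as above: a size similarity of zero yields an NVS of zero
    static double normalizedValueSimilarity(double valueSimilarity, double sizeSimilarity) {
        return (sizeSimilarity == 0.0) ? 0.0 : valueSimilarity / sizeSimilarity;
    }

    public static void main(String[] args) {
        // Rounded values from the printed example: 43.75% and 50.00%
        System.out.println(normalizedValueSimilarity(0.4375, 0.50)); // prints 0.875
        // At least one empty graph: size similarity is zero
        System.out.println(normalizedValueSimilarity(0.0, 0.0));     // prints 0.0
    }
}
```

The rounded inputs give exactly 0.875. The small difference from the value printed by the library comes from the unrounded similarity values it works with.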

## Creating a graph from a document set

Consider the case where we have a set of documents, each contained in a given file. We would like to create a representative graph of all the documents. To do this we will use the update operator of the n-gram graphs, which is implemented in the merge method of the DocumentNGramGraph class.

Let us say that we have the filenames of the documents in a List<String> variable. We can create the representative graph as follows.

```
List<String> lsFiles = ...;
// The representative graph; it stays null until the first document is loaded
DocumentNGramSymWinGraph dgRepresentative = null;
// We keep count of how many merges have been performed
int iMergeCnt = 0;
for (String sCurFile : lsFiles) {
    // Create graph for current file
    DocumentNGramSymWinGraph dgCur = new DocumentNGramSymWinGraph();
    try {
        dgCur.loadDataStringFromFile(sCurFile);
        if (dgRepresentative == null) {
            // First merge: initialization
            dgRepresentative = dgCur;
        } else {
            // Consecutive merges, use the learning factor as indicated in the literature
            dgRepresentative.merge(dgCur, (1.0 / (1.0 + iMergeCnt)));
        }
        iMergeCnt++;
    } catch (IOException ex) {
        Logger.getLogger(FirstSteps.class.getName()).log(Level.SEVERE, null, ex);
        return;
    }
}
```

The resulting n-gram graph will be representative of the whole set of documents.
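The reason for the learning factor 1.0 / (1.0 + iMergeCnt) can be seen on plain numbers. The sketch below is not JInsect code: it assumes that an update moves the current value toward the new one by a fraction equal to the learning factor, and the class and method names are hypothetical. Under that assumption, the decreasing factor turns the sequence of updates into a running average.

```java
// Illustration on single numbers; a graph merge would apply this per edge weight
// (assumption: the update moves a value toward the new one by the learning factor).
class LearningFactorSketch {

    static double runningUpdate(double[] values) {
        double representative = 0.0;
        int mergeCount = 0;
        for (double value : values) {
            if (mergeCount == 0) {
                // First value: initialization
                representative = value;
            } else {
                // Same factor as in the loop above: 1 / (1 + number of merges so far)
                double learningFactor = 1.0 / (1.0 + mergeCount);
                representative += (value - representative) * learningFactor;
            }
            mergeCount++;
        }
        return representative;
    }

    public static void main(String[] args) {
        // The result matches the arithmetic mean of 2, 4 and 9 (up to rounding)
        System.out.println(runningUpdate(new double[] {2.0, 4.0, 9.0}));
    }
}
```

With this factor every document ends up contributing equally, no matter in which order the files are processed.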

## The utils package: experience coded

You may want to take a look at the functions in the gr.demokritos.iit.jinsect.utils class. It is full of things that I have repeatedly needed to use and have coded there.

For example, the method graphToDot allows creating the DOT representation of an n-gram graph. This textual representation can be combined with the GraphViz library to create an image of a graph (as I have done above).

In our previous example, try calling:

`System.out.println(utils.graphToDot(dgFirstGraph.getGraphLevel(0), true));`

to get the DOT format representation of the graph. The true parameter indicates that we expect the graph to be directed.

The getGraphLevel(0) method asks for the first n-gram graph from the set of n-gram graphs in dgFirstGraph. In the case where dgFirstGraph was initialized to contain graphs of ranks from 1 to 5, it would return the 1-gram graph.
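If you have never seen the DOT format, the following stand-alone sketch shows what such a textual representation looks like for a tiny, made-up edge list. It does not reproduce the output of graphToDot: the class `DotSketch`, its `Edge` type and the `toDot` method are hypothetical, and only the general DOT syntax (`digraph` with `->` for directed graphs, `graph` with `--` for undirected ones) is relied upon.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; the real graphToDot output may differ in layout and labels.
class DotSketch {

    static final class Edge {
        final String from;
        final String to;
        final double weight;

        Edge(String from, String to, double weight) {
            this.from = from;
            this.to = to;
            this.weight = weight;
        }
    }

    // Note: vertex names are not escaped, so they must not contain double quotes
    static String toDot(List<Edge> edges, boolean directed) {
        String connector = directed ? " -> " : " -- ";
        StringBuilder sb = new StringBuilder(directed ? "digraph {\n" : "graph {\n");
        for (Edge e : edges) {
            sb.append("  \"").append(e.from).append('"')
              .append(connector)
              .append('"').append(e.to).append('"')
              .append(" [label=\"").append(e.weight).append("\"];\n");
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        // Made-up edges between 3-grams of "Hello"
        List<Edge> edges = Arrays.asList(
                new Edge("Hel", "ell", 1.0),
                new Edge("ell", "llo", 1.0));
        System.out.print(toDot(edges, true));
    }
}
```

Text of this kind can be saved to a file and rendered with the GraphViz command line tool, e.g. `dot -Tpng graph.dot -o graph.png`.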