CodeXCavator - code indexing and search Wiki

Source code indexing and full text search tool based on Lucene.

Status: Alpha

Brought to you by: dust79

CodeXCavator Indexer

CodeXCavator - Indexer

![Indexer](https://sourceforge.net/p/codexcavator/wiki/Images/attachment/Indexer.png)

The indexer tool is responsible for creating a full text search index for a set of input source code or plain text files. The set of input files to be added to the index is defined through an XML index configuration file.

Creating an index configuration file

You can create an index configuration file with any text editor. Just create a file with an arbitrary name and the .xml extension (e.g. index.xml).

Specifying the index target directory

At minimum, the configuration file must contain an Index element as its root element, with a FileSources element below it.

The Path attribute can be used to specify the target directory in which the index should be created.

Example: index.xml

<Index Path="D:\Temp\TestIndex">
  <FileSources>
  </FileSources>
</Index>

This defines an index to be created in the directory D:\Temp\TestIndex, without any file sources.
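
The minimal structure can be illustrated with a short Python sketch. It is not part of CodeXCavator (a .NET tool); the helper read_index_target is hypothetical, and since this page does not say whether Path has a default, a missing Path is simply returned as None.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def read_index_target(xml_text: str) -> Optional[str]:
    """Return the Path of the Index root element, or None if it is absent.

    Hypothetical helper: it only checks the minimal structure described
    above (an Index root element with a FileSources child).
    """
    root = ET.fromstring(xml_text)
    if root.tag != "Index":
        raise ValueError("the root element must be <Index>")
    if root.find("FileSources") is None:
        raise ValueError("<Index> must contain a <FileSources> element")
    return root.get("Path")


config = r"""<Index Path="D:\Temp\TestIndex">
  <FileSources>
  </FileSources>
</Index>"""

print(read_index_target(config))  # D:\Temp\TestIndex
```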

Specifying a source directory containing multiple input files

To include the input files of a directory in the index, use the Directory tag.

The Path attribute defines the directory from which the input files are read.
The Include attribute specifies a list of wildcard patterns, separated by the pipe (|) character. Files whose names match one of the patterns are included in the index.
The Exclude attribute specifies a list of wildcard patterns, separated by the pipe (|) character. Files whose names match one of the patterns are excluded from the index.
The Recursive attribute specifies whether the directory is searched recursively (i.e. whether subdirectories are searched too). Valid values are "true" and "false".

You can place multiple Directory elements below the FileSources element to include the contents of several input directories in the index.

Example: index.xml

<Index Path="D:\Temp\TestIndex">
  <FileSources>
    <Directory Path="D:\Programming\C#\Projects" Recursive="true" Include="*.cs|*.xml"/>
  </FileSources>
</Index>

In this example an index is created in the directory D:\Temp\TestIndex. The input files are taken from the D:\Programming\C#\Projects folder and its subdirectories. Only files with the .cs or .xml extension are included.
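
The selection rules for a Directory element can be modelled roughly as follows. This is a Python sketch under stated assumptions, not CodeXCavator's implementation: the helper names are hypothetical, matching is assumed to be case-insensitive and applied to file names only, a missing Include list is assumed to mean "all files", and Exclude is assumed to be checked after Include.

```python
import fnmatch
import os
from typing import Iterator, List, Optional


def split_patterns(value: Optional[str]) -> List[str]:
    """Split a pipe-separated wildcard list such as "*.cs|*.xml"."""
    if not value:
        return []
    return [p.strip() for p in value.split("|") if p.strip()]


def name_matches(name: str, patterns: List[str]) -> bool:
    # Case-insensitive matching is an assumption, mirroring Windows file names.
    return any(fnmatch.fnmatchcase(name.lower(), p.lower()) for p in patterns)


def select_files(root: str, include: Optional[str] = None,
                 exclude: Optional[str] = None,
                 recursive: bool = False) -> Iterator[str]:
    """Yield the files below root that a Directory element would pick up.

    Assumptions: without Include every file is a candidate, and Exclude
    is applied after Include. Patterns are matched against file names only.
    """
    inc, exc = split_patterns(include), split_patterns(exclude)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if inc and not name_matches(name, inc):
                continue
            if exc and name_matches(name, exc):
                continue
            yield os.path.join(dirpath, name)
        if not recursive:
            break  # os.walk yields the top directory first; stop there
```

Under these assumptions, the Directory element of the example corresponds to select_files(r"D:\Programming\C#\Projects", include="*.cs|*.xml", recursive=True).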

Specifying a single specific input file

To include a single specific file in the index, use the File tag.

The Path attribute defines the path of the file to be included in the index.

You can place multiple File elements below the FileSources element to include several specific input files in the index.

Example: index.xml

<Index Path="D:\Temp\TestIndex">
  <FileSources>
    <File Path="D:\Programming\C#\Projects\CodeXCavator\CodeXCavator.sln"/>
  </FileSources>
</Index>

In this example an index will be created in the directory D:\Temp\TestIndex. The only input file is D:\Programming\C#\Projects\CodeXCavator\CodeXCavator.sln.

It's possible to mix multiple File and Directory elements below the FileSources element.
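
As an illustration, the following hypothetical Python helper (not CodeXCavator code) reads a configuration that mixes a Directory and a File element and lists the declared sources in document order. It only inspects the XML; Catalogue and Source elements are skipped for brevity.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def list_file_sources(xml_text: str) -> List[Tuple[str, str]]:
    """Return (element name, Path) for each File and Directory source.

    Hypothetical helper: sources are reported in document order; other
    source kinds (Catalogue, Source) are skipped here for brevity.
    """
    file_sources = ET.fromstring(xml_text).find("FileSources")
    if file_sources is None:
        return []
    return [(child.tag, child.get("Path", ""))
            for child in file_sources
            if child.tag in ("File", "Directory")]


config = r"""<Index Path="D:\Temp\TestIndex">
  <FileSources>
    <Directory Path="D:\Programming\C#\Projects" Recursive="true" Include="*.cs|*.xml"/>
    <File Path="D:\Programming\C#\Projects\CodeXCavator\CodeXCavator.sln"/>
  </FileSources>
</Index>"""

for kind, path in list_file_sources(config):
    print(kind, path)
```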

Specifying an arbitrary file catalogue enumerator

In order to specify an arbitrary file catalogue enumerator as an input file source, you can use the Catalogue tag.

A file catalogue enumerator is responsible for enumerating the contents of a file catalogue. A file catalogue might simply be a file directory or it might be a project file containing source code file references.

The Path attribute defines the path whose contents should be enumerated. If only the Path attribute is specified, the type of file catalogue enumerator used to enumerate the input files is determined from the path.
The Type attribute defines which type of file catalogue enumerator is used to enumerate the catalogue contents.
The Recursive attribute specifies whether the catalogue is searched recursively (i.e. whether sub catalogues are searched too). Valid values are "true" and "false".
You can combine the Path and Type attributes. In this case the file catalogue enumerator specified by the Type attribute enumerates the contents of the path given by the Path attribute.
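
The rules above (an explicit Type wins, otherwise the enumerator is inferred from the path, and only registered enumerators are accepted) can be sketched in Python as a registry lookup. The registry, the helper name and the use of a directory check for DirectoryFileEnumerator are assumptions for illustration; this page does not describe how CodeXCavator actually infers the type or in what order candidates are tried.

```python
import os
from typing import Callable, Dict, Optional

# Hypothetical registry mapping enumerator type names to a check that decides
# whether the enumerator can handle a path. Only DirectoryFileEnumerator is
# named on this page; using os.path.isdir as its check is an assumption.
REGISTRY: Dict[str, Callable[[str], bool]] = {
    "DirectoryFileEnumerator": os.path.isdir,
}


def resolve_catalogue_enumerator(path: str, type_name: Optional[str] = None,
                                 registry: Dict[str, Callable[[str], bool]] = REGISTRY) -> str:
    """Pick the enumerator for a Catalogue element.

    An explicit Type wins; otherwise the first registered enumerator that
    accepts the path is chosen. Unregistered types are rejected.
    """
    if type_name is not None:
        if type_name not in registry:
            raise KeyError(f"file catalogue enumerator not registered: {type_name}")
        return type_name
    for name, can_handle in registry.items():
        if can_handle(path):
            return name
    raise LookupError(f"no registered file catalogue enumerator accepts {path!r}")
```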

Example: index.xml

<Index Path="D:\Temp\TestIndex">
  <FileSources>
    <Catalogue Type="DirectoryFileEnumerator" 
               Path="D:\Programming\C#\Projects\CodeXCavator" 
               Recursive="True"/>
  </FileSources>
</Index>

In this example the DirectoryFileEnumerator file catalogue enumerator is used to recursively enumerate the contents of the D:\Programming\C#\Projects\CodeXCavator directory.

Note that only file catalogue enumerators that are registered or were loaded through the plugin system can be used.

Configuration of a file catalogue enumerator

Some file catalogue enumerators are configurable. Their configuration is provided by placing a Configuration element below the Catalogue element.

Example: index.xml

<Index Path="D:\Temp\TestIndex">
  <FileSources>
    <Catalogue Type="DirectoryFileEnumerator" 
               Path="D:\Programming\C#\Projects\CodeXCavator" 
               Recursive="True">
        <Configuration>
            <!-- Additional configuration elements go here -->
        </Configuration>
    </Catalogue>
  </FileSources>
</Index>

What kind of sub elements or attributes can be used with the Configuration element depends on the specified file catalogue enumerator.

[Built-in File Catalogue Enumerators]

Specifying an arbitrary file enumerator

In order to specify an arbitrary file enumerator as an input file source, you can use the Source tag.

A file enumerator is responsible for enumerating files from an arbitrary source.

The Type attribute defines which type of file enumerator is used to enumerate files.

Example: index.xml

<Index Path="D:\Temp\TestIndex">
  <FileSources>
    <Source Type="FixedFileEnumerator"/>
  </FileSources>
</Index>

In this example the FixedFileEnumerator file enumerator is used in order to enumerate a fixed set of files.

Note that only file enumerators that are registered or were loaded through the plugin system can be used.

Configuration of a file enumerator

Some file enumerators are configurable. Their configuration is provided by placing a Configuration element below the Source element.

Example: index.xml

<Index Path="D:\Temp\TestIndex">
  <FileSources>
    <Source Type="FixedFileEnumerator">
        <Configuration>
            <!-- Additional configuration elements go here -->
            <Files>
                <File>C:\Data\Programming\Projects\C#\CodeXCavator\CodeXCavator.sln</File>
            </Files>
        </Configuration>
    </Source>
  </FileSources>
</Index>

In this example the FixedFileEnumerator file enumerator is used in order to enumerate just the file C:\Data\Programming\Projects\C#\CodeXCavator\CodeXCavator.sln.
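
The Configuration element is plain XML below the Source element, so its contents are easy to inspect. The Python sketch below, with the hypothetical helper fixed_file_list, extracts the file list from the FixedFileEnumerator configuration shown above; it is an illustration, not the enumerator's own code.

```python
import xml.etree.ElementTree as ET
from typing import List


def fixed_file_list(source: ET.Element) -> List[str]:
    """Return the paths listed under Configuration/Files/File of a Source element.

    Models only the FixedFileEnumerator configuration shown above; how the
    real enumerator validates or resolves these paths is not covered.
    """
    config = source.find("Configuration")
    if config is None:
        return []
    return [f.text.strip() for f in config.findall("Files/File")
            if f.text and f.text.strip()]


config = r"""<Index Path="D:\Temp\TestIndex">
  <FileSources>
    <Source Type="FixedFileEnumerator">
        <Configuration>
            <Files>
                <File>C:\Data\Programming\Projects\C#\CodeXCavator\CodeXCavator.sln</File>
            </Files>
        </Configuration>
    </Source>
  </FileSources>
</Index>"""

source = ET.fromstring(config).find("FileSources/Source")
print(source.get("Type"), fixed_file_list(source))
```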

What kind of sub elements or attributes can be used with the Configuration element depends on the specified file enumerator.

[Built-in File Enumerators]

Specifying file filters

The source files produced by a Directory, Catalogue or Source element can be filtered. To do this, place one or more Filter elements below that element.

The Type attribute defines which type of file filter is used to filter files.
The Mode attribute defines whether a filter is inclusive, meaning that only files meeting its criteria pass through the filter, or exclusive, meaning that only files which do not meet the criteria pass through. Valid values are "Inclusive" and "Exclusive". Not all filters support the Mode attribute; only filters implementing the IInvertibleFileFilter interface do.
If multiple Filter elements are specified, the output of each filter is passed to the next one, forming a filter chain.
Some filters accept sub filters, which are specified by nested Filter elements.
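
The chain semantics and the Mode attribute can be modelled in a few lines of Python. This is a hypothetical sketch: ModeFilter and run_filter_chain are illustrative names, the "Exclusive" value is assumed as the counterpart of "Inclusive", and sub filters are not modelled.

```python
import fnmatch
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class ModeFilter:
    """Hypothetical invertible filter: a predicate plus a Mode value."""
    matches: Callable[[str], bool]
    mode: str = "Inclusive"  # "Exclusive" passes only non-matching files

    def apply(self, paths: Iterable[str]) -> List[str]:
        keep_matching = self.mode == "Inclusive"
        return [p for p in paths if self.matches(p) == keep_matching]


def run_filter_chain(paths: Iterable[str], filters: List[ModeFilter]) -> List[str]:
    """Feed the output of each filter into the next one, in document order."""
    result = list(paths)
    for f in filters:
        result = f.apply(result)
    return result


files = ["Program.cs", "Form1.Designer.cs", "App.config"]
chain = [
    ModeFilter(lambda p: fnmatch.fnmatchcase(p, "*.cs")),                       # keep *.cs
    ModeFilter(lambda p: fnmatch.fnmatchcase(p, "*.Designer.cs"), "Exclusive"),  # drop generated code
]
print(run_filter_chain(files, chain))  # ['Program.cs']
```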

Example: index.xml

<Index Path="C:\Data\Indexes\TestIndex1">
  <FileSources>
    <Directory Path="C:\Data\Programming\Projects\C#" Recursive="true">
        <Filter Type="WildCardFileFilter" Mode="Inclusive"/>
    </Directory>
  </FileSources>
</Index>

In this example the WildCardFileFilter is used in order to filter the files enumerated by the parent Directory element.

Note that only file filters that are registered or were loaded through the plugin system can be used.

Configuration of a file filter

Some file filters are configurable. Their configuration is provided by placing a Configuration element below the Filter element.

Example: index.xml

<Index Path="C:\Data\Indexes\TestIndex1">
  <FileSources>
    <Directory Path="C:\Data\Programming\Projects\C#" Recursive="True">
      <Filter Type="WildCardFileFilter" Mode="Inclusive">
        <Configuration>
          <!-- Additional configuration elements go here -->
          <Patterns>
            <Pattern>*.cs</Pattern>
          </Patterns>
        </Configuration>
      </Filter>
    </Directory>
  </FileSources>
</Index>

In this example the WildCardFileFilter file filter is used to filter the files enumerated recursively by the parent Directory element located at C:\Data\Programming\Projects\C#. It is configured so that only files matching the *.cs wildcard pattern are passed through.
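
The effect of this configuration can be sketched in Python by reading the Pattern elements and building a predicate from them. The helper wildcard_filter is hypothetical; case-insensitive matching against the file name only, and treating a missing Mode as "Inclusive", are assumptions rather than documented behaviour of WildCardFileFilter.

```python
import fnmatch
import ntpath
import xml.etree.ElementTree as ET
from typing import Callable


def wildcard_filter(filter_xml: str) -> Callable[[str], bool]:
    """Build a pass/fail predicate from a WildCardFileFilter element.

    Assumptions: patterns are matched case-insensitively against the file
    name only, and any Mode other than "Inclusive" inverts the result.
    """
    element = ET.fromstring(filter_xml)
    patterns = [p.text.strip().lower()
                for p in element.findall("Configuration/Patterns/Pattern")
                if p.text and p.text.strip()]
    inclusive = element.get("Mode", "Inclusive") == "Inclusive"

    def passes(path: str) -> bool:
        name = ntpath.basename(path).lower()  # ntpath understands backslash paths
        hit = any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns)
        return hit if inclusive else not hit

    return passes


passes = wildcard_filter("""<Filter Type="WildCardFileFilter" Mode="Inclusive">
  <Configuration>
    <Patterns>
      <Pattern>*.cs</Pattern>
    </Patterns>
  </Configuration>
</Filter>""")

print(passes(r"C:\Data\Programming\Projects\C#\Program.cs"))  # True
```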

What kind of sub elements or attributes can be used with the Configuration element depends on the specified file filter.

[Built-in File Filters]

Relative paths and environment variables

You can use relative paths for directory source paths, index target paths and fixed file names. They are treated as relative to the current working directory of the application, which is the default .NET behaviour for relative paths.
Be careful when using the indexer tool and the searcher tool with different working directories and relative paths. To prevent problems, the indexer resolves relative paths into absolute paths at indexing time and stores those absolute paths in the index.

Environment variables can also be used inside directory source paths, index target paths and fixed file names. They are likewise resolved at indexing time, so the index contains the resolved paths.
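
The resolution step can be sketched in Python as follows. The sketch assumes the Windows %NAME% syntax used by .NET's Environment.ExpandEnvironmentVariables, which this page does not confirm; it uses the ntpath module so that Windows-style paths behave the same on any platform, and resolve_config_path is a hypothetical name.

```python
import ntpath
import re
from typing import Mapping

_ENV_VAR = re.compile(r"%([^%]+)%")


def resolve_config_path(path: str, cwd: str, env: Mapping[str, str]) -> str:
    """Expand %NAME% variables, then turn the result into an absolute path.

    Assumptions: Windows-style %NAME% syntax; unknown variables are left
    as-is; lookups are case-sensitive here (Windows itself is not); relative
    paths are resolved against cwd, the indexer's working directory.
    """
    expanded = _ENV_VAR.sub(lambda m: env.get(m.group(1), m.group(0)), path)
    if not ntpath.isabs(expanded):
        expanded = ntpath.join(cwd, expanded)
    return ntpath.normpath(expanded)


print(resolve_config_path(r"%SRC_ROOT%\Projects", r"C:\Work", {"SRC_ROOT": r"D:\Code"}))  # D:\Code\Projects
```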

Building one or multiple indexes

To build or rebuild one or more indexes, launch the CodeXCavator.Indexer executable from the command line or a batch file and pass the paths of the index configuration files as command line arguments.

Example:

 CodeXCavator.Indexer index.xml

This will build an index based on the configuration contained in the index.xml file.

You can specify multiple indexes.

 CodeXCavator.Indexer index1.xml index2.xml index3.xml

This builds all three indexes defined in index1.xml, index2.xml and index3.xml. Each index is created on a separate thread, so creating multiple indexes in a single call benefits from multi-core machines.

The indexes are always rebuilt from scratch.

The indexer tool first determines the set of input files, which is indicated by four spinning circles to the right of the "Number of input files:" caption. It then recreates the index; indexing progress is shown by a progress bar. After the index has been created, the indexer determines the total size of the index, again indicated by four spinning circles, this time to the right of the "Index size:" caption. When index creation is finished, the total size of the index is displayed.

Note that you cannot close the tool while index creation is in progress.

Index directories and index list files

Instead of an index configuration file, you can also specify a directory. The indexer tool then searches the directory for files with the .xml extension and treats them as index configuration files.
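
Below is a minimal Python sketch of this directory lookup. It assumes that only the top level of the directory is searched and that the extension check ignores case, neither of which is stated on this page; configs_in_directory is a hypothetical name.

```python
from pathlib import Path
from typing import List


def configs_in_directory(directory: str) -> List[Path]:
    """Return the .xml files directly inside directory, sorted by path.

    Assumptions: the search is not recursive and the extension check is
    case-insensitive.
    """
    return sorted(p for p in Path(directory).iterdir()
                  if p.is_file() and p.suffix.lower() == ".xml")
```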

As a further alternative, one or more index configuration list files can be passed to the indexer tool.
An index configuration list file is a plain text file in which each line contains the path to an index configuration file. Empty lines are skipped, and lines starting with a dash (-) or a single quote (') are treated as comments and ignored. Paths can be relative or absolute; relative paths are resolved relative to the location of the list file, not the current working directory.
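
The list file rules translate directly into a small parser. The Python sketch below is hypothetical (read_index_list is not a CodeXCavator function); it uses ntpath so Windows-style paths resolve identically on any platform, and it assumes that leading and trailing whitespace is trimmed before a line is examined.

```python
import ntpath
from typing import List


def read_index_list(text: str, list_file_path: str) -> List[str]:
    """Return the configuration paths named in an index configuration list file.

    Empty lines are skipped, lines starting with '-' or "'" are comments,
    and relative paths are resolved against the list file's directory.
    Stripping surrounding whitespace first is an assumption.
    """
    base_dir = ntpath.dirname(list_file_path)
    paths = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("-", "'")):
            continue
        if not ntpath.isabs(line):
            line = ntpath.normpath(ntpath.join(base_dir, line))
        paths.append(line)
    return paths


example = "\n".join([
    "' Indexes rebuilt every night",
    "project1.xml",
    "",
    "- project2.xml (disabled)",
    r"..\Shared\common.xml",
    r"D:\Indexes\other.xml",
])
print(read_index_list(example, r"C:\Config\Lists\indexes.txt"))
```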

You can mix index configuration files, index configuration directories and index configuration list files arbitrarily.

Indexing errors and reporting

When errors occur during indexing, the indexer tool displays a blinking error sign to the right of the statistics of the affected index. Hover the mouse over the blinking sign to display a tooltip with information about the error.

You can also left-click the error sign to display a popup window containing an error log. The error log can be copied to the clipboard using the corresponding buttons in the popup window.

When running in silent mode, the error messages will just be written to the standard output.

Additional switches

The indexer tool supports the following additional command line switches:

Switch:	Description:
-autoclose	Closes the indexer tool automatically after indexing has finished.
-silent	Suppresses all message and dialog boxes and runs the indexer without any user interface. All errors and log messages are written to stderr and stdout.
-noprogress	Disables progress evaluation, i.e. the number of files to process is not computed for an index and indexing starts immediately. However, no meaningful progress bar is shown in this case.
-nomultithreading	Disables multithreading during indexing, i.e. indexes are created sequentially and entirely on the main/UI thread. This can be useful when you use file storage provider, file catalogue enumerator or file enumerator implementations that do not support multithreading.
-maxworkers=<number|CPU>	Limits the total number of index workers. If you specify a number, at most that many index workers run simultaneously. If you specify CPU, the number of workers is limited to the number of CPU cores. If the switch is omitted, a worker is created and launched for each index. Use this switch if you run into memory issues during indexing.
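
The worker policy described for -maxworkers and -nomultithreading can be sketched in Python with a thread pool. This is a model of the scheduling rules only, not the indexer's implementation: worker_limit and build_all are hypothetical names, clamping the count to the number of indexes is an assumption, and build_index stands in for whatever actually builds one index.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def worker_limit(args: Sequence[str], index_count: int) -> int:
    """Derive the number of index workers from the command line switches.

    -nomultithreading forces one worker; -maxworkers=<number|CPU> caps the
    count (CPU means the number of cores); otherwise each index gets its own
    worker. Clamping to [1, index_count] is an assumption of this sketch.
    """
    if "-nomultithreading" in args:
        return 1
    limit = index_count
    for arg in args:
        if arg.lower().startswith("-maxworkers="):
            value = arg.split("=", 1)[1]
            limit = (os.cpu_count() or 1) if value.upper() == "CPU" else int(value)
    return max(1, min(limit, index_count))


def build_all(config_files: Sequence[str], build_index: Callable[[str], T],
              args: Sequence[str] = ()) -> List[T]:
    """Build every index, running at most worker_limit(...) builds at once."""
    workers = worker_limit(args, len(config_files))
    if workers == 1:
        # Sequential, in the calling thread (cf. -nomultithreading).
        return [build_index(config) for config in config_files]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(build_index, config_files))
```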

Code tagging

Source code can be tagged using [Code tags]. These tags are indexed separately and can also be searched separately.

[Home][CodeXCavator Finder][Built-in File Catalogue Enumerators][Built-in File Enumerators][Built-in File Filters]
