Disambiguation file

Tomaz Solc

The disambiguation file contains information about article disambiguations in a terse, tab-separated format.

Each line contains one record, terminated by an ASCII line feed character (hex 0A). Each record describes one bullet point on a disambiguation page (i.e. one attempt at disambiguating the term in the title of the disambiguation page).

Each record contains 1 + 2N fields, separated by an ASCII tab character (hex 09), where N >= 0 is the number of internal links that appear after the bullet point.

    Field number   Field                    Type                 Description
    1              Disambiguation page ID   integer              Page ID of the disambiguation page where the bullet point appears.
    1 + 2n + 1     Target page ID           integer or "undef"   Page ID of the disambiguating link target ("undef" if the link points to a page that doesn't exist; such links appear red in the browser).
    1 + 2n + 2     Anchor text              Unicode string       Actual anchor text of the disambiguating link.

Here n runs from 0 to N - 1, one value per link, so the first link occupies fields 2 and 3.
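
As an illustration of the field layout, here is a minimal Python sketch that splits one line into its page ID and its (target, anchor) pairs. The names Link, Record and parse_record are hypothetical and not part of any tool shipped with this file; mapping "undef" to None is just one possible convention.

```python
from typing import List, NamedTuple, Optional


class Link(NamedTuple):
    target_id: Optional[int]  # None when the file says "undef"
    anchor: str


class Record(NamedTuple):
    page_id: int
    links: List[Link]


def parse_record(line: str) -> Record:
    """Parse one line of the disambiguation file into a Record."""
    fields = line.rstrip("\n").split("\t")
    # A valid record has 1 + 2N fields, so the count must be odd.
    if len(fields) % 2 != 1:
        raise ValueError(f"expected 1 + 2N fields, got {len(fields)}")
    page_id = int(fields[0])
    links = []
    # Fields 2 and 3, 4 and 5, ... (1-based) form target/anchor pairs.
    for i in range(1, len(fields), 2):
        raw_target, anchor = fields[i], fields[i + 1]
        target = None if raw_target == "undef" else int(raw_target)
        links.append(Link(target, anchor))
    return Record(page_id, links)
```

A line holding only a page ID yields a Record with an empty links list, which corresponds to the N = 0 case.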

Parsers of this file may depend on the fact that all records for a single disambiguation page appear consecutively in the file.
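
Because records for one page are consecutive, a parser can stream the file and group lines without holding everything in memory. The following Python sketch shows one way to do that with itertools.groupby; the function name group_by_page is hypothetical, and the remaining fields are left as raw strings for simplicity.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple


def group_by_page(lines: Iterable[str]) -> Iterator[Tuple[int, List[List[str]]]]:
    """Yield (page_id, records), one tuple per disambiguation page.

    Each entry in records is the list of fields after the page ID
    for one bullet point (empty when the bullet point has no links).
    """
    rows = (line.rstrip("\n").split("\t") for line in lines)
    # groupby only merges adjacent rows, which is exactly what the
    # consecutive-records guarantee makes sufficient.
    for page_id, group in groupby(rows, key=lambda fields: fields[0]):
        yield int(page_id), [fields[1:] for fields in group]
```

Any iterable of lines works as input, including an open file object. The text encoding of the file is not stated here, so the caller has to choose one when opening it.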

Example:

    661     14699765        Arg (mathematics)       5826    complex number  5826    polar coordinates
    661     1138322         argument principle
    661     604277          verb argument
    661
