#4 Mosaic: visualising sparseness of word distribution

Milestone: 1.0

Status: open

Owner: Shane

Labels: Feature, feature request

Updated: 2024-03-22

Created: 2017-03-04

Creator: S Luz

Private: No

Improvements to mosaic; visualising sparseness of token distribution.

Approach considered:

a) Colour the mosaic tiles so that the luminance (in the HSL
colour model) of a tile is directly proportional to its
sparseness score (i.e. the more texts in the concordance a
word occurs in, the lighter the background of its tile in the
mosaic; see attached mock-up). A simple sparseness metric
might be, for instance:

      s(t) = -log(1 - P(t|d))
           = -log((D - D_t)/D)
           = log(D/(D - D_t))

where P(t|d) is the probability that word t occurs in text
(document) d, estimated as D_t/D, D is the total number of
documents, and D_t the number of documents in which t occurs.
(We will need to normalise the values s(t_i) over all words
t_i in a column of the mosaic in order to map them to a
luminance value.)
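
As a rough illustration of option a), here is a minimal Python sketch (not
modnlp code). The function names, the luminance range [0.3, 0.9], the fixed
hue and saturation, and the clamping of D_t are all assumptions made for
this example. The clamp is needed because s(t) is unbounded when a word
occurs in every document (D_t = D).

```python
import colorsys
import math


def sparseness(doc_freq, n_docs):
    """s(t) = -log(1 - D_t/D), with D_t = doc_freq and D = n_docs.

    Assumption: D_t is clamped to D - 0.5 so that a word occurring in
    every document still gets a finite score.
    """
    d_t = min(doc_freq, n_docs - 0.5)
    return -math.log(1.0 - d_t / n_docs)


def column_luminance(scores, lo=0.3, hi=0.9):
    """Min-max normalise the scores of one mosaic column to [lo, hi].

    scores maps each word in the column to its s(t); lo and hi are
    hypothetical luminance bounds chosen to keep the tile text legible.
    """
    s_min, s_max = min(scores.values()), max(scores.values())
    span = s_max - s_min
    if span == 0:
        # All words equally distributed: use the middle of the range.
        return {w: (lo + hi) / 2 for w in scores}
    return {w: lo + (hi - lo) * (s - s_min) / span for w, s in scores.items()}


def tile_colour(luminance, hue=0.6, saturation=0.5):
    """Convert an HSL luminance to an '#rrggbb' string (hue/saturation assumed)."""
    # Note the argument order of the standard-library call: hue, lightness, saturation.
    r, g, b = colorsys.hls_to_rgb(hue, luminance, saturation)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
```

For one column, something like
column_luminance({w: sparseness(df[w], D) for w in column_words})
would give the per-tile luminance, where df and column_words are
placeholders for whatever the concordance provides.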

We could also consider using, as Shane mentioned, a function
of TFxIDF over all documents in which t occurs as a possible
metric, where, say,

      tfidf(t,d) = (#(t,d)/size(d)) * log(D/D_t)

where #(t,d) is the number of occurrences of t in d, size(d)
is the total number of tokens in d, and D and D_t are as
above.

The metric per word t on a mosaic tile would be a sum over all
texts d:

      t(t) = \sum_d tfidf(t,d)

and then normalised over all words in a mosaic column.
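
A minimal Python sketch of this summed TFxIDF score, for illustration only.
It assumes each text is already available as a list of tokens; the function
name and the input format are hypothetical and not part of modnlp.

```python
import math
from collections import Counter, defaultdict


def summed_tfidf(docs):
    """Return {word: sum over d of tfidf(word, d)}.

    docs is a list of token lists, one per text (assumed input format).
    tfidf(t,d) = (#(t,d)/size(d)) * log(D/D_t), as defined in the ticket.
    """
    n_docs = len(docs)

    # D_t: number of documents in which each word occurs at least once.
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))

    totals = defaultdict(float)
    for tokens in docs:
        if not tokens:
            continue  # an empty text contributes nothing
        for word, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log(n_docs / doc_freq[word])
            totals[word] += tf * idf
    return dict(totals)
```

Note that with this definition a word occurring in every text scores 0,
since log(D/D_t) = 0 when D_t = D. The resulting totals would still need
normalising over the words of each mosaic column before being mapped to a
luminance value.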

1 Attachments

sparsenessmosaic.pdf

Discussion

S Luz - 2017-03-04

labels: --> Feature

S Luz - 2017-11-22

assigned_to: Shane

Shane - 2024-03-22

labels: Feature --> Feature, feature request