This article provides technical information about how we use automated analytics. For setup instructions, please visit this article; for tips on how to use it, please visit this article.
The keyword visualization is a network visualization of the concepts and ideas discussed in your project. It consists of the important, recurring keywords (excluding function words such as articles and prepositions) that are automatically detected in your project's inputs. This lets you approach the analysis from the bottom up, which is less biased than defining categories before the analysis.
Here is how it works:
How is the size of the keywords determined in the visualization?
Size represents the simple frequency of the keyword within your project. The more often a keyword or one of its variations (e.g., "tree" and "trees") appears in the project, the larger it appears in the visualization.
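A minimal Python sketch of frequency-based sizing follows. The stop-word list, the naive plural folding in `normalize` (a stand-in for proper stemming or lemmatization), and the 12 to 48 point size range are all hypothetical choices for illustration, not the product's actual settings.

```python
import re
from collections import Counter

# Hypothetical stop-word list: articles, prepositions and conjunctions only.
STOP_WORDS = {"a", "an", "the", "in", "on", "at", "for", "of", "to", "with", "and", "or"}


def normalize(word: str) -> str:
    """Naively fold simple plurals so that 'trees' counts as 'tree'."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"          # cities -> city
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]                # trees -> tree
    return word


def keyword_frequencies(inputs: list[str]) -> Counter:
    """Count every non-stop-word keyword across all inputs."""
    counts: Counter = Counter()
    for text in inputs:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token not in STOP_WORDS:
                counts[normalize(token)] += 1
    return counts


def keyword_sizes(counts: Counter, min_size: float = 12.0, max_size: float = 48.0) -> dict[str, float]:
    """Scale font size linearly from the rarest to the most frequent keyword."""
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1               # avoid dividing by zero when all counts match
    return {kw: min_size + (n - lo) / span * (max_size - min_size) for kw, n in counts.items()}


inputs = ["Trees in the park", "Plant a tree on the street", "Lights in the park", "Park for the dogs"]
counts = keyword_frequencies(inputs)
print(counts.most_common(2))  # [('park', 3), ('tree', 2)]
print(keyword_sizes(counts))
```

Here "park" gets the largest size and "tree" (counted once as "Trees" and once as "tree") the next largest.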
How are keywords selected and grouped into color clusters?
First, we find keywords that appear in the same input. Each time two keywords appear together in an input, we count it as a "connection" between them. We then apply the Louvain method, a community-detection algorithm that groups keywords so that connections within each group are dense compared with connections between groups (a measure known as modularity). The algorithm chooses the number of groups itself, and each group is shown in its own color. Maximizing modularity tends to place keywords that are often mentioned together in the same cluster, although, as a heuristic, it does not guarantee the best possible grouping.
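The Python sketch below illustrates both steps under stated assumptions. It takes already-extracted keyword sets per input (a hypothetical input format), builds a weighted co-occurrence graph, and runs only the first, "local moving" phase of the Louvain method, in which each keyword moves to whichever neighboring cluster most increases modularity. The full method then merges each cluster into a single node and repeats; that phase, and the faster incremental gain formula used in practice, are omitted for brevity.

```python
from collections import defaultdict
from itertools import combinations

Graph = dict[str, dict[str, float]]


def cooccurrence_graph(keyword_sets: list[set[str]]) -> Graph:
    """Edge weight = number of inputs in which both keywords appear."""
    graph: defaultdict = defaultdict(lambda: defaultdict(float))
    for keywords in keyword_sets:
        for a, b in combinations(sorted(keywords), 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return {node: dict(nbrs) for node, nbrs in graph.items()}


def modularity(graph: Graph, community: dict[str, str]) -> float:
    """Q = sum over clusters of (internal weight / m) - (cluster degree / 2m)^2."""
    two_m = sum(w for nbrs in graph.values() for w in nbrs.values())  # each edge counted twice
    internal: defaultdict = defaultdict(float)
    degree: defaultdict = defaultdict(float)
    for u, nbrs in graph.items():
        degree[community[u]] += sum(nbrs.values())
        for v, w in nbrs.items():
            if community[u] == community[v]:
                internal[community[u]] += w   # also counted twice, so divide by 2m
    return sum(internal[c] / two_m - (degree[c] / two_m) ** 2 for c in degree)


def louvain_local_moving(graph: Graph, max_passes: int = 10) -> dict[str, str]:
    """Phase one of Louvain: greedily move keywords between neighboring clusters."""
    community = {node: node for node in graph}   # start with one cluster per keyword
    for _ in range(max_passes):
        improved = False
        for node in sorted(graph):
            best_c, best_q = community[node], modularity(graph, community)
            for c in sorted({community[n] for n in graph[node]}):
                trial = dict(community)
                trial[node] = c
                q = modularity(graph, trial)
                if q > best_q + 1e-12:        # accept strict improvements only
                    best_c, best_q = c, q
            if best_c != community[node]:
                community[node] = best_c
                improved = True
        if not improved:                      # no keyword moved: local optimum reached
            break
    return community


inputs = [{"tree", "park", "green"}, {"tree", "park", "green"},
          {"bus", "bike", "traffic"}, {"bus", "bike", "traffic"}, {"park", "bus"}]
graph = cooccurrence_graph(inputs)
clusters = louvain_local_moving(graph)
print(clusters)
```

With this toy data, the two groups of frequently co-mentioned keywords end up in separate clusters despite the single "park" and "bus" link between them. For real projects, a library implementation such as NetworkX's `louvain_communities` is a more practical choice than this hand-rolled version.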
How are the proximity and connecting lines between keywords determined?
The stronger the connection between two keywords (i.e., the more often they appear in the same inputs), the closer together they appear and the darker the line between them. The keyword positions are computed with a force-directed layout algorithm, which arranges the network in two dimensions so that it is easy to read.
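As an illustration, here is a small weighted variant of the Fruchterman-Reingold force-directed algorithm in Python. It assumes that the attraction between two keywords scales with their co-occurrence count and that line darkness is a linear function of that count; the product's actual layout algorithm, parameters, and color scale may differ.

```python
import math
import random


def force_directed_layout(nodes, edges, iterations=200, seed=0):
    """Weighted Fruchterman-Reingold sketch.

    nodes: list of keywords; edges: {(a, b): co-occurrence count}.
    Returns {keyword: (x, y)}.
    """
    rng = random.Random(seed)
    pos = {n: [rng.random(), rng.random()] for n in nodes}
    k = math.sqrt(1.0 / len(nodes))   # "ideal" edge length for a unit-area canvas
    start_temp = 0.1                  # largest step a node may take; cools towards 0

    for step in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}

        # Every pair of keywords repels: f_r(d) = k^2 / d
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[a][0] += dx / d * f
                disp[a][1] += dy / d * f
                disp[b][0] -= dx / d * f
                disp[b][1] -= dy / d * f

        # Connected keywords attract, scaled by how often they co-occur: f_a(d) = w * d^2 / k
        for (a, b), w in edges.items():
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = w * d * d / k
            disp[a][0] -= dx / d * f
            disp[a][1] -= dy / d * f
            disp[b][0] += dx / d * f
            disp[b][1] += dy / d * f

        # Move each node along its net force, capped by a linearly cooling temperature
        temp = start_temp * (1 - step / iterations)
        for n in nodes:
            length = max(math.hypot(*disp[n]), 1e-9)
            move = min(length, temp)
            pos[n][0] += disp[n][0] / length * move
            pos[n][1] += disp[n][1] / length * move

    return {n: (p[0], p[1]) for n, p in pos.items()}


def edge_opacity(weight, max_weight):
    """Darker (more opaque) lines for keyword pairs that co-occur more often."""
    return 0.2 + 0.8 * weight / max_weight


nodes = ["tree", "park", "bus"]
edges = {("tree", "park"): 10, ("park", "bus"): 1}
layout = force_directed_layout(nodes, edges)
print(math.dist(layout["tree"], layout["park"]), math.dist(layout["park"], layout["bus"]))
```

Every pair of keywords pushes apart while linked keywords pull together, so the strongly linked "tree" and "park" settle closer than the weakly linked "park" and "bus". The cooling step size lets the layout settle instead of oscillating indefinitely.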
Tip: In some instances, you may see a cluster of keywords that sits apart from the rest of the clusters. This suggests that the topic is discussed separately and is largely unrelated to the other ideas.
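To check whether a separate-looking cluster is truly unconnected, you could look for connected components in the keyword co-occurrence graph. This Python sketch assumes a hypothetical `{keyword: {neighbor: weight}}` graph format and treats everything outside the largest component as detached. Note that a cluster can also drift apart visually when it has only a few weak links to the rest, which this check would not flag.

```python
from collections import deque


def connected_components(graph: dict[str, dict[str, float]]) -> list[set[str]]:
    """Breadth-first search; returns keyword sets, largest first."""
    seen: set[str] = set()
    components: list[set[str]] = []
    for start in sorted(graph):
        if start in seen:
            continue
        component: set[str] = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        components.append(component)
    return sorted(components, key=len, reverse=True)


def detached_clusters(graph: dict[str, dict[str, float]]) -> list[set[str]]:
    """Components with no co-occurrence link at all to the largest component."""
    return connected_components(graph)[1:]


graph = {
    "tree": {"park": 2.0}, "park": {"tree": 2.0, "green": 1.0}, "green": {"park": 1.0},
    "parking": {"fee": 3.0}, "fee": {"parking": 3.0},
}
print(detached_clusters(graph))
```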
Understanding the automated analytics
We use Natural Language Processing (NLP) to give you automated suggestions of inputs that are relevant to your Tags.
Here is how it works:
How do you determine which inputs are relevant to a specific Tag?
We use a Natural Language Inference (NLI) model based on BERT, which interprets the meaning and context of the Tag and judges whether an input is semantically related to it. If the model finds a relevant relation between the Tag and the input with high confidence, we suggest that input for the Tag.
For example, the language model would determine that the Tag "Nature" is highly relevant to the input "I wish there were more trees in the city" and suggest this input for the Tag.
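The suggestion step can be sketched in Python as follows, assuming the common zero-shot recipe in which the input is the NLI premise and each Tag is turned into a hypothesis such as "This text is about Nature.". The template wording, the 0.8 threshold, and the `toy_entailment` stand-in (whose scores are made up, not real model output) are all hypothetical. In practice the scorer would wrap a BERT-style model fine-tuned on an NLI dataset; the Hugging Face `transformers` zero-shot-classification pipeline is built on the same entailment idea.

```python
from typing import Callable

# Any function (premise, hypothesis) -> probability that the premise entails the hypothesis.
# In practice this would wrap a BERT-style model fine-tuned for NLI.
EntailmentScorer = Callable[[str, str], float]

HYPOTHESIS_TEMPLATE = "This text is about {}."  # hypothetical wording
THRESHOLD = 0.8                                  # hypothetical "high confidence" cut-off


def suggest_tags(text: str, tags: list[str], entailment: EntailmentScorer,
                 threshold: float = THRESHOLD) -> list[str]:
    """Return tags whose hypothesis the model considers entailed by the input, best first."""
    scores = {tag: entailment(text, HYPOTHESIS_TEMPLATE.format(tag)) for tag in tags}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [tag for tag, p in ranked if p >= threshold]


def toy_entailment(premise: str, hypothesis: str) -> float:
    """Stand-in for a real NLI model; the scores are made up for illustration."""
    made_up = {"This text is about Nature.": 0.93, "This text is about Traffic.": 0.12}
    return made_up.get(hypothesis, 0.0)


print(suggest_tags("I wish there were more trees in the city", ["Nature", "Traffic"], toy_entailment))
# ['Nature']
```

Raising the threshold trades fewer wrong suggestions for more missed ones, which is why the cut-off is a design choice rather than a fixed property of the model.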
How many languages does Insights support, and how accurate is it?
Our Insights feature currently works in 16 languages, including English, Dutch, German, French, Spanish, Portuguese, and Danish, and we are constantly working on supporting more. The underlying model is considered one of the most accurate of its kind and can detect the context of keywords (e.g., the relation between "trees" and "green"). However, as with any computational model, it produces some false positives and false negatives, so its suggestions require human review.
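Because suggestions can be wrong in both directions, one way to gauge how well they work in a given project is to compare them with the Tags a reviewer actually kept. This Python sketch computes precision (the share of suggestions that were correct) and recall (the share of reviewer-confirmed Tag assignments that were suggested), using a hypothetical `{input_id: set_of_tags}` format and invented example data.

```python
def suggestion_quality(suggested: dict[int, set[str]],
                       confirmed: dict[int, set[str]]) -> tuple[float, float]:
    """Return (precision, recall) of suggested Tags against reviewer-confirmed Tags."""
    tp = fp = fn = 0
    for input_id in suggested.keys() | confirmed.keys():
        s = suggested.get(input_id, set())
        c = confirmed.get(input_id, set())
        tp += len(s & c)   # suggested and kept by the reviewer
        fp += len(s - c)   # false positive: suggested, but rejected
        fn += len(c - s)   # false negative: tagged by the reviewer, never suggested
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


suggested = {1: {"Nature"}, 2: {"Nature", "Traffic"}, 3: set(), 4: {"Nature"}}
confirmed = {1: {"Nature"}, 2: {"Traffic"}, 3: {"Safety"}, 4: set()}
print(suggestion_quality(suggested, confirmed))  # (0.5, 0.6666666666666666)
```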