FAQ on our AI Sensemaking tool

Insightful FAQs: Understanding Our AI Sensemaking Tool

Written by Stijn Zwarts

Are we using any external AI providers? How are they processing the data?

The AI Sensemaking feature uses two external providers for AI processing.

Microsoft Azure (OpenAI)

The Microsoft Azure APIs are used for the summarization, question-asking, and some auto-tagging features within sensemaking. These features make use of the GPT-4-Turbo Large Language Model (LLM) APIs.

Microsoft's terms specify, among other things, that Microsoft will:

  • Only process data for the purpose of providing and supporting the service

  • Host the OpenAI models in Microsoft's Azure environment; the service does NOT interact with any services operated by OpenAI (e.g. ChatGPT or the OpenAI API)

  • Process the data in the region specified by the customer; in our case, this is one of our 4 regions (Europe, UK, US and Canada)

  • Only access data for the purpose of abuse monitoring

More information on data privacy in Microsoft Azure can be found here.

NLPCloud

The NLPCloud APIs are used to power some of the auto-tagging features within sensemaking.

NLPCloud is a company based in France. It is HIPAA-, GDPR-, and CCPA-compliant, and is working on a SOC 2 certification.

NLPCloud commits to:

  • Not storing any data that is sent to their API

  • Not selling or renting any information to marketers or third parties

  • Maintaining strict administrative, technical and physical procedures to protect information stored in their servers

Their privacy policy can be found here. More information on their security measures can be found here.

What data is sent to these subprocessors and does it include PII?

Both subprocessors receive only the textual data that CitizenLab end users (residents) wrote in their contributions (ideas or survey responses) to a project on the platform. This happens when an admin visits the survey results page (in the case of surveys) or actively chooses to start an AI analysis (in the case of ideation).

We do not send any information about the user (email, username, picture, demographic information, …) to these subprocessors, and as such do not structurally send any PII. However, if a user mentions PII within their contribution, that text will be sent to the subprocessors as part of the contribution.

Are these subprocessors using the data to train and improve their models?

No, both subprocessors explicitly state that they do not use the data for this purpose.

Where is Microsoft processing the data?

Microsoft allows us to specify the region in which data is processed. We currently use 4 regions, and each customer's data is processed in the region closest to them. The regions are:

  • Europe (France)

  • UK

  • US

  • Canada

Why are the AI's answers not in my language?

Our AI feature tries to answer in the language of the input it receives. In exceptional cases, such as when the inputs mix languages, when there are very few inputs, or when the AI simply gets it wrong, it might generate answers in the wrong language. In such cases, retrying usually suffices.

How accurate are the generated summaries?

Summarizing inherently means discarding information while trying to retain the most common and important elements. Current technology is good at interpreting common elements, but deciding what is most important requires context and domain knowledge, and is somewhat subjective.

To draw correct conclusions, it's crucial to have a human in the loop and to offer maximal transparency on how the AI reaches its conclusions.

Our AI analysis has been designed from the ground up to let you use AI responsibly, offering maximum transparency and control to the human, with the machine at their side for highly efficient assistance. To that end, we have built in several mechanisms:

  • Before and after you generate an AI summary, there is an indication of expected accuracy, expressed as a percentage.

  • The summary contains in-line references to the resident inputs on which its conclusions are based, which can be opened with a single click.

  • At all times, all inputs contributing to the project are easily browsable and readable, constantly exposing the user to the raw, direct inputs for control and interpretation.

  • The tagging feature lets you easily segment the received inputs into smaller groups and summarize each group separately. This makes it easier to keep an overview and boosts the accuracy of the summaries.

  • Auto-tagging helps you tag more efficiently. There is a variety of tagging methods to choose from, offering varying degrees of control. At all times, the user can override the tags or decide to tag manually for maximal control.

  • Our software is source-available and the source code can be found on GitHub. This ultimately allows for the deepest level of understanding of how the tool behaves and how it reaches its conclusions.

In summary, while the accuracy of current state-of-the-art LLMs is very impressive, there is no such thing as 100% accuracy. We have chosen to build a human-centric interface in which the machine assists while you retain maximal transparency and control.

What languages will the AI Sensemaking tool offer / support?

All our core languages are supported, with the exception of Greenlandic.
