UJJI AI Privacy Policy
Last updated: 10th September 2024
At UJJI, we've always approached new technology with caution. Since launching our AI in 2020, we've developed a robust process and a skilled team. This past year, advances in large language models (LLMs) have impressed us, especially their ability to address user problems like information overload; ninety percent of AI adopters have reported increased productivity. At the same time, launching AI products requires meeting UJJI's strict data stewardship standards. Our goal is to build AI that is trusted and effective, and the generative model industry is still young and research-focused, with few enterprise-grade security and privacy patterns to follow.
We built UJJI AI from first principles, focused on maintaining our security, compliance, and privacy standards. These principles guided our architecture design, even when the path was challenging. Below, we walk through how they shaped UJJI AI as it exists today.
Customer data
Our most important decision was ensuring we could use a top-tier model while keeping customer data within UJJI-controlled VPCs. In the generative model industry, most customers use hosted services directly; few alternatives exist.
UJJI, and our customers, have high expectations around data ownership.
Internally, we process this content on our own servers. The only external call is sending prompts, with their supporting context, to the GPT API. The workflow begins when a file is uploaded. While we do send data externally, we never send whole documents: we store small chunks of the document's content and transmit only the parts essential to the task. Even in a worst case, the full document is never used, only the necessary context.
Based on the request, we retrieve the most relevant chunks from the database and send this small fragment to GPT, which processes it and returns a response.
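As an illustration, the chunk-and-retrieve step described above might look like the following sketch. This is not UJJI's implementation: the word-overlap scoring is a toy stand-in for real relevance ranking (a production system would typically use embeddings), and names like `build_prompt` are invented for the example.

```python
def chunk(text: str, size: int = 50) -> list[str]:
    """Split a document into small chunks of roughly `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score(query: str, chunk_text: str) -> int:
    """Toy relevance score: how many chunk words appear in the query."""
    q = set(query.lower().split())
    return sum(1 for w in chunk_text.lower().split() if w in q)

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return only the k most relevant chunks; the rest never leave our servers."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the small fragment that is actually sent to the model API."""
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"
```

The key property is that only the output of `retrieve` is ever placed in the prompt, so the external request contains a fragment of the document, never the whole thing.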
We do not train large language models (LLMs) on customer data
We chose off-the-shelf models instead of training new ones, largely out of privacy concerns. Our traditional ML models, which ranked search results, could learn without leaking data; a generative AI model trained on UJJI's customer data could not offer the same guarantee.
We chose Retrieval Augmented Generation (RAG), which includes all needed context in each request, so the model never retains data between calls. For instance, summarizing a channel involves sending a single prompt containing the messages and the instructions. RAG's statelessness protects privacy and grounds results in your company's knowledge rather than the public Internet.
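The statelessness described above can be sketched as follows: every summarization request carries its full context, and the model call keeps nothing between invocations. The `call_model` stub and the prompt format are illustrative assumptions, not UJJI's actual API.

```python
def summarize_channel(messages: list[str]) -> str:
    """Build one complete, self-contained request per summarization."""
    prompt = (
        "Summarize the following channel messages:\n"
        + "\n".join(f"- {m}" for m in messages)
    )
    # The full context travels with every call; nothing is cached or retained.
    return call_model(prompt)

def call_model(prompt: str) -> str:
    # Stub standing in for the hosted model API; it keeps no state between calls.
    lines = prompt.count("\n")
    return f"[summary of {lines} lines]"
```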
RAG limits the choice of models to those with large context windows, and larger contexts mean slower processing. Summarizing all the messages in a channel involves a lot of data, which made it challenging to find a top-tier model combining a large context window with low latency.
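To make the context-window constraint concrete, here is a minimal sketch of a pre-flight budget check. The 4-characters-per-token heuristic and the 8,000-token window are illustrative numbers, not properties of any specific model.

```python
CONTEXT_WINDOW_TOKENS = 8_000  # illustrative model limit, not a real spec

def estimate_tokens(text: str) -> int:
    """Rough heuristic: about 4 characters per token for English text."""
    return len(text) // 4

def fits_in_window(messages: list[str], reserved_for_output: int = 500) -> bool:
    """True if all messages plus room for the reply fit the context window."""
    used = sum(estimate_tokens(m) for m in messages)
    return used + reserved_for_output <= CONTEXT_WINDOW_TOKENS
```

A request that fails this check would have to be split or further filtered before it can be sent, which is one source of the latency trade-off mentioned above.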
RAG is becoming faster and more efficient with larger context windows and better data synthesis, ensuring quality results and protecting customer data. We do not train large language models (LLMs) on customer data.
UJJI AI only operates on data that the user can already see
One of our core tenets is that UJJI AI only accesses data visible to the requesting user. UJJI AI’s search feature, for example, will never show results that standard search would not. Summaries will never include content the user couldn't otherwise see while reading channels.
We enforce this by applying the requesting user's access control lists (ACLs) when fetching data to summarize or search, leveraging our existing libraries that fetch the data displayed in channels or on the search results page.
This wasn't technically difficult, but it required a deliberate choice: the best way to guarantee it was to build on and reuse UJJI's core feature sets, adding AI enhancements at the end.
Only the user who invokes UJJI AI can see the AI-generated output. This builds confidence that UJJI is your trusted AI partner: Only the data you can see goes in, and only you can see the output.
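A simplified sketch of this two-sided guarantee, using invented dataclasses rather than UJJI's actual schema: data is filtered through the requesting user's ACL before it can reach the model, and the generated output is addressed only to the invoking user.

```python
from dataclasses import dataclass

@dataclass
class Message:
    channel: str
    text: str

@dataclass
class User:
    user_id: str
    readable_channels: set[str]  # the user's ACL, simplified to channel names

def fetch_for_user(user: User, messages: list[Message]) -> list[Message]:
    """Only data the user could already see is ever sent to the model."""
    return [m for m in messages if m.channel in user.readable_channels]

def deliver(user: User, output: str) -> dict:
    """AI output is visible solely to the user who invoked the request."""
    return {"recipient": user.user_id, "body": output, "visibility": "private"}
```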
UJJI AI upholds all of UJJI’s enterprise-grade security and compliance requirements
UJJI AI integrates all of our compliance and security offerings, storing only the necessary data for the required duration. Often, no data is stored; outputs like conversation summaries and search answers are not saved on disk.
When data storage is necessary, we rely on UJJI's existing compliance infrastructure, building new support where needed. That infrastructure includes Encryption Key Management and International Data Residency. We've also added support to ensure derived content, such as summaries, remains aware of its source messages. For instance, if a message is removed due to Data Loss Protection (DLP), any summaries derived from it are invalidated. This ensures DLP and other controls apply to UJJI AI outputs just as they do to UJJI's message content.
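The source-awareness described above can be sketched as a store that records, for each summary, the IDs of the messages it was derived from; removing a message then invalidates every dependent summary. The in-memory store and field names are illustrative, not UJJI's actual design.

```python
from dataclasses import dataclass

@dataclass
class Summary:
    text: str
    source_message_ids: set[str]
    valid: bool = True

class SummaryStore:
    def __init__(self) -> None:
        self.summaries: list[Summary] = []

    def add(self, text: str, sources: set[str]) -> Summary:
        s = Summary(text, sources)
        self.summaries.append(s)
        return s

    def on_message_removed(self, message_id: str) -> int:
        """Invalidate every summary derived from the removed message."""
        invalidated = 0
        for s in self.summaries:
            if s.valid and message_id in s.source_message_ids:
                s.valid = False
                invalidated += 1
        return invalidated
```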
This overview highlights our commitment to security and privacy, showing how seriously we protect our customers' data.