How incentives could solve privacy

AI is on the order of fire among the inventions that change the course of human history. Whoever wields that power will chart our history.

AI holds the potential to create a world of unprecedented abundance, prosperity, and advancement—or, conversely, an authoritarian hellscape. The outcome will likely fall somewhere in the middle, determined by how we navigate the underlying dependency tree. This tree largely hinges on two key factors:

1. Who Trained the Model and the Data Used: The entity that develops the model and the dataset it’s trained on significantly influence the AI’s behavior, capabilities, and biases.

2. Model Access:

  • API Access vs. Open Source: Can you run inference on the model locally (open source), or do you need to hit an API?

  • API Governance: If API access is required, how strictly is access controlled, and who governs this access?

This fate will largely depend on the labs that have raised hundreds of millions to billions in funding, and on established companies like Google. This layer of the stack seems largely untouchable for a startup. In what areas can startups, with their limited resources, get a toehold in the market?

The Privacy Problem in AI

Although not yet a mainstream concern, privacy issues will inevitably arise in AI, much like they have in the search space. Privacy-focused search engines, like DuckDuckGo, have built a niche by removing metadata associated with queries. Similar concerns will emerge in AI regarding:

• Metadata: The IP address, network data, and other personally identifiable information (PII) associated with an inference request.

• PII within Inference Requests: Sensitive data such as names, locations, and financial information embedded within requests.

Open Source vs. API: Running open-source models locally is a straightforward solution, as data never leaves the device. However, once you use an interface connected to the internet (e.g., a browser or app) or hit an API, privacy issues re-emerge. Many projects now host open-source models and market them as private. This is misleading: the data is still exposed to the model host, which merely replaces OpenAI or another entity as the data collector.

Labs like OpenAI and Anthropic, as well as companies like Google and Microsoft, are incentivized to collect as much data as possible, not less. They have no incentive to enter this product space.

The LLM Privacy Dilemma: A Market Opportunity

The current landscape of LLM providers mirrors that of search engines:

• Google → OpenAI

• Bing → Anthropic

• DuckDuckGo → Yet to emerge for LLMs

While privacy hasn’t yet become a significant topic for LLMs, it will. If the market resembles search, a service that removes trackers and PII from LLM requests could become invaluable. Just as DuckDuckGo has built a $300-$400 million per year business catering to a growing audience that values privacy, a first-mover privacy layer for LLM inference with feature parity to ChatGPT could dominate the market.

Although privacy is secondary for most users, a significant and growing population does prioritize it. Moreover, many large companies are hesitant to use APIs like OpenAI’s out of concern that their sensitive data could be used to train other models. Here, starting with a digital twin and eventually adopting technologies like Fully Homomorphic Encryption (FHE) and Zero-Knowledge Proofs (ZKPs) can unlock private computation on sensitive data without ever decrypting it.

Protocol Incentives

Ethereum PoS secures the network by financially incentivizing validators to act honestly: they risk losing their staked ETH through slashing penalties for provable misbehavior such as double-signing (proposing or attesting to conflicting blocks) or surround voting (casting an attestation that surrounds, or is surrounded by, one of their earlier attestations). This aligns their interests with the network’s integrity. The technology makes it possible, but the incentives are what drive the censorship resistance and honesty around the state of the chain. Why aren’t we doing this for other use cases, like privacy?

A network in which users contribute GPUs/CPUs to run inference requests, with rewards for honest behavior and slashing for dishonest behavior around both the inference itself and the preservation of privacy, might actually help solve this privacy problem in AI.
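As a rough illustration of what that incentive layer might look like, here is a minimal sketch of a node registry with staking, rewards, and slashing. Everything in it (the InferenceNodeRegistry class, MIN_STAKE, SLASH_FRACTION, JOB_REWARD) is hypothetical; a real protocol would enforce this logic on-chain and would need a way to cryptographically prove a violation before slashing.

```python
# Illustrative sketch of a reward/slashing registry for inference nodes.
# All names and parameters are hypothetical; a real protocol would enforce
# this on-chain with verifiable evidence of honest or dishonest behavior.

from dataclasses import dataclass

MIN_STAKE = 32.0        # minimum stake to register a node (arbitrary unit)
SLASH_FRACTION = 0.5    # fraction of stake burned on a proven violation
JOB_REWARD = 0.01       # reward paid per verified inference job

@dataclass
class Node:
    operator: str
    stake: float
    active: bool = True

class InferenceNodeRegistry:
    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}

    def register(self, operator: str, stake: float) -> None:
        # Operators must lock at least MIN_STAKE before accepting jobs.
        if stake < MIN_STAKE:
            raise ValueError("stake below minimum")
        self.nodes[operator] = Node(operator, stake)

    def reward(self, operator: str) -> None:
        # Paid when a job's handling checks out (e.g. correct output,
        # no evidence of logging or leaking the request payload).
        node = self.nodes[operator]
        if node.active:
            node.stake += JOB_REWARD

    def slash(self, operator: str) -> None:
        # Triggered by a proven violation: leaking request data, returning
        # a bogus result, or tampering with routing metadata.
        node = self.nodes[operator]
        node.stake *= (1 - SLASH_FRACTION)
        if node.stake < MIN_STAKE:
            node.active = False  # eject the node until it tops up its stake
```

The hard part, as with Ethereum’s slashing conditions, is defining violations that can be proven objectively rather than merely asserted.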

Privacy Layer Mechanics

Metadata and PII Protection

• Proxy Network for Request Routing: Utilize a proxy network to route inference requests, with an option for multi-hop routing to enhance privacy.

• Incentive Mechanism for Node Operators: Implement an incentive mechanism to ensure node operators act honestly, coupled with penalties for non-compliance.

PII Management in Inference Requests

• Digital Twin: Deploy a small model on each user’s device to scan inference requests and replace Personally Identifiable Information (PII) such as cities, credit card numbers, birthdates, and names with placeholders. The original values are restored when the response comes back (a minimal sketch of this flow follows this list).

• FHE: Once Fully Homomorphic Encryption (FHE) becomes practical for LLM-scale inference, it will allow inference to run directly on encrypted data, eliminating the need for PII substitution. In the interim, a version 1 solution could run inference inside a Secure Enclave / Trusted Execution Environment (TEE).
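To make the digital twin idea concrete, here is a minimal sketch of the scrub-and-restore flow, using simple regexes in place of the small on-device model the approach envisions. The patterns and helper names (PII_PATTERNS, scrub, restore) are purely illustrative.

```python
# Minimal sketch of the digital-twin scrub/restore flow: replace PII with
# placeholder tokens before the request leaves the device, then swap the
# original values back into the response. Regexes stand in for the small
# local detection model described above.

import re

PII_PATTERNS = {
    "CARD": re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def scrub(text: str) -> tuple[str, dict[str, str]]:
    """Replace detected PII with placeholders; return text and a mapping."""
    mapping: dict[str, str] = {}
    for label, pattern in PII_PATTERNS.items():
        for i, match in enumerate(pattern.findall(text)):
            token = f"<{label}_{i}>"
            mapping[token] = match
            text = text.replace(match, token, 1)
    return text, mapping

def restore(text: str, mapping: dict[str, str]) -> str:
    """Swap the original values back into the model's response."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text

# Example round trip (the remote API call itself is elided):
prompt = "My card 4242 4242 4242 4242, born 04/12/1990, email a@b.com"
safe_prompt, mapping = scrub(prompt)
# safe_prompt goes to the API; the response comes back with tokens intact.
response = f"Noted: {safe_prompt}"
print(restore(response, mapping))
```

Because only placeholder tokens ever leave the device, the API provider sees the shape of the request but none of the underlying identifiers.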

Existing Partial Solutions

EZKL: A tool for generating Zero-Knowledge Proofs (ZKPs) that a model was executed correctly, providing cryptographic verification without revealing the underlying data. However, it primarily addresses computation integrity rather than comprehensive privacy.

PrivateGPT: Allows local execution of large language models (LLMs) without accessing external APIs, ensuring that no data leaves the device. However, its scalability is limited by local hardware capabilities.

PySyft: A library enabling secure and private deep learning, leveraging techniques like federated learning and Secure Multi-Party Computation (SMPC). While robust in training privacy, it only partially addresses inference privacy concerns.

Relevant Products

FreedomGPT: An open-source AI model designed to run locally on users’ devices, ensuring data privacy by keeping everything under user control. However, like PrivateGPT, it is constrained by the user’s local hardware.

Mithril Security: Provides a platform for secure AI inference using Trusted Execution Environments (TEEs) and potentially FHE in the future. Mithril enhances privacy by combining local execution with secure external processing, addressing both metadata and PII concerns more comprehensively than existing solutions.

How to Architect a Privacy Protocol and Open Source It

The simplest way to build a privacy protocol is to start with an existing proxy network. By forking one and modifying it to route each request through a random number of hops, you can create a specialized network that anonymizes the metadata associated with inference requests by passing them through multiple encrypted relays.
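For illustration, here is a minimal sketch of the layered encryption such a network would perform, using PyNaCl’s SealedBox for the per-hop wrapping. The relay set, hop-count range, and the absence of per-hop routing headers and real transport are all simplifying assumptions, not a description of any existing proxy network.

```python
# Sketch of layered ("onion"-style) encryption for a multi-hop proxy path.
# The client wraps the request once per relay, and each relay strips one
# layer before forwarding. Per-hop routing headers, relay discovery, and
# the actual network transport are omitted to keep the sketch short.

import json
import random
from nacl.public import PrivateKey, SealedBox  # pip install pynacl

# Hypothetical relay set; in practice these keys would come from the
# forked proxy network's node registry.
relays = [PrivateKey.generate() for _ in range(5)]

def build_onion(request: dict, path: list[PrivateKey]) -> bytes:
    """Encrypt the request once per hop, innermost layer first."""
    payload = json.dumps(request).encode()
    for hop in reversed(path):
        payload = SealedBox(hop.public_key).encrypt(payload)
    return payload

def peel(layer: bytes, hop: PrivateKey) -> bytes:
    """What one relay does: remove its layer and pass the rest along."""
    return SealedBox(hop).decrypt(layer)

# The client chooses a random hop count (2-4 here) and a random path.
path = random.sample(relays, k=random.randint(2, 4))
onion = build_onion({"prompt": "scrubbed inference request"}, path)

# Simulate the relays peeling their layers in order; only after the last
# layer is removed does the plaintext request reach the API boundary.
for hop in path:
    onion = peel(onion, hop)
print(json.loads(onion))
```

Randomizing the hop count makes traffic analysis harder, at the cost of extra latency per hop.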

Integrating privacy-preserving techniques such as homomorphic encryption or differential privacy into the digital twin’s handling of inference requests would keep PII from ever being sent to the upstream APIs.

Benefits to users and corporations

AI is the frontier. The future of the models themselves will largely be decided by the labs and companies with tens of billions of dollars. The next layer that's up for grabs is building tools that give users access to base models in ways the labs aren’t incentivized to provide, and building on top of them at the application layer. In the best case, a privacy layer will be a tool for companies and users who want to interact with models without sharing sensitive data. In the worst case, it will enable access for people the powers that be don’t want using these models. We’ve seemingly forgotten that privacy is a human right as it’s been eroded in the name of safety. Hopefully, in this next era of computing, we can bring it back.
