E2E Encrypted Collaborative Document Editing

Client: ChainSafe Files (files.chainsafe.io)


Overview

The Files team wants to look into E2E encrypted real-time document editing and collaboration that they can use with their implementation. The contents of the document should be private to the collaborators. Our first step would be to look for tools that already implement at least a basic level of the required functionality on top of which we can build our solution.

Existing tools

  1. CryptPad - open-source and exactly what we are looking for. It is available under the AGPL license, so it can be used with Files. It relies on a central server for authentication and key exchange.
  2. Skiff - a startup doing exactly what we want to do with Files. However, it is closed source, not decentralized, and a standalone product, so it cannot be used in Files.
  3. OnlyOffice - has a feature called Private Rooms for E2E encrypted real-time document editing. The community edition can be deployed on a private cloud, and NextCloud has built-in support for it. The downside I can see right now is that Private Rooms are documented as working only with the desktop app, so it is not clear whether this meets the Files team's requirements.

CryptPad integration into Files: Specification

CryptPad is a suite of office applications and a cloud drive in which everything is end-to-end encrypted. The official instance is available here and the documentation is here. It is open-source and available under the AGPL license. To use CryptPad directly, we can host an instance ourselves in either development mode or production mode.

We want to integrate this into Files' existing architecture. For this, we will need to pull out the parts which we can use in conjunction with Files.

First, CryptPad uses a different encryption algorithm (XSalsa20-Poly1305) than Files (AES). The details of which technique is used for what purpose are available here. Because Files already has its own way of encrypting and decrypting documents stored on IPFS, we want to reuse that scheme: if we decide to change the encryption algorithm in the future, we then need to change it in only one place and the change is reflected across the entire system. Therefore, we do not need the encryption mechanism from CryptPad. This implies that the Files UI will also have to take care of the encryption/decryption logic for document collaboration.

Second, Files has its way of managing users and we are assuming here that the same would be used for the document collaboration feature as well. So we would need to remove that part from CryptPad as well.

Third, as discussed with Michael, we would be building our UI using DraftJS and hence would also not need the UI components from CryptPad.

Fourth, storage is handled a bit differently in CryptPad. Instead of a traditional database, it uses the file system, as explained here. As we would be using IPFS to store the encrypted document, we can also ignore the database part of CryptPad.

In conclusion, the stripped-down version of CryptPad that we need is the part that works with encrypted content to maintain the latest version of a document while multiple users are collaborating on it. In the first iteration, we will keep it simple: use only the real-time collaborative editor algorithm and add our own implementation logic on top of it.

Overview of CryptPad's architecture

CryptPad's code is split into 5 distinct levels, 2 server side and 3 client side.

  • Server side

    • The "server" which contains the code launched in the main process. It manages the websocket connections and all calls to the server go through this level.
    • The "workers" that manage all database connections and scripts that require more CPU resources. The main "server" calls them when it receives certain commands from users. They are launched in separate sub-processes in order to be able to make the most of the available CPU cores.
  • Client side

    • The base level, called "outer" in the code. This level is loaded with the "unsafe" URL (the one visible in the browser address bar) because it has access to sensitive data, including user account encryption keys.
    • The iframe containing the user interface, called "inner" in the code, is launched as a child of "outer" and uses the "safe" URL. This iframe represents the entire screen visible to the user. No interface element is outside of the iframe. It only has access to the data that needs to be displayed on the screen.
    • The upper level, called "worker", which manages the connection to the server and keeps all the user account data in memory. This level is loaded in a SharedWorker when the browser supports it (Firefox, Chrome, Edge) with the "unsafe" URL, which means that all the browser's tabs loaded on this CryptPad instance will have access to the same worker. It allows us to load the account data only once for all open tabs and to use only one Websocket connection.

To give an idea of how CryptPad is structured, please look at the following diagram:

As mentioned in the previous section, all we need is the real-time collaborative editor algorithm that can work with encrypted content. CryptPad has neatly pulled out the algorithm and called it ChainPad. This is exactly what we need.

What is ChainPad?

ChainPad is a real-time collaborative editor algorithm based on Nakamoto blockchains. It is the basis of CryptPad's "real-time engine". Several users can open the same document, all starting from the same fixed starting point (which can be empty). In theory ChainPad works exclusively with documents in "text" format, but in practice solutions exist for other JavaScript data formats.

Each user initializes a local ChainPad "object" and fills it with the patch history of the database from a given starting point. The patches contain a set of simple operations to perform on the document (e.g. "remove X characters and add 'abc' at the Y position") as well as a "parent". Storing a parent identifier in the patch allows for the document to be re-built with the patches in the correct order. Applying the full set of patches from a starting point (usually an empty string) produces the latest version of the document.

Each patch references the SHA-256 Hash of the previous state of the document. The availability of these hashes makes it possible for clients to independently verify the authenticity of a patch, and validate that a patch can successfully be applied to a document at a particular point within its history. Patches that are determined to be invalid are rejected.
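The hash check described above can be sketched as follows. This is a minimal illustration rather than ChainPad's actual code: the `parentHash` field name and both helper functions are assumptions made for the example, and only Node's built-in `crypto` module is used.

```javascript
const crypto = require('crypto');

// Hex-encoded SHA-256 of a document state (plain text).
function sha256Hex(text) {
  return crypto.createHash('sha256').update(text, 'utf8').digest('hex');
}

// Hypothetical check: a patch is only applicable if it references the hash
// of the document state we currently hold. `parentHash` is an assumed name.
function isPatchApplicable(currentDoc, patch) {
  return patch.parentHash === sha256Hex(currentDoc);
}

// Usage: a patch built on top of "abc" is rejected by a client holding "abd".
const patch = { parentHash: sha256Hex('abc'), operations: [] };
console.log(isPatchApplicable('abc', patch)); // true
console.log(isPatchApplicable('abd', patch)); // false
```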

At present, ChainPad can only be used for determining consensus on text documents. Its patches consist of Operations, each of which consists of:

  1. an offset from the start of the document where changes are to be applied
  2. a number of characters to remove
  3. a string of characters to insert
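To make the three fields concrete, here is a small sketch of applying one such operation to a string. The field names `offset`, `toRemove` and `toInsert` and the helper itself are chosen for illustration and are not taken from the ChainPad source.

```javascript
// Hypothetical helper: applies a single operation to a plain-text document.
function applyOperation(doc, op) {
  if (op.offset < 0 || op.toRemove < 0 || op.offset + op.toRemove > doc.length) {
    throw new RangeError('operation does not fit the document');
  }
  return doc.slice(0, op.offset) + op.toInsert + doc.slice(op.offset + op.toRemove);
}

// Usage: remove 5 characters at offset 6 and insert "Files" there.
const updated = applyOperation('Hello world', { offset: 6, toRemove: 5, toInsert: 'Files' });
console.log(updated); // "Hello Files"
```

An operation that does not fit the current text is rejected here with an exception; how ChainPad itself reports invalid patches is not covered by this sketch.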

Checkpoints

To avoid having to synchronize the complete history of a document since its creation each time it is loaded, ChainPad uses a system of checkpoints. For every 50 patches stored in the database, the user creating the next patch will in fact create a special patch called a checkpoint. Such a patch consists of a single operation that deletes the entire document and re-inserts it at the same time. Checkpoints are special in that they can be used as starting points for the system. They also carry a special marking, added after encryption, that allows the server to identify them as such.

When a user loads an existing document and asks for its history, the server sends only the patches starting from the penultimate checkpoint (in theory the last checkpoint is sufficient, but the penultimate one makes it easier to solve some problems). This system saves considerable loading time for documents that are heavily used over long periods of time. It also allows document owners to delete old history and keep only the messages from the last two checkpoints in the database.
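A rough sketch of both ideas, under stated assumptions: the `isCheckpoint` flag stands in for the server-visible marking mentioned above, the operation field names are illustrative, and returning the full history when fewer than two checkpoints exist is a choice made for this example.

```javascript
// A checkpoint is a single operation that removes the whole document and
// re-inserts it unchanged (field names are illustrative).
function makeCheckpointOperation(doc) {
  return { offset: 0, toRemove: doc.length, toInsert: doc };
}

// Returns the stored messages starting from the penultimate checkpoint.
// Falls back to the full history if there are fewer than two checkpoints.
function historySincePenultimateCheckpoint(messages) {
  const checkpointIndexes = [];
  messages.forEach(function (message, index) {
    if (message.isCheckpoint) { checkpointIndexes.push(index); }
  });
  if (checkpointIndexes.length < 2) { return messages.slice(); }
  return messages.slice(checkpointIndexes[checkpointIndexes.length - 2]);
}
```

With checkpoints at positions 1, 3 and 5 of a seven-message history, the function returns the four messages from position 3 onwards.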

Setting up the ChainPad server

A list of all helper repositories is available here.

ChainPad server is based on the Netflux specification for enabling p2p or client-server connectivity using websockets for a WebGroup (a group trying to interact with each other for a specific purpose e.g. editing a text document together). The ChainPad server is different from the ChainPad repository (used for client side).

As we already have a server instance running for the Files API, we suggest using the client-server architecture here rather than p2p. To make things easier and to have a reference, we can set up the ChainPad server in a similar manner to CryptPad. The following steps are a stripped-down version and give a rough idea of how things will connect. The level of detail should be enough to start the implementation, but of course not everything is covered. The steps focus mostly on the chainpad-server repo and not on the general dependencies, which I assume will evolve during the implementation phase. For reference, the code that I referred to is here, particularly api.js and historyKeeper.js.

Three different identifiers are required to differentiate users, channels and documents: a userId, a channelId and a documentId. Each channel should correspond to exactly one document opened for collaboration. This depends on the user and document management that Files already has, so it is left to the team to decide how to set it up. Please note that userId and documentId are persistent, whereas channelId is not.
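One possible way to keep the "one channel per open document" rule is a small in-memory registry that hands out a fresh, non-persistent channelId whenever a document is opened for collaboration. Everything in this sketch (the `ChannelRegistry` class, its methods, and the random 16-byte id format) is hypothetical and only illustrates the relationship between the identifiers.

```javascript
const crypto = require('crypto');

// Hypothetical registry: documentId (persistent) -> channelId (ephemeral).
class ChannelRegistry {
  constructor() {
    this.channelByDocument = new Map();
  }

  // Returns the channel of an already open document, or creates a new one.
  open(documentId) {
    let channelId = this.channelByDocument.get(documentId);
    if (!channelId) {
      channelId = crypto.randomBytes(16).toString('hex');
      this.channelByDocument.set(documentId, channelId);
    }
    return channelId;
  }

  // Forgets the channel once the last collaborator has left.
  close(documentId) {
    this.channelByDocument.delete(documentId);
  }
}
```

Because the mapping lives only in memory, a channelId disappears when the document is closed or the server restarts, while the documentId stays the same.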

Step 1: Create an Express or any other framework based server instance

This step can be very generic; it mostly depends on how Files wants to structure the server with middleware and configs.

Step 2: Create a new chainpad-server instance and bind event handlers with historyKeeper

chainpad-server uses Netflux for creating websocket connections.

historyKeeper is a fake user managed by the server for data storage. For every channel created, this special user joins it by default. When a user joins a room (i.e. they open a document), their browser sends a direct message to the history keeper to ask for the history of the document. The history keeper sends the part of the history necessary to rebuild the last version of the document. Once synchronized, the user receives all new changes in real time and can send their own changes. The code for the history keeper is available here.

const WebSocketServer = require('ws').Server; // WebSocket dependency
const NetfluxSrv = require('chainpad-server');

// Env holds the environment config created in Step 1. It is assumed to
// expose the running HTTP server as Env.httpServer.
require('./historyKeeper.js').create(Env, function (err, historyKeeper) {
    if (err) { throw err; }

    // Spawn the WebSocket server and attach the Netflux event handlers.
    NetfluxSrv.create(new WebSocketServer({ server: Env.httpServer }))
        .on('channelClose', historyKeeper.channelClose)
        .on('channelMessage', historyKeeper.channelMessage)
        .on('channelOpen', historyKeeper.channelOpen)
        .on('sessionClose', historyKeeper.sessionClose)
        .on('error', function (error, label, info) {
            // error handling
        })
        // Direct messages addressed to the history keeper's id go to its handler.
        .register(historyKeeper.id, historyKeeper.directMessage);
});

Step 3: Use the event handlers to watch for incoming messages on web socket

Once the server is running and Netflux's event handlers are bound to historyKeeper, the workers should take care of keeping the latest version of the document. The datastore and workers are available in the /lib folder.

Setting up the ChainPad client

First, a network connection has to be established to the websocket on the server, and then an instance of ChainPad has to be created to handle patches.

To open the network connection, the netflux-websocket library can be used, which is also what CryptPad uses. An example connection would look like the following:

const channelId = "123";
Netflux.connect('ws://localhost:3000/cryptpad_websocket').then(function (network) {
    // on success: join the channel that belongs to the document
    network.join(channelId).then(function (channel) {
        // on success

        // listen for new messages
        channel.on('message', function (message, senderNetfluxId) {
            console.log('Message received: ' + message);
        });
        // send a message to everyone in the channel
        channel.bcast("Hello world!");
    }, function (error) {
        // on error (joining the channel failed)
    });
}, function (error) {
    // on error (connecting to the server failed)
});

This "network" contains the list of channels joined by the user, as well as the list of members present in each channel. It allows us to perform all the operations allowed by the protocol:

  • Join a channel: (Promise) network.join(channelId) (provides a channel object)
  • Send a private message: (Promise) network.sendTo(netfluxId, message)
  • Get the channels list: (Array) network.webChannels
  • Listen to events in a network: network.on('message', handler) (events: message, disconnect)

And for each channel obtained from "network.join":

  • Send a message: (Promise) channel.bcast(message)
  • Leave a channel: channel.leave(reason)
  • Listen to events: channel.on('message', handler) (events: message, join, leave)
  • Get the channelId: (String) channel.id
  • Get the members list: (Array) channel.members

The second step is to spawn a ChainPad instance and bind it to the transport layer and the user interface by using the repository here.

Because the readme covers how to set up the instance and all the available functions, I highly recommend that the developer read it. For the sake of continuity, the following shows how to instantiate ChainPad and bind it to the data transport layer.

var chainpad = ChainPad.create(config);

// Binding the ChainPad session to the data transport.
// Because we have used netflux-websocket, continuing from the previous code:
Netflux.connect('ws://localhost:3000/cryptpad_websocket').then(function (network) {
    network.join(channelId).then(function (channel) {
        // Incoming: hand every message received on the channel to ChainPad.
        channel.on('message', function (message, senderNetfluxId) {
            chainpad.message(message);
        });

        // Outgoing: onMessage in ChainPad fires when ChainPad wants to SEND
        // a message. The name might cause confusion, so watch out!
        chainpad.onMessage(function (message, cb) {
            // The callback should only be called once we are sure the server
            // has the patch, so wait for the broadcast to succeed.
            channel.bcast(message).then(function () {
                cb();
            }, function (error) {
                // on error (the patch was not delivered)
            });
        });

        // Start ChainPad only after the transport is bound.
        chainpad.start();
    }, function (error) {
        // on error (joining the channel failed)
    });
}, function (error) {
    // on error (connecting to the server failed)
});

For binding the instance to the user interface, please refer here. The text that is replaced through patches in the document being edited can be sent to the server in encrypted form. That is where the Files implementation of the hybrid encryption scheme comes into play.
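As an illustration of encrypting a patch before it is broadcast, here is a minimal sketch using Node's built-in `crypto` module. It assumes AES-256-GCM with a shared 32-byte document key; the actual mode, key handling and message layout of the Files scheme are not specified in this document, so `encryptPatch`, `decryptPatch` and the "iv + tag + ciphertext" layout are assumptions.

```javascript
const crypto = require('crypto');

// Encrypts a serialized patch. Output layout (base64): 12-byte IV,
// 16-byte auth tag, then the ciphertext. The layout is an assumption.
function encryptPatch(key, plaintext) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const body = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), body]).toString('base64');
}

// Reverses encryptPatch. Throws if the message was tampered with or the key is wrong.
function decryptPatch(key, encoded) {
  const raw = Buffer.from(encoded, 'base64');
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, raw.subarray(0, 12));
  decipher.setAuthTag(raw.subarray(12, 28));
  return Buffer.concat([decipher.update(raw.subarray(28)), decipher.final()]).toString('utf8');
}
```

In the binding code, the outgoing handler would then broadcast `encryptPatch(key, message)` and the incoming handler would pass `decryptPatch(key, message)` to ChainPad, so the server only ever sees ciphertext.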

To distribute the shared keys for a document, the same methodology that Files uses for sharing files can be used. Alternatively, as in CryptPad or Mega, the decryption key can be placed after the # in the link that is shared with a collaborator. Browsers do not send the part after the # to the server, so the key stays on the client as long as the application itself never transmits it. Signatures can be embedded in the same way.
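The link-based option can be sketched with the standard `URL` API. The base URL, the `doc` query parameter and both helper names are made up for this example; the key is assumed to be base64url-encoded so that it needs no escaping in the fragment.

```javascript
// Builds a share link whose fragment carries the document key.
function buildShareLink(baseUrl, documentId, keyBase64Url) {
  const url = new URL(baseUrl);
  url.searchParams.set('doc', documentId);
  url.hash = keyBase64Url; // the fragment stays in the browser
  return url.toString();
}

// Reads the key back on the client (drops the leading '#').
function readKeyFromLink(link) {
  return new URL(link).hash.slice(1);
}

const link = buildShareLink('https://files.example/edit', 'doc-1', 'abc_DEF-123');
console.log(link); // https://files.example/edit?doc=doc-1#abc_DEF-123
console.log(readKeyFromLink(link)); // abc_DEF-123
```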

Diffing logic

Patches are applied whenever there is a change in the document. Depending on how the UI renders the document for editing, we can either apply the patches to the encrypted text only and keep the HTML tags as plain text, or apply them to the encrypted version of the entire change, i.e. including the HTML tags.
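Whichever representation is chosen, a change has to be turned into an operation before it becomes a patch. The sketch below derives one operation from the old and new text by trimming the common prefix and suffix. It is a simple text-level illustration, not the algorithm used by ChainPad or DiffDOM, and the field names are illustrative.

```javascript
// Computes a single (offset, toRemove, toInsert) operation that turns
// oldText into newText by skipping their common prefix and suffix.
function diffToOperation(oldText, newText) {
  let start = 0;
  const maxStart = Math.min(oldText.length, newText.length);
  while (start < maxStart && oldText[start] === newText[start]) { start++; }

  let endOld = oldText.length;
  let endNew = newText.length;
  while (endOld > start && endNew > start && oldText[endOld - 1] === newText[endNew - 1]) {
    endOld--;
    endNew--;
  }
  return { offset: start, toRemove: endOld - start, toInsert: newText.slice(start, endNew) };
}

console.log(diffToOperation('Hello world', 'Hello Files'));
// { offset: 6, toRemove: 5, toInsert: 'Files' }
```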

The diffing logic can be achieved by using a library like DiffDOM. CryptPad also uses the same library.

Things to look out for

  • Because we want real-time document collaboration, using IPFS directly as the document store while editing might not be ideal because of latency. I would suggest storing the encrypted file locally while editing and committing only the final version, or periodic checkpoints, to IPFS.
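One way to picture this suggestion: keep the newest encrypted document locally and call a commit function only every N patches or on an explicit flush. The `CheckpointCommitter` class is hypothetical, and `commit` stands for whatever upload function Files provides (for example an IPFS add); no IPFS API is assumed here.

```javascript
// Hypothetical helper: buffers the newest encrypted document and commits it
// through the supplied `commit` function every `every` recorded patches.
class CheckpointCommitter {
  constructor(commit, every = 50) {
    this.commit = commit;
    this.every = every;
    this.pending = 0;
    this.latest = null;
  }

  // Call after each applied patch with the current encrypted document.
  record(encryptedDoc) {
    this.latest = encryptedDoc;
    this.pending += 1;
    return this.pending >= this.every ? this.flush() : null;
  }

  // Commits immediately, e.g. when the last collaborator closes the document.
  flush() {
    if (this.latest === null || this.pending === 0) { return null; }
    this.pending = 0;
    return this.commit(this.latest);
  }
}
```

The default of 50 mirrors ChainPad's checkpoint interval described earlier, but the right value for IPFS commits is a tuning decision for the team.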

Conclusion

This specification gives an overview of how CryptPad and ChainPad are structured and which parts we can integrate into the Files architecture to enable E2E encrypted document collaboration. The implementation-level details, such as which dependencies to use and how to manage the data store, users and authentication, are left up to the team, as they would know better.