I’ve hinted, in last week’s post, about the development of a feature that could help Minds.com evolve into a decentralised social network and bring P2P into the mainstream, thereby solving the growing privacy and censorship concerns that are associated with a centralised social network. This feature is based on an application layer protocol known as ‘DAT’. There are reasons to believe it’s likely to succeed where previous ideas failed: Since DAT works entirely at the application layer, and is implemented using Node.js, there’s very little effort or learning curve involved for developers and users of DAT applications. Web applications can be extended to support it, if the demand is there, using already published libraries that are extensively documented.
For those who aren’t developers, there is a working browser that anyone, without technical skills or knowledge, can use to browse and publish sites on the DAT Web.
What is DAT?
DAT started life with the scientific community, which had a need for a more effective method of distributing, tracking and versioning data. In the conventional Web, data objects are moved, Web pages are deleted and domains expire – this is referred to as ‘content drift’. We’ve all come across an example of this in the form of ‘dead links’. When using a hyperlink to reference a data object or Web page, there is no guarantee that link would be valid at some point in the future. DAT was proposed as a solution to this.
But what does this have to do with censorship and privacy, you’re probably asking? The answer to this question is in how data is distrubuted, discovered and encrypted.
Merkle Trees, Hashing Algorithms and Public Key Encryption
The DAT protocol is essentially a real-world implementation of the Merkle Tree data structure, with the BLAKE2b and Ed25519 algorithms for identification, encryption and verification (other docs state that SHA256 is used as the hashing algorithm). It’s not necessary to understand this concept in order to develop DAT applications, since there are already libraries for implementing this, but for the curious, I reccommend reading Tara Vancil’s explanation first before moving on to the whitepaper.
An important point Vancil made was DAT is about the addressing and discovery of data objects, not the addressing of servers hosting those objects. Data objects are not bound to IP addresses or domains either. Each data object has its own address, and that address is determined by its cryptographic hash value – a file’s hash digest will be static, regardless of where it’s hosted. This is important, because we’re accustomed to thinking of the Internet/Web in terms of the client/server model, and proposed solutions for privacy and anti-censorship typically try to deal with the problem of decentralised host discovery in a peer-to-peer (P2P) network.
Some form of data structure is required to make the data objects addressable and to enable their integrity to be verified. A DAT peer-to-peer network uses Merkle Trees for this, where all ‘leaves’ and nodes contain the hash values of the data objects they represent, and the root node contains the hash digest of all its child nodes. In other words, as the whitepaper puts it, ‘each non-leaf node is the hash of all child nodes‘.
Not only does this provide a way of verifying the integrity of the data objects – the root node’s digest will change if there’s any modification to a data object represented in the tree – it provides the means to an efficient lookup system, as the root hash digest becomes the identifier for a dataset.
Obviously, this means clients would need to fetch the root node’s value for a given dataset from a trusted source, which might be one of many designated lookup peers on the network. If the client wanted a given data object, it wouldn’t need to fetch everything referenced under the root node, but just the root node value, the parent node of the requested objects, and the hash values of the other parent nodes.
Addressing, References and Security
Now, let’s get into the more specific aspects of how Merkle Trees are implemented in the context of DAT. All the ‘leaf’ nodes in the DAT Merkle Tree contain a BLAKE2b or SHA256 (depending on the docs being read) hash digest of the referenced object. All parent nodes contain the hash digest and a cryptographic signature. The signature is generated by creating Ed25519 keys for each parent node and using them to sign the hash digest.
When sharing a locally-created site in the Beaker browser, or viewing one already shared on the network, you might notice the URI following ‘DAT://’ is a long hexadecimal string. This is actually the Ed25519 public key of the archive containing the referenced object being shared, and it’s used to encrypt and decrypt the content. The corresponding private key is required to write changes to the DAT archive.
The public key is, in turn, hashed to generate a discovery key, which is used to find the data objects. This ensures no third-party can determine the public key of a private data object that hasn’t been publicly shared.
The Beaker browser looks very much like the standard Firefox browser on the surface, and it can be used to browse both DAT:// and HTTP:// addresses. As we can see, DAT sites are rendered just as well as those on the conventional Web. The only problem is that, as with Tor and I2P, sites are hosted on machines that aren’t online 24/7, so many of them are unreachable at a given time.
From the Welcome dialogue, we can get straight to setting up a personal Web site dor publishing on the DAT Web. A default index page, script.js and styles.css are included ready for us to customise. In addition, Beaker allows us to share the contents of an arbitrary directory on the machine it’s running on.
Previously-created sites are available under the ‘Library‘ tab in the main menu. Sites that aleady exist will be listed under the ‘Your archives‘ section, and can be modified and/or published.
What happens to a published site when the local machine is offline? There is a method to keep a site accessible, by somehow getting another person or machine to ‘seed’ the data. This is a short-hand way of saying another person could fetch a copy of the site and re-share it over the network. Seeding happens automatically as a user is actively browsing a DAT site.
The Node.js Modules
Several Node.js modules provide libraries that developers can use to implement DAT features in their applications.
- hypercore: A component for creating and appending feeds, and verifying the integrity of data objects. The API exposes a number of methods under the ‘feed’ namespace for reading, writing and querying feeds.
- hyperdrive: This is a distributed filesystem for P2P. One of the design principles is to reproduce, as closely as possible, the APIs as the core Node.js filesystem component, thereby making it transparent to application developers. This module enables a local file system to be replicated on other machines.
- dat-node: A high-level component that developers could use to bring together other DAT modules and build DAT-capable applications.
- hyperdiscovery: Module for network discovery and joining. Running two instances of a hyperdiscovery module will result in a given archive key being replicated.
- dat-storage: The DAT storage provider. Used for storing secret keys, among other things, using the hyperdrive filesystem.
Node Discovery in Practice
Two components are used for this: discovery-channel and discovery-swarm. The discovery-channel component searches BitTorrent, DNS and Multicast DNS servers for peers, and advertises the address/port of the local node. Therefore, it is based on the bittorrent-dht and dns-discovery modules. Using discovery-channel, the client can join channels, terminate sessions, call handlers on session initiation and fetch a list of relevant channels. The network-swarm module uses discovery-channel to connect with DAT peers and control the session.