The Immutable Project Gutenberg

The Immutable
Dec 13, 2023

“Learning is its own reward. Nothing I can say is better than that.”

- Michael Stern Hart, founder of Project Gutenberg

Today Dara is proud to announce the release of The Immutable Project Gutenberg Collection. It’s been a while coming but we made it, and in time for Christmas!

The story of Project Gutenberg, creators of the first eBook, starts in 1971 with a man named Michael S. Hart typing The Declaration of Independence into the mainframe computer at the University of Illinois.

52 years later, and with the help of Distributed Proofreaders, which “provides a web-based method to ease the conversion of Public Domain books into e-books”, around 70,000 books now form this extraordinary corpus of public domain knowledge.

The entire Project Gutenberg library, until today, has been hosted on a couple of university servers. To protect this information and to help in Michael’s mission to “save the world” by improving literacy and education (to smash the barriers between the information rich and the information poor), we are delighted to share with you The Immutable Gutenberg Collection.

Technical Documentation

The original Project Gutenberg library (PG) is about 2 terabytes (TB) in size.

Our processed PG strips the “attic” contents while keeping the main asset file for each book. If an HTML file is found, it becomes the key file; if not, the plain-text file is used; if neither is found (for instance, when a title exists only as MP3), the title is not indexed.
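The selection rule above can be sketched in a few lines of Python. This is only an illustration, not Dara's actual pipeline: the function name `pick_key_file`, the extension sets, and the alphabetical tie-break when a book has several candidate files are all assumptions.

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional

# Assumed extension sets; the real pipeline may recognise more variants.
HTML_EXTS = {".html", ".htm"}
TEXT_EXTS = {".txt"}


def pick_key_file(paths: Iterable[str]) -> Optional[str]:
    """Return the file to index for one book, or None if it is skipped.

    Preference order follows the article: HTML first, then plain text,
    otherwise the title is not indexed.
    """
    html, text = [], []
    for path in paths:
        ext = PurePosixPath(path).suffix.lower()
        if ext in HTML_EXTS:
            html.append(path)
        elif ext in TEXT_EXTS:
            text.append(path)
    if html:
        return sorted(html)[0]  # tie-break is an assumption
    if text:
        return sorted(text)[0]
    return None  # e.g. an audio-only title
```

A title that ships only an `.mp3` file returns `None` here, which corresponds to "not indexed" in the description above.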

Out of all the PG titles, 69,055 books were processed.

Every book was cleaned up, restructured in line with the ZWI standard, and then signed with DARA’s private key. The signature is verifiable via DID:WEB and DID:PSQR.

Title numbers are retained across the library, so book ID 1 in PG is 1.zwi in DARA. After the cleaning described above, the whole library comes to only ~112GB in total!
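As a quick sanity check on these figures, the sketch below maps a PG book ID to its ZWI file name and works out the size reduction. It assumes decimal units (1 TB = 1000 GB) and treats the quoted sizes as round numbers; the function names are illustrative only.

```python
ORIGINAL_TB = 2        # "about 2 terabytes" (approximate)
PROCESSED_GB = 112     # "~112GB" (approximate)
BOOK_COUNT = 69_055    # processed titles


def zwi_name(book_id: int) -> str:
    """PG title numbers are retained, so ID 1 becomes '1.zwi'."""
    return f"{book_id}.zwi"


def size_ratio(processed_gb: float, original_tb: float) -> float:
    """Processed size as a fraction of the original (decimal units assumed)."""
    return processed_gb / (original_tb * 1000)


def average_mb_per_book(processed_gb: float, books: int) -> float:
    """Mean ZWI size in megabytes (decimal units assumed)."""
    return processed_gb * 1000 / books


print(zwi_name(1))                                          # 1.zwi
print(f"{size_ratio(PROCESSED_GB, ORIGINAL_TB):.1%}")       # 5.6%
print(f"{average_mb_per_book(PROCESSED_GB, BOOK_COUNT):.2f} MB per book")
```

Under these assumptions the processed library is roughly one twentieth of the original, at a little over 1.6 MB per book on average.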

All the books are stored inside an MFS directory on IPFS. This is published to IPNS at k51qzi5uqu5dh0negpidss3lsz59ngevump5kysbcl0g07hd4ig1e102sbhtzq… For instance, https://cloudflare-ipfs.com/ipns/k51qzi5uqu5dh0negpidss3lsz59ngevump5kysbcl0g07hd4ig1e102sbhtzq/ will load the PG library just as https://cloudflare-ipfs.com/ipfs/QmW6UZQ946t1sJSChPuA6fuMNL3qwwDtxhFKmw6ydiamZu/ would (the first URL uses /ipns/ and the second uses /ipfs/).

You can download the ZWI files by changing “opfs” to “ipfs” in the URL of a book that you’re viewing, or simply by adding “&download” to the end of the URL.

For instance:
https://opfs.dara.global/ipfs/QmZEZcQkHEfAP9G9qw63ZqxKzvaLdEvJ8d4mboF9BVHNkf/55.zwi will render book ID 55 on the fly

https://ipfs.dara.global/ipfs/QmZEZcQkHEfAP9G9qw63ZqxKzvaLdEvJ8d4mboF9BVHNkf/55.zwi will download book ID 55 as a ZWI
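The “opfs” to “ipfs” swap shown in the two URLs above can be scripted. The sketch below is a hypothetical helper, not part of any Dara tooling: it rewrites only the host name, because the path also contains “/ipfs/” and a blind string replacement could be applied in the wrong place. The directory CID is the one from the example URLs.

```python
from urllib.parse import urlsplit, urlunsplit

# Directory CID taken from the example URLs in this article.
EXAMPLE_DIR_CID = "QmZEZcQkHEfAP9G9qw63ZqxKzvaLdEvJ8d4mboF9BVHNkf"


def view_url(book_id: int, dir_cid: str = EXAMPLE_DIR_CID) -> str:
    """URL that renders a book on the fly through the OPFS gateway."""
    return f"https://opfs.dara.global/ipfs/{dir_cid}/{book_id}.zwi"


def to_download_url(url: str) -> str:
    """Swap the 'opfs.' host prefix for 'ipfs.' and leave the rest untouched."""
    parts = urlsplit(url)
    host = parts.netloc
    if host.startswith("opfs."):
        host = "ipfs." + host[len("opfs."):]
    return urlunsplit(parts._replace(netloc=host))


print(to_download_url(view_url(55)))
```

A URL that does not start with the “opfs.” host prefix is returned unchanged.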

“You can imagine there’s quite a bit more processing to drop down from terabytes to gigabytes.” - BGNLouie, Dara developer and co-founder

Blockchain?

The MFS (Mutable File System) directory in which the books are contained has a unique CID (Content Identifier), or hash, which will be stored on-chain.

New PG books will be added to this directory periodically and a new hash subsequently generated. This new hash will then also be added on-chain. In this way we will create a sequence of chronologically ordered, tamper-proof and immutable snapshots of our PG library as it evolves over time.
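To make the idea of chronologically ordered, tamper-evident snapshots concrete, here is a small local simulation in Python. It is not Dara's on-chain implementation and does not talk to any blockchain: the `Snapshot` record, the SHA-256 linking, and the timestamps are all assumptions chosen for illustration. Each entry stores a directory CID plus the digest of the previous entry, so altering an old entry breaks every later link.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List

GENESIS = "0" * 64  # placeholder digest before the first snapshot


@dataclass(frozen=True)
class Snapshot:
    index: int
    cid: str          # CID of the MFS directory at this point in time
    timestamp: int    # Unix seconds (hypothetical values in the usage below)
    prev_digest: str  # digest of the previous snapshot, or GENESIS

    def digest(self) -> str:
        payload = json.dumps(
            [self.index, self.cid, self.timestamp, self.prev_digest]
        ).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def append_snapshot(log: List[Snapshot], cid: str, timestamp: int) -> List[Snapshot]:
    """Return a new log with one more snapshot linked to the previous one."""
    prev = log[-1].digest() if log else GENESIS
    return log + [Snapshot(len(log), cid, timestamp, prev)]


def verify(log: List[Snapshot]) -> bool:
    """Check index order, non-decreasing timestamps, and the digest chain."""
    prev, last_ts = GENESIS, None
    for i, snap in enumerate(log):
        if snap.index != i or snap.prev_digest != prev:
            return False
        if last_ts is not None and snap.timestamp < last_ts:
            return False
        prev, last_ts = snap.digest(), snap.timestamp
    return True


# Usage: two snapshots using CIDs that appear in this article.
log = append_snapshot([], "QmW6UZQ946t1sJSChPuA6fuMNL3qwwDtxhFKmw6ydiamZu", 1702425600)
log = append_snapshot(log, "QmZEZcQkHEfAP9G9qw63ZqxKzvaLdEvJ8d4mboF9BVHNkf", 1702512000)
print(verify(log))
```

On a real chain the ledger itself provides the ordering and immutability; the digest links here only stand in for that property.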

Recording a signature for each of the 69,055 eBooks individually on-chain is expensive. We are exploring a way to do this cost-effectively and will share an update soon.

Wrapping Up

There haven’t been many revolutions or rethinks in the 2,600-year history of libraries.

While great at sharing and disseminating information, the Internet is ephemeral and ultimately much more likely to lose information than a book, clay tablet, or even papyrus scroll.

With the sun-setting of the printing industry, it is more important than ever to archive valuable knowledge, such as this library, to as many computers as possible using the most censorship-resistant and data-redundant technology available to us today. In this way we hope we have helped Michael to “save the world”.

Please think of our immutable collection as a Christmas present you can share with all those around you. Just point them to gutenberg.dara.global!

It is complete, concise (about 5% of the size of the original) and mobile-friendly. It is both updatable (mutable) and immutable (snapshots). It leverages IPFS, IPNS, Dara’s OPFS, blockchain records, and ZWI files. In many ways this collection is a culmination of what we’ve learned from all our work to date.

Read online, download, and share this essential collection of forever books!

Thanks to all those at the PGLAF (The Project Gutenberg Literary Archive Foundation), Greg Newby, all those at the KSF (Knowledge Standards Foundation), Larry Sanger, Tim Chambers and Sergey Chekanov.
