The two most obvious historical antecedents to the Wayback Machine are search engines and libraries.
The Wayback Machine works similarly to a search engine because you enter a query (in this case, a URL) into a search bar and results are presented to you. However, what sets the Wayback Machine apart from Google is how its crawls are conducted; they are collaborative, unique, and decentralized – much like a library.
As Kalev Leetaru said in his 2016 article for Forbes, “The Archive has essentially taken the traditional model of a library archive and brought it into the digital era, rather than take the model of a search engine and add a preservation component to it.”
I agree with Leetaru’s stance that a library provides a superior understanding of the historical underpinnings of the Wayback Machine.
Leetaru says, “as web archives transition from being simple “as-is” preservation and retrieval sites towards being our only records of society’s online existence and powering an ever-growing fraction of scholarly research.” This illustrates how libraries are transitioning away from brick and mortar locations into digital archives like the Wayback Machine. Nowadays, most research doesn’t begin on the library shelves. Instead, they start on the web. The Wayback Machine operates as a digital recordkeeper for this online research.
Collaboration is a key functionality of libraries. No single library can house all of the books ever written, but there’s the hope that by joining together through programs like Davidson’s own ILL system, that all of the books ever written will be available to the public. The Wayback Machine also relies on a network of partner crawlers, many of them coming actual libraries, to create its archive.
Another shortcoming of brick-and-mortar libraries that the Wayback Machine faced and overcame was limited storage. If storage is not infinite, you must choose which books to shelve and which websites to archive. In the beginning of the Wayback Machine, the creators were anxious about making the right choices. “We don’t know what the right things are to be collecting,” founder Brewster Kahle admitted back in 2001. “By making this collection available, we’re hoping to find out what we should be collecting to create a library that is of enduring value.”
Nowadays, however, storage limitations are not an impediment to the Wayback Machine thanks to technological innovations.
In an article by Heather Green from Bloomber Businessweek in 2002, the Wayback Machine is continuously compared to a library. Interestingly, the only way that I was able to find access to this article was through the Wayback Machine; it no longer exists on the web.
Even back in 2002, the Wayback Machine was already eclipsing all of the standing libaries. It had over five times the amount of information stored in the Library of Congress, thanks to Kahle’s internet software company called Alexa (perhaps named after the Library of Alexandria, which Kahle explained had a charter to have a copy of every book in the world).
Green makes comparisons to libraries beyond just to illustrate the magnitude of the Wayback Machine. It was also a powerful analogy to explain the copyright restrictions the Wayback Machine was becoming entangled in.
“The American Library Assn. is struggling with how libraries can maintain their traditional role if they have to pay for each use or access of digital content,” says Green. At the same time, the Wayback Machine was struggling to gain permission to archive things that were otherwise unavailable to the public.
Even the founder of the Wayback Machine seemed to realize that it was rooted in traditional libraries. The Internet Archive’s logo resembles a greek building, and their headquarters, which resemble the logo, were bought for that exact reason. According to a 2015 New Yorker article by Jill Lepore, “when Kahle started the Internet Archive, in 1996, in his attic, he gave everyone working with him a book called ‘The Vanished Library,’ about the burning of the Library of Alexandria. ‘The idea is to build the Library of Alexandria Two,’ he told me. (The Hellenism goes further: there’s a partial backup of the Internet Archive in Alexandria, Egypt.)” Kahle embraces the roots that his library has and is trying to improve on them. Instead of a library for the learned, he’s making a library for everyone.
As the icing on top of the cake, the Internet Archives headquarter’s cornerstone was laid in 1932; anything that was published before that date lies in public domain, out of reach from the copyright scruples Kahle found himself caught up in circa 2002.