Inside the Internet Archive

by Howard Trace

With the recent news of the National Emergency Library there has been more focus on the Internet Archive, but do you really know what they have made available since launching in 1996? As of late June 2020, they provide access to over 45 petabytes of data that includes:

•        330 billion web pages

•        20 million books and texts

•        4.5 million audio recordings (including 180,000 live concerts)

•        4 million videos (including 1.6 million Television News programs)

•        3 million images

•        200,000 software programs

But did you also know that anyone with a free account can upload material to the Internet Archive? Does your library have a collection that could be made available to the public and that would complement the vast amount of material already available? Why reinvent the wheel with a separate system when you could add to a collection accessed by millions of people every day?


The first purpose of the Internet Archive was to preserve the web in a way no one else was. Through the Wayback Machine, more than 20 years of web history remain accessible much as they originally appeared. In addition, over 600 institutions are ensuring that their web history is preserved using the Archive-It service. If neither service has saved a web page you need, you can also request that a specific page be saved.
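For readers comfortable with a little scripting, checking for a saved copy and requesting a new capture can be sketched as below. This is a minimal illustration, not an official client: it only builds the request addresses, assuming the publicly documented Wayback availability endpoint and the Save Page Now address pattern. The function names are hypothetical and chosen for this example.

```python
from urllib.parse import urlencode

# Assumed public endpoints; check the Internet Archive's own
# documentation before relying on them in a production workflow.
WAYBACK_AVAILABILITY = "https://archive.org/wayback/available"
SAVE_PAGE_NOW = "https://web.archive.org/save/"


def availability_url(page_url, timestamp=None):
    """Build the address that asks whether a capture of page_url exists.

    timestamp is optional and uses the YYYYMMDD form, which asks for
    the capture closest to that date.
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return WAYBACK_AVAILABILITY + "?" + urlencode(params)


def save_request_url(page_url):
    """Build the address that asks the Wayback Machine to capture page_url."""
    return SAVE_PAGE_NOW + page_url


if __name__ == "__main__":
    # Hypothetical example page; replace with the page you care about.
    print(availability_url("example.com", "20200601"))
    print(save_request_url("https://example.com/"))
```

Opening the first address in a browser returns a short JSON description of the closest capture, if one exists. Opening the second asks for a new capture, which may be declined or delayed depending on the page and on current load.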

Books and Texts

In addition to freely available scanned books, there are some interesting materials that you might be surprised to find scanned and searchable online. One example is the microfilm collection, which includes over 174,000 items. As anyone who has used microfilm on a regular basis knows, the format's strength is preservation rather than accessibility. In this much more searchable form, there is a real possibility of finding a hidden gem in the material.

For those looking to debunk or support the latest conspiracy theory, look no further than the National Security Internet Archive. With over 2 million records from the FBI, CIA, NSA, DoD, Department of State, and many others, it won’t be long before you are lost down a rabbit hole of loose connections and sometimes heavily redacted documents.

One other collection of note is the genealogy material, because there can never be enough resources for locating that one relative that opens up the next branch of the family tree. In addition to the Census records that you would expect to find, there are also passenger lists from New York, Baltimore, and Philadelphia along with over 3,000 published family genealogies.


For those who prefer audio recordings, there are nearly 20,000 classic audiobooks in addition to over 300,000 radio programs. Material in multiple languages is available throughout the Internet Archive, but this is most apparent in the audio material: there are over 20,000 Russian audiobooks, and over 40 languages other than English are represented by more than 1,000 items each.


One of the major problems with archiving modern life and preserving our history is ensuring that we have the proper tools to access material in the future. While we have come a long way in adopting common formats, there is still a significant amount of proprietary data that needs very specific software to run. Some of the resources available include firmware and system ROMs, CD-ROM images, DOS files, and Linux distributions. Also available are classic PC and console games; some are playable within a web browser, while others offer only a demo version to try.

As noted above there are also video and photo collections available, but the Internet Archive would often be a secondary resource to the many other options currently available.  However, there may be unique materials not found in other resources so it never hurts to do a quick search to see what comes up.

When doing a search, if the results are not what you expected, try selecting the “text contents” option, since by default only item metadata is searched. This should significantly expand the results. As always, check out the advanced search for more options to improve your chances of finding the right hidden gems.
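The advanced search can also be reached from a script. The sketch below only builds a search address, assuming the public advancedsearch.php endpoint and its q, fl[], rows, and output parameters. The function name and the sample query are hypothetical. Note that this form searches item metadata, not the full text that the “text contents” option covers.

```python
from urllib.parse import urlencode

# Assumed public endpoint for metadata searches.
ADVANCED_SEARCH = "https://archive.org/advancedsearch.php"


def advanced_search_url(query, fields=("identifier", "title"), rows=50):
    """Build an advanced search address that returns JSON results.

    query  : a metadata query such as 'title:genealogy AND mediatype:texts'
    fields : metadata fields to include for each matching item
    rows   : maximum number of results to return
    """
    params = [("q", query)]
    # The fl[] parameter is repeated once per requested field.
    params += [("fl[]", field) for field in fields]
    params += [("rows", rows), ("output", "json")]
    return ADVANCED_SEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical query: text items with "genealogy" in the title.
    print(advanced_search_url("title:genealogy AND mediatype:texts"))
```

Pasting the printed address into a browser is enough to see the raw results. Narrowing by a field such as mediatype is often the quickest way to cut a long result list down to the relevant material.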

The Internet Archive will, of course, not have every answer that cannot be found somewhere else, but it is an important asset that compiles many different types of information in one place. It is also a resource that all libraries can support and help build as a beacon of our shared responsibility to collective memory and history.

Copyright 2020 by Howard Trace

About the author:

Howard Trace serves as director of the American Legion National Headquarters Library & Museum Division, a position he has held since 2008. His library experience spans three decades in public, academic, and special libraries. He holds a bachelor’s in history and religious studies from Purdue University, a master of science in space studies from the University of North Dakota, and a master of library science from Indiana University.