The Endangered Internet Archive Is Full of Treasures

By Whitney Kimball

The Internet Archive set out in the 1990s with an improbable mission to become the “Library of Alexandria Two”; by 2020, they’ve arguably surpassed that goal, plus delivered their collection straight to the masses. It’s the only repository where a NASA recap of the Space Shuttle Challenger disaster logically coexists with a 1990 recording of the Grateful Dead live in the US state of Connecticut and a 1979 hip hop mix tape. Whether you need to settle a dispute over the origins of the Buffyverse, or you’re litigating trademark infringement, the Wayback Machine’s vast archive of old webpages is admissible evidence. You can search for TV news videos by quotes. At this writing, the archive bot is quietly, surgically extracting rotten links and replacing them with Wayback pages (millions so far), and the archive is filling in over 100,000 book references with live links to pages in full texts. These are the guys who tell us to archive our shit and save it for us anyway when we don’t.

We don’t need to tell you, but we’re doing it, because many are worried that a recent copyright lawsuit brought by major publishing companies will decimate the Internet Archive. Maybe that’s hype – as Vox has pointed out, the maximum penalties under the lawsuit would amount to a little over $19 million/£15 million (the Internet Archive’s revenue in 2018 was $20 million/£16 million) – and the complaint asks for a permanent injunction and destruction of “unlawful copies” of works. That would amount to 1.4 million scanned books, but it wouldn’t touch the Wayback Machine or public domain works. The Internet Archive seems to disagree with the tempered assessment; its native wiki currently lists its status as “endangered,” with a reference link to a Vice article on the lawsuit.

This could be the first of several battles, which have been gearing up since the inception of the Open Library project, the Internet Archive’s depot of scanned books, both donated directly to the archive and shared from partner libraries. For years, the Internet Archive has been defending its controlled digital lending practice, which allows readers to check out one copy of a book for each hard copy the Internet Archive owns, similar to a physical library. Publishers and authors argue that they’re skirting around the traditional licensing payments libraries are expected to make for e-books and therefore violating copyright law. Libraries and universities have stood by controlled digital lending, but after the outbreak of the pandemic, the Internet Archive temporarily lowered that defensive shield by eliminating its waitlist for all books in order to facilitate online learning in quarantine. More cases could loom, and copyright claims reach beyond the book collection, as Viacom recently proved when it demanded the archive remove its collection of old MTV broadcasts. And then there’s a US senator who for some reason seems to be hunting for copyright violations in the Internet Archive, lately aiming at its crowd-sourced repertoire of 78 rpm discs from 1898 to the 1950s.

The suit raises the threat level and the public anxiety that’s always surrounded the Internet Archive. If the archive were to die by a thousand legal cuts or a solar flare, hard copies could rot in shipping containers; code would simply vanish. Here are a few of those treasures.

The Marion Stokes Collection

Image: From an episode on “Input” (The Marion Stokes Collection, the Internet Archive)

Marion Stokes towers over the recent dignitaries of Internet Archive history – those who hunt down thousands of obscure CD-ROMs of DOS games, they who spend years pulling miles of film from dumpsters. Stokes, a communist civil rights activist and librarian from the US, has assumed a historical throne for recording over three decades of cable TV broadcasts, with up to eight VCRs running at all times.

A documentary about her life, Recorder, debuted last year; the film portrays an activist driven by a critical lens on how cable news edits and distorts reality, whose undertaking became all-consuming. Internet Archive founder Brewster Kahle told Gizmodo that they’ve taken on her “shipping container sized” TV tape collection in order to digitise it.

That Herculean effort is still underway, but so far, the collection heavily features Stokes’s recordings of “Input,” a 1960s-1970s social justice talk show from the US city of Philadelphia which hosted debates on prison reform and featured members of the Black Panther Party on social conditioning. You can hear Stokes refute Western ideas about violence in this episode.

The Abolitionist Papers and Slavery Stories

Image: The Guilt of Slavery and the Crime of Slaveholding (The James Birney Collection, the Internet Archive)

The Internet Archive worked with Johns Hopkins University’s library to digitise and host 19th century American abolitionist publisher James Birney’s collection of anti-slavery pamphlets. It includes descriptions of life in slavery, a famous appeal by 18th century British pamphleteer William Fox, and a lawyer’s 1849 argument before the US Supreme Court for desegregating Massachusetts schools.

The Historical Slavery Collection pulled from the Federal Writers’ Project, a Work Projects Administration program, features photos and volumes of first-hand accounts from former American slaves, recorded during the Great Depression. The sometimes painfully offensive outcome reflects stereotyping transcriptions that butcher the teller’s story, but the significance of the accounts is clear. (These have been posted online elsewhere, on Project Gutenberg and by the Library of Congress, but the Internet Archive has posted them in a reader-friendly book format with text search.)

Educational films, B Westerns, and NASA footage

Image: “Ascent” (The NASA Collection, the Internet Archive)

Back in the 90s, the decade that brought us the Internet Archive, Skip Elsheimer was going to surplus auctions, picking out neat stuff, as magpie archivists do. But it was a batch of 16 millimetre films – 500 films for 500 bucks, mostly hygiene and driver’s ed type stuff – that would explode into a trove of material that he’s made available to the world, in large part via the Internet Archive. By 2020, he’s amassed a collection of 27,000 films from dumpsters, landfills, school auctions, and closets. Over a thousand are on the archive, including a little song about venereal diseases, a scared-straight reenactment of teen grand theft auto, and a 1957 film on the moral debate over atomic energy. (Cherry-picking here.) Elsheimer has also digitised 500 public domain feature-length films for the Internet Archive, mostly B-movie Westerns, about which little or no information was available online. “That launched a hundred Roku channels,” Elsheimer joked, after later finding people charging for films he’d first uploaded.

When asked for a work that sticks out, though, Elsheimer pointed to a video he digitised for NASA. It’s a video from inside the cabin of a shuttle, just the back of astronauts’ heads, from launch to zero gravity. “It’s raw footage, it’s not very interesting. But because it’s not very interesting, to me it’s very interesting. You see their heads bobbing, and eventually as they get into zero gravity, their hair starts to lift up...the sights and the sounds, it’s really fascinating.” It refreshingly lacks the historicity of news broadcasts and the spectacular sheen of NASA Instagram posts.

Similar to Elsheimer’s work, the Internet Archive also hosts a large chunk of the Prelinger Archives, a collection of around 17,000 digitised films, mostly home movies and amateur works. (It includes the all-time classic “Duck and Cover,” the musical cartoon educational film from the Cold War era on how to survive a nuclear bomb by ducking under a classroom desk.) There are also 1940s-1970s educational films, including delightfully explicit sexual content and an accurate depiction of the miracle of birth. Most of these items, the Archive confirmed, are not available anywhere else.

Historic games

Image: Caper in the Castro (The Internet Archive)

“The Internet Archive is the only place in the world people can access and play the two earliest known LGBTQ video games: Caper in the Castro (1989) and GayBlade (1992),” gaming studies scholar and founder of the LGBTQ Video Game Archive Adrienne Shaw told Gizmodo. (Caper in the Castro is a murder mystery game featuring a lesbian detective and a missing drag queen; GayBlade is a role-playing game where a team of “drag queens, queers, lesbians, and others” rescue an empress from “disgusting right-wing creatures.”) Shaw tracked down and interviewed the designers of both – and she added that she’s relied heavily on the Wayback Machine for much of her research, especially discussions on now-shuttered forums like GayGamer.Net. (Shaw shared this post on the Playboy Mansion: “It’s one of the only places I can find a designer speaking openly on the intention to include gay options in the game.”)

“The Internet Archive played an instrumental role in helping me and the creators get those games online and playable for the first time in 30 years,” Shaw told Gizmodo. “These are games that the only known copies are those that the creators had, that were then uploaded to the archive. These crucial pieces of LGBTQ game history would be lost,” she said, if not for the archive.

The same would likely go for a vast repository of games. Initiatives like the eXoDOS project, a group of archivists who tracked down thousands of DOS games on CD-ROMs, have allowed the archive to make thousands of lost titles available for browser and laptop play.

The directories and manuals! Yes, really!

Image: Library books (The Internet Archive)

Don’t discount the staggering magnitude of manuals and directories! There are manuals for antique tractors and vending machines, directories for mid-20th century Black-owned businesses in America and parishes in the US city of Raleigh, North Carolina. This content isn’t just kind of interesting to skim: it’s also a critical historical resource, so much so that librarian Jessamyn West immediately pointed Gizmodo to “keyword searchable volumes of Library Journal!” as her Internet Archive treasure.

Why a dense monthly trade publication dating back to 1879?

For one, they tell the story of America's state library associations, historical cultural gatekeepers, whose history West has been documenting and filling out on Wikipedia. Their histories reflect institutional biases, historically privileging white staff and excluding representation of the populations they serve. “Many state associations in the Southern US were segregated, some until the Civil Rights Act of 1964 MADE them desegregate,” West told Gizmodo. “So if you look at, say, the North Carolina Library Association and the North Carolina Negro Library Association (which is well-documented) you really get a sense of just where this ‘whiteness problem’ came from.”

The Myspace Dragon Hoard

Image: The Myspace Dragon Hoard (The Internet Archive)

When Myspace “lost” 50 million songs under dubious circumstances (a “server migration”), wiping out work by deceased users and rough early songs that never made studio albums, it was the Internet Archive that saved the day. To a degree. Anonymous academics offered the archive nearly half a million songs, used for a 2008 to 2010 study, and within weeks of the Myspace tragedy reports, they were up on the archive. We may never recover the majority of Myspace’s music library, but at least one Twitter user found their old stuff again, and the recovery inspired a group of Gen Z archivists (all of whom grew up with a post-Myspace digital landscape) to spend months sorting through metadata and renaming the files.

Attention K-Mart Shoppers

Image: Kmart October 1989, from “Attention Kmart Shoppers” (The Internet Archive)

It is, therefore it’s archived. See this collection started by a former employee of the US big box department store chain Kmart, who uploaded tapes upon tapes of 1970s to 1990s Kmart shopping jams. Shoppers got an earful of the glorious history of Kmart, mixed with generic pop and soft rock. Contributors have also added midcentury employee training clips from the S. S. Kresge Company, the department store chain that preceded the Kmart name. Why am I including this? Because that’s the MO: why not.

Featured image: Gayblade (Windows 3.1 Version, 2.0) by RJBest Company (Ryan Best and John Theurer), 1992 (The Internet Archive)