PetaBox

The PetaBox, custom-designed by Internet Archive staff and C. R. Saikley, was originally created to safely store and process one petabyte (a million gigabytes) of information. The Internet Archive data center now houses about 3 PB of PetaBox storage and is expanding steadily. One PetaBox rack provides 1.4 petabytes of storage. The PetaBox deployment at 300 Funston Avenue has no air conditioning; instead, the excess heat from the machines helps heat the building.
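
As a rough illustration of the capacity figures above, the following Python sketch converts petabytes to gigabytes and estimates how many racks a given amount of storage would occupy. It assumes decimal units (one petabyte equals one million gigabytes, as stated above) and takes the quoted per-rack and data-center figures at face value, although they may describe different hardware generations. The function names are hypothetical and chosen only for this example.

```python
import math

# Decimal storage units, matching the text: 1 PB = 1,000,000 GB.
GB_PER_PB = 1_000_000


def pb_to_gb(petabytes):
    """Convert petabytes to gigabytes using decimal units."""
    return petabytes * GB_PER_PB


def racks_needed(total_pb, pb_per_rack):
    """Return the smallest whole number of racks that covers total_pb."""
    if pb_per_rack <= 0:
        raise ValueError("pb_per_rack must be positive")
    return math.ceil(total_pb / pb_per_rack)


if __name__ == "__main__":
    # Figures quoted in the article; both are approximate.
    print(pb_to_gb(1))
    print(racks_needed(3, 1.4))
```

Rounding up reflects that a partially filled rack still has to be installed as a whole rack.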

Project goals

 * Low power
 * High density storage
 * Local computing to process the data
 * Multi-OS possible, with Linux standard
 * Co-location friendly
 * Shipping container friendly: able to run in a 20' by 8' by 8' shipping container
 * Easy maintenance: one system administrator per petabyte
 * Software to automate full mirroring
 * Easy to scale
 * Inexpensive design
 * Inexpensive storage

History
The initial PetaBox stored 100 terabytes and was deployed at the European Archive in June 2004. Another 80-terabyte rack went live at 300 Funston Avenue soon after. Later that year, the Internet Archive spun off PetaBox production to Capricorn Technologies. Capricorn replicated the Internet Archive's successful deployment of the PetaBox for major academic institutions, digital preservationists, government agencies, HPC and major research sites, medical imaging providers, digital image repositories, storage outsourcing sites, and other enterprises around the globe. By 2007, the Internet Archive data center was using approximately three petabytes of PetaBox storage.

Articles

 * "The Fourth Generation PetaBox", Internet Archive Blog
 * "Big storage on the cheap", CNET