From Internet Archive Unoffical Wiki

Items on the Internet Archive (IA) can have arbitrary metadata assigned to them. There are LOTS of magic keys that do various things, only some of which are documented below.

Considerations about Metadata on the Internet Archive

While the term "metadata" has varying meanings in the archives/libraries and content management space, at the Internet Archive it refers specifically to the database settings on each item. Some of these are self-explanatory, like title or date, but some are not as clear, and a number of them cause the Archive to treat the item differently. Some even trigger actions or modifications when they are set.

For this revision of this document, the different classes of metadata can be thought of as System Reserved, Magic, and Everything Else. The last class covers settings that can be put on any item but are treated no differently from any other text field.
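The three classes can be pictured as a simple lookup. The sketch below is a hypothetical helper, not the Archive's own code: the key sets contain only names mentioned on this page, they are far from complete, and the assignment of each name to a class (as well as the lowercase spelling of the ISBN key) is an assumption made for illustration.

```python
# Hypothetical sketch: sort metadata keys into the three classes.
# The sets are illustrative and incomplete; they only use key names
# that appear on this page, and the grouping is an assumption.
SYSTEM_RESERVED = {
    "title", "date", "subject", "mediatype",
    "publicdate", "addeddate", "noindex", "licenseurl",
}
MAGIC = {"ppi-direct", "isbn"}


def classify_key(key):
    """Return the class name for a single metadata key."""
    name = key.strip().lower()
    if name in SYSTEM_RESERVED:
        return "system reserved"
    if name in MAGIC:
        return "magic"
    return "everything else"


def classify_metadata(metadata):
    """Group the keys of a metadata dict by class, keeping their order."""
    groups = {"system reserved": [], "magic": [], "everything else": []}
    for key in metadata:
        groups[classify_key(key)].append(key)
    return groups
```

Calling `classify_metadata` on an item's metadata dict gives a quick view of which entries the Archive might act on and which are plain text fields.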

Due to the nature of the Internet Archive's stores, there is no "one true schema" for items, because various balkanized groups use the Archive's content in different ways. A partner organization might have very explicit, refined declarations for metadata pairs that exist nowhere else in the system, while a term as simple as "subject" could be used in many different ways depending on which group is using it.


System Reserved

The unique identifier used to reference the directory of material.
The descriptive title shown for the item on the details page.
The listed "author" or "creator" for the content of the item. Usually an author with books, or a production company/director with movies, and so on.
Where the item is filed under.
A search query that creates a "virtual" collection.
Currently movies, texts, audio, software, data, or web, each of which is displayed automatically in a different way on its details page. It is incredibly easy to misclassify an item into the wrong mediatype, in which case it will likely be derived properly but not displayed properly.
The official "date" of the item, be it the release date of a movie, the publication date of a book, or the creation date of software. This differs from the creation date of the item on the archive (which is handled by publicdate and addeddate).
The term or terms that function as "keywords" for an item, such as what discipline it is, what sort of contents are in it, or who appears in it.
A description of what license the item is licensed under.
Whether the item is the "pick" for the collection it's in. Mostly deprecated.
Whether the item is in the search engine/search results.
A deprecated version of noindex. (Likely to be merged/removed in the future.)
Read-only. The date that the item was "published". Only different from addeddate if it is part of the internal scanning operations of the archive.
Read-only. The date and time that the item identifier was added to the Internet Archive and reserved.
For the Internet Archive's internal scanning operations, what employee or contractor added the item's identifier to the database.
The account that originally uploaded the item into the archive.
For the Internet Archive's internal scanning operations, what employee or contractor updated the item's information last.
For the Internet Archive's internal scanning operations, when the item was last updated.
A general "additional information" metadata entry for commentary on the quality of the item, when it was added, and factors or considerations related to the item itself.
Known information as to who or what entities have additional rights or considerations for the item beyond what is mentioned in licenseurl.
A sponsoring body or source for the item, such as a library, external archive, or fund/individual who provided the original (in analog or digital form) that this item was created from.
The known "publisher" of the item or the original artifact the item was generated from.
The official "language" of the item, used to assist with classification and, most importantly, OCR.
What the item "covers" in a geographic or country sense.
(Question to research: What does this do?)
(Question to research: What does this do?)
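Two of the points above lend themselves to a quick check before a change is submitted: the mediatype has to be one of the listed values, and the read-only dates cannot be set by hand. The following is a minimal sketch of such a check, assuming a hypothetical `check_changes` helper. It is not part of any Archive tool, and it treats only publicdate and addeddate as read-only because those are the only entries marked that way here.

```python
# Hypothetical pre-submission check for a dict of proposed metadata changes.
# MEDIATYPES comes from the list on this page; READ_ONLY holds only the two
# entries described as read-only here. Both may be incomplete.
MEDIATYPES = ("movies", "texts", "audio", "software", "data", "web")
READ_ONLY = {"publicdate", "addeddate"}


def check_changes(changes):
    """Return (accepted, problems) for a dict of proposed changes."""
    accepted = {}
    problems = []
    for key, value in changes.items():
        if key in READ_ONLY:
            problems.append(f"{key} is read-only")
        elif key == "mediatype" and value not in MEDIATYPES:
            problems.append(f"unknown mediatype: {value!r}")
        else:
            accepted[key] = value
    return accepted, problems
```

A typo such as "movie" instead of "movies" is reported as a problem rather than silently accepted, which is the kind of misclassification described above.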


Magic

If a URL at is provided, other metadata will be automatically populated from there whenever an internal script is run.
Entering an ISBN will (generally) populate the item with information gleaned from MARC and other sources of book metadata. (Note: Sometimes MARC data isn't accurate. In that case, it is best to name the metadata item something other than ISBN.)
Setting the ppi-direct metadata pair provides a dpi override, which is used instead of the value the software determines by looking at the material.
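The override behaviour of ppi-direct can be sketched as a simple precedence rule. The function below is hypothetical and only illustrates the idea; how the Archive's software actually parses the value, and what it does with an invalid one, is not documented here, so the fallback to the detected value is an assumption.

```python
# Hypothetical sketch of the ppi-direct precedence rule.
# Falling back to the detected value on a missing or invalid override
# is an assumption, not documented Archive behaviour.
def effective_ppi(metadata, detected_ppi):
    """Return the override from ppi-direct if usable, else detected_ppi."""
    raw = metadata.get("ppi-direct")
    if raw is None:
        return detected_ppi
    try:
        override = int(str(raw).strip())
    except ValueError:
        return detected_ppi
    return override if override > 0 else detected_ppi
```

For example, `effective_ppi({"ppi-direct": "600"}, 300)` yields the override, while an item without the key keeps whatever value was detected.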

Everything Else

Assuming a metadata name is neither used by the system nor a magic word, almost anything can be added to the archive's metadata entry for an item. For example, a collection of magazines might all have a special "platforms" setting to indicate what computer platforms those issues cover, or a sewing pattern collection can have a "stitch" entry for what type of stitches are needed to complete the pattern.
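As an illustration of how such custom fields might be used once they are set, the sketch below filters a handful of items by a custom key. The identifiers and values are invented for the example, and the assumption that a value may be either a single string or a list of strings belongs to this sketch.

```python
# Invented sample items; identifiers and values are illustrative only.
items = [
    {"identifier": "example-magazine-01", "platforms": ["Apple II", "Commodore 64"]},
    {"identifier": "example-magazine-02", "platforms": "Atari ST"},
    {"identifier": "example-pattern-01", "stitch": "backstitch"},
]


def values_of(item, key):
    """Return the values stored under key as a list (empty if unset)."""
    value = item.get(key, [])
    return [value] if isinstance(value, str) else list(value)


def items_with(items, key, wanted):
    """Return identifiers of items whose custom field contains wanted."""
    return [it["identifier"] for it in items if wanted in values_of(it, key)]
```

Because the fields are free text, a lookup like `items_with(items, "platforms", "Atari ST")` only works if everyone contributing to the collection spells the values the same way.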

See Also