IA Command-Line Tool

From Internet Archive Unofficial Wiki

There is an official library for interacting with the Internet Archive (uploading, searching, etc. -- not including the Wayback Machine or Open Library).

ia search --field

Many, but not all, metadata fields are indexed and returnable by `ia search`. The returnable ones are listed below, followed by notes on discrepancies.

access-restricted-item, addeddate, aspect_ratio, audio_codec, audio_sample_rate, backup_location, ccnum, closed_captioning, collection, color, contributor, coverleaf, curation, date, description, filesxml, frames_per_second, identifier, identifier-access, identifier-ark, imagecount, language, licenseurl, mediatype, notes, ocr, program, publicdate, repub_state, runtime, scandate, scanningcenter, sound, source, source_pixel_height, source_pixel_width, sponsor, start_localtime, start_time, stop_time, subject, title, tuner, updatedate, updater, utc_offset, video_codec, year
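As a rough illustration of how the list above might be used, the following Python sketch checks requested fields against it before building an `ia search` command line. The helper name `build_search_command` and the example query are hypothetical, and passing `--field` once per field is an assumption about the CLI rather than something this page documents.

```python
import shlex

# Returnable fields, copied from the list above.
RETURNABLE_FIELDS = frozenset("""
access-restricted-item addeddate aspect_ratio audio_codec audio_sample_rate
backup_location ccnum closed_captioning collection color contributor coverleaf
curation date description filesxml frames_per_second identifier
identifier-access identifier-ark imagecount language licenseurl mediatype
notes ocr program publicdate repub_state runtime scandate scanningcenter
sound source source_pixel_height source_pixel_width sponsor start_localtime
start_time stop_time subject title tuner updatedate updater utc_offset
video_codec year
""".split())


def build_search_command(query, fields):
    """Return (argv, rejected): an `ia search` argv and the fields left out.

    Hypothetical helper; fields not in RETURNABLE_FIELDS are dropped from
    the command and reported back to the caller.
    """
    wanted = [f for f in fields if f in RETURNABLE_FIELDS]
    rejected = [f for f in fields if f not in RETURNABLE_FIELDS]
    argv = ["ia", "search", query]
    for field in wanted:
        # Assumption: --field may be given once per requested field.
        argv += ["--field", field]
    return argv, rejected


if __name__ == "__main__":
    # Illustrative query only; "uploader" is listed on this page as not returnable.
    argv, rejected = build_search_command(
        "collection:nasa", ["title", "date", "uploader"]
    )
    print(shlex.join(argv))
    print("not returnable:", rejected)
```

Keeping the list in one set makes it easy to notice early when a script asks for a field, such as `uploader`, that the search index will not return.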

Discrepancies between what the search index returns and what is stored in the metadata:

  • The search index returns addeddate, updatedate, and publicdate with a T between the date and time and a Z afterward, unlike how they are stored (with a space between date and time).
  • The search index returns a full timestamp for date even if only the date (not the time or timezone) is stored.
  • The search index returns imagecount as a number, even if it is stored as a string.
  • The search index returns the association between an item and the users who marked it as a favorite (in the collection key), but this association is not stored in the metadata.
  • The search index returns the subject field as a list, while the metadata has it as a single semicolon-separated value.
  • The search index strips HTML from the description and rights fields.
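Some of the discrepancies above can be bridged in a script that compares stored metadata with search results. The sketch below is one possible approach, not part of the official library: the function names are hypothetical, the stored timestamp layout `YYYY-MM-DD HH:MM:SS` is an assumption based on the description above, and only the timestamp, imagecount, and subject cases are handled.

```python
from datetime import datetime


def stored_to_index_timestamp(value):
    """Convert 'YYYY-MM-DD HH:MM:SS' to 'YYYY-MM-DDTHH:MM:SSZ'.

    Assumes the stored value uses exactly this layout; anything else
    raises ValueError.
    """
    parsed = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")


def split_subject(value):
    """Split a semicolon-separated subject string into a list of subjects."""
    return [part.strip() for part in value.split(";") if part.strip()]


def to_index_form(metadata):
    """Return a copy of a stored-metadata dict reshaped like a search result."""
    result = dict(metadata)
    for key in ("addeddate", "updatedate", "publicdate"):
        if key in result:
            result[key] = stored_to_index_timestamp(result[key])
    if "imagecount" in result:
        # Stored as a string, returned by the search index as a number.
        result["imagecount"] = int(result["imagecount"])
    if isinstance(result.get("subject"), str):
        result["subject"] = split_subject(result["subject"])
    return result


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    stored = {
        "addeddate": "2015-03-01 12:34:56",
        "imagecount": "42",
        "subject": "maps; history",
    }
    print(to_index_form(stored))
```

The date field, the favorites entries in collection, and HTML stripping in description are left out here because reproducing them would require guessing at details this page does not give.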

Metadata fields that are not returnable include:

  • uploader