Uploading With Python

The Internet Archive provides a Python library on PyPI to help with uploading items. You can install it with `pip install internetarchive`.

The library has official documentation, but it doesn't provide much of a linear breakdown.

As an aside, the `internetarchive` package comes with a command line tool called `ia` that already provides a high-level API for interacting with the Archive's data. It would behoove you to see if that already meets your needs.

This tutorial assumes that you have a basic user account on archive.org. Accounts with higher permissions may be able to do things like set "write-once metadata" more than once.

Logging in
First things first: you have to create the config file for `internetarchive`. This is not one of those tutorials where you can skip to the fun stuff. You must authenticate before you can work with the Archive's toys.

Option 1: ia configure
If you never want to write your credentials into your code, the command line tool `ia` provides `ia configure` for this. Fill it out and you'll have your config file.

Option 2: internetarchive.configure
The `configure` function within `internetarchive` will also create the config file for you. Running this program once will create your config file. You can delete the program after you run it.

    from internetarchive import configure

    configure('myemail!@example.com', 'password')

Double-checking your config file
The config file is written to one of two default locations in your home directory unless you specify a custom path elsewhere. It will be created correctly, since it doesn't have any moving parts. It should look something like this:

    [s3]
    access = aCcEsSkEy
    secret = SeCrEtKeY

    [cookies]
    logged-in-user = my_email!%40example.com; expires=Sun, 04-Oct-2020 02:08:30 GMT; Max-Age=31536000; path=/; domain=.archive.org
    logged-in-sig = SiGnAtUrEalsdnfanfaFEKA:WEFASDfadsfvaodsnfasdFAsdvnieranv; expires=Sun, 04-Oct-2020 02:08:30 GMT; Max-Age=31536000; path=/; domain=.archive.org

    [general]
    screenname = lethargilistic

Those are IA-S3 keys rather than Amazon S3 proper, but that's a background detail. You can find the key associated with your account here.

The `internetarchive` library auto-populated all of this config file information, including those keys, without me having to write them anywhere. I'm not sure if that key transfer is secured, to be honest, but there is also a way to add the keys to your code manually. You can write that part of the config file inline:

    from internetarchive import get_session

    c = {'s3': {'access': 'aCcEsSkEy', 'secret': 'SeCrEtKeY'}}
    session = get_session(config=c)

Creating a session is not necessary if your config file already exists. If you've written these lines, you can delete them.
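If you would rather double-check the config file with code than by eye, here is a minimal sketch using Python's standard `configparser`. It assumes the only entries that matter for uploading are the two `[s3]` keys from the sample above; the helper name `missing_config_entries` is mine, not part of the library.

```python
import configparser

# Section/key pairs taken from the sample config file in this tutorial.
# Assumption: these two S3 keys are what uploading depends on.
EXPECTED = [('s3', 'access'), ('s3', 'secret')]


def missing_config_entries(text):
    """Return the (section, key) pairs from EXPECTED that are absent or empty."""
    # interpolation=None because the cookie values contain '%' (e.g. '%40'),
    # which the default interpolation would try to expand.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return [
        (section, key)
        for section, key in EXPECTED
        if not parser.get(section, key, fallback='').strip()
    ]


sample = """
[s3]
access = aCcEsSkEy
secret = SeCrEtKeY

[cookies]
logged-in-user = my_email!%40example.com; path=/; domain=.archive.org

[general]
screenname = lethargilistic
"""

print(missing_config_entries(sample))
```

To check your real file, read its text from wherever your config file was written and pass that text to the function. An empty list means both keys are present.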

Uploading
Having created our config file, we're now ready to send things to the Archive.

You get an item by its identifier
Every single thing within the Internet Archive's system has a unique "identifier." This is a string that contains ASCII letters, numbers, hyphens, underscores, or periods. To access any item within the archive, you request it with its identifier:

    from internetarchive import get_item

    cool_podcast = get_item('amicus_lectio_0013')

That item identifier is already taken, and I control it, so I can upload to it but you cannot. You can view the metadata within this [`Item`](https://archive.org/services/docs/api/internetarchive/internetarchive.html#internetarchive.Item) object with `cool_podcast.metadata` or download it with `cool_podcast.download()`. If you want to see a progress bar, `download()` has a `verbose` flag.

But how do you register an identifier for the item that you're going to upload? It happens automatically during the upload process, so you don't have to think about it as long as your identifier is unique already. If you use `get_item` on an identifier that is not registered yet, but you do not upload anything, then the identifier will not be registered.
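Here is a small sketch of checking an identifier before you commit to it. The first helper only tests the character set described above; the Archive may enforce further rules (such as a length limit) that this does not cover. The second helper relies on the `exists` attribute of the `Item` object and needs network access. Both function names are my own.

```python
import re

# Character set from this tutorial: ASCII letters, numbers, hyphens,
# underscores, and periods. Any further rules are not checked here.
_IDENTIFIER_RE = re.compile(r'[A-Za-z0-9._-]+')


def looks_like_identifier(identifier):
    """Return True if the string uses only the allowed characters."""
    return bool(_IDENTIFIER_RE.fullmatch(identifier))


def identifier_is_free(identifier):
    """Return True if no item is registered under this identifier yet.

    Imported lazily so the character check above works without the
    library installed. This call contacts archive.org.
    """
    from internetarchive import get_item
    return not get_item(identifier).exists


print(looks_like_identifier('amicus_lectio_0013'))
print(looks_like_identifier('my item!'))
```

Checking `identifier_is_free` does not reserve anything. As noted above, the identifier is only registered once you upload.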

So, at this point, you're ready to upload, but, before that, let's have a word about metadata.

Uploading without context
Please do not do this! Metadata is gold! You should include everything you know about an item when you upload it to the Archive.

A great deal of the Internet Archive's utility as a repository of information comes from its rich metadata. There are people at the Archive who regularly comb through the millions of records and sort things out after the fact. You can make their jobs a lot easier and make your item far more discoverable by actually describing it while it still has your full attention.

You can upload items without metadata, but I will not show you how to do it.

Uploading with metadata
The `upload` function takes metadata as an argument. You can set it up as a dictionary.

The Internet Archive's service tries as much as possible to be metadata agnostic, which means you can use anything you want for keys and upload away. What's important to you is important to the Archive's records. That said, the Archive does reserve a number of keys for display and filtering purposes. The following is a list of the most basic keys.


 * `title`: The human-readable title. Required.
 * `mediatype`: FILL THIS OUT; SEE NEXT SECTION. Required.
 * `collection`: FILL THIS OUT; SEE NEXT SECTION. Required.
 * `date`: The "YYYY-MM-DD" date the item was created or published, outside the context of the Archive item. A separate key, `addeddate`, will be auto-populated to indicate the day you added it to the Archive. "YYYY" or "YYYY-MM" are also acceptable. You can write the value in brackets (e.g., "[YYYY]" or "YYYY-[MM]") to indicate you are uncertain.
 * `description`: The human-readable description of the item. It supports HTML.
 * `subject`: A list of strings that denote topics the item relates to. A podcast episode might include "podcast," the show title, and its topics.
 * `creator`: A list of strings that denote entities that created the item. If you want to list more than one entity, each entity should be its own string in the list you send. If it's an ongoing or collaborative show, I usually also include the show title here with each participant as a separate string.
 * `language`: The item's language. For example, "Spanish" or "Urdu." "English (handwriting)" is separate from "English."
 * `licenseurl`: The canonical URL that points to the copyright license. If you use a common license like any of the ones from Creative Commons, the item page will display the license's name and the proper symbols.

The Required key-value pairs will technically auto-populate with general values (the `title` will match the identifier and the `mediatype` will be "data"), but it is imperative that you fill them out before you run the upload function. Some required fields are write-once and require admin privileges to change after the initial upload; see the next section for details.

Uploading metadata to the Archive is all-or-nothing. If any of the key-value pairs causes an error of some kind on the back end, then none of the metadata in that dictionary will be reflected after the upload. This happens if you accidentally include an admin-access-restricted key in your metadata, for example. Again, you can see the next section for more information about that.
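Since a bad dictionary costs you the whole upload's metadata, it can be worth sanity-checking it locally first. This is a rough sketch, not anything the library provides: `missing_required` and `looks_like_date` are names I made up, and the date check only tests the shape of the string. It does not confirm the month or day is a real calendar value, and it strips the uncertainty brackets loosely.

```python
import re

# The three keys this tutorial marks as Required.
REQUIRED_KEYS = ('title', 'mediatype', 'collection')

# YYYY, YYYY-MM, or YYYY-MM-DD.
_DATE_RE = re.compile(r'\d{4}(-\d{2}(-\d{2})?)?')


def missing_required(md):
    """Return the required keys that are absent or empty in the dictionary."""
    return [key for key in REQUIRED_KEYS if not md.get(key)]


def looks_like_date(value):
    """Return True if the value has one of the accepted date shapes.

    Square brackets marking uncertainty are dropped before matching.
    """
    plain = value.replace('[', '').replace(']', '')
    return bool(_DATE_RE.fullmatch(plain))


md = {'title': 'My item', 'date': '2019-[09]'}
print(missing_required(md))
print(looks_like_date(md['date']))
```

If `missing_required` returns anything, fix the dictionary before calling the upload function.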

So, without further ado, to upload something to the archive, you can write:

    from internetarchive import get_item

    cool_podcast = get_item('amicus_lectio_0013')

    md = {
        'title': '"Pokémon Go and the Law: Privacy, Intellectual Property, and Other Legal Concerns" by Tiffany C. Li (2016) - Amicus Lectio 0013',
        'mediatype': 'audio',
        'collection': 'opensource_audio',
        'date': '2019-09-17',
        # Adjacent string literals are joined, so the description can span lines.
        'description': (
            'Pokémon GO was an immediate sensation when Niantic released it in 2016, and it continues to be one of the highest-grossing apps on mobile devices. While the hype was still high, Tiffany C. Li wrote about potential legal rankles Niantic might face on the road to becoming a Poké Fan Master.\n\n'
            'The Paper. Mike Overby (@lethargilistic) reads Amicus Lectio (@AmicusLectio).'
        ),
        'subject': ['law', 'pokemon', 'pokemon go', 'amicus lectio', 'privacy', 'trespass', 'augmented reality', 'copyright', 'trademark', 'intellectual property'],
        'creator': 'Ruha Benjamin',
        'language': 'English',
        'licenseurl': 'http://creativecommons.org/publicdomain/zero/1.0/',
    }

    cool_podcast.upload('path_to_your_file.mp3', metadata=md, verbose=True)

Some stray notes:
 * The `verbose` flag optionally displays a progress bar for you.
 * If you want to run this as a test before you actually upload, set the `debug` flag to `True`.
 * If you want to upload multiple files with one call to this function, you can include a list of filepath strings, too. Python file objects also work.
 * The function returns a list of `requests.Response` objects, so you can check the HTTP status code of each.
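To act on that last note, here is a small sketch for picking out failed uploads. It assumes `upload` hands back one response per file as a list and that a successful upload reports status 200. The helper name `failed_responses` is my own, and the demo uses stand-in objects so it runs without touching the network.

```python
from types import SimpleNamespace


def failed_responses(responses):
    """Return the responses whose HTTP status code is not 200."""
    return [r for r in responses if r.status_code != 200]


# Stand-ins for real responses; only the status_code attribute matters here.
fake = [SimpleNamespace(status_code=200), SimpleNamespace(status_code=503)]
print([r.status_code for r in failed_responses(fake)])  # prints [503]
```

With a real item, you would pass in whatever `cool_podcast.upload(...)` returned. This is not meaningful for a test run where no request is actually sent.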

Congrats, the item is live and you've archived something forever on the Internet Archive! You can now visit https://archive.org/amicus_lectio_0013 (rather, whatever identifier you used) to check if it is appearing correctly.

DO NOT FORGET THE WRITE-ONCE METADATA
Some metadata keys are reserved for use by Internet Archive's staff only. Updating metadata is all-or-nothing. If you accidentally include any of these fields after the first upload, none of the metadata within the dictionary you send will be reflected on the item within the Archive.

I make special mention of this because you have only one chance to set values for write-once fields on the first upload.

To find out if a field is write-once, consult the documentation's list of reserved metadata. Write-once fields are indicated by "edit access: IA admin."

These are the most important write-once metadata, and I've included the most common values as examples:
 * `mediatype`: The mediatype indicates the overall silo your Archive item will be within.
    * `data`, as in data files in formats like XML or CSV. The default.
    * `movies`, as in all video content.
    * `web`, as in websites.
 * `collection`: Collections pair your item with others like it and provide further filtering. Input the identifier of the collection you wish to be included in. These are the Open Collections everyone has upload access to.
    * If you want "Community Texts", use `opensource`. The default.
    * If you want "Community Audio", use `opensource_audio`.
    * If you want "Community Video", use `opensource_movies`.
    * If you want "Community Data", use `opensource_media`.
    * If you want "Community Images", use `opensource_image`.
    * If you want "Community Software", use `open_source_software`.
    * If you want to upload as a "Test Item" which will be deleted after 30 days, use `test_collection`.

If you accidentally upload a file without these metadata set, or you set them incorrectly, you will need to send an email to info@archive.org with your request to change them. This is not an imposition upon them, although there's no guarantee of when they'll get back to you. They provide this sample email for you to use:

To: info@archive.org

Subject: Please move my item(s)

Body:

Please move these items:

archive.org/details/[item1identifier]

archive.org/details/[item2identifier]

To this collection:

archive.org/details/[collectionidentifier]

Updating metadata
Oh no! I just realized that I wrote the wrong name in the `creator` field.

To alter metadata with Python, we would write:

    from internetarchive import get_item

    cool_podcast = get_item('amicus_lectio_0013')

    md = {'creator': 'Mike Overby'}
    # To be clear, this dictionary can still include any number of pairs.

    cool_podcast.modify_metadata(md)

Phew. Now the item is live and correct at https://archive.org/amicus_lectio_0013.
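Because one write-once key in the dictionary sinks the whole update, I find it safer to filter the dictionary before sending it. A minimal sketch follows. The set below contains only the two write-once keys this tutorial highlights, so extend it from the reserved-metadata list as needed. `split_update` is a made-up helper name.

```python
# Keys this tutorial treats as write-once. This is not an exhaustive list.
WRITE_ONCE_KEYS = {'mediatype', 'collection'}


def split_update(md):
    """Split an update into (safe pairs, sorted names of blocked keys)."""
    safe = {k: v for k, v in md.items() if k not in WRITE_ONCE_KEYS}
    blocked = sorted(k for k in md if k in WRITE_ONCE_KEYS)
    return safe, blocked


safe, blocked = split_update({'creator': 'Mike Overby', 'mediatype': 'audio'})
print(safe)     # prints {'creator': 'Mike Overby'}
print(blocked)  # prints ['mediatype']
```

You would then pass only `safe` to `modify_metadata` and handle anything in `blocked` by email, as described in the write-once section.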

I hope you enjoy throwing stuff onto the human race's digital pile! Now that you know how the metadata works, it'll be easier to explore it, too!

"The book I uploaded doesn't look right?"
This tutorial covers how to upload individual items to an identifier, but the way certain kinds of Archive items appear is responsive to different kinds of structured input. For example, the way it organizes the images of book pages so they function more like a proper book requires you to format the images in a specific way with specific names and send it as a zip file. Here is an explanation of that process for books.

Shortcut functions
The `internetarchive` package also includes top-level `upload` and `modify_metadata` functions. I used the `get_item` approach for this tutorial because that made it easier to explain the Archive's system. Personally, I think the `get_item` approach encourages better habits, too. For instance, you need to create the `Item` object to check if the identifier is already registered on the Archive.

Rate limiting
The upload API has rate limiting, and the `internetarchive` library does not help you comply with it. If you go over the rate limit, your account will be locked out of uploading via this API. The Archive is under the impression that this block lifts automatically after a short amount of time, but it does not. To unlock your API uploading privilege, you will have to contact info@archive.org. Keep in mind that this is a general contact point, and the person you reach will not necessarily know about `internetarchive`, this rate-limiting issue, or the difference between uploading via this API and via the web interface.

This issue has been raised on the project GitHub here and here, but was closed both times. (In the latter, lethargilistic mentions that the lock comes off after two weeks, but it does not. He was in contact with the Archive at that time and, although they were confused by the request, someone at the Archive must have lifted it without telling him they had figured out how to do so. He ran into the same issue later and the lock did not lift automatically after two weeks.)

Your web uploading privileges will not be affected by this.
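Since the library leaves pacing up to you, one defensive option is to space out your uploads yourself. The sketch below is a generic throttle, not something `internetarchive` ships. I don't know the actual rate limit, so any interval you choose is a guess. The `sleep` and `clock` parameters exist only so the behavior can be checked without really waiting.

```python
import time


def throttled(items, min_interval_s, sleep=time.sleep, clock=time.monotonic):
    """Yield items so consecutive yields are at least min_interval_s apart."""
    last = None
    for item in items:
        if last is not None:
            # Wait out whatever remains of the interval since the last yield.
            wait = min_interval_s - (clock() - last)
            if wait > 0:
                sleep(wait)
        last = clock()
        yield item


# Demo with a frozen clock and a recording "sleep", so nothing actually waits.
waits = []
order = list(throttled(['a.mp3', 'b.mp3', 'c.mp3'], 5,
                       sleep=waits.append, clock=lambda: 0.0))
print(order)
print(waits)
```

In real use you would loop over `throttled(paths, some_interval)` and call the upload function once per path. The time each upload takes counts toward the interval, so slow uploads add no extra waiting.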