User requirements
- DPLA-created Primary Source Sets will include embedded high-resolution images, PDFs, audio, and video.
- Audio and video clips will average 1-5 minutes, with 15 minutes as the maximum.
- DPLA staff will upload media files.
- DPLA staff can pre-format media files so they are an acceptable size and file type.
- Copyright for media files will be held by providers; we are using them under the auspices of educational fair use.
- Allow embeds of audio and video from other sites, such as Vimeo, YouTube, SoundCloud, and Internet Archive. (Added by Mark B. Approved?) We may need to plan for this in the future, but our top priority is uploading audio and video files. (AA)
Uploading
- Acceptable file size for uploading:
- We can currently accept media files up to 50 MB. Staff should alert the tech team if they need to upload a larger file.
- Acceptable file types for uploading:
- Images: JPG, PNG, or GIF
- Audio: MP3, M4A, Ogg Vorbis, and others, depending on which formats the transcoder software can read.
- Video: QuickTime, WMV, and others. As with audio, this depends on the transcoder we select.
- Output formats:
- Images: same as the upload formats. Ideally we avoid making derivatives (transcoding, resizing); if we need thumbnails, we should revisit this.
- Audio: MP3, M4A (AAC), Ogg Vorbis. Every file gets transcoded to all three formats.
- Video: WebM, Ogg Theora, MP4 (H.264). Every file gets transcoded to all three formats.
- Dimensions for image files:
- Full-sized images that represent individual sources: aim for 1500px on the longest side; less is okay.
- We plan to use the carrierwave gem to upload media files.
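The upload rules above could be checked along these lines. This is a minimal sketch in Python for illustration only; the application itself would presumably enforce the same rules in its carrierwave uploader. The function names are hypothetical, the mapping from formats to file extensions (for example `mov` for QuickTime and `ogg` for Ogg Vorbis) is an assumption, and the audio and video lists are provisional until a transcoder is chosen.

```python
from pathlib import PurePath

# Limit and allow-lists taken from the notes above. The extension names are
# assumptions, and the audio/video lists are provisional.
MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # assumes "50 MB" means 50 MiB
ALLOWED_EXTENSIONS = {
    "image": {"jpg", "jpeg", "png", "gif"},
    "audio": {"mp3", "m4a", "ogg"},
    "video": {"mov", "wmv"},
}
MAX_LONGEST_SIDE = 1500  # target for full-sized images, in pixels


def media_kind(filename):
    """Return 'image', 'audio', or 'video' for an allowed file, else None."""
    ext = PurePath(filename).suffix.lower().lstrip(".")
    for kind, extensions in ALLOWED_EXTENSIONS.items():
        if ext in extensions:
            return kind
    return None


def upload_problems(filename, size_bytes):
    """List the reasons an upload should be rejected (empty list if none)."""
    problems = []
    if media_kind(filename) is None:
        problems.append("unsupported file type")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file is larger than 50 MB; alert the tech team")
    return problems


def fit_longest_side(width, height, limit=MAX_LONGEST_SIDE):
    """Scale (width, height) down so the longest side is at most `limit`.

    Images already within the limit are returned unchanged ("less is okay").
    """
    longest = max(width, height)
    if longest <= limit:
        return width, height
    scale = limit / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

Staff could use something like `fit_longest_side` when pre-formatting images, which keeps resizing out of the server and fits the goal of avoiding image derivatives.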
Storage
- We plan to use S3 buckets to store media files.
Transcoding
Amazon Elastic Transcoder looks promising: http://docs.aws.amazon.com/elastictranscoder/latest/developerguide/introduction.html. If maintainers can upload a file to a temporary directory on the application server, we can copy it to an S3 bucket (using Amazon's internal network) and then have Amazon do the transcoding. I will research this as soon as possible. Questions: 1. Do we need to monitor a job's status, and if so, how? (We may not need to for the first iteration.) 2. Can we reliably predict the URLs of the video assets, or do we need to use Amazon's API to get them? – Mark B.
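On the two questions, a rough sketch may help the research. As far as I understand the service, a job request names the output key for each derivative, so we choose the keys rather than Amazon, and a job reports one of a small set of statuses. The Python below builds such a request as plain data and derives the URLs from it. The preset IDs, key layout, and function names are all hypothetical. Whether a preset exists for every target format (Ogg Theora in particular) still needs checking. The URL pattern assumes a virtual-hosted-style S3 address for objects that are readable by the site.

```python
from pathlib import PurePosixPath

# Hypothetical preset IDs: real values would come from the Elastic Transcoder
# console or API, and each target format needs a preset that actually exists.
VIDEO_PRESETS = {
    "webm": "PRESET-ID-WEBM",
    "ogv": "PRESET-ID-OGG-THEORA",
    "mp4": "PRESET-ID-MP4-H264",
}

# Statuses after which a job will not change again. A job that is still
# running reports "Submitted" or "Progressing".
FINAL_STATUSES = {"Complete", "Canceled", "Error"}


def build_job_request(pipeline_id, input_key, presets=VIDEO_PRESETS):
    """Build the parameters for one transcoding job, one output per format.

    The derivative keys are chosen here, by us, so they are known before the
    job is even submitted.
    """
    stem = PurePosixPath(input_key).stem
    return {
        "PipelineId": pipeline_id,
        "Input": {"Key": input_key},
        "OutputKeyPrefix": f"derivatives/{stem}/",  # assumed key layout
        "Outputs": [
            {"Key": f"{stem}.{ext}", "PresetId": preset_id}
            for ext, preset_id in presets.items()
        ],
    }


def derivative_urls(job_request, bucket):
    """Derive the S3 URL of each output from the job request alone."""
    prefix = job_request["OutputKeyPrefix"]
    return [
        f"https://{bucket}.s3.amazonaws.com/{prefix}{output['Key']}"
        for output in job_request["Outputs"]
    ]


def is_finished(status):
    """True once a job has reached a status that will not change again."""
    return status in FINAL_STATUSES
```

If this holds up, the answer to question 2 is that the URLs are predictable, because they follow from keys we set ourselves. For question 1, a first iteration could skip monitoring and simply link to the expected URLs, with status polling or notifications added later if failed jobs become a problem.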
Related Links
Meeting notes