Legacy Environment Post-Installation

This page covers the post-installation tasks that must be performed with the platform and ingestion applications to get a system set up for ingests.


  1. Create the dpla, dashboard, dpla_api_auth, and bulk_download databases in the CouchDB Futon control panel (e.g. http://local.dp.la:5984/_utils/index.html). This is easier than trying to do it with the rake tasks in platform, and there are no rake tasks for the dashboard and bulk_download databases. The username and password for the control panel are in the contentqa Ansible group_vars file.

This is the only step that needs to be run when building CQAi3 boxes.
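
If you would rather script this step than click through Futon, the same databases can be created through CouchDB's HTTP API, where a PUT to /<database> creates that database. The following is only a sketch: COUCH_URL, COUCH_USER, COUCH_PASS, and DRY_RUN are placeholder names introduced here, and the real credentials have to come from the contentqa Ansible group_vars file. By default it only prints the commands it would run.

```shell
# Sketch: create the four databases with curl instead of Futon.
# All variable names and default values below are assumptions for illustration.
COUCH_URL="${COUCH_URL:-http://local.dp.la:5984}"
COUCH_USER="${COUCH_USER:-admin}"        # placeholder, not a real account
COUCH_PASS="${COUCH_PASS:-changeme}"     # placeholder, not a real password
DRY_RUN="${DRY_RUN:-1}"                  # 1 = print the commands, 0 = run them

create_couch_db() {
    # $1 is the name of the database to create
    if [ "$DRY_RUN" = "1" ]; then
        # Credentials are masked so they do not end up in terminal scrollback.
        echo "curl -X PUT -u USER:PASS $COUCH_URL/$1"
    else
        curl -sS -X PUT -u "$COUCH_USER:$COUCH_PASS" "$COUCH_URL/$1"
    fi
}

for db in dpla dashboard dpla_api_auth bulk_download; do
    create_couch_db "$db"
done
```

Run it with DRY_RUN=0 once the printed commands look right. CouchDB normally answers {"ok":true} for a new database and reports a file_exists error if the database is already there, so re-running it should be harmless.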

In the platform application:

  1. Run the following commands as the api user (sudo -u api -i) in /srv/www/api. You may need to run 'rbenv shell 1.9.3-p547' first.
    $ bundle exec rake v1:create_and_deploy_index
    $ bundle exec rake v1:recreate_repo_api_key_database  # Even though you created dpla_api_auth above; for adding a view.
    $ bundle exec rails generate delayed_job
    $ bundle exec rails generate delayed_job:active_record
    $ bundle exec rake db:migrate 

If you are using the contentqa engine, ensure that delayed_job is running. (Skip this paragraph if you don't know what contentqa is or don't need it yet.) Our configuration management automation installs an init script as /etc/init.d/delayed_job_api. Unfortunately, that script has no "status" command, so use ps aux | grep [d]elayed_job to find out whether it is running. If it is not, sudo service delayed_job_api start should start it.
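
Since the init script has no "status" command, the check and the start can be combined into a small helper. This is a minimal sketch built only from the two commands mentioned above. The function names delayed_job_running and ensure_delayed_job are made up for this example.

```shell
# Sketch: start delayed_job only when no delayed_job process is found.
# Both function names are hypothetical helpers, not part of the platform code.
delayed_job_running() {
    # Reads a process listing on stdin and succeeds if delayed_job appears in it.
    # The [d] stops grep from matching its own command line.
    grep -q '[d]elayed_job'
}

ensure_delayed_job() {
    if ps aux | delayed_job_running; then
        echo "delayed_job is already running"
    else
        echo "delayed_job is not running; starting it"
        sudo service delayed_job_api start
    fi
}
```

Calling ensure_delayed_job on the platform box performs the check. If you save this as a script, avoid putting "delayed_job" in the file name, because the script's own entry in the process listing would then count as a match.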

Install pyenv on the system where you will run ingestion. If you're using our VMs, this should be on your local system, not one of the VMs.

Install pyenv
# run as 'ingestion' user
git clone https://github.com/pyenv/pyenv.git ~/.pyenv

echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
echo 'eval "$(pyenv init -)"' >> ~/.bashrc

exec $SHELL

Install Python 2.7.6 by running pyenv install 2.7.6.  If you're on a server where the legacy ingestion system is the only Python application, make that the global default by typing pyenv global 2.7.6.

Install virtualenv by typing pip install virtualenv.
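
Before creating the virtualenv, it can be worth confirming that the python on your PATH really is the pyenv-installed 2.7.6 rather than a system interpreter. The check below is a sketch, and is_expected_python is a helper name invented for this example.

```shell
# Sketch: confirm that "python" resolves to the 2.7.6 interpreter.
# is_expected_python is a hypothetical helper, not part of pyenv.
is_expected_python() {
    # $1 is the text printed by "python --version", e.g. "Python 2.7.6"
    [ "$1" = "Python 2.7.6" ]
}

# Python 2 prints its version on stderr, hence the 2>&1.
if is_expected_python "$(python --version 2>&1)"; then
    echo "python is 2.7.6"
else
    echo "python is not 2.7.6; check 'pyenv version' and your PATH"
fi
```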

On a server that is dedicated to ingestion, we tend to use /v1/ingestion as the virtualenv and put the application in /v1/ingestion/ingestion via git clone. You'll need to create the /v1 directory as root, which means you should run:

sudo mkdir /v1
sudo chmod a+rwx /v1

Create a virtualenv environment where you will install the ingestion application. The example below shows where we put it on our dedicated ingestion server, but the location is really up to you if you're doing this locally. To configure and set up the virtualenv:

$ virtualenv /v1/ingestion
$ source /v1/ingestion/bin/activate

Then, cd into /v1/ingestion and clone the ingestion application with:

git clone https://github.com/dpla/ingestion.git

In the ingestion application:

  1. Install the necessary Python packages by running pip install -r requirements.txt in /v1/ingestion/ingestion.
  2. In ingestion, edit your akara.ini file, as suggested at https://github.com/dpla/ingestion. If you need to run the contentqa engine, set SyncQAViews=True.
    1. Please ensure that your [Twofishes] configuration is correct and uses the IP address (not the host name!) of the geo-prod box. This is a temporary patch until a long-term solution using either a hosts file or DNS is put in place for these "stand-alone" boxes, which are not truly standalone anymore since they depend on an external Twofishes server.
  3. Run python setup.py install
  4. Create the /v1/ingestion/ingestion/logs directory: mkdir logs
  5. Run sync_couch_views.py:

    $ python scripts/sync_couch_views.py dpla
    $ python scripts/sync_couch_views.py dashboard
    $ python scripts/sync_couch_views.py bulk_download
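
The three sync commands can also be run as one loop that stops at the first failure, so a broken sync is not hidden by the output of the later ones. This is a sketch that assumes the virtualenv is active and that you are in /v1/ingestion/ingestion. The function name sync_all_views is made up for this example.

```shell
# Sketch: run sync_couch_views.py once per database, stopping on the first error.
# sync_all_views is a hypothetical helper, not part of the ingestion code.
sync_all_views() {
    for db in "$@"; do
        echo "Syncing views for $db"
        python scripts/sync_couch_views.py "$db" || return 1
    done
}
```

With that defined, sync_all_views dpla dashboard bulk_download is equivalent to the three commands above.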