class: center, middle # Repeatable Workflows with
Docker and IPython Notebooks ## Andrew Odewahn @odewahn ### O'Reilly Media _February 7, 2015_ http://odewahn.github.io/repeatable-workflows-docker-ipython --- # We're not this guy... .center[
] --- # We're these guys... .center[
] --- # I'm this guy... * Founder of [Atlas](https://atlas.oreilly.com), O'Reilly Media's next generation authoring tool * Author of [DDS Field Guide](http://sites.oreilly.com/odewahn/dds-field-guide/) and [Docker Jumpstart](http://odewahn.github.io/docker-jumpstart/) * Currently exploring how to produce "next generation content" * I'm really excited about [IPython Notebooks](http://ipython.org/notebook.html) and the workflows needed to produce them .center[
] --- # Agenda ## Quick introduction to [IPython Notebooks](http://ipython.org/notebook.html) ## O'Reilly experiments with IPython Notebooks ## Docker as a critical tool for shareable, repeatable environments --- # IPython Notebooks ### "an open-source, interactive computing environment that lets you combine live code, narrative text, mathematics, plots and rich media in one document." * Founded by [Fernando Perez](http://fperez.org/) from U.C. Berkeley * 500K – 1.5M users * Used in scientific community to "narrate computation" * Applications in * Biology * Chemistry * Data Science * Economics * Geo-Spatial data * Machine Learning * Physics * Psychology * Social sciences * Statistics --- # Selections from the [Gallery](https://github.com/ipython/ipython/wiki/A-gallery-of-interesting-IPython-Notebooks#reproducible-academic-publications) ### [An Introduction to Applied Bioinformatics](https://github.com/odewahn/An-Introduction-To-Applied-Bioinformatics) ### [A Crash Course in Python for Scientists](http://nbviewer.ipython.org/gist/rpmuller/5920182) by Rick Muller ### [Warming Ocean Threatens Sea Life](http://www.scientificamerican.com/article.cfm?id=warming-ocean-threatens-sea-life) by Roberto de Almeida ### [St. Louis County Segregation Analysis](http://nbviewer.ipython.org/github/buzzfeednews/2014-08-st-louis-county-segregation/blob/master/notebooks/segregation-analysis.ipynb) by Jeremy Singer-Vine ### [Automatic segmentation of odor maps in the mouse olfactory bulb using regularized non-negative matrix factorization](http://www.sciencedirect.com/science/article/pii/S1053811914003103) by J. Soelter ### [Fractals, complex numbers, and your imagination](http://nbviewer.ipython.org/github/cfangmeier/ipython_notebooks/blob/master/Imagination.ipynb), by Caleb Fangmeier --- # Editing in Markdown and Latex
--- # "Kernels" for code and graphs
--- # Interactive Demo * The Notebook was featured in _Nature_'s [Sharing the code](http://www.nature.com/news/interactive-notebooks-sharing-the-code-1.16261) * The article included a live demo powered by [tmpnb](https://github.com/jupyter/tmpnb)
--- # O'Reilly and IPython Notebooks * Powerful tool for creating learning experiences * Looking to offer Notebook-based learning content (online and in person) * [Just Enough Math](https://github.com/odewahn/jem-docker) by [Paco Nathan](http://liber118.com/pxn/) at Strata (~200 attendees) .center[
] --- # Content originated in [Atlas](https://atlas.oreilly.com) .center[
] --- # JSON data format * Atlas source was [HTML5](https://github.com/oreillymedia/HTMLBook) * Converting to Notebook format (`ipynb`) was straightforward * Internally, Notebooks uses a JSON format composed of "cells" * Heading * Markdown * Code ```javascript ... { "cell_type": "markdown", "metadata": {}, "source": [ "### The Units of Probability Density\n", "\n", "This is an important thing to remember: the units of a ", "probability density are the inverse of the units of the", "quantity in question. Keeping this in mind will help you", "recognize all sorts of common mistakes in computations of probability." ] }, ... ``` --- # Collaborative Workflows using git and GitHub * GitHub provides easy way to share work with colleagues and collaborators, and is the preferred way to share Notebook content * Notebooks are plain JSON text, so you _can_ collaborate effectively using git / GitHub (more on this later) * Supports the classic [git flow](http://nvie.com/posts/a-successful-git-branching-model/) model * Clone the repository * Make a local branch * Commit locally * Push local commits to a new branch * Send a pull request --- # Sharing Notebook text is easy, but ensuring a common *computing environment* emerged as a key challenge * We started with [conda](http://conda.pydata.org/) * This broke down quickly * Different versions of Python (2 vs 3) * Different package managers (pip, brew, easy_install, etc) * Non-Python based packages (i.e., Fortran-based numerical computing libraries) * Missing backing services (databases, redis, etc) --- # Docker to the rescue .center[
] * Easy way to reuse and modify "base" images * Simple formats (Dockerfile and Fig files) for specifying project-specific dependencies * Ability to package everything (code, server, and dependencies) into a shareable image * Easy to take snapshots and audit across time --- # Pre-built base images * Many project, IPython Notebooks included, offer pre-built images with many of the dependencies already installed (and maintained!) * I started with [ipython/scipystack](https://registry.hub.docker.com/u/ipython/scipystack/) * cython, h5py, matplotlib, numpy, pandas, patsy, scikit-learn, scipy, seaborn, sympy, yt .center[
] --- # Dockerfile The Dockerfile allows you to specify a base image, add your dependencies and code, setup the environment, and other "traditional" sysadmin-style tasks. Here's an example. The highlighted lines select the base image: ```bash *FROM ipython/scipystack MAINTAINER Andrew Odewahn "http://odewahn.com" RUN apt-get update RUN pip install Flask WORKDIR /usr/src CMD ipython notebook --ip=0.0.0.0 --no-browser ``` --- # Dockerfile -- Adding Dependencies The Dockerfile allows you to specify a base image, add your dependencies and code, setup the environment, and other "traditional" sysadmin-style tasks. Here's an example. The highlighted lines select add some dependencies: ```bash FROM ipython/scipystack MAINTAINER Andrew Odewahn "http://odewahn.com" *RUN apt-get update *RUN pip install Flask WORKDIR /usr/src CMD ipython notebook --ip=0.0.0.0 --no-browser ``` --- # fig.yml [Fig](http://www.fig.sh/) provides a simple way to build and start Docker containers should start. The highlighted lines map the current directory on the host to `/usr/src` in the container: ```yml web: build: . * volumes: * - .:/usr/src:rw ports: - "8888:8888" ``` --- # Simple to package and share ```bash $ docker build -t odewahn/jem-docker . $ docker push odewahn/jem-docker ``` .center[
] --- # Audit trail `docker history` shows the build process for a specific image: ```bash $ docker history
IMAGE CREATED CREATED BY SIZE 8f7d431f32fa 2 weeks ago /bin/sh -c #(nop) CMD [/bin/sh -c ipython2 no 0 B 88dfeda79e2c 2 weeks ago /bin/sh -c #(nop) WORKDIR /usr/src/notebook 0 B de650a94a6e0 2 weeks ago /bin/sh -c #(nop) EXPOSE map[8888/tcp:{}] 0 B 56f049f53134 2 weeks ago /bin/sh -c apt-get install -y libsm6 libxrend 34.12 kB ... ``` `docker diff` shows changes made to the container as it runs: ```bash $ docker diff
C /root C /root/.cache C /root/.cache/matplotlib ... ``` --- # JEM Demo Local demo of [Just Enough Math](https://github.com/odewahn/jem-docker). *NB*: This is a fork of [Paco's main project](https://github.com/ceteri/jem-docker) ### Requirements Currently, this project only runs on Linux or on a Mac (using boot2docker). Windows does not currently support local volumes, although that is an active area of development. * Docker 1.3 * boot2docker 1.3 * fig 1.0.0 ### Instructions Follow along the
demo script
--- # Challenges ## Keeping up with changes in Docker ## Issues around boot2docker and local volumes ## Merge issues with JSON in IPYNB ## Still very much a "hacker/developer" experience --- # References * [IPython Notebook](http://ipython.org/). Main home for IPython and the IPython Notebook. * [Notebook Gallery](https://github.com/ipython/ipython/wiki/A-gallery-of-interesting-IPython-Notebooks#reproducible-academic-publications). An inspiring collection of notebooks from a variety of fields. * [Docker](https://www.docker.com/). Main homepage for Docker. * [boot2docker](http://boot2docker.io/). An additional tool for running Docker on a Mac or Windows. * [Fig](http://www.fig.sh/install.html). A tool for managing Docker containers. * [Docker Jumpstart](http://odewahn.github.io/docker-jumpstart). A free guide to Docker I put together based on my personal notes. * [http://odewahn.github.io/repeatable-workflows-docker-ipython/](http://odewahn.github.io/repeatable-workflows-docker-ipython/). This presentation. Andrew Odewahn
@odewahn
http://www.odewahn.com