An image is a pre-configured software environment for SciServer Compute, which comes pre-installed with important data analysis packages.
When you create a container in SciServer Compute, you have the option of selecting an Image to use with that container. Some images are designed to support specific programming languages, while others are designed to support research within a specific science domain.
The lists below describe each of the images that you can select when you create a new container in SciServer Compute. You may only select one image per container, and you may only create up to three containers.
All SciServer Images
All SciServer images – except Matlab – come pre-installed with the SciServer modules/packages/libraries, which allow SciServer Compute to communicate with all other SciServer components (e.g. CasJobs, SkyQuery, etc.). Although these SciServer communication packages come pre-installed with each image, you still must load them within your scripts. You can do this with the import command in Python or the library command in R. For further information on what the SciServer modules/packages/libraries contain and how they work, see SciServer API Documentation.
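As a minimal sketch, loading one of the pre-installed SciServer modules in a Python notebook looks like the following. The try/except fallback is only there so the snippet also runs outside a SciServer container, where the package is absent:

```python
# Sketch: importing a pre-installed SciServer module inside a Compute notebook.
# On a SciServer image the import succeeds directly; outside one it does not,
# so we fall back gracefully instead of crashing.
try:
    from SciServer import CasJobs  # pre-installed on all non-Matlab images
    have_sciserver = True
except ImportError:
    have_sciserver = False  # running outside a SciServer container

print("SciServer modules available:", have_sciserver)
```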
All images are based on Scientific Linux 7’s official Docker images. All contain the following packages:
- The CentOS “Development Tools” package group, which provides access to GCC’s C and C++ compilers, gfortran, Autotools, ctags, flex/bison, make, git, subversion, and other useful tools
- Several X11-related libraries, including cmake and wget
- The time zone database (the so-called “Olson Database”) is explicitly re-installed so that programs making use of it will continue to work; this is the case, for example, for packages that use the OlsonNames function in R
To see the full list of packages in the CentOS Development Tools package group, open a terminal and run the following command:
curl http://mirror.centos.org/centos/7/os/x86_64/repodata/repomd.xml 2>/dev/null | sed -r 's#xmlns[^=]*?="[^"]*"##g' | xmllint --xpath "repomd/data[@type='group']/location/@href" - | sed -e 's#href="#http://mirror.centos.org/centos/7/os/x86_64/#' -e 's/"//' | xargs -n1 wget -O- -q | xmllint --xpath "comps/group/id[.='development']/parent::group/packagelist/packagereq[@type!='optional']" - | sed -e 's#</packagereq>#\n#g' -e 's#<[^>]*>##g'
If none of these images contains the packages that you need for your work, you can always use pip or conda to install new packages. To install a new package, create a new notebook (or open an existing notebook) and type the following in its own Code cell at the top of the notebook:
!pip install [package]
!conda install [package]
replacing [package] (and the surrounding brackets) with the name of the package you want to install. Don’t forget to include the exclamation point at the beginning of the line.
Python + R
The Python + R image is a good default image for working with SciServer using those scripting languages. When you create a new notebook, use the dropdown menu to specify whether that notebook will use Python 2, Python 3, or R. You can also upload existing scripts as .ipynb, .py or .r files.
Python scripts can be written in either Python 2 or Python 3; if you are new to Python, we recommend Python 3, as some features of Python 2 will no longer be supported in the future.
The list below provides full details about the Python + R image.
- The Python + R image, like all SciServer images, can be accessed through various user interfaces, but the underlying image is the same. The default is Classical Jupyter, which is ideal for most research and education use cases. The JupyterLab user interface has some advanced features that may be useful, while the RStudio user interface is optimized for working with R scripts.
- There is a known issue in JupyterLab: it remembers which files were open and tries to restore them, even across different containers.
- The image can use any of the following versions:
- The Anaconda Python 2 distribution of Python 2.7
- The Anaconda Python 3 distribution of Python 3.5
- The Anaconda R Essentials distribution of R 3.4
- Python 2, Python 3, and R versions of our SciScript libraries are installed
- The image has redis and libdynd installed for both R and Python (2 and 3)
- The image comes installed with both Python 2 and Python 3 versions of the following packages:
- In addition to the Anaconda R Essentials distribution, the image also comes installed with the bit64 and jpeg R packages.
- The RStudio image uses RStudio 1.1.453 as the interface.
- For Python 2 users: Python 2 and its packages are installed in a conda environment called “py27.” To use this version of Python in scripts within a SciServer Compute terminal, you must first run the command source activate py27. This is only necessary for running scripts in a terminal; when you create a new Python 2 notebook from the dropdown menu, this happens automatically.
Matlab
The Matlab R2016a image is the only one in which you can write notebooks in Matlab. This image is not based on the Python + R image described above.
BeakerX
The BeakerX image provides access to the BeakerX package on top of the Python + R image. Its primary advantage is that it allows notebooks to use JVM languages, such as Java, Kotlin, Scala, Clojure, and Groovy.
Julia
The Julia image, built on top of the Python + R image, allows you to write notebooks in Julia. It comes with the PyCall package installed, and is available with either the Classical Jupyter or JupyterLab interface.
While there is no native SciServer library for Julia, the Python version of SciScript is available through PyCall. For example, the following imports the Jobs module, which provides access to the list of Compute Jobs:
@pyimport SciServer.Jobs as Jobs
We would like to hear from users of this image, and are open to feedback concerning what packages would be useful.
LSST Science Pipeline (Astronomy)
The LSST Science Pipeline image is designed to address the use cases of the upcoming Large Synoptic Survey Telescope (LSST). The LSST is an 8.4-meter telescope, now under construction in Chile, that will conduct the largest-ever survey of the night sky. The LSST will not obtain first light until 2019, but its science team is now developing the data processing and analysis pipelines to support its ambitious mission. This Compute Image is optimized to support that design work.
- This image is not based on the Python + R image
- Red Hat’s devtoolset-7 software collection is installed, providing much newer development tools (e.g., GCC 7)
- Version 15.0 of the LSST pipeline is installed – specifically, the lsst_distrib package, which contains almost all of the packages required to run the LSST pipeline
- The startup script for Classical Jupyter ensures that the LSST packages are set up, so relevant commands like eups should work immediately upon starting this image.
Recount (Genomics)
The Recount image is associated with Recount, a genomics project that has created a searchable online database of RNA gene sequences from more than 2,000 published studies. The image is designed for use with the Recount public data volume, which can be mounted onto a new container at the same time the image is selected.
The Recount image is based on the Python + R image, and comes installed with the R-based Bioconductor genomics analysis package, version 3.6.
In addition to the packages already installed in the base Python + R image, the Recount image comes with the following packages:
- knitcitations from CRAN
- GenomeInfoDbData from bioconductor
- regionReport from bioconductor
The bioconda channel has been added as the preferred repository for packages in this image.
JH Turbulence DB (Fluid dynamics)
The JH Turbulence DB image on SciServer provides direct access to datasets archived and maintained in the Johns Hopkins Turbulence Databases (JHTDB, http://turbulence.pha.jhu.edu/). The system contains space-time data of turbulent flows from the output of world-class high-resolution direct numerical Navier-Stokes simulations. The data are publicly available to the research community. The package pyJHTDB (https://github.com/idies/pyJHTDB) provides a Python interface for querying, downloading, and analyzing the data. The built-in functions include evaluating simulation fields and computing spatial differentiation, interpolation, filtering, and particle tracking directly on the data clusters. By combining this open simulation laboratory for turbulence research with the Python notebook capabilities of SciServer, we hope that broader access to simulation data will further accelerate turbulence research in the coming years.
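A minimal sketch of a point query with pyJHTDB follows. The class and method names (libJHTDB, initialize, getData, getFunction) follow the pyJHTDB README but should be checked against the package documentation; the query is wrapped in a broad try/except so the sketch degrades gracefully outside a SciServer container or without network access:

```python
# Hypothetical sketch of a single-point velocity query against JHTDB via
# pyJHTDB. Names and arguments are assumptions from the pyJHTDB README --
# consult the package docs before relying on them.
try:
    import numpy as np
    from pyJHTDB import libJHTDB

    lJHTDB = libJHTDB()
    lJHTDB.initialize()
    # Velocity at one point of the coarse isotropic turbulence dataset.
    points = np.array([[0.1, 0.2, 0.3]], dtype=np.float32)
    u = lJHTDB.getData(0.0, points, data_set='isotropic1024coarse',
                       getFunction='getVelocity')
    lJHTDB.finalize()
except Exception:
    u = None  # pyJHTDB (or the JHTDB service) is unavailable here

print("velocity sample:", u)
```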
In addition to the packages from the Python + R image above, the following packages are installed for both the Anaconda Python 2 and Python 3 environments:
Geo (Earth/social sciences)
The Geo image comes with packages to create maps and conduct geospatial analyses with Geographic Information Systems (GIS). This image is ideal for research in fields such as earth science and social science, where important quantities vary with geography.
The Geo image is built from the Python + R image, with the following additional packages preinstalled:
Oceanography
The Oceanography image is designed to work with the Johns Hopkins Ocean Circulation Models described on the Datasets page.
The analysis of these large datasets is often restricted by limited computational resources. To address this issue, a team led by Mattia Almansi has developed OceanSpy, a Python package that facilitates extracting information from model output fields. SciServer users can use the modules included in this image to run analyses online and store post-processing files within SciServer Compute.
Here is a list of packages available in the oceanography image (on top of miniconda):
OceanSpy: Extraction of oceanographic properties
Extracting information from the model output can be done along ‘surveys’, ‘mooring arrays’, or at point locations. The ‘survey’ resembles a hydrographic ship survey, with equidistant ‘stations’ (vertical profiles) along a great-circle path between two points on the Earth. The data (e.g. temperature and salinity) on the vertical profiles are interpolated from the regular grid onto the station locations. The ‘mooring array’ creates a zig-zag path along the model grid through user-defined mooring locations in the ocean. It enables exact calculation of the transport through an arbitrary curve in lat-lon space. Last, oceanographic properties can be extracted in random locations in the model 4D space, facilitating comparison with floats in the ocean.
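The equidistant “stations” along a great-circle path can be sketched in plain Python using spherical linear interpolation on the unit sphere. This is an illustration of the geometry only; OceanSpy’s own implementation may differ:

```python
# Sketch of the 'survey' idea: n equidistant stations along the great-circle
# path between two (lat, lon) points, via slerp on the unit sphere.
import math

def to_xyz(lat, lon):
    la, lo = math.radians(lat), math.radians(lon)
    return (math.cos(la) * math.cos(lo), math.cos(la) * math.sin(lo), math.sin(la))

def to_latlon(x, y, z):
    return (math.degrees(math.asin(z)), math.degrees(math.atan2(y, x)))

def survey_stations(p0, p1, n):
    """Return n equidistant stations from p0 to p1 (inclusive), in degrees."""
    a, b = to_xyz(*p0), to_xyz(*p1)
    dot = max(-1.0, min(1.0, sum(u * v for u, v in zip(a, b))))
    omega = math.acos(dot)  # angular separation between the endpoints
    stations = []
    for i in range(n):
        t = i / (n - 1)
        s0 = math.sin((1 - t) * omega) / math.sin(omega)
        s1 = math.sin(t * omega) / math.sin(omega)
        stations.append(to_latlon(*(s0 * u + s1 * v for u, v in zip(a, b))))
    return stations

# Five stations between two points in the North Atlantic (toy coordinates).
stations = survey_stations((60.0, -50.0), (65.0, -30.0), 5)
```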
OceanSpy: Computation of useful diagnostics
Apart from extracting readily available information, OceanSpy can be used to calculate new diagnostics that are not part of the model output. For example, OceanSpy can calculate the velocity component orthogonal to a ‘survey’, it can calculate the Brunt–Väisälä frequency, the Ertel Potential Vorticity, the eddy kinetic energy, the horizontal divergence, volume fluxes, etc. For regions in the model domain where all the required model diagnostics are available, it can also calculate all heat and salt budget terms to machine precision.
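One of the diagnostics named above, the Brunt–Väisälä frequency, can be illustrated with a centered finite difference on a toy density profile, using N² = −(g/ρ₀)·dρ/dz. This sketches the formula only and is not OceanSpy’s actual code:

```python
# Brunt-Vaisala frequency squared, N^2 = -(g / rho0) * d(rho)/dz, approximated
# with a centered finite difference on a toy stable density profile.
G = 9.81       # gravitational acceleration (m/s^2)
RHO0 = 1025.0  # reference seawater density (kg/m^3)

# Toy profile: depth z (m, negative downward) and potential density (kg/m^3).
z   = [0.0, -10.0, -20.0, -30.0, -40.0]
rho = [1024.0, 1024.5, 1025.2, 1026.0, 1026.9]

def brunt_vaisala_sq(z, rho):
    """Centered-difference N^2 at the interior points of the profile."""
    n2 = []
    for k in range(1, len(z) - 1):
        drho_dz = (rho[k + 1] - rho[k - 1]) / (z[k + 1] - z[k - 1])
        n2.append(-G / RHO0 * drho_dz)
    return n2

n2 = brunt_vaisala_sq(z, rho)  # positive values => statically stable column
```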