List of exercises
Full list
This is a list of all exercises and solutions in this lesson, mainly as a reference for helpers and instructors. It is generated automatically from the other pages of the lesson. Any single teaching event will probably cover only a subset of these, depending on the interests of the audience.
Recording dependencies
Dependencies-1: Time-capsule of dependencies
Situation: Five researchers (A, B, C, D, E) wrote code that depends on a couple of libraries. They uploaded their projects to GitHub. We now travel three years into the future and find their GitHub repositories through the respective publications. We would like to re-run their code before adapting it. Which of the following projects do you think you will get working?
Variant for Conda:
A: You find a couple of library imports across the code, but that’s it.
B: The README file lists which libraries were used.
C:
You find an environment.yml file with:
name: student-project
channels:
- conda-forge
dependencies:
- scipy
- numpy
- sympy
- click
- python
- pytorch
- pip
- pip:
  - git+https://github.com/someuser/someproject.git@master
  - git+https://github.com/anotheruser/anotherproject.git@master
D:
You find an environment.yml file with:
name: student-project
channels:
- conda-forge
dependencies:
- scipy=1.3.1
- numpy=1.16.4
- sympy=1.4
- click=7.0
- python=3.8
- pytorch=1.10
- pip
- pip:
  - git+https://github.com/someuser/someproject.git@d7b2c7e
  - git+https://github.com/anotheruser/anotherproject.git@sometag
E:
You find an environment.yml file with:
name: student-project
channels:
- conda-forge
dependencies:
- scipy=1.3.1
- numpy=1.16.4
- sympy=1.4
- click=7.0
- python=3.8
- pytorch=1.10
- someproject=1.2.3
- anotherproject=2.3.4
Variant for Python (pip):
A: You find a couple of library imports across the code, but that’s it.
B: The README file lists which libraries were used.
C:
You find a requirements.txt file with:
scipy
numpy
sympy
click
# python (pip cannot install Python itself)
git+https://github.com/someuser/someproject.git@master
git+https://github.com/anotheruser/anotherproject.git@master
D:
You find a requirements.txt file with:
scipy==1.3.1
numpy==1.16.4
sympy==1.4
click==7.0
# python==3.8 (pip cannot install Python itself)
git+https://github.com/someuser/someproject.git@d7b2c7e
git+https://github.com/anotheruser/anotherproject.git@sometag
E:
You find a requirements.txt file with:
scipy==1.3.1
numpy==1.16.4
sympy==1.4
click==7.0
# python==3.8 (pip cannot install Python itself)
someproject==1.2.3
anotherproject==2.3.4
Variant for R:
A: You find a couple of library() or require() calls across the code, but that’s it.
B: The README file lists which libraries were used.
C: You find a DESCRIPTION file which contains:
Imports:
    dplyr,
    tidyr
In addition, you find these calls:
remotes::install_github("someuser/someproject@master")
remotes::install_github("anotheruser/anotherproject@master")
D: You find a DESCRIPTION file which contains:
Imports:
    dplyr (== 1.0.0),
    tidyr (== 1.1.0)
In addition, you find these calls:
remotes::install_github("someuser/someproject@d7b2c7e")
remotes::install_github("anotheruser/anotherproject@sometag")
E: You find a DESCRIPTION file which contains:
Imports:
    dplyr (== 1.0.0),
    tidyr (== 1.1.0),
    someproject (== 1.2.3),
    anotherproject (== 2.3.4)
Solution
A: Collecting the dependencies one by one will be tedious, and even after that tedious process you will not know which versions were used.
B: Without a standard file to look for, it can be very difficult to recreate the software environment needed to run the code. At least we know which libraries were used, but not their versions.
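C: Having a standard file that lists the dependencies is definitely better than nothing. However, if the versions are not specified, installing from that file today pulls in the latest versions, and you or someone else might run into problems with dependencies, deprecated features, changes in package APIs, etc. The GitHub dependencies that point to master have the same problem, since that branch has probably moved on since publication.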
D and E: In both cases, exact versions of all dependencies are specified, so one can recreate the software environment required for the project. One problem with dependencies that come from GitHub is that they might have disappeared (what if their authors deleted these repositories?).
E is slightly preferable because version numbers are easier to understand than Git commit hashes or Git tags. However, it is most often out of scope for research projects, since publishing these packages to the respective package repositories (for example conda-forge, PyPI, or CRAN) requires significant overhead.
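The reasoning in this solution can be sketched in code. The Python snippet below is a hypothetical helper (`pin_level` and its category names are our own, not part of pip or of this lesson) that sorts requirements.txt lines into the categories discussed above. It assumes that `master` and `main` are moving branches and that a ref of 7 to 40 lowercase hex characters is a commit hash; refs containing a slash are not handled.

```python
import re

# Hypothetical helper, not part of pip or of the lesson: classify one
# requirements.txt line by how reproducibly it pins its dependency.

# Assumption: 7 to 40 lowercase hex characters look like a (possibly
# abbreviated) Git commit hash. A branch or tag named like hex would be
# misclassified; this is only a heuristic.
COMMIT_HASH = re.compile(r"^[0-9a-f]{7,40}$")
MOVING_BRANCHES = {"master", "main"}  # assumption: these branches keep moving


def strip_comment(line: str) -> str:
    """Remove a pip-style comment ('#' at line start or after whitespace)."""
    # URL fragments such as '#egg=...' survive because no whitespace precedes them.
    return re.sub(r"(^|\s)#.*$", "", line).strip()


def pin_level(line: str) -> str:
    req = strip_comment(line)
    if not req:
        return "empty"
    if req.startswith("git+"):
        # The ref follows '@' in the last path segment, e.g. 'proj.git@d7b2c7e'.
        last_segment = req.rsplit("/", 1)[-1]
        if "@" not in last_segment:
            return "unpinned"  # default branch, whatever it points to today
        ref = last_segment.split("@", 1)[1].split("#", 1)[0]
        if ref in MOVING_BRANCHES:
            return "branch"  # changes whenever new commits are pushed
        if COMMIT_HASH.match(ref):
            return "commit"  # fixed, as long as the repository still exists
        return "named-ref"  # tag or branch name; the line alone cannot tell which
    if "==" in req:
        return "pinned"  # exact version from a package index
    if any(op in req for op in ("<", ">", "~=", "!=")):
        return "range"  # constrained, but not a single version
    return "unpinned"


if __name__ == "__main__":
    example = [
        "scipy",
        "numpy==1.16.4",
        "git+https://github.com/someuser/someproject.git@master",
        "git+https://github.com/someuser/someproject.git@d7b2c7e",
        "git+https://github.com/anotheruser/anotherproject.git@sometag",
    ]
    for line in example:
        print(f"{pin_level(line):<10} {line}")
```

A `commit` result only guarantees reproducibility while the repository still exists, which is the caveat raised for D and E.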
Recording environments
(optional) Containers-3: Explore two really useful Docker images
You can try the following if you have Docker installed. If you have Singularity/Apptainer but not Docker, the goal of the exercise can instead be to run the Docker containers through Singularity/Apptainer.
Run RStudio (without a tag, Docker uses the latest image):
$ docker run --rm -p 8787:8787 -e PASSWORD=yourpasswordhere rocker/rstudio
Then open http://localhost:8787 in your browser and log in with the username rstudio and the password “yourpasswordhere” set in the previous command.
To try an older version, check the available tags at https://hub.docker.com/r/rocker/rstudio/tags and run, for example:
$ docker run --rm -p 8787:8787 -e PASSWORD=yourpasswordhere rocker/rstudio:3.3
Run Anaconda3 from https://hub.docker.com/r/continuumio/anaconda3 (as with RStudio, append a tag listed on that page to pin a specific version):
$ docker run -i -t continuumio/anaconda3 /bin/bash