Use Microsoft's VSCode editor (`code`), Docker containers, and other open-source tools for scientific Python software collaboration, development, and use on Linux and Windows. Securely connect offices, remote workers, storage resources, compute resources, and the cloud with Tailscale as a replacement for a traditional VPN.
System Packages
VSCode Editor
Download Visual Studio Code from Microsoft. On a Debian-based GNU/Linux distribution such as Ubuntu or Pop!_OS, the .deb package can be installed with `sudo dpkg -i code_$version_amd64.deb`.
Tailscale
Set up Tailscale for the machines in your office, lab, or home to create your own secure set of direct, personal connections between computers. You do need to sign up, but it is free for personal use.
https://tailscale.com/kb/start/
> What is Tailscale?
>
> Tailscale is a VPN service that makes the devices and applications you own accessible anywhere in the world, securely and effortlessly. It enables encrypted point-to-point connections using the open source WireGuard protocol, which means only devices on your private network can communicate with each other.
>
> The Benefits
>
> Building on top of a secure network fabric, Tailscale offers speed, stability, and simplicity over traditional VPNs.
>
> Tailscale is fast and reliable. Unlike traditional VPNs, which tunnel all network traffic through a central gateway server, Tailscale creates a peer-to-peer mesh network (called a tailnet)…
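On Linux, a node can typically be added to your tailnet with Tailscale's install script and a single login command. A sketch, run as a user with sudo rights (see the quickstart linked above for other platforms):

```bash
# Install Tailscale via the official convenience script
curl -fsSL https://tailscale.com/install.sh | sh

# Authenticate this machine and join your tailnet (prints a login URL)
sudo tailscale up

# Show this node's tailnet IPv4 address and the status of your peers
tailscale ip -4
tailscale status
```

After `tailscale up` completes, every machine on the tailnet can reach this one directly by its Tailscale IP or MagicDNS name.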
Docker on GNU/Linux
The following commands combine the instructions found in the Docker Community Edition (docker-ce) Debian/Ubuntu installation and post-installation setup documentation.
```bash
# Update the apt package index and install packages to allow apt to use a repository over HTTPS:
sudo apt-get update
sudo apt-get install \
    ca-certificates \
    curl \
    gnupg \
    lsb-release

# Add Docker's official GPG key:
sudo mkdir -m 0755 -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg

# Use the following command to set up the repository:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

# Update the apt package index:
sudo apt-get update

# Install the latest Docker Engine and related tooling:
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# Verify that the Docker Engine installation is successful by running the hello-world image:
sudo docker run hello-world

# The docker group grants root-level privileges to the user. For details on how this
# impacts security in your system, see "Docker Daemon Attack Surface".
# To run Docker without root privileges, see "Run the Docker daemon as a non-root user (Rootless mode)".
# May not be necessary depending on your OS:
#sudo groupadd docker
sudo usermod -aG docker $USER
sudo systemctl enable docker.service
sudo systemctl enable containerd.service
sudo systemctl start docker.service
sudo systemctl start containerd.service
```
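The new docker group membership only takes effect in fresh login sessions. A sketch of picking it up without logging out, using the standard `newgrp` utility:

```bash
# Start a subshell with the docker group active (or simply log out and back in)
newgrp docker

# The daemon socket should now be accessible without sudo
docker run --rm hello-world
docker ps
```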
Docker Desktop and WSL2 (Windows only)
Docker Desktop is the Windows version of Docker, available for download from Docker. The easiest way to install WSL on Windows is now through the Microsoft Store.
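On recent Windows builds, WSL2 can also be installed from an elevated PowerShell prompt. A sketch, run as Administrator:

```powershell
wsl --install                 # installs WSL2 and a default Ubuntu distribution
wsl --set-default-version 2   # make WSL2 the default for new distributions
wsl -l -v                     # list installed distributions and their WSL versions
```

Docker Desktop will detect the WSL2 backend during installation and offer to use it.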
Install Python and Libraries in Containers
I recommend starting with an Anaconda Python distribution, and then installing or updating anything extra with apt for system packages and pip for Python libraries.
Generic Python modules recommended for science-focused work: black, pandas, numpy, scipy, xarray, matplotlib, bokeh, panel, plotly, jupyterlab, nvitop, dash, jupyter-widgets, zarr, datashader, fastapi, unyt, dask/distributed, diskcache, and polars.
Essential metocean and geospatial modules: proj, geos, curl, wavespectra, eccodes, cartopy, rasterio, netcdf4, xarray, cfgrib, cmocean, pyephem, metpy, wrf-python, geojson, xpublish, and shapely.
Cartopy lazily downloads and then locally caches some of the geographic datasets it uses, which is problematic in ephemeral containers. I recommend adding the local dataset cache into the image during the build (below), or forcing a manual download during the build process itself.
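A sketch of forcing such a download at build time, using Cartopy's shapereader API to fetch a Natural Earth dataset into the cache (the resolution, category, and name values here are just examples):

```bash
# Pre-populate the Cartopy data cache inside the image so runtime
# containers never need network access for coastline data
python -c "import cartopy.io.shapereader as shpreader; \
shpreader.natural_earth(resolution='110m', category='physical', name='coastline')"
```

Running this in a `RUN` step bakes the downloaded shapefiles into the image layer.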
Here is a starter Dockerfile template using build stages.
```dockerfile
# syntax=docker/dockerfile:1
FROM continuumio/anaconda3 AS base
RUN apt-get update && apt-get install -y \
    apt-transport-https \
    ca-certificates \
    curl \
    zip \
    gnupg2 \
    software-properties-common && \
    curl -fsSL https://download.docker.com/linux/debian/gpg | apt-key add - && \
    add-apt-repository \
    "deb [arch=amd64] https://download.docker.com/linux/debian \
    $(lsb_release -cs) \
    stable" && \
    apt-get -y update && apt-get install -y docker-ce && \
    apt-get autoclean
RUN cat /etc/apt/sources.list
RUN apt-get install -y gfortran \
    build-essential \
    cmake \
    nodejs \
    npm \
    memcached \
    libsqlite3-0 \
    libsqlite3-dev \
    libtiff-dev \
    libcurl4 \
    && apt-get autoclean
ENV NetCDF_C_INCLUDE_DIR /usr/local/include
ENV NetCDF_C_LIBRARY /usr/local/lib
ENV CURL_LIBRARY /opt/conda
ENV CURL_INCLUDE_DIR /opt/conda
RUN conda install libcurl

FROM base AS eccodes
WORKDIR /build
RUN wget https://confluence.ecmwf.int/download/attachments/45757960/eccodes-2.23.0-Source.tar.gz
RUN bash -c 'tar -xzf eccodes-2.23.0-Source.tar.gz && \
    cd eccodes-2.23.0-Source && \
    mkdir build && \
    cd build && \
    cmake ../ && \
    make -j8 && \
    # ctest && \
    make install && \
    ldconfig && cd /build && rm -r eccodes-2.23.0-*'

FROM base AS proj
WORKDIR /build
RUN wget https://download.osgeo.org/proj/proj-9.0.1.tar.gz
RUN tar xvzf proj-9.0.1.tar.gz && \
    cd proj-9.0.1 && \
    mkdir build && cd build && \
    cmake -DCURL_LIBRARY="$CURL_LIBRARY" -DCURL_INCLUDE_DIR="$CURL_INCLUDE_DIR" \
    -DBUILD_TESTING=OFF -DENABLE_CURL=OFF -DBUILD_PROJSYNC=OFF .. && \
    cmake --build . --target install && \
    cd ../../ && rm -r proj-9.0.1*

FROM base AS geos
WORKDIR /build
RUN wget https://download.osgeo.org/geos/geos-3.11.0.tar.bz2
RUN tar xvfj geos-3.11.0.tar.bz2 && \
    cd geos-3.11.0 && \
    mkdir build && cd build && \
    cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr/local .. && \
    make -j 4 && make install && \
    cd ../../ && rm -r geos-3.11.0*

# Install Python modules on the base system stage, including the manually compiled eccodes, proj, and geos
FROM base AS pypkgs
COPY --from=eccodes /usr/local /usr/local
COPY --from=proj /usr/local /usr/local
COPY --from=geos /usr/local /usr/local
RUN ldconfig
# Install most of your Python packages
RUN pip install timezonefinder wavespectra black fastapi rasterio fiona pygc uvicorn wrf-python metpy cmocean plotly dash s3fs xarray unyt pyephem zarr h5netcdf datashader adlfs intake xpublish geojson pre-commit netcdf4 OWSLib eccodes recommonmark dask==2022.5.2 distributed==2022.5.2 diskcache panel hvplot holoviews ipyleaflet folium pydeck pymemcache polars
# Configure the container's compiled geos to override anything installed in the conda environment
RUN ln -s /usr/local/lib/libgeos_c.so /opt/conda/lib/libgeos_c.so
ENV GEOS_CONFIG /usr/local/bin/geos-config
# Quote the version specifier so the shell does not treat ">=" as a redirection
RUN pip uninstall -y cartopy shapely && pip install shapely "cartopy>=0.20.3" --no-binary shapely --no-binary cartopy
# Install selenium and a firefox backend to use with it
RUN pip install selenium
RUN apt install -y firefox-esr
# Reconcile optional pandas dependency
RUN pip install --upgrade "pandas>=1.5.0" "jinja2==3.0.0"
RUN conda list
ENTRYPOINT /opt/conda/bin/python

# Copy the cartopy cache directory into its own build stage
FROM base AS cartopydata
COPY cartopy_data /root/.local/share/cartopy
RUN ls /root/.local/share/cartopy/

# Install other system utilities in other build stages if necessary
FROM base AS bats
RUN apt install -y bats

# Combine the separate stages into a finalized production environment
FROM pypkgs AS production
COPY --from=cartopydata /root/.local/share/cartopy /root/.local/share/cartopy
COPY --from=bats /usr/bin/bats /usr/bin/bats
```
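A sketch of building and running the final stage (the image name and tag are arbitrary examples):

```bash
# Build only the production stage and tag it
docker build --target production -t oceanwx/anaconda3:latest .

# Drop into the containerized Python interpreter defined by the ENTRYPOINT
docker run --rm -it oceanwx/anaconda3:latest
```

Targeting a single stage with `--target` also lets you build and debug the intermediate eccodes, proj, or geos stages in isolation.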
Jupyter Notebook and Lab
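JupyterLab is installed in the image above. A sketch of serving it from the container, overriding the Python ENTRYPOINT (the port mapping and image tag are examples):

```bash
# Serve JupyterLab from the container on port 8888, with the current
# directory mounted as the working directory
docker run --rm -it -p 8888:8888 -v "$PWD":/work -w /work \
    --entrypoint /opt/conda/bin/jupyter \
    oceanwx/anaconda3:latest lab --ip=0.0.0.0 --port=8888 --allow-root
```

Open the tokenized URL printed in the container log in your browser, or let the VSCode Jupyter extension connect to the running server.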
VSCode Extensions
These extensions allow everyone in your organization to easily get on the same page.
- Remote Development Extension Pack by Microsoft (Includes WSL connections, Dev Containers, and Remote-SSH connections)
- Docker Extension by Microsoft
- Dev Containers Extension by Microsoft
- Python IntelliSense Extension by Microsoft
- Pylance Extension
- Jupyter Notebook Support by Microsoft
- YAML Language Support by Red Hat
- Ansible by Red Hat
- Diff by Fabio Spampinato
- Image Gallery by GeriYoco
- isort by Microsoft
- Bats Extension by J-Et. Martin
- Bash Beautify by Ahmed Hamdy
- Markdown Lint by David Anson
- Instant Markdown by David Bankier
- Makefile Tools by Microsoft
- ShellCheck by Timon Wong
VSCodium is a fully open-source alternative to VSCode, distributed from the same codebase but without Microsoft's closed-source components and telemetry. Most of the extensions above should be available for installation in VSCodium, except potentially a few of the most useful ones, like Microsoft's Remote Development Extension Pack and Pylance/Python IntelliSense.
.devcontainer.json
The Dev Containers extension uses this file to configure the container runtime used for development in VSCode. Place the file in any folder that will use your Python container for development or testing; the editor will recognize it and let you work with the directory's files inside the containerized environment.
```jsonc
{
    "image": "192.168.100.80:5005/oceanwx/anaconda3:20230310",
    "mounts": [
        "source=/mnt,target=/mnt,type=bind,consistency=cached",
        "source=/home,target=/home,type=bind,consistency=cached",
        "source=/media,target=/media,type=bind,consistency=cached",
    ],
    "containerEnv": {
        "MPLBACKEND": "AGG",
        "PYTHONPATH": "/mnt/O/Python/Sub",
        "localuser": "${localEnv:USER}",
    },
    "customizations": {
        "vscode": {
            "extensions": [
                "ms-python.python",
                "jetmartin.bats",
                "ms-python.isort",
                // "ms-toolsai.jupyter",
                // "ms-toolsai.jupyter-renderers",
                "redhat.vscode-yaml"
            ],
            "settings": {
                "terminal.integrated.shell.linux": "/bin/bash",
                "terminal.integrated.inheritEnv": false,
                "python.condaPath": "",
                "python.pythonPath": "/opt/conda/bin/python",
                "python.defaultInterpreterPath": "/opt/conda/bin/python",
                "python.testing.pytestEnabled": true,
                "python.formatting.provider": "black",
                "python.terminal.activateEnvironment": false,
                "terminal.integrated.env.osx": {
                    "PATH": "/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
                },
                "terminal.integrated.env.linux": {
                    "PATH": "/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
                },
                "terminal.integrated.env.windows": {
                    "PATH": "/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
                }
            }
        }
    }
}
```
Change the "mounts" section to reflect any local or mounted filesystems that you will be accessing for development. I add all my office and cloud network storage located in `/mnt`, all of my home folders, and any external storage plugged into my PC under `/media`.
```jsonc
"mounts": [
    "source=/mnt,target=/mnt,type=bind,consistency=cached",
    "source=/home,target=/home,type=bind,consistency=cached",
    "source=/media,target=/media,type=bind,consistency=cached",
],
```
Shared Storage
- Network attached storage (cifs, smb, nfs, sshfs, …)
- Access files from remote server with SSH remote extension in VSCode
- Centralized cloud storage (s3, s3fs, dropbox, …)
- Peer-to-peer synchronization with Syncthing
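As one example of the first option, a remote filesystem can be attached over SSH with sshfs (the hostname and paths below are placeholders):

```bash
# Install sshfs and mount a remote directory over SSH
sudo apt-get install -y sshfs
sshfs user@fileserver:/data /mnt/data

# Detach the mount when finished
fusermount -u /mnt/data
```

The same mount point can then be bind-mounted into containers via the "mounts" section of .devcontainer.json shown above.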
Tips and Good Practices
- Shell Scripts using BASH-compatible syntax
- py.test unit-testing for Python methods, modules, and small chunks of code to ensure the expected outputs of your building blocks
- bats testing for BASH and Python scripts
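A minimal bats test file might look like this (the script name and expected message are hypothetical examples):

```bash
#!/usr/bin/env bats

# Example tests for a hypothetical process_data.sh script
@test "script exits successfully on valid input" {
    run ./process_data.sh input.nc
    [ "$status" -eq 0 ]
}

@test "script fails on a missing input file" {
    run ./process_data.sh missing.nc
    [ "$status" -ne 0 ]
}
```

The `run` helper captures the exit code in `$status` and the combined output in `$output`, so shell scripts get the same assertion-style checks py.test provides for Python.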
Container Security
The many layers of software dependencies within containers add extra security considerations. Luckily, there are a number of tools, like Trivy and `docker scan`, for scanning Docker images for vulnerabilities.
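Once installed, Trivy can scan a local image directly (the image tag is an example):

```bash
# Scan a local image for known vulnerabilities, reporting only the serious ones
trivy image --severity HIGH,CRITICAL oceanwx/anaconda3:latest
```

Running a scan like this in CI after each image build helps catch vulnerable layers before they reach production.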