Glob

24 Feb 2023

Data Catalog From Filesystem

This previous post demonstrated a way to use generic website HTML source to create an inventory of web-accessible files from discovered URL links. A similar methodology can be applied to search through a filesystem directory tree and build a catalog of files with matching filetypes.

Use glob to recursively globstar-match filepaths

In this case I use Python’s built-in glob and regular expression modules to list files and match extensions in the names. I used the os.path collection of utility functions to pull directory names from the full paths, but a more modern way would probably be to use the built-in pathlib. Pandas is the only non-built-in package used, and it could be removed if a DataFrame is not the desired output.
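As a rough illustration of that approach (not the code from the original post), the sketch below walks a directory tree with glob, filters names with a regular expression, and splits paths with os.path. The function name `catalog_files`, the default extension list, and the record keys are all assumptions made for this example.

```python
import glob
import os.path
import re


def catalog_files(root, extensions=("nc", "csv")):
    """Return one record per file under root whose extension matches.

    Hypothetical helper: the name, defaults, and record keys are
    illustrative only.
    """
    # Case-insensitive pattern such as r"\.(nc|csv)$"
    ext_pattern = re.compile(
        r"\.(" + "|".join(re.escape(e) for e in extensions) + r")$",
        re.IGNORECASE,
    )
    records = []
    # With recursive=True, ** matches root itself and any depth below it
    for path in sorted(glob.glob(os.path.join(root, "**", "*"), recursive=True)):
        match = ext_pattern.search(path)
        if match and os.path.isfile(path):
            records.append(
                {
                    "directory": os.path.dirname(path),
                    "filename": os.path.basename(path),
                    "extension": match.group(1).lower(),
                }
            )
    return records
```

The result is a plain list of dictionaries, so pandas stays optional: passing the list to `pandas.DataFrame(...)` gives a table with one row per file. Note that glob skips names beginning with a dot by default.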

14 Sep 2022

Bash Globstar `**`

In bash 4.0 and later, the ** expression can be used to match files in a particular directory and in all of its sub-directories. This can be a big help when dealing with messy filesystem structures.

shopt -s globstar       # enable recursive ** matching
ls data/$year/**/*.nc   # list .nc files in $year and its sub-directories

This example matches .nc files in $year and in any folders below $year. Because globstar is off by default in bash, shopt -s globstar may be needed to enable it, whether in an interactive shell or at the top of a script.
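The same pattern carries over to Python's glob module, where `recursive=True` plays the role of `shopt -s globstar`. This is a minimal sketch; the function name `find_nc_files` and the `root` parameter are hypothetical and only mirror the data/$year layout from the bash example.

```python
import glob
import os.path


def find_nc_files(year, root="data"):
    """List .nc files in root/year and in any folder below it.

    Hypothetical helper mirroring the bash pattern data/$year/**/*.nc.
    """
    pattern = os.path.join(root, str(year), "**", "*.nc")
    # Without recursive=True, ** would behave like a single * here
    return sorted(glob.glob(pattern, recursive=True))
```

Calling `find_nc_files(2022)` would return matching paths at any depth under data/2022, including files directly inside that folder.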