Inventory

24 Feb 2023

Data Catalog From Filesystem

A previous post demonstrated a way to build an inventory of web-accessible files from the URL links discovered in a website's HTML source. A similar method can be applied to a filesystem: search through a directory tree and build a catalog of every file that matches the file types of interest.

Use glob to recursively globstar-match filepaths

In this case I use Python's builtin glob and re (regular expression) modules to list files and match extensions in their names. I used the os.path collection of utility functions to pull directory names from the full paths, but a more modern way would probably be to use the builtin pathlib. Pandas is the only non-builtin package used, and it could be removed if a DataFrame is not the desired output.
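As a rough illustration of that approach, here is a minimal sketch using only the builtin modules. The function name `catalog_files`, the `extensions` parameter, its default values, and the record keys are all assumptions made for this example, not names from the original code.

```python
import glob
import os
import re


def catalog_files(root, extensions=("csv", "json")):
    """Return one record (dict) per file under `root` with a matching extension.

    Hypothetical helper: the name, parameters, and record keys are
    illustrative assumptions.
    """
    # Match any of the wanted extensions at the end of the path, ignoring case.
    ext_pattern = re.compile(
        r"\.(" + "|".join(re.escape(ext) for ext in extensions) + r")$",
        re.IGNORECASE,
    )
    records = []
    # With recursive=True, "**" matches zero or more nested directories.
    pattern = os.path.join(root, "**", "*")
    for path in sorted(glob.glob(pattern, recursive=True)):
        match = ext_pattern.search(path)
        # Skip directories and files whose extension is not wanted.
        if match and os.path.isfile(path):
            records.append(
                {
                    "directory": os.path.dirname(path),
                    "filename": os.path.basename(path),
                    "extension": match.group(1).lower(),
                }
            )
    return records
```

If a DataFrame is the desired output, passing the returned list to `pandas.DataFrame(records)` gives a table with one row per file. Note that glob skips dot-prefixed (hidden) files by default, so they will not appear in the catalog.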