Examples of accessible data inventories

Simple over Standards

Frequently the biggest hurdle to sharing data is the pre-existing organization of files. While API access based on open standards can smooth over the bumps inherent to locally stored, heterogeneous collections of files, the effort required to set up an existing API web service to share your data can be overwhelming (not to mention having to roll your own in a standards-compliant way).

The NOAA example below is a simple solution that can serve as a template for providing easy programmatic access to any dataset.

HTML as Catalog

The “pretty” part of the web that we interact with daily is built on HTML files, which can serve as data catalogs in themselves if organizations use them well. When I need to download a set of files that are listed as links on a webpage, I often use the following hacky method to scrape a static webpage for links with specific extensions.

I use the static HTML scrape hack on a set of webpages where I often need to check that the links point to real files, since the content paths sometimes change without warning. Nine times out of ten this method gets the job done.

import bs4
import pandas as pd
import requests
from urllib.parse import urljoin, urlparse

# File extensions worth cataloging
ext = {"json", "pdf", "png", "gif", "jpg", "tcw", "txt", "xml", "kml", "kmz", "zip"}

catalog = []
urls = ["https://www.navcen.uscg.gov/north-american-ice-service-products",
        "https://www.metoc.navy.mil/jtwc/jtwc.html",
        "https://www.metoc.navy.mil/jtwc/rss/jtwc.rss"]
for mount in urls:
    r = requests.get(mount, timeout=30)
    # Scheme plus host, e.g. "https://www.navcen.uscg.gov"
    parts = urlparse(mount)
    domain = f"{parts.scheme}://{parts.netloc}"
    links = bs4.BeautifulSoup(r.text, features="lxml").find_all(name="a")
    for link in links:
        href = link.attrs.get("href")
        if not href:
            continue
        # Treat whatever follows the last dot as the file extension
        iext = href.split(".")[-1]
        if iext in ext:
            # urljoin resolves relative hrefs against the page and leaves absolute hrefs untouched
            catalog.append((link.text, domain, urljoin(mount, href), iext))
catalog = pd.DataFrame(catalog, columns=["Description", "Folder", "Fullpath", "Extension"])
if not catalog.empty:
    catalog["Pattern"] = catalog.Fullpath
    catalog["LowerPath"] = catalog.Fullpath.str.lower()
    catalog["LowerDescription"] = catalog.Description.str.lower()

catalog.to_csv("temp.csv", index=False)

temp.csv output:

Description,Folder,Fullpath,Extension,Pattern,LowerPath,LowerDescription
2023 ANNOUNCEMENT OF SERVICES,https://www.navcen.uscg.gov,https://www.navcen.uscg.gov/sites/default/files/pdf/iip/Announcement_of_Services_2023.pdf,pdf,https://www.navcen.uscg.gov/sites/default/files/pdf/iip/Announcement_of_Services_2023.pdf,https://www.navcen.uscg.gov/sites/default/files/pdf/iip/announcement_of_services_2023.pdf,2023 announcement of services
NAIS CUSTOMER SURVEY,https://www.navcen.uscg.gov,https://www.navcen.uscg.gov/sites/default/files/pdf/iip/CG16700.pdf,pdf,https://www.navcen.uscg.gov/sites/default/files/pdf/iip/CG16700.pdf,https://www.navcen.uscg.gov/sites/default/files/pdf/iip/cg16700.pdf,nais customer survey
NAIS ICEBERG INFORMATION AND SERVICES,https://www.navcen.uscg.gov,https://www.navcen.uscg.gov/sites/default/files/pdf/iip/VOOP.pdf,pdf,https://www.navcen.uscg.gov/sites/default/files/pdf/iip/VOOP.pdf,https://www.navcen.uscg.gov/sites/default/files/pdf/iip/voop.pdf,nais iceberg information and services
TODAY’S NAIS ICEBERG CHART,https://www.navcen.uscg.gov,https://www.navcen.uscg.gov/sites/default/files/images/iip/data/current_NAIS65.gif,gif,https://www.navcen.uscg.gov/sites/default/files/images/iip/data/current_NAIS65.gif,https://www.navcen.uscg.gov/sites/default/files/images/iip/data/current_nais65.gif,today’s nais iceberg chart
NAIS ICEBERG CHART INFO SHEET,https://www.navcen.uscg.gov,https://www.navcen.uscg.gov/sites/default/files/pdf/iip/Iceberg_Chart_Information_Sheet_2019.pdf,pdf,https://www.navcen.uscg.gov/sites/default/files/pdf/iip/Iceberg_Chart_Information_Sheet_2019.pdf,https://www.navcen.uscg.gov/sites/default/files/pdf/iip/iceberg_chart_information_sheet_2019.pdf,nais iceberg chart info sheet
DOWNLOAD TODAY’S NAIS ICEBERG SHAPEFILE,https://www.navcen.uscg.gov,https://www.navcen.uscg.gov/sites/default/files/iip/shape/currentShape.zip,zip,https://www.navcen.uscg.gov/sites/default/files/iip/shape/currentShape.zip,https://www.navcen.uscg.gov/sites/default/files/iip/shape/currentshape.zip,download today’s nais iceberg shapefile
TODAY’S NAIS ICEBERG BULLETIN,https://www.navcen.uscg.gov,https://www.navcen.uscg.gov/sites/default/files/iip/bulletin/IcebergBulletin.txt,txt,https://www.navcen.uscg.gov/sites/default/files/iip/bulletin/IcebergBulletin.txt,https://www.navcen.uscg.gov/sites/default/files/iip/bulletin/icebergbulletin.txt,today’s nais iceberg bulletin
NAIS ICEBERG BULLETIN INFO SHEET,https://www.navcen.uscg.gov,https://www.navcen.uscg.gov/sites/default/files/pdf/iip/Iceberg_Bulletin_Information_Sheet_2019.pdf,pdf,https://www.navcen.uscg.gov/sites/default/files/pdf/iip/Iceberg_Bulletin_Information_Sheet_2019.pdf,https://www.navcen.uscg.gov/sites/default/files/pdf/iip/iceberg_bulletin_information_sheet_2019.pdf,nais iceberg bulletin info sheet
DOWNLOAD TODAY’S NAIS ICEBERG AND SEA ICE KML FILE,https://www.navcen.uscg.gov,https://www.navcen.uscg.gov/sites/default/files/iip/kml/currentKml.kml,kml,https://www.navcen.uscg.gov/sites/default/files/iip/kml/currentKml.kml,https://www.navcen.uscg.gov/sites/default/files/iip/kml/currentkml.kml,download today’s nais iceberg and sea ice kml file
DOWNLOAD THE CURRENT ICEBERG OUTLOOK FILE,https://www.navcen.uscg.gov,https://www.navcen.uscg.gov/sites/default/files/pdf/iip/outlook/IcebergOutlook.pdf,pdf,https://www.navcen.uscg.gov/sites/default/files/pdf/iip/outlook/IcebergOutlook.pdf,https://www.navcen.uscg.gov/sites/default/files/pdf/iip/outlook/icebergoutlook.pdf,download the current iceberg outlook file
TC Warning Text,https://www.metoc.navy.mil,https://www.metoc.navy.mil/jtwc/products/sh0823web.txt,txt,https://www.metoc.navy.mil/jtwc/products/sh0823web.txt,https://www.metoc.navy.mil/jtwc/products/sh0823web.txt,tc warning text
TC Warning Graphic,https://www.metoc.navy.mil,https://www.metoc.navy.mil/jtwc/products/sh0823.gif,gif,https://www.metoc.navy.mil/jtwc/products/sh0823.gif,https://www.metoc.navy.mil/jtwc/products/sh0823.gif,tc warning graphic
Prognostic Reasoning,https://www.metoc.navy.mil,https://www.metoc.navy.mil/jtwc/products/sh0823prog.txt,txt,https://www.metoc.navy.mil/jtwc/products/sh0823prog.txt,https://www.metoc.navy.mil/jtwc/products/sh0823prog.txt,prognostic reasoning
JMV 3.0 Data,https://www.metoc.navy.mil,https://www.metoc.navy.mil/jtwc/products/sh0823.tcw,tcw,https://www.metoc.navy.mil/jtwc/products/sh0823.tcw,https://www.metoc.navy.mil/jtwc/products/sh0823.tcw,jmv 3.0 data
Google Earth Overlay,https://www.metoc.navy.mil,https://www.metoc.navy.mil/jtwc/products/sh0823.kmz,kmz,https://www.metoc.navy.mil/jtwc/products/sh0823.kmz,https://www.metoc.navy.mil/jtwc/products/sh0823.kmz,google earth overlay
IR Satellite Imagery,https://www.metoc.navy.mil,https://www.metoc.navy.mil/jtwc/products/08S_241200sair.jpg,jpg,https://www.metoc.navy.mil/jtwc/products/08S_241200sair.jpg,https://www.metoc.navy.mil/jtwc/products/08s_241200sair.jpg,ir satellite imagery
Satellite Fix Bulletin,https://www.metoc.navy.mil,https://www.metoc.navy.mil/jtwc/products/sh0823fix.txt,txt,https://www.metoc.navy.mil/jtwc/products/sh0823fix.txt,https://www.metoc.navy.mil/jtwc/products/sh0823fix.txt,satellite fix bulletin
ABPW10 (Western/South Pacific Ocean),https://www.metoc.navy.mil,https://www.metoc.navy.mil/jtwc/products/abpwweb.txt,txt,https://www.metoc.navy.mil/jtwc/products/abpwweb.txt,https://www.metoc.navy.mil/jtwc/products/abpwweb.txt,abpw10 (western/south pacific ocean)
ABIO10 (Indian Ocean),https://www.metoc.navy.mil,https://www.metoc.navy.mil/jtwc/products/abioweb.txt,txt,https://www.metoc.navy.mil/jtwc/products/abioweb.txt,https://www.metoc.navy.mil/jtwc/products/abioweb.txt,abio10 (indian ocean)
Satellite Image,https://www.metoc.navy.mil,https://www.metoc.navy.mil/jtwc/products/abpwsair.jpg,jpg,https://www.metoc.navy.mil/jtwc/products/abpwsair.jpg,https://www.metoc.navy.mil/jtwc/products/abpwsair.jpg,satellite image
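
Since the point of the catalog is to check that links still point to real files, one possible follow-up is a small liveness check. The sketch below is not part of my original script: the names link_is_live and dead_links are made up for illustration, it uses only the standard library, and it assumes the server answers HEAD requests (some do not, in which case a GET would be needed instead).

```python
import urllib.request


def link_is_live(url, opener=urllib.request.urlopen, timeout=30):
    """Return True if a HEAD request for url answers with a 2xx status."""
    try:
        # HEAD asks for the headers only, so the file itself is not downloaded
        request = urllib.request.Request(url, method="HEAD")
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError):
        # OSError covers HTTP errors, unreachable hosts, and timeouts;
        # ValueError covers malformed URLs
        return False


def dead_links(urls, opener=urllib.request.urlopen):
    """Return the URLs that did not answer with a 2xx status, in input order."""
    return [url for url in urls if not link_is_live(url, opener=opener)]
```

Assuming temp.csv was written by the script above, dead_links(pd.read_csv("temp.csv").Fullpath) would list the catalog entries whose paths have gone stale. The opener argument exists only so the check can be exercised without touching the network.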

Martin County DEM Data

My first impulse was to scrape this site for the DEM tile .img files, given that they are all provided as URL links on the page.

However, in this case the site has saved me some time: it already provides a list of the publicly available tile files, along with ancillary files such as a tile index and metadata.

To download all the .img files, all I have to do is the following from a terminal:

curl -s https://chs.coast.noaa.gov/htdata/raster2/elevation/FL_Martin_County_DEM_2016_6254/urllist6254.txt | grep "\.img" | parallel wget {}

So simple, but so useful for folks who would not be keen to write code to parse HTML and extract the relevant links. A file list like this also takes minimal effort to construct.
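
For anyone without GNU parallel installed, roughly the same pipeline can be sketched in Python with only the standard library. This is an illustration rather than a drop-in replacement: the function names are made up, it assumes the list holds one URL per line, and unlike the grep above it keeps only lines that end in .img.

```python
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse

URL_LIST = "https://chs.coast.noaa.gov/htdata/raster2/elevation/FL_Martin_County_DEM_2016_6254/urllist6254.txt"


def select_urls(listing, suffix=".img"):
    """Return the lines of a URL list that end with suffix, stripped of whitespace."""
    lines = (line.strip() for line in listing.splitlines())
    return [line for line in lines if line.endswith(suffix)]


def download(url, folder="."):
    """Save url into folder under its own file name and return the local path."""
    target = os.path.join(folder, os.path.basename(urlparse(url).path))
    urllib.request.urlretrieve(url, target)
    return target


def download_all(list_url=URL_LIST, folder=".", workers=4):
    """Fetch the URL list, then download every matching file with a small thread pool."""
    with urllib.request.urlopen(list_url, timeout=30) as response:
        listing = response.read().decode("utf-8")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda url: download(url, folder), select_urls(listing)))
```

Calling download_all() would fetch the tiles into the current directory. The thread pool stands in for parallel, and four workers is an arbitrary choice.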

Next time I provide files for a client, I think I’ll include a link to my web-accessible file list and a sample download command right in my email.
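
Producing such a file list on my side could look something like the sketch below. It assumes the files sit in a local folder that a web server exposes under some base URL; build_url_list is a hypothetical helper, and the folder name and base URL in the usage note are placeholders.

```python
from pathlib import Path
from urllib.parse import quote


def build_url_list(folder, base_url):
    """Return one URL per file under folder, assuming base_url serves that folder."""
    root = Path(folder)
    urls = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Percent-encode spaces and other unsafe characters, keeping the slashes
            relative = quote(path.relative_to(root).as_posix())
            urls.append(f"{base_url.rstrip('/')}/{relative}")
    return urls
```

Writing the result out with Path("urllist.txt").write_text("\n".join(build_url_list("tiles", "https://example.com/data")) + "\n") would give clients a list they can feed straight to a curl and wget one-liner like the one above.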