Shell script data loader examples

In Observable Framework, data loaders can be written as shell scripts. Framework runs them with the system shell, sh.
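As a minimal sketch of that contract, the hypothetical loader below writes a small CSV to standard output. The file name (docs/data/hello.csv.sh) and the rows are invented for illustration. Whatever the script prints to standard output becomes the contents of the generated file.

```sh
#!/bin/sh
# Hypothetical loader: docs/data/hello.csv.sh (name and rows are illustrative).
# Everything written to standard output becomes the generated hello.csv.

emit_csv() {
  printf 'name,value\n'
  printf 'alpha,1\n'
  printf 'beta,2\n'
}

emit_csv
```

A page could then read the result with FileAttachment("data/hello.csv"), leaving off the .sh extension.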

Parquet

The data loader below downloads alternative fuel station data from the U.S. Department of Energy, filters it to California stations using SQL, then returns an Apache Parquet file.

Create a file in your project source root with the .parquet.sh double extension (for example, docs/data/my-data.parquet.sh), then paste the code below to get started.
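A loader along these lines might look like the sketch below. It assumes the duckdb command-line client and curl are installed. SOURCE_URL is a placeholder rather than the real Department of Energy endpoint, the cache location is arbitrary, and the State column name is an assumption about the CSV layout. The helper names fetch_cached and csv_to_parquet are made up for this example.

```sh
#!/bin/sh
# Sketch of a docs/data/alt-fuel-stations.parquet.sh loader.
# Assumptions: duckdb and curl are on PATH; SOURCE_URL is a placeholder;
# the CSV has a column named State.

SOURCE_URL="https://example.com/alt-fuel-stations.csv"  # hypothetical placeholder
CACHE_DIR="${TMPDIR:-/tmp}/alt-fuel-cache"              # hypothetical cache location
CSV_FILE="$CACHE_DIR/alt-fuel-stations.csv"

# Download url ($1) to file ($2) unless a cached copy already exists.
# The download goes to a temporary name first so that a failed transfer
# is never mistaken for a valid cached file.
fetch_cached() {
  url=$1
  file=$2
  if [ ! -f "$file" ]; then
    mkdir -p "$(dirname "$file")" &&
      curl -f -sS -L --connect-timeout 10 "$url" -o "$file.tmp" &&
      mv "$file.tmp" "$file"
  fi
}

# Keep only California rows and write Parquet to standard output.
csv_to_parquet() {
  duckdb :memory: <<EOF
COPY (
  SELECT *
  FROM read_csv_auto('$1')
  WHERE State = 'CA'
) TO STDOUT (FORMAT 'parquet');
EOF
}

fetch_cached "$SOURCE_URL" "$CSV_FILE" && csv_to_parquet "$CSV_FILE"
```

Caching the download is a convenience during development, since the loader then only hits the network when the cached CSV is missing.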

Then, to access the output (data/alt-fuel-stations.parquet) in a page, we create a helper function:

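// Wrap a FileAttachment so that its url() method resolves to an absolute URL,
// using the current page location as the base.
function absoluteFA(FA) {
  const {url} = FA; // keep a reference to the original url() method
  FA.url = async function() {
    return new URL(await url.apply(FA), document.location.href).href;
  };
  return FA;
}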

And read in the file using FileAttachment:

const caAltFuel = await DuckDBClient.of({
  fuelstations: absoluteFA(FileAttachment("data/alt-fuel-stations.parquet"))
});

caAltFuel

const fuelTable = caAltFuel.query("SELECT * FROM fuelstations");

Inputs.table(fuelTable)

JSON

Sometimes, all you need is curl!

The data loader below downloads GeoJSON for Caltrans districts from the California Open Data Portal.

Create a file in your project source root with the .json.sh double extension (for example, docs/data/my-data.json.sh), then paste the code below to get started.
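A curl-only loader can be as short as the sketch below. SOURCE_URL is a hypothetical placeholder, not the portal's actual download link, and the fetch_json helper name is made up for this example.

```sh
#!/bin/sh
# Sketch of a docs/data/caltrans-districts.json.sh loader.
# SOURCE_URL is a hypothetical placeholder; substitute the GeoJSON download
# link from the California Open Data Portal.

SOURCE_URL="https://example.com/caltrans-districts.geojson"

# Stream the response body to standard output.
# -f   exit non-zero on HTTP errors instead of printing the error page
# -sS  hide the progress meter but still report errors
# -L   follow redirects
fetch_json() {
  curl -f -sS -L --connect-timeout 10 "$1"
}

fetch_json "$SOURCE_URL"
```

The -f flag matters here: without it, an HTTP error page would be written to standard output and saved as if it were data.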

Access the output of the data loader from the client using FileAttachment:

const caltrans = FileAttachment("data/caltrans-districts.json").json()

The file attachment name does not include the .sh extension. We rely on Framework’s routing to run the appropriate data loader.

We can now explore the JSON output:

caltrans

CSV

Working in a shell script is flexible. Within the shell script, work in whatever language you like to access and prep your data, then write to standard output.

The data loader example below starts a Python script, reads the penguins data from a local file, does some basic wrangling, then writes a CSV to standard output.

Create a file in your project source root with the .csv.sh double extension (for example, docs/data/my-data.csv.sh), then paste the code below to get started.
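One way to write such a loader is sketched below. It assumes python3 is installed and uses only Python's standard csv module rather than a data library. The input path, the selected column names, and the wrangle_penguins helper name are illustrative assumptions, not taken from the original example.

```sh
#!/bin/sh
# Sketch of a docs/data/penguin.csv.sh loader.
# Assumptions: python3 is on PATH; the input path and column names are illustrative.

INPUT_FILE="docs/data/penguins.csv"  # hypothetical local file

# Run an inline Python script that keeps a few columns, drops rows with
# missing values, and writes CSV to standard output.
wrangle_penguins() {
  python3 - "$1" <<'END_PYTHON'
import csv
import sys

columns = ["species", "island", "body_mass_g"]  # assumed column names

with open(sys.argv[1], newline="") as source:
    reader = csv.DictReader(source)
    writer = csv.DictWriter(sys.stdout, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Skip rows where any selected column is missing, empty, or "NA".
        if all(row.get(c) not in (None, "", "NA") for c in columns):
            writer.writerow({c: row[c] for c in columns})
END_PYTHON
}

wrangle_penguins "$INPUT_FILE"
```

The quoted heredoc delimiter ('END_PYTHON') stops the shell from expanding anything inside the Python source, and the input path is passed as an argument instead.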

Access the output of the data loader from the client using FileAttachment:

const penguins = FileAttachment("data/penguin.csv").csv({typed: true})

The file attachment name does not include the .sh extension. We rely on Framework’s routing to run the appropriate data loader.

Inputs.table(penguins)