Chaining data loaders #332

Fil · 2023-12-06T14:19:19Z

Suppose we want to 1. download a dataset from an API and 2. analyze it. Currently a data loader must do both at the same time, and will run the download part again if we update the analysis code.

Ideally we'd separate this into two chained data loaders that would still be live: if a page relies on 2, which relies on 1, editing 1 would tell the page to require the analysis again, which would trigger a new download. But editing 2 would only run the analysis again, not the download.

This would also make it easier to generate several files from a common (and slow) download.
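
To make the intended behavior concrete, here is a minimal sketch of the invalidation rule, not a description of how Framework works. It assumes staleness is decided from file modification times, and the stage and file names (`download.py`, `raw.json`, `analyze.py`, `summary.json`) are hypothetical.

```python
def stale_stages(stages, mtimes):
    """Return the names of the stages that need to run again.

    stages: list of (name, output, inputs) tuples, listed in dependency order.
    mtimes: dict mapping a file name to its modification time; a file that
            is absent from the dict is treated as not existing yet.
    """
    rerun = []
    rebuilt = set()  # outputs that will be regenerated in this pass
    for name, output, inputs in stages:
        out_time = mtimes.get(output)
        needs_run = (
            out_time is None                      # output was never built
            or any(i in rebuilt for i in inputs)  # an upstream stage reruns
            or any(mtimes.get(i, float("inf")) > out_time for i in inputs)
        )
        if needs_run:
            rerun.append(name)
            rebuilt.add(output)
    return rerun


# Hypothetical two-stage chain: 1 downloads, 2 analyzes.
STAGES = [
    ("download", "raw.json", ["download.py"]),
    ("analyze", "summary.json", ["analyze.py", "raw.json"]),
]
```

With these stages, a newer `analyze.py` yields `["analyze"]` only, while a newer `download.py` yields `["download", "analyze"]`.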

mbostock · 2023-12-06T15:56:48Z

From #325 (comment):

The API for chaining loaders isn’t really about FileAttachment since data loaders can be written in any language. Instead a data loader needs to be able to fetch a file from the preview server (and we need an equivalent server during build). Maybe we set an environment variable which data loaders can read to know the address of the preview server. Ideally we’d also detect circular dependencies and report them as errors.
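
For the circular-dependency check, one possible approach is a depth-first search over the "reads from" graph. This is only a sketch: it assumes the dependencies are already known as a mapping from each loader output to the files it reads, which is not something the thread specifies, and the file names are hypothetical.

```python
def find_cycle(deps):
    """Return one dependency cycle as a list of names, or None if acyclic.

    deps maps each loader output to the list of files it reads.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for dep in deps.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path: we closed a loop
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in deps:
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

A caller could turn a non-`None` result into an error message such as `" -> ".join(cycle)`.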

mythmon · 2024-01-21T00:47:11Z

I've been working on a small project to get some first-hand experience with the CLI. In it I'm downloading a zip file from some hobbyist website, extracting a couple dozen text files, and then running them through a custom parser. Parsing them takes about 30 seconds, which is a bit longer than I want to do in the markdown file. I'm iterating on the parser itself, so I'm re-running it every few minutes.

The options I see for my case are:

1. Do everything in one data loader, and naively download the zip every time. That seems rude to the small site.
2. Do everything in one data loader, and implement my own caching functionality. This seems tedious and exactly what I'd want a data loader to do for me.
3. Run the parser in the client. This feels slow, though it is nice for development to see the data load in as it generates.
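
For reference, the "implement my own caching" option could look roughly like the sketch below. It is an illustration under assumptions, not the code from this project: the cache path, the one-day default, and the `fetch` callable (whatever function actually downloads the zip) are all hypothetical.

```python
import time
from pathlib import Path


def cached_download(cache_path, fetch, max_age_seconds=24 * 60 * 60):
    """Return the downloaded bytes, calling fetch() only when needed.

    cache_path: where the downloaded bytes are kept between runs.
    fetch: zero-argument callable that performs the real download.
    max_age_seconds: how long a cached copy is considered fresh.
    """
    path = Path(cache_path)
    if path.exists() and time.time() - path.stat().st_mtime < max_age_seconds:
        return path.read_bytes()  # fresh enough: skip the network entirely
    data = fetch()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return data
```

The loader would then parse the returned bytes and write its result to standard output as usual, so only the parsing step repeats while iterating on the parser.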

From this, I have two wish list items. One is chained data loaders, the other is incremental data loaders that can somehow stream their results in to the client. I don't really know how that would work, and it's probably better suited for Notebooks anyways.

I can sympathize with wanting data loaders to be in any language, but it is very jarring to go from writing my code in a JS fenced code block script and having it work easily, to working in a .json.js data loader and suddenly all of my imports break and I lose all of the nice tools I was using a moment ago. It makes me feel like to properly use Observable CLI I need to be fluent in three varieties of JS: Markdown code blocks, browser imports, and file attachments.

Perhaps once we have the server-based dataloader workflow that Mike mentioned, we could then wrap that in a FileAttachment facade that makes it feel just like it does in Markdown files.

Fil · 2024-01-29T14:58:21Z

@mythmon FileAttachment supports streaming, see https://observablehq.com/@observablehq/streaming-shapefiles

trebor · 2024-02-02T17:17:29Z

An example of this use case is the chess bump chart example: any change to the data transformation in the data loader requires a full download of all of the data.

espinielli · 2024-02-17T22:33:41Z

This seems to point to something similar to a dependency graph, like the one in the targets 📦 in R.
And the dependency is not only for the data loaders but also for assets (computational cells are already covered, aren't they?)
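
As a rough illustration of the dependency-graph idea, Python's standard `graphlib` module can order nodes so that each file is built after everything it depends on. The graph below is hypothetical (a download, an analysis, and a rendered asset) and is not taken from Framework or from targets.

```python
from graphlib import TopologicalSorter

# Hypothetical graph: each key depends on the files in its set.
DEPS = {
    "chart.png": {"summary.json"},
    "summary.json": {"raw.json"},
    "raw.json": set(),
}


def build_order(deps):
    """Return the files in an order where dependencies come first.

    Raises graphlib.CycleError if the graph contains a cycle.
    """
    return list(TopologicalSorter(deps).static_order())
```

For this linear chain, `build_order(DEPS)` gives `["raw.json", "summary.json", "chart.png"]`.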

Fil · 2024-02-26T13:53:35Z

Tangentially related to #918.

palewire · 2024-06-25T17:50:33Z

> an example of the use case in the chess bump chart example. any changes to the data transformation in the data loader require a full download of all of the data.

I'm not seeing this implemented in the chess bump example. Am I missing something?

mythmon · 2024-06-25T17:56:57Z

It's not implemented in the chess bump example. The example is a case where this feature would improve the data loader(s), if Framework gained it.

palewire · 2024-06-25T18:22:12Z

Gotcha. Here's my use case, for anyone interested.

I'd like Data Loader 1 to be a Python script that downloads a dataframe from S3, transforms the data with filter-y tricks and then writes out a JSON file that's ready to serve.

Then Data Loader 2 would be a Node.js script that would open that very large JSON file, build a D3 graphic in a canvas object, and then write out a PNG file that could ultimately be served by the static site.

Fil added the enhancement New feature or request label Dec 6, 2023

Fil mentioned this issue Dec 6, 2023

import FileAttachment #325

Merged

cinxmo added this to the Future milestone Jan 16, 2024

mbostock mentioned this issue Feb 20, 2024

Fetch and cache data at runtime #839

Closed

Fil mentioned this issue Mar 6, 2024

Watch multiple files for data loaders with multiple sources #990

Open

mbostock removed this from the Future milestone Mar 24, 2024

kaaloo mentioned this issue Apr 24, 2024

Refactoring API data pipeline for fetching institutions data and updating Readme OKN-CollabNext/KnowHax#18

Merged

mbostock mentioned this issue May 22, 2024

US dams example dashboard #1350

Merged

Fil linked a pull request Jul 17, 2024 that will close this issue

chaining data loaders #1522

Draft

10 tasks
