Technique example: data loader, Python to parquet #1422

Merged
allisonhorst merged 14 commits into main from allison/py-parquet-loader
Jul 15, 2024

Conversation

@allisonhorst allisonhorst requested review from Fil and mbostock June 3, 2024 13:48
@Fil
Contributor

Fil commented Jun 6, 2024

can we link to https://arrow.apache.org/docs/python/generated/pyarrow.parquet.write_table.html explicitly mentioning that there are many options (and recommend compression); and maybe show compression in action?

@allisonhorst
Contributor Author

@Fil I added the compression codec explicitly in the loader (`compression="snappy"`), and included a sentence pointing to the write_table docs and different compression algorithms. Look okay?

Member

@mbostock mbostock left a comment

Mind copying the new virtual environment pattern from #1468?

<div class="note">

To run this data loader, you’ll need python3 and the geopandas, matplotlib, io, and sys modules installed and available on your `$PATH`.

</div>

<div class="tip">

We recommend using a [Python virtual environment](https://observablehq.com/framework/loaders#venv), such as with venv or uv, and managing required packages via `requirements.txt` rather than installing them globally.

</div>
@jaanli

jaanli commented Jun 15, 2024

Quick question - would dbt work here?

Comment on lines 66 to 69
Plot.barX(dams,
Plot.groupY({x: "count"}, {y: "Primary Purpose", fill: "Hazard Potential Classification", sort: {y: "x", reverse: true}
})
)
Member

Please prettier this. 🙏

Also, you can use sort: {y: "-x"} to shorten.

Contributor Author

Yep will do!

@allisonhorst allisonhorst merged commit 7dcae8b into main Jul 15, 2024
4 checks passed
@allisonhorst allisonhorst deleted the allison/py-parquet-loader branch July 15, 2024 14:45
@@ -53,7 +53,7 @@ const dams = FileAttachment("data/us-dams.parquet").parquet();
We can display the table using `Inputs.table`.

```js echo
-Inputs.table(dams)
+Inputs.table(dams);
Member

This prettier edit will prevent the table from displaying.

Contributor Author

@allisonhorst allisonhorst Jul 15, 2024

I think this was correctly updated in a subsequent commit (I turned prettier off after formatting, to leave the semicolon after FileAttachment but remove it in the Inputs.table and Plot code):

https://observablehq.observablehq.cloud/framework-example-loader-python-to-parquet/

Member

Right you are, thanks!
