Organizations

@html-extract @dosbox-staging @next-LI
brandonrobertz/README.md

Um Yes Hello

I'm Brandon Roberts. I'm an independent data journalist specializing in open source and bringing computational techniques to journalism projects. You can read more on my site: bxroberts.org

Pinned

  1. SparseLSH (Public)

    A Locality Sensitive Hashing (LSH) library with an emphasis on large, high-dimensional datasets.

    Python 138 26

  2. propublica/django-collaborative (Public)

    ProPublica's collaborative tip-gathering framework. Import and manage CSV, Google Sheets and Screendoor data with ease.

    Python 95 18

  3. autoscrape-py (Public)

    An automated, programming-free web scraper for interactive sites

    HTML 103 17

  4. chatgpt-document-extraction (Public archive)

    A proof of concept tool for using ChatGPT to transform messy text documents into structured JSON

    Python 119 12

  5. html-extract/hext.js (Public)

    Use Hext in a browser or with node. Hext is a domain-specific language for extracting structured data from HTML documents.

    C++ 5 1

  6. tabula-draw-columns (Public)

    Simple tool to visually build column config strings for tabula-java

    HTML 1