Projects: twitter/scalding

Improve text formats

Updated Oct 7, 2017

We currently have a legacy, reflection-based TSV/CSV implementation. We also have a typeclass-based FieldsDescriptor that lets us implement such text formats correctly.

We need to migrate everything to the new approach. We also need to make FieldsDescriptor composable, so that users can more easily add their own implementations and interoperate with the macros.
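Here is a minimal sketch of a composable, typeclass-based row format. It assumes a hypothetical RowCodec typeclass and Tsv helper; neither is scalding's actual FieldsDescriptor API. Each type declares how many columns it occupies and how to encode and decode them. Tuple instances are derived from their components' instances, so users only write instances for their own leaf types. The column layout is resolved at compile time through implicits rather than by reflection.

```scala
// Hypothetical typeclass for illustration; NOT scalding's FieldsDescriptor API.
trait RowCodec[T] {
  def width: Int                    // number of columns a T occupies
  def encode(t: T): List[String]    // produces exactly `width` columns
  def decode(cols: List[String]): T // expects exactly `width` columns
}

object RowCodec {
  def apply[T](implicit c: RowCodec[T]): RowCodec[T] = c

  // Builds a single-column codec from a pair of conversion functions.
  def column[T](to: T => String, from: String => T): RowCodec[T] = new RowCodec[T] {
    val width: Int = 1
    def encode(t: T): List[String] = List(to(t))
    def decode(cols: List[String]): T = from(cols.head)
  }

  implicit val intCodec: RowCodec[Int] = column[Int](_.toString, _.toInt)
  implicit val stringCodec: RowCodec[String] = column[String](identity, identity)

  // Composition: a pair's codec is built from its components' codecs,
  // so nested tuples are handled by implicit recursion.
  implicit def pairCodec[A, B](implicit a: RowCodec[A], b: RowCodec[B]): RowCodec[(A, B)] =
    new RowCodec[(A, B)] {
      val width: Int = a.width + b.width
      def encode(t: (A, B)): List[String] = a.encode(t._1) ++ b.encode(t._2)
      def decode(cols: List[String]): (A, B) = {
        val (left, right) = cols.splitAt(a.width)
        (a.decode(left), b.decode(right))
      }
    }
}

// Hypothetical helper: tab-separated formatting driven entirely by the typeclass.
object Tsv {
  def format[T](t: T)(implicit c: RowCodec[T]): String = c.encode(t).mkString("\t")

  def parse[T](line: String)(implicit c: RowCodec[T]): T = {
    val cols = line.split("\t", -1).toList // -1 keeps trailing empty columns
    require(cols.length == c.width, s"expected ${c.width} columns, got ${cols.length}")
    c.decode(cols)
  }
}
```

With this design, supporting a new column type means defining one implicit RowCodec for it. Nested tuples such as ((Int, String), Int) then work with no further code.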

Improve the Serialization macros

Updated Oct 7, 2017

The current OrderedSerialization and Serialization macros have several problems:

  1. They do not use implicit recursion. If a nested case class contains a custom type, the macro cannot generate an OrderedSerialization for it. This confuses users and blocks some use cases.
  2. They do not compose as well as we would like. Without a length header or other workarounds, comparing a Tuple2 requires reading both elements of the tuple, even when the second element has a static (or easily readable) size.
  3. The imports are very unusual. Getting the macro implementations requires importing a deeply nested function that is nearly undiscoverable.

Where possible, we want to avoid breaking source compatibility for the most common use cases, but we do want to address these issues.
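These issues can be illustrated with a small hand-written typeclass. This is a hedged sketch that assumes a hypothetical OrdSer typeclass, not scalding's real OrderedSerialization, and it makes three points:

- The Tuple2 instance is derived by implicit recursion from whatever component instances are in scope. A user-supplied instance for a custom type would be picked up the same way.
- Each instance reports a static size when it has one. Once the first elements differ, compareBinary uses that size to skip the second element without reading it.
- The instances live in the companion object, so they are found through implicit scope with no import at all.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

// Hypothetical typeclass for illustration; NOT scalding's OrderedSerialization API.
trait OrdSer[T] {
  def write(t: T, buf: ByteBuffer): Unit
  def read(buf: ByteBuffer): T
  // Some(n) when every value serializes to exactly n bytes.
  def staticSize: Option[Int]
  // Compares two serialized values at each buffer's current position and
  // advances both buffers past the values they hold.
  def compareBinary(a: ByteBuffer, b: ByteBuffer): Int
}

// Instances in the companion object are found via implicit scope: no import needed.
object OrdSer {
  implicit val intOrdSer: OrdSer[Int] = new OrdSer[Int] {
    def write(t: Int, buf: ByteBuffer): Unit = buf.putInt(t)
    def read(buf: ByteBuffer): Int = buf.getInt()
    val staticSize: Option[Int] = Some(4)
    def compareBinary(a: ByteBuffer, b: ByteBuffer): Int =
      Integer.compare(a.getInt(), b.getInt())
  }

  // Length-prefixed UTF-8: the size is only known after reading the prefix.
  implicit val stringOrdSer: OrdSer[String] = new OrdSer[String] {
    def write(t: String, buf: ByteBuffer): Unit = {
      val bytes = t.getBytes(StandardCharsets.UTF_8)
      buf.putInt(bytes.length)
      buf.put(bytes)
    }
    def read(buf: ByteBuffer): String = {
      val bytes = new Array[Byte](buf.getInt())
      buf.get(bytes)
      new String(bytes, StandardCharsets.UTF_8)
    }
    val staticSize: Option[Int] = None
    def compareBinary(a: ByteBuffer, b: ByteBuffer): Int = read(a).compareTo(read(b))
  }

  // Implicit recursion: a Tuple2 instance is built from the component instances,
  // which may themselves be derived (nested tuples) or user-supplied.
  implicit def tuple2OrdSer[A, B](implicit oa: OrdSer[A], ob: OrdSer[B]): OrdSer[(A, B)] =
    new OrdSer[(A, B)] {
      def write(t: (A, B), buf: ByteBuffer): Unit = { oa.write(t._1, buf); ob.write(t._2, buf) }
      def read(buf: ByteBuffer): (A, B) = { val first = oa.read(buf); (first, ob.read(buf)) }
      val staticSize: Option[Int] = for (x <- oa.staticSize; y <- ob.staticSize) yield x + y
      def compareBinary(a: ByteBuffer, b: ByteBuffer): Int = {
        val first = oa.compareBinary(a, b)
        if (first != 0) {
          // The order is already decided; skip the second element in both buffers.
          ob.staticSize match {
            case Some(n) => // static size: just move the positions, no decoding
              a.position(a.position() + n)
              b.position(b.position() + n)
            case None => // dynamic size: must read the element to find its end
              ob.read(a)
              ob.read(b)
          }
          first
        } else ob.compareBinary(a, b)
      }
    }
}
```

In this sketch, OrdSer[((Int, String), Int)] is resolved with no user code. An (Int, Int) instance reports a static size of 8 bytes, which an enclosing tuple could use in turn.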

Modularize the typed API

Updated Feb 15, 2018

Make the typed API independent of Cascading, allowing other backends such as Spark and Flink. Modularize the optimizer so that rules can be shared across backends.
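One way to picture this goal, sketched under assumptions, has three parts:

- a backend-neutral plan type;
- optimizer rules written only against that plan;
- backends implemented as separate interpreters of the plan.

Plan, Rule, ComposeMaps, Optimizer and LocalBackend are hypothetical names for illustration. They are not scalding's TypedPipe or its optimizer API. A Spark or Flink backend would be another interpreter of the same Plan and would reuse the same rules.

```scala
// Hypothetical backend-neutral pipeline description (not scalding's TypedPipe).
sealed trait Plan[+T]
final case class Source[T](name: String) extends Plan[T]
final case class Mapped[A, T](input: Plan[A], fn: A => T) extends Plan[T]
final case class Filtered[T](input: Plan[T], pred: T => Boolean) extends Plan[T]

// An optimizer rule is a partial rewrite of a Plan; it knows nothing about backends.
trait Rule { def apply[T](p: Plan[T]): Option[Plan[T]] }

// Example rule: fuse two consecutive maps into one.
object ComposeMaps extends Rule {
  def apply[T](p: Plan[T]): Option[Plan[T]] = p match {
    case Mapped(Mapped(in, f), g) => Some(Mapped(in, f andThen g))
    case _                        => None
  }
}

object Optimizer {
  // Optimizes children first, then applies the first matching rule at this node,
  // repeating until no rule fires.
  def optimize[T](p: Plan[T], rules: List[Rule]): Plan[T] = {
    val withKids: Plan[T] = p match {
      case Mapped(in, f)    => Mapped(optimize(in, rules), f)
      case Filtered(in, pr) => Filtered(optimize(in, rules), pr)
      case s                => s
    }
    rules.iterator.map(r => r(withKids)).collectFirst { case Some(q) => q } match {
      case Some(q) => optimize(q, rules)
      case None    => withKids
    }
  }
}

// One possible backend: evaluate the plan in memory over named input lists.
object LocalBackend {
  def run[T](p: Plan[T], sources: Map[String, List[Any]]): List[T] = p match {
    case Source(name)     => sources(name).asInstanceOf[List[T]]
    case Mapped(in, f)    => run(in, sources).map(f)
    case Filtered(in, pr) => run(in, sources).filter(pr)
  }
}
```

Keeping rules as plain Plan-to-Plan rewrites is what lets them be shared: no rule refers to Cascading, Spark or Flink types.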