
Convert APIs to always use forge ByteBuffer or a new forge buffer class #146

dlongley opened this issue Jul 18, 2014 · 17 comments

@dlongley
Member

To make the forge APIs more consistent and to better take advantage of the features available on various platforms (e.g., TypedArrays), the forge APIs should be changed to work with forge ByteBuffers or a new forge *Buffer class (name TBD) instead of binary-encoded strings where applicable. The general approach should always be to accept a forge *Buffer and output a forge *Buffer, but APIs can be more flexible as needed; however, if a string can be accepted, then an encoding parameter should be provided or a default should be spelled out clearly in the documentation.

Work has begun on a TypedArray-based ByteBuffer as an alternative to the binary-encoded-string-based ByteBuffer -- this should be advanced alongside the work to convert APIs to *Buffer inputs/outputs. Some of this new work started returning TypedArrays, but it should be changed to return *Buffers by default.
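
The input convention described above could be sketched roughly like this. All names here (`ByteBuffer`, `normalizeInput`) are illustrative, not forge's actual API; this just shows the accept-buffer-or-string-plus-encoding pattern under discussion:

```javascript
// Hypothetical sketch: APIs accept either a forge-style buffer or a string
// plus an explicit encoding, and normalize to a buffer internally.
class ByteBuffer {
  constructor(bytes) {
    // store raw bytes as an array of byte values (0-255)
    this.data = bytes || [];
  }
  static fromString(str, encoding) {
    if (encoding === 'utf8') {
      return new ByteBuffer(Array.from(Buffer.from(str, 'utf8')));
    }
    // 'binary' treats each char code as one byte
    return new ByteBuffer(Array.from(str, c => c.charCodeAt(0) & 0xff));
  }
  toString(encoding) {
    if (encoding === 'utf8') {
      return Buffer.from(this.data).toString('utf8');
    }
    return String.fromCharCode(...this.data);
  }
}

// An API entry point following the proposed convention: accept a buffer,
// or a string with an encoding that defaults to 'binary' for compatibility.
function normalizeInput(input, encoding = 'binary') {
  if (input instanceof ByteBuffer) {
    return input;
  }
  if (typeof input === 'string') {
    return ByteBuffer.fromString(input, encoding);
  }
  throw new TypeError('expected ByteBuffer or string');
}
```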

@jduncanator
Contributor

@dlongley I saw that code. Was about to start implementing something myself. What isn't done? I'd happily keep working on this if you could point out which areas are still "experimental".

I think the general idea of implementing an "interface" of sorts (sorry, I come from OO land) is a good idea: have a generic Buffer base class that exposes all the methods and then implement them in "sub-classes". The majority of the forge API wouldn't even need to know which implementation it was using; it would just use the generic Buffer class, which would handle the feature detection itself.

@dlongley
Member Author

dlongley commented Aug 5, 2014

> I saw that code. Was about to start implementing something myself. What isn't done? I'd happily keep working on this if you could point out which areas are still "experimental".

Here's what needs to be done:

  1. Settle on a name for the new ByteBuffer or decide to reuse that name. As usual, naming is hard -- we want to accurately convey that this thing is an interface for storing binary data without confusing it with node Buffers (as forge is often used in node.js), without confusing it with some kind of bitstream, allowing for the name to be consistent with providing access to the underlying bytes via getInt*() calls, and making clear that this thing can grow or shrink (though we may offer an option to lock it to a particular size).
  2. I want the new implementation to allow a ByteBuffer to simply wrap an existing ArrayBuffer (or its view) to prevent unnecessary data copying in certain scenarios. This means figuring out how to deal with auto-grow as well. By "figuring out" I mean ensuring that it doesn't behave in an unexpected way.
  3. We need to find the places in the forge API that do not accept a ByteBuffer as input and fix that. That means allowing those API calls to accept both a ByteBuffer and a binary-encoded string (for backwards compatibility). We also want to add a string 'encoding' parameter in these cases to allow a different encoding (e.g. 'utf8') to be specified, with 'binary' as the default. It would be nice for 'utf8' to be the default (like node); however, I think backwards compatibility is more important. I imagine that once we finish updating everything to accept ByteBuffers (or their alternative name), we can do a major version bump and change the string encoding defaults to 'utf8'. Note that I'm not a fan of functions in JS that have lots of parameters -- so when that's the case, we prefer the "options" parameter approach, which essentially allows you to provide named parameters instead. If we need to add 'encoding' to some of these APIs, we probably want to take that route.
  4. Some of the forge API calls return binary-encoded strings -- which may make it difficult to update them to return ByteBuffers instead. I suspect that most of the functions that behave that way (if not all of them) also take a binary-encoded string as input, so we may experiment with returning ByteBuffers if ByteBuffers are given and strings if strings are given -- as a way to keep things backwards compatible.
  5. I started trying to clarify how binary/string encoding/decoding works with some better-named APIs; I don't remember how much more work is required there. Some tests are certainly needed.
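
The "options" parameter approach mentioned in point 3 could look something like this. The function name and option names are hypothetical; the point is the named `encoding` option with a backwards-compatible `'binary'` default:

```javascript
// Sketch: instead of adding an 'encoding' positional parameter to an
// already-long signature, accept an options object with named parameters.
function digestInput(input, options = {}) {
  // default to 'binary' for backwards compatibility, per the discussion
  const encoding = options.encoding || 'binary';
  let bytes;
  if (typeof input === 'string') {
    bytes = encoding === 'utf8'
      ? Buffer.from(input, 'utf8')
      : Buffer.from(input, 'latin1'); // node's 'binary' is an alias for latin1
  } else {
    bytes = Buffer.from(input);
  }
  return bytes.length; // placeholder for real work on the bytes
}
```

A caller opting into UTF-8 would write `digestInput(str, {encoding: 'utf8'})`, while existing string-based calls keep their old behavior.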

IIRC, I tried to mark the experimental stuff as "Experimental" in the docs in the code, so hopefully you can search for functions marked as such.

> I think the general idea of implementing an "interface" of sorts (sorry, I come from OO land) is a good idea.

Agreed, this is a good use case for an interface. But, never go full-OO :). A better way to put it -- OO is a useful paradigm... for exactly this kind of problem. I find it troubling when it is used as a panacea because, in general, the end result is that people end up writing a lot of code that does nothing (it's all abstraction code) and that leaves far less time for actually useful code.

> The majority of the forge API wouldn't need to even know which implementation it was using, it would just use the generic Buffer class which would handle the feature detection itself.

Yeah, we just want forge to be using an interface for the (Byte)(Something)(Buffer). This is probably the best first step -- getting the APIs to be consistently using this interface for inputs and outputs. Then we can focus on getting the TypedArray/DataView version fixed up (I see you mentioned it in another issue). This would also aid in letting us get some performance tests written so we can swap them out and see how things perform in various browsers and in node. We can then set some sane defaults based on platform -- but allow the user to configure which underlying implementation they want to use if they really want to.
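
The feature-detection idea could be sketched like this, assuming a factory that picks a TypedArray-backed implementation when the platform supports it and falls back to a string-backed one otherwise. All class names here are illustrative:

```javascript
// Two backing stores behind one interface (putByte/length).
class StringBackedBuffer {
  constructor() { this.data = ''; }
  putByte(b) { this.data += String.fromCharCode(b & 0xff); return this; }
  length() { return this.data.length; }
}

class TypedArrayBackedBuffer {
  constructor() { this.data = new Uint8Array(16); this.len = 0; }
  putByte(b) {
    if (this.len === this.data.length) {
      // auto-grow by doubling the backing store
      const bigger = new Uint8Array(this.data.length * 2);
      bigger.set(this.data);
      this.data = bigger;
    }
    this.data[this.len++] = b & 0xff;
    return this;
  }
  length() { return this.len; }
}

// Callers only see createBuffer(); the implementation choice is internal,
// and could later be overridden by user configuration as described above.
function createBuffer() {
  return typeof Uint8Array !== 'undefined'
    ? new TypedArrayBackedBuffer()
    : new StringBackedBuffer();
}
```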

There's probably more I could brain dump in here, but I'll respond to some other issues now.

Btw, @jduncanator, it would be awesome if you want to contribute!

@dlongley
Member Author

dlongley commented Aug 5, 2014

We also need more unit tests and I'd like to have some decent performance tests that could be run to check for regressions, etc. It would be nice to have those before making changes to establish some base lines, but if that's not possible, I can accept that.

@jduncanator
Contributor

@dlongley The reason I'm here is to contribute 😉 I'd be more than willing to work on this, considering this is an area where I actually know what I'm doing 😛

The "design" of the interface is really what needs to be decided on here, not so much the implementation. Ideally we'd do an audit of the code base, single out which parts of the API don't accept ForgeBuffers, and change them so they do.

To keep things backwards compatible, I think we should borrow something from Node.js: return binary strings by default, but allow the user to specify an "output" encoding as well. For example, cipher.output.bytes() would continue to return a binary string, but cipher.output.bytes('buffer') would return a ForgeBuffer (or whatever). This way it behaves perfectly as a drop-in replacement, but also lets the user "transition" over to the more performant APIs. Alternatively, you could implement the toString and valueOf methods of the object and enable a sort of "implicit casting" when the object is used like a string, while the object itself is actually a ForgeBuffer.

Before all of this, we should probably be asking ourselves whether we actually want to implement backwards compatibility, or whether we release a few versions with deprecation warnings instead. The main reasoning is that, like most things on the web, if it ain't broken, no one touches it. If we give people a way to use .bytes('buffer') and later want to switch the default to returning a ForgeBuffer, people won't have a reason to change their code. Much like vendor prefixes: people don't touch their code, even after they don't need the prefixes anymore. It's all good discussion, but I think the compatibility question needs further thought and investigation (e.g. how are people currently using the library?).
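
The opt-in output encoding described above could be sketched like this. The `OutputBuffer` class and its internals are hypothetical, not forge's actual implementation:

```javascript
// Sketch: bytes() keeps returning a binary string by default, but
// bytes('buffer') hands back the buffer object itself, letting callers
// migrate at their own pace.
class OutputBuffer {
  constructor(data) {
    this.data = data; // data: array of byte values (0-255)
  }
  bytes(encoding) {
    if (encoding === 'buffer') {
      return this; // opt-in: no string conversion, no data copy
    }
    // default: binary-encoded string, as existing callers expect
    return String.fromCharCode(...this.data);
  }
}
```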

I think the name shouldn't really matter. Ideally we would try and maintain the current ForgeBuffer API and simply implement it in a different way internally. That way the user doesn't even need to care that things changed, they just see a more performant ForgeBuffer (by ForgeBuffer I'm referring to the existing Buffer implementation). Personally, I think that the ForgeBuffer should be completely transparent and should be the only thing consumers should ever have to create and interface with, with the library managing the "backing" store internally.

I'll start working on a unified buffer implementation that tries to be as much backwards compatible as possible based around the current string-backed buffers. Once that's done it should be trivial to implement TypedArrays.

Regarding performance and unit tests, I agree. We should most certainly be writing unit tests as we develop the new code, as well as develop regression tests for anything that breaks during the development. You can never have too many tests!

@jduncanator
Contributor

We should also try and get a decent encoding implementation developed! Rather than have the encoding code directly in the buffer implementation, modularize it and keep it separate; that makes issues easier to isolate and fix. I'd be more than happy to port the Node.js implementations over to forge!
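
Keeping encodings separate from the buffer could look roughly like a codec registry. This is an illustrative sketch, not forge's or Node's actual design:

```javascript
// Sketch: each codec lives in a small module-like registry entry, so the
// buffer implementation never needs to know encoding details.
const codecs = {
  binary: {
    encode: s => Uint8Array.from(s, c => c.charCodeAt(0) & 0xff),
    decode: b => String.fromCharCode(...b)
  },
  hex: {
    encode: s => Uint8Array.from(s.match(/../g) || [], h => parseInt(h, 16)),
    decode: b => Array.from(b, x => x.toString(16).padStart(2, '0')).join('')
  }
};

function encode(str, encoding) {
  return codecs[encoding].encode(str);
}

function decode(bytes, encoding) {
  return codecs[encoding].decode(bytes);
}
```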

@dlongley
Member Author

dlongley commented Aug 6, 2014

@jduncanator,

> The reason I'm here is to contribute.

Great! Also note that some work was started on this in js/util.js, for instance the ArrayBuffer-backed implementation: https://github.com/digitalbazaar/forge/blob/master/js/util.js#L630. In general, search for "Experimental" in that file. Just want to make sure you know there's something to work off of already.

You may have just been giving an example, but regarding changing buffer.bytes(), I'd actually just like to provide newer methods like buffer.toString([encoding]) and buffer.toTypedArray()/buffer.toArrayBuffer() (or similar). Then we'd deprecate buffer.bytes(), buffer.getBytes(), and any other more ambiguous APIs. I don't remember, but some of that experimental work may have already been heading in that direction.
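
The newer methods proposed above could be sketched as follows. The method names follow the comment; the deprecation mechanism and internals are illustrative assumptions:

```javascript
// Sketch: explicit conversion methods replace ambiguous accessors like
// bytes(), which is kept only as a deprecated alias.
class ByteBuffer {
  constructor(byteValues) {
    this.data = Uint8Array.from(byteValues);
  }
  toString(encoding = 'binary') {
    return encoding === 'utf8'
      ? Buffer.from(this.data).toString('utf8')
      : String.fromCharCode(...this.data);
  }
  toArrayBuffer() {
    // copy so callers cannot mutate the internal backing store
    return this.data.slice().buffer;
  }
  bytes() {
    // deprecated alias, retained for backwards compatibility
    console.warn('bytes() is deprecated; use toString() or toArrayBuffer()');
    return this.toString('binary');
  }
}
```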

I think the order of priorities ought to be:

  1. An API that makes sense -- as in, it is clear what the inputs/outputs probably are without reading the documentation (I'd like this as much as possible), and there are no surprises.
  2. An API that is familiar either to node devs or browser devs or both.
  3. An API that is backwards compatible.

Obviously, getting all three is optimal -- and there may have to be exceptions in certain cases.

I agree that we ought to do an audit on existing functions and categorize things like so:

Buffers as inputs:

  1. Functions that currently accept forge buffers as inputs.
  2. Functions that could accept forge buffers as inputs and maintain backwards compatibility fairly easily.
  3. Functions that can't easily be changed to accept forge buffers as inputs and will likely need to be deprecated.

Buffers as outputs (return values):

  1. Functions that output a forge buffer.
  2. Functions that could output a forge buffer if given one as an input and maintain backwards compatibility.
  3. Functions that can't easily adapt to outputting a forge buffer and will likely need to be deprecated.

@jduncanator
Contributor

@dlongley Introducing some new methods on top of the buffer is actually a really good idea. We would have almost zero issues with backwards compatibility, and we can make the API a lot more sane (currently the method names are, let's say, a little odd...)!

@Ayms

Ayms commented Aug 6, 2014

Some thoughts, suggestions:

  • Not saying that my past implementation should be a model, but basically all functions in it (including some mentioned above) that input or output strings should be modified to input and output buffers as well (keeping strings for backward compatibility), using a general "I am using strings/typed arrays" option (in my case I clearly had to distinguish, while handling results, whether I was using forge buffers or my typed arrays -- really painful...)
  • with typed arrays everything is UTF-8; use TextEncoder/TextDecoder (surprisingly still not available in Chrome and IE, though there is a Google implementation here: http://code.google.com/p/stringencoding/)
  • do not use DataViews for endianness purposes (quite slow with node, see https://github.com/Ayms/abstract-tls/blob/master/lib/abstract-tls.js#l273-334); reuse the forge functions (e.g.: Uint8Array.prototype.readUInt32LE = function(o) { return this[o] ^ this[o+1] << 8 ^ this[o+2] << 16 ^ this[o+3] << 24; };)
  • not sure the buffer base class shouldn't be Buffer itself, to align with node, so the same code works for both; my implementation does this, as well as aligning the crypto functions with node (createCipheriv, etc.), but not sure everybody would like this
  • not directly related to the buffers, but for SSL/TLS I would give up on the state machine model for the messages and move to a more evented model
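
The endianness one-liner from the list above can be expanded into a standalone function. Note the `>>> 0`, added here to coerce the signed bitwise result back to an unsigned 32-bit value, which the one-liner omits:

```javascript
// Assemble a 32-bit little-endian unsigned integer from four bytes
// without a DataView, using plain bitwise operations.
function readUInt32LE(bytes, offset) {
  return (bytes[offset] |
          (bytes[offset + 1] << 8) |
          (bytes[offset + 2] << 16) |
          (bytes[offset + 3] << 24)) >>> 0;
}
```
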
@dlongley
Member Author

Some of the buffer-centric API changes have begun on this branch: https://github.com/digitalbazaar/forge/tree/buffer-based-api

The plan is to just go ahead and break the API as needed to support requiring buffers as inputs and outputs and release a new minor version (we're still under 1.0 so it won't be a major change) so we'll be pushing out 0.7 with a changelog entry that indicates the breakages.

@dlongley dlongley self-assigned this Dec 9, 2014
@dlongley
Member Author

dlongley commented Dec 9, 2014

See #200 for some potential encoding optimizations.

@jduncanator
Contributor

Where are we at with this?

@dlongley
Member Author

@jduncanator,

We've made pretty decent progress. There's still more to do, but most of the work has been completed in the buffer-based-api branch. For a comparison to master see: https://github.com/digitalbazaar/forge/compare/buffer-based-api

We still need to:

  1. Keep going through APIs to make sure everything is using buffers, but most APIs are converted over.
  2. Clean up the buffer API itself, deprecating methods like getBytes and replacing them with things like toArrayBuffer or similar, that are more explicit about what is going on.

That branch will also become the 0.7.x branch as it encompasses a few more features from the roadmap than just buffer-related fixes/changes.

@UnsungHero97

any updates on toTypedArray() and toArrayBuffer() support?

@dlongley
Member Author

@UnsungHero97,

I'd say forge 0.7.x is about 80-90% done (which is where the TypedArray support has been added), but we've been spread a bit thin lately on a number of different higher-priority projects. As time frees up, we'll shift back and finish up.

https://github.com/digitalbazaar/forge/compare/0.7.x

@UnsungHero97

@dlongley thanks for the update!

@jduncanator
Contributor

👍

@CMCDragonkai

What happened to this issue?
