Buffer includes issue with escaped characters [closed]

Question

Closed. This question needs debugging details. It is not currently accepting answers.

Closed 10 days ago.

In the context of a Babel plugin, I am reading a Buffer from a file to check some of its content.

I'm looking specifically for the following string ɵɵfoobar escaped as \u0275\u0275foobar.

When I print my buffer with myBuffer.toString(), I can see:

...

[...\u0275\u0275foobar()]
...

But when I check the content, I never get a positive match. I've tried the following:

myBuffer.includes('ɵɵfoobar')
myBuffer.includes('\u0275\u0275foobar')
myBuffer.toString().includes('ɵɵfoobar')

Also note that myBuffer.includes('foobar()') returns true.

Any idea what I'm doing wrong?

You sure it’s not actual backslashes in the source? myBuffer.includes('\\u0275\\u0275foobar') Anyway, please reduce this to a minimal reproducible example – .subarray, .toString('hex'), and Buffer.from('…', 'hex') can help with that. — Ry-, Commented Jul 8 at 1:08
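The check suggested in the comment above can be sketched as follows. This is only an illustration under an assumption: the sample content is hypothetical and stands in for a file that contains the six literal characters \u0275 rather than the character ɵ itself. It uses only subarray, toString('hex') and Buffer.from(..., 'hex'), as named in the comment.

```javascript
// Hypothetical file content: literal backslash escapes, not the character ɵ.
const source = Buffer.from('[\\u0275\\u0275foobar()]', 'utf8');

// Searching for the real character fails, because its bytes are not in the buffer.
const hasRealChars = source.includes('\u0275\u0275foobar');

// Searching for the literal backslash sequence succeeds.
const hasEscapes = source.includes('\\u0275\\u0275foobar');

// Dump a small slice as hex to see exactly which bytes are present.
const hex = source.subarray(1, 7).toString('hex');

// Rebuild the slice from hex, which is handy for sharing a minimal reproduction.
const roundTrip = Buffer.from(hex, 'hex').toString();

console.log(hasRealChars, hasEscapes, hex, roundTrip);
// false true 5c7530323735 \u0275
```

A hex dump starting with 5c 75 (backslash, then u) would indicate literal escapes in the file rather than an encoding mismatch.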

Luke · Accepted Answer · 2024-07-08 01:03:49Z

The character you are looking for (ɵ) is code point U+0275. In UTF-16 it is encoded as the single code unit 0x0275, while in UTF-8 it is the two-byte sequence 0xC9 0xB5. Since you're looking for it as 0x0275, this suggests the file you've read into the buffer is encoded in UTF-16.
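To make the byte-level difference concrete, here is a minimal sketch that encodes the same character both ways and prints the resulting bytes as hex. The variable names are illustrative only.

```javascript
// U+0275 (ɵ), written as a JavaScript escape so the source file's own encoding does not matter.
const ch = '\u0275';

// UTF-8 uses two bytes for this code point.
const utf8Hex = Buffer.from(ch, 'utf8').toString('hex');

// UTF-16LE stores the 16-bit code unit 0x0275 with the low byte first.
const utf16Hex = Buffer.from(ch, 'utf16le').toString('hex');

console.log(utf8Hex, utf16Hex);
// c9b5 7502
```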

Node.js's Buffer.toString() accepts an encoding as its first parameter, and the default value is 'utf8'. To match against the UTF-16 form \u0275, you must therefore pass 'utf16le' as the first argument (e.g. buffer.toString('utf16le')). Note that Node.js only supports the little-endian variant of UTF-16.
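A minimal sketch of that approach, assuming the buffer really does hold UTF-16LE data (the sample buffer here is built by hand for illustration, not read from a file). Buffer.includes() also accepts an encoding as its third argument, so the search can be done without decoding the whole buffer first.

```javascript
// Hypothetical buffer holding UTF-16LE text.
const utf16Buffer = Buffer.from('[\u0275\u0275foobar()]', 'utf16le');

// Decode with the matching encoding, then search the resulting string.
const decodedMatch = utf16Buffer.toString('utf16le').includes('\u0275\u0275foobar');

// Or search the raw bytes, telling includes() how to encode the search string.
const includesUtf16 = utf16Buffer.includes('\u0275\u0275foobar', 0, 'utf16le');

// With the default 'utf8' encoding the search string becomes different bytes, so nothing matches.
const includesDefault = utf16Buffer.includes('\u0275\u0275foobar');

console.log(decodedMatch, includesUtf16, includesDefault);
// true true false
```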

myBuffer.includes('foobar()') returns true because every character in the string 'foobar()' is represented the same way in ASCII as in UTF-8 (e.g. 'f' is encoded as 0x66). ASCII is a proper subset of UTF-8, and UTF-8 is the default encoding of both Buffer.toString() and Buffer.includes().
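The ASCII and UTF-8 overlap can be checked directly. The following is a small sketch with illustrative variable names; the last buffer is a hypothetical UTF-8 sample, not the asker's file.

```javascript
// The same ASCII-only string encoded two ways yields identical bytes.
const asciiBytes = Buffer.from('foobar()', 'ascii');
const utf8Bytes = Buffer.from('foobar()', 'utf8');
const sameBytes = asciiBytes.equals(utf8Bytes);

// 'f' is the single byte 0x66 in both encodings.
const firstByte = utf8Bytes[0].toString(16);

// In a UTF-8 buffer, the ASCII part is found regardless of the non-ASCII characters around it.
const utf8Sample = Buffer.from('[\u0275\u0275foobar()]', 'utf8');
const utf8Includes = utf8Sample.includes('foobar()');

console.log(sameBytes, firstByte, utf8Includes);
// true 66 true
```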

If you're curious, this post on ASCII vs Unicode has some great jumping off points for some differences and encoding concepts.

@MatthieuRiegler can you elaborate on what "messes up the output" means for your situation? What example string was in the buffer as input, and what outputs are you getting with UTF-8 and UTF-16, respectively? — Luke, Commented Jul 8 at 1:07
