RFC: convert global opcode string pointers to offsets #6206

dvyukov · 2023-07-15T12:11:59Z

The goal of this change is to reduce drdecode library size. Static library size is reduced by ~200KB (18%) and runtime VM footprint is reduced by ~50KB (10%).

    FILE SIZE        VM SIZE
 --------------  --------------
  +139% +36.2Ki  +140% +36.2Ki    .rodata
  +0.4%    +100  [ = ]       0    .strtab
  +6.1%     +79  [ = ]       0    .comment
  +0.2%     +72  [ = ]       0    .symtab
  +5.9%     +64  [ = ]       0    [ELF Header]
  +4.7%     +60  [ = ]       0    [AR Headers]
  +0.1%     +48  [ = ]       0    .rela.text
  +7.3%      +4  [ = ]       0    .llvm_addrsig
  -0.2%    -220  -0.2%    -211    .text
  -0.7%    -576  [ = ]       0    [ELF Section Headers]
 -87.2% -28.2Ki -87.2% -28.2Ki    .rodata.str1.1
 -19.8% -62.9Ki -19.8% -62.9Ki    .data.rel.ro
 -63.5%  -151Ki  [ = ]       0    .rela.data.rel.ro
 -18.6%  -206Ki -10.5% -55.1Ki    TOTAL

Any pointers in .data/.rodata take additional 24 bytes in the binary for a relocation (on top of the pointer and pointee themselves).
This change replaces opcode names in decoding tables with offsets into a specially created string that contains all opcodes. This removes the 24-byte relocation and reduces 8-byte pointer with a 2-byte offset (though, this is not necessary realized in this change due to alignment padding, but can be).

The idea behind the approach is to keep the source code as readable and as maintainable as possible, while still replacing direct string literal pointers with offsets.

The crux of the approach is as follows.
A string "foo" is replaced with STROFF(foo).
A code generator extracts all STROFF(.*) from source files and generates a single global concatenated string that contains all individual strings. Plus it generates STROFF_foo macro that expands to the offset of the string "foo" in the global string. With the addition of:

#define STROFF(foo) STROFF_ ## foo

the source code does not need any additional modifications.

The downside is that strings cannot contain any characters that cannot appear in C identifiers. This is not a problem for real opcodes, but affects some fake internal names (e.g. "(bad)" is converted to "BAD").

If this change looks good in general, I will work on productionizing it:

convert other arches
integrate code generator into the build system
rewrite the script in Python

Also:

does the STROFF prefix for everything looks good? should I change it to something else?
suggestions for conversion rules for current strings that are not valid C identifiers are welcome
potentially all strings can be kept as is, if we do: STROFF(bad, "(bad)"), normal opcodes can still be STROFF(mov), but not sure if it's worth it (?)

dvyukov · 2023-07-15T12:13:03Z

CC @melver

The goal of this change is to reduce drdecode library size. Static library size is reduced by ~200KB (18%) and runtime VM footprint is reduced by ~50KB (10%). FILE SIZE VM SIZE -------------- -------------- +139% +36.2Ki +140% +36.2Ki .rodata +0.4% +100 [ = ] 0 .strtab +6.1% +79 [ = ] 0 .comment +0.2% +72 [ = ] 0 .symtab +5.9% +64 [ = ] 0 [ELF Header] +4.7% +60 [ = ] 0 [AR Headers] +0.1% +48 [ = ] 0 .rela.text +7.3% +4 [ = ] 0 .llvm_addrsig -0.2% -220 -0.2% -211 .text -0.7% -576 [ = ] 0 [ELF Section Headers] -87.2% -28.2Ki -87.2% -28.2Ki .rodata.str1.1 -19.8% -62.9Ki -19.8% -62.9Ki .data.rel.ro -63.5% -151Ki [ = ] 0 .rela.data.rel.ro -18.6% -206Ki -10.5% -55.1Ki TOTAL Any pointers in .data/.rodata take additional 24 bytes in the binary for a relocation (on top of the pointer and pointee themselves). This change replaces opcode names in decoding tables with offsets into a specially created string that contains all opcodes. This removes the 24-byte relocation and reduces 8-byte pointer with a 2-byte offset (it takes the alignment padding in instr_info_t, so effectively -8 bytes). The idea behind the approach is to keep the source code as readable and as maintainable as possible, while still replacing direct string literal pointers with offsets. The crux of the approach is as follows. A string "foo" is replaced with STROFF(foo). A code generator extracts all STROFF(.*) from source files and generates a single global concatenated string that contains all individual strings. Plus it generates STROFF_foo macro that expands to the offset of the string "foo" in the global string. With the addition of: define STROFF(foo) STROFF_ ## foo the source code does not need any additional modifications. The downside is that strings cannot contain any characters that cannot appear in C identifiers. This is not a problem for real opcodes, but affects some fake internal names (e.g. "(bad)" is converted to "BAD"). If this change looks good in general, I will work on productionizing it: - convert other arches - integrate code generator into the build system - rewrite the script in Python Also: - does the STROFF prefix for everything looks good? should I change it to something else? - suggestions for conversion rules for current strings that are not valid C identifiers are welcome - potentially all strings can be kept as is, if we do: STROFF(bad, "(bad)") normal opcodes can still be STROFF(mov), but not sure if it's worth it

Change instr_info_t::code from ptr_int_t to uint. This strips ~128K from drdecode binary and ~50K from runtime VM overhead: FILE SIZE VM SIZE -------------- -------------- +21% +5.60Ki +22% +5.61Ki .rodata +1.5% +720 [ = ] 0 .symtab +1.8% +431 [ = ] 0 .strtab +0.5% +384 [ = ] 0 [ELF Section Headers] +0.5% +264 [ = ] 0 .rela.text +0.1% +117 +0.1% +137 .text +31% +17 [ = ] 0 .llvm_addrsig +0.2% +15 [ = ] 0 [Unmapped] +0.0% +4 [ = ] 0 .rodata.str1.1 -19.3% -61.3Ki -19.3% -61.3Ki .data.rel.ro -31.0% -73.9Ki [ = ] 0 .rela.data.rel.ro -11.5% -127Ki -10.6% -55.6Ki TOTAL The idea is to change the pointer to table number + index in the table encoded as a single uint. This removes the need in runtime relocations for pointers (each taking 24 bytes in the binary). Along with DynamoRIO#6206 this should also shrink sizeof(instr_info_t) from 48 bytes to 32 bytes (not fully realized w/o DynamoRIO#6206 due to alignment). If this change looks good in general, I will work on productionizing it and update other arches. Format of the uint encoding is discussable. For example, if we collect all instr_info_t into a single global array (and then make individual tables pointers into it), then the encoding can be made simpler and faster, but will require more significant changes.

derekbruening · 2023-07-17T20:05:13Z

core/ir/decode.h

@@ -41,6 +41,7 @@
 #define DECODE_H


@abhinav92003 and I discussed this one.

We assume PIC/PIE is a requirement for your use case and building a version of the library w/o relocations is not a possible solution?

Are there possible compiler optimizations to help here: I guess duplicate string literal and string shared suffix optimizations don't help much and we'd need the compiler to convert our pointer uses into offsets.

The code readability impact here isn't too bad; mostly it's a maintenance burden as now a script has to be run. The script would be run as part of the build process, or a dev would run it manually whenever adding new entries?

but affects some fake internal names (e.g. "(bad)" is converted to "BAD").

Could we have some indication that "BAD" is not really an opcode name? "_BAD_"?

Worth getting feedback from @khuey as well

We assume PIC/PIE is a requirement for your use case and building a version of the library w/o relocations is not a possible solution?

Yes, we need to support all build modes.

Are there possible compiler optimizations to help here

I can't think of any. The main problem is with relocations rather than deduping/etc.

The script would be run as part of the build process, or a dev would run it manually whenever adding new entries?

The intention is that it will run automatically, but I din't look into how easy it is to integrate into cmake, I assume should have required support.

Could we have some indication that "BAD" is not really an opcode name? "BAD"?

We sure can adjust them any way you want (and these will also be 100% chanable in future patches).

One other option is to make name char[16] (embed right in the struct).
Longest real x86 instruction I see are vpbroadcastmw2d/vbroadcasti64x4/vscatterpf1dpd. So 16 should be enough.
The compiler won't ensure that it's 0-terminated, but it can be reliably verified in tests.

One other option is to make name char[16] (embed right in the struct).

I like that: but it doesn't save as much space?

If we have 6778 table entries: the original proposal removed the 24 bytes of relocs plus 6 bytes for pointer-to-2-byte-offset which is 6778*30 = 203340 == your 200K savings.

Here we'd go from the 8-byte pointer to a 16-byte string, but removing the 24 reloc bytes, for a net reduction of 6778*16 = 108448 == just 100K. But the strings themselves disappear: so if the strings averaged 6 chars with newline we're saving 6778*22 = 149116. Is 150K savings enough?

Right. I did a quick prototype and this version saves 76K on top of char[16] version:

FILE SIZE VM SIZE -------------- -------------- +43% +18.9Ki +44% +18.9Ki .rodata -27.3% -96.1Ki -27.3% -96.1Ki .data.rel.ro -7.8% -76.1Ki -14.1% -77.1Ki TOTAL

@khuey do you have any feedback/objections to this and #6211?

dvyukov force-pushed the dvyukov-stroff branch from 4fe59fc to 9440ee3 Compare July 15, 2023 12:54

dvyukov mentioned this pull request Jul 17, 2023

RFC: encode instr_info_t::code as uint #6211

Draft

dvyukov marked this pull request as draft July 17, 2023 13:15

derekbruening reviewed Jul 17, 2023

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: convert global opcode string pointers to offsets #6206

RFC: convert global opcode string pointers to offsets #6206

dvyukov commented Jul 15, 2023

dvyukov commented Jul 15, 2023

derekbruening Jul 17, 2023

derekbruening Jul 17, 2023

dvyukov Jul 19, 2023

dvyukov Jul 19, 2023

derekbruening Jul 19, 2023 •

edited

Loading

dvyukov Jul 20, 2023

dvyukov Jul 20, 2023

RFC: convert global opcode string pointers to offsets #6206

Are you sure you want to change the base?

RFC: convert global opcode string pointers to offsets #6206

Conversation

dvyukov commented Jul 15, 2023

dvyukov commented Jul 15, 2023

derekbruening Jul 17, 2023

Choose a reason for hiding this comment

derekbruening Jul 17, 2023

Choose a reason for hiding this comment

dvyukov Jul 19, 2023

Choose a reason for hiding this comment

dvyukov Jul 19, 2023

Choose a reason for hiding this comment

derekbruening Jul 19, 2023 • edited Loading

Choose a reason for hiding this comment

dvyukov Jul 20, 2023

Choose a reason for hiding this comment

dvyukov Jul 20, 2023

Choose a reason for hiding this comment

derekbruening Jul 19, 2023 •

edited

Loading