
international character sets #64

Open
pljakobs opened this issue Dec 27, 2015 · 5 comments

Comments

@pljakobs
Contributor

happy to see you adapted some of my code from ftGFX!
I'll have a look at the new code in Adafruit_GFX and, if you don't mind, might start contributing to this codebase (no use having multiple forks that do the same thing).
My first priority (and the actual reason I forked from Paul's mfGFX fork) was Latin-1 characters, and while working on the code I realized it would take quite a bit of effort to support more than just that one character set.
My approach was to create an i18n library that would provide the mapping functions.
The initial intention was to leave all the print / writeChar etc. routines the same (that is: not enable them for UTF-8) in order not to create a compatibility hell, but instead to front-end them with a UTF8toISO() call that translates UTF-8 to a predefined 256-character table. I chose this approach because it lets me keep the character bitmap in a contiguous block of memory and address it through a linear mapping table (the GFXglyph table in this version).
The alternative would be a full-blown UTF-8 implementation, but that would (a) mean potentially huge fonts with thousands of glyphs (not very useful on microcontrollers) and (b) require searching through a sparsely populated glyph table for the matching UTF-8 code point. I still believe that ISO-8859-x translation is the better way to go.
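A minimal sketch of that front-end translation, assuming a hypothetical utf8ToLatin1() helper (the name and signature are illustrative, not the library's actual API). Code points above U+00FF have no slot in a 256-entry table and are replaced with '?':

```c
#include <stdint.h>
#include <stddef.h>

// Hypothetical front-end translator (name and behavior are illustrative).
// Decodes UTF-8 from src into ISO-8859-1 bytes in dst; code points above
// U+00FF have no slot in a 256-entry table and become '?'.
// Returns the number of output bytes written (excluding the NUL terminator).
size_t utf8ToLatin1(const char *src, char *dst, size_t dstSize) {
  size_t n = 0;
  if (dstSize == 0) return 0;
  while (*src && n + 1 < dstSize) {
    uint8_t b = (uint8_t)*src++;
    uint32_t cp;
    if (b < 0x80) {                              // 1-byte sequence (ASCII)
      cp = b;
    } else if ((b & 0xE0) == 0xC0) {             // 2-byte sequence, U+0080..U+07FF
      cp = (uint32_t)(b & 0x1F) << 6;
      if (((uint8_t)*src & 0xC0) == 0x80) cp |= (uint8_t)*src++ & 0x3F;
    } else {                                     // 3/4-byte lead or stray byte:
      cp = 0xFFFD;                               // treat as unmappable and
      while (((uint8_t)*src & 0xC0) == 0x80) src++;  // skip continuation bytes
    }
    dst[n++] = (cp <= 0xFF) ? (char)cp : '?';
  }
  dst[n] = '\0';
  return n;
}
```

With something like this in place, print() and friends stay byte-oriented; only the call sites that receive UTF-8 input need the extra translation pass.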

So, I'm most happy to start working on this, but I would appreciate it if you could let me know whether you have any larger scheme that you would like i18n to fit into.

@PaintYourDragon
Contributor

Howdy. I think we're on the same page about UTF-8. While some of the newer MCUs may have a meg of flash or more and could probably hold huge charsets, there are enough other design compromises in GFX (intended as something to fit on an Uno) that trying to use it in a rich typographic application would just be misguided...more sensible in that situation to use a different library. It is what it is.

I'd taken a look at your encoding table scheme and intentionally avoided the issue by just converting 7-bit ASCII for now, the reason being to avoid painting ourselves into a corner too soon (basically, if we pick one 8-bit encoding, we kinda have to stick with it forever, or it creates compatibility headaches for users later, a la the CP437 bug & hack workaround). Maybe we pick a specific 8-bit encoding for the bundled fonts (or a set of encodings; e.g. maybe the monospaced font uses CP437 while the others are Latin-1, or maybe that's just ugly and inconsistent). Another thought that occurs to me is to provide maybe 2-3 different encodings and let the user pick whatever suits their application...or provide no 8-bit encodings by default, but let them use the fontconvert tool to produce their own .h as needed.

i.e. originally I was thinking the next step would be a second font file at each size/style, like this:
FreeMonoBold9pt7b.h
FreeMonoBold9pt8b.h
Where the second file uses some 'most broadly applicable' 8-bit encoding we've decided on, maybe Latin-1 or something.

But maybe instead we do:
FreeMonoBold9pt7b.h
FreeMonoBold9ptCP437.h
FreeMonoBold9ptLatin1.h
This would be a huge number of files in the repository (in the Fonts folder), but the conversion is automated (makefonts.sh), so it's not a horrible process, and the extra files only take code space if actually #included.

Or, "plan C" as mentioned above, just leave it at:
FreeMonoBold9pt7b.h
And let the user produce their own 8-bit font file(s) in whatever encoding(s) their application requires (folding your encoding tables into the fontconvert.c tool). (Might add a CP437 table...I know it's ridiculous in this day and age, but having it would give behavioral consistency with the built-in 5x7 font: same glyphs in same positions.)

When this is added, we might change the fontconvert command-line options so that instead of first & last char, they take an encoding name or table number (the default would be 7-bit ASCII unless an encoding is specified).
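The encoding-name option could be sketched as a small lookup table. This is purely illustrative: findEncoding, the Encoding struct, and the table entries are hypothetical, not the real fontconvert code. Unknown or missing names fall back to 7-bit ASCII, matching the proposed default:

```c
#include <string.h>

// Hypothetical registry of supported encodings (names and fields illustrative).
typedef struct {
  const char *name;
  int firstChar, lastChar;  // code range the table covers
} Encoding;

static const Encoding encodings[] = {
  { "ascii",  0x20, 0x7E },  // current default behavior (7-bit ASCII)
  { "latin1", 0x20, 0xFF },  // ISO-8859-1
  { "cp437",  0x20, 0xFF },  // legacy IBM PC set, for 5x7 consistency
};

// Look up an encoding by name; NULL or unknown names default to ASCII.
const Encoding *findEncoding(const char *name) {
  if (name) {
    for (size_t i = 0; i < sizeof encodings / sizeof *encodings; i++)
      if (strcmp(encodings[i].name, name) == 0) return &encodings[i];
  }
  return &encodings[0];  // default: 7-bit ASCII
}
```

A per-encoding remap table (8-bit code to Unicode code point) would hang off the same struct, so adding an encoding is one table plus one registry entry.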

@pljakobs
Contributor Author

I tend to agree, a "rich typographic environment" isn't what this should be about.
The ISO-8859-x tables hold 256 glyph positions, of which 64 (the control ranges) are normally empty. We would therefore be looking at roughly a 50% increase in size - well, a bit more, since my glyph table would still contain a few all-zero entries in order to stay easily addressable. If you want Windows codepages (including line-drawing characters), those would occupy the other 64 positions.
I would consider conditional includes. If you include the font lib, you should #define an encoding. If none is defined, we'd include 7-bit charsets, but if an encoding is defined, we'd include the suitable fonts and, along with them, a suitable translation routine (some of those UTF-8 to ISO mappings are pretty wild and will only work easily by including a table; if we load all of them every time, that might easily be a few kB or more - let's not get started with BIG5 😲).
makefont can be extended to create either a specific encoding or all ISO encodings easily enough, so creating a localized sketch might be as simple as setting a #define. It would require a bit of preprocessor magic, though.

@pljakobs
Contributor Author

thought some more about it:

I'd globally define an encoding (as a #define) and use #if ENCODING == LATIN1 / #endif blocks to add code / data:

  • UTF2ISO & ISO2UTF functions (not sure what the latter will be needed for, but it's in fact the easier translation, since it's just reading UTF-8 from an array). The former will be a lot of case statements, methinks, at least for everything that isn't Latin-1.
  • if makefont.c is extended to create fonts with multiple encodings, I think the easiest way would be to have a "master" include file that conditionally includes the matching font file. I would probably throw all the different encodings into one directory and keep just the master include at top level.
#define LATIN1 1
#define LATIN2 2
/* note: the encoding names need numeric values; bare undefined identifiers
   all evaluate to 0 in #if, so every comparison would match */
#if ENCODING == LATIN1
#include <some-fancy-font-12pt/latin1.h>
#elif ENCODING == LATIN2
#include <some-fancy-font-12pt/latin2.h>
#endif

I believe this would keep unnecessary code in the binary to a minimum and allow any developer to select their specific character set. If someone provides another language version, they could just use the most suitable character encoding and be on their way after a recompile.

Now, ISO-8859-6 (Arabic) and ISO-8859-8 (Hebrew) would be an issue, since they cover right-to-left (actually bidirectional) scripts; I think we'd leave those to a native speaker ;-)
@fweiss

fweiss commented Jan 23, 2017

I think UTF-8 needs to be handled, but not all the glyphs need to be displayed. When I use the Adafruit Bluefruit LE client, it does appear to send UTF-8. Try the additional special characters available on the Android keyboard, such as pi and square root. I notice that the sequence displays as 5 glyphs, one of which is the pi glyph. I'm going to create a mapper for that in my Android Neopixel app.
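The multi-glyph symptom comes from a byte-oriented print loop rendering each byte of a multi-byte UTF-8 sequence as its own character. Distinguishing lead bytes from continuation bytes (10xxxxxx) is enough to avoid that; here is a hypothetical helper, purely for illustration, that counts actual code points rather than bytes:

```c
#include <stddef.h>

// Count UTF-8 code points by counting only non-continuation bytes.
// Continuation bytes have the bit pattern 10xxxxxx (i.e. b & 0xC0 == 0x80).
// Name and API are illustrative, not part of the library.
size_t utf8CodepointCount(const char *s) {
  size_t n = 0;
  for (; *s; s++)
    if (((unsigned char)*s & 0xC0) != 0x80)  // skip continuation bytes
      n++;
  return n;
}
```

For example, "pi=" followed by pi (UTF-8 0xCF 0x80) is 5 bytes but 4 code points; a byte-wise renderer would draw 5 glyphs.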

@KeeTraxx

I'm working on a project which needs German and French characters.

I just wanted to add that fontconvert works fine for this.

fontconvert myfont.ttf 11 32 252

Saving my source files in ISO-8859-1 then works fine and prints the correct characters to the display.

It's still not a proper way to handle things, but it works in a pinch for those looking for a simple solution for languages covered by ISO-8859-1.
