LB Booster
« Limitations of Syntax Coloring »

Richard Russell
« Thread started on: Sep 6th, 2016, 8:36pm »

This has been discussed before, but it has come up again in a private email so I thought it might be helpful once again to explain it.

The way I have bolted Syntax Coloring onto the Windows Rich Edit control relies on knowing what font the control is using to render the text. Normally this is straightforward - it is the font selected in the LBB 'Options... Set Font...' dialog - but there is a complication that arises if that selected font doesn't contain every character needed.

In that circumstance the Rich Edit control automatically substitutes a character from an alternative font that does include the character. Unfortunately that breaks the Syntax Coloring because it assumes the same font is used everywhere. The result is a real mess: the colored text doesn't line up with the uncolored text!

A classic situation in which this issue manifests itself is if you select FixedSys as the editor font. FixedSys is not a TrueType font: it's a non-scalable bitmapped font. But more importantly it's not a Unicode font, and includes only the basic ANSI 8-bit character set. So if you attempt to display a foreign-language (e.g. accented) character from a little-used character set - the case in point was Turkish - it is very likely that a font substitution will take place and the Syntax Coloring will fail.

There are only two ways of avoiding this problem: the first is to disable Syntax Coloring and the second is to select a font that includes every character you might need. Fortunately selecting a Unicode font and enabling Unicode Support in the Options menu is likely to satisfy the second of these. Add %options unicode to your program too.

Richard.
« Last Edit: Sep 6th, 2016, 9:28pm by Richard Russell »

CryptoMan
« Reply #1 on: Sep 17th, 2016, 11:02pm »

I still think it is something else because FixedSys font contains Turkish characters.

If you go to the Liberty BASIC Font Selection Dialog for choosing the Editor Font you can see that there is an option called "Script", and in this we can see that Turkish exists for FixedSys. Some other fonts have even richer national language support.

However, I have a font called "Elephant" and this font only supports "Western" and yes I know the effect of font substitution for this font.

I am very picky about fonts and this is not the only place I observe this behaviour. Usually, when a silly office boy or girl makes an ugly PowerPoint presentation with one of these childish decorative fonts, this issue shows its face.

But it is a different case here. There is no need for font substitution. We see the proper Turkish characters between quotes in the source code for a long time, but at some undetermined time or condition it gets messed up. So the question is: when there is no need for font substitution, why does RichEdit mess up? I have seen different versions of this mess-up. One of them is changing 0x20 space characters to point size 1, so that words lose the white space between them. Also, after this point editing becomes impossible as cursor movements and insertion points become erratic. Probably, because the font size changed out of control, the graphics system loses the coordinates of the characters on the screen.

This really happens. On many different hardware and Windows versions.
As I was evangelising LBB to my colleagues and putting my code on their machines to show how good LBB is, several times I got stumped by this odd behaviour. As I was showing the source code on their machine by going down with Page Down, after some lines the code became garbled beyond recognition. Since then I have been trying to pinpoint what causes this. Probably, RichEdit is trying to make some unsuccessful and unnecessary font substitution at Turkish characters.

Obviously, there is something wrong with the RichEdit Control. In the example I sent you, based on Alyce's RichEdit example, the Turkish content displayed fine in most cases. However, when I marked a part of the text and gave the change font command, that also misbehaved, but strangely enough in that case the Turkish characters changed to FixedSys while the other characters did not.

I think the RichEdit control is buggy about these Font Culture settings. It cannot handle them when you perform various actions.

I believe that this is a never-ending saga from Microsoft. For example, in Word I have a document which originally came from someone whose Windows and Word are Arabic. Whatever is hidden and invisible inside, I cannot change it to behave left to right, as it insists on working right to left. This is with a simple, basic Times New Roman font.

Anyway, I just cannot understand how Microsoft made some simple things so difficult, and is able to produce megabytes of .DOC files from a few pages of text. When I look inside such .DOC files with a hex editor my stomach turns upside down.

So, I changed to Courier New and replaced all TR characters with the nearest UK unaccented characters, and am doing my big software conversion. After this, I haven't observed this odd behaviour. And I agree that, whatever it is, it is coming from RichEdit.

On the other hand, this problem never happened in LB, which you said uses some other text editor library.

Anyway, I can live with it. No need to dwell on this, but maybe some other LBB users have encountered such oddities with their national character sets. In fact, the TR character set is a Latin set with 6 additional accented characters Ş ş İ ı Ğ ğ plus Ç ç Ö ö Ü ü, written left to right. Not like Greek, Cyrillic, Arabic, Hebrew, Chinese, Korean or Japanese. It shouldn't cause a mess this easily with Rich Edit.


Richard Russell
« Reply #2 on: Sep 18th, 2016, 09:33am »

on Sep 17th, 2016, 11:02pm, CryptoMan wrote:
I still think it is something else because FixedSys font contains Turkish characters.

It depends on the Code Page. You previously told me that you were carrying out the test on a UK/US standard PC, i.e. a PC configured for Code Page 1252. In that case FixedSys (which is not a Unicode font) definitely does not contain Turkish characters!

You can run this simple test program to find out which Code Page you are using:

Code:
    %options ansi
    open "Code Page Test" for text as #w
    #w "!font Arial 24"
    #w chr$(222)
    wait 

If you see the Icelandic Thorn character Þ then you are (probably) using Code Page 1252.
If you see the Turkish S-cedilla character Ş then you are using Code Page 1254.
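
The same dependence on the Code Page can be illustrated outside LBB. The following is only a minimal Python sketch, not part of the test program above: the helper name decode_byte is made up for illustration, and it relies on Python's standard cp1252 and cp1254 codecs to decode the single byte value 222.

```python
def decode_byte(value, code_page):
    """Decode a single 8-bit value using the named Windows code page."""
    return bytes([value]).decode(code_page)


# The same byte maps to a different character under each code page.
for code_page in ("cp1252", "cp1254"):
    print(code_page, decode_byte(222, code_page))
```

The byte stored in the program is identical in both cases; only the interpretation chosen by the Code Page differs.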

Quote:
This really happens. On many different hardware and Windows versions.

So you say, but you have failed to provide any evidence in the form of a screenshot. If I was reporting a problem with fonts displaying incorrectly the very first thing I would do is to press Alt+PrtSc to grab an image of the window! Rather than doing this you try to explain the problem in words.

Quote:
In fact, TR character set is a Latin set with 6 additional accented characters Ş ş İ ı Ğ ğ plus Ç ç Ö ö Ü ü .... It shouldn't cause a mess this easily with Rich Edit.

I agree that it should not cause a problem with a Unicode font (and with Unicode Support selected in the LBB options menu) and I have not seen any evidence that it does. However with a non-Unicode font like FixedSys then of course it will cause a problem because the standard UK/US Code Page does not contain those characters and the Rich Edit control will be forced to do a font substitution.
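
As a rough illustration of which of the quoted Turkish letters fall outside Code Page 1252, here is a minimal Python sketch. It assumes Python's standard cp1252 and cp1254 codecs match the Windows Code Pages; the names TURKISH_LETTERS and missing_from are hypothetical. It says nothing about which glyphs a particular font file contains, only which characters the Code Page can represent.

```python
TURKISH_LETTERS = "ŞşİıĞğÇçÖöÜü"


def missing_from(code_page, text):
    """Return the characters of text that the code page cannot encode."""
    missing = []
    for ch in text:
        try:
            ch.encode(code_page)
        except UnicodeEncodeError:
            missing.append(ch)
    return "".join(missing)


print("cp1252 lacks:", missing_from("cp1252", TURKISH_LETTERS))
print("cp1254 lacks:", missing_from("cp1254", TURKISH_LETTERS) or "(none)")
```

Under these assumptions Ç ç Ö ö Ü ü are present in both Code Pages, while Ş ş İ ı Ğ ğ exist only in 1254, so those six are the ones for which a substitution would be needed on a 1252 system.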

Richard.

CryptoMan
« Reply #3 on: Sep 18th, 2016, 10:39am »

OK, here is more about this.

I have made a very easy repeatable case, but before that let me make some remarks about the Font Selection Dialog where Script is shown.

Yesterday, I told you that the LB Editor Font Selection on my laptop indicates "Turkish" for the FixedSys font. I checked most of the fonts there and most, except some decorative fonts, support "Turkish" character rendering.

Please, also remember that I am not complaining about a cosmetic issue like seeing Icelandic Thorn instead of S-cedilla. It is a more dramatic problem.

However, as I told you before, I bought a brand new ACER with Windows 10, loaded with US/UK Windows. In fact, my previous LENOVO Yoga 2 was purchased from BEST BUY in California, USA. So that was in fact US Windows as good as it gets, but that Windows 8.1 LENOVO was violated by Microsoft with numerous updates and eventually its drivers were ruined by the fateful Windows 10 upgrade. During Windows updates I noticed that Microsoft decides by itself what is the best fit for you, and I suspect it might have Turkishized some parts of the Win 10 upgrade.

So, to clear any doubts, I got a brand new US/UK Windows 10 with no Turkish settings and started working with it, admittedly with my favourite font FixedSys, which works just fine with most other software.

Anyway, I checked the Editor Font Selection on my old LENOVO PC yesterday after seeing your post and saw my font showed "Turkish" a.k.a. Code Page 1254 in Script selection instead of "Western" a.k.a. Code Page 1252.

This morning I sat down at my new PC and wanted to check what LBB says for Script versus LB. I noticed both were saying "Western" and there was no "Turkish" in the Script selection.

However, even at this point, without any involvement of Turkish characters, it was showing the odd behaviour, which is very easy to replicate.

Choose FixedSys font with Script=Western or Turkish. I can email you the Turkish FixedSys if you want but this is not really necessary to observe what I am trying to illustrate.

1) Select Editor Font FixedSys.

2) Start typing test$="ABCDEF and until you close the quotes LBB editor does not start using the FixedSys font.

3) When you close the quote only "ABCDEF" is rendered in FixedSys, which I believe is due to Syntax Colouring, but test$= remains in whatever font it started with.

4) I noticed that syntax coloured keywords change to FixedSys but the others remain in the other font.

You can try this simple example and see the effect.

I hope that you can replicate it, so I can be sure that I am not hallucinating and am not the only one to observe this problem.




Richard Russell
« Reply #4 on: Sep 18th, 2016, 10:57am »

on Sep 18th, 2016, 10:39am, CryptoMan wrote:
Please, also remember that I am not complaining about a cosmetic issue like seeing Icelandic Thorn instead of S-cedilla.

It seems you are having difficulty grasping what Code Pages are and how they work. I had hoped that my Thorn versus S-cedilla example would help you understand.

Quote:
1) Select Editor Font FixedSys.
2) Start typing test$="ABCDEF and until you close the quotes LBB editor does not start using the FixedSys font.

I cannot reproduce that. Typing 'test$="ABCDEF' renders the text in FixedSys from the start: even the initial character "t" is rendered in FixedSys. The only thing that happens when the quotes are closed is that the Syntax Coloring changes the quoted string to magenta.

Quote:
You can try this simple example and see the effect.

I tried your simple example: I did not see the effect.

Can I ask that other users also try the test and report back their results. If you select the editor font as FixedSys and then start typing, does it render the characters using the FixedSys font from the very start or not?

Richard.

CryptoMan
« Reply #5 on: Sep 18th, 2016, 11:10am »

Let me add a few more comments.

If you turn off Syntax Colouring and select FixedSys, the LBB editor never renders the text in FixedSys.

With syntax colouring this problem happened with US/UK ASCII characters and therefore has nothing to do with Turkish.

As I told you before, it was never a font substitution problem or a Unicode problem. Turkish fonts really don't need Unicode. At worst, we will see Icelandic Thorns, and when that appears we know how to fix it by finding the correct font file or modifying a Western font with a font editor.

Most probably, this is a Rich Edit bug.

But the question is why the LBB Editor is not starting by using the chosen Editor font FixedSys?

I agree RichEdit is bombing when you ask it to do find-and-replace font changes, but a source code editor is not a Word Processor and will work with only one font. The only need for RichEdit is syntax colouring. The font and character size should remain the same all through the text.

And I have seen this odd behaviour even in the earlier versions of LBB without any syntax colouring, with fonts changing to strange fonts and sometimes italic. I think the RichEdit control is losing track of the attribute information of characters, for reasons which I don't know.

I haven't studied the internal logic of RichEdit. Most probably there is a linked list of pointers indicating the positions of font attribute changes; as the text gets bigger and bigger, and with insertions and deletions in the text memory map, the corresponding updates to the attribute list probably lose track at certain points and the mess starts.

If you remember the original IBM PC screen memory map, at segment 0xB000 for monochrome and 0xB800 for colour displays, each character was represented as two bytes: the first byte the character and the second byte the character attribute (colour, underline, bold, etc.). That's probably a wasteful strategy for large documents, but does Microsoft really care about efficiency these days?
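
To make the two-bytes-per-cell layout concrete, here is a minimal Python sketch. It is only an illustration of the character/attribute pairing described above: the function name pack_text_cells is hypothetical, the cp437 codec is assumed as the character encoding, and 0x07 (light grey on black) is used as the default attribute.

```python
def pack_text_cells(text, attribute=0x07):
    """Pack text into character/attribute byte pairs, one pair per cell."""
    cells = bytearray()
    for ch in text.encode("cp437"):
        cells.append(ch)         # first byte: the character code
        cells.append(attribute)  # second byte: the display attribute
    return bytes(cells)


# Show the packed cells as hex byte pairs.
print(pack_text_cells("LBB").hex(" "))
```

Every cell carries its own attribute byte, so the buffer is always twice the length of the text, whether or not the attribute ever changes.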

The other strategy could be the ANSI.SYS or VT100/VT220 method, with Esc[nnn causing an attribute change. These were working just fine in those days.
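
For comparison, the escape-sequence approach puts attribute changes inline in the text stream. The following is a minimal Python sketch, assuming the standard ANSI "Select Graphic Rendition" form ESC[...m; the helper names sgr and colour_string_literal are made up for illustration, and magenta (code 35) is chosen only because string literals are colored magenta in this thread.

```python
ESC = "\x1b"


def sgr(*codes):
    """Build an ANSI 'Select Graphic Rendition' sequence, e.g. ESC[1;35m."""
    return ESC + "[" + ";".join(str(c) for c in codes) + "m"


def colour_string_literal(text):
    """Wrap text in a magenta (35) attribute change followed by a reset (0)."""
    return sgr(35) + text + sgr(0)


# On an ANSI-capable terminal the quoted string is shown in magenta.
print('test$=' + colour_string_literal('"ABCDEF"'))
```

Here an attribute costs bytes only where it changes, at the price of having to scan the stream to find out which attribute applies at a given position.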

Anyway, this issue is wasting too much of your time and my time.
Let's switch to Courier New and hope Rich Edit behaves better with one of those proper fonts.

Richard Russell
« Reply #6 on: Sep 18th, 2016, 11:14am »

on Sep 18th, 2016, 11:10am, CryptoMan wrote:
But the question is why the LBB Editor is not starting by using the chosen Editor font FixedSys?

It is, for me. Let's await the reports of other users.

Quote:
Turkish fonts really don't need Unicode. At worst, we will see Icelandic Thorns, and when that appears we know how to fix it

That statement is wrong, or at least highly misleading, as I have explained so many times.

Richard.
« Last Edit: Sep 18th, 2016, 11:18am by Richard Russell »

CryptoMan
« Reply #7 on: Sep 18th, 2016, 11:20am »

Just to eliminate any doubts on Keyboard settings, I changed it to English (US) Keyboard and it is the same. It does not start with FixedSys.

Richard Russell
« Reply #8 on: Sep 18th, 2016, 11:24am »

on Sep 18th, 2016, 11:20am, CryptoMan wrote:
Just to eliminate any doubts on Keyboard settings, I changed it to English (US) Keyboard and it is the same. It does not start with FixedSys.

You said this was wasting too much time. I agree, so please let us wait for reports from other users so we can determine whether this is something peculiar to your setup or a more general issue. I hope not to see any more posts from you until then, thanks.

Richard.

CryptoMan
« Reply #9 on: Sep 18th, 2016, 11:31am »

I went to my old Lenovo and chose FixedSys and it does start with FixedSys.

On that machine FixedSys has only the Turkish Script, but the new machine has both Western and Turkish.

I will now erase one of them and see if this is the cause.

CryptoMan
« Reply #10 on: Sep 18th, 2016, 11:36am »

Windows refuses to erase FixedSys so I couldn't try this.

But I tried with a similar font, Terminal, which has only one script, OEM/DOS, and it does start rendering with Terminal.

CryptoMan
« Reply #11 on: Sep 18th, 2016, 11:40am »

In fact, when it works on the old Lenovo, syntax colouring starts as soon as I type ".

On the new machine syntax colouring happens when I close the quote.

Why are these different behaviours happening?

CryptoMan
« Reply #12 on: Sep 18th, 2016, 11:48am »

Old Lenovo is Windows 10 Home Edition
New Acer is Windows 10 Home Single Language

Yes, this is really a time waster, so let's see if anybody else has seen anything like this.

Thanks for your patience.

tsh73
« Reply #13 on: Sep 19th, 2016, 06:37am »

Here's what I got on Win XP Pro SP3
(LBB version is still 3.02, but I don't think this has changed)
http://s21.postimg.io/3zvqvn7yf/Syntax_Coloring_Fixedsys.gif

User Image

If I switch to Lucida Console, the font stays consistent (it looks like Lucida Console).

But syntax coloring was turned off for me. I will turn it on and see if I see anything wrong.

tsh73
« Reply #14 on: Sep 19th, 2016, 07:03am »

Aha.
So I went on reading the "pressing ESC aborts the program" thread

and pasted a piece of code from reply #23, this one:
Code:
    htb = hwnd(#w.tb)
    calldll #user32, "GetWindowLongA", htb as ulong, _
      _GWL_STYLE as long, style as long
    print dechex$(style)
 

Pasted from Internet Explorer 8 (as it happens, pasting from Google Chrome works normally with or without syntax coloring!)

Now here is what I got:
with syntax coloring on I got a single line, syntax colored, in a different font size, obviously garbled (some "(" not visible).
The first word is Russian for "Normal" - I have no idea where it comes from.
(The last vertical line is the text cursor.)

If I turn off syntax coloring, this line gets the same font size, but it is still a single line and it won't compile even after removing the extra first word.

Now if I paste the same thing with syntax coloring off it appears normal, and compiles too.
User Image
Still, somehow the pasted text font looks smaller.

So I'll turn it off for a while.

EDIT: that was Internet Explorer 8.
From Google Chrome it works as it is supposed to, everything is fine.