upper-case/lower-case with umlauts?

Started by HPW, November 18, 2003, 12:19:51 AM


HPW

Is there any way to support upper-case/lower-case with umlauts?



Example:



(upper-case "Testöäüß")



gives:



"TESTöäüß"



One possibility is to define my own function with parsing/replacing the umlauts. Other ideas?
Hans-Peter
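
A minimal sketch of the "own function" idea above, written in Python rather than newLISP purely for illustration. It assumes the text is encoded as ISO-8859-1 (Latin-1), where each accented lowercase letter sits exactly 32 code points above its uppercase form. The names `build_latin1_upper_table`, `LATIN1_UPPER` and `upper_latin1` are hypothetical. Python's `bytes.upper()` only converts ASCII letters, so it reproduces the result shown in the question; the explicit table handles the umlauts as well.

```python
def build_latin1_upper_table():
    """Return a 256-byte translation table for Latin-1 upper-casing."""
    table = bytearray(range(256))
    # ASCII letters a-z.
    for code in range(ord("a"), ord("z") + 1):
        table[code] = code - 32
    # Accented lowercase letters 0xE0..0xFE, skipping 0xF7 (division sign).
    for code in range(0xE0, 0xFF):
        if code != 0xF7:
            table[code] = code - 32
    # 0xDF (ß) and 0xFF (ÿ) have no single-byte uppercase form in Latin-1,
    # so they are left unchanged.
    return bytes(table)


LATIN1_UPPER = build_latin1_upper_table()


def upper_latin1(text):
    """Upper-case a string byte by byte, assuming it fits in Latin-1."""
    return text.encode("latin-1").translate(LATIN1_UPPER).decode("latin-1")


# ASCII-only upper-casing leaves the umlauts untouched ...
ascii_only = "Testöäüß".encode("latin-1").upper().decode("latin-1")
assert ascii_only == "TESTöäüß"

# ... while the table-driven version converts them.
assert upper_latin1("Testöäüß") == "TESTÖÄÜß"
```

The drawback of this approach is that the table is tied to one character set; a different code page needs a different table.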

nigelbrown

#1
I found

http://mail.python.org/pipermail/python-dev/2000-May/004165.html

that discusses toupper conversion (in the context of C libraries in Python?)

viz:<quote>

> On POSIX systems there are a several environment variables used to
> control the default locale settings for a users session.  For example
> on my SuSE Linux system currently running in the german locale the
> environment variable LC_CTYPE=de_DE is automatically set by a file
> /etc/profile during login, which causes automatically the C-library
> function toupper('ä') to return an 'Ä' ---you should see
> a lower case a-umlaut as argument and an upper case umlaut as return
> value--- without having all applications to call 'setlocale' explicitly.
>
> So this simply works well as intended without having to add calls
> to 'setlocale' to all application program using this C-library functions.

I don't believe that.  According to the ANSI standard, a C program
*must* call setlocale(LC_..., "") if it wants the environment
variables to be honored; without this call, the locale is always the
"C" locale, which should *not* honor the environment variables.

<end quote>



This suggests that newLISP could call setlocale at startup, which would lead to the correct conversion (if Borland supports it).

I'm not currently at a computer with the Borland compiler installed, so I haven't looked at the Borland docs.

Regards

Nigel

nigelbrown

#2
Further to my earlier reply: from the Borland helpfile BCB5.HLP

<quote>

Syntax



#include <locale.h>

char *setlocale(int category, const char *locale);

wchar_t * _wsetlocale( int category, const wchar_t *locale);



Description



Use the setlocale to select or query a locale.



Borland C++ supports all locales supported in NT 3.5x and Win95/NT 4.0 operating systems. See your system documentation for details.



The possible values for the category argument are as follows:



Value        Affect

LC_ALL       Affects all the following categories
LC_COLLATE   Affects strcoll and strxfrm
LC_CTYPE     Affects single-byte character handling functions. The mbstowcs and mbtowc functions are not affected.

<end quote>...

<quote>

To take advantage of dynamically loadable locales in your application, define __USELOCALES__ for each module. If __USELOCALES__ is not defined, all locale-sensitive functions and macros will work only with the default C locale.

<end quote>

This could be tried.

Regards

Nigel

Lutz

#3
Thanks for the pointers Nigel.



I left "newlisp.exe.7305" and "README_7305.txt" in the development directory. This version tries to set the locale of your country automatically. Look into the "README_7305.txt" for instructions.



I had the opportunity to log on to a German Linux computer and it worked well doing the uppercase on the German character set.



I have not had the chance to test a Borland C build on a German machine, though. But yes, the Borland compiler supports the setlocale() function. It is now called automatically in newLISP and is also available as a builtin function.



The builtin function does not return a correct value (the locale 'C' or 'POSIX' or country code) as it does on CYGWIN and Linux, but perhaps it does it on a German Windows. Here in US BorlandC I get a NULL-pointer which I am converting to 'nil'. In US CYGWIN I get "C" and in German Linux I get "de_DE".



So give it a try and tell me what's happening.



Lutz



ps: FP exception in this version behaves like on UNIX, also changes for big-buffer in Tcl/Tk NewlispEvaluateBuffer(), but still working on newlisp-tk.tcl.

nigelbrown

#4
Lutz

README_7305.txt is missing

Nigel

HPW

#5
Quote
newLISP v7.3.5 Copyright (c) 2003 Lutz Mueller. All rights reserved.



> (upper-case "ASDasdöäüßÖÄÜ")

"ASDASDÖÄÜßÖÄÜ"

> (lower-case "ASDasdöäüßÖÄÜ")

"asdasdöäüßöäü"

>


Works well on WIN2K PRO and WIN XP PRO (Both German)!

Great!
Hans-Peter

Lutz

#6
I am glad 'locale' works! Another problem solved. About 50% of newLISP users are outside USA, so this was important.



BTW, readme_7305.txt is now visible in the development directory. It would be good to hear from other countries too.



Lutz

nigelbrown

#7
Using the functions in the readme, I noticed that the square root sign (character 251) is changed by upper-case.

Compare 7.3.3, which leaves the sqrt sign alone:

newLISP v7.3.3 Copyright (c) 2003 Lutz Mueller. All rights reserved.

> (char "\251")
251
> (char (upper-case "\251"))
251
> (char (upper-case "a"))
65
> (char "a")
97
>

with 7.3.5 de novo, which seems to subtract 32 to make it "uppercase":

C:\temp>newlisp
newLISP v7.3.5 Copyright (c) 2003 Lutz Mueller. All rights reserved.

> (char "\251")
251
> (char (upper-case "\251"))
219
> (set-locale 0)
nil
>





I'm in Australia with Win XP Pro



Regards

Nigel

nigelbrown

#8
Sorry if my last post caused confusion. The variation in console fonts on Windows systems is the source of it.

My comments above on the sqrt sign apply when using the "Lucida ConsoleP" font in the DOS box on my Win98SE setup, which has the sqrt sign at 251. However, I see now that many fonts have u-hat (û) at that value, which is converted to uppercase by subtracting 32.

Also, although the Command Prompt box on my WinXP Pro is set to Lucida Console, which 'should' have u-hat at that value, a line-draw symbol actually appears:

C:\temp>newlisp
newLISP v7.3.5 Copyright (c) 2003 Lutz Mueller. All rights reserved.

> "\251"
"¹"
> (upper-case "\251")
"█"
>

(I note that here 251 is ¹, superscript 1, and uppercases to a block.)



Very confusing in the upper decimal characters.



Nigel

I guess the use of an unexpected display font will muddy the waters.
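
The confusion above can be reproduced without a console by decoding the same two byte values under several code pages. This is a small Python sketch; the helper name `glyphs_for` is hypothetical, and the mappings come from Python's bundled codecs. Which code pages the consoles in this thread actually used is an assumption, although the transcripts are consistent with cp437 on the Win98SE DOS box and cp850 on the WinXP command prompt.

```python
CODE_PAGES = ["cp437", "cp850", "latin-1", "cp1252"]


def glyphs_for(byte_value):
    """Map each code page name to the character it assigns to one byte value."""
    return {name: bytes([byte_value]).decode(name) for name in CODE_PAGES}


lower = glyphs_for(251)        # the byte typed as "\251"
upper = glyphs_for(251 - 32)   # the byte after "subtracting 32"

assert lower["cp437"] == "\u221a"    # square root sign
assert lower["cp850"] == "\u00b9"    # superscript one
assert lower["latin-1"] == "\u00fb"  # u with circumflex (u-hat)
assert lower["cp1252"] == "\u00fb"

assert upper["latin-1"] == "\u00db"  # capital U with circumflex
assert upper["cp437"] == "\u2588"    # full block
assert upper["cp850"] == "\u2588"    # full block
```

So the conversion 251 to 219 is a correct upper-casing in Latin-1 or Windows-1252, and only looks wrong when the console displays the bytes with an OEM code page.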

Lutz

#9
I am still exploring the whole 'locale' thing. This is what I have done so far. On startup, newLISP 7.3.5 and later does a:

(set-locale 0xFF "")  ; switch all options on for your locale



Internally it does: setlocale(LC_ALL, ""), LC_ALL is defined in locale.h as 0xff.



To go back to the pre-7.3.5 behavior, I think you would do a:



(set-locale 0 "C") ; switch to ISO 'C' locale available in all countries



When set-locale is given only the first parameter, it is supposed to return the current locale (newLISP passes NULL as the second argument: setlocale(option, NULL)). When given "" as the second parameter, it is supposed to switch to the machine's local locale.
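
The same query/switch distinction exists in other wrappers around the C library's setlocale. As an illustration only, here is how it looks with Python's standard `locale` module; the helper names `query_locale` and `adopt_environment_locale` are hypothetical, and this is not how newLISP implements set-locale.

```python
import locale


def query_locale(category=locale.LC_ALL):
    """Return the current locale for a category without changing it."""
    # Omitting the second argument corresponds to setlocale(category, NULL) in C.
    return locale.setlocale(category)


def adopt_environment_locale():
    """Switch to the user's native locale; return its name, or None on failure."""
    try:
        # "" asks the C library to pick the locale from the environment.
        return locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        return None  # the environment names a locale that is not installed


locale.setlocale(locale.LC_ALL, "C")   # portable baseline
assert query_locale(locale.LC_CTYPE) == "C"

native = adopt_environment_locale()    # the value depends on the machine
locale.setlocale(locale.LC_ALL, "C")   # back to the baseline
```

The string returned for the native locale differs between platforms, which matches the reports in this thread ("C", "de_DE", or a long Windows-style list of categories).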



The question is: how should I distribute newLISP?



(1) With locale switching as the default. This broke Turtle.lsp in Germany because of the decimal comma in floats, but made upper-case etc. work right away for HPW.

(2) With the ISO 'C' locale as the default, as before 7.3.5. This guarantees a newLISP that behaves the same way all over the world, but may not be practical for writing everyday applications.

I haven't documented 'set-locale' yet because I want to clear up these questions first. I wonder what other languages do, e.g. Perl or Python. Hans-Peter, how is it in Germany with Perl and Python?



Lutz

HPW

#10

> (set-locale 0 "C")
"C"
> (upper-case "asdöäüÖÄÜß")
"ASDöäüÖÄÜß"

> (set-locale 0xFF "")
"LC_MONETARY=German_Germany.850\nLC_TIME=German_Germany.850\nLC_NUMERIC=German_Germany.850\nLC_COLLATE=German_Germany.850\nLC_CTYPE=German_Germany.850\n"
> (upper-case "asdöäüÖÄÜß")
"ASDÖÄÜÖÄÜß"


With locale "C", the original Turtle.lsp works.

With the German locale, the original Turtle.lsp has the bug.



>(1) with locale switching as the default.



Yes, I would prefer it as the default. It should be well documented.

Put the switch code above into Turtle.lsp and switch temporarily back to "C" inside it. Then everyone can see in the sample code how to avoid such problems. Works for me here with 7.3.7.

;; Turtle.lsp - graphics demo for newLISP-tk
;;
;; to run: (Turtle:run)
;;
;;
;;

(set-locale 0 "C")

...
...


(define (run )
  (tk "if {[winfo exists .tw] == 1} {destroy .tw}")
  (tk "toplevel .tw")
  (tk "canvas .tw.can -width 500 -height 400 -bg #FFFEC0")
  (tk "pack .tw.can")
  (tk "wm geometry .tw +100+160")
  (tk "wm title .tw { Turtle.lsp}")
  (tk ".tw.can create text 380 70 -fill navy -font {Times 12} -text {Dragon Fractal}")
  (tk ".tw.can create text 100 350 -fill navy -font {Times 16} -text {Turtle Graphics}")
  (tk "after 300; update idletasks")
  (turtle-start 300 50)
  (dragon-curve 12 "red")
  (draw)
  (turtle-start 120 200)
  (rose "blue")
  (set-locale 0xFF ""))
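
The "switch to C, do the work, switch back" pattern used in Turtle.lsp above can be wrapped so that the previous locale is restored even if the work fails. A minimal sketch in Python, for illustration only; the name `c_locale` is hypothetical. Unlike the Turtle.lsp version, it restores whatever locale was active before rather than always switching to the native one.

```python
import locale
from contextlib import contextmanager


@contextmanager
def c_locale(category=locale.LC_NUMERIC):
    """Temporarily switch one locale category to "C", then restore it."""
    saved = locale.setlocale(category)     # query only, nothing changes yet
    locale.setlocale(category, "C")
    try:
        yield
    finally:
        locale.setlocale(category, saved)  # restore, even after an exception


# Inside the block, numbers are formatted with a decimal point,
# regardless of the locale that was active before.
with c_locale():
    text = locale.format_string("%.2f", 3.14)

assert text == "3.14"
```

Restricting the switch to the numeric category keeps locale-aware character handling active while protecting code that builds or parses floats as text.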




>Perl or Python. Hans-Peter how is it in Germany with Perl, Python ?!



I have to investigate and ask my Python colleague.
Hans-Peter

nigelbrown

#11
An extensive discussion of perl locale use and issues is at:

http://www.perldoc.com/perl5.6/pod/perllocale.html



An issue for newLISP is how to get upper-case and regex working with the same locale. The PCRE library docs suggest that quite a bit of fiddling is needed (compiling custom tables for each desired charset) for PCRE to be locale-aware; otherwise "upper case" will mean different things to regex and to upper-case.
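
The kind of disagreement described here can be shown with any regex engine whose case tables differ from the case conversion in use. The sketch below uses Python's `re` module as a stand-in, not PCRE, so it is only an analogy: byte patterns in `re` fold case for ASCII letters only, while text patterns use Unicode case rules.

```python
import re

lower_a_umlaut = b"\xe4"   # a-umlaut, lowercase, in Latin-1
upper_a_umlaut = b"\xc4"   # a-umlaut, uppercase, in Latin-1

# Byte patterns fold case for ASCII letters only (no locale involved).
byte_pattern = re.compile(lower_a_umlaut, re.IGNORECASE)
assert byte_pattern.fullmatch(upper_a_umlaut) is None
assert re.compile(b"a", re.IGNORECASE).fullmatch(b"A") is not None

# Text patterns use Unicode case rules, so the same pair of letters matches.
text_pattern = re.compile("ä", re.IGNORECASE)
assert text_pattern.fullmatch("Ä") is not None
```

A string upper-cased with locale-aware tables may therefore fail a case-insensitive match against its original when the regex engine uses ASCII-only tables.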



Perhaps the standard newLISP could work with the default locale (option (2) of Lutz's post), with a special compilation flag used if a locale-aware build is desired. The Perl locale discussion shows what a can of worms locale can open.



Nigel

Lutz

#12
For now we will leave locale switching in as the default in the development versions and document everything well until more research is done.

PCRE raises different issues. I looked through the code and there is no locale switching. Everything seems to depend on character tables, which are generated before compiling. PCRE may be the reason not to switch the locale automatically but to distribute newLISP with (set-locale 0 "C").

Unfortunately I don't know how find/replace/regex perform, e.g. on case-specific matching, when the locale is switched.



Again, I think I have to do some reading first, to figure out how others solve these issues.



Lutz

Lutz

#13
See new thread "Localization in newLISP" in "Lisp in general" group.



Lutz