Explain non-ASCII chars in identifiers better.

Clarify the discussion about preferring English.
Richard Stallman 2022-09-19 14:12:43 -04:00
parent ed702eb6ab
commit ae13d4a5de
1 changed files with 26 additions and 18 deletions

c.texi

@@ -1479,19 +1479,20 @@ kind.

 In principle, you can write the function and variable names in a
 program, and the comments, in any human language. C allows any kinds
-of characters in comments, and you can put non-ASCII characters into
-identifiers with a special prefix. However, to enable programmers in
-all countries to understand and develop the program, it is best given
-today's circumstances to write identifiers and comments in
-English.
+of Unicode characters in comments, and you can put them into
+identifiers with a special prefix (@pxref{Unicode Character Codes}).
+However, to enable programmers in all countries to understand and
+develop the program, it is best under today's circumstances to write
+all identifiers and comments in English.

-English is the one language that programmers in all countries
-generally study. If a program's names are in English, most
-programmers in Bangladesh, Belgium, Bolivia, Brazil, and Bulgaria can
-understand them. Most programmers in those countries can speak
-English, or at least read it, but they do not read each other's
-languages at all. In India, with so many languages, two programmers
-may have no common language other than English.
+English is the common language of programmers; in all countries,
+programmers generally learn English. If names and comments in a
+program are written in English, most programmers in Bangladesh,
+Belgium, Bolivia, Brazil, Bulgaria and Burundi can understand them.
+In all those countries, most programmers can speak English, or at least
+read it, but they do not read each other's languages at all. In
+India, with so many languages, two programmers may have no common
+language other than English.

 If you don't feel confident in writing English, do the best you can,
 and follow each English comment with a version in a language you
@@ -1500,7 +1501,7 @@ Someone will eventually do that.

 The program's user interface is a different matter. We don't need to
 choose one language for that; it is easy to support multiple languages
-and let each user choose the language to use. This requires writing
+and let each user choose the language for display. This requires writing
 the program to support localization of its interface. (The
 @code{gettext} package exists to support this; @pxref{Message
 Translation, The GNU C Library, , libc, The GNU C Library Reference
@@ -4631,11 +4632,11 @@ contains character codes 128 and up, the results cannot be relied on.

 You can specify Unicode characters, for individual character constants
 or as part of string constants (@pxref{String Constants}), using
-escape sequences. Use the @samp{\u} escape sequence with a 16-bit
-hexadecimal Unicode character code. If the code value is too big for
-16 bits, use the @samp{\U} escape sequence with a 32-bit hexadecimal
-Unicode character code. (These codes are called @dfn{universal
-character names}.) For example,
+escape sequences; and even in C identifiers. Use the @samp{\u} escape
+sequence with a 16-bit hexadecimal Unicode character code. If the
+code value is too big for 16 bits, use the @samp{\U} escape sequence
+with a 32-bit hexadecimal Unicode character code. (These codes are
+called @dfn{universal character names}.) For example,

 @example
 \u6C34 /* @r{16-bit code (UTF-16)} */
@@ -4667,6 +4668,13 @@ u"\u6C34\u6C33" /* @r{16-bit code} */
 U"\U0010ABCD" /* @r{32-bit code} */
 @end example

+@noindent
+And in an identifier:
+
+@example
+int foo\u6C34bar = 0;
+@end example
+
 Codes in the range of @code{D800} through @code{DFFF} are not valid
 in Unicode. Codes less than @code{00A0} are also forbidden, except for
 @code{0024}, @code{0040}, and @code{0060}; these characters are
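
As a minimal sketch of what the identifier example in the last hunk permits, the following C fragment declares a variable whose name contains a universal character name and then uses it from ordinary code. The helper names bump_counter and water_length are hypothetical and do not come from c.texi. The fragment assumes a compiler that accepts universal character names in identifiers, as recent GCC and Clang releases do.

```c
#include <string.h>

/* An identifier spelled with a universal character name, exactly as
   in the example the commit adds.  */
int foo\u6C34bar = 0;

/* Hypothetical helper: increment the variable and return its new
   value.  The identifier must be spelled the same way at every use.  */
int
bump_counter (void)
{
  return ++foo\u6C34bar;
}

/* Hypothetical helper: return the number of bytes that the string
   constant "\u6C34" occupies, not counting the terminating null.
   The result depends on the execution character set (an assumption
   here is that it can represent U+6C34 at all).  */
size_t
water_length (void)
{
  return strlen ("\u6C34");
}
```

Calling bump_counter twice returns 1 and then 2. The value returned by water_length depends on the execution character set; it is 3 when that set is UTF-8, since U+6C34 takes three bytes in that encoding.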