From ae13d4a5de394695c42912a4ef61799501f45598 Mon Sep 17 00:00:00 2001
From: Richard Stallman <rms@gnu.org>
Date: Mon, 19 Sep 2022 14:12:43 -0400
Subject: [PATCH] Explain non-ASCII chars in identifiers better. Clarify the
 discussion about preferring English.

---
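A minimal sketch, not part of the change itself, of the universal
character names that this patch documents.  It reuses the identifier
and the string constants from the manual's examples.  The function
names are hypothetical, and the checks assume a C11 compiler (such as
a recent GCC) that accepts universal character names in identifiers
and encodes u"..." constants as UTF-16 and U"..." constants as UTF-32.

```c
#include <stddef.h>

/* An identifier that contains a universal character name (U+6C34).  */
int foo\u6C34bar = 0;

/* Hypothetical helper: store a value through that identifier and
   read it back.  */
int
set_and_get (int value)
{
  foo\u6C34bar = value;
  return foo\u6C34bar;
}

/* Hypothetical helper: first code unit of a 16-bit string constant.
   Assumes the implementation encodes u"..." as UTF-16.  */
unsigned long
first_unit_16 (void)
{
  return (unsigned long) u"\u6C34\u6C33"[0];
}

/* Hypothetical helper: first code unit of a 32-bit string constant.
   Assumes the implementation encodes U"..." as UTF-32.  */
unsigned long
first_unit_32 (void)
{
  return (unsigned long) U"\U0010ABCD"[0];
}

/* Hypothetical helper: number of elements in the 16-bit constant,
   counting the terminating null.  */
size_t
units_16 (void)
{
  return sizeof (u"\u6C34\u6C33") / sizeof (u"\u6C34\u6C33"[0]);
}
```

Under those assumptions, each \u escape in the 16-bit constant
occupies one code unit, and the identifier behaves like any other
variable name.  Older GCC releases may need -std=c11 or
-fextended-identifiers to accept the identifier.
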
 c.texi | 44 ++++++++++++++++++++++++++------------------
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/c.texi b/c.texi
index 8e89d2f..92376a0 100644
--- a/c.texi
+++ b/c.texi
@@ -1479,19 +1479,20 @@ kind.
 
 In principle, you can write the function and variable names in a
 program, and the comments, in any human language.  C allows any kinds
-of characters in comments, and you can put non-ASCII characters into
-identifiers with a special prefix.  However, to enable programmers in
-all countries to understand and develop the program, it is best given
-today's circumstances to write identifiers and comments in
-English.
+of Unicode characters in comments, and you can put them into
+identifiers with a special prefix (@pxref{Unicode Character Codes}).
+However, to enable programmers in all countries to understand and
+develop the program, it is best under today's circumstances to write
+all identifiers and comments in English.
 
-English is the one language that programmers in all countries
-generally study.  If a program's names are in English, most
-programmers in Bangladesh, Belgium, Bolivia, Brazil, and Bulgaria can
-understand them.  Most programmers in those countries can speak
-English, or at least read it, but they do not read each other's
-languages at all.  In India, with so many languages, two programmers
-may have no common language other than English.
+English is the common language of programmers; in all countries,
+programmers generally learn English.  If names and comments in a
+program are written in English, most programmers in Bangladesh,
+Belgium, Bolivia, Brazil, Bulgaria, and Burundi can understand them.
+In all those countries, most programmers can speak English, or at least
+read it, but they do not read each other's languages at all.  In
+India, with so many languages, two programmers may have no common
+language other than English.
 
 If you don't feel confident in writing English, do the best you can,
 and follow each English comment with a version in a language you
@@ -1500,7 +1501,7 @@ Someone will eventually do that.
 
 The program's user interface is a different matter.  We don't need to
 choose one language for that; it is easy to support multiple languages
-and let each user choose the language to use.  This requires writing
+and let each user choose the language for display.  This requires writing
 the program to support localization of its interface.  (The
 @code{gettext} package exists to support this; @pxref{Message
 Translation, The GNU C Library, , libc, The GNU C Library Reference
@@ -4631,11 +4632,11 @@ contains character codes 128 and up, the results cannot be relied on.
 
 You can specify Unicode characters, for individual character constants
 or as part of string constants (@pxref{String Constants}), using
-escape sequences.  Use the @samp{\u} escape sequence with a 16-bit
-hexadecimal Unicode character code.  If the code value is too big for
-16 bits, use the @samp{\U} escape sequence with a 32-bit hexadecimal
-Unicode character code.  (These codes are called @dfn{universal
-character names}.)  For example,
+escape sequences.  You can also use them in identifiers.  Use the
+@samp{\u} escape sequence with a 16-bit hexadecimal Unicode character
+code.  If the code value is too big for 16 bits, use the @samp{\U}
+escape sequence with a 32-bit hexadecimal Unicode character code.
+(These codes are called @dfn{universal character names}.)  For example,
 
 @example
 \u6C34      /* @r{16-bit code (UTF-16)} */
@@ -4667,6 +4668,13 @@ u"\u6C34\u6C33"  /* @r{16-bit code} */
 U"\U0010ABCD"    /* @r{32-bit code} */
 @end example
 
+@noindent
+And in an identifier:
+
+@example
+int foo\u6C34bar = 0;
+@end example
+
 Codes in the range of @code{D800} through @code{DFFF} are not valid
 in Unicode.  Codes less than @code{00A0} are also forbidden, except for
 @code{0024}, @code{0040}, and @code{0060}; these characters are