274 lines
8.2 KiB
HTML
274 lines
8.2 KiB
HTML
<!-- $XConsortium: sgmldecl.htm /main/1 1996/09/22 18:17:17 rws $ -->
|
|
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML Strict//EN">
|
|
<HTML>
|
|
<HEAD>
|
|
<TITLE>SP - SGML declaration</TITLE>
|
|
</HEAD>
|
|
<BODY>
|
|
<H1>Handling of the SGML declaration in SP</H1>
|
|
<H2>Default SGML declaration</H2>
|
|
<P>
|
|
If the SGML declaration is omitted
|
|
and there is no applicable
|
|
<A HREF="catalog.htm#sgmldecl"><SAMP>SGMLDECL</SAMP></A>
|
|
entry in a catalog,
|
|
the following declaration will be implied:
|
|
<PRE>
|
|
<!SGML "ISO 8879:1986"
|
|
CHARSET
|
|
BASESET "ISO 646-1983//CHARSET
|
|
International Reference Version (IRV)//ESC 2/5 4/0"
|
|
DESCSET 0 9 UNUSED
|
|
9 2 9
|
|
11 2 UNUSED
|
|
13 1 13
|
|
14 18 UNUSED
|
|
32 95 32
|
|
127 1 UNUSED
|
|
CAPACITY PUBLIC "ISO 8879:1986//CAPACITY Reference//EN"
|
|
SCOPE DOCUMENT
|
|
SYNTAX
|
|
SHUNCHAR CONTROLS 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
|
|
18 19 20 21 22 23 24 25 26 27 28 29 30 31 127 255
|
|
BASESET "ISO 646-1983//CHARSET International Reference Version
|
|
(IRV)//ESC 2/5 4/0"
|
|
DESCSET 0 128 0
|
|
FUNCTION RE 13
|
|
RS 10
|
|
SPACE 32
|
|
TAB SEPCHAR 9
|
|
NAMING LCNMSTRT ""
|
|
UCNMSTRT ""
|
|
LCNMCHAR "-."
|
|
UCNMCHAR "-."
|
|
NAMECASE GENERAL YES
|
|
ENTITY NO
|
|
DELIM GENERAL SGMLREF
|
|
SHORTREF SGMLREF
|
|
NAMES SGMLREF
|
|
QUANTITY SGMLREF
|
|
ATTCNT 99999999
|
|
ATTSPLEN 99999999
|
|
DTEMPLEN 24000
|
|
ENTLVL 99999999
|
|
GRPCNT 99999999
|
|
GRPGTCNT 99999999
|
|
GRPLVL 99999999
|
|
LITLEN 24000
|
|
NAMELEN 99999999
|
|
PILEN 24000
|
|
TAGLEN 99999999
|
|
TAGLVL 99999999
|
|
FEATURES
|
|
MINIMIZE DATATAG NO
|
|
OMITTAG YES
|
|
RANK YES
|
|
SHORTTAG YES
|
|
LINK SIMPLE YES 1000
|
|
IMPLICIT YES
|
|
EXPLICIT YES 1
|
|
OTHER CONCUR NO
|
|
SUBDOC YES 99999999
|
|
FORMAL YES
|
|
APPINFO NONE>
|
|
</PRE>
|
|
<P>
|
|
with the exception that all characters that are neither significant
|
|
nor shunned will be assigned to DATACHAR.
|
|
<H2>Character sets</H2>
|
|
<P>
|
|
A character in a base character set is described either by giving its
|
|
number in a universal character set, or by specifying a minimum
|
|
literal. The constraints on the choice of universal character set are
|
|
that characters that are significant in the SGML reference concrete
|
|
syntax must be in the universal character set and must have the same
|
|
number in the universal character set as in ISO 646 and that each
|
|
character in the character set must be represented by exactly one
|
|
number; that character numbers in the range 0 to 31 and 127 to 159 are
|
|
control characters (for the purpose of enforcing SHUNCHAR CONTROLS).
|
|
It is recommended that ISO 10646 (Unicode) be used as the universal
|
|
character set, except in environments where the normal document
|
|
character sets are large character set which cannot be compactly
|
|
described in terms of ISO 10646.
|
|
The public identifier of a base character set can be associated
|
|
with an entity that describes it by using a
|
|
<SAMP>PUBLIC</SAMP>
|
|
entry in the catalog entry file.
|
|
The entity must be a fragment
|
|
of an SGML declaration
|
|
consisting of the
|
|
portion of a character set description,
|
|
following the DESCSET keyword,
|
|
that is, it must be a sequence of character descriptions,
|
|
where each character description specifies a described character
|
|
number, the number of characters and
|
|
either a character number in the universal character set, a minimum literal
|
|
or the keyword
|
|
<SAMP>UNUSED</SAMP>.
|
|
Character numbers in the universal character set can be as big as
|
|
99999999.
|
|
<P>
|
|
In addition SP has built in knowledge of a few character sets.
|
|
These are identified using the designating sequence in the
|
|
public identifier. The following designating sequences are
|
|
recognized:
|
|
<DL>
|
|
<DT>
|
|
<SAMP>ESC 2/5 4/0</SAMP>
|
|
<DD>
|
|
The full set of ISO 646 IRV.
|
|
This is not a registered character set,
|
|
but is recommended by ISO 8879 (clause 10.2.2.4).
|
|
<DT>
|
|
<SAMP>ESC 2/8 4/0</SAMP>
|
|
<DD>
|
|
G0 set of ISO 646 IRV,
|
|
ISO Registration Number 2.
|
|
<DT>
|
|
<SAMP>ESC 2/8 4/2</SAMP>
|
|
<DD>
|
|
G0 set of ASCII,
|
|
ISO Registration Number 6.
|
|
<DT>
|
|
<SAMP>ESC 2/1 4/0</SAMP>
|
|
<DD>
|
|
C0 set of ISO 646,
|
|
ISO Registration Number 1.
|
|
</DL>
|
|
<P>
|
|
All the above character sets will be treated as mapping character numbers
|
|
0 to 127 inclusive as in ISO 646.
|
|
<P>
|
|
It is not necessary for every character set used in the SGML
|
|
declaration to be known to SP
|
|
provided that characters in the document character set that are
|
|
significant both in the reference concrete syntax and in the described
|
|
concrete syntax are described using known base character sets and that
|
|
characters that are significant in the described concrete syntax are
|
|
described using the same base character sets or the same minimum
|
|
literals in both the document character set description and the syntax
|
|
reference character set description.
|
|
|
|
<H2>Concrete syntaxes</H2>
|
|
<P>
|
|
The public identifier for a public concrete syntax can be associated
|
|
with an entity that describes using a
|
|
<SAMP>PUBLIC</SAMP>
|
|
entry in the catalog entry file.
|
|
The entity must be a fragment of an SGML declaration
|
|
consisting of a concrete syntax description
|
|
starting with the
|
|
<SAMP>SHUNCHAR</SAMP>
|
|
keyword
|
|
as in an SGML declaration.
|
|
The entity can also make use of the following extensions:
|
|
<UL>
|
|
<LI>
|
|
An
|
|
<I>added function</I>
|
|
can be expressed as a parameter literal
|
|
instead of a name.
|
|
<LI>
|
|
The replacement for a reference reserved name
|
|
can be expressed as a parameter literal instead of a name.
|
|
<LI>
|
|
The
|
|
<SAMP>LCNMSTRT</SAMP>,
|
|
<SAMP>UCNMSTRT</SAMP>,
|
|
<SAMP>LCNMCHAR</SAMP>
|
|
and
|
|
<SAMP>UCNMCHAR</SAMP>
|
|
keywords may each be followed by more than one parameter literal. A
|
|
sequence of parameter literals has the same meaning as a single
|
|
parameter literal whose content is the concatenation of the content of
|
|
each of the literals in the sequence. This extension is useful
|
|
because of the restriction on the length of a parameter literal in the
|
|
SGML declaration to 240 characters.
|
|
<LI>
|
|
The total number of characters specified for
|
|
<SAMP>UCNMCHAR</SAMP>
|
|
or
|
|
<SAMP>UCNMSTRT</SAMP>
|
|
may exceed the total number of characters specified for
|
|
<SAMP>LCNMCHAR</SAMP>
|
|
or
|
|
<SAMP>LCNMSTRT</SAMP>
|
|
respectively.
|
|
Each character in
|
|
<SAMP>UCNMCHAR</SAMP>
|
|
or
|
|
<SAMP>UCNMSTRT</SAMP>
|
|
which does not have a corresponding character in the same position in
|
|
<SAMP>LCNMCHAR</SAMP>
|
|
or
|
|
<SAMP>LCNMSTRT</SAMP>
|
|
is simply assigned to <SAMP>UCNMCHAR</SAMP> or <SAMP>UCNMSTRT</SAMP>
|
|
without making it the upper-case form of any character.
|
|
<LI>
|
|
A parameter following any of
|
|
<SAMP>LCNMSTRT</SAMP>,
|
|
<SAMP>UCNMSTRT</SAMP>,
|
|
<SAMP>LCNMCHAR</SAMP>
|
|
and
|
|
<SAMP>UCNMCHAR</SAMP>
|
|
keywords may be followed by
|
|
the name token <SAMP>...</SAMP>
|
|
(three periods) and another parameter literal.
|
|
This has the same meaning as the two parameter literals
|
|
with a parameter literal in between
|
|
containing in order each character whose number
|
|
is greater than the number of the last character in
|
|
the first parameter literal and less than the
|
|
number of the first character in the second
|
|
parameter literal.
|
|
A parameter literal must contain at least one character for each
|
|
<SAMP>...</SAMP>
|
|
to which it is adjacent.
|
|
<LI>
|
|
A number may be used as a parameter following the
|
|
<SAMP>LCNMSTRT</SAMP>,
|
|
<SAMP>UCNMSTRT</SAMP>,
|
|
<SAMP>LCNMCHAR</SAMP>
|
|
and
|
|
<SAMP>UCNMCHAR</SAMP>
|
|
keywords or as a delimiter in the
|
|
<SAMP>DELIM</SAMP>
|
|
section with the same meaning as a parameter literal
|
|
containing just a numeric character reference with that number.
|
|
<LI>
|
|
The parameters following the
|
|
<SAMP>LCNMSTRT</SAMP>,
|
|
<SAMP>UCNMSTRT</SAMP>,
|
|
<SAMP>LCNMCHAR</SAMP>
|
|
and
|
|
<SAMP>UCNMCHAR</SAMP>
|
|
keywords may be omitted.
|
|
This has the same meaning as specifying
|
|
an empty parameter literal.
|
|
<LI>
|
|
Within the specification of the short reference delimiters,
|
|
a parameter literal containing exactly one character
|
|
may be followed by the name token <SAMP>...</SAMP>
|
|
and another parameter literal containing exactly one character.
|
|
This has the same meaning as a sequence of parameter literals
|
|
one for each character number that is greater than or equal
|
|
to the number of the character in the first parameter literal
|
|
and less than or equal to the number of the character in the
|
|
second parameter literal.
|
|
</UL>
|
|
<H2>Capacity sets</H2>
|
|
<P>
|
|
The public identifier for a public capacity set can be associated
|
|
with an entity that describes using a
|
|
<SAMP>PUBLIC</SAMP>
|
|
entry in the catalog entry file.
|
|
The entity must be a fragment of an SGML declaration
|
|
consisting of a sequence of capacity names and numbers.
|
|
<P>
|
|
<ADDRESS>
|
|
James Clark<BR>
|
|
jjc@jclark.com
|
|
</ADDRESS>
|
|
</BODY>
|
|
</HTML>
|