\input texinfo


@c Copyright (C) 2022, 2023, 2025 Richard Stallman
@c and Free Software Foundation, Inc.

@c (The work of Trevis Rothwell and Nelson Beebe has been assigned to the FSF.)

@c move alignment later?

@c ??? alloca

@setfilename ./c.info
@include version.texi
@settitle GNU C Language Manual
@documentencoding UTF-8

@c Merge variable index into the function index.
@synindex vr fn
@codequoteundirected on
@codequotebacktick on
@copying
This is Edition @value{VERSION}.

Copyright @copyright{} 2022, 2023, 2025 Richard Stallman
and Free Software Foundation, Inc.

(The work of Trevis Rothwell and Nelson Beebe has been assigned to the FSF.)

@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being ``GNU General Public License,'' with the
Front-Cover Texts being ``A GNU Manual,'' and with the Back-Cover
Texts as in (a) below. A copy of the license is included in the
section entitled ``GNU Free Documentation License.''

(a) The FSF's Back-Cover Text is: ``You have the freedom to copy and
modify this GNU manual.''
@end quotation
@end copying
@dircategory Programming
@direntry
* C: (c). GNU C Language Intro and Reference Manual
@end direntry

@titlepage
@sp 6
@center @titlefont{GNU C Language Introduction}
@center @titlefont{and Reference Manual}
@sp 4
@center Edition @value{VERSION}
@sp 5
@center Richard Stallman
@center and
@center Trevis Rothwell
@center plus Nelson Beebe
@center on floating point
@page
@vskip 0pt plus 1filll

@insertcopying

@sp 2
@ignore
WILL BE Published by the Free Software Foundation @*
31 Milk St # 960789 @*
Boston, MA 02110 USA @*
ISBN ?-??????-??-?
@end ignore

@ignore
@sp 1
Cover art by J. Random Artist
@end ignore

@end titlepage
@summarycontents
@contents


@node Top
@ifnottex
@top GNU C Manual
@end ifnottex
@iftex
@top Preface
@end iftex

This manual explains the C language for use with the GNU Compiler
Collection (GCC) on the GNU/Linux operating system and other systems.
We refer to this dialect as GNU C. If you already know C, you can use
this as a reference manual.

If you understand basic concepts of programming but know nothing about
C, you can read this manual sequentially from the beginning to learn
the C language.

If you are a beginner in programming, we recommend you first learn a
language with automatic garbage collection and no explicit pointers,
rather than starting with C@. Good choices include Lisp, Scheme,
Python and Java. Because of C's explicit pointers, programmers must be
careful to avoid certain kinds of errors in memory usage.

C is a venerable language; it was first used in 1973. The GNU C
Compiler, which was subsequently extended into the GNU Compiler
Collection, was first released in 1987. Other important languages
were designed based on C: once you know C, it gives you a useful base
for learning C@t{++}, C#, Java, Scala, D, Go, and more.

The special advantage of C is that it is fairly simple while allowing
close access to the computer's hardware, which previously required
writing in assembler language to describe the individual machine
instructions. Some have called C a ``high-level assembler language''
because of its explicit pointers and lack of automatic management of
storage. As one wag put it, ``C combines the power of assembler
language with the convenience of assembler language.'' However, C is
far more portable, and much easier to read and write, than assembler
language.

This manual describes the GNU C language supported by the GNU Compiler
Collection, as of roughly 2017. Please inform us of any changes
needed to match the current version of GNU C.

When a construct may be absent or work differently in other C
compilers, we say so. When it is not part of ISO standard C, we say
it is a ``GNU C extension,'' because it is useful to know that.
However, standards and other dialects are secondary topics for this
manual. For simplicity's sake, we keep those notes short, unless it
is vital to say more.

Some aspects of the meaning of C programs depend on the target
platform: which computer, and which operating system, the compiled
code will run on. Where this is the case, we say so.

When compiling for a ``real computer'', one that is a reasonable
platform for running the GNU/Linux system, the type @code{int} is
always 32 bits in size. This manual assumes you are compiling for the
computer where you are running the compiler, which implies @code{int}
has that size. GNU C can also compile code for some microprocessors
on which type @code{int} has fewer bits, but this manual does not try
to cover the complications of those peculiar platforms.

We hardly mention C@t{++} or other languages that the GNU
Compiler Collection supports. We hope this manual will serve as a
base for writing manuals for those languages, but languages so
different can't share one common manual.

The C language provides no built-in facilities for performing such
common operations as input/output, memory management, string
manipulation, and the like. Instead, these facilities are provided by
functions defined in the standard library, which is automatically
available in every C program. @xref{Top, The GNU C Library, , libc,
The GNU C Library Reference Manual}.

Most GNU/Linux systems use the GNU C Library to provide those facilities.
It is itself written in C, so once you know C you can read its source
code and see how its library functions do their jobs. Some fraction
of the functions are implemented as @dfn{system calls}, which means
they contain a special instruction that asks the system kernel (Linux)
to do a specific task. To understand how those are implemented, you'd
need to read Linux source code. Whether a library function is
a system call is an internal implementation detail that makes no
difference for how to call the function.

This manual incorporates the former GNU C Preprocessor Manual, which
was among the earliest GNU manuals. It also uses some text from the
earlier GNU C Manual that was written by Trevis Rothwell and James
Youngman.

GNU C has many obscure features, each one either for historical
compatibility or meant for very special situations. We have left them
to a companion manual, the GNU C Obscurities Manual, which will be
published digitally later.

Please report errors and suggestions to c-manual@@gnu.org.

@menu
* The First Example:: Getting started with basic C code.
* Complete Program:: A whole example program
          that can be compiled and run.
* Storage:: Basic layout of storage; bytes.
* Beyond Integers:: Exploring different numeric types.
* Lexical Syntax:: The various lexical components of C programs.
* Arithmetic:: Numeric computations.
* Assignment Expressions:: Storing values in variables.
* Execution Control Expressions:: Expressions combining values in various ways.
* Binary Operator Grammar:: An overview of operator precedence.
* Order of Execution:: The order of program execution.
* Primitive Types:: More details about primitive data types.
* Constants:: Explicit constant values:
          details and examples.
* Type Size:: The memory space occupied by a type.
* Pointers:: Creating and manipulating memory pointers.
* Structures:: Compound data types built
          by grouping other types.
* Arrays:: Creating and manipulating arrays.
* Enumeration Types:: Sets of integers with named values.
* Defining Typedef Names:: Using @code{typedef} to define type names.
* Statements:: Controlling program flow.
* Variables:: Details about declaring, initializing,
          and using variables.
* Type Qualifiers:: Mark variables for certain intended uses.
* Functions:: Declaring, defining, and calling functions.
* Compatible Types:: How to tell if two types are compatible
          with each other.
* Type Conversions:: Converting between types.
* Scope:: Different categories of identifier scope.
* Preprocessing:: Using the GNU C preprocessor.
* Integers in Depth:: How integer numbers are represented.
* Floating Point in Depth:: How floating-point numbers are represented.
* Compilation:: How to compile multi-file programs.
* Directing Compilation:: Operations that affect compilation
          but don't change the program.

Appendices

* Type Alignment:: Where in memory a type can validly start.
* Aliasing:: Accessing the same data in two types.
* Digraphs:: Two-character aliases for some characters.
* Attributes:: Specifying additional information
          in a declaration.
* Signals:: Fatal errors triggered in various scenarios.
* GNU Free Documentation License:: The license for this manual.
* GNU General Public License::
* Symbol Index:: Keyword and symbol index.
* Concept Index:: Detailed topical index.

@detailmenu
--- The Detailed Node Listing ---

* Recursive Fibonacci:: Writing a simple function recursively.
* Stack:: Each function call uses space in the stack.
* Iterative Fibonacci:: Writing the same function iteratively.
* Complete Example:: Turn the simple function into a full program.
* Complete Explanation:: Explanation of each part of the example.
* Complete Line-by-Line:: Explaining each line of the example.
* Compile Example:: Using GCC to compile the example.
* Float Example:: A function that uses floating-point numbers.
* Array Example:: A function that works with arrays.
* Array Example Call:: How to call that function.
* Array Example Variations:: Different ways to write the call example.

Lexical Syntax

* English:: Write programs in English!
* Characters:: The characters allowed in C programs.
* Whitespace:: The particulars of whitespace characters.
* Comments:: How to include comments in C code.
* Identifiers:: How to form identifiers (names).
* Operators/Punctuation:: Characters used as operators or punctuation.
* Line Continuation:: Splitting one line into multiple lines.
* Digraphs:: Two-character substitutes for some characters.

Arithmetic

* Basic Arithmetic:: Addition, subtraction, multiplication,
          and division.
* Integer Arithmetic:: How C performs arithmetic with integer values.
* Integer Overflow:: When an integer value exceeds the range
          of its type.
* Mixed Mode:: Calculating with both integer values
          and floating-point values.
* Division and Remainder:: How integer division works.
* Numeric Comparisons:: Comparing numeric values for
          equality or order.
* Shift Operations:: Shift integer bits left or right.
* Bitwise Operations:: Bitwise conjunction, disjunction, negation.

Assignment Expressions

* Simple Assignment:: The basics of storing a value.
* Lvalues:: Expressions into which a value can be stored.
* Modifying Assignment:: Shorthand for changing an lvalue's contents.
* Increment/Decrement:: Shorthand for incrementing and decrementing
          an lvalue's contents.
* Postincrement/Postdecrement:: Accessing then incrementing or decrementing.
* Assignment in Subexpressions:: How to avoid ambiguity.
* Write Assignments Separately:: Write assignments as separate statements.

Execution Control Expressions

* Logical Operators:: Logical conjunction, disjunction, negation.
* Logicals and Comparison:: Logical operators with comparison operators.
* Logicals and Assignments:: Assignments with logical operators.
* Conditional Expression:: An if/else construct inside expressions.
* Comma Operator:: Build a sequence of subexpressions.

Order of Execution

* Reordering of Operands:: Operations in C are not necessarily computed
          in the order they are written.
* Associativity and Ordering:: Some associative operations are performed
          in a particular order; others are not.
* Sequence Points:: Some guarantees about the order of operations.
* Postincrement and Ordering:: Ambiguous execution order with postincrement.
* Ordering of Operands:: Evaluation order of operands
          and function arguments.
* Optimization and Ordering:: Compiler optimizations can reorder operations
          only if it has no impact on program results.

Primitive Data Types

* Integer Types:: Description of integer types.
* Floating-Point Data Types:: Description of floating-point types.
* Complex Data Types:: Description of complex number types.
* The Void Type:: A type indicating no value at all.
* Other Data Types:: A brief summary of other types.

Constants

* Integer Constants:: Literal integer values.
* Integer Const Type:: Types of literal integer values.
* Floating Constants:: Literal floating-point values.
* Imaginary Constants:: Literal imaginary number values.
* Invalid Numbers:: Avoiding preprocessing number misconceptions.
* Character Constants:: Literal character values.
* Unicode Character Codes:: Unicode characters represented
          in either UTF-16 or UTF-32.
* Wide Character Constants:: Literal character values larger than 8 bits.
* String Constants:: Literal string values.
* UTF-8 String Constants:: Literal UTF-8 string values.
* Wide String Constants:: Literal string values made up of
          16- or 32-bit characters.

Pointers

* Address of Data:: Using the ``address-of'' operator.
* Pointer Types:: For each type, there is a pointer type.
* Pointer Declarations:: Declaring variables with pointer types.
* Pointer Type Designators:: Designators for pointer types.
* Pointer Dereference:: Accessing what a pointer points at.
* Null Pointers:: Pointers which do not point to any object.
* Invalid Dereference:: Dereferencing null or invalid pointers.
* Void Pointers:: Totally generic pointers, can cast to any.
* Pointer Comparison:: Comparing memory address values.
* Pointer Arithmetic:: Computing memory address values.
* Pointers and Arrays:: Using pointer syntax instead of array syntax.
* Low-Level Pointer Arithmetic:: More about computing memory address values.
* Pointer Increment/Decrement:: Incrementing and decrementing pointers.
* Pointer Arithmetic Drawbacks:: A common pointer bug to watch out for.
* Pointer-Integer Conversion:: Converting pointer types to integer types.
* Printing Pointers:: Using @code{printf} for a pointer's value.

Structures

* Referencing Fields:: Accessing field values in a structure object.
* Arrays as Fields:: Accessing arrays as structure fields.
* Dynamic Memory Allocation:: Allocating space for objects
          while the program is running.
* Field Offset:: Memory layout of fields within a structure.
* Structure Layout:: Planning the memory layout of fields.
* Packed Structures:: Packing structure fields as close as possible.
* Bit Fields:: Dividing integer fields
          into fields with fewer bits.
* Bit Field Packing:: How bit fields pack together in integers.
* const Fields:: Making structure fields immutable.
* Zero Length:: Zero-length array as a variable-length object.
* Flexible Array Fields:: Another approach to variable-length objects.
* Overlaying Structures:: Casting one structure type
          over an object of another structure type.
* Structure Assignment:: Assigning values to structure objects.
* Unions:: Viewing the same object in different types.
* Packing With Unions:: Using a union type to pack various types into
          the same memory space.
* Cast to Union:: Casting a value of one of the union's alternative
          types to the type of the union itself.
* Structure Constructors:: Building new structure objects.
* Unnamed Types as Fields:: Fields' types do not always need names.
* Incomplete Types:: Types which have not been fully defined.
* Intertwined Incomplete Types:: Defining mutually-recursive structure types.
* Type Tags:: Scope of structure and union type tags.

Arrays

* Accessing Array Elements:: How to access individual elements of an array.
* Declaring an Array:: How to name and reserve space for a new array.
* Strings:: A string in C is a special case of array.
* Incomplete Array Types:: Naming, but not allocating, a new array.
* Limitations of C Arrays:: Arrays are not first-class objects.
* Multidimensional Arrays:: Arrays of arrays.
* Constructing Array Values:: Assigning values to an entire array at once.
* Arrays of Variable Length:: Declaring arrays of non-constant size.

Statements

* Expression Statement:: Evaluate an expression, as a statement,
          usually done for a side effect.
* if Statement:: Basic conditional execution.
* if-else Statement:: Multiple branches for conditional execution.
* Blocks:: Grouping multiple statements together.
* return Statement:: Return a value from a function.
* Loop Statements:: Repeatedly executing a statement or block.
* switch Statement:: Multi-way conditional choices.
* switch Example:: A plausible example of using @code{switch}.
* Duffs Device:: A special way to use @code{switch}.
* Case Ranges:: Ranges of values for @code{switch} cases.
* Null Statement:: A statement that does nothing.
* goto Statement:: Jump to another point in the source code,
          identified by a label.
* Local Labels:: Labels with limited scope.
* Labels as Values:: Getting the address of a label.
* Statement Exprs:: A series of statements used as an expression.

Variables

* Variable Declarations:: Name a variable and reserve space for it.
* Initializers:: Assigning initial values to variables.
* Designated Inits:: Assigning initial values to array elements
          at particular array indices.
* Auto Type:: Obtaining the type of a variable.
* Local Variables:: Variables declared in function definitions.
* File-Scope Variables:: Variables declared outside of
          function definitions.
* Static Local Variables:: Variables declared within functions,
          but with permanent storage allocation.
* Extern Declarations:: Declaring a variable
          which is allocated somewhere else.
* Allocating File-Scope:: When is space allocated
          for file-scope variables?
* auto and register:: Historically used storage directions.
* Omitting Types:: The bad practice of declaring variables
          with implicit type.

Type Qualifiers

* const:: Variables whose values don't change.
* volatile:: Variables whose values may be accessed
          or changed outside of the control of
          this program.
* restrict Pointers:: Restricted pointers for code optimization.
* restrict Pointer Example:: Example of how that works.

Functions

* Function Definitions:: Writing the body of a function.
* Function Declarations:: Declaring the interface of a function.
* Function Calls:: Using functions.
* Function Call Semantics:: Call-by-value argument passing.
* Function Pointers:: Using references to functions.
* The main Function:: Where execution of a GNU C program begins.

Type Conversions

* Explicit Type Conversion:: Casting a value from one type to another.
* Assignment Type Conversions:: Automatic conversion by assignment operation.
* Argument Promotions:: Automatic conversion of function parameters.
* Operand Promotions:: Automatic conversion of arithmetic operands.
* Common Type:: When operand types differ, which one is used?

Scope

* Scope:: Different categories of identifier scope.

Preprocessing

* Preproc Overview:: Introduction to the C preprocessor.
* Directives:: The form of preprocessor directives.
* Preprocessing Tokens:: The lexical elements of preprocessing.
* Header Files:: Including one source file in another.
* Macros:: Macro expansion by the preprocessor.
* Conditionals:: Controlling whether to compile some lines
          or ignore them.
* Diagnostics:: Reporting warnings and errors.
* Line Control:: Reporting source line numbers.
* Null Directive:: A preprocessing no-op.

Integers in Depth

* Integer Representations:: How integer values appear in memory.
* Maximum and Minimum Values:: Value ranges of integer types.

Floating Point in Depth

* Floating Representations:: How floating-point values appear in memory.
* Floating Type Specs:: Precise details of memory representations.
* Special Float Values:: Infinity, Not a Number, and Subnormal Numbers.
* Invalid Optimizations:: Don't mess up non-numbers and signed zeros.
* Exception Flags:: Handling certain conditions in floating point.
* Exact Floating-Point:: Not all floating calculations lose precision.
* Rounding:: When a floating result can't be represented
          exactly in the floating-point type in use.
* Rounding Issues:: Avoid magnifying rounding errors.
* Significance Loss:: Subtracting numbers that are almost equal.
* Fused Multiply-Add:: Taking advantage of a special floating-point
          instruction for faster execution.
* Error Recovery:: Determining rounding errors.
* Exact Floating Constants:: Precisely specified floating-point numbers.
* Handling Infinity:: When floating calculation is out of range.
* Handling NaN:: When floating calculation is undefined.
* Signed Zeros:: Positive zero vs. negative zero.
* Scaling by the Base:: A useful exact floating-point operation.
* Rounding Control:: Specifying some rounding behaviors.
* Machine Epsilon:: The smallest number you can add to 1.0
          and get a sum which is larger than 1.0.
* Complex Arithmetic:: Details of arithmetic with complex numbers.
* Round-Trip Base Conversion:: What happens between base-2 and base-10.
* Further Reading:: References for floating-point numbers.

Directing Compilation

* Pragmas:: Controlling compilation of some constructs.
* Static Assertions:: Compile-time tests for conditions.

@end detailmenu
@end menu

@node The First Example
@chapter The First Example

This chapter presents the source code for a very simple C program and
uses it to explain a few features of the language. If you already
know the basic points of C presented in this chapter, you can skim it
or skip it.

We present examples of C source code (other than comments) using a
fixed-width typeface, since that's the way they look when you edit
them in an editor such as GNU Emacs.

@menu
* Recursive Fibonacci:: Writing a simple function recursively.
* Stack:: Each function call uses space in the stack.
* Iterative Fibonacci:: Writing the same function iteratively.
@end menu

@node Recursive Fibonacci
@section Example: Recursive Fibonacci
@cindex recursive Fibonacci function
@cindex Fibonacci function, recursive

To introduce the most basic features of C, let's look at code for a
simple mathematical function that does calculations on integers. This
function calculates the @var{n}th number in the Fibonacci series, in
which each number is the sum of the previous two: 1, 1, 2, 3, 5, 8,
13, 21, 34, 55, @dots{}.

@example
int
fib (int n)
@{
  if (n <= 2)  /* @r{This avoids infinite recursion.} */
    return 1;
  else
    return fib (n - 1) + fib (n - 2);
@}
@end example

This very simple program illustrates several features of C:

@itemize @bullet
@item
A function definition, whose first two lines constitute the function
header. @xref{Function Definitions}.

@item
A function parameter @code{n}, referred to as the variable @code{n}
inside the function body. @xref{Function Parameter Variables}.
A function definition uses parameters to refer to the argument
values provided in a call to that function.

@item
Arithmetic. C programs add with @samp{+} and subtract with
@samp{-}. @xref{Arithmetic}.

@item
Numeric comparisons. The operator @samp{<=} tests for ``less than or
equal.'' @xref{Numeric Comparisons}.

@item
Integer constants written in base 10.
@xref{Integer Constants}.

@item
A function call. The function call @code{fib (n - 1)} calls the
function @code{fib}, passing as its argument the value @code{n - 1}.
@xref{Function Calls}.

@item
A comment, which starts with @samp{/*} and ends with @samp{*/}. The
comment has no effect on the execution of the program. Its purpose is
to provide explanations to people reading the source code. Including
comments in the code is tremendously important---they provide
background information so others can understand the code more quickly.
@xref{Comments}.

In this manual, we present comment text in the variable-width typeface
used for the text of the chapters, not in the fixed-width typeface
used for the rest of the code. That is to make comments easier to
read. This distinction of typeface does not exist in a real file of C
source code.

@item
Two kinds of statements, the @code{return} statement and the
@code{if}@dots{}@code{else} statement. @xref{Statements}.

@item
Recursion. The function @code{fib} calls itself; that is called a
@dfn{recursive call}. These are valid in C, and quite common.

The @code{fib} function would not be useful if it didn't return.
Thus, recursive definitions, to be of any use, must avoid
@dfn{infinite recursion}.

This function definition prevents infinite recursion by specially
handling the case where @code{n} is two or less. Thus the maximum
depth of recursive calls is less than @code{n}.
@end itemize
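
The bound on recursion depth stated in the last item can be checked by
instrumenting the function. The following is only a sketch: the names
@code{fib_depth}, @code{deepest_nesting}, @code{current_depth} and
@code{max_depth} are hypothetical, introduced for this illustration,
and the code uses variable declarations and the @samp{++} and
@samp{--} operators, which later chapters explain.

@verbatim
```c
/* Hypothetical instrumentation; not part of the original example.  */
static int current_depth;  /* Calls to fib_depth that are active now.  */
static int max_depth;      /* Largest value current_depth has reached.  */

/* Compute the same value as fib, and also track nesting depth.  */
int
fib_depth (int n)
{
  int result;

  current_depth++;
  if (current_depth > max_depth)
    max_depth = current_depth;

  if (n <= 2)
    result = 1;
  else
    result = fib_depth (n - 1) + fib_depth (n - 2);

  current_depth--;
  return result;
}

/* Reset the counters, run fib_depth, and return the deepest nesting.  */
int
deepest_nesting (int n)
{
  current_depth = 0;
  max_depth = 0;
  fib_depth (n);
  return max_depth;
}
```
@end verbatim

Counting the outermost call as well, @code{deepest_nesting (10)}
returns 9: the longest chain of active calls runs from @code{n} equal
to 10 down to @code{n} equal to 2.
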

@menu
* Function Header:: The function's name and how it is called.
* Function Body:: Declarations and statements that implement the function.
@end menu

@node Function Header
@subsection Function Header
@cindex function header

In our example, the first two lines of the function definition are the
@dfn{header}. Its purpose is to state the function's name and say how
it is called:

@example
int
fib (int n)
@end example

@noindent
This header says that the function returns an integer (type
@code{int}), its name is @code{fib}, and it takes one argument named
@code{n} which is also an integer. (Data types will be explained
later, in @ref{Primitive Types}.)

@node Function Body
@subsection Function Body
@cindex function body
@cindex recursion

The rest of the function definition is called the @dfn{function body}.
Like every function body, this one starts with @samp{@{}, ends with
@samp{@}}, and contains zero or more @dfn{statements} and
@dfn{declarations}. Statements specify actions to take, whereas
declarations define names of variables, functions, and so on. Each
statement and each declaration ends with a semicolon (@samp{;}).

Statements and declarations often contain @dfn{expressions}; an
expression is a construct whose execution produces a @dfn{value} of
some data type, but may also take actions through ``side effects''
that alter subsequent execution. A statement, by contrast, does not
have a value; it affects further execution of the program only through
the actions it takes.

This function body contains no declarations, and just one statement,
but that one is a complex statement in that it contains nested
statements. This function uses two kinds of statements:

@table @code
@item return
The @code{return} statement makes the function return immediately.
It looks like this:

@example
return @var{value};
@end example

Its meaning is to compute the expression @var{value} and exit the
function, making it return whatever value that expression produced.
For instance,

@example
return 1;
@end example

@noindent
returns the integer 1 from the function, and

@example
return fib (n - 1) + fib (n - 2);
@end example

@noindent
returns a value computed by performing two function calls
as specified and adding their results.

@item @code{if}@dots{}@code{else}
The @code{if}@dots{}@code{else} statement is a @dfn{conditional}.
Each time it executes, it chooses one of its two substatements to execute
and ignores the other. It looks like this:

@example
if (@var{condition})
  @var{if-true-statement}
else
  @var{if-false-statement}
@end example

Its meaning is to compute the expression @var{condition} and, if it's
``true,'' execute @var{if-true-statement}. Otherwise, execute
@var{if-false-statement}. @xref{if-else Statement}.

Inside the @code{if}@dots{}@code{else} statement, @var{condition} is
simply an expression. It's considered ``true'' if its value is
nonzero. (A comparison operation, such as @code{n <= 2}, produces the
value 1 if it's ``true'' and 0 if it's ``false.'' @xref{Numeric
Comparisons}.) Thus,

@example
if (n <= 2)
  return 1;
else
  return fib (n - 1) + fib (n - 2);
@end example

@noindent
first tests whether the value of @code{n} is less than or equal to 2.
If so, the expression @code{n <= 2} has the value 1. So execution
continues with the statement

@example
return 1;
@end example

@noindent
Otherwise, execution continues with this statement:

@example
return fib (n - 1) + fib (n - 2);
@end example

Each of these statements ends the execution of the function and
provides a value for it to return. @xref{return Statement}.
@end table
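
The two rules about conditions given above, that a comparison produces
1 or 0 and that any nonzero value counts as ``true,'' can be seen in
isolation in the following sketch. The function names
@code{comparison_value} and @code{counts_as_true} are hypothetical,
chosen only for this illustration.

@verbatim
```c
/* Return the value of the comparison itself:
   1 if n is at most 2, and 0 otherwise.  */
int
comparison_value (int n)
{
  return n <= 2;
}

/* Return 1 if an if statement treats x as true, 0 otherwise.
   Any nonzero x, including a negative one, selects the first branch.  */
int
counts_as_true (int x)
{
  if (x)
    return 1;
  else
    return 0;
}
```
@end verbatim

Here @code{comparison_value (3)} returns 0, while
@code{counts_as_true (-5)} returns 1 because -5 is nonzero.
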
|
||
|
||
Calculating @code{fib} using ordinary integers in C works only for
|
||
@var{n} < 47 because the value of @code{fib (47)} is too large to fit
|
||
in type @code{int}. In GNU C, type @code{int} holds 32 bits
|
||
(@pxref{Integer Types}), so the addition operation that tries to add
|
||
@code{fib (46)} and @code{fib (45)} cannot deliver the correct result.
|
||
This occurrence is called @dfn{integer overflow}.
|
||
|
||
Overflow can manifest itself in various ways, but one thing that can't
|
||
possibly happen is to produce the correct value, since that can't fit
|
||
in the space for the value. @xref{Integer Overflow}, for more details
|
||
about this situation.
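
One way to detect this situation ahead of time is to compare against
the largest @code{int} value before adding. The following is a
minimal sketch, not part of the example above: the function name
@code{add_would_overflow} is hypothetical, it assumes both operands
are nonnegative (as Fibonacci numbers are), and it uses the standard
macro @code{INT_MAX} from @file{limits.h}.

@example
#include <limits.h>

/* @r{Return 1 if @code{a + b} would exceed @code{INT_MAX}, else 0.}
   @r{Assumes @code{a} and @code{b} are both nonnegative.} */
int
add_would_overflow (int a, int b)
@{
  /* @r{Rearranged so that the test itself cannot overflow.} */
  return a > INT_MAX - b;
@}
@end example

With a 32-bit @code{int}, @code{add_would_overflow (1836311903,
1134903170)} returns 1, because those are the values of
@code{fib (46)} and @code{fib (45)} and their sum does not fit.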
|
||
|
||
@xref{Functions}, for a full explanation about functions.
|
||
|
||
@node Stack
|
||
@section The Stack, And Stack Overflow
|
||
@cindex stack
|
||
@cindex stack frame
|
||
@cindex stack overflow
|
||
@cindex recursion, drawbacks of
|
||
|
||
Recursion has a drawback: there are limits to how many nested levels of
|
||
function calls a program can make. In C, each function call allocates a block
|
||
of memory which it uses until the call returns. C allocates these
|
||
blocks consecutively within a large area of memory known as the
|
||
@dfn{stack}, so we refer to the blocks as @dfn{stack frames}.
|
||
|
||
The size of the stack is limited; if the program tries to use too
|
||
much, that causes the program to fail because the stack is full. This
|
||
is called @dfn{stack overflow}.
|
||
|
||
@cindex crash
|
||
@cindex segmentation fault
|
||
Stack overflow on GNU/Linux typically manifests itself as the
|
||
@dfn{signal} named @code{SIGSEGV}, also known as a ``segmentation
|
||
fault.'' By default, this signal terminates the program immediately,
|
||
rather than letting the program try to recover, or reach an expected
|
||
ending point. (We commonly say in this case that the program
|
||
``crashes.'') @xref{Signals}.
|
||
|
||
It is inconvenient to observe a crash by passing too large
|
||
an argument to recursive Fibonacci, because the program would run a
|
||
long time before it crashes. This algorithm is simple but
|
||
ridiculously slow: in calculating @code{fib (@var{n})}, the number of
|
||
(recursive) calls @code{fib (1)} or @code{fib (2)} that it makes equals
|
||
the final result.
|
||
|
||
However, you can observe stack overflow very quickly if you use
|
||
this function instead:
|
||
|
||
@example
|
||
int
fill_stack (int n)
@{
  if (n <= 1)  /* @r{This limits the depth of recursion.} */
    return 1;
  else
    return fill_stack (n - 1);
@}
|
||
@end example
|
||
|
||
Under gNewSense GNU/Linux on the Lemote Yeeloong, without optimization
|
||
and using the default configuration, an experiment showed there is
|
||
enough stack space to do 261906 nested calls to that function. One
|
||
more, and the stack overflows and the program crashes. On another
|
||
platform, with a different configuration, or with a different
|
||
function, the limit might be bigger or smaller.
|
||
|
||
@node Iterative Fibonacci
|
||
@section Example: Iterative Fibonacci
|
||
@cindex iterative Fibonacci function
|
||
@cindex Fibonacci function, iterative
|
||
|
||
Here's a much faster algorithm for computing the same Fibonacci
|
||
series. It is faster for two reasons. First, it uses @dfn{iteration}
|
||
(that is, repetition or looping) rather than recursion, so it doesn't
|
||
take time for a large number of function calls. But mainly, it is
|
||
faster because the number of repetitions is small---only @code{@var{n}}.
|
||
|
||
@c If you change this, change the duplicate in node Example of for.
|
||
|
||
@example
|
||
int
fib (int n)
@{
  int last = 1;   /* @r{Initial value is @code{fib (1)}.} */
  int prev = 0;   /* @r{Initial value controls @code{fib (2)}.} */
  int i;

  for (i = 1; i < n; ++i)
    /* @r{If @code{n} is 1 or less, the loop runs zero times,} */
    /* @r{since in that case @code{i < n} is false the first time.} */
    @{
      /* @r{Now @code{last} is @code{fib (@code{i})}}
         @r{and @code{prev} is @code{fib (@code{i} - 1)}.} */
      /* @r{Compute @code{fib (@code{i} + 1)}.} */
      int next = prev + last;
      /* @r{Shift the values down.} */
      prev = last;
      last = next;
      /* @r{Now @code{last} is @code{fib (@code{i} + 1)}}
         @r{and @code{prev} is @code{fib (@code{i})}.}
         @r{But that won't stay true for long,}
         @r{because we are about to increment @code{i}.} */
    @}

  return last;
@}
|
||
@end example
|
||
|
||
This definition computes @code{fib (@var{n})} in a time proportional
|
||
to @code{@var{n}}. The comments in the definition explain how it works: it
|
||
advances through the series, always keeps the last two values in
|
||
@code{last} and @code{prev}, and adds them to get the next value.
|
||
|
||
Here are the additional C features that this definition uses:
|
||
|
||
@table @asis
|
||
@item Internal blocks
|
||
Within a function, wherever a statement is called for, you can write a
|
||
@dfn{block}. It looks like @code{@{ @r{@dots{}} @}} and contains zero or
|
||
more statements and declarations. (You can also use additional
|
||
blocks as statements in a block.)
|
||
|
||
The function body also counts as a block, which is why it can contain
|
||
statements and declarations.
|
||
|
||
@xref{Blocks}.
|
||
|
||
@item Declarations of local variables
|
||
This function body contains declarations as well as statements. There
|
||
are three declarations directly in the function body, as well as a
|
||
fourth declaration in an internal block. Each starts with @code{int}
|
||
because it declares a variable whose type is integer. One declaration
|
||
can declare several variables, but each of these declarations is
|
||
simple and declares just one variable.
|
||
|
||
Variables declared inside a block (either a function body or an
|
||
internal block) are @dfn{local variables}. These variables exist only
|
||
within that block; their names are not defined outside the block, and
|
||
exiting the block deallocates their storage. This example declares
|
||
four local variables: @code{last}, @code{prev}, @code{i}, and
|
||
@code{next}.
|
||
|
||
The most basic local variable declaration looks like this:
|
||
|
||
@example
|
||
@var{type} @var{variablename};
|
||
@end example
|
||
|
||
For instance,
|
||
|
||
@example
|
||
int i;
|
||
@end example
|
||
|
||
@noindent
|
||
declares the local variable @code{i} as an integer.
|
||
@xref{Variable Declarations}.
|
||
|
||
@item Initializers
|
||
When you declare a variable, you can also specify its initial value,
|
||
like this:
|
||
|
||
@example
|
||
@var{type} @var{variablename} = @var{value};
|
||
@end example
|
||
|
||
For instance,
|
||
|
||
@example
|
||
int last = 1;
|
||
@end example
|
||
|
||
@noindent
|
||
declares the local variable @code{last} as an integer (type
|
||
@code{int}) and starts it off with the value 1. @xref{Initializers}.
|
||
|
||
@item Assignment
|
||
An assignment is a specific kind of expression, written with the @samp{=}
operator, that stores a new value in a variable or other place. Thus,
|
||
|
||
@example
|
||
@var{variable} = @var{value}
|
||
@end example
|
||
|
||
@noindent
|
||
is an expression that computes @code{@var{value}} and stores the value in
|
||
@code{@var{variable}}. @xref{Assignment Expressions}.
|
||
|
||
@item Expression statements
|
||
An expression statement is an expression followed by a semicolon.
|
||
That computes the value of the expression, then ignores the value.
|
||
|
||
An expression statement is useful when the expression changes some
|
||
data or has other side effects---for instance, with function calls, or
|
||
with assignments as in this example. @xref{Expression Statement}.
|
||
|
||
Using an expression with no side effects in an expression statement is
|
||
pointless; for instance, the expression statement @code{x;} would
|
||
examine the value of @code{x} and ignore it. That is not
|
||
useful.@footnote{Computing an expression and ignoring the result can
|
||
be useful in peculiar cases. For instance, dereferencing a pointer
|
||
and ignoring the value is a way to cause a fault if a pointer value is
|
||
invalid. @xref{Signals}. But you may need to declare the pointer
|
||
target @code{volatile} or the dereference may be optimized away.
|
||
@xref{volatile}.}
|
||
|
||
@item Increment operator
|
||
The increment operator is @samp{++}. @code{++i} is an
|
||
expression that is short for @code{i = i + 1}.
|
||
@xref{Increment/Decrement}.
|
||
|
||
@item @code{for} statements
|
||
A @code{for} statement is a clean way of executing a statement
|
||
repeatedly---a @dfn{loop} (@pxref{Loop Statements}). Specifically,
|
||
|
||
@example
|
||
for (i = 1; i < n; ++i)
  @var{body}
|
||
@end example
|
||
|
||
@noindent
|
||
means to start by doing @code{i = 1} (set @code{i} to one) to prepare
|
||
for the loop. The loop itself consists of
|
||
|
||
@itemize @bullet
|
||
@item
|
||
Testing @code{i < n} and exiting the loop if that's false.
|
||
|
||
@item
|
||
Executing @var{body}.
|
||
|
||
@item
|
||
Advancing the loop (executing @code{++i}, which increments @code{i}).
|
||
@end itemize
|
||
|
||
The net result is to execute @var{body} with 1 in @code{i},
|
||
then with 2 in @code{i}, and so on, stopping just before the repetition
|
||
where @code{i} would equal @code{n}. If @code{n} is 1 or less,
the loop will execute the body zero times.
|
||
|
||
The body of the @code{for} statement must be one and only one
|
||
statement. You can't write two statements in a row there; if you try
|
||
to, only the first of them will be treated as part of the loop.
|
||
|
||
The way to put multiple statements in such a place is to group them
|
||
with a block, and that's what we do in this example.
|
||
@end table
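
The steps listed above for the @code{for} statement can be sketched
with a @code{while} loop, which tests its condition before each
repetition. This is only an illustration; the function names
@code{sum_with_for} and @code{sum_with_while} are hypothetical, and
both add up the values that @code{i} takes on.

@example
/* @r{Add the integers from 1 up to @code{n} - 1 using @code{for}.} */
int
sum_with_for (int n)
@{
  int sum = 0;
  int i;

  for (i = 1; i < n; ++i)
    sum = sum + i;

  return sum;
@}

/* @r{The same computation with each step written out.} */
int
sum_with_while (int n)
@{
  int sum = 0;
  int i;

  i = 1;                /* @r{Prepare for the loop.} */
  while (i < n)         /* @r{Test; exit the loop if false.} */
    @{
      sum = sum + i;    /* @r{Execute the body.} */
      ++i;              /* @r{Advance the loop.} */
    @}

  return sum;
@}
@end example

Both functions return 10 when @code{n} is 5, and 0 when @code{n} is 1
or less, since in that case the body never runs.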
|
||
|
||
@node Complete Program
|
||
@chapter A Complete Program
|
||
@cindex complete example program
|
||
@cindex example program, complete
|
||
|
||
It's all very well to write a Fibonacci function, but you cannot run
|
||
it by itself. It is a useful program, but it is not a complete
|
||
program.
|
||
|
||
In this chapter we present a complete program that contains the
|
||
@code{fib} function. This example shows how to make the program
|
||
start, how to make it finish, how to do computation, and how to print
|
||
a result.
|
||
|
||
@menu
|
||
* Complete Example:: Turn the simple function into a full program.
|
||
* Complete Explanation:: Explanation of each part of the example.
|
||
* Complete Line-by-Line:: Explaining each line of the example.
|
||
* Compile Example:: Using GCC to compile the example.
|
||
@end menu
|
||
|
||
@node Complete Example
|
||
@section Complete Program Example
|
||
|
||
Here is the complete program that uses the simple, recursive version
|
||
of the @code{fib} function (@pxref{Recursive Fibonacci}):
|
||
|
||
@example
|
||
#include <stdio.h>

int
fib (int n)
@{
  if (n <= 2)  /* @r{This avoids infinite recursion.} */
    return 1;
  else
    return fib (n - 1) + fib (n - 2);
@}

int
main (void)
@{
  printf ("Fibonacci series item %d is %d\n",
          20, fib (20));
  return 0;
@}
|
||
@end example
|
||
|
||
@noindent
|
||
This program prints a message that shows the value of @code{fib (20)}.
|
||
|
||
Now for an explanation of what that code means.
|
||
|
||
@node Complete Explanation
|
||
@section Complete Program Explanation
|
||
|
||
@ifnottex
|
||
Here's the explanation of the code of the example in the
|
||
previous section.
|
||
@end ifnottex
|
||
|
||
This sample program prints a message that shows the value of @code{fib
|
||
(20)}, and exits with code 0 (which stands for successful execution).
|
||
|
||
Every C program is started by running the function named @code{main}.
|
||
Therefore, the example program defines a function named @code{main} to
|
||
provide a way to start it. Whatever that function does is what the
|
||
program does. @xref{The main Function}.
|
||
|
||
The @code{main} function is the first one called when the program
|
||
runs, but it doesn't come first in the example code. The order of the
|
||
function definitions in the source code makes no difference to the
|
||
program's meaning.
|
||
|
||
The initial call to @code{main} always passes certain arguments, but
|
||
@code{main} does not have to pay attention to them. To ignore those
|
||
arguments, define @code{main} with @code{void} as the parameter list.
|
||
(@code{void} as a function's parameter list normally means ``call with
|
||
no arguments,'' but @code{main} is a special case.)
|
||
|
||
The function @code{main} returns 0 because that is
|
||
the conventional way for @code{main} to indicate successful execution.
|
||
It could instead return a positive integer to indicate failure, and
|
||
some utility programs have specific conventions for the meaning of
|
||
certain numeric @dfn{failure codes}. @xref{Values from main}.
|
||
|
||
@cindex @code{printf}
|
||
The simplest way to print text in C is by calling the @code{printf}
|
||
function, so here we explain very briefly what that function does.
|
||
For a full explanation of @code{printf} and the other standard I/O
|
||
functions, see @ref{Input/Output on Streams, The GNU C Library, ,
|
||
libc, The GNU C Library Reference Manual}.
|
||
|
||
@cindex standard output
|
||
The first argument to @code{printf} is a @dfn{string constant}
|
||
(@pxref{String Constants}) that is a template for output. The
|
||
function @code{printf} copies most of that string directly as output,
|
||
including the newline character at the end of the string, which is
|
||
written as @samp{\n}. The output goes to the program's @dfn{standard
|
||
output} destination, which in the usual case is the terminal.
|
||
|
||
@samp{%} in the template introduces a code that substitutes other text
|
||
into the output. Specifically, @samp{%d} means to take the next
|
||
argument to @code{printf} and substitute it into the text as a decimal
|
||
number. (The argument for @samp{%d} must be of type @code{int}; if it
|
||
isn't, @code{printf} will malfunction.) So the output is a line that
|
||
looks like this:
|
||
|
||
@example
|
||
Fibonacci series item 20 is 6765
|
||
@end example
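
As a further illustration, here is a sketch of a function that prints
the same kind of line for any item. The name @code{print_fib_line} is
hypothetical. It relies on the fact that @code{printf} returns the
number of characters it printed, which the function passes back to
its caller.

@example
#include <stdio.h>

/* @r{Print one line of output; both arguments are @code{int},}
   @r{matching the two @samp{%d} codes in the template.} */
int
print_fib_line (int item, int value)
@{
  return printf ("Fibonacci series item %d is %d\n",
                 item, value);
@}
@end example

Calling @code{print_fib_line (20, 6765)} prints the same line as the
example program.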
|
||
|
||
This program does not contain a definition for @code{printf} because
|
||
it is defined by the C library, which makes it available in all C
|
||
programs. However, each program does need to @dfn{declare}
|
||
@code{printf} so it will be called correctly. The @code{#include}
|
||
line takes care of that; it includes a @dfn{header file} called
|
||
@file{stdio.h} into the program's code. That file is provided by the
|
||
operating system and it contains declarations for the many standard
|
||
input/output functions in the C library, one of which is
|
||
@code{printf}.
|
||
|
||
Don't worry about header files for now; we'll explain them later in
|
||
@ref{Header Files}.
|
||
|
||
The first argument of @code{printf} does not have to be a string
|
||
constant; it can be any string (@pxref{Strings}). However, using a
|
||
constant is the most common case.
|
||
|
||
@node Complete Line-by-Line
|
||
@section Complete Program, Line by Line
|
||
|
||
Here's the same example, explained line by line.
|
||
@strong{Beginners, do you find this helpful or not?
|
||
Would you prefer a different layout for the example?
|
||
Please tell rms@@gnu.org.}
|
||
|
||
@example
|
||
#include <stdio.h>      /* @r{Include declaration of usual} */
                        /* @r{I/O functions such as @code{printf}.} */
                        /* @r{Most programs need these.} */

int                     /* @r{This function returns an @code{int}.} */
fib (int n)             /* @r{Its name is @code{fib};} */
                        /* @r{its argument is called @code{n}.} */
@{                       /* @r{Start of function body.} */
  /* @r{This stops the recursion from being infinite.} */
  if (n <= 2)           /* @r{If @code{n} is 1 or 2,} */
    return 1;           /* @r{make @code{fib} return 1.} */
  else                  /* @r{Otherwise, add the two previous} */
                        /* @r{Fibonacci numbers.} */
    return fib (n - 1) + fib (n - 2);
@}

int                     /* @r{This function returns an @code{int}.} */
main (void)             /* @r{Start here; ignore arguments.} */
@{                       /* @r{Print message with numbers in it.} */
  printf ("Fibonacci series item %d is %d\n",
          20, fib (20));
  return 0;             /* @r{Terminate program, report success.} */
@}
|
||
@end example
|
||
|
||
@node Compile Example
|
||
@section Compiling the Example Program
|
||
@cindex compiling
|
||
@cindex executable file
|
||
|
||
Running a C program requires converting the source code into an
@dfn{executable file}. This conversion is called @dfn{compiling} the
program, and the command to do that using GNU C is @command{gcc}.
|
||
|
||
This example program consists of a single source file. If we
|
||
call that file @file{fib1.c}, the complete command to compile it is
|
||
this:
|
||
|
||
@example
|
||
gcc -g -O -o fib1 fib1.c
|
||
@end example
|
||
|
||
@noindent
|
||
Here, @option{-g} says to generate debugging information, @option{-O}
|
||
says to optimize at the basic level, and @option{-o fib1} says to put
|
||
the executable program in the file @file{fib1}.
|
||
|
||
To run the program, use its file name as a shell command.
|
||
For instance,
|
||
|
||
@example
|
||
./fib1
|
||
@end example
|
||
|
||
@noindent
|
||
However, unless you are sure the program is correct, you should
|
||
expect to need to debug it. So use this command,
|
||
|
||
@example
|
||
gdb fib1
|
||
@end example
|
||
|
||
@noindent
|
||
which starts the GDB debugger (@pxref{Sample Session, Sample Session,
|
||
A Sample GDB Session, gdb, Debugging with GDB}) so you can run and
|
||
debug the executable program @code{fib1}.
|
||
|
||
Richard Stallman's advice, from personal experience, is to turn to the
|
||
debugger as soon as you can reproduce the problem. Don't try to avoid
|
||
it by using other methods instead---occasionally they are shortcuts,
|
||
but usually they waste an unbounded amount of time. With the
|
||
debugger, you will surely find the bug in a reasonable time; overall,
|
||
you will get your work done faster. The sooner you get serious and
|
||
start the debugger, the sooner you are likely to find the bug.
|
||
|
||
@xref{Compilation}, for an introduction to compiling more complex
|
||
programs which consist of more than one source file.
|
||
|
||
@node Storage
|
||
@chapter Storage and Data
|
||
@cindex bytes
|
||
@cindex storage organization
|
||
@cindex memory organization
|
||
|
||
Storage in C programs is made up of units called @dfn{bytes}. A byte
|
||
is the smallest unit of storage that can be used in a first-class
|
||
manner.
|
||
|
||
On nearly all computers, a byte consists of 8 bits. There are a few
|
||
peculiar computers (mostly ``embedded controllers'' for very small
|
||
systems) where a byte is longer than that, but this manual does not
|
||
try to explain the peculiarity of those computers; we assume that a
|
||
byte is 8 bits.
|
||
|
||
Every C data type is made up of a certain number of bytes; that number
|
||
is the data type's @dfn{size}. @xref{Type Size}, for details. The
|
||
types @code{signed char} and @code{unsigned char} are one byte long;
|
||
use those types to operate on data byte by byte. @xref{Signed and
|
||
Unsigned Types}. You can refer to a series of consecutive bytes as an
|
||
array of @code{char} elements; that's what a character string looks
|
||
like in memory. @xref{String Constants}.
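
As a small sketch of these ideas, the following function reports how
many bits type @code{int} occupies, by multiplying its size in bytes
by the number of bits in a byte. The function name
@code{int_size_in_bits} is hypothetical; @code{CHAR_BIT} is the
standard macro in @file{limits.h} that gives the number of bits in a
byte.

@example
#include <limits.h>

/* @r{Return the number of bits in an @code{int}.} */
int
int_size_in_bits (void)
@{
  /* @r{@code{sizeof (int)} is the size of @code{int} in bytes.} */
  return sizeof (int) * CHAR_BIT;
@}
@end example

Where a byte is 8 bits and @code{int} is four bytes long, this
function returns 32.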
|
||
|
||
@node Beyond Integers
|
||
@chapter Beyond Integers
|
||
|
||
So far we've presented programs that operate on integers. In this
|
||
chapter we'll present examples of handling non-integral numbers and
|
||
arrays of numbers.
|
||
|
||
@menu
|
||
* Float Example:: A function that uses floating-point numbers.
|
||
* Array Example:: A function that works with arrays.
|
||
* Array Example Call:: How to call that function.
|
||
* Array Example Variations:: Different ways to write the call example.
|
||
@end menu
|
||
|
||
@node Float Example
|
||
@section An Example with Non-Integer Numbers
|
||
@cindex floating point example
|
||
|
||
Here's a function that operates on and returns @dfn{floating point}
|
||
numbers that don't have to be integers. Floating point represents a
|
||
number as a fraction together with a power of 2. (For more detail,
|
||
@pxref{Floating-Point Data Types}.) This example calculates the
|
||
average of three floating point numbers that are passed to it as
|
||
arguments:
|
||
|
||
@example
|
||
double
average_of_three (double a, double b, double c)
@{
  return (a + b + c) / 3;
@}
|
||
@end example
|
||
|
||
The values of the parameters @code{a}, @code{b} and @code{c} do not
have to be integers, and even when they happen to be integers, most
likely their average is not an integer.
|
||
|
||
@code{double} is the usual data type in C for calculations on
|
||
floating-point numbers.
|
||
|
||
To print a @code{double} with @code{printf}, we must use @samp{%f}
|
||
instead of @samp{%d}:
|
||
|
||
@example
|
||
printf ("Average is %f\n",
        average_of_three (1.1, 9.8, 3.62));
|
||
@end example
|
||
|
||
The code that calls @code{printf} must pass a @code{double} for
|
||
printing with @samp{%f} and an @code{int} for printing with @samp{%d}.
|
||
If the argument has the wrong type, @code{printf} will produce meaningless
|
||
output.
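
For instance, to print an integer value with @samp{%f}, convert it to
@code{double} first. Here is a minimal sketch; the function name
@code{print_count} is hypothetical, and the conversion is written as
a cast, @code{(double) count}.

@example
#include <stdio.h>

/* @r{Print @code{count} in floating-point notation.}
   @r{Return the number of characters printed.} */
int
print_count (int count)
@{
  /* @r{The cast makes the argument a @code{double}, as @samp{%f} requires.} */
  return printf ("Count is %f\n", (double) count);
@}
@end example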
|
||
|
||
Here's a complete program that computes the average of three
|
||
specific numbers and prints the result:
|
||
|
||
@example
|
||
#include <stdio.h>

double
average_of_three (double a, double b, double c)
@{
  return (a + b + c) / 3;
@}

int
main (void)
@{
  printf ("Average is %f\n",
          average_of_three (1.1, 9.8, 3.62));
  return 0;
@}
|
||
@end example
|
||
|
||
From now on we will not present definitions of @code{main} in the
examples. Instead we encourage you to write them for yourself when
you want to test executing some code.
|
||
|
||
@node Array Example
|
||
@section An Example with Arrays
|
||
@cindex array example
|
||
|
||
A function to take the average of three numbers is very specific and
|
||
limited. A more general function would take the average of any number
|
||
of numbers. That requires passing the numbers in an array. An array
|
||
is an object in memory that contains a series of values of the same
|
||
data type. This chapter presents the basic concepts and use of arrays
|
||
through an example; for the full explanation, see @ref{Arrays}.
|
||
|
||
Here's a function definition to take the average of several
|
||
floating-point numbers, passed as type @code{double}. The first
|
||
parameter, @code{length}, specifies how many numbers are passed. The
|
||
second parameter, @code{input_data}, is an array that holds those
|
||
numbers.
|
||
|
||
@example
|
||
double
avg_of_double (int length, double input_data[])
@{
  double sum = 0;
  int i;

  for (i = 0; i < length; i++)
    sum = sum + input_data[i];

  return sum / length;
@}
|
||
@end example
|
||
|
||
This introduces the expression to refer to an element of an array:
|
||
@code{input_data[i]} means the element at index @code{i} in
|
||
@code{input_data}. The index of the element can be any expression
|
||
with an integer value; in this case, the expression is @code{i}.
|
||
@xref{Accessing Array Elements}.
|
||
|
||
@cindex zero-origin indexing
|
||
The lowest valid index in an array is 0, @emph{not} 1, and the highest
|
||
valid index is one less than the number of elements. (This is known
|
||
as @dfn{zero-origin indexing}.)
|
||
|
||
This example also introduces the way to declare that a function
|
||
parameter is an array. Such declarations are modeled after the syntax
|
||
for an element of the array. Just as @code{double foo} declares that
|
||
@code{foo} is of type @code{double}, @code{double input_data[]}
|
||
declares that each element of @code{input_data} is of type
|
||
@code{double}. Therefore, @code{input_data} itself has type ``array
|
||
of @code{double}.''
|
||
|
||
When declaring an array parameter, it's not necessary to say how long
|
||
the array is. In this case, the parameter @code{input_data} has no
|
||
length information. That's why the function needs another parameter,
|
||
@code{length}, for the caller to provide that information to the
|
||
function @code{avg_of_double}.
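
The same pattern of passing a length along with an array works for
other functions too. As a sketch, here is a function that finds the
largest element; the name @code{max_of_double} is hypothetical, and
the function assumes @code{length} is at least 1.

@example
/* @r{Return the largest of the first @code{length} elements}
   @r{of @code{input_data}.  @code{length} must be at least 1.} */
double
max_of_double (int length, double input_data[])
@{
  double max = input_data[0];
  int i;

  /* @r{Element 0 is already in @code{max}, so start at index 1.} */
  for (i = 1; i < length; i++)
    if (input_data[i] > max)
      max = input_data[i];

  return max;
@}
@end example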
|
||
|
||
@node Array Example Call
|
||
@section Calling the Array Example
|
||
|
||
Calling the function @code{avg_of_double} requires making an
array and then passing it as an argument. Here is an example.
|
||
|
||
@example
|
||
@{
  /* @r{The array of values to compute the average of.} */
  double nums_to_average[5];
  /* @r{The average, once we compute it.} */
  double average;

  /* @r{Fill in elements of @code{nums_to_average}.} */

  nums_to_average[0] = 58.7;
  nums_to_average[1] = 5.1;
  nums_to_average[2] = 7.7;
  nums_to_average[3] = 105.2;
  nums_to_average[4] = -3.14159;

  average = avg_of_double (5, nums_to_average);

  /* @r{@dots{}now make use of @code{average}@dots{}} */
@}
|
||
@end example
|
||
|
||
This shows an array subscripting expression again, this time
|
||
on the left side of an assignment, storing a value into an
|
||
element of an array.
|
||
|
||
It also shows how to declare a local variable that is an array:
|
||
@code{double nums_to_average[5];}. Since this declaration allocates the
|
||
space for the array, it needs to know the array's length. You can
|
||
specify the length with any expression whose value is an integer, but
|
||
in this declaration the length is a constant, the integer 5.
|
||
|
||
The name of the array, when used by itself as an expression, stands
|
||
for the address of the array's data, and that's what gets passed to
|
||
the function @code{avg_of_double} in @code{avg_of_double (5,
|
||
nums_to_average)}.
|
||
|
||
We can make the code easier to maintain by avoiding the need to write
|
||
5, the array length, when calling @code{avg_of_double}. That way, if
|
||
we change the array to include more elements, we won't have to change
|
||
that call. One way to do this is with the @code{sizeof} operator:
|
||
|
||
@example
|
||
average = avg_of_double ((sizeof (nums_to_average)
                          / sizeof (nums_to_average[0])),
                         nums_to_average);
|
||
@end example
|
||
|
||
This computes the number of elements in @code{nums_to_average} by dividing
|
||
its total size by the size of one element. @xref{Type Size}, for more
|
||
details of using @code{sizeof}.
|
||
|
||
We don't show in this example what happens after storing the result of
|
||
@code{avg_of_double} in the variable @code{average}. Presumably
|
||
more code would follow that uses that result somehow. (Why compute
|
||
the average and not use it?) But that isn't part of this topic.
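
The @code{sizeof} computation shown above works only where the array
itself is declared. Inside a function such as @code{avg_of_double},
the parameter @code{input_data} carries no length information, so
@code{sizeof} cannot recover the number of elements there. The
following sketch wraps the computation in a macro; the names
@code{ARRAY_LENGTH} and @code{count_of_five} are hypothetical.

@example
/* @r{Number of elements in @code{array}, which must be an actual}
   @r{array variable, not an array parameter.} */
#define ARRAY_LENGTH(array) (sizeof (array) / sizeof ((array)[0]))

int
count_of_five (void)
@{
  double nums_to_average[5];

  return ARRAY_LENGTH (nums_to_average);
@}
@end example

Here @code{count_of_five} returns 5, the length given in the
declaration.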
|
||
|
||
@node Array Example Variations
|
||
@section Variations for Array Example
|
||
|
||
The code to call @code{avg_of_double} has two declarations that
|
||
start with the same data type:
|
||
|
||
@example
|
||
/* @r{The array of values to average.} */
|
||
double nums_to_average[5];
|
||
/* @r{The average, once we compute it.} */
|
||
double average;
|
||
@end example
|
||
|
||
In C, you can combine the two, like this:
|
||
|
||
@example
|
||
double nums_to_average[5], average;
|
||
@end example
|
||
|
||
This declares @code{nums_to_average} as an array whose elements are
each a @code{double}, and @code{average} as a single @code{double}.
|
||
|
||
However, while you @emph{can} combine them, that doesn't mean you
|
||
@emph{should}. If it is useful to write comments about the variables,
|
||
and usually it is, then it's clearer to keep the declarations separate
|
||
so you can put a comment on each one. That also helps with using
|
||
textual tools to find occurrences of a variable in source files.
|
||
|
||
We set all of the elements of the array @code{nums_to_average} with
|
||
assignments, but it is more convenient to use an initializer in the
|
||
declaration:
|
||
|
||
@example
|
||
@{
  /* @r{The array of values to average.} */
  double nums_to_average[]
    = @{ 58.7, 5.1, 7.7, 105.2, -3.14159 @};

  /* @r{The average, once we compute it.} */
  double average = avg_of_double ((sizeof (nums_to_average)
                                   / sizeof (nums_to_average[0])),
                                  nums_to_average);

  /* @r{@dots{}now make use of @code{average}@dots{}} */
@}
|
||
@end example
|
||
|
||
The array initializer is a comma-separated list of values, delimited
|
||
by braces. @xref{Initializers}.
|
||
|
||
Note that the declaration does not specify a size for
|
||
@code{nums_to_average}, so the size is determined from the
|
||
initializer. There are five values in the initializer, so
|
||
@code{nums_to_average} gets length 5. If we add another element to
|
||
the initializer, @code{nums_to_average} will have six elements.
|
||
|
||
Because the code computes the number of elements from the size of
|
||
the array, using @code{sizeof}, the program will operate on all the
|
||
elements in the initializer, regardless of how many those are.
|
||
|
||
@node Lexical Syntax
|
||
@chapter Lexical Syntax
|
||
@cindex lexical syntax
|
||
@cindex token
|
||
|
||
To start the full description of the C language, we explain the
|
||
lexical syntax and lexical units of C code. The lexical units of a
|
||
programming language are known as @dfn{tokens}. This chapter covers
|
||
all the tokens of C except for constants, which are covered in a later
|
||
chapter (@pxref{Constants}). One vital kind of token is the
|
||
@dfn{identifier} (@pxref{Identifiers}), which is used for names of any
|
||
kind.
|
||
|
||
@menu
|
||
* English:: Write programs in English!
|
||
* Characters:: The characters allowed in C programs.
|
||
* Whitespace:: The particulars of whitespace characters.
|
||
* Comments:: How to include comments in C code.
|
||
* Identifiers:: How to form identifiers (names).
|
||
* Operators/Punctuation:: Characters used as operators or punctuation.
|
||
* Line Continuation:: Splitting one line into multiple lines.
|
||
@end menu
|
||
|
||
@node English
|
||
@section Write Programs in English!
|
||
|
||
In principle, you can write the function and variable names in a
|
||
program, and the comments, in any human language. C allows any kind
of Unicode character in comments, and you can put them into
|
||
identifiers with a special prefix (@pxref{Unicode Character Codes}).
|
||
However, to enable programmers in all countries to understand and
|
||
develop the program, it is best under today's circumstances to write
|
||
all identifiers and comments in English.
|
||
|
||
English is the common language of programmers; in all countries,
|
||
programmers generally learn English. If names and comments in a
|
||
program are written in English, most programmers in Bangladesh,
|
||
Belgium, Bolivia, Brazil, Bulgaria and Burundi can understand them.
|
||
In all those countries, most programmers can speak English, or at least
|
||
read it, but they do not read each other's languages at all. In
|
||
India, with so many languages, two programmers may have no common
|
||
language other than English.
|
||
|
||
If you don't feel confident in writing English, do the best you can,
|
||
and follow each English comment with a version in a language you
|
||
write better; add a note asking others to translate that to English.
|
||
Someone will eventually do that.
|
||
|
||
The program's user interface is a different matter. We don't need to
|
||
choose one language for that; it is easy to support multiple languages
|
||
and let each user choose the language for display. This requires writing
|
||
the program to support localization of its interface. (The
|
||
@code{gettext} package exists to support this; @pxref{Message
|
||
Translation, The GNU C Library, , libc, The GNU C Library Reference
|
||
Manual}.) Then a community-based translation effort can provide
|
||
support for all the languages users want to use.
|
||
|
||
@node Characters
@section Characters
@cindex character set
@cindex Unicode

@c ??? How to express ¶?

GNU C source files are usually written in the
@url{https://en.wikipedia.org/wiki/ASCII,,ASCII} character set, which
was defined in the 1960s for English. However, they can also include
Unicode characters represented in the
@url{https://en.wikipedia.org/wiki/UTF-8,,UTF-8} multibyte encoding.
This makes it possible to represent accented letters such as @samp{á},
as well as other scripts such as Arabic, Chinese, Cyrillic, Hebrew,
Japanese, and Korean.@footnote{On some obscure systems, GNU C uses
UTF-EBCDIC instead of UTF-8, but that is not worth describing in this
manual.}

In C source code, non-ASCII characters are valid in comments, in wide
character constants (@pxref{Wide Character Constants}), and in string
constants (@pxref{String Constants}).

@c ??? valid in identifiers?
Another way to specify non-ASCII characters in constants (character or
string) and identifiers is with an escape sequence starting with
backslash, specifying the intended Unicode character. (@xref{Unicode
Character Codes}.) This specifies non-ASCII characters without
putting a real non-ASCII character in the source file itself.
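
As a minimal sketch of that idea, the following writes the same
one-character string twice, once with a Unicode escape and once as
explicit bytes. It assumes that the compiler encodes strings as UTF-8
(the usual case with GCC); the names @code{escaped}, @code{raw_bytes}
and @code{byte_length} are made up for this illustration.

```c
#include <string.h>

/* The same one-character string written two ways.  Neither line
   contains a non-ASCII character in the source file. */
const char *escaped = "\u00e1";      /* Unicode escape for a-acute. */
const char *raw_bytes = "\xc3\xa1";  /* Its two UTF-8 bytes. */

/* Return the number of bytes (not characters) in S, not counting
   the terminating null. */
size_t
byte_length (const char *s)
{
  return strlen (s);
}
```

Under that assumption the two strings compare equal with
@code{strcmp}, and @code{byte_length (escaped)} is 2, since UTF-8
uses two bytes for this character.
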

C accepts two-character aliases called @dfn{digraphs} for certain
characters. @xref{Digraphs}.

@node Whitespace
@section Whitespace
@cindex whitespace characters in source files
@cindex space character in source
@cindex tab character in source
@cindex formfeed in source
@cindex linefeed in source
@cindex newline in source
@cindex carriage return in source
@cindex vertical tab in source

Whitespace means characters that exist in a file but appear blank in a
printed listing of a file (or traditionally did appear blank, several
decades ago). The C language requires whitespace in order to separate
two consecutive identifiers, or to separate an identifier from a
numeric constant. Other than that, and a few special situations
described later, whitespace is optional; you can put it in when you
wish, to make the code easier to read.

Space and tab in C code are treated as whitespace characters. So are
line breaks. You can represent a line break with the newline
character (also called @dfn{linefeed} or LF), CR (carriage return), or
the CRLF sequence (two characters: carriage return followed by a
newline character).

The @dfn{formfeed} character, Control-L, was traditionally used to
divide a file into pages. It is still used this way in source code,
and the tools that generate nice printouts of source code still start
a new page after each ``formfeed'' character. Dividing code into
pages separated by formfeed characters is a good way to break it up
into comprehensible pieces and show other programmers where they start
and end.

The @dfn{vertical tab} character, Control-K, was traditionally used to
make printing advance down to the next section of a page. We know of
no particular reason to use it in source code, but it is still
accepted as whitespace in C.

Comments are also syntactically equivalent to whitespace.
@ifinfo
@xref{Comments}.
@end ifinfo

@node Comments
@section Comments
@cindex comments

A comment encapsulates text that has no effect on the program's
execution or meaning.

The purpose of comments is to explain the code to people who read it.
Writing good comments for your code is tremendously important---they
should provide background information that helps programmers
understand the reasons why the code is written the way it is. You,
returning to the code six months from now, will need the help of these
comments to remember why you wrote it this way.

Outdated comments that become incorrect are counterproductive, so part
of the software developer's responsibility is to update comments as
needed to correspond with changes to the program code.

C allows two kinds of comment syntax, the traditional style and the
C@t{++} style. A traditional C comment starts with @samp{/*} and ends
with @samp{*/}. For instance,

@example
/* @r{This is a comment in traditional C syntax.} */
@end example

A traditional comment can contain @samp{/*}, but these delimiters do
not nest as pairs. The first @samp{*/} ends the comment regardless of
whether it contains @samp{/*} sequences.

@example
/* @r{This} /* @r{is a comment} */ But this is not! */
@end example

A @dfn{line comment} starts with @samp{//} and ends at the end of the line.
For instance,

@example
// @r{This is a comment in C@t{++} style.}
@end example

Line comments do nest, in effect, because @samp{//} inside a line
comment is part of that comment:

@example
// @r{this whole line is} // @r{one comment}
This is code, not comment.
@end example

It is safe to put line comments inside traditional comments, or vice
versa.

@example
@group
/* @r{traditional comment}
   // @r{contains line comment}
   @r{more traditional comment}
 */ text here is not a comment

// @r{line comment} /* @r{contains traditional comment} */
@end group
@end example

But beware of commenting out one end of a traditional comment with a line
comment. The delimiter @samp{/*} doesn't start a comment if it occurs
inside an already-started comment.

@example
@group
// @r{line comment} /* @r{That would ordinarily begin a traditional comment.}
Oops! The line comment has ended;
this isn't a comment any more. */
@end group
@end example

Comments are not recognized within string constants. @t{@w{"/* blah
*/"}} is the string constant @samp{@w{/* blah */}}, not an empty
string.
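
Here is a small sketch that makes this visible by measuring the
string. The names @code{not_a_comment} and
@code{not_a_comment_length} are hypothetical, chosen only for this
example.

```c
#include <string.h>

/* The comment delimiters inside the quotes are ordinary characters
   of the string constant. */
const char *not_a_comment = "/* blah */";

/* Return the length of the string above. */
size_t
not_a_comment_length (void)
{
  return strlen (not_a_comment);
}
```

The function returns 10, the number of characters between the
double-quotes, which shows that nothing was discarded as a comment.
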

In this manual we show the text in comments in a variable-width font,
for readability, but this font distinction does not exist in source
files.

A comment is syntactically equivalent to whitespace, so it always
separates tokens. Thus,

@example
@group
int/* @r{comment} */foo;
  @r{is equivalent to}
int foo;
@end group
@end example

@noindent
but clean code always uses real whitespace to separate the comment
visually from surrounding code.

@node Identifiers
@section Identifiers
@cindex identifiers

An @dfn{identifier} (name) in C is a sequence of letters and digits,
as well as @samp{_}, that does not start with a digit. Most C compilers
also allow @samp{$}; GNU C allows it. An identifier can be as long as
you like; for example,

@example
int anti_dis_establishment_arian_ism;
@end example

@cindex case of letters in identifiers
Letters in identifiers are case-sensitive in C; thus, @code{a}
and @code{A} are two different identifiers.

@cindex keyword
@cindex reserved words
Identifiers in C are used as variable names, function names, typedef
names, enumeration constants, type tags, field names, and labels.
Certain identifiers in C are @dfn{keywords}, which means they have
specific syntactic meanings. Keywords in C are @dfn{reserved words},
meaning you cannot use them in any other way. For instance, you can't
define a variable or function named @code{return} or @code{if}.

You can also include other characters, even non-ASCII characters, in
identifiers by writing their Unicode character codes, which start with
@samp{\u} or @samp{\U}, in the identifier name. @xref{Unicode
Character Codes}. However, it is usually a bad idea to use non-ASCII
characters in identifiers, and when the names are written in English,
they never need non-ASCII characters. @xref{English}.

As stated above, whitespace is required to separate two consecutive
identifiers, or to separate an identifier from a preceding or
following numeric constant.

@node Operators/Punctuation
@section Operators and Punctuation
@cindex operators
@cindex punctuation

Here we describe the lexical syntax of operators and punctuation in C.
The specific operators of C and their meanings are presented in
subsequent chapters.

Some characters that are generally considered punctuation have a
different sort of meaning in the C language. C uses double-quote
@samp{"} to delimit string constants (@pxref{String Constants}) and
@samp{'} to delimit character constants (@pxref{Character Constants}).
The characters @samp{$} and @samp{_} can be part of an identifier or a
keyword.

Most operators in C consist of one or two characters that can't be
used in identifiers. The characters used for such operators in C are
@samp{!~^&|*/%+-=<>,.?:}. (C preprocessing uses @dfn{preprocessing
operators}, based on @samp{#}, which are entirely different from
these operators; @ref{Preprocessing}.)

Some operators are a single character. For instance, @samp{-} is the
operator for negation (with one operand) and the operator for
subtraction (with two operands).

Some operators are two characters. For example, @samp{++} is the
increment operator. Recognition of multicharacter operators works by
reading and grouping as many successive characters as can
constitute one operator, and making them one token.

For instance, the character sequence @samp{++} is always interpreted
as the increment operator; therefore, if we want to write two
consecutive instances of the operator @samp{+}, we must separate them
with a space so that they do not combine as one token. Applying the
same rule, @code{a+++++b} is always tokenized as @code{@w{a++ ++ +
b}}, not as @code{@w{a++ + ++b}}, even though the latter could be part
of a valid C program and the former could not (since @code{a++}
is not an lvalue and thus can't be the operand of @code{++}).
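
The same grouping rule can be seen in a shorter expression that does
compile. In this sketch, @code{a+++b} is read as @code{(a++) + b},
never as @code{a + (++b)}. The function names are invented for the
illustration.

```c
/* Return a+++b, which is tokenized as (a++) + b.  The increment
   applies to the local copy of A after its old value is used. */
int
add_with_munch (int a, int b)
{
  return a+++b;
}

/* With explicit spacing, B is incremented before the addition. */
int
add_preincrement (int a, int b)
{
  return a + ++b;
}
```

Called with 1 and 2, the first function returns 3 and the second
returns 4. As with the other examples here, real code should use
spaces to make the intended grouping obvious.
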

A few C operators are keywords rather than special characters. They
include @code{sizeof} (@pxref{Type Size}) and @code{_Alignof}
(@pxref{Type Alignment}).

The characters @samp{;@{@}[]()} are used for punctuation and grouping.
Semicolon (@samp{;}) ends a statement. Braces (@samp{@{} and
@samp{@}}) begin and end a block at the statement level
(@pxref{Blocks}), and surround the initializer (@pxref{Initializers})
for a variable with multiple elements or fields (such as arrays or
structures).

Square brackets (@samp{[} and @samp{]}) do array indexing, as in
@code{array[5]}.

Parentheses are used in expressions for explicit nesting of
expressions (@pxref{Basic Arithmetic}), around the parameter
declarations in a function declaration or definition, and around the
arguments in a function call, as in @code{printf ("Foo %d\n", i)}
(@pxref{Function Calls}). Several kinds of statements also use
parentheses as part of their syntax---for instance, @code{if}
statements, @code{for} statements, @code{while} statements, and
@code{switch} statements. @xref{if Statement}, and following
sections.

Parentheses are also required around the operand of the operator
keywords @code{sizeof} and @code{_Alignof} when the operand is a data
type rather than a value. @xref{Type Size}.
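
A short sketch of the two forms of @code{sizeof} follows. The
function names are hypothetical; only the placement of the
parentheses matters.

```c
#include <stddef.h>

/* The operand is a type, so the parentheses are required. */
size_t
size_of_type (void)
{
  return sizeof (int);
}

/* The operand is a value, so the parentheses are optional. */
size_t
size_of_value (void)
{
  int x = 0;
  return sizeof x;
}
```

Both functions return the same number, since @code{x} has type
@code{int}.
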

@node Line Continuation
@section Line Continuation
@cindex line continuation
@cindex continuation of lines

The sequence of a backslash and a newline is ignored absolutely
anywhere in a C program. This makes it possible to split a single
source line into multiple lines in the source file. GNU C tolerates
and ignores other whitespace between the backslash and the newline.
In particular, it always ignores a CR (carriage return) character
there, in case some text editor decided to end the line with the CRLF
sequence.

The main use of line continuation in C is for macro definitions that
would be inconveniently long for a single line (@pxref{Macros}).
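
For illustration, here is a sketch of such a macro. The macro
@code{CLAMP} and the function @code{clamp_percent} are made-up names,
not part of any library; the point is only that the three source
lines of the definition form a single logical line.

```c
/* A hypothetical macro, split over three source lines with
   backslash-newline.  The preprocessor sees one line. */
#define CLAMP(value, low, high)      \
  ((value) < (low) ? (low)           \
   : (value) > (high) ? (high) : (value))

/* Limit N to the range 0 through 100. */
int
clamp_percent (int n)
{
  return CLAMP (n, 0, 100);
}
```

Each backslash must be the last character on its line, apart from
the whitespace that GNU C tolerates there.
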

It is possible to continue a line comment onto another line with
backslash-newline. You can put backslash-newline in the middle of an
identifier, even a keyword, or an operator. You can even split
@samp{/*}, @samp{*/}, and @samp{//} onto multiple lines with
backslash-newline. Here's an ugly example:

@example
@group
/\
*
*/ fo\
o +\
= 1\
0;
@end group
@end example

@noindent
That's equivalent to @samp{/* */ foo += 10;}.

Don't do those things in real programs, since they make code hard to
read.

@strong{Note:} For the sake of using certain tools on the source code, it is
wise to end every source file with a newline character which is not
preceded by a backslash, so that it really ends the last line.

@node Arithmetic
@chapter Arithmetic
@cindex arithmetic operators
@cindex operators, arithmetic

@c ??? Duplication with other sections -- get rid of that?

Arithmetic operators in C attempt to be as similar as possible to the
abstract arithmetic operations, but it is impossible to do this
perfectly. Numbers in a computer have a finite range of possible
values, and non-integer values have a limit on their possible
accuracy. Nonetheless, except when results are out of range, you will
encounter no surprises in using @samp{+} for addition, @samp{-} for
subtraction, and @samp{*} for multiplication.

Each C operator has a @dfn{precedence}, which is its rank in the
grammatical order of the various operators. The operators with the
highest precedence grab adjoining operands first; these expressions
then become operands for operators of lower precedence. We give some
information about precedence of operators in this chapter where we
describe the operators; for the full explanation, see @ref{Binary
Operator Grammar}.

The arithmetic operators always @dfn{promote} their operands before
operating on them. This means converting narrow integer data types to
a wider data type (@pxref{Operand Promotions}). If you are just
learning C, don't worry about this yet.

Given two operands that have different types, most arithmetic
operations convert them both to their @dfn{common type}. For
instance, if one is @code{int} and the other is @code{double}, the
common type is @code{double}. (That's because @code{double} can
represent all the values that an @code{int} can hold, but not vice
versa.) For the full details, see @ref{Common Type}.
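
The effect of the common type is easy to see with division. In this
sketch (the function names are invented for the example), the only
difference between the two functions is the type of the constant.

```c
/* Both operands are int, so this is integer division. */
int
half_as_int (int n)
{
  return n / 2;
}

/* One operand is double, so N is converted to double first and
   the division is done in floating point. */
double
half_as_double (int n)
{
  return n / 2.0;
}
```

For an argument of 7, the first function returns 3 and the second
returns 3.5.
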

@menu
* Basic Arithmetic::            Addition, subtraction, multiplication,
                                  and division.
* Integer Arithmetic::          How C performs arithmetic with integer values.
* Integer Overflow::            When an integer value exceeds the range
                                  of its type.
* Mixed Mode::                  Calculating with both integer values
                                  and floating-point values.
* Division and Remainder::      How integer division works.
* Numeric Comparisons::         Comparing numeric values for equality or order.
* Shift Operations::            Shift integer bits left or right.
* Bitwise Operations::          Bitwise conjunction, disjunction, negation.
@end menu

@node Basic Arithmetic
@section Basic Arithmetic
@cindex addition operator
@cindex subtraction operator
@cindex multiplication operator
@cindex division operator
@cindex negation operator
@cindex operator, addition
@cindex operator, subtraction
@cindex operator, multiplication
@cindex operator, division
@cindex operator, negation

Basic arithmetic in C is done with the usual binary operators of
algebra: addition (@samp{+}), subtraction (@samp{-}), multiplication
(@samp{*}) and division (@samp{/}). The unary operator @samp{-} is
used to change the sign of a number. The unary @code{+} operator also
exists; it yields its operand unaltered.

@samp{/} is the division operator, but dividing integers may not give
the result you expect. Its value is an integer, which is not equal to
the mathematical quotient when that is a fraction. Use @samp{%} to
get the corresponding integer remainder when necessary.
@xref{Division and Remainder}. Floating-point division yields a value
as close as possible to the mathematical quotient.

These operators use algebraic syntax with the usual algebraic
precedence rule (@pxref{Binary Operator Grammar}) that multiplication
and division are done before addition and subtraction, but you can use
parentheses to explicitly specify how the operators nest. They are
left-associative (@pxref{Associativity and Ordering}). Thus,

@example
-a + b - c + d * e / f
@end example

@noindent
is equivalent to

@example
(((-a) + b) - c) + ((d * e) / f)
@end example

@node Integer Arithmetic
@section Integer Arithmetic
@cindex integer arithmetic

Each of the basic arithmetic operations in C has two variants for
integers: @dfn{signed} and @dfn{unsigned}. The choice is determined
by the data types of their operands.

Each integer data type in C is either @dfn{signed} or @dfn{unsigned}.
A signed type can hold a range of positive and negative numbers, with
zero near the middle of the range. An unsigned type can hold only
nonnegative numbers; its range starts with zero and runs upward.

The most basic integer types are @code{int}, which normally can hold
numbers from @minus{}2,147,483,648 to 2,147,483,647, and @code{unsigned
int}, which normally can hold numbers from 0 to 4,294,967,295. (This
assumes @code{int} is 32 bits wide, always true for GNU C on real
computers but not always on embedded controllers.) @xref{Integer
Types}, for full information about integer types.

When a basic arithmetic operation is given two signed operands, it
does signed arithmetic. Given two unsigned operands, it does
unsigned arithmetic.

If one operand is @code{unsigned int} and the other is @code{int}, the
operator treats them both as unsigned. More generally, the common
type of the operands determines whether the operation is signed or
not. @xref{Common Type}.
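
The following sketch contrasts the two cases using division, where
the difference shows up in the value. It assumes a 32-bit
@code{int}, as discussed above; the function names are hypothetical.

```c
/* Divide an int by an unsigned int.  The common type is
   unsigned int, so A is converted to unsigned before dividing. */
unsigned int
mixed_divide (int a, unsigned int b)
{
  return a / b;
}

/* Divide two ints; this is signed division. */
int
signed_divide (int a, int b)
{
  return a / b;
}
```

Here @code{signed_divide (-2, 2)} returns @minus{}1, while
@code{mixed_divide (-2, 2u)} returns 2,147,483,647, because
@minus{}2 converted to @code{unsigned int} is 4,294,967,294.
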

Printing the results of unsigned arithmetic with @code{printf} using
@samp{%d} can produce surprising results for values far away from
zero. Even though the rules above say that the computation was done
with unsigned arithmetic, the printed result may appear to be signed!

The explanation is that the bit pattern resulting from addition,
subtraction or multiplication is actually the same for signed and
unsigned operations. The difference is only in the data type of the
result, which affects the @emph{interpretation} of the result bit pattern,
and whether the arithmetic operation can overflow (see the next section).

But @samp{%d} doesn't know its argument's data type. It sees only the
value's bit pattern, and it is defined to interpret that as
@code{signed int}. To print it as unsigned requires using @samp{%u}
instead of @samp{%d}. @xref{Formatted Output, The GNU C Library, ,
libc, The GNU C Library Reference Manual}.
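
As a sketch of the correct conversion, this hypothetical helper
formats an @code{unsigned int} with @samp{%u} into a buffer, using
the standard function @code{snprintf}. The expected text below
assumes a 32-bit @code{unsigned int}.

```c
#include <stdio.h>
#include <string.h>

/* Format VALUE with %u into BUF, which holds SIZE bytes, and
   return the length of the resulting text. */
size_t
format_unsigned (char *buf, size_t size, unsigned int value)
{
  snprintf (buf, size, "%u", value);
  return strlen (buf);
}
```

Given the value @code{0u - 1u} and a large enough buffer, this
stores the text @samp{4294967295}. With @samp{%d}, the same bit
pattern would typically be shown as @samp{-1}.
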

Arithmetic in C never operates directly on narrow integer types (those
with fewer bits than @code{int}; @ref{Narrow Integers}). Instead it
``promotes'' them to @code{int}. @xref{Operand Promotions}.

@node Integer Overflow
@section Integer Overflow
@cindex integer overflow
@cindex overflow, integer

When the mathematical value of an arithmetic operation doesn't fit in
the range of the data type in use, that's called @dfn{overflow}.
When it happens in integer arithmetic, it is @dfn{integer overflow}.

Integer overflow happens only in arithmetic operations. Type conversion
operations, by definition, do not cause overflow, not even when the
result can't fit in its new type. @xref{Integer Conversion}.
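
For instance, converting to an unsigned type simply keeps the low
bits of the value. This sketch assumes 8-bit bytes; the name
@code{low_byte} is made up for the illustration.

```c
/* Convert N to unsigned char.  The conversion keeps the low bits
   of N; it is not an overflow. */
unsigned char
low_byte (int n)
{
  return (unsigned char) n;
}
```

With 8-bit bytes, @code{low_byte (300)} is 44 (that is, 300
@minus{} 256) and @code{low_byte (-1)} is 255.
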

Signed numbers use two's-complement representation, in which the most
negative number lacks a positive counterpart (@pxref{Integers in
Depth}). Thus, the unary @samp{-} operator on a signed integer can
overflow.

@menu
* Unsigned Overflow::           Overflow in unsigned integer arithmetic.
* Signed Overflow::             Overflow in signed integer arithmetic.
@end menu

@node Unsigned Overflow
@subsection Overflow with Unsigned Integers

Unsigned arithmetic in C ignores overflow; it produces the true result
modulo the @var{n}th power of 2, where @var{n} is the number of bits
in the data type. We say it ``truncates'' the true result to the
lowest @var{n} bits.

A true result that is negative, when taken modulo the @var{n}th power
of 2, yields a positive number. For instance,

@example
unsigned int x = 1;
unsigned int y;

y = -x;
@end example

@noindent
causes overflow because the negative number @minus{}1 can't be stored
in an unsigned type. The actual result, which is @minus{}1 modulo the
@var{n}th power of 2, is one less than the @var{n}th power of 2. That
is the largest value that the unsigned data type can store. For a
32-bit @code{unsigned int}, the value is 4,294,967,295. @xref{Maximum
and Minimum Values}.

Adding that number to itself, as here,

@example
unsigned int z;

z = y + y;
@end example

@noindent
ought to yield 8,589,934,590; however, that is again too large to fit,
so overflow truncates the value to 4,294,967,294. If that were a
signed integer, it would mean @minus{}2, which (not by coincidence)
equals @minus{}1 + @minus{}1.
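
The two computations above can be packaged as functions so that the
results are easy to check. This is only a sketch; the function names
are invented, and the values quoted assume a 32-bit
@code{unsigned int}.

```c
/* Negate X in unsigned arithmetic: the result is the true result
   modulo the Nth power of 2. */
unsigned int
negate_unsigned (unsigned int x)
{
  return -x;
}

/* Add A and B in unsigned arithmetic, truncating to N bits. */
unsigned int
add_unsigned (unsigned int a, unsigned int b)
{
  return a + b;
}
```

With 32 bits, @code{negate_unsigned (1)} returns 4,294,967,295, and
adding that value to itself with @code{add_unsigned} returns
4,294,967,294.
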

@node Signed Overflow
@subsection Overflow with Signed Integers
@cindex compiler options for integer overflow
@cindex integer overflow, compiler options
@cindex overflow, compiler options

For signed integers, the result of overflow in C is @emph{in
principle} undefined, meaning that anything whatsoever could happen.
Therefore, C compilers can do optimizations that treat the overflow
case with total unconcern. (Since the result of overflow is undefined
in principle, one cannot claim that these optimizations are
erroneous.)

@strong{Watch out:} These optimizations can do surprising things. For
instance,

@example
int i;
@r{@dots{}}
if (i < i + 1)
  x = 5;
@end example

@noindent
could be optimized to do the assignment unconditionally, because the
@code{if}-condition is always true if @code{i + 1} does not overflow.

GCC offers compiler options to control handling signed integer
overflow. These options operate per module; that is, each module
behaves according to the options it was compiled with.

These two options specify particular ways to handle signed integer
overflow, other than the default way:

@table @option
@item -fwrapv
Make signed integer operations well-defined, like unsigned integer
operations: they produce the @var{n} low-order bits of the true
result. The highest of those @var{n} bits is the sign bit of the
result. With @option{-fwrapv}, these out-of-range operations are not
considered overflow, so (strictly speaking) integer overflow never
happens.

The option @option{-fwrapv} enables some optimizations based on the
defined values of out-of-range results. In GCC 8, it disables
optimizations that are based on assuming signed integer operations
will not overflow.

@item -ftrapv
Generate a signal @code{SIGFPE} when signed integer overflow occurs.
This terminates the program unless the program handles the signal.
@xref{Signals}.
@end table

One other option is useful for finding where overflow occurs:

@ignore
@item -fno-strict-overflow
Disable optimizations that are based on assuming signed integer
operations will not overflow.
@end ignore

@table @option
@item -fsanitize=signed-integer-overflow
Output a warning message at run time when signed integer overflow
occurs. This checks the @samp{+}, @samp{*}, and @samp{-} operators.
This takes priority over @option{-ftrapv}.
@end table
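
Independently of these options, a program can avoid signed overflow
by testing the operands before it adds them. The following is one
possible sketch of such a test, not the only way to do it; the name
@code{checked_add} is hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Store A + B in *SUM and return true if the sum fits in int;
   otherwise return false and leave *SUM unchanged.  The test is
   done before adding, so no signed overflow ever occurs. */
bool
checked_add (int a, int b, int *sum)
{
  if ((b > 0 && a > INT_MAX - b)
      || (b < 0 && a < INT_MIN - b))
    return false;
  *sum = a + b;
  return true;
}
```

For example, @code{checked_add (INT_MAX, 1, &s)} returns false
without computing the sum. GCC also provides the built-in function
@code{__builtin_add_overflow}, which serves a similar purpose; see
the GCC manual for its exact interface.
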

@node Mixed Mode
@section Mixed-Mode Arithmetic

Mixing integers and floating-point numbers in a basic arithmetic
operation converts the integers automatically to floating point.
In most cases, this gives exactly the desired results.
But sometimes it matters precisely where the conversion occurs.

If @code{i} and @code{j} are integers, @code{(i + j) * 2.0} adds them
as an integer, then converts the sum to floating point for the
multiplication. If the addition causes an overflow, that is not
equivalent to converting each integer to floating point and then
adding the two floating point numbers. You can get the latter result
by explicitly converting the integers, as in @code{((double) i +
(double) j) * 2.0}. @xref{Explicit Type Conversion}.
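
Here is a sketch of both forms as functions, with invented names.
The value quoted afterwards assumes a 32-bit @code{int}.

```c
/* Add as integers, then convert the sum to double.  The addition
   can overflow when I and J are large. */
double
scale_integer_sum (int i, int j)
{
  return (i + j) * 2.0;
}

/* Convert each operand to double first, so the addition is done
   in floating point and cannot overflow the int range. */
double
scale_converted_sum (int i, int j)
{
  return ((double) i + (double) j) * 2.0;
}
```

For small operands the two agree: both return 14.0 for 3 and 4. But
@code{scale_converted_sum (2147483647, 1)} returns 4294967296.0,
whereas the same call to @code{scale_integer_sum} would overflow in
the integer addition.
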

@c Eggert's report
Adding or multiplying several values, including some integers and some
floating point, performs the operations left to right. Thus, @code{3.0 +
i + j} converts @code{i} to floating point, then adds 3.0, then
converts @code{j} to floating point and adds that. You can specify a
different order using parentheses: @code{3.0 + (i + j)} adds @code{i}
and @code{j} first and then adds that sum (converted to floating
point) to 3.0. In this respect, C differs from other languages, such
as Fortran.

@node Division and Remainder
@section Division and Remainder
@cindex remainder operator
@cindex modulus
@cindex operator, remainder

Division of integers in C rounds the result to an integer. The result
is always rounded towards zero.

@example
16 / 3 @result{} 5
-16 / 3 @result{} -5
16 / -3 @result{} -5
-16 / -3 @result{} 5
@end example

@noindent
To get the corresponding remainder, use the @samp{%} operator:

@example
16 % 3 @result{} 1
-16 % 3 @result{} -1
16 % -3 @result{} 1
-16 % -3 @result{} -1
@end example

@noindent
@samp{%} has the same operator precedence as @samp{/} and @samp{*}.

From the rounded quotient and the remainder, you can reconstruct
the dividend, like this:

@example
int
original_dividend (int divisor, int quotient, int remainder)
@{
  return divisor * quotient + remainder;
@}
@end example

To do unrounded division, use floating point. If only one operand is
floating point, @samp{/} converts the other operand to floating
point.

@example
16.0 / 3 @result{} 5.333333333333333
16 / 3.0 @result{} 5.333333333333333
16.0 / 3.0 @result{} 5.333333333333333
16 / 3 @result{} 5
@end example

The remainder operator @samp{%} is not allowed for floating-point
operands, because it is not needed. The concept of remainder makes
sense for integers because the result of division of integers has to
be an integer. For floating point, the result of division is a
floating-point number, in other words a fraction, which will differ
from the exact result only by a very small amount.

There are functions in the standard C library to calculate remainders
from integral-valued division of floating-point numbers.
@xref{Remainder Functions, The GNU C Library, , libc, The GNU C Library
Reference Manual}.

Integer division overflows in one specific case: dividing the most
negative value for the data type (@pxref{Maximum and Minimum Values})
by @minus{}1. That's because the correct result, which is the
corresponding positive number, does not fit (@pxref{Integer Overflow})
in the same number of bits. On some computers now in use, this always
causes a signal @code{SIGFPE} (@pxref{Signals}), the same behavior
that the option @option{-ftrapv} specifies (@pxref{Signed Overflow}).

Division by zero leads to unpredictable results---depending on the
type of computer, it might cause a signal @code{SIGFPE}, or it might
produce a numeric result.

@cindex division by zero
@cindex zero, division by
@strong{Watch out:} Make sure the program does not divide by zero. If
you can't prove that the divisor is not zero, test whether it is zero,
and skip the division if so.
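
One way to follow this advice is to wrap the division in a function
that tests for both troublesome cases first. This is a sketch under
that approach; the name @code{checked_divide} is made up.

```c
#include <limits.h>
#include <stdbool.h>

/* Store DIVIDEND / DIVISOR in *QUOTIENT and return true, unless
   the division would divide by zero or overflow; in those cases
   return false and leave *QUOTIENT unchanged. */
bool
checked_divide (int dividend, int divisor, int *quotient)
{
  if (divisor == 0)
    return false;
  if (dividend == INT_MIN && divisor == -1)
    return false;
  *quotient = dividend / divisor;
  return true;
}
```

The caller decides what to do when the function returns false, such
as reporting an error or substituting a default value.
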

@node Numeric Comparisons
@section Numeric Comparisons
@cindex numeric comparisons
@cindex comparisons
@cindex operators, comparison
@cindex equal operator
@cindex not-equal operator
@cindex less-than operator
@cindex greater-than operator
@cindex less-or-equal operator
@cindex greater-or-equal operator
@cindex operator, equal
@cindex operator, not-equal
@cindex operator, less-than
@cindex operator, greater-than
@cindex operator, less-or-equal
@cindex operator, greater-or-equal
@cindex truth value

There are two kinds of comparison operators: @dfn{equality} and
@dfn{ordering}. Equality comparisons test whether two expressions
have the same value. The result is a @dfn{truth value}: a number that
is 1 for ``true'' and 0 for ``false.''

@example
a == b /* @r{Test for equal.} */
a != b /* @r{Test for not equal.} */
@end example

The equality comparison is written @code{==} because plain @code{=}
is the assignment operator.

Ordering comparisons test which operand is greater or less. Their
results are truth values. These are the ordering comparisons of C:

@example
a < b /* @r{Test for less-than.} */
a > b /* @r{Test for greater-than.} */
a <= b /* @r{Test for less-than-or-equal.} */
a >= b /* @r{Test for greater-than-or-equal.} */
@end example

For any integers @code{a} and @code{b}, exactly one of the comparisons
@code{a < b}, @code{a == b} and @code{a > b} is true, just as in
mathematics. However, if @code{a} and @code{b} are special floating
point values (not ordinary numbers), all three can be false.
@xref{Special Float Values}, and @ref{Invalid Optimizations}.
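
Since truth values are the numbers 1 and 0, they can be added up.
This sketch counts how many of the three comparisons are true; it
uses the standard macro @code{NAN} from @file{math.h} as an example
of a special value. The function names are hypothetical, and the
results assume that optimizations which ignore special values are
not enabled.

```c
#include <math.h>

/* Return how many of A < B, A == B and A > B are true. */
int
count_true_comparisons (double a, double b)
{
  return (a < b) + (a == b) + (a > b);
}

/* Return a not-a-number value. */
double
not_a_number (void)
{
  return NAN;
}
```

For ordinary numbers the count is always 1. When either argument is
the value returned by @code{not_a_number}, the count is 0.
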
|
||
|
||
@node Shift Operations
|
||
@section Shift Operations
|
||
@cindex shift operators
|
||
@cindex operators, shift
|
||
@cindex operators, shift
|
||
@cindex shift count
|
||
|
||
@dfn{Shifting} an integer means moving the bit values to the left or
|
||
right within the bits of the data type. Shifting is defined only for
|
||
integers. Here's the way to write it:
|
||
|
||
@example
|
||
/* @r{Left shift.} */
|
||
5 << 2 @result{} 20
|
||
|
||
/* @r{Right shift.} */
|
||
5 >> 2 @result{} 1
|
||
@end example
|
||
|
||
@noindent
|
||
The left operand is the value to be shifted, and the right operand
|
||
says how many bits to shift it (the @dfn{shift count}). The left
|
||
operand is promoted (@pxref{Operand Promotions}), so shifting never
|
||
operates on a narrow integer type; it's always either @code{int} or
|
||
wider. The result of the shift operation has the same type as the
|
||
promoted left operand.
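
The effect of promotion can be observed with a small sketch such as the following. Both function names are hypothetical, chosen only for this illustration.

```c
/* Hypothetical helper: size in bytes of the result of shifting an
   unsigned char.  The operand is promoted to int before the shift.  */
unsigned long
shifted_char_size (void)
{
  unsigned char c = 1;
  return (unsigned long) sizeof (c << 1);
}

/* Hypothetical helper: the top bit of an 8-bit value is not lost,
   because the shift is done in type int.  */
int
shift_past_char_width (void)
{
  unsigned char c = 0x80;
  return c << 1;
}
```

Here @code{shifted_char_size} returns the same value as @code{sizeof (int)}, and @code{shift_past_char_width} returns 256 rather than 0.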
|
||
|
||
The examples in this section use binary constants, starting with
|
||
@samp{0b} (@pxref{Integer Constants}). They stand for 32-bit integers
|
||
of type @code{int}.
|
||
|
||
@menu
|
||
* Bits Shifted In:: How shifting makes new bits to shift in.
|
||
* Shift Caveats:: Caveats of shift operations.
|
||
* Shift Hacks:: Clever tricks with shift operations.
|
||
@end menu
|
||
|
||
@node Bits Shifted In
|
||
@subsection Shifting Makes New Bits
|
||
|
||
A shift operation shifts towards one end of the number and has to
|
||
generate new bits at the other end.
|
||
|
||
Shifting left one bit must generate a new least significant bit. It
|
||
always brings in zero there. It is equivalent to multiplying by the
|
||
appropriate power of 2. For example,
|
||
|
||
@example
|
||
5 << 3 @r{is equivalent to} 5 * 2*2*2
|
||
-10 << 4 @r{is equivalent to} -10 * 2*2*2*2
|
||
@end example
|
||
|
||
The meaning of shifting right depends on whether the data type is
|
||
signed or unsigned (@pxref{Signed and Unsigned Types}). For a signed
|
||
data type, GNU C performs ``arithmetic shift,'' which keeps the number's
|
||
sign unchanged by duplicating the sign bit. For an unsigned data
|
||
type, it performs ``logical shift,'' which always shifts in zeros at
|
||
the most significant bit.
|
||
|
||
In both cases, shifting right one bit is division by two, rounding
towards negative infinity, and shifting right @var{n} bits is division
by 2 to the @var{n}th power.  For example,
|
||
|
||
@example
|
||
(unsigned) 19 >> 2 @result{} 4
|
||
(unsigned) 20 >> 2 @result{} 5
|
||
(unsigned) 21 >> 2 @result{} 5
|
||
@end example
|
||
|
||
For a negative left operand @code{a}, @code{a >> 1} is not equivalent
|
||
to @code{a / 2}. Both operations divide by 2, but @samp{/} rounds
|
||
toward zero.
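
The difference shows up only for odd negative values. The sketch below uses two hypothetical helper functions, and it relies on the GNU C behavior described above (arithmetic right shift for signed types).

```c
/* Hypothetical helper: halve by shifting.  In GNU C this is an
   arithmetic shift, which rounds toward negative infinity.  */
int
half_by_shift (int a)
{
  return a >> 1;
}

/* Hypothetical helper: halve by dividing, which rounds toward zero.  */
int
half_by_division (int a)
{
  return a / 2;
}
```

With the argument @minus{}7, @code{half_by_shift} gives @minus{}4 while @code{half_by_division} gives @minus{}3. With the argument 7, both give 3.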
|
||
|
||
The shift count must be zero or greater. Shifting by a negative
|
||
number of bits gives machine-dependent results.
|
||
|
||
@node Shift Caveats
|
||
@subsection Caveats for Shift Operations
|
||
|
||
@strong{Warning:} If the shift count is greater than or equal to the
|
||
width in bits of the promoted first operand, the results are
|
||
machine-dependent. Logically speaking, the ``correct'' value would be
|
||
either @minus{}1 (for right shift of a negative number) or 0 (in all other
|
||
cases), but the actual result is whatever the machine's shift
|
||
instruction does in that case. So unless you can prove that the
|
||
second operand is not too large, write code to check it at run time.
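
One possible shape for such a run-time check is sketched below. The function name @code{checked_shift_left} is hypothetical, and returning 0 for an out-of-range count is just one reasonable policy. The sketch also assumes that @code{unsigned int} has no padding bits, so that its width is @code{sizeof (unsigned int) * CHAR_BIT}.

```c
#include <limits.h>   /* Declares CHAR_BIT.  */

/* Hypothetical helper: shift VALUE left by COUNT bits, returning 0
   when COUNT is negative or too large for the type.  */
unsigned int
checked_shift_left (unsigned int value, int count)
{
  /* Assumes unsigned int has no padding bits.  */
  int width = (int) (sizeof (unsigned int) * CHAR_BIT);

  if (count < 0 || count >= width)
    return 0;
  return value << count;
}
```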
|
||
|
||
@strong{Warning:} Never rely on how the shift operators relate in
|
||
precedence to other arithmetic binary operators. Programmers don't
|
||
remember these precedences, and won't understand the code. Always use
|
||
parentheses to explicitly specify the nesting, like this:
|
||
|
||
@example
|
||
a + (b << 5) /* @r{Shift first, then add.} */
|
||
(a + b) << 5 /* @r{Add first, then shift.} */
|
||
@end example
|
||
|
||
Note: according to the C standard, shifting of signed values isn't
|
||
guaranteed to work properly when the value shifted is negative, or
|
||
becomes negative during shifting. However, only pedants have a reason
|
||
to be concerned about this; only computers with strange shift
|
||
instructions could plausibly do this wrong. In GNU C, the operation
|
||
always works as expected.
|
||
|
||
@node Shift Hacks
|
||
@subsection Shift Hacks
|
||
|
||
You can use the shift operators for various useful hacks. For
|
||
example, given a date specified by day of the month @code{d}, month
|
||
@code{m}, and year @code{y}, you can store the entire date in a single
|
||
integer @code{date}:
|
||
|
||
@example
|
||
unsigned int d = 12;      /* @r{12 in binary is 0b1100.}  */
unsigned int m = 6;       /* @r{6 in binary is 0b110.}  */
unsigned int y = 1983;    /* @r{1983 in binary is 0b11110111111.}  */
unsigned int date = (((y << 4) + m) << 5) + d;
/* @r{Add 0b11110111111000000000}
   @r{and 0b11000000 and 0b1100.}
   @r{Sum is 0b11110111111011001100.}  */
|
||
@end example
|
||
|
||
@noindent
|
||
To extract the day, month, and year out of
|
||
@code{date}, use a combination of shift and remainder:
|
||
|
||
@example
|
||
/* @r{32 in binary is 0b100000.}  */
/* @r{Remainder dividing by 32 gives lowest 5 bits, 0b1100.}  */
d = date % 32;
/* @r{Shifting 5 bits right discards the day, leaving 0b111101111110110.}
   @r{Remainder dividing by 16 gives lowest remaining 4 bits, 0b110.}  */
m = (date >> 5) % 16;
/* @r{Shifting 9 bits right discards day and month,}
   @r{leaving 0b11110111111.}  */
y = date >> 9;
|
||
@end example
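
The packing and unpacking steps can be gathered into small functions, as in this sketch. The function names are hypothetical. The layout is the one used above: 5 bits for the day, 4 bits for the month, and the remaining bits for the year.

```c
/* Hypothetical helper: pack day D, month M and year Y into one integer.
   No range checking is done; D must be below 32 and M below 16.  */
unsigned int
pack_date (unsigned int d, unsigned int m, unsigned int y)
{
  return (((y << 4) + m) << 5) + d;
}

/* Hypothetical helpers: extract each part again.  */
unsigned int
date_day (unsigned int date)
{
  return date % 32;          /* Lowest 5 bits.  */
}

unsigned int
date_month (unsigned int date)
{
  return (date >> 5) % 16;   /* Next 4 bits.  */
}

unsigned int
date_year (unsigned int date)
{
  return date >> 9;          /* Everything above day and month.  */
}
```

With the values used above, @code{pack_date (12, 6, 1983)} yields 1015500, and the three extraction functions give back 12, 6 and 1983.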
|
||
|
||
@code{-1 << LOWBITS} is a clever way to make an integer whose
|
||
@code{LOWBITS} lowest bits are all 0 and the rest are all 1.
|
||
@code{-(1 << LOWBITS)} is equivalent to that, since negating a value
|
||
is equivalent to multiplying it by @minus{}1.
|
||
|
||
@node Bitwise Operations
|
||
@section Bitwise Operations
|
||
@cindex bitwise operators
|
||
@cindex operators, bitwise
|
||
@cindex negation, bitwise
|
||
@cindex conjunction, bitwise
|
||
@cindex disjunction, bitwise
|
||
|
||
Bitwise operators operate on integers, treating each bit independently.
|
||
They are not allowed for floating-point types.
|
||
|
||
As in the previous section, the examples in this section use binary
|
||
constants, starting with @samp{0b} (@pxref{Integer Constants}). They
|
||
stand for 32-bit integers of type @code{int}.
|
||
|
||
@table @code
|
||
@item ~@code{a}
|
||
Unary operator for bitwise negation; this changes each bit of
|
||
@code{a} from 1 to 0 or from 0 to 1.
|
||
|
||
@example
|
||
~0b10101000 @result{} 0b11111111111111111111111101010111
|
||
~0 @result{} 0b11111111111111111111111111111111
|
||
~0b11111111111111111111111111111111 @result{} 0
|
||
~ (-1) @result{} 0
|
||
@end example
|
||
|
||
It is useful to remember that @code{~@var{x} + 1} equals
|
||
@code{-@var{x}}, for integers, and @code{~@var{x}} equals
|
||
@code{-@var{x} - 1}. The last example above shows this with @minus{}1
|
||
as @var{x}.
|
||
|
||
@item @code{a} & @code{b}
|
||
Binary operator for bitwise ``and'' or ``conjunction.'' Each bit in
|
||
the result is 1 if that bit is 1 in both @code{a} and @code{b}.
|
||
|
||
@example
|
||
0b10101010 & 0b11001100 @result{} 0b10001000
|
||
@end example
|
||
|
||
@item @code{a} | @code{b}
|
||
Binary operator for bitwise ``or'' (``inclusive or'' or
|
||
``disjunction''). Each bit in the result is 1 if that bit is 1 in
|
||
either @code{a} or @code{b}.
|
||
|
||
@example
|
||
0b10101010 | 0b11001100 @result{} 0b11101110
|
||
@end example
|
||
|
||
@item @code{a} ^ @code{b}
|
||
Binary operator for bitwise ``xor'' (``exclusive or''). Each bit in
|
||
the result is 1 if that bit is 1 in exactly one of @code{a} and @code{b}.
|
||
|
||
@example
|
||
0b10101010 ^ 0b11001100 @result{} 0b01100110
|
||
@end example
|
||
@end table
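
A common use of these operators is to keep several on/off flags in one integer. The following sketch shows the usual idioms. The flag names and function names are hypothetical, and unsigned values are used so that the sign bit plays no role.

```c
/* Hypothetical flag bits; each occupies a distinct bit position.  */
#define FLAG_READ   (1u << 0)
#define FLAG_WRITE  (1u << 1)
#define FLAG_EXEC   (1u << 2)

unsigned int
set_flag (unsigned int flags, unsigned int flag)
{
  return flags | flag;       /* Turn the bit on.  */
}

unsigned int
clear_flag (unsigned int flags, unsigned int flag)
{
  return flags & ~flag;      /* Turn the bit off.  */
}

unsigned int
toggle_flag (unsigned int flags, unsigned int flag)
{
  return flags ^ flag;       /* Flip the bit.  */
}

int
has_flag (unsigned int flags, unsigned int flag)
{
  return (flags & flag) != 0;
}
```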
|
||
|
||
To understand the effect of these operators on signed integers, keep
|
||
in mind that all modern computers use two's-complement representation
|
||
(@pxref{Integer Representations}) for negative integers. This means
|
||
that the highest bit of the number indicates the sign; it is 1 for a
|
||
negative number and 0 for a positive number. In a negative number,
|
||
the value in the other bits @emph{increases} as the number gets closer
|
||
to zero, so that @code{0b111@r{@dots{}}111} is @minus{}1 and
|
||
@code{0b100@r{@dots{}}000} is the most negative possible integer.
|
||
|
||
@strong{Warning:} C defines a precedence ordering for the bitwise
|
||
binary operators, but you should never rely on it. Likewise, you
|
||
should never rely on how bitwise binary operators relate in precedence
|
||
to the arithmetic and shift binary operators. Other programmers don't
|
||
remember these aspects of C's precedence ordering; to make your
|
||
programs clear, always use parentheses to explicitly specify the
|
||
nesting among these operators.
|
||
|
||
For example, suppose @code{offset} is an integer that specifies
|
||
the offset within shared memory of a table, except that its bottom few
|
||
bits (@code{LOWBITS} says how many) are special flags. Here's
|
||
how to get just that offset and add it to the base address.
|
||
|
||
@example
|
||
shared_mem_base + (offset & (-1 << LOWBITS))
|
||
@end example
|
||
|
||
Thanks to the outer set of parentheses, we don't need to know whether
|
||
@samp{&} has higher precedence than @samp{+}. Thanks to the inner
|
||
set, we don't need to know whether @samp{&} has higher precedence than
|
||
@samp{<<}. But we can rely on all unary operators to have higher
|
||
precedence than any binary operator, so we don't need parentheses
|
||
around the left operand of @samp{<<}.
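
The masking step alone might be wrapped in a function like the one sketched here. The name @code{strip_low_bits} is hypothetical. This sketch builds the mask from @code{~0u} rather than @code{-1}, so that the whole computation is done in unsigned arithmetic. That substitution is a choice made for this example only.

```c
/* Hypothetical helper: clear the LOWBITS lowest bits of OFFSET.
   Assumes 0 <= LOWBITS < the width of unsigned int.  */
unsigned int
strip_low_bits (unsigned int offset, int lowbits)
{
  return offset & (~0u << lowbits);
}
```

For instance, @code{strip_low_bits (0xB7, 3)} yields @code{0xB0}.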
|
||
|
||
@node Assignment Expressions
|
||
@chapter Assignment Expressions
|
||
@cindex assignment expressions
|
||
@cindex operators, assignment
|
||
|
||
As a general concept in programming, an @dfn{assignment} is a
|
||
construct that stores a new value into a place where values can be
|
||
stored---for instance, in a variable. Such places are called
|
||
@dfn{lvalues} (@pxref{Lvalues}) because they are locations that hold a value.
|
||
|
||
In C, an assignment is an expression because it has a value; we call
|
||
it an @dfn{assignment expression}. A simple assignment looks like
|
||
|
||
@example
|
||
@var{lvalue} = @var{value-to-store}
|
||
@end example
|
||
|
||
@noindent
|
||
We say it assigns the value of the expression @var{value-to-store} to
|
||
the location @var{lvalue}, or that it stores @var{value-to-store}
|
||
there. You can think of the ``l'' in ``lvalue'' as standing for
|
||
``left,'' since that's what you put on the left side of the assignment
|
||
operator.
|
||
|
||
However, that's not the only way to use an lvalue, and not all lvalues
|
||
can be assigned to. To use the lvalue in the left side of an
|
||
assignment, it has to be @dfn{modifiable}. In C, that means it was
|
||
not declared with the type qualifier @code{const} (@pxref{const}).
|
||
|
||
The value of the assignment expression is that of @var{lvalue} after
|
||
the new value is stored in it. This means you can use an assignment
|
||
inside other expressions. Assignment operators are right-associative
|
||
so that
|
||
|
||
@example
|
||
x = y = z = 0;
|
||
@end example
|
||
|
||
@noindent
|
||
is equivalent to
|
||
|
||
@example
|
||
x = (y = (z = 0));
|
||
@end example
|
||
|
||
This is the only useful way for them to associate;
|
||
the other way,
|
||
|
||
@example
|
||
((x = y) = z) = 0;
|
||
@end example
|
||
|
||
@noindent
|
||
would be invalid since an assignment expression such as @code{x = y}
|
||
is not a valid lvalue.
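
Because the value of an assignment is the value actually stored, a chain of assignments through different types can be instructive. In this sketch, the hypothetical function @code{chained_value} stores a floating-point value first in an @code{int} and then in a @code{double}.

```c
/* Hypothetical helper: return what D receives from D = I = V.  */
double
chained_value (double v)
{
  int i;
  double d;

  /* Same as d = (i = v).  The inner assignment converts V to int,
     and that converted value is what gets stored in D.  */
  d = i = v;
  return d;
}
```

Calling @code{chained_value (3.7)} returns 3.0, not 3.7, because the value of @code{i = v} is the value of @code{i} after the store.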
|
||
|
||
@strong{Warning:} Write parentheses around an assignment if you nest
|
||
it inside another expression, unless that containing expression is a
|
||
comma-separated series or another assignment. For example,
|
||
see @ref{Logicals and Assignments}, and @ref{Uses of Comma}.
|
||
|
||
@menu
|
||
* Simple Assignment:: The basics of storing a value.
|
||
* Lvalues:: Expressions into which a value can be stored.
|
||
* Modifying Assignment:: Shorthand for changing an lvalue's contents.
|
||
* Increment/Decrement:: Shorthand for incrementing and decrementing
|
||
an lvalue's contents.
|
||
* Postincrement/Postdecrement:: Accessing then incrementing or decrementing.
|
||
* Assignment in Subexpressions:: How to avoid ambiguity.
|
||
* Write Assignments Separately:: Write assignments as separate statements.
|
||
@end menu
|
||
|
||
@node Simple Assignment
|
||
@section Simple Assignment
|
||
@cindex simple assignment
|
||
@cindex assignment, simple
|
||
|
||
A @dfn{simple assignment expression} computes the value of the right
|
||
operand and stores it into the lvalue on the left. Here is a simple
|
||
assignment expression that stores 5 in @code{i}:
|
||
|
||
@example
|
||
i = 5
|
||
@end example
|
||
|
||
@noindent
|
||
We say that this is an @dfn{assignment to} the variable @code{i} and
|
||
that it @dfn{assigns} @code{i} the value 5. It has no semicolon
|
||
because it is an expression (so it has a value). Adding a semicolon
|
||
at the end would make it a statement (@pxref{Expression Statement}).
|
||
|
||
Here is another example of a simple assignment expression. Its
|
||
operands are not simple, but the kind of assignment done here is
|
||
simple assignment.
|
||
|
||
@example
|
||
x[foo ()] = y + 6
|
||
@end example
|
||
|
||
A simple assignment with two different numeric data types converts the
|
||
right operand value to the lvalue's type, if possible. It can convert
|
||
any numeric type to any other numeric type.
|
||
|
||
Simple assignment is also allowed on some non-numeric types: pointers
|
||
(@pxref{Pointers}), structures (@pxref{Structure Assignment}), and
|
||
unions (@pxref{Unions}).
|
||
|
||
@strong{Warning:} Assignment is not allowed on arrays because
|
||
there are no array values in C; C variables can be arrays, but these
|
||
arrays cannot be manipulated as wholes. @xref{Limitations of C
|
||
Arrays}.
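
Two common ways to get the effect of assigning an array are sketched below: copying the bytes with the standard function @code{memcpy}, and wrapping the array in a structure so that structure assignment copies it. The type @code{struct int_triple} and both function names are hypothetical.

```c
#include <string.h>   /* Declares memcpy.  */

/* Hypothetical wrapper type: a structure containing an array.  */
struct int_triple
{
  int item[3];
};

/* Copy three ints from SRC to DST.  Writing dst = src would not
   copy the elements.  */
void
copy_array (int dst[3], const int src[3])
{
  memcpy (dst, src, 3 * sizeof (int));
}

/* Structure assignment copies the array inside the structure.  */
struct int_triple
copy_triple (struct int_triple src)
{
  struct int_triple dst;

  dst = src;
  return dst;
}
```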
|
||
|
||
@xref{Assignment Type Conversions}, for the complete rules about data
|
||
types used in assignments.
|
||
|
||
@node Lvalues
|
||
@section Lvalues
|
||
@cindex lvalues
|
||
|
||
An expression that identifies a memory space that holds a value is
|
||
called an @dfn{lvalue}, because it is a location that can hold a value.
|
||
|
||
The standard kinds of lvalues are:
|
||
|
||
@itemize @bullet
|
||
@item
|
||
A variable.
|
||
|
||
@item
|
||
A pointer-dereference expression (@pxref{Pointer Dereference}) using
|
||
unary @samp{*}, if its type is not a function type.
|
||
|
||
@item
|
||
A structure field reference (@pxref{Structures}) using @samp{.}, if
|
||
the structure value is an lvalue.
|
||
|
||
@item
|
||
A structure field reference using @samp{->}. This is always an lvalue
|
||
since @samp{->} implies pointer dereference.
|
||
|
||
@item
|
||
A union alternative reference (@pxref{Unions}), on the same conditions
|
||
as for structure fields.
|
||
|
||
@item
|
||
An array-element reference using @samp{[@r{@dots{}}]}, if the array
|
||
is an lvalue.
|
||
|
||
@item
|
||
A string constant (@pxref{String Constants}).
|
||
|
||
@item
|
||
An array constructor (@pxref{Constructing Array Values}).
|
||
|
||
@item
|
||
A structure or union constructor (@pxref{Structure Constructors}).
|
||
@end itemize
|
||
|
||
If an expression's outermost operation is any other operator, that
|
||
expression is not an lvalue. Thus, the variable @code{x} is an
|
||
lvalue, but @code{x + 0} is not, even though these two expressions
|
||
compute the same value (assuming @code{x} is a number).
|
||
|
||
It is rare that a structure value or an array value is not an lvalue,
|
||
but that does happen---for instance, the result of a function call or
|
||
a conditional operator can have a structure or array type, but is
|
||
never an lvalue.
|
||
|
||
If an array is an lvalue, using the array in an expression still
|
||
converts it automatically to a pointer to the zeroth element. The
|
||
result of this conversion is not an lvalue. Thus, if the variable
|
||
@code{a} is an array, you can't use @code{a} by itself as the left
|
||
operand of an assignment. But you can assign to an element of
|
||
@code{a}, such as @code{a[0]}. That is an lvalue since @code{a} is an
|
||
lvalue.
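
The following sketch stores a value through several of the kinds of lvalue listed above. The structure type @code{struct point} and the function name are hypothetical.

```c
/* Hypothetical structure used only for this illustration.  */
struct point
{
  int x;
  int y;
};

/* Store through several kinds of lvalue, then add up what was stored.  */
int
store_through_lvalues (void)
{
  int v;
  int *p = &v;
  struct point s;
  struct point *ps = &s;
  int a[2];

  v = 1;        /* A variable.  */
  *p = 2;       /* A pointer dereference; this overwrites v.  */
  s.x = 3;      /* A field reference using `.'.  */
  ps->y = 4;    /* A field reference using `->'.  */
  a[0] = 5;     /* An array element.  */
  /* Writing v + 0 = 6 would be rejected: v + 0 is not an lvalue.  */

  return v + s.x + s.y + a[0];
}
```

The function returns 14, the sum of 2, 3, 4 and 5.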
|
||
|
||
@node Modifying Assignment
|
||
@section Modifying Assignment
|
||
@cindex modifying assignment
|
||
@cindex assignment, modifying
|
||
|
||
You can abbreviate the common construct
|
||
|
||
@example
|
||
@var{lvalue} = @var{lvalue} + @var{expression}
|
||
@end example
|
||
|
||
@noindent
|
||
as
|
||
|
||
@example
|
||
@var{lvalue} += @var{expression}
|
||
@end example
|
||
|
||
This is known as a @dfn{modifying assignment}. For instance,
|
||
|
||
@example
|
||
i = i + 5;
|
||
i += 5;
|
||
@end example
|
||
|
||
@noindent
|
||
shows two statements that are equivalent. The first uses
|
||
simple assignment; the second uses modifying assignment.
|
||
|
||
Modifying assignment works with any binary arithmetic operator. For
|
||
instance, you can subtract something from an lvalue like this,
|
||
|
||
@example
|
||
@var{lvalue} -= @var{expression}
|
||
@end example
|
||
|
||
@noindent
|
||
or multiply it by a certain amount like this,
|
||
|
||
@example
|
||
@var{lvalue} *= @var{expression}
|
||
@end example
|
||
|
||
@noindent
|
||
or shift it by a certain amount like this.
|
||
|
||
@example
|
||
@var{lvalue} <<= @var{expression}
|
||
@var{lvalue} >>= @var{expression}
|
||
@end example
|
||
|
||
In most cases, this feature adds no power to the language, but it
|
||
provides substantial convenience. Also, when @var{lvalue} contains
|
||
code that has side effects, the simple assignment performs those side
|
||
effects twice, while the modifying assignment performs them once. For
|
||
instance, suppose that the function @code{foo} has a side effect, perhaps
|
||
changing static storage. This statement
|
||
|
||
@example
|
||
x[foo ()] = x[foo ()] + 5;
|
||
@end example
|
||
|
||
@noindent
|
||
calls @code{foo} twice. If @code{foo} operates on static variables,
|
||
it could return a different value each time. If @code{foo ()} will
|
||
return 1 the first time and 3 the second time, the effect could be to
|
||
add @code{x[3]} and 5 and store the result in @code{x[1]}, or to add
|
||
@code{x[1]} and 5 and store the result in @code{x[3]}. We don't know
|
||
which of the two it will do, because C does not specify which call to
|
||
@code{foo} is computed first.
|
||
|
||
Such a statement is not well defined, and shouldn't be used.
|
||
|
||
By contrast,
|
||
|
||
@example
|
||
x[foo ()] += 5;
|
||
@end example
|
||
|
||
@noindent
|
||
is well defined: it calls @code{foo} only once to determine which
|
||
element of @code{x} to adjust, and it adjusts that element by adding 5
|
||
to it.
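
The single evaluation can be observed with a function whose side effect is visible, as in this sketch. The names @code{call_count}, @code{next_index} and @code{bump_one} are hypothetical.

```c
/* Hypothetical function with a side effect: it counts its calls in
   static storage and returns the count as it was before the call.  */
static int call_count = 0;

int
next_index (void)
{
  return call_count++;
}

/* Add 5 to one element of X.  next_index is called exactly once.  */
void
bump_one (int x[])
{
  x[next_index ()] += 5;
}
```

After one call to @code{bump_one}, @code{call_count} is 1 and only element 0 has changed. A second call changes only element 1.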
|
||
|
||
@node Increment/Decrement
|
||
@section Increment and Decrement Operators
|
||
@cindex increment operator
|
||
@cindex decrement operator
|
||
@cindex operator, increment
|
||
@cindex operator, decrement
|
||
@cindex preincrement expression
|
||
@cindex predecrement expression
|
||
|
||
The operators @samp{++} and @samp{--} are the @dfn{increment} and
|
||
@dfn{decrement} operators. When used on a numeric value, they add or
|
||
subtract 1. We don't consider them assignments, but they are
|
||
equivalent to assignments.
|
||
|
||
Using @samp{++} or @samp{--} as a prefix, before an lvalue, is called
|
||
@dfn{preincrement} or @dfn{predecrement}. This adds or subtracts 1
|
||
and the result becomes the expression's value. For instance,
|
||
|
||
@example
|
||
#include <stdio.h>   /* @r{Declares @code{printf}.} */

int
main (void)
@{
  int i = 5;
  printf ("%d\n", i);
  printf ("%d\n", ++i);
  printf ("%d\n", i);
  return 0;
@}
|
||
@end example
|
||
|
||
@noindent
|
||
prints lines containing @samp{5}, @samp{6}, and @samp{6} again. The
|
||
expression @code{++i} increments @code{i} from 5 to 6, and has the
|
||
value 6, so the output from @code{printf} on that line says @samp{6}.
|
||
|
||
Using @samp{--} instead, for predecrement,
|
||
|
||
@example
|
||
#include <stdio.h>   /* @r{Declares @code{printf}.} */

int
main (void)
@{
  int i = 5;
  printf ("%d\n", i);
  printf ("%d\n", --i);
  printf ("%d\n", i);
  return 0;
@}
|
||
@end example
|
||
|
||
@noindent
|
||
prints three lines that contain (respectively) @samp{5}, @samp{4}, and
|
||
again @samp{4}.
|
||
|
||
@node Postincrement/Postdecrement
|
||
@section Postincrement and Postdecrement
|
||
@cindex postincrement expression
|
||
@cindex postdecrement expression
|
||
@cindex operator, postincrement
|
||
@cindex operator, postdecrement
|
||
|
||
Using @samp{++} or @samp{--} @emph{after} an lvalue does something
|
||
peculiar: it gets the value directly out of the lvalue and @emph{then}
|
||
increments or decrements it. Thus, the value of @code{i++} is the same
|
||
as the value of @code{i}, but @code{i++} also increments @code{i} ``a
|
||
little later.'' This is called @dfn{postincrement} or
|
||
@dfn{postdecrement}.
|
||
|
||
For example,
|
||
|
||
@example
|
||
#include <stdio.h>   /* @r{Declares @code{printf}.} */

int
main (void)
@{
  int i = 5;
  printf ("%d\n", i);
  printf ("%d\n", i++);
  printf ("%d\n", i);
  return 0;
@}
|
||
@end example
|
||
|
||
@noindent
|
||
prints lines containing @samp{5}, again @samp{5}, and @samp{6}. The
|
||
expression @code{i++} has the value 5, which is the value of @code{i}
|
||
at the time, but it increments @code{i} from 5 to 6 just a little
|
||
later.
|
||
|
||
How much later is ``just a little later''? The compiler has some
|
||
flexibility in deciding that. The rule is that the increment has to
|
||
happen by the next @dfn{sequence point}; in simple cases, that means
|
||
by the end of the statement. @xref{Sequence Points}.
|
||
|
||
Regardless of precisely where the compiled code increments the value
|
||
of @code{i}, the crucial thing is that the value of @code{i++} is the
|
||
value that @code{i} has @emph{before} incrementing it.
|
||
|
||
If a unary operator precedes a postincrement or postdecrement expression,
|
||
the post-whatever expression nests inside:
|
||
|
||
@example
|
||
-a++ @r{is equivalent to} -(a++)
|
||
@end example
|
||
|
||
In some cases, for instance this one, the other order would not even
|
||
make sense; @code{-a} is not an lvalue, so it can't be incremented.
|
||
|
||
The most common use of postincrement is with arrays. Here's an
|
||
example of using postincrement to access one element of an array and
|
||
advance the index for the next access. Compare this with the example
|
||
@code{avg_of_double} (@pxref{Array Example}), which is almost the same
|
||
but doesn't use postincrement for that.
|
||
|
||
@example
|
||
double
avg_of_double_alt (int length, double input_data[])
@{
  double sum = 0;
  int i;

  /* @r{Fetch each element and add it into @code{sum}.} */
  for (i = 0; i < length;)
    /* @r{Use the index @code{i}, then increment it.} */
    sum += input_data[i++];

  return sum / length;
@}
|
||
@end example
|
||
|
||
@node Assignment in Subexpressions
|
||
@section Pitfall: Assignment in Subexpressions
|
||
@cindex assignment in subexpressions
|
||
@cindex subexpressions, assignment in
|
||
|
||
In C, the order of computing parts of an expression is not fixed.
|
||
Aside from a few special cases, the operations can be computed in any
|
||
order. If one part of the expression has an assignment to @code{x}
|
||
and another part of the expression uses @code{x}, the result is
|
||
unpredictable because that use might be computed before or after the
|
||
assignment.
|
||
|
||
Here's an example of ambiguous code:
|
||
|
||
@example
|
||
x = 20;
|
||
printf ("%d %d\n", x, x = 4);
|
||
@end example
|
||
|
||
@noindent
|
||
If the second argument, @code{x}, is computed before the third argument,
|
||
@code{x = 4}, the second argument's value will be 20. If they are
|
||
computed in the other order, the second argument's value will be 4.
|
||
|
||
Here's one way to make that code unambiguous:
|
||
|
||
@example
|
||
y = 20;
|
||
printf ("%d %d\n", y, x = 4);
|
||
@end example
|
||
|
||
Here's another way, with the other meaning:
|
||
|
||
@example
|
||
x = 4;
|
||
printf ("%d %d\n", x, x);
|
||
@end example
|
||
|
||
This issue applies to all kinds of assignments, and to the increment
|
||
and decrement operators, which are equivalent to assignments.
|
||
@xref{Order of Execution}, for more information about this.
|
||
|
||
However, it can be useful to write assignments inside an
|
||
@code{if}-condition or @code{while}-test along with logical operators.
|
||
@xref{Logicals and Assignments}.
|
||
|
||
@node Write Assignments Separately
|
||
@section Write Assignments in Separate Statements
|
||
|
||
It is often convenient to write an assignment inside an
|
||
@code{if}-condition, but that can reduce the readability of the
|
||
program. Here's an example of what to avoid:
|
||
|
||
@example
|
||
if (x = advance (x))
|
||
@r{@dots{}}
|
||
@end example
|
||
|
||
The idea here is to advance @code{x} and test if the value is nonzero.
|
||
However, readers might miss the fact that it uses @samp{=} and not
|
||
@samp{==}. In fact, writing @samp{=} where @samp{==} was intended
|
||
inside a condition is a common error, so GNU C can give warnings when
@samp{=} appears in a way that suggests it's an error (for instance,
with the @option{-Wparentheses} option, which @option{-Wall} enables).
|
||
|
||
It is much clearer to write the assignment as a separate statement, like this:
|
||
|
||
@example
|
||
x = advance (x);
|
||
if (x != 0)
|
||
@r{@dots{}}
|
||
@end example
|
||
|
||
@noindent
|
||
This makes it unmistakably clear that @code{x} is assigned a new value.
|
||
|
||
Another method is to use the comma operator (@pxref{Comma Operator}),
|
||
like this:
|
||
|
||
@example
|
||
if (x = advance (x), x != 0)
|
||
@r{@dots{}}
|
||
@end example
|
||
|
||
@noindent
|
||
However, putting the assignment in a separate statement is usually clearer
|
||
(unless the assignment is very short), because it reduces nesting.
|
||
|
||
@node Execution Control Expressions
|
||
@chapter Execution Control Expressions
|
||
@cindex execution control expressions
|
||
@cindex expressions, execution control
|
||
|
||
This chapter describes the C operators that combine expressions to
|
||
control which of those expressions execute, or in which order.
|
||
|
||
@menu
|
||
* Logical Operators:: Logical conjunction, disjunction, negation.
|
||
* Logicals and Comparison:: Logical operators with comparison operators.
|
||
* Logicals and Assignments:: Assignments with logical operators.
|
||
* Conditional Expression:: An if/else construct inside expressions.
|
||
* Comma Operator:: Build a sequence of subexpressions.
|
||
@end menu
|
||
|
||
@node Logical Operators
|
||
@section Logical Operators
|
||
@cindex logical operators
|
||
@cindex operators, logical
|
||
@cindex conjunction operator
|
||
@cindex disjunction operator
|
||
@cindex negation operator, logical
|
||
|
||
The @dfn{logical operators} combine truth values, which are normally
|
||
represented in C as numbers. Any expression with a numeric value is a
|
||
valid truth value: zero means false, and any other value means true.
|
||
A pointer type is also meaningful as a truth value; a null pointer
|
||
(which is zero) means false, and a non-null pointer means true
|
||
(@pxref{Pointer Types}). The value of a logical operator is always 1
|
||
or 0 and has type @code{int} (@pxref{Integer Types}).
|
||
|
||
The logical operators are used mainly in the condition of an @code{if}
|
||
statement, or in the end test in a @code{for} statement or
|
||
@code{while} statement (@pxref{Statements}). However, they are valid
|
||
in any context where an integer-valued expression is allowed.
|
||
|
||
@table @samp
|
||
@item ! @var{exp}
|
||
Unary operator for logical ``not.'' The value is 1 (true) if
|
||
@var{exp} is 0 (false), and 0 (false) if @var{exp} is nonzero (true).
|
||
|
||
@strong{Warning:} If @var{exp} is anything but an lvalue or a
|
||
function call, you should write parentheses around it.
|
||
|
||
@item @var{left} && @var{right}
|
||
The logical ``and'' binary operator computes @var{left} and, if necessary,
|
||
@var{right}. If both of the operands are true, the @samp{&&} expression
|
||
gives the value 1 (true). Otherwise, the @samp{&&} expression
|
||
gives the value 0 (false). If @var{left} yields a false value,
|
||
that determines the overall result, so @var{right} is not computed.
|
||
|
||
@item @var{left} || @var{right}
|
||
The logical ``or'' binary operator computes @var{left} and, if necessary,
|
||
@var{right}. If at least one of the operands is true, the @samp{||} expression
|
||
gives the value 1 (which is true). Otherwise, the @samp{||} expression
|
||
gives the value 0 (false). If @var{left} yields a true value,
|
||
that determines the overall result, so @var{right} is not computed.
|
||
@end table
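
The rule that the right operand is sometimes skipped can be checked with an operand that records each evaluation. In this sketch, the names @code{evaluations}, @code{counted}, @code{and_skips_right} and @code{or_skips_right} are all hypothetical.

```c
/* Hypothetical operand with a visible side effect: it counts how
   many times it has been evaluated.  */
static int evaluations = 0;

int
counted (int value)
{
  evaluations++;
  return value;
}

/* Return 1 if `&&' skipped its right operand after a false left one.  */
int
and_skips_right (void)
{
  int result;

  evaluations = 0;
  result = counted (0) && counted (1);
  return result == 0 && evaluations == 1;
}

/* Return 1 if `||' skipped its right operand after a true left one.  */
int
or_skips_right (void)
{
  int result;

  evaluations = 0;
  result = counted (5) || counted (0);
  return result == 1 && evaluations == 1;
}
```

Both functions return 1. Note that @code{counted (5) || counted (0)} has the value 1, not 5, since the value of a logical operator is always 1 or 0.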
|
||
|
||
@strong{Warning:} Never rely on the relative precedence of @samp{&&}
|
||
and @samp{||}. When you use them together, always use parentheses to
|
||
specify explicitly how they nest, as shown here:
|
||
|
||
@example
|
||
if ((r != 0 && x % r == 0)
    ||
    (s != 0 && x % s == 0))
|
||
@end example
|
||
|
||
@node Logicals and Comparison
|
||
@section Logical Operators and Comparisons
|
||
|
||
The most common thing to use inside the logical operators is a
|
||
comparison. Conveniently, @samp{&&} and @samp{||} have lower
|
||
precedence than comparison operators and arithmetic operators, so we
|
||
can write expressions like this without parentheses and get the
|
||
nesting that is natural: two comparison operations that must both be
|
||
true.
|
||
|
||
@example
|
||
if (r != 0 && x % r == 0)
|
||
@end example
|
||
|
||
@noindent
|
||
This example also shows how it is useful that @samp{&&} guarantees to
|
||
skip the right operand if the left one turns out false. Because of
|
||
that, this code never tries to divide by zero.
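
Wrapped in a function, the same test might look like this sketch. The name @code{divides_evenly} is hypothetical.

```c
/* Hypothetical helper: true if R is nonzero and divides X evenly.
   When R is zero, the remainder operation is never attempted.  */
int
divides_evenly (int x, int r)
{
  return r != 0 && x % r == 0;
}
```

Thus @code{divides_evenly (10, 0)} safely returns 0 instead of dividing by zero.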
|
||
|
||
This is equivalent:
|
||
|
||
@example
|
||
if (r && x % r == 0)
|
||
@end example
|
||
|
||
@noindent
|
||
A truth value is simply a number, so using @code{r} as a truth value
|
||
tests whether it is nonzero. But @code{r}'s meaning as an expression
|
||
is not a truth value---it is a number to divide by. So it is clearer
|
||
style to write the explicit @code{!= 0}.
|
||
|
||
Here's another equivalent way to write it:
|
||
|
||
@example
|
||
if (!(r == 0) && x % r == 0)
|
||
@end example
|
||
|
||
@noindent
|
||
This illustrates the unary @samp{!} operator, as well as the need to
|
||
write parentheses around its operand.
|
||
|
||
@node Logicals and Assignments
|
||
@section Logical Operators and Assignments
|
||
|
||
There are cases where assignments nested inside the condition can
|
||
actually make a program @emph{easier} to read. Here is an example
|
||
using a hypothetical type @code{list} which represents a list; it
|
||
tests whether the list has at least two links, using hypothetical
|
||
functions, @code{nonempty} which is true if the argument is a nonempty
|
||
list, and @code{list_next} which advances from one list link to the
|
||
next. We assume that a list is never a null pointer, so that the
|
||
assignment expressions are always ``true.''
|
||
|
||
@example
|
||
if (nonempty (list)
    && (temp1 = list_next (list))
    && nonempty (temp1)
    && (temp2 = list_next (temp1)))
  @r{@dots{}}  /* @r{use @code{temp1} and @code{temp2}} */
|
||
@end example
|
||
|
||
@noindent
|
||
Here we take advantage of the @samp{&&} operator to avoid executing
|
||
the rest of the code if a call to @code{nonempty} returns ``false.'' The
|
||
only natural place to put the assignments is among those calls.
|
||
|
||
It would be possible to rewrite this as several statements, but that
|
||
could make it much more cumbersome. On the other hand, when the test
|
||
is even more complex than this one, splitting it into multiple
|
||
statements might be necessary for clarity.
|
||
|
||
If an empty list is a null pointer, we can dispense with calling
|
||
@code{nonempty}:
|
||
|
||
@example
|
||
if (list
    && (temp1 = list_next (list))
    && (temp2 = list_next (temp1)))
  @r{@dots{}}
|
||
@end example
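
To make the null-pointer variant concrete, here is a sketch with one possible definition of the list type. The type @code{struct node} and the function @code{third_value} are hypothetical. The sketch follows the @code{next} field directly instead of calling @code{list_next}, and it assumes that an empty list is a null pointer.

```c
#include <stddef.h>   /* Declares NULL.  */

/* Hypothetical list type: an empty list is a null pointer.  */
struct node
{
  int value;
  struct node *next;
};

/* Return the value in the third node of LIST, or FALLBACK if fewer
   than two links can be followed from the first node.  */
int
third_value (struct node *list, int fallback)
{
  struct node *temp1, *temp2;

  if (list
      && (temp1 = list->next)
      && (temp2 = temp1->next))
    return temp2->value;   /* Both links were followed.  */
  return fallback;
}
```

Each assignment is performed only if every test before it was true, so @code{temp1->next} is never computed when @code{temp1} is a null pointer.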
|
||
|
||
@node Conditional Expression
|
||
@section Conditional Expression
|
||
@cindex conditional expression
|
||
@cindex expression, conditional
|
||
|
||
C has a conditional expression that selects one of two expressions
|
||
to compute and get the value from. It looks like this:
|
||
|
||
@example
|
||
@var{condition} ? @var{iftrue} : @var{iffalse}
|
||
@end example
|
||
|
||
@menu
|
||
* Conditional Rules:: Rules for the conditional operator.
|
||
* Conditional Branches:: About the two branches in a conditional.
|
||
@end menu
|
||
|
||
@node Conditional Rules
|
||
@subsection Rules for the Conditional Operator
|
||
|
||
The first operand, @var{condition}, should be a value that can be
|
||
compared with zero---a number or a pointer. If it is true (nonzero),
|
||
then the conditional expression computes @var{iftrue} and its value
|
||
becomes the value of the conditional expression. Otherwise the
|
||
conditional expression computes @var{iffalse} and its value becomes
|
||
the value of the conditional expression. The conditional expression
|
||
always computes just one of @var{iftrue} and @var{iffalse}, never both
|
||
of them.
|
||
|
||
Here's an example: the absolute value of a number @code{x}
|
||
can be written as @code{(x >= 0 ? x : -x)}.
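
To see that only one branch is computed, here is a small sketch that gives each branch a visible side effect. The helper functions and counters are hypothetical, made up for this illustration.

```c
/* Counters recording how many times each hypothetical helper ran.  */
int true_calls = 0;
int false_calls = 0;

int
count_true (int value)
{
  true_calls++;
  return value;
}

int
count_false (int value)
{
  false_calls++;
  return value;
}

/* Absolute value, written so that each branch leaves a trace.  */
int
traced_abs (int x)
{
  return (x >= 0 ? count_true (x) : count_false (-x));
}
```

After calling @code{traced_abs (7)}, only @code{true_calls} has changed; the other branch was never executed.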
|
||
|
||
@strong{Warning:} The conditional expression has rather low
|
||
syntactic precedence. Except when the conditional expression is used
|
||
as an argument in a function call, write parentheses around it. For
|
||
clarity, always write parentheses around it if it extends across more
|
||
than one line.
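
The following sketch shows what can go wrong without the parentheses. The function names are hypothetical; the two functions differ only in the parentheses around the conditional expression.

```c
/* Intended meaning: add 10 or 20 to BASE, depending on FLAG.  */
int
with_parentheses (int base, int flag)
{
  return base + (flag ? 10 : 20);
}

/* Without parentheses, `+' binds more tightly than `?:', so the
   condition becomes `base + flag' and BASE is never added.  */
int
without_parentheses (int base, int flag)
{
  return base + flag ? 10 : 20;
}
```

With @code{base} equal to 1 and @code{flag} equal to 0, the first function returns 21 and the second returns 10. Some compilers warn about the second form.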
|
||
|
||
@strong{Warning:} Assignment operators and the comma operator
|
||
(@pxref{Comma Operator}) have lower precedence than conditional
|
||
expressions, so write parentheses around those when they appear inside
|
||
a conditional expression. @xref{Order of Execution}.
|
||
|
||
@c ??? Are there any other cases where it is fine to omit them?
|
||
@strong{Warning:} When nesting a conditional expression within another
|
||
conditional expression, unless a pair of matching delimiters surrounds
|
||
the inner conditional expression for some other reason, write
|
||
parentheses around it:
|
||
|
||
@example
|
||
((foo > 0 ? test1 : test2) ? (ifodd (foo) ? 5 : 10)
|
||
: (ifodd (whatever) ? 5 : 10));
|
||
@end example
|
||
|
||
@noindent
|
||
In the first operand, those parentheses are necessary to prevent
|
||
incorrect parsing. In the second and third operands, the computer may
|
||
not need the parentheses, but they will help human beings.
|
||
|
||
@node Conditional Branches
|
||
@subsection Conditional Operator Branches
|
||
@cindex branches of conditional expression
|
||
|
||
We call @var{iftrue} and @var{iffalse} the @dfn{branches} of the
|
||
conditional.
|
||
|
||
The two branches should normally have the same type, but a few
|
||
exceptions are allowed. If they are both numeric types, the
|
||
conditional converts both to their common type (@pxref{Common Type}).
|
||
|
||
With pointers (@pxref{Pointers}), the two values can be pointers to
|
||
nearly compatible types (@pxref{Compatible Types}). In this case, the
|
||
result type is a similar pointer whose target type combines all the
|
||
type qualifiers (@pxref{Type Qualifiers}) of both branches.
|
||
|
||
If one branch has type @code{void *} and the other is a pointer to an
|
||
object (not to a function), the conditional converts the latter to
|
||
@code{void *}.
|
||
|
||
If one branch is an integer constant with value zero and the other is
|
||
a pointer, the conditional converts zero to the pointer's type.
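
As a rough illustration of two of these rules, consider the sketch below. The function names are hypothetical.

```c
#include <stddef.h>

/* The branches are int and double, so the result is a double
   whichever branch is chosen.  Dividing by 2 therefore never
   performs integer division.  */
double
half_of_choice (int flag)
{
  return (flag ? 1 : 2.0) / 2;
}

/* One branch is the integer constant zero, so the conditional
   converts it to `int *', giving a null pointer.  */
int *
pointer_or_null (int flag, int *p)
{
  return flag ? p : 0;
}
```

If the first conditional had type @code{int} when @code{flag} is nonzero, @code{half_of_choice (1)} would return 0 rather than 0.5.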
|
||
|
||
In GNU C, you can omit @var{iftrue} in a conditional expression. In
|
||
that case, if @var{condition} is nonzero, its value becomes the value of
|
||
the conditional expression, after conversion to the common type.
|
||
Thus,
|
||
|
||
@example
|
||
x ? : y
|
||
@end example
|
||
|
||
@noindent
|
||
has the value of @code{x} if that is nonzero; otherwise, the value of
|
||
@code{y}.
|
||
|
||
@cindex side effect in ?:
|
||
@cindex ?: side effect
|
||
Omitting @var{iftrue} is useful when @var{condition} has side effects.
|
||
In that case, writing that expression twice would carry out the side
|
||
effects twice, but writing it once does them just once. For example,
|
||
if we suppose that the function @code{next_element} advances a pointer
|
||
variable to point to the next element in a list and returns the new
|
||
pointer,
|
||
|
||
@example
|
||
next_element () ? : default_pointer
|
||
@end example
|
||
|
||
@noindent
|
||
is a way to advance the pointer and use its new value if it isn't
|
||
null, but use @code{default_pointer} if it is null. We cannot do
|
||
it this way,
|
||
|
||
@example
|
||
next_element () ? next_element () : default_pointer
|
||
@end example
|
||
|
||
@noindent
|
||
because that would advance the pointer a second time.
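
Here is a sketch that makes the difference observable. It needs a compiler that accepts this GNU C extension. The array, the counter, and this particular definition of @code{next_element} are assumptions made up for the illustration.

```c
#include <stddef.h>

int elements[] = { 10, 20 };
size_t position = 0;        /* Index of the current element.  */
int advance_count = 0;      /* How many times next_element ran.  */
int fallback = -1;

/* Hypothetical: advance to the next element and return a pointer to
   it, or a null pointer when there is no next element.  */
int *
next_element (void)
{
  advance_count++;
  if (position + 1 < sizeof elements / sizeof elements[0])
    return &elements[++position];
  return NULL;
}

/* GNU C extension: the condition is computed just once, and its
   value is reused as the result when it is not null.  */
int *
next_or_default (void)
{
  return next_element () ? : &fallback;
}
```

Each call to @code{next_or_default} increases @code{advance_count} by exactly one, whichever value it returns.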
|
||
|
||
@node Comma Operator
|
||
@section Comma Operator
|
||
@cindex comma operator
|
||
@cindex operator, comma
|
||
|
||
The comma operator stands for sequential execution of expressions.
|
||
The value of the comma expression comes from the last expression in
|
||
the sequence; the previous expressions are computed only for their
|
||
side effects. It looks like this:
|
||
|
||
@example
|
||
@var{exp1}, @var{exp2} @r{@dots{}}
|
||
@end example
|
||
|
||
@noindent
|
||
You can bundle any number of expressions together this way, by putting
|
||
commas between them.
|
||
|
||
@menu
|
||
* Uses of Comma:: When to use the comma operator.
|
||
* Clean Comma:: Clean use of the comma operator.
|
||
* Avoid Comma:: When to not use the comma operator.
|
||
@end menu
|
||
|
||
@node Uses of Comma
|
||
@subsection The Uses of the Comma Operator
|
||
|
||
With commas, you can put several expressions into a place that allows
|
||
one expression---for example, in the header of a @code{for} statement.
|
||
This statement
|
||
|
||
@example
|
||
for (i = 0, j = 10, k = 20; i < n; i++)
|
||
@end example
|
||
|
||
@noindent
|
||
contains three assignment expressions, to initialize @code{i}, @code{j}
|
||
and @code{k}. The syntax of @code{for} requires just one expression
|
||
for initialization; to include three assignments, we use commas to
|
||
bundle them into a single larger expression, @code{i = 0, j = 10, k =
|
||
20}. This technique is also useful in the loop-advance expression,
|
||
the last of the three inside the @code{for} parentheses.
|
||
|
||
In the @code{for} statement and the @code{while} statement
|
||
(@pxref{Loop Statements}), a comma provides a way to perform some side
|
||
effect before the loop-exit test. For example,
|
||
|
||
@example
|
||
while (printf ("At the test, x = %d\n", x), x != 0)
|
||
@end example
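
As a further sketch, here is a loop that uses commas both in the initialization and in the loop-advance expression. The function @code{reverse_ints} is a hypothetical example, not a library function.

```c
#include <stddef.h>

/* Reverse the N elements of the array A in place.  */
void
reverse_ints (int *a, size_t n)
{
  size_t i, j;

  if (n == 0)
    return;
  /* I moves up from the start while J moves down from the end.  */
  for (i = 0, j = n - 1; i < j; i++, j--)
    {
      int tmp = a[i];
      a[i] = a[j];
      a[j] = tmp;
    }
}
```

Keeping both index updates in the @code{for} header shows at a glance how the two variables move together.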
|
||
|
||
@node Clean Comma
|
||
@subsection Clean Use of the Comma Operator
|
||
|
||
Always write parentheses around a series of comma operators, except
|
||
when it is at top level in an expression statement, or within the
|
||
parentheses of an @code{if}, @code{for}, @code{while}, or @code{switch}
|
||
statement (@pxref{Statements}). For instance, in
|
||
|
||
@example
|
||
for (i = 0, j = 10, k = 20; i < n; i++)
|
||
@end example
|
||
|
||
@noindent
|
||
the commas between the assignments are clear because they are between
|
||
a parenthesis and a semicolon.
|
||
|
||
The arguments in a function call are also separated by commas, but that is
|
||
not an instance of the comma operator. Note the difference between
|
||
|
||
@example
|
||
foo (4, 5, 6)
|
||
@end example
|
||
|
||
@noindent
|
||
which passes three arguments to @code{foo} and
|
||
|
||
@example
|
||
foo ((4, 5, 6))
|
||
@end example
|
||
|
||
@noindent
|
||
which uses the comma operator and passes just one argument
|
||
(with value 6).
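
The sketch below contrasts the two kinds of comma. All the function names are hypothetical; @code{note} exists only to give each expression a side effect that can be counted.

```c
int calls = 0;

/* Hypothetical helper with a side effect.  */
int
note (int value)
{
  calls++;
  return value;
}

int
sum3 (int a, int b, int c)
{
  return a + b + c;
}

int
identity (int a)
{
  return a;
}

/* Commas separate three arguments.  */
int
three_arguments (void)
{
  return sum3 (note (4), note (5), note (6));
}

/* The inner parentheses make these commas operators, so the
   function receives one argument, the value of the last expression.  */
int
one_argument (void)
{
  return identity ((note (4), note (5), note (6)));
}
```

Both functions call @code{note} three times, but @code{one_argument} passes only the value 6.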
|
||
|
||
@strong{Warning:} Don't use the comma operator within an argument
|
||
of a function unless it makes the code more readable. When you do so,
|
||
don't put part of another argument on the same line. Instead, add a
|
||
line break to make the parentheses around the comma operator easier to
|
||
see, like this:
|
||
|
||
@example
|
||
foo ((mumble (x, y), frob (z)),
|
||
*p)
|
||
@end example
|
||
|
||
@node Avoid Comma
|
||
@subsection When Not to Use the Comma Operator
|
||
|
||
You can use a comma in any subexpression, but in most cases it only
|
||
makes the code confusing, and it is clearer to raise all but the last
|
||
of the comma-separated expressions to a higher level. Thus, instead
|
||
of this:
|
||
|
||
@example
|
||
x = (y += 4, 8);
|
||
@end example
|
||
|
||
@noindent
|
||
it is much clearer to write this:
|
||
|
||
@example
|
||
y += 4, x = 8;
|
||
@end example
|
||
|
||
@noindent
|
||
or this:
|
||
|
||
@example
|
||
y += 4;
|
||
x = 8;
|
||
@end example
|
||
|
||
Use commas only in the cases where there is no clearer alternative
|
||
involving multiple statements.
|
||
|
||
By contrast, don't hesitate to use commas in the expansion in a macro
|
||
definition. The trade-offs of code clarity are different in that
|
||
case, because the @emph{use} of the macro may improve overall clarity
|
||
so much that the ugliness of the macro's @emph{definition} is a small
|
||
price to pay. @xref{Macros}.
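
For instance, here is a sketch of a macro whose expansion relies on the comma operator. The macro @code{PUSH} and the function @code{fill_three} are hypothetical names for this illustration.

```c
/* Hypothetical macro: store VALUE in the array BUF at index LEN,
   then advance LEN and yield the new length as the value.  */
#define PUSH(buf, len, value) ((buf)[(len)] = (value), ++(len))

/* Store three values in BUF and return how many were stored.  */
int
fill_three (int *buf)
{
  int len = 0;

  PUSH (buf, len, 7);
  PUSH (buf, len, 8);
  return PUSH (buf, len, 9);   /* The value is the new length.  */
}
```

The comma lets the expansion remain a single expression, so the macro can be used wherever an expression is allowed, including in a @code{return} statement.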
|
||
|
||
@node Binary Operator Grammar
|
||
@chapter Binary Operator Grammar
|
||
@cindex binary operator grammar
|
||
@cindex grammar, binary operator
|
||
@cindex operator precedence
|
||
@cindex precedence, operator
|
||
@cindex left-associative
|
||
|
||
@dfn{Binary operators} are those that take two operands, one
|
||
on the left and one on the right.
|
||
|
||
All the binary operators in C are syntactically left-associative,
except the assignment operators, which associate from right to left.
Left-associative means that @w{@code{a @var{op} b @var{op} c}} means @w{@code{(a
@var{op} b) @var{op} c}}. However, the only operators you should
|
||
repeat in this way without parentheses are @samp{+}, @samp{-},
|
||
@samp{*} and @samp{/}, because those cases are clear from algebra. So
|
||
it is OK to write @code{a + b + c} or @code{a - b - c}, but never
|
||
@code{a == b == c} or @code{a % b % c}. For those operators, use
|
||
explicit parentheses to show how the operations nest.
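
Here is a brief sketch of why the nesting matters. The function names are hypothetical.

```c
/* Left association: this is the same as (a - b) - c.  */
int
subtract_chain (int a, int b, int c)
{
  return a - b - c;
}

/* The two possible nestings of a % b % c, made explicit.  C uses
   the first one when there are no parentheses.  */
int
remainder_left (int a, int b, int c)
{
  return (a % b) % c;
}

int
remainder_right (int a, int b, int c)
{
  return a % (b % c);
}
```

With the operands 100, 7 and 4, the two remainder functions give 2 and 1 respectively, which is why the parentheses are worth writing.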
|
||
|
||
Each C operator has a @dfn{precedence}, which is its rank in the
|
||
grammatical order of the various operators. The operators with the
|
||
highest precedence grab adjoining operands first; these expressions
|
||
then become operands for operators of lower precedence.
|
||
|
||
The precedence order of operators in C is fully specified, so any
|
||
combination of operations leads to a well-defined nesting. We state
|
||
only part of the full precedence ordering here because it is bad
|
||
practice for C code to depend on the other cases. For cases not
|
||
specified in this chapter, always use parentheses to make the nesting
|
||
explicit.@footnote{Personal note from Richard Stallman: I wrote GCC
|
||
without remembering anything about the C precedence order beyond
|
||
what's stated here. I studied the full precedence table to write the
|
||
parser, and promptly forgot it again. If you need to look up the full
|
||
precedence order to understand some C code, add enough parentheses so
|
||
nobody else needs to do that.}
|
||
|
||
Clean code can depend on this subsequence of the precedence ordering
|
||
(stated from highest precedence to lowest):
|
||
|
||
@enumerate
|
||
@item
|
||
Postfix operations: access to a field or alternative (@samp{.} and
|
||
@samp{->}), array subscripting, function calls, and unary postfix
|
||
operators.
|
||
|
||
@item
|
||
Unary prefix operations.
|
||
|
||
@item
|
||
Multiplication, division, and remainder (they have the same precedence).
|
||
|
||
@item
|
||
Addition and subtraction (they have the same precedence).
|
||
|
||
@item
|
||
Comparisons---but watch out!
|
||
|
||
@item
|
||
Logical operations @samp{&&} and @samp{||}---but watch out!
|
||
|
||
@item
|
||
Conditional expression with @samp{?} and @samp{:}.
|
||
|
||
@item
|
||
Assignments.
|
||
|
||
@item
|
||
Sequential execution (the comma operator, @samp{,}).
|
||
@end enumerate
|
||
|
||
Two of the lines in the above list say ``but watch out!'' That means
|
||
that the line covers operations with subtly different precedence. When
|
||
you use two comparison operations together, don't depend on the
|
||
grammar of C to control how they nest. Instead, always use
|
||
parentheses to show their nesting.
|
||
|
||
You can let several @samp{&&} operations associate, or several
|
||
@samp{||} operations, but always use parentheses to show how @samp{&&}
|
||
and @samp{||} nest with each other. @xref{Logical Operators}.
|
||
|
||
There is one other precedence ordering that clean code can depend on:
|
||
|
||
@enumerate
|
||
@item
|
||
Unary postfix operations.
|
||
|
||
@item
|
||
Bitwise and shift operations---but watch out!
|
||
|
||
@item
|
||
Conditional expression with @samp{?} and @samp{:}.
|
||
@end enumerate
|
||
|
||
The caveat for bitwise and shift operations is like that for logical
|
||
operators: you can let multiple uses of one bitwise operation
|
||
associate, but always use parentheses to control nesting of dissimilar
|
||
operations.
|
||
|
||
These lists do not specify any precedence ordering between the bitwise
|
||
and shift operations of the second list and the binary operations
|
||
above conditional expressions in the first list. When they come
|
||
together, parenthesize them. @xref{Bitwise Operations}.
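
The sketch below shows one such combination, a bitwise operation next to a comparison. The function names are hypothetical. In the actual C grammar, @samp{==} has higher precedence than @samp{&}, so omitting the parentheses gives the second meaning, not the first.

```c
/* Intended test: are all the bits selected by MASK clear in X?  */
int
bit_is_clear (unsigned int x, unsigned int mask)
{
  return (x & mask) == 0;
}

/* What `x & mask == 0' means when written without parentheses.  */
int
surprising_parse (unsigned int x, unsigned int mask)
{
  return x & (mask == 0);
}
```

For @code{x} equal to 4 and @code{mask} equal to 1, the first function returns 1 and the second returns 0.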
|
||
|
||
@node Order of Execution
|
||
@chapter Order of Execution
|
||
@cindex order of execution
|
||
|
||
The order of execution of a C program is not always obvious, and not
|
||
necessarily predictable. This chapter describes what you can count on.
|
||
|
||
@menu
|
||
* Reordering of Operands:: Operations in C are not necessarily computed
|
||
in the order they are written.
|
||
* Associativity and Ordering:: Some associative operations are performed
|
||
in a particular order; others are not.
|
||
* Sequence Points:: Some guarantees about the order of operations.
|
||
* Postincrement and Ordering:: Ambiguous execution order with postincrement.
|
||
* Ordering of Operands:: Evaluation order of operands
|
||
and function arguments.
|
||
* Optimization and Ordering:: Compiler optimizations can reorder operations
|
||
only if it has no impact on program results.
|
||
@end menu
|
||
|
||
@node Reordering of Operands
|
||
@section Reordering of Operands
|
||
@cindex ordering of operands
|
||
@cindex reordering of operands
|
||
@cindex operand execution ordering
|
||
|
||
The C language does not necessarily carry out operations within an
|
||
expression in the order they appear in the code. For instance, in
|
||
this expression,
|
||
|
||
@example
|
||
foo () + bar ()
|
||
@end example
|
||
|
||
@noindent
|
||
@code{foo} might be called first or @code{bar} might be called first.
|
||
If @code{foo} updates a datum and @code{bar} uses that datum, the
|
||
results can be unpredictable.
|
||
|
||
The unpredictable order of computation of subexpressions also makes a
|
||
difference when one of them contains an assignment. We already saw
|
||
this example of bad code,
|
||
|
||
@example
|
||
x = 20;
|
||
printf ("%d %d\n", x, x = 4);
|
||
@end example
|
||
|
||
@noindent
|
||
in which the second argument, @code{x}, has a different value
|
||
depending on whether it is computed before or after the assignment in
|
||
the third argument.
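
When the order matters, one way to fix it is to compute the operands in separate statements. The following is a sketch under that approach; the variable @code{datum} and these definitions of @code{foo} and @code{bar} are assumptions made up for the illustration.

```c
int datum = 1;

/* Hypothetical: updates DATUM and returns the new value.  */
int
foo (void)
{
  datum *= 10;
  return datum;
}

/* Hypothetical: uses DATUM.  */
int
bar (void)
{
  return datum + 1;
}

/* Separate statements force foo to run before bar.  */
int
foo_then_bar (void)
{
  int first = foo ();
  int second = bar ();

  return first + second;
}
```

Written this way, the result does not depend on which operand of @samp{+} the compiler chooses to compute first.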
|
||
|
||
@node Associativity and Ordering
|
||
@section Associativity and Ordering
|
||
@cindex associativity and ordering
|
||
|
||
@c ??? What to say about signed overflow and associativity.
|
||
|
||
The bitwise binary operators, @code{&}, @code{|} and @code{^}, are
|
||
associative. The arithmetic binary operators @code{+} and @code{*}
|
||
are associative if the operand type is unsigned. An associative
|
||
binary operator, when used repeatedly, can combine any number of
|
||
operands. The operands' values may be computed in any order, and
|
||
since the operation is associative, they can be combined in any order
|
||
too.
|
||
|
||
Thus, given four functions that return @code{unsigned int}, calling
|
||
them and adding their results as here
|
||
|
||
@example
|
||
(foo () + bar ()) + (baz () + quux ())
|
||
@end example
|
||
|
||
@noindent
|
||
may add up the results in any order.
|
||
|
||
By contrast, arithmetic on signed integers is not always associative
|
||
because there is the possibility of overflow (@pxref{Integer
|
||
Overflow}). Thus, the additions must be done in the order specified,
|
||
obeying parentheses (or left-association in the absence of
|
||
parentheses). That means computing @code{(foo () + bar ())} and
|
||
@code{(baz () + quux ())} first (in either order), then adding the
|
||
two.
|
||
|
||
@c ??? Does use of -fwrapv make signed addition count as associative?
|
||
|
||
The same applies to arithmetic on floating-point values, since that
|
||
too is not really associative. However, the GCC option
|
||
@option{-funsafe-math-optimizations} allows the compiler to change the
|
||
order of calculation when an associative operation (associative in
|
||
exact mathematics) combines several operands. The option takes effect
|
||
when compiling a module (@pxref{Compilation}). Changing the order
|
||
of association can enable GCC to optimize the floating-point
|
||
operations better.
|
||
|
||
In all these examples, the four function calls can be done in any
|
||
order. There is no right or wrong about that.
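
To make the contrast concrete, here is a sketch that compares the two groupings of a three-operand sum. The function names are hypothetical, and the @code{volatile} qualifiers are there only to make sure each sum is stored as a @code{double} before the comparison.

```c
/* Unsigned addition wraps around, so the grouping cannot matter.  */
int
unsigned_grouping_agrees (unsigned int a, unsigned int b, unsigned int c)
{
  return (a + b) + c == a + (b + c);
}

/* Floating-point addition rounds each intermediate result, so the
   grouping can matter.  */
int
double_grouping_agrees (double a, double b, double c)
{
  volatile double left = (a + b) + c;
  volatile double right = a + (b + c);

  return left == right;
}
```

With the operands 1e20, @minus{}1e20 and 3.14, the first grouping yields 3.14 while the second yields zero, because 3.14 is lost when it is added to a number as large as 1e20.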
|
||
|
||
@node Sequence Points
|
||
@section Sequence Points
|
||
@cindex sequence points
|
||
@cindex full expression
|
||
|
||
There are some points in the code where C makes limited guarantees
|
||
about the order of operations. These are called @dfn{sequence
|
||
points}. Here is where they occur:
|
||
|
||
@itemize @bullet
|
||
@item
|
||
At the end of a @dfn{full expression}; that is to say, an expression
|
||
that is not part of a larger expression. All side effects specified
|
||
by that expression are carried out before execution moves
|
||
on to subsequent code.
|
||
|
||
@item
|
||
At the end of the first operand of certain operators: @samp{,},
|
||
@samp{&&}, @samp{||}, and @samp{?:}. All side effects specified by
|
||
that expression are carried out before any execution of the
|
||
next operand.
|
||
|
||
The commas that separate arguments in a function call are @emph{not}
|
||
comma operators, and they do not create sequence points. The
|
||
sequence-point rule for function arguments and the rule for operands
|
||
(@pxref{Ordering of Operands}) are different.
|
||
|
||
@item
|
||
Just before calling a function. All side effects specified by the
|
||
argument expressions are carried out before calling the function.
|
||
|
||
If the function to be called is not constant---that is, if it is
|
||
computed by an expression---all side effects in that expression are
|
||
carried out before calling the function.
|
||
@end itemize
|
||
|
||
The ordering imposed by a sequence point applies locally to a limited
|
||
range of code, as stated above in each case. For instance, the
|
||
ordering imposed by the comma operator does not apply to code outside
|
||
the operands of that comma operator. Thus, in this code,
|
||
|
||
@example
|
||
(x = 5, foo (x)) + x * x
|
||
@end example
|
||
|
||
@noindent
|
||
the sequence point of the comma operator orders @code{x = 5} before
|
||
@code{foo (x)}, but @code{x * x} could be computed before or after
|
||
them.
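
Here is a sketch of two common ways to rely on these sequence points. The type @code{struct item} and the function names are hypothetical.

```c
#include <stddef.h>

struct item
{
  int value;
};

/* The sequence point after the first operand of `&&' guarantees
   that P is tested before it is dereferenced.  */
int
is_positive_item (const struct item *p)
{
  return p != NULL && p->value > 0;
}

/* The comma operator orders the assignment before the use.  */
int
assign_then_double (int *x)
{
  return (*x = 5, *x * 2);
}
```

In both functions, everything that depends on the order of execution lies within the operands of a single operator that has a sequence point.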
|
||
|
||
@node Postincrement and Ordering
|
||
@section Postincrement and Ordering
|
||
@cindex postincrement and ordering
|
||
@cindex ordering and postincrement
|
||
|
||
The ordering requirements for the postincrement and postdecrement
|
||
operations (@pxref{Postincrement/Postdecrement}) are loose: those side
|
||
effects must happen ``a little later,'' before the next sequence
|
||
point. That still leaves room for various orders that give different
|
||
results. In this expression,
|
||
|
||
@example
|
||
z = x++ - foo ()
|
||
@end example
|
||
|
||
@noindent
|
||
it's unpredictable whether @code{x} gets incremented before or after
|
||
calling the function @code{foo}. If @code{foo} refers to @code{x},
|
||
it might see the old value or it might see the incremented value.
|
||
|
||
In this perverse expression,
|
||
|
||
@example
|
||
x = x++
|
||
@end example
|
||
|
||
@noindent
|
||
@code{x} will certainly be incremented but the incremented value may
|
||
be replaced with the old value. That's because the incrementation and
|
||
the assignment may occur in either order. If the incrementation of
|
||
@code{x} occurs after the assignment to @code{x}, the incremented
|
||
value will remain in place. But if the incrementation happens first,
|
||
the assignment will put the not-yet-incremented value back into
|
||
@code{x}, so the expression as a whole will leave @code{x} unchanged.
|
||
|
||
The conclusion: @strong{avoid such expressions}. Take care, when you
|
||
use postincrement and postdecrement, that the specific expression you
|
||
use is not ambiguous as to order of execution.
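
One way to remove the ambiguity in the first example is to do the increment in a statement of its own. The sketch below shows both possible intentions; the global variable and this definition of @code{foo} are assumptions made up for the illustration.

```c
int x = 5;

/* Hypothetical function that refers to the global X.  */
int
foo (void)
{
  return x;
}

/* foo sees the old value of x; x is incremented afterward.  */
int
call_then_increment (void)
{
  int result = x - foo ();

  x++;
  return result;
}

/* x is incremented first, so foo sees the new value.  */
int
increment_then_call (void)
{
  int old = x;

  x++;
  return old - foo ();
}
```

Each function says explicitly which value @code{foo} is meant to see, so the result no longer depends on the compiler's choice.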
|
||
|
||
@node Ordering of Operands
|
||
@section Ordering of Operands
|
||
@cindex ordering of operands
|
||
@cindex operand ordering
|
||
|
||
Operands and arguments can be computed in any order, but there are limits to
|
||
this intermixing in GNU C:
|
||
|
||
@itemize @bullet
|
||
@item
|
||
The operands of a binary arithmetic operator can be computed in either
|
||
order, but they can't be intermixed: one of them has to come first,
|
||
followed by the other. Any side effects in the operand that's computed
|
||
first are executed before the other operand is computed.
|
||
|
||
@item
|
||
That applies to assignment operators too, except that, in simple assignment,
|
||
the previous value of the left operand is unused.
|
||
|
||
@item
|
||
The arguments in a function call can be computed in any order, but
|
||
they can't be intermixed. Thus, one argument is fully computed, then
|
||
another, and so on until they have all been done. Any side effects in
|
||
one argument are executed before computation of another argument
|
||
begins.
|
||
@end itemize
|
||
|
||
These rules don't cover side effects caused by postincrement and
|
||
postdecrement operators---those can be deferred up to the next
|
||
sequence point.
|
||
|
||
If you want to get pedantic, the fact is that GCC can reorder the
|
||
computations in many other ways provided that it doesn't alter the result
|
||
of running the program. However, because such reordering doesn't alter
the result of running the program, you can ignore it, unless you are concerned
|
||
with the values in certain variables at various times as seen by other
|
||
processes. In those cases, you should use @code{volatile} to prevent
|
||
optimizations that would make them behave strangely. @xref{volatile}.
|
||
|
||
@node Optimization and Ordering
|
||
@section Optimization and Ordering
|
||
@cindex optimization and ordering
|
||
@cindex ordering and optimization
|
||
|
||
Sequence points limit the compiler's freedom to reorder operations
|
||
arbitrarily, but optimizations can still reorder them if the compiler
|
||
concludes that this won't alter the results. Thus, in this code,
|
||
|
||
@example
|
||
x++;
|
||
y = z;
|
||
x++;
|
||
@end example
|
||
|
||
@noindent
|
||
there is a sequence point after each statement, so the code is
|
||
supposed to increment @code{x} once before the assignment to @code{y}
|
||
and once after. However, incrementing @code{x} has no effect on
|
||
@code{y} or @code{z}, and setting @code{y} can't affect @code{x}, so
|
||
the code could be optimized into this:
|
||
|
||
@example
|
||
y = z;
|
||
x += 2;
|
||
@end example
|
||
|
||
Normally that has no effect except to make the program faster. But
|
||
there are special situations where it can cause trouble due to things
|
||
that the compiler cannot know about, such as shared memory. To limit
|
||
optimization in those places, use the @code{volatile} type qualifier
|
||
(@pxref{volatile}).
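
Shared memory is hard to show in a few lines, so the sketch below uses a signal handler instead, as a smaller stand-in for a variable that changes outside the flow of control the compiler can see. That substitution is an assumption of this example. It uses the standard library functions @code{signal} and @code{raise}; the other names are hypothetical.

```c
#include <signal.h>

/* Written by the handler, outside the normal flow of control that
   the compiler can see.  */
volatile sig_atomic_t interrupted = 0;

void
handle_interrupt (int signum)
{
  (void) signum;                /* Unused.  */
  interrupted = 1;
}

/* Install the handler, deliver the signal, and report the flag.  */
int
raise_and_check (void)
{
  signal (SIGINT, handle_interrupt);
  raise (SIGINT);
  return interrupted;
}
```

Because @code{interrupted} is @code{volatile}, the compiler must actually read it in @code{raise_and_check} rather than assume it still holds zero.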
|
||
|
||
@node Primitive Types
|
||
@chapter Primitive Data Types
|
||
@cindex primitive types
|
||
@cindex types, primitive
|
||
|
||
This chapter describes all the primitive data types of C---that is,
|
||
all the data types that aren't built up from other types. They
|
||
include the types @code{int} and @code{double} that we've already covered.
|
||
|
||
@menu
|
||
* Integer Types:: Description of integer types.
|
||
* Floating-Point Data Types:: Description of floating-point types.
|
||
* Complex Data Types:: Description of complex number types.
|
||
* The Void Type:: A type indicating no value at all.
|
||
* Other Data Types:: A brief summary of other types.
|
||
* Type Designators:: Referring to a data type abstractly.
|
||
@end menu
|
||
|
||
These types are all made up of bytes (@pxref{Storage}).
|
||
|
||
@node Integer Types
|
||
@section Integer Data Types
|
||
@cindex integer types
|
||
@cindex types, integer
|
||
|
||
Here we describe all the integer types and their basic
|
||
characteristics. @xref{Integers in Depth}, for more information about
|
||
the bit-level integer data representations and arithmetic.
|
||
|
||
@menu
|
||
* Basic Integers:: Overview of the various kinds of integers.
|
||
* Signed and Unsigned Types:: Integers can either hold both negative and
|
||
non-negative values, or only non-negative.
|
||
* Narrow Integers:: When to use smaller integer types.
|
||
* Integer Conversion:: Casting a value from one integer type
|
||
to another.
|
||
* Boolean Type:: An integer type for boolean values.
|
||
* Integer Variations:: Sizes of integer types can vary
|
||
across platforms.
|
||
@end menu
|
||
|
||
@node Basic Integers
|
||
@subsection Basic Integers
|
||
|
||
@findex char
|
||
@findex int
|
||
@findex short int
|
||
@findex long int
|
||
@findex long long int
|
||
|
||
Integer data types in C can be signed or unsigned. An unsigned type
|
||
can represent only positive numbers and zero. A signed type can
|
||
represent both positive and negative numbers, in a range spread almost
|
||
equally on both sides of zero.
|
||
|
||
Aside from signedness, the integer data types vary in size: how many
|
||
bytes long they are. The size determines the range of integer values
|
||
the type can hold.
|
||
|
||
Here's a list of the signed integer data types, with the sizes they
|
||
have on most computers. Each has a corresponding unsigned type; see
|
||
@ref{Signed and Unsigned Types}.
|
||
|
||
@table @code
|
||
@item signed char
|
||
One byte (8 bits). This integer type is used mainly for integers that
|
||
represent characters, usually as elements of arrays or fields of other
|
||
data structures.
|
||
|
||
@item short
|
||
@itemx short int
|
||
Two bytes (16 bits).
|
||
|
||
@item int
|
||
Four bytes (32 bits).
|
||
|
||
@item long
|
||
@itemx long int
|
||
Four bytes (32 bits) or eight bytes (64 bits), depending on the
|
||
platform. Typically it is 32 bits on 32-bit computers
|
||
and 64 bits on 64-bit computers, but there are exceptions.
|
||
|
||
@item long long
|
||
@itemx long long int
|
||
Eight bytes (64 bits). Supported in GNU C in the 1980s, and
|
||
incorporated into standard C as of ISO C99.
|
||
@end table
|
||
|
||
You can omit @code{int} when you use @code{long} or @code{short}.
|
||
This is harmless and customary.
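
To check the sizes on a particular platform, a program can use @code{sizeof} together with the constants in @file{limits.h}. Here is a minimal sketch; the macro @code{BITS_OF} and the function @code{ranges_are_ordered} are hypothetical names.

```c
#include <limits.h>
#include <stddef.h>

/* Number of bits of storage that TYPE occupies on this platform.  */
#define BITS_OF(type) (sizeof (type) * CHAR_BIT)

/* Return nonzero if each type in the list can hold at least the
   maximum value of the type before it.  */
int
ranges_are_ordered (void)
{
  return (SCHAR_MAX <= SHRT_MAX
          && SHRT_MAX <= INT_MAX
          && INT_MAX <= LONG_MAX
          && LONG_MAX <= LLONG_MAX);
}
```

For example, @code{BITS_OF (long)} is 32 on some platforms and 64 on others, while @code{ranges_are_ordered} returns 1 on all of them.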
|
||
|
||
@node Signed and Unsigned Types
|
||
@subsection Signed and Unsigned Types
|
||
@cindex signed types
|
||
@cindex unsigned types
|
||
@cindex types, signed
|
||
@cindex types, unsigned
|
||
@findex signed
|
||
@findex unsigned
|
||
|
||
An unsigned integer type can represent only positive numbers and zero.
|
||
A signed type can represent both positive and negative numbers, in a
|
||
range spread almost equally on both sides of zero. For instance,
|
||
@code{unsigned char} holds numbers from 0 to 255 (on most computers),
|
||
while @code{signed char} holds numbers from @minus{}128 to 127. Each of
|
||
these types holds 256 different possible values, since they are both 8
|
||
bits wide.
|
||
|
||
Write @code{signed} or @code{unsigned} before the type keyword to
|
||
specify a signed or an unsigned type. However, the integer types
|
||
other than @code{char} are signed by default; with them, @code{signed}
|
||
is a no-op.
|
||
|
||
Plain @code{char} may be signed or unsigned; this depends on the
|
||
compiler, the machine in use, and its operating system. It is not
|
||
@emph{the same type} as either @code{signed char} or @code{unsigned
|
||
char}, but it is always equivalent to one of those two.
|
||
|
||
In many programs, it makes no difference whether the type @code{char}
|
||
is signed. When signedness does matter for a certain value, don't
|
||
leave it to chance; declare it as @code{signed char} or @code{unsigned
|
||
char} instead.@footnote{Personal note from Richard Stallman: Eating
|
||
with hackers at a fish restaurant, I ordered arctic char. When my
|
||
meal arrived, I noted that the chef had not signed it. So I told
|
||
other hackers, ``This char is unsigned---I wanted a signed char!''}
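
A program can find out which way plain @code{char} behaves by examining @code{CHAR_MIN} from @file{limits.h}. The sketch below does that; the function names are hypothetical.

```c
#include <limits.h>

/* Nonzero if plain `char' behaves like `signed char' here.  */
int
plain_char_is_signed (void)
{
  return CHAR_MIN < 0;
}

/* With an explicitly unsigned type, the value 0xFF is always 255,
   whatever the signedness of plain `char'.  */
int
high_byte_as_unsigned (void)
{
  unsigned char c = 0xFF;

  return c;
}
```

Code that declares the variable @code{unsigned char}, as the second function does, gives the same answer on every platform.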
|
||
|
||
@node Narrow Integers
|
||
@subsection Narrow Integers
|
||
|
||
The types that are narrower than @code{int} are rarely used for
|
||
ordinary variables---we declare them @code{int} instead. This is
|
||
because C converts those narrower types to @code{int} for any
|
||
arithmetic. There is literally no reason to declare a local variable
|
||
@code{char}, for instance.
|
||
|
||
In particular, if the value is really a character, you should declare
|
||
the variable @code{int}. Not @code{char}! Using that narrow type can
|
||
force the compiled code to truncate values to @code{char} before
|
||
conversion, which is a waste. Furthermore, some functions return
|
||
either a character value or @minus{}1 for ``no character.'' Using
|
||
type @code{int} makes it possible to distinguish @minus{}1 from any
|
||
character, by sign.
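
Here is a sketch of that convention. The reader function @code{next_char} is hypothetical; it takes characters from a string instead of a file so that the example is self-contained, and it returns @minus{}1 at the end.

```c
/* Hypothetical reader: return the next character of *P as a
   nonnegative value and advance *P, or return -1 at the end of
   the string.  */
int
next_char (const char **p)
{
  if (**p == '\0')
    return -1;
  return (unsigned char) *(*p)++;
}

/* Count the characters whose code is 128 or more.  Because C is an
   int, a character with code 255 is not confused with -1.  */
int
count_high_chars (const char *s)
{
  int c;
  int count = 0;

  while ((c = next_char (&s)) != -1)
    if (c >= 128)
      count++;
  return count;
}
```

If @code{c} were declared @code{char} on a platform where that type is signed, the character with code 255 would compare equal to @minus{}1 and stop the loop early.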
|
||
|
||
The narrow integer types are useful as parts of other objects, such as
|
||
arrays and structures. Compare these array declarations, whose sizes
|
||
on 32-bit processors are shown:
|
||
|
||
@example
|
||
signed char ac[1000]; /* @r{1000 bytes} */
|
||
short as[1000]; /* @r{2000 bytes} */
|
||
int ai[1000]; /* @r{4000 bytes} */
|
||
long long all[1000]; /* @r{8000 bytes} */
|
||
@end example
|
||
|
||
In addition, character strings must be made up of @code{char}s,
|
||
because that's what all the standard library string functions expect.
|
||
Thus, array @code{ac} could be used as a character string, but the
|
||
others could not be.
|
||
|
||
@node Integer Conversion
|
||
@subsection Conversion among Integer Types
|
||
|
||
C converts between integer types implicitly in many situations. It
|
||
converts the narrow integer types, @code{char} and @code{short}, to
|
||
@code{int} whenever they are used in arithmetic. Assigning a new
|
||
value to an integer variable (or other lvalue) converts the value to
|
||
the variable's type.
|
||
|
||
You can also convert one integer type to another explicitly with a
|
||
@dfn{cast} operator. @xref{Explicit Type Conversion}.
|
||
|
||
The process of conversion to a wider type is straightforward: the
|
||
value is unchanged. The only exception is when converting a negative
|
||
value (in a signed type, obviously) to a wider unsigned type. In that
|
||
case, the result is a large positive value: the original bits
(@pxref{Integers in Depth}), extended on the left with ones rather than zeros.
|
||
|
||
@cindex truncation
|
||
Converting to a narrower type, also called @dfn{truncation}, involves
|
||
discarding some of the value's bits. This is not considered overflow
|
||
(@pxref{Integer Overflow}) because loss of significant bits is a
|
||
normal consequence of truncation. Likewise for conversion between
|
||
signed and unsigned types of the same width.
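
The following sketch shows a widening conversion of a negative value and a truncation. The function names are hypothetical, and the values in the final remark assume an 8-bit @code{char}.

```c
/* Widening a negative value to an unsigned type: the result is the
   value plus one more than the maximum of the unsigned type.  */
unsigned int
widen_to_unsigned (signed char value)
{
  return value;
}

/* Truncation keeps only the low-order bits.  */
unsigned char
truncate_to_byte (unsigned int value)
{
  return (unsigned char) value;
}
```

Thus @code{widen_to_unsigned (-1)} gives the largest @code{unsigned int} value, all of whose bits are ones, and @code{truncate_to_byte (0x1234)} gives @code{0x34}.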
|
||
|
||
More information about conversion for assignment is in
|
||
@ref{Assignment Type Conversions}. For conversion for arithmetic,
|
||
see @ref{Argument Promotions}.
|
||
|
||
@node Boolean Type
|
||
@subsection Boolean Type
|
||
@cindex boolean type
|
||
@cindex type, boolean
|
||
@findex bool
|
||
|
||
The unsigned integer type @code{bool} holds truth values: its possible
|
||
values are 0 and 1. Converting any nonzero value to @code{bool}
|
||
results in 1. For example:
|
||
|
||
@example
|
||
bool a = 0;
|
||
bool b = 1;
|
||
bool c = 4; /* @r{Stores the value 1 in @code{c}.} */
|
||
@end example
|
||
|
||
Unlike @code{int}, @code{bool} is not a keyword in versions of standard C
before C23. It is defined in
|
||
the header file @file{stdbool.h}.
|
||
|
||
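Conversion to @code{bool} tests whether the value is zero, rather than
truncating it.  In this sketch (the variable names are illustrative),
the fraction 0.25 becomes 1 as a @code{bool} but 0 as an @code{int}:

@example
#include <stdbool.h>
#include <stdio.h>

int
main (void)
@{
  double quarter = 0.25;
  bool b = quarter;   /* @r{Nonzero, so @code{b} is 1.} */
  int i = quarter;    /* @r{Fraction discarded, so @code{i} is 0.} */

  printf ("%d %d\n", b, i);   /* @r{Prints 1 0.} */
  return 0;
@}
@end example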
@node Integer Variations
|
||
@subsection Integer Variations
|
||
|
||
The integer types of C have standard @emph{names}, but what they
|
||
@emph{mean} varies depending on the kind of platform in use:
|
||
which kind of computer, which operating system, and which compiler.
|
||
It may even depend on the compiler options used.
|
||
|
||
Plain @code{char} may be signed or unsigned; this depends on the
|
||
platform, too. Even for GNU C, there is no general rule.
|
||
|
||
In theory, all of the integer types' sizes can vary. @code{char} is
|
||
always considered one ``byte'' for C, but it is not necessarily an
|
||
8-bit byte; on some platforms it may be more than 8 bits. @code{short
|
||
int} and @code{int} are at least two bytes long (they may be longer).
|
||
@code{long int} is at least four bytes long, and @code{long long int}
|
||
at least eight bytes long.
|
||
|
||
It is possible that in the future GNU C will support platforms where
|
||
@code{int} is 64 bits long. In practice, however, on today's real
|
||
computers, there is little variation; you can rely on the table
|
||
given previously (@pxref{Basic Integers}).
|
||
|
||
To be completely sure of the size of an integer type,
|
||
use the types @code{int16_t}, @code{int32_t} and @code{int64_t}.
|
||
Their corresponding unsigned types add @samp{u} at the front:
|
||
@code{uint16_t}, @code{uint32_t} and @code{uint64_t}.
|
||
To define all these types, include the header file @file{stdint.h}.
|
||
|
||
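A minimal sketch of these types in use follows; the variable names are
hypothetical.  The @code{PRId32} and @code{PRIu16} macros come from the
standard header @file{inttypes.h}, which also includes
@file{stdint.h}; they expand to the right @code{printf} conversion for
each type.

@example
#include <inttypes.h>
#include <limits.h>
#include <stdio.h>

int
main (void)
@{
  int32_t count = -123456;
  uint16_t port = 8080;

  /* @r{Exact-width types have exactly the advertised number of bits.} */
  printf ("%zu %zu\n", sizeof count * CHAR_BIT, sizeof port * CHAR_BIT);
  printf ("%" PRId32 " %" PRIu16 "\n", count, port);
  return 0;
@}
@end example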
The GNU C Compiler can compile for some embedded controllers that use two
|
||
bytes for @code{int}. On some, @code{int} is just one ``byte,'' and
|
||
so is @code{short int}---but that ``byte'' may contain 16 bits or even
|
||
32 bits. These processors can't support an ordinary operating system
|
||
(they may have their own specialized operating systems), and most C
|
||
programs do not try to support them.
|
||
|
||
@node Floating-Point Data Types
|
||
@section Floating-Point Data Types
|
||
@cindex floating-point types
|
||
@cindex types, floating-point
|
||
@findex double
|
||
@findex float
|
||
@findex long double
|
||
|
||
@dfn{Floating point} is the binary analogue of scientific notation:
|
||
internally it represents a number as a fraction and a binary exponent;
|
||
the value is that fraction multiplied by the specified power of 2.
|
||
(The C standard nominally permits other bases, but in GNU C the base
|
||
is always 2.)
|
||
@c ???
|
||
|
||
For instance, to represent 6, the fraction would be 0.75 and the
|
||
exponent would be 3; together they stand for the value @math{0.75 * 2@sup{3}},
|
||
meaning 0.75 * 8. The value 1.5 would use 0.75 as the fraction and 1
|
||
as the exponent. The value 0.75 would use 0.75 as the fraction and 0
|
||
as the exponent. The value 0.375 would use 0.75 as the fraction and
|
||
@minus{}1 as the exponent.
|
||
|
||
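The standard library function @code{frexp}, declared in @file{math.h},
performs this decomposition, so we can use it to check the figures
above.  This is only a sketch; depending on the system, you may need
to link with @option{-lm}.  Each line of output should show the
fraction 0.75, with exponents 3, 1, 0 and @minus{}1 in turn.

@example
#include <math.h>
#include <stdio.h>

int
main (void)
@{
  double values[] = @{ 6.0, 1.5, 0.75, 0.375 @};

  for (int i = 0; i < 4; i++)
    @{
      int exponent;
      /* @r{@code{frexp} returns a fraction in the range [0.5, 1).} */
      double fraction = frexp (values[i], &exponent);
      printf ("%g = %g * 2^%d\n", values[i], fraction, exponent);
    @}
  return 0;
@}
@end example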
These binary exponents are used by machine instructions. You can
|
||
write a floating-point constant this way if you wish, using
|
||
hexadecimal; but normally we write floating-point numbers in decimal (base 10).
|
||
@xref{Floating Constants}.
|
||
|
||
C has three floating-point data types:
|
||
|
||
@table @code
|
||
@item double
|
||
``Double-precision'' floating point, which uses 64 bits. This is the
|
||
normal floating-point type, and modern computers normally do
|
||
their floating-point computations in this type, or some wider type.
|
||
Except when there is a special reason to do otherwise, this is the
|
||
type to use for floating-point values.
|
||
|
||
@item float
|
||
``Single-precision'' floating point, which uses 32 bits. It is useful
|
||
for floating-point values stored in structures and arrays, to save
|
||
space when the full precision of @code{double} is not needed. In
|
||
addition, single-precision arithmetic is faster on some computers, and
|
||
occasionally that is useful. But not often---most programs don't use
|
||
the type @code{float}.
|
||
|
||
C would be cleaner if @code{float} were the name of the type we
|
||
use for most floating-point values; however, for historical reasons,
|
||
that's not so.
|
||
|
||
@item long double
|
||
``Extended-precision'' floating point is either 80-bit or 128-bit
|
||
precision, depending on the machine in use. On some machines, which
|
||
have no floating-point format wider than @code{double}, this is
|
||
equivalent to @code{double}.
|
||
@end table
|
||
|
||
Floating-point arithmetic raises many subtle issues. @xref{Floating
|
||
Point in Depth}, for more information.
|
||
|
||
@node Complex Data Types
|
||
@section Complex Data Types
|
||
@cindex complex numbers
|
||
@cindex types, complex
|
||
@cindex @code{_Complex} keyword
|
||
@cindex @code{__complex__} keyword
|
||
@findex _Complex
|
||
@findex __complex__
|
||
|
||
Complex numbers can include both a real part and an imaginary part.
|
||
The numeric constants covered above have real-numbered values. An
|
||
imaginary-valued constant is an ordinary real-valued constant followed
|
||
by @samp{i}.
|
||
|
||
To declare numeric variables as complex, use the @code{_Complex}
|
||
keyword.@footnote{For compatibility with older versions of GNU C, the
|
||
keyword @code{__complex__} is also allowed. Going forward, however,
|
||
use the @code{_Complex} keyword, which has been standard since ISO C99.}  The
|
||
standard C complex data types are floating point,
|
||
|
||
@example
|
||
_Complex float foo;
|
||
_Complex double bar;
|
||
_Complex long double quux;
|
||
@end example
|
||
|
||
@noindent
|
||
but GNU C supports integer complex types as well.
|
||
|
||
Since @code{_Complex} is a keyword just like @code{float} and
|
||
@code{double} and @code{long}, the keywords can appear in any order,
|
||
but the order shown above seems most logical.
|
||
|
||
GNU C supports constants for complex values; for instance, @code{4.0 +
|
||
3.0i} has the value 4 + 3i as type @code{_Complex double}.
|
||
@samp{j} is equivalent to @samp{i}, as a numeric suffix.
|
||
@xref{Imaginary Constants}.
|
||
|
||
To pull the real and imaginary parts of the number back out, GNU C
|
||
provides the keywords @code{__real__} and @code{__imag__}:
|
||
|
||
@example
|
||
_Complex double foo = 4.0 + 3.0i;
|
||
|
||
double a = __real__ foo; /* @r{@code{a} is now 4.0.} */
|
||
double b = __imag__ foo; /* @r{@code{b} is now 3.0.} */
|
||
@end example
|
||
|
||
@noindent
|
||
Standard C does not include these keywords, and instead relies on
|
||
functions defined in @file{complex.h} for accessing the real and
|
||
imaginary parts of a complex number: @code{crealf}, @code{creal}, and
|
||
@code{creall} extract the real part of a float, double, or long double
|
||
complex number, respectively; @code{cimagf}, @code{cimag}, and
|
||
@code{cimagl} extract the imaginary part.
|
||
|
||
@cindex complex conjugation
|
||
GNU C also defines @samp{~} as an operator for complex conjugation,
|
||
which means negating the imaginary part of a complex number:
|
||
|
||
@example
|
||
_Complex double foo = 4.0 + 3.0i;
|
||
_Complex double bar = ~foo; /* @r{@code{bar} is now 4.0 @minus{} 3.0i.} */
|
||
@end example
|
||
|
||
@noindent
|
||
For standard C compatibility, you can use the appropriate library
|
||
function: @code{conjf}, @code{conj}, or @code{conjl}.
|
||
|
||
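As a sketch of the standard C approach, the following program builds
the same value with the macro @code{I} from @file{complex.h} instead of
the GNU C suffix, then takes it apart with the library functions.  The
variable names are illustrative.

@example
#include <complex.h>
#include <stdio.h>

int
main (void)
@{
  _Complex double foo = 4.0 + 3.0 * I;
  _Complex double bar = conj (foo);

  printf ("%g %g\n", creal (foo), cimag (foo));   /* @r{Prints 4 3.} */
  printf ("%g %g\n", creal (bar), cimag (bar));   /* @r{Prints 4 -3.} */
  return 0;
@}
@end example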
@node The Void Type
|
||
@section The Void Type
|
||
@cindex void type
|
||
@cindex type, void
|
||
@findex void
|
||
|
||
The data type @code{void} is a dummy---it allows no operations. It
|
||
really means ``no value at all.'' When a function is meant to return
|
||
no value, we write @code{void} for its return type. Then
|
||
@code{return} statements in that function should not specify a value
|
||
(@pxref{return Statement}). Here's an example:
|
||
|
||
@example
|
||
void
|
||
print_if_positive (double x, double y)
|
||
@{
|
||
if (x <= 0)
|
||
return;
|
||
if (y <= 0)
|
||
return;
|
||
printf ("Next point is (%f,%f)\n", x, y);
|
||
@}
|
||
@end example
|
||
|
||
A @code{void}-returning function is comparable to what some other
|
||
languages (for instance, Fortran and Pascal) call a ``procedure''
|
||
instead of a ``function.''
|
||
|
||
@c ??? Already presented
|
||
@c @samp{%f} in an output template specifies to format a @code{double} value
|
||
@c as a decimal number, using a decimal point if needed.
|
||
|
||
@node Other Data Types
|
||
@section Other Data Types
|
||
|
||
Beyond the primitive types, C provides several ways to construct new
|
||
data types. For instance, you can define @dfn{pointers}, values that
|
||
represent the addresses of other data (@pxref{Pointers}). You can
|
||
define @dfn{structures}, as in many other languages
|
||
(@pxref{Structures}), and @dfn{unions}, which define multiple ways to
|
||
interpret the contents of the same memory space (@pxref{Unions}).
|
||
@dfn{Enumerations} are collections of named integer codes
|
||
(@pxref{Enumeration Types}).
|
||
|
||
@dfn{Array types} in C are used for allocating space for objects,
|
||
but C does not permit operating on an array value as a whole. @xref{Arrays}.
|
||
|
||
@node Type Designators
|
||
@section Type Designators
|
||
@cindex type designator
|
||
|
||
Some C constructs require a way to designate a specific data type
|
||
independent of any particular variable or expression which has that
|
||
type. The way to do this is with a @dfn{type designator}. The
|
||
constructs that need one include casts (@pxref{Explicit Type
|
||
Conversion}) and @code{sizeof} (@pxref{Type Size}).
|
||
|
||
We also use type designators to talk about the type of a value in C,
|
||
so you will see many type designators in this manual. When we say,
|
||
``The value has type @code{int},'' @code{int} is a type designator.
|
||
|
||
To make the designator for any type, imagine a variable declaration
|
||
for a variable of that type and delete the variable name and the final
|
||
semicolon.
|
||
|
||
@c ??? Is the rest of this so obvious it can be shortened?
|
||
For example, to designate the type of full-word integers, we start
|
||
with the declaration for a variable @code{foo} with that type,
|
||
which is this:
|
||
|
||
@example
|
||
int foo;
|
||
@end example
|
||
|
||
@noindent
|
||
Then we delete the variable name @code{foo} and the semicolon, leaving
|
||
@code{int}---exactly the keyword used in such a declaration.
|
||
Therefore, the type designator for this type is @code{int}.
|
||
|
||
What about long unsigned integers? From the declaration
|
||
|
||
@example
|
||
unsigned long int foo;
|
||
@end example
|
||
|
||
@noindent
|
||
we determine that the designator is @code{unsigned long int}.
|
||
|
||
Following this procedure, the designator for any primitive type is
|
||
simply the set of keywords which specifies that type in a declaration.
|
||
The same is true for structure types, union types, and
|
||
enumeration types.
|
||
|
||
@c ??? This graf is needed.
|
||
|
||
Designators for pointer types do follow the rule of deleting the
|
||
variable name and semicolon, but the result is not so simple.
|
||
@xref{Pointer Type Designators}, as part of the chapter about
|
||
pointers. @xref{Array Type Designators}, for designators for array
|
||
types.
|
||
|
||
To understand what type a designator stands for, imagine a variable
|
||
name inserted into the right place in the designator to make a valid
|
||
declaration. What type would that variable be declared as? That is the
|
||
type the designator designates.
|
||
|
||
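For instance, both constructs mentioned at the start of this section
take a designator in parentheses.  In this sketch (the variable names
are illustrative), @code{unsigned long int} and @code{double} serve as
designators:

@example
#include <stdio.h>

int
main (void)
@{
  int total = 7, count = 2;

  /* @r{A designator as the operand of @code{sizeof}.} */
  printf ("%zu\n", sizeof (unsigned long int));

  /* @r{A designator in a cast: divide as @code{double}, not @code{int}.} */
  printf ("%g\n", (double) total / count);   /* @r{Prints 3.5.} */
  return 0;
@}
@end example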
@node Constants
|
||
@chapter Constants
|
||
@cindex constants
|
||
|
||
A @dfn{constant} is an expression that stands for a specific value by
|
||
explicitly representing the desired value. C allows constants for
|
||
numbers, characters, and strings. We have already seen numeric and
|
||
string constants in the examples.
|
||
|
||
@menu
|
||
* Integer Constants:: Literal integer values.
|
||
* Integer Const Type:: Types of literal integer values.
|
||
* Floating Constants:: Literal floating-point values.
|
||
* Imaginary Constants:: Literal imaginary number values.
|
||
* Invalid Numbers:: Avoiding preprocessing number misconceptions.
|
||
* Character Constants:: Literal character values.
|
||
* String Constants:: Literal string values.
|
||
* UTF-8 String Constants:: Literal UTF-8 string values.
|
||
* Unicode Character Codes:: Unicode characters represented
|
||
in either UTF-16 or UTF-32.
|
||
* Wide Character Constants:: Literal character values larger than 8 bits.
|
||
* Wide String Constants:: Literal string values made up of
|
||
16- or 32-bit characters.
|
||
@end menu
|
||
|
||
@node Integer Constants
|
||
@section Integer Constants
|
||
@cindex integer constants
|
||
@cindex constants, integer
|
||
|
||
An integer constant consists of a number to specify the value,
|
||
followed optionally by suffix letters to specify the data type.
|
||
|
||
The simplest integer constants are numbers written in base 10
|
||
(decimal), such as @code{5}, @code{77}, and @code{403}. A decimal
|
||
constant cannot start with the character @samp{0} (zero) because
|
||
that makes the constant octal.
|
||
|
||
You can get the effect of a negative integer constant by putting a
|
||
minus sign at the beginning. In grammatical terms, that is an
|
||
arithmetic expression rather than a constant, but it behaves just like
|
||
a true constant.
|
||
|
||
Integer constants can also be written in octal (base 8), hexadecimal
|
||
(base 16), or binary (base 2). An octal constant starts with the
|
||
character @samp{0} (zero), followed by any number of octal digits
|
||
(@samp{0} to @samp{7}):
|
||
|
||
@example
|
||
0 // @r{zero}
|
||
077 // @r{63}
|
||
0403 // @r{259}
|
||
@end example
|
||
|
||
@noindent
|
||
Pedantically speaking, the constant @code{0} is an octal constant, but
|
||
we can think of it as decimal; it has the same value either way.
|
||
|
||
A hexadecimal constant starts with @samp{0x} (upper or lower case)
|
||
followed by hex digits (@samp{0} to @samp{9}, as well as @samp{a}
|
||
through @samp{f} in upper or lower case):
|
||
|
||
@example
|
||
0xff // @r{255}
|
||
0XA0 // @r{160}
|
||
0xffFF // @r{65535}
|
||
@end example
|
||
|
||
@cindex binary integer constants
|
||
A binary constant starts with @samp{0b} (upper or lower case) followed
|
||
by bits (each represented by the characters @samp{0} or @samp{1}):
|
||
|
||
@example
|
||
0b101 // @r{5}
|
||
@end example
|
||
|
||
@noindent
|
||
Binary constants are a GNU C extension; ISO C did not include them until C23.
|
||
|
||
Sometimes a space is needed after an integer constant to avoid
|
||
lexical confusion with the following tokens. @xref{Invalid Numbers}.
|
||
|
||
@node Integer Const Type
|
||
@section Integer Constant Data Types
|
||
@cindex integer constant data types
|
||
@cindex constant data types, integer
|
||
@cindex types of integer constants
|
||
|
||
The type of an integer constant is normally @code{int}, if the value
|
||
fits in that type, but here are the complete rules. The type
|
||
of an integer constant is the first one in this sequence that can
|
||
properly represent the value,
|
||
|
||
@enumerate
|
||
@item
|
||
@code{int}
|
||
@item
|
||
@code{unsigned int}
|
||
@item
|
||
@code{long int}
|
||
@item
|
||
@code{unsigned long int}
|
||
@item
|
||
@code{long long int}
|
||
@item
|
||
@code{unsigned long long int}
|
||
@end enumerate
|
||
|
||
@noindent
|
||
and that isn't excluded by the following rules.
|
||
|
||
If the constant has @samp{l} or @samp{L} as a suffix, that excludes the
|
||
first two types (those that are not @code{long}).
|
||
|
||
If the constant has @samp{ll} or @samp{LL} as a suffix, that excludes
|
||
the first four types (those that are not @code{long long}).
|
||
|
||
If the constant has @samp{u} or @samp{U} as a suffix, that excludes
|
||
the signed types.
|
||
|
||
Otherwise, if the constant is decimal (not binary, octal, or
|
||
hexadecimal), that excludes the unsigned types.
|
||
@c ### This said @code{unsigned int} is excluded.
|
||
@c ### See 17 April 2016
|
||
|
||
Here are some examples of the suffixes.
|
||
|
||
@example
|
||
3000000000u // @r{three billion as @code{unsigned int}.}
|
||
0LL // @r{zero as a @code{long long int}.}
|
||
0403l // @r{259 as a @code{long int}.}
|
||
2147483648 // @r{This is of type @code{long long int}.}
|
||
// @r{on typical 32-bit machines,}
|
||
// @r{since it won't fit in 32 bits as a signed number.}
|
||
2147483648U // @r{This is of type @code{unsigned int},}
|
||
// @r{since it fits in 32 unsigned bits.}
|
||
@end example
|
||
|
||
Suffixes in integer constants are rarely used. When the precise type
|
||
is important, it is cleaner to convert explicitly (@pxref{Explicit
|
||
Type Conversion}).
|
||
|
||
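To see which type the compiler chose for a constant, you can use the
C11 @code{_Generic} construct.  The macro @code{TYPE_NAME} below is a
hypothetical helper written for this sketch, not a standard facility:

@example
#include <stdio.h>

#define TYPE_NAME(x) _Generic ((x),                  \
    int: "int",                                      \
    unsigned int: "unsigned int",                    \
    long int: "long int",                            \
    unsigned long int: "unsigned long int",          \
    long long int: "long long int",                  \
    unsigned long long int: "unsigned long long int")

int
main (void)
@{
  puts (TYPE_NAME (5));            /* @r{@code{int}} */
  puts (TYPE_NAME (5u));           /* @r{@code{unsigned int}} */
  puts (TYPE_NAME (0LL));          /* @r{@code{long long int}} */
  puts (TYPE_NAME (2147483648));   /* @r{Depends on the platform.} */
  return 0;
@}
@end example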
@xref{Integer Types}.
|
||
|
||
@node Floating Constants
|
||
@section Floating-Point Constants
|
||
@cindex floating-point constants
|
||
@cindex constants, floating-point
|
||
|
||
A floating-point decimal constant must have either a decimal point, an
|
||
exponent-of-ten, or both; they distinguish it from an integer
|
||
constant. Just adding the floating-point suffix, @samp{f}, to an
|
||
integer does not make a valid floating-point constant, and adding
|
||
@samp{l} would instead make it a long integer.
|
||
|
||
To indicate an exponent, write @samp{e} or @samp{E}. The exponent
|
||
value follows. It is always written as a decimal number; it can
|
||
optionally start with a sign. The exponent @var{n} means to multiply
|
||
the constant's value by ten to the @var{n}th power.
|
||
|
||
Thus, @samp{1500.0}, @samp{15e2}, @samp{15e+2}, @samp{15.0e2},
|
||
@samp{1.5e+3}, @samp{.15e4}, and @samp{15000e-1} are seven ways of
|
||
writing a floating-point number whose value is 1500. They are all
|
||
equivalent in principle.
|
||
@c ??? Are the resulting values guaranteed to be equal
|
||
@c ??? in GCC for the targets that we describe in this manual?
|
||
|
||
Here are more examples with decimal points:
|
||
|
||
@example
|
||
1.0
|
||
1000.
|
||
3.14159
|
||
.05
|
||
.0005
|
||
@end example
|
||
|
||
For each of them, here are some equivalent constants written with
|
||
exponents:
|
||
|
||
@example
|
||
1e0, 1.0000e0
|
||
100e1, 100e+1, 100E+1, 1e3, 10000e-1
|
||
3.14159e0
|
||
5e-2, .0005e+2, 5E-2, .0005E2
|
||
.05e-2
|
||
@end example
|
||
|
||
A floating-point constant normally has type @code{double}. You can
|
||
force it to type @code{float} by adding @samp{f} or @samp{F}
|
||
at the end. For example,
|
||
|
||
@example
|
||
3.14159f
|
||
3.14159e0f
|
||
1000.f
|
||
100E1F
|
||
.0005f
|
||
.05e-2f
|
||
@end example
|
||
|
||
Likewise, @samp{l} or @samp{L} at the end forces the constant
|
||
to type @code{long double}.
|
||
|
||
@cindex hexadecimal floating constants
|
||
There are also @dfn{hexadecimal floating constants}. These
|
||
@emph{must} have an exponent, but since @samp{e} would be interpreted
|
||
as a hexadecimal digit, the character @samp{p} or @samp{P} (for
|
||
``power'') indicates the exponent.
|
||
|
||
The exponent in a hexadecimal floating constant is an optionally signed
|
||
decimal integer that specifies a power of 2 (@emph{not} 10 or 16) to
|
||
multiply into the number.
|
||
|
||
Here are some examples:
|
||
|
||
@example
|
||
@group
|
||
0xAp2 // @r{40 in decimal}
|
||
0xAp-1 // @r{5 in decimal}
|
||
0x2.0Bp4 // @r{32.6875 in decimal}
|
||
0xE.2p3 // @r{113 in decimal}
|
||
0x123.ABCp0 // @r{291.6708984375 in decimal}
|
||
0x123.ABCp4 // @r{4666.734375 in decimal}
|
||
0x100p-8 // @r{1}
|
||
0x10p-4 // @r{1}
|
||
0x1p+4 // @r{16}
|
||
0x1p+8 // @r{256}
|
||
@end group
|
||
@end example
|
||
|
||
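The following sketch checks a few of these values against their
decimal equivalents and prints one with the @samp{%a} conversion of
@code{printf}, which writes a floating-point number in hexadecimal
notation.  The comparisons are exact, because each value is a small
multiple of a power of 2:

@example
#include <stdio.h>

int
main (void)
@{
  printf ("%d\n", 0xAp2 == 40.0);         /* @r{Prints 1.} */
  printf ("%d\n", 0x2.0Bp4 == 32.6875);   /* @r{Prints 1.} */
  printf ("%d\n", 0x100p-8 == 0x10p-4);   /* @r{Prints 1.} */
  printf ("%a\n", 16.0);                  /* @r{A hexadecimal form of 16.} */
  return 0;
@}
@end example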
@xref{Floating-Point Data Types}.
|
||
|
||
@node Imaginary Constants
|
||
@section Imaginary Constants
|
||
@cindex imaginary constants
|
||
@cindex complex constants
|
||
@cindex constants, imaginary
|
||
|
||
A complex number consists of a real part plus an imaginary part. (You
|
||
may omit one part if it is zero.) This section explains how to write
|
||
numeric constants with imaginary values. By adding these to ordinary
|
||
real-valued numeric constants, we can make constants with complex
|
||
values.
|
||
|
||
The simple way to write an imaginary-number constant is to attach the
|
||
suffix @samp{i} or @samp{I}, or @samp{j} or @samp{J}, to an integer or
|
||
floating-point constant. For example, @code{2.5fi} has type
|
||
@code{_Complex float} and @code{3i} has type @code{_Complex int}.
|
||
The four alternative suffix letters are all equivalent.
|
||
|
||
@cindex _Complex_I
|
||
The other way to write an imaginary constant is to multiply a real
|
||
constant by @code{_Complex_I}, which represents the imaginary number
|
||
i. Standard C doesn't support suffixes for imaginary constants, so
|
||
this clunky method is needed.
|
||
|
||
To write a complex constant with a nonzero real part and a nonzero
|
||
imaginary part, write the two separately and add them, like this:
|
||
|
||
@example
|
||
4.0 + 3.0i
|
||
@end example
|
||
|
||
@noindent
|
||
That gives the value 4 + 3i, with type @code{_Complex double}.
|
||
|
||
Such a sum can include multiple real constants, or none. Likewise, it
|
||
can include multiple imaginary constants, or none. For example:
|
||
|
||
@example
|
||
_Complex double foo, bar, quux, buux;
|
||
|
||
foo = 2.0i + 4.0 + 3.0i; /* @r{Imaginary part is 5.0.} */
|
||
bar = 4.0 + 12.0; /* @r{Imaginary part is 0.0.} */
|
||
quux = 3.0i + 15.0i; /* @r{Real part is 0.0.} */
|
||
buux = 3.0i + 15.0j; /* @r{Equal to @code{quux}.} */
|
||
@end example
|
||
|
||
@xref{Complex Data Types}.
|
||
|
||
@node Invalid Numbers
|
||
@section Invalid Numbers
|
||
|
||
Some number-like constructs which are not really valid as numeric
|
||
constants are treated as numbers in preprocessing directives. If
|
||
these constructs appear outside of preprocessing, they are erroneous.
|
||
@xref{Preprocessing Tokens}.
|
||
|
||
Sometimes we need to insert spaces to separate tokens so that they
|
||
won't be combined into a single number-like construct. For example,
|
||
@code{0xE+12} is a preprocessing number that is not a valid numeric
|
||
constant, so it is a syntax error. If what we want is the three
|
||
tokens @code{@w{0xE + 12}}, we have to insert spaces as separators.
|
||
|
||
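As a brief sketch (the variable name is illustrative), the spaced form
compiles and has the expected value:

@example
#include <stdio.h>

int
main (void)
@{
  int sum = 0xE + 12;     /* @r{Three tokens: 14 plus 12.} */

  /* @r{Writing the same expression without spaces, as one} */
  /* @r{preprocessing number, is rejected by the compiler.} */
  printf ("%d\n", sum);   /* @r{Prints 26.} */
  return 0;
@}
@end example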
@node Character Constants
|
||
@section Character Constants
|
||
@cindex character constants
|
||
@cindex constants, character
|
||
@cindex escape sequence
|
||
|
||
A @dfn{character constant} is written with single quotes, as in
|
||
@code{'@var{c}'}. In the simplest case, @var{c} is a single ASCII
|
||
character that the constant should represent. The constant has type
|
||
@code{int}, and its value is the character code of that character.
|
||
For instance, @code{'a'} represents the character code for the letter
|
||
@samp{a}, which is 97.
|
||
|
||
To put the @samp{'} character (single quote) in the character
|
||
constant, @dfn{escape} it with a backslash (@samp{\}). This character
|
||
constant looks like @code{'\''}. The backslash character here
|
||
functions as an @dfn{escape character}, and such a sequence,
|
||
starting with @samp{\}, is called an @dfn{escape sequence}.
|
||
|
||
To put the @samp{\} character (backslash) in the character constant,
|
||
escape it with @samp{\} (another backslash). This character
|
||
constant looks like @code{'\\'}.
|
||
|
||
@cindex bell character
|
||
@cindex @samp{\a}
|
||
@cindex backspace
|
||
@cindex @samp{\b}
|
||
@cindex tab (ASCII character)
|
||
@cindex @samp{\t}
|
||
@cindex vertical tab
|
||
@cindex @samp{\v}
|
||
@cindex formfeed
|
||
@cindex @samp{\f}
|
||
@cindex newline
|
||
@cindex @samp{\n}
|
||
@cindex return (ASCII character)
|
||
@cindex @samp{\r}
|
||
@cindex escape (ASCII character)
|
||
@cindex @samp{\e}
|
||
Here are all the escape sequences that represent specific characters
|
||
in a character constant. The numeric values shown are the
|
||
corresponding ASCII character codes, as decimal numbers. The comments
|
||
give the characters' conventional or traditional names, as well as the
|
||
appearance for graphical characters.
|
||
|
||
@example
|
||
'\a' @result{} 7 /* @r{alarm, bell, @kbd{CTRL-g}} */
|
||
'\b' @result{} 8 /* @r{backspace, @key{BS}, @kbd{CTRL-h}} */
|
||
'\t' @result{} 9 /* @r{tab, @key{TAB}, @kbd{CTRL-i}} */
|
||
'\n' @result{} 10 /* @r{newline, @kbd{CTRL-j}} */
|
||
'\v' @result{} 11 /* @r{vertical tab, @kbd{CTRL-k}} */
|
||
'\f' @result{} 12 /* @r{formfeed, @kbd{CTRL-l}} */
|
||
'\r' @result{} 13 /* @r{carriage return, @key{RET}, @kbd{CTRL-m}} */
|
||
'\e' @result{} 27 /* @r{escape character, @key{ESC}, @kbd{CTRL-[}} */
|
||
'\\' @result{} 92 /* @r{backslash character, @kbd{\}} */
|
||
'\'' @result{} 39 /* @r{single quote character, @kbd{'}} */
|
||
'\"' @result{} 34 /* @r{double quote character, @kbd{"}} */
|
||
'\?' @result{} 63 /* @r{question mark, @kbd{?}} */
|
||
@end example
|
||
|
||
@samp{\e} is a GNU C extension; to stick to standard C, write
|
||
@samp{\33}. (The number after @samp{\} is octal.) To specify
|
||
a character constant using decimal, use a cast; for instance,
|
||
@code{(unsigned char) 27}.
|
||
|
||
You can also write octal and hex character codes as
|
||
@samp{\@var{octalcode}} or @samp{\x@var{hexcode}}. Decimal is not an
|
||
option here, so octal codes do not need to start with @samp{0}. An
|
||
octal code is limited to three octal digits, and any non-octal
|
||
character terminates it.
|
||
|
||
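Here is a sketch of these rules, assuming the ASCII character set, in
which @samp{A} has code 65:

@example
#include <stdio.h>

int
main (void)
@{
  /* @r{Octal 101 and hex 41 both equal decimal 65.} */
  printf ("%d %d\n", '\101' == 'A', '\x41' == 'A');   /* @r{Prints 1 1.} */

  /* @r{Only three octal digits are used, so this string holds} */
  /* @r{the character @samp{\123}, then @samp{4}, then the null.} */
  printf ("%zu\n", sizeof "\1234");                   /* @r{Prints 3.} */
  return 0;
@}
@end example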
The character constant's value has type @code{int}. However, the
|
||
character code is treated initially as a @code{char} value, which is
|
||
then converted to @code{int}. If the character code is greater than
|
||
127 (@code{0177} in octal), the resulting @code{int} may be negative
|
||
on a platform where the type @code{char} is 8 bits long and signed.
|
||
|
||
@node String Constants
|
||
@section String Constants
|
||
@cindex string constants
|
||
@cindex constants, string
|
||
|
||
A @dfn{string constant} represents a series of characters. It starts
|
||
with @samp{"} and ends with @samp{"}; in between are the contents of
|
||
the string. Quoting special characters such as @samp{"}, @samp{\} and
|
||
newline in the contents works in string constants as in character
|
||
constants. In a string constant, @samp{'} does not need to be quoted.
|
||
|
||
A string constant defines an array of characters which contains the
|
||
specified characters followed by the null character (code 0). Using
|
||
the string constant is equivalent to using the name of an array with
|
||
those contents. In simple cases, where there are no backslash escape
|
||
sequences, the length in bytes of the string constant is one greater
|
||
than the number of characters written in it.
|
||
|
||
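For example, this sketch compares the size of the array with the
length that @code{strlen} (declared in @file{string.h}) reports, which
does not count the terminating null:

@example
#include <stdio.h>
#include <string.h>

int
main (void)
@{
  printf ("%zu\n", sizeof "Foo!");     /* @r{Prints 5: four characters plus null.} */
  printf ("%zu\n", strlen ("Foo!"));   /* @r{Prints 4.} */
  return 0;
@}
@end example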
As with any array in C, using the string constant in an expression
|
||
converts the array to a pointer (@pxref{Pointers}) to the array's
|
||
zeroth element (@pxref{Accessing Array Elements}). This pointer will
|
||
have type @code{char *} because it points to an element of type
|
||
@code{char}. @code{char *} is an example of a type designator for a
|
||
pointer type (@pxref{Pointer Type Designators}). That type is used
|
||
for operating on strings generally, not just the strings expressed as
|
||
constants.
|
||
|
||
Thus, the string constant @code{"Foo!"} is almost
|
||
equivalent to declaring an array like this
|
||
|
||
@example
|
||
char string_array_1[] = @{'F', 'o', 'o', '!', '\0' @};
|
||
@end example
|
||
|
||
@noindent
|
||
and then using @code{string_array_1} in the program (which converts it
|
||
to type @code{char *}). There are two differences, however:
|
||
|
||
@itemize @bullet
|
||
@item
|
||
The string constant doesn't define a name for the array.
|
||
|
||
@item
|
||
The string constant is probably stored in a read-only area of memory.
|
||
@end itemize
|
||
|
||
Newlines are not allowed in the text of a string constant. The motive
|
||
for this prohibition is to catch the error of omitting the closing
|
||
@samp{"}. To put a newline in a constant string, write it as
|
||
@samp{\n} in the string constant.
|
||
|
||
A real null character in the source code inside a string constant
|
||
causes a warning. To put a null character in the middle of a string
|
||
constant, write @samp{\0} or @samp{\000}.
|
||
|
||
Consecutive string constants are effectively concatenated. Thus,
|
||
|
||
@example
|
||
"Fo" "o!" @r{is equivalent to} "Foo!"
|
||
@end example
|
||
|
||
This is useful for writing a string containing multiple lines,
|
||
like this:
|
||
|
||
@example
|
||
"This message is so long that it needs more than\n"
|
||
"a single line of text. C does not allow a newline\n"
|
||
"to represent itself in a string constant, so we have to\n"
|
||
"write \\n to put it in the string. For readability of\n"
|
||
"the source code, it is advisable to put line breaks in\n"
|
||
"the source where they occur in the contents of the\n"
|
||
"constant.\n"
|
||
@end example
|
||
|
||
The sequence of a backslash and a newline is ignored anywhere
|
||
in a C program, and that includes inside a string constant.
|
||
Thus, you can write multi-line string constants this way:
|
||
|
||
@example
|
||
"This is another way to put newlines in a string constant\n\
|
||
and break the line after them in the source code."
|
||
@end example
|
||
|
||
@noindent
|
||
However, concatenation is the recommended way to do this.
|
||
|
||
You can also write perverse string constants like this,
|
||
|
||
@example
|
||
"Fo\
|
||
o!"
|
||
@end example
|
||
|
||
@noindent
|
||
but don't do that---write it like this instead:
|
||
|
||
@example
|
||
"Foo!"
|
||
@end example
|
||
|
||
Be careful to avoid passing a string constant to a function that
|
||
modifies the string it receives. The memory where the string constant
|
||
is stored may be read-only, which would cause a fatal @code{SIGSEGV}
|
||
signal that normally terminates the program (@pxref{Signals}).  Even
|
||
worse, the memory may not be read-only. Then the function might
|
||
modify the string constant, thus spoiling the contents of other string
|
||
constants that are supposed to contain the same value and are unified
|
||
by the compiler.
|
||
|
||
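A safe alternative, sketched here with illustrative names, is to
initialize a modifiable array from the constant and pass the array
instead.  The function @code{capitalize} is hypothetical; it stands
for any function that modifies the string it receives:

@example
#include <ctype.h>
#include <stdio.h>

/* @r{Hypothetical function that modifies the string it receives.} */
static void
capitalize (char *s)
@{
  if (*s != '\0')
    *s = toupper ((unsigned char) *s);
@}

int
main (void)
@{
  char buffer[] = "foo!";   /* @r{A private, writable copy.} */

  capitalize (buffer);      /* @r{Safe: @code{buffer} is an ordinary array.} */
  puts (buffer);            /* @r{Prints Foo!} */
  return 0;
@}
@end example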
@node UTF-8 String Constants
|
||
@section UTF-8 String Constants
|
||
@cindex UTF-8 String Constants
|
||
|
||
Writing @samp{u8} immediately before a string constant, with no
|
||
intervening space, means to represent that string in UTF-8 encoding as
|
||
a sequence of bytes. UTF-8 represents ASCII characters with a single
|
||
byte, and represents non-ASCII Unicode characters (codes 128 and up)
|
||
as multibyte sequences. Here is an example of a UTF-8 constant:
|
||
|
||
@example
|
||
u8"A cónstàñt"
|
||
@end example
|
||
|
||
This constant occupies 13 bytes plus the terminating null,
|
||
because each of the accented letters is a two-byte sequence.
|
||
|
||
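To check the byte count, we can apply @code{sizeof} to the constant.
This sketch writes the accented letters as universal character names
(@pxref{Unicode Character Codes}) so that the result does not depend
on how the source file itself is encoded:

@example
#include <stdio.h>

int
main (void)
@{
  /* @r{Same text as above: 10 characters, 3 of them two bytes each.} */
  printf ("%zu\n", sizeof u8"A c\u00F3nst\u00E0\u00F1t");   /* @r{Prints 14.} */
  return 0;
@}
@end example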
Concatenating an ordinary string with a UTF-8 string conceptually
|
||
produces another UTF-8 string. However, if the ordinary string
|
||
contains character codes 128 and up, the results cannot be relied on.
|
||
|
||
@node Unicode Character Codes
|
||
@section Unicode Character Codes
|
||
@cindex Unicode character codes
|
||
@cindex universal character names
|
||
@cindex code point
|
||
|
||
You can specify Unicode characters using escape sequences called
|
||
@dfn{universal character names} that start with @samp{\u} and
|
||
@samp{\U}. They are valid in C for individual character constants,
|
||
inside string constants (@pxref{String Constants}), and even in
|
||
identifiers. These escape sequences include a hexadecimal Unicode
|
||
character code, also called a @dfn{code point} in Unicode terminology.
|
||
|
||
Use the @samp{\u} escape sequence with a 16-bit hexadecimal Unicode
|
||
character code, written as exactly four hex digits.  If the code is too big for 16
|
||
bits, use the @samp{\U} escape sequence with a 32-bit hexadecimal
|
||
Unicode character code, written as exactly eight hex digits.  Here are some examples.
|
||
|
||
@example
|
||
\u6C34 /* @r{16-bit code (Chinese for ``water''), UTF-16} */
|
||
\U0010ABCD /* @r{32-bit code, UTF-32} */
|
||
@end example
|
||
|
||
@noindent
|
||
One way to use these is in UTF-8 string constants (@pxref{UTF-8 String
|
||
Constants}). For instance, here we use two of them, each preceded by
|
||
a space.
|
||
|
||
@example
|
||
u8"fóó \u6C34 \U0010ABCD"
|
||
@end example
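
To see how much space universal character names take in a UTF-8
string, here is a small sketch; the function name is hypothetical.
Code point 6C34 needs three bytes in UTF-8, and code point 10ABCD
needs four.

@example
#include <stddef.h>

/* @r{Return the number of bytes, not counting the terminating null,}
   @r{that the two universal character names occupy in UTF-8.} */
size_t
ucn_utf8_length (void)
@{
  size_t water = sizeof u8"\u6C34" - 1;      /* @r{E6 B0 B4.} */
  size_t other = sizeof u8"\U0010ABCD" - 1;  /* @r{F4 8A AF 8D.} */
  return water + other;
@}
@end example

@noindent
The function returns 7.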
|
||
|
||
You can also use them in wide character constants (@pxref{Wide
|
||
Character Constants}), like this:
|
||
|
||
@example
|
||
u'\u6C34' /* @r{16-bit code (water)} */
|
||
U'\U0010ABCD' /* @r{32-bit code} */
|
||
@end example
|
||
|
||
@noindent
|
||
and in wide string constants (@pxref{Wide String Constants}), like
|
||
this:
|
||
|
||
@example
|
||
u"\u6C34\u706B" /* @r{16-bit codes (water, fire)} */
|
||
U"\U0010ABCD" /* @r{32-bit code} */
|
||
@end example
|
||
|
||
@noindent
|
||
And in an identifier:
|
||
|
||
@example
|
||
int foo\u6C34bar = 0;
|
||
@end example
|
||
|
||
Codes in the range of D800 through DFFF are invalid in universal
|
||
character names. Trying to write them using @samp{\u} causes an
|
||
error. Unicode calls them ``surrogate code points'' and uses them in
|
||
UTF-16 for purposes too specialized to explain here.
|
||
|
||
Codes less than 00A0 are likewise invalid in universal character
|
||
names, and likewise cause errors, except for 0024 (@samp{$}), 0040
|
||
(@samp{@@}), and 0060 (@samp{`}). Characters which can't be
|
||
represented with universal character names can be specified with octal
|
||
or hexadecimal escape sequences (@pxref{Character Constants}).
|
||
|
||
@node Wide Character Constants
|
||
@section Wide Character Constants
|
||
@cindex wide character constants
|
||
@cindex constants, wide character
|
||
|
||
A @dfn{wide character constant} represents characters with more than 8
|
||
bits of character code. This is an obscure feature that we need to
|
||
document but that you probably won't ever use. If you're just
|
||
learning C, you may as well skip this section.
|
||
|
||
The original C wide character constant looks like @samp{L} (upper
|
||
case!) followed immediately by an ordinary character constant (with no
|
||
intervening space). Its data type is @code{wchar_t}, which is an
|
||
alias defined in @file{stddef.h} for one of the standard integer
|
||
types. Depending on the platform, it could be 16 bits or 32 bits. If
|
||
it is 16 bits, these character constants use the UTF-16 form of
|
||
Unicode; if 32 bits, UTF-32.
|
||
|
||
There are also Unicode wide character constants which explicitly
|
||
specify the width. These constants start with @samp{u} or @samp{U}
|
||
instead of @samp{L}. @samp{u} specifies a 16-bit Unicode wide
|
||
character constant, and @samp{U} a 32-bit Unicode wide character
|
||
constant. Their types are, respectively, @code{char16_t} and
|
||
@w{@code{char32_t}}; they are declared in the header file
|
||
@file{uchar.h}. These character constants are valid even if
|
||
@file{uchar.h} is not included, but some uses of them may be
|
||
inconvenient without including it to declare those type names.
|
||
|
||
The character represented in a wide character constant can be an
|
||
ordinary ASCII character. @code{L'a'}, @code{u'a'} and @code{U'a'}
|
||
are all valid, and they are all equal to @code{'a'}.
|
||
|
||
In all three kinds of wide character constants, you can write a
|
||
non-ASCII Unicode character in the constant itself; the constant's
|
||
value is the character's Unicode character code. Or you can specify
|
||
the Unicode character with an escape sequence (@pxref{Unicode
|
||
Character Codes}).
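
The following sketch illustrates these points; the function name is
ours. It assumes a C library that provides @file{uchar.h}, as the GNU
C Library does.

@example
#include <stddef.h>   /* @r{Declares @code{wchar_t}.} */
#include <uchar.h>    /* @r{Declares @code{char16_t} and @code{char32_t}.} */

/* @r{Return 1 if the wide character constants have the values}
   @r{described above.} */
int
wide_constants_ok (void)
@{
  wchar_t w = L'a';
  char16_t c16 = u'\u6C34';
  char32_t c32 = U'\U0010ABCD';

  return (w == 'a' && u'a' == 'a' && U'a' == 'a'
          && c16 == 0x6C34 && c32 == 0x10ABCD);
@}
@end example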
|
||
|
||
@node Wide String Constants
|
||
@section Wide String Constants
|
||
@cindex wide string constants
|
||
@cindex constants, wide string
|
||
|
||
A @dfn{wide string constant} stands for an array of 16-bit or 32-bit
|
||
characters. Wide string constants are rarely used; if you're just
|
||
learning C, you may as well skip this section.
|
||
|
||
There are three kinds of wide string constants, which differ in the
|
||
data type used for each character in the string. Each wide string
|
||
constant is equivalent to an array of integers, but the data type of
|
||
those integers depends on the kind of wide string. Using the constant
|
||
in an expression will convert the array to a pointer to its zeroth
|
||
element, as usual for arrays in C (@pxref{Accessing Array Elements}).
|
||
For each kind of wide string constant, we state here what type that
|
||
pointer will be.
|
||
|
||
@table @code
|
||
@item char16_t
|
||
This is a 16-bit Unicode wide string constant: each element is a
|
||
16-bit Unicode character code with type @code{char16_t}, so the string
|
||
has the array type @code{char16_t[]}. (That is a type designator;
|
||
@pxref{Pointer Type Designators}.) The constant is written as
|
||
@samp{u} (which must be lower case) followed (with no intervening
|
||
space) by a string constant with the usual syntax.
|
||
|
||
@item char32_t
|
||
This is a 32-bit Unicode wide string constant: each element is a
|
||
32-bit Unicode character code, and the string has type @code{char32_t[]}.
|
||
It's written as @samp{U} (which must be upper case) followed (with no
|
||
intervening space) by a string constant with the usual syntax.
|
||
|
||
@item wchar_t
|
||
This is the original kind of wide string constant. It's written as
|
||
@samp{L} (which must be upper case) followed (with no intervening
|
||
space) by a string constant with the usual syntax, and the string has
|
||
type @code{wchar_t[]}.
|
||
|
||
The width of the data type @code{wchar_t} depends on the target
|
||
platform, which makes this kind of wide string somewhat less useful
|
||
than the newer kinds.
|
||
@end table
|
||
|
||
@code{char16_t} and @code{char32_t} are declared in the header file
|
||
@file{uchar.h}. @code{wchar_t} is declared in @file{stddef.h}.
|
||
|
||
Consecutive wide string constants of the same kind concatenate, just
|
||
like ordinary string constants. A wide string constant concatenated
|
||
with an ordinary string constant results in a wide string constant.
|
||
You can't concatenate two wide string constants of different kinds.
|
||
In addition, you can't concatenate a wide string constant (of any
|
||
kind) with a UTF-8 string constant.
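
Here is a sketch of the three kinds of wide string constants and of
concatenation with an ordinary string constant. The variable and
function names are ours.

@example
#include <stddef.h>
#include <uchar.h>

static const char16_t s16[] = u"\u6C34\u706B";  /* @r{2 characters + null.} */
static const char32_t s32[] = U"\U0010ABCD";    /* @r{1 character + null.} */
/* @r{A wide string next to an ordinary string makes one wide string.} */
static const wchar_t sw[] = L"ab" "cd";         /* @r{4 characters + null.} */

/* @r{Return the total number of array elements, counting the nulls.} */
size_t
wide_string_elements (void)
@{
  return (sizeof s16 / sizeof s16[0]
          + sizeof s32 / sizeof s32[0]
          + sizeof sw / sizeof sw[0]);
@}
@end example

@noindent
The function returns 10, that is, 3 + 2 + 5.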
|
||
|
||
@node Type Size
|
||
@chapter Type Size
|
||
@cindex type size
|
||
@cindex size of type
|
||
@findex sizeof
|
||
|
||
Each data type has a @dfn{size}, which is the number of bytes
|
||
(@pxref{Storage}) that it occupies in memory. To refer to the size in
|
||
a C program, use @code{sizeof}. There are two ways to use it:
|
||
|
||
@table @code
|
||
@item sizeof @var{expression}
|
||
This gives the size of @var{expression}, based on its data type. It
|
||
does not calculate the value of @var{expression}, only its size, so if
|
||
@var{expression} includes side effects or function calls, they do not
|
||
happen. Therefore, @code{sizeof} with an expression as argument is
|
||
always a compile-time operation that has zero run-time cost, unless it
|
||
applies to a variable-size array.
|
||
|
||
A value that is a bit field (@pxref{Bit Fields}) is not allowed as an
|
||
operand of @code{sizeof}.
|
||
|
||
For example,
|
||
|
||
@example
|
||
double a;
|
||
|
||
i = sizeof a + 10;
|
||
@end example
|
||
|
||
@noindent
|
||
sets @code{i} to 18 on most computers because @code{a} occupies 8 bytes.
|
||
|
||
Here's how to determine the number of elements in an array
|
||
@code{arr}:
|
||
|
||
@example
|
||
(sizeof arr / sizeof arr[0])
|
||
@end example
|
||
|
||
@noindent
|
||
The expression @code{sizeof arr} gives the size of the array, not
|
||
the size of a pointer to an element. However, if @var{expression} is
|
||
a function parameter that was declared as an array, that
|
||
variable really has a pointer type (@pxref{Array Params are Ptrs}), so
|
||
the result is the size of that pointer.
|
||
|
||
@item sizeof (@var{type})
|
||
This gives the size of @var{type}.
|
||
For example,
|
||
|
||
@example
|
||
i = sizeof (double) + 10;
|
||
@end example
|
||
|
||
@noindent
|
||
is equivalent to the previous example.
|
||
|
||
@strong{Warning:} If @var{type} contains expressions which have side
effects, as can happen in the size of a variable-length array type
such as @code{int [n++]}, those expressions are actually computed and
any side effects in them do occur.
|
||
|
||
You can't apply @code{sizeof} to an incomplete type (@pxref{Incomplete
|
||
Types}). Using it on a function type or @code{void} gives 1 in GNU C,
|
||
which makes adding an integer to these pointer types work as desired
|
||
(@pxref{Pointer Arithmetic}).
|
||
@end table
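
The following sketch gathers several of the points above in one
place. The function names are ours, chosen for illustration.

@example
#include <stddef.h>

/* @r{@code{sizeof} does not evaluate its operand, so @code{n} stays 0.} */
int
operand_not_evaluated (void)
@{
  int n = 0;
  size_t size = sizeof (n++);
  (void) size;
  return n;
@}

/* @r{Number of elements in an array of five @code{double}s.} */
size_t
element_count (void)
@{
  double arr[5];
  return sizeof arr / sizeof arr[0];
@}

/* @r{The size of a variable-size array is computed at run time.} */
size_t
variable_array_size (int count)
@{
  int arr[count];
  return sizeof arr;
@}
@end example

@noindent
Here @code{operand_not_evaluated} returns 0, @code{element_count}
returns 5, and @code{variable_array_size (3)} returns
@code{3 * sizeof (int)}.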
|
||
|
||
@strong{Warning:} When you use @code{sizeof} with a type
|
||
instead of an expression, you must write parentheses around the type.
|
||
|
||
@strong{Warning:} When applying @code{sizeof} to the result of a cast
|
||
(@pxref{Explicit Type Conversion}), you must write parentheses around
|
||
the cast expression to avoid an ambiguity in the grammar of C@.
|
||
Specifically,
|
||
|
||
@example
|
||
sizeof (int) -x
|
||
@end example
|
||
|
||
@noindent
|
||
parses as
|
||
|
||
@example
|
||
(sizeof (int)) - x
|
||
@end example
|
||
|
||
@noindent
|
||
If what you want is
|
||
|
||
@example
|
||
sizeof ((int) -x)
|
||
@end example
|
||
|
||
@noindent
|
||
you must write it that way, with parentheses.
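
A short sketch makes the difference visible; the function names are
hypothetical.

@example
#include <stddef.h>

/* @r{Parses as @code{(sizeof (int)) - x}: a subtraction.} */
size_t
size_minus_x (int x)
@{
  return sizeof (int) -x;
@}

/* @r{The size of the value of the cast expression.} */
size_t
size_of_cast (int x)
@{
  return sizeof ((int) -x);
@}
@end example

@noindent
On a machine where @code{int} occupies 4 bytes, @code{size_minus_x (1)}
returns 3 and @code{size_of_cast (1)} returns 4.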
|
||
|
||
The data type of the value of the @code{sizeof} operator is always an
|
||
unsigned integer type; which one of those types depends on the
|
||
machine. The header file @file{stddef.h} defines @code{size_t} as a
|
||
name for such a type. @xref{Defining Typedef Names}.
|
||
|
||
@node Pointers
|
||
@chapter Pointers
|
||
@cindex pointers
|
||
|
||
Among high-level languages, C is rather low-level, close to the
|
||
machine. This is mainly because it has explicit @dfn{pointers}. A
|
||
pointer value is the numeric address of data in memory. The type of
|
||
data to be found at that address is specified by the data type of the
|
||
pointer itself. Nothing in C can determine the ``correct'' data type
|
||
of data in memory; it can only blindly follow the data type of the
|
||
pointer you use to access the data.
|
||
|
||
The unary operator @samp{*} gets the data that a pointer points
|
||
to---this is called @dfn{dereferencing the pointer}. Its value
|
||
always has the type that the pointer points to.
|
||
|
||
C also allows pointers to functions, but since there are some
|
||
differences in how they work, we treat them later. @xref{Function
|
||
Pointers}.
|
||
|
||
@menu
|
||
* Address of Data:: Using the ``address-of'' operator.
|
||
* Pointer Types:: For each type, there is a pointer type.
|
||
* Pointer Declarations:: Declaring variables with pointer types.
|
||
* Pointer Type Designators:: Designators for pointer types.
|
||
* Pointer Dereference:: Accessing what a pointer points at.
|
||
* Null Pointers:: Pointers which do not point to any object.
|
||
* Invalid Dereference:: Dereferencing null or invalid pointers.
|
||
* Void Pointers:: Totally generic pointers, can cast to any.
|
||
* Pointer Comparison:: Comparing memory address values.
|
||
* Pointer Arithmetic:: Computing memory address values.
|
||
* Pointers and Arrays:: Using pointer syntax instead of array syntax.
|
||
* Low-Level Pointer Arithmetic:: More about computing memory address values.
|
||
* Pointer Increment/Decrement:: Incrementing and decrementing pointers.
|
||
* Pointer Arithmetic Drawbacks:: A common pointer bug to watch out for.
|
||
* Pointer-Integer Conversion:: Converting pointer types to integer types.
|
||
* Printing Pointers:: Using @code{printf} for a pointer's value.
|
||
@end menu
|
||
|
||
@node Address of Data
|
||
@section Address of Data
|
||
|
||
@cindex address-of operator
|
||
The most basic way to make a pointer is with the ``address-of''
|
||
operator, @samp{&}. Let's suppose we have these variables available:
|
||
|
||
@example
|
||
int i;
|
||
double a[5];
|
||
@end example
|
||
|
||
Now, @code{&i} gives the address of the variable @code{i}---a pointer
|
||
value that points to @code{i}'s location---and @code{&a[3]} gives the
|
||
address of element 3 of @code{a}. (By the usual 1-origin
|
||
numbering convention of ordinary English, it is actually the fourth
|
||
element in the array, since the element at the start has index 0.)
|
||
|
||
The address-of operator is unusual because it operates on a place to
|
||
store a value (an lvalue, @pxref{Lvalues}), not on the value currently
|
||
stored there. (The left argument of a simple assignment is unusual in
|
||
the same way.) You can use it on any lvalue except a bit field
|
||
(@pxref{Bit Fields}) or a constructor (@pxref{Structure
|
||
Constructors}).
|
||
|
||
|
||
@node Pointer Types
|
||
@section Pointer Types
|
||
|
||
For each data type @var{t}, there is a type for pointers to type
|
||
@var{t}. For these variables,
|
||
|
||
@example
|
||
int i;
|
||
double a[5];
|
||
@end example
|
||
|
||
@itemize @bullet
|
||
@item
|
||
@code{i} has type @code{int}; we say
|
||
@code{&i} is a ``pointer to @code{int}.''
|
||
|
||
@item
|
||
@code{a} has type @code{double[5]}; we say @code{&a} is a ``pointer to
|
||
an array of five @code{double}s.''
|
||
|
||
@item
|
||
@code{a[3]} has type @code{double}; we say @code{&a[3]} is a ``pointer
|
||
to @code{double}.''
|
||
@end itemize
|
||
|
||
@node Pointer Declarations
|
||
@section Pointer-Variable Declarations
|
||
|
||
The way to declare that a variable @code{foo} points to type @var{t} is
|
||
|
||
@example
|
||
@var{t} *foo;
|
||
@end example
|
||
|
||
To remember this syntax, think ``if you dereference @code{foo}, using
|
||
the @samp{*} operator, what you get is type @var{t}. Thus, @code{foo}
|
||
points to type @var{t}.''
|
||
|
||
Thus, we can declare variables that hold pointers to these three
|
||
types, like this:
|
||
|
||
@example
|
||
int *ptri; /* @r{Pointer to @code{int}.} */
|
||
double *ptrd; /* @r{Pointer to @code{double}.} */
|
||
double (*ptrda)[5]; /* @r{Pointer to @code{double[5]}.} */
|
||
@end example
|
||
|
||
@samp{int *ptri;} means, ``if you dereference @code{ptri}, you get an
|
||
@code{int}.'' @samp{double (*ptrda)[5];} means, ``if you dereference
|
||
@code{ptrda}, then subscript it by an integer less than 5, you get a
|
||
@code{double}.'' The parentheses express the point that you would
|
||
dereference it first, then subscript it.
|
||
|
||
Contrast the last one with this:
|
||
|
||
@example
|
||
double *aptrd[5]; /* @r{Array of five pointers to @code{double}.} */
|
||
@end example
|
||
|
||
@noindent
|
||
Because @samp{*} has lower syntactic precedence than subscripting,
|
||
@samp{double *aptrd[5]} means, ``if you subscript @code{aptrd} by an
|
||
integer less than 5, then dereference it, you get a @code{double}.''
|
||
Therefore, @code{*aptrd[5]} declares an array of pointers, not a
|
||
pointer to an array.
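
To see the two declarations side by side in use, consider this
sketch; the function name is ours.

@example
int
pointer_vs_array_demo (void)
@{
  double a[5] = @{ 0.0, 1.0, 2.0, 3.0, 4.0 @};

  double (*ptrda)[5] = &a;   /* @r{One pointer, to the whole array.} */
  double *aptrd[5];          /* @r{Five pointers, one per element.} */

  for (int k = 0; k < 5; k++)
    aptrd[k] = &a[k];

  /* @r{Dereference, then subscript; versus subscript, then dereference.} */
  return (*ptrda)[2] == 2.0 && *aptrd[2] == 2.0;
@}
@end example

@noindent
Both expressions fetch element 2 of @code{a}, so the function
returns 1.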
|
||
|
||
@node Pointer Type Designators
|
||
@section Pointer-Type Designators
|
||
|
||
Every type in C has a designator; you make it by deleting the variable
|
||
name and the semicolon from a declaration (@pxref{Type
|
||
Designators}). Here are the designators for the pointer
|
||
types of the example declarations in the previous section:
|
||
|
||
@example
|
||
int * /* @r{Pointer to @code{int}.} */
|
||
double * /* @r{Pointer to @code{double}.} */
|
||
double (*)[5] /* @r{Pointer to @code{double[5]}.} */
|
||
@end example
|
||
|
||
Remember, to understand what type a designator stands for, imagine the
|
||
corresponding variable declaration with a variable name in it, and
|
||
figure out what type that variable would have. Thus, the type
|
||
designator @code{double (*)[5]} corresponds to the variable declaration
|
||
@code{double (*@var{variable})[5]}. That declares a pointer variable
|
||
which, when dereferenced, gives an array of 5 @code{double}s.
|
||
So the type designator means, ``pointer to an array of 5 @code{double}s.''
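
Type designators appear mainly in casts (@pxref{Explicit Type
Conversion}) and as the operand of @code{sizeof}. Here is a sketch,
with a hypothetical function name, that uses the designator
@code{double (*)[5]} in a cast.

@example
/* @r{Treat @code{block} as pointing to an array of 5 @code{double}s}
   @r{and return element @code{k} of that array.} */
double
fetch_element (void *block, int k)
@{
  return (*(double (*)[5]) block)[k];
@}
@end example

@noindent
The caller must make sure that @code{block} really points to an array
of at least 5 @code{double}s.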
|
||
|
||
@node Pointer Dereference
|
||
@section Dereferencing Pointers
|
||
@cindex dereferencing pointers
|
||
@cindex pointer dereferencing
|
||
|
||
The main use of a pointer value is to @dfn{dereference it} (access the
|
||
data it points at) with the unary @samp{*} operator. For instance,
|
||
@code{*&i} is the value at @code{i}'s address---which is just
|
||
@code{i}. The two expressions are equivalent, provided @code{&i} is
|
||
valid.
|
||
|
||
A pointer-dereference expression whose type is data (not a function)
|
||
is an lvalue.
|
||
|
||
Pointers become really useful when we store them somewhere and use
|
||
them later. Here's a simple example to illustrate the practice:
|
||
|
||
@example
|
||
@{
|
||
int i;
|
||
int *ptr;
|
||
|
||
ptr = &i;
|
||
|
||
i = 5;
|
||
|
||
@r{@dots{}}
|
||
|
||
return *ptr; /* @r{Returns 5, fetched from @code{i}.} */
|
||
@}
|
||
@end example
|
||
|
||
This shows how to declare the variable @code{ptr} as type
|
||
@code{int *} (pointer to @code{int}), store a pointer value into it
|
||
(pointing at @code{i}), and use it later to get the value of the
|
||
object it points at (the value in @code{i}).
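
Because a pointer-dereference expression is an lvalue, it can also
appear on the left side of an assignment. Here is a minimal sketch,
with a hypothetical function name.

@example
int
store_through_pointer (void)
@{
  int i = 5;
  int *ptr = &i;

  *ptr = 7;     /* @r{Stores 7 in @code{i}.} */
  *ptr += 1;    /* @r{Now @code{i} is 8.} */

  return i;     /* @r{Returns 8.} */
@}
@end example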
|
||
|
||
Here is another example of using a pointer to a variable.
|
||
|
||
@example
|
||
/* @r{Define global variable @code{i}.} */
|
||
int i = 2;
|
||
|
||
int
|
||
foo (void)
|
||
@{
|
||
/* @r{Save global variable @code{i}'s address.} */
|
||
int *global_i = &i;
|
||
|
||
/* @r{Declare local @code{i}, shadowing the global @code{i}.} */
|
||
int i = 5;
|
||
|
||
/* @r{Print value of global @code{i} and value of local @code{i}.} */
|
||
printf ("global i: %d\nlocal i: %d\n", *global_i, i);
|
||
return i;
|
||
@}
|
||
@end example
|
||
|
||
Of course, in a real program it would be much cleaner to use different
|
||
names for these two variables, rather than calling both of them
|
||
@code{i}. But it is hard to illustrate this syntactic point with
|
||
clean code. If anyone can provide a useful example to illustrate
|
||
this point with, that would be welcome.
|
||
|
||
@node Null Pointers
|
||
@section Null Pointers
|
||
@cindex null pointers
|
||
@cindex pointers, null
|
||
|
||
@c ???stdio loads sttddef
|
||
|
||
A pointer value can be @dfn{null}, which means it does not point to
|
||
any object. The cleanest way to get a null pointer is by writing
|
||
@code{NULL}, a standard macro defined in @file{stddef.h}. You can
|
||
also do it by casting 0 to the desired pointer type, as in
|
||
@code{(char *) 0}. (The cast operator performs explicit type conversion;
@pxref{Explicit Type Conversion}.)
|
||
|
||
You can store a null pointer in any lvalue whose data type
|
||
is a pointer type:
|
||
|
||
@example
|
||
char *foo;
|
||
foo = NULL;
|
||
@end example
|
||
|
||
These two, if consecutive, can be combined into a declaration with
|
||
initializer,
|
||
|
||
@example
|
||
char *foo = NULL;
|
||
@end example
|
||
|
||
You can also explicitly cast @code{NULL} to the specific pointer type
|
||
you want---it makes no difference.
|
||
|
||
@example
|
||
char *foo;
|
||
foo = (char *) NULL;
|
||
@end example
|
||
|
||
To test whether a pointer is null, compare it with zero or
|
||
@code{NULL}, as shown here:
|
||
|
||
@example
|
||
if (p != NULL)
|
||
/* @r{@code{p} is not null.} */
|
||
operate (p);
|
||
@end example
|
||
|
||
Since testing a pointer for not being null is basic and frequent, all
|
||
but beginners in C will understand the conditional without need for
|
||
@code{!= NULL}:
|
||
|
||
@example
|
||
if (p)
|
||
/* @r{@code{p} is not null.} */
|
||
operate (p);
|
||
@end example
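
A common use of this test is to guard a dereference. In this sketch,
the function name @code{length_or_zero} is ours; @code{strlen} is the
standard library function declared in @file{string.h}.

@example
#include <stddef.h>
#include <string.h>

/* @r{Return the length of @code{s}, or 0 if @code{s} is null.} */
size_t
length_or_zero (const char *s)
@{
  if (!s)
    return 0;
  return strlen (s);
@}
@end example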
|
||
|
||
@node Invalid Dereference
|
||
@section Dereferencing Null or Invalid Pointers
|
||
|
||
Trying to dereference a null pointer is an error. On most platforms,
|
||
it generally causes a signal, usually @code{SIGSEGV}
|
||
(@pxref{Signals}).
|
||
|
||
@example
|
||
char *foo = NULL;
|
||
c = *foo; /* @r{This causes a signal and terminates.} */
|
||
@end example
|
||
|
||
@noindent
|
||
Dereferencing a pointer that has the wrong alignment for the target
data type (on most types of computer), or that points to a part of
memory that has not been allocated in the process's address space,
likewise causes a signal.
|
||
|
||
The signal terminates the program, unless the program has arranged to
|
||
handle the signal (@pxref{Signal Handling, The GNU C Library, , libc,
|
||
The GNU C Library Reference Manual}).
|
||
|
||
However, the signal might not happen if the dereference is optimized
|
||
away. In the example above, if you don't subsequently use the value
|
||
of @code{c}, GCC might optimize away the code for @code{*foo}. You
|
||
can prevent such optimization using the @code{volatile} qualifier, as
|
||
shown here:
|
||
|
||
@example
|
||
volatile char *p;
|
||
volatile char c;
|
||
c = *p;
|
||
@end example
|
||
|
||
You can use this to test whether @code{p} points to unallocated
|
||
memory. Set up a signal handler first, so the signal won't terminate
|
||
the program.
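
Here is one way such a test might look. This is only a sketch under
several assumptions: it requires a POSIX system, since
@code{sigaction}, @code{sigsetjmp} and @code{siglongjmp} are POSIX
functions rather than part of standard C; it keeps its state in a
static variable, so it is not suitable for multithreaded programs;
and all the names in it are ours.

@example
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

static sigjmp_buf probe_env;

/* @r{Signal handler: jump back into @code{readable_p}.} */
static void
probe_handler (int sig)
@{
  (void) sig;
  siglongjmp (probe_env, 1);
@}

/* @r{Return 1 if one byte can be read at @code{p}, 0 otherwise.} */
int
readable_p (const volatile char *p)
@{
  struct sigaction act, old_segv, old_bus;
  volatile int ok = 0;

  act.sa_handler = probe_handler;
  sigemptyset (&act.sa_mask);
  act.sa_flags = 0;
  sigaction (SIGSEGV, &act, &old_segv);
  sigaction (SIGBUS, &act, &old_bus);

  /* @r{Save the signal mask too, so @code{siglongjmp} restores it.} */
  if (sigsetjmp (probe_env, 1) == 0)
    @{
      volatile char c = *p;   /* @r{May cause a signal.} */
      (void) c;
      ok = 1;
    @}

  /* @r{Put back the previous signal handling.} */
  sigaction (SIGSEGV, &old_segv, NULL);
  sigaction (SIGBUS, &old_bus, NULL);
  return ok;
@}
@end example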
|
||
|
||
@node Void Pointers
|
||
@section Void Pointers
|
||
@cindex void pointers
|
||
@cindex pointers, void
|
||
|
||
The peculiar type @code{void *}, a pointer whose target type is
|
||
@code{void}, is used often in C@. It represents a pointer to
|
||
we-don't-say-what. Thus,
|
||
|
||
@example
|
||
void *numbered_slot_pointer (int);
|
||
@end example
|
||
|
||
@noindent
|
||
declares a function @code{numbered_slot_pointer} that takes an
|
||
integer parameter and returns a pointer, but we don't say what type of
|
||
data it points to.
|
||
|
||
The functions for dynamic memory allocation (@pxref{Dynamic Memory
|
||
Allocation}) use type @code{void *} to refer to blocks of memory,
|
||
regardless of what sort of data the program stores in those blocks.
|
||
|
||
With type @code{void *}, you can pass the pointer around and test
|
||
whether it is null. However, dereferencing it gives a @code{void}
|
||
value that can't be used (@pxref{The Void Type}). To dereference the
|
||
pointer, first convert it to some other pointer type.
|
||
|
||
|
||
Assignments convert @code{void *} automatically to any other pointer
|
||
type, if the left operand has a pointer type; for instance,
|
||
|
||
@example
|
||
@{
|
||
int *p;
|
||
/* @r{Converts return value to @code{int *}.} */
|
||
p = numbered_slot_pointer (5);
|
||
@r{@dots{}}
|
||
@}
|
||
@end example
|
||
|
||
Passing an argument of type @code{void *} for a parameter that has a
|
||
pointer type also converts. For example, supposing the function
|
||
@code{hack} is declared to require type @code{float *} for its
|
||
parameter, this call to @code{hack} will convert the argument to that
|
||
type.
|
||
|
||
@example
|
||
/* @r{Declare @code{hack} that way.}
|
||
@r{We assume it is defined somewhere else.} */
|
||
void hack (float *);
|
||
@dots{}
|
||
/* @r{Now call @code{hack}.} */
|
||
@{
|
||
/* @r{Converts return value of @code{numbered_slot_pointer}}
|
||
@r{to @code{float *} to pass it to @code{hack}.} */
|
||
hack (numbered_slot_pointer (5));
|
||
@r{@dots{}}
|
||
@}
|
||
@end example
|
||
|
||
You can also convert to another pointer type with an explicit cast
|
||
(@pxref{Explicit Type Conversion}), like this:
|
||
@example
|
||
(int *) numbered_slot_pointer (5)
|
||
@end example
|
||
|
||
Here is an example which decides at run time which pointer
|
||
type to convert to:
|
||
|
||
@example
|
||
void
|
||
extract_int_or_double (void *ptr, bool its_an_int)
|
||
@{
|
||
if (its_an_int)
|
||
handle_an_int (*(int *)ptr);
|
||
else
|
||
handle_a_double (*(double *)ptr);
|
||
@}
|
||
@end example
|
||
|
||
The expression @code{*(int *)ptr} means to convert @code{ptr}
|
||
to type @code{int *}, then dereference it.
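
Library functions that must work on data of any type take
@code{void *} arguments for the same reason. As a sketch, here is a
comparison function for the standard library function @code{qsort},
declared in @file{stdlib.h}. The names @code{compare_ints} and
@code{sort_ints} are ours.

@example
#include <stddef.h>
#include <stdlib.h>

/* @r{@code{qsort} passes pointers to two elements as @code{const void *}.}
   @r{Convert them to the real element type before dereferencing.} */
static int
compare_ints (const void *a, const void *b)
@{
  int x = *(const int *) a;
  int y = *(const int *) b;
  return (x > y) - (x < y);
@}

/* @r{Sort @code{count} integers into increasing order.} */
void
sort_ints (int *array, size_t count)
@{
  qsort (array, count, sizeof array[0], compare_ints);
@}
@end example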
|
||
|
||
@node Pointer Comparison
|
||
@section Pointer Comparison
|
||
@cindex pointer comparison
|
||
@cindex comparison, pointer
|
||
|
||
Two pointer values are equal if they point to the same location, or if
|
||
they are both null. You can test for this with @code{==} and
|
||
@code{!=}. Here's a trivial example:
|
||
|
||
@example
|
||
@{
|
||
int i;
|
||
int *p, *q;
|
||
|
||
p = &i;
|
||
q = &i;
|
||
if (p == q)
|
||
printf ("This will be printed.\n");
|
||
if (p != q)
|
||
printf ("This won't be printed.\n");
|
||
@}
|
||
@end example
|
||
|
||
Ordering comparisons such as @code{>} and @code{>=} operate on
|
||
pointers by converting them to unsigned integers. The C standard says
|
||
the two pointers must point within the same object in memory, but on
|
||
GNU/Linux systems these operations simply compare the numeric values
|
||
of the pointers.
|
||
|
||
The pointer values to be compared should in principle have the same type, but
|
||
they are allowed to differ in limited cases. First of all, if the two
|
||
pointers' target types are nearly compatible (@pxref{Compatible
|
||
Types}), the comparison is allowed.
|
||
|
||
If one of the operands is @code{void *} (@pxref{Void Pointers}) and
|
||
the other is another pointer type, the comparison operator converts
|
||
the @code{void *} pointer to the other type so as to compare them.
|
||
(In standard C, this is not allowed if the other type is a function
|
||
pointer type, but it works in GNU C@.)
|
||
|
||
Comparison operators also allow comparing the integer 0 with a pointer
|
||
value. This works by converting 0 to a null pointer of the same type
|
||
as the other operand.
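
Ordering comparisons are most often used to step through an array
until a pointer reaches a limit. Here is a sketch; the function name
and parameters are ours. It advances the pointer with @samp{++}
(@pxref{Pointer Increment/Decrement}).

@example
/* @r{Add up the elements from @code{begin} up to, but not including,}
   @r{@code{end}. Both pointers must point into the same array;}
   @r{@code{end} may point just past its last element.} */
int
sum_range (const int *begin, const int *end)
@{
  int sum = 0;

  for (const int *p = begin; p < end; p++)
    sum += *p;
  return sum;
@}
@end example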
|
||
|
||
@node Pointer Arithmetic
|
||
@section Pointer Arithmetic
|
||
@cindex pointer arithmetic
|
||
@cindex arithmetic, pointer
|
||
|
||
Adding an integer (positive or negative) to a pointer is valid in C@.
|
||
It assumes that the pointer points to an element in an array, and
|
||
advances or retracts the pointer across as many array elements as the
|
||
integer specifies. Here is an example, in which adding a positive
|
||
integer advances the pointer to a later element in the same array.
|
||
|
||
@example
|
||
void
|
||
incrementing_pointers ()
|
||
@{
|
||
int array[5] = @{ 45, 29, 104, -3, 123456 @};
|
||
int elt0, elt1, elt4;
|
||
|
||
int *p = &array[0];
|
||
/* @r{Now @code{p} points at element 0. Fetch it.} */
|
||
elt0 = *p;
|
||
|
||
++p;
|
||
/* @r{Now @code{p} points at element 1. Fetch it.} */
|
||
elt1 = *p;
|
||
|
||
p += 3;
|
||
/* @r{Now @code{p} points at element 4 (the last). Fetch it.} */
|
||
elt4 = *p;
|
||
|
||
printf ("elt0 %d elt1 %d elt4 %d.\n",
|
||
elt0, elt1, elt4);
|
||
/* @r{Prints elt0 45 elt1 29 elt4 123456.} */
|
||
@}
|
||
@end example
|
||
|
||
Here's an example where adding a negative integer retracts the pointer
|
||
to an earlier element in the same array.
|
||
|
||
@example
|
||
void
|
||
decrementing_pointers ()
|
||
@{
|
||
int array[5] = @{ 45, 29, 104, -3, 123456 @};
|
||
int elt0, elt3, elt4;
|
||
|
||
int *p = &array[4];
|
||
/* @r{Now @code{p} points at element 4 (the last). Fetch it.} */
|
||
elt4 = *p;
|
||
|
||
--p;
|
||
/* @r{Now @code{p} points at element 3. Fetch it.} */
|
||
elt3 = *p;
|
||
|
||
p -= 3;
|
||
/* @r{Now @code{p} points at element 0. Fetch it.} */
|
||
elt0 = *p;
|
||
|
||
printf ("elt0 %d elt3 %d elt4 %d.\n",
|
||
elt0, elt3, elt4);
|
||
/* @r{Prints elt0 45 elt3 -3 elt4 123456.} */
|
||
@}
|
||
@end example
|
||
|
||
If one pointer value was made by adding an integer to another
|
||
pointer value, it should be possible to subtract the pointer values
|
||
and recover that integer. That works too in C@.
|
||
|
||
@example
|
||
void
|
||
subtract_pointers ()
|
||
@{
|
||
int array[5] = @{ 45, 29, 104, -3, 123456 @};
|
||
int *p0, *p3, *p4;
|
||
|
||
int *p = &array[4];
|
||
/* @r{Now @code{p} points at element 4 (the last). Save the value.} */
|
||
p4 = p;
|
||
|
||
--p;
|
||
/* @r{Now @code{p} points at element 3. Save the value.} */
|
||
p3 = p;
|
||
|
||
p -= 3;
|
||
/* @r{Now @code{p} points at element 0. Save the value.} */
|
||
p0 = p;
|
||
|
||
  printf ("%td, %td, %td, %td\n",
          p4 - p0, p0 - p0, p3 - p0, p0 - p3);
|
||
/* @r{Prints 4, 0, 3, -3.} */
|
||
@}
|
||
@end example
|
||
|
||
The addition operation does not know where arrays begin or end in
|
||
memory. All it does is add the integer (multiplied by target object
|
||
size) to the numeric value of the pointer. When the initial pointer
|
||
and the result point into the same array, the result is well-defined.
|
||
|
||
@strong{Warning:} Only experts should do pointer arithmetic involving pointers
|
||
into different memory objects.
|
||
|
||
The difference between two pointers has type @code{int}, or
|
||
@code{long} if necessary (@pxref{Integer Types}). The clean way to
|
||
declare it is to use the typedef name @code{ptrdiff_t} defined in the
|
||
file @file{stddef.h}.
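
Here is a sketch that stores a pointer difference in a
@code{ptrdiff_t} variable; the function name is hypothetical.

@example
#include <stddef.h>

/* @r{Return the index of the element that @code{elt} points to,}
   @r{within the array that starts at @code{base}.} */
ptrdiff_t
element_index (const int *base, const int *elt)
@{
  ptrdiff_t index = elt - base;
  return index;
@}
@end example

@noindent
To print a @code{ptrdiff_t} value with @code{printf}, use the
conversion @samp{%td}.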
|
||
|
||
C defines pointer subtraction to be consistent with pointer-integer
|
||
addition, so that @code{(p3 - p1) + p1} equals @code{p3}, as in
|
||
ordinary algebra. Pointer subtraction works by subtracting
|
||
@code{p1}'s numeric value from @code{p3}'s, and dividing by target
|
||
object size. The two pointer arguments should point into the same
|
||
array.
|
||
|
||
In standard C, addition and subtraction are not allowed on @code{void
|
||
*}, since the target type's size is not defined in that case.
|
||
Likewise, they are not allowed on pointers to function types.
|
||
However, these operations work in GNU C, and the ``size of the target
|
||
type'' is taken as 1 byte.
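
In code that must also compile with other compilers, you can get the
same effect by converting the pointer to @code{char *} first, since
@code{sizeof (char)} is always 1. This is a sketch; the function name
is ours.

@example
#include <stddef.h>

/* @r{Advance @code{p} by @code{nbytes} bytes without relying on}
   @r{arithmetic on @code{void *}.} */
void *
advance_bytes (void *p, size_t nbytes)
@{
  return (char *) p + nbytes;
@}
@end example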
|
||
|
||
@node Pointers and Arrays
|
||
@section Pointers and Arrays
|
||
@cindex pointers and arrays
|
||
@cindex arrays and pointers
|
||
|
||
The clean way to refer to an array element is
|
||
@code{@var{array}[@var{index}]}. Another, complicated way to do the
|
||
same job is to get the address of that element as a pointer, then
|
||
dereference it: @code{* (&@var{array}[0] + @var{index})} (or
|
||
equivalently @code{* (@var{array} + @var{index})}). This first gets a
|
||
pointer to element zero, then increments it with @code{+} to point to
|
||
the desired element, then gets the value from there.
|
||
|
||
That pointer-arithmetic construct is the @emph{definition} of square
|
||
brackets in C@. @code{@var{a}[@var{b}]} means, by definition,
|
||
@code{*(@var{a} + @var{b})}. This definition uses @var{a} and @var{b}
|
||
symmetrically, so one must be a pointer and the other an integer; it
|
||
does not matter which comes first.
|
||
|
||
Since indexing with square brackets is defined in terms of addition
|
||
and dereferencing, that too is symmetrical. Thus, you can write
|
||
@code{3[array]} and it is equivalent to @code{array[3]}. However, it
|
||
would be foolish to write @code{3[array]}, since it has no advantage
|
||
and could confuse people who read the code.
|
||
|
||
It may seem like a discrepancy that the definition @code{*(@var{a} +
|
||
@var{b})} requires a pointer, while @code{array[3]} uses an array value
|
||
instead. Why is this valid? The name of the array, when used by
itself as an expression (other than as the operand of @code{sizeof}
or @samp{&}), stands for a pointer to the array's zeroth element.
Thus, @code{array + 3}
|
||
converts @code{array} implicitly to @code{&array[0]}, and the result
|
||
is a pointer to element 3, equivalent to @code{&array[3]}.
|
||
|
||
Since square brackets are defined in terms of such an addition,
|
||
@code{array[3]} first converts @code{array} to a pointer. That's why
|
||
it works to use an array directly in that construct.
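
The equivalences described in this section can be checked directly.
In this sketch the function name is ours.

@example
int
indexing_equivalences (void)
@{
  int array[5] = @{ 45, 29, 104, -3, 123456 @};

  return (array[3] == *(array + 3)         /* @r{The definition.} */
          && array[3] == *(&array[0] + 3)  /* @r{Explicit conversion.} */
          && array[3] == 3[array]          /* @r{Valid, but confusing.} */
          && &array[3] == array + 3);      /* @r{Same address.} */
@}
@end example

@noindent
All four comparisons are true, so the function returns 1.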
|
||
|
||
@node Low-Level Pointer Arithmetic
|
||
@section Pointer Arithmetic at Low Level
|
||
@cindex pointer arithmetic, low-level
|
||
@cindex low level pointer arithmetic
|
||
|
||
The behavior of pointer arithmetic is theoretically defined only when
|
||
the pointer values all point within one object allocated in memory.
|
||
But the addition and subtraction operators can't tell whether the
|
||
pointer values are all within one object. They don't know where
|
||
objects start and end. So what do they really do?
|
||
|
||
Adding pointer @var{p} to integer @var{i} treats @var{p} as a memory
|
||
address, which is in fact an integer---call it @var{pint}. It treats
|
||
@var{i} as a number of elements of the type that @var{p} points to.
|
||
These elements' sizes add up to @code{@var{i} * sizeof (*@var{p})}.
|
||
So the sum, as an integer, is @code{@var{pint} + @var{i} * sizeof
|
||
(*@var{p})}. This value is reinterpreted as a pointer of the same
|
||
type as @var{p}.
|
||
|
||
If the starting pointer value @var{p} and the result do not point at
|
||
parts of the same object, the operation is not officially legitimate,
|
||
and C code is not ``supposed'' to do it. But you can do it anyway,
|
||
and it gives precisely the results described by the procedure above.
|
||
In some special situations it can do something useful, but non-wizards
|
||
should avoid it.
|
||
|
||
Here's a function to offset a pointer value @emph{as if} it pointed to
|
||
an object of any given size, by explicitly performing that calculation:
|
||
|
||
@example
|
||
#include <stdint.h>
|
||
|
||
void *
|
||
ptr_add (void *p, int i, int objsize)
|
||
@{
|
||
  intptr_t p_address = (intptr_t) p;
|
||
  intptr_t totalsize = (intptr_t) i * objsize;
|
||
intptr_t new_address = p_address + totalsize;
|
||
return (void *) new_address;
|
||
@}
|
||
@end example
|
||
|
||
@noindent
|
||
@cindex @code{intptr_t}
|
||
This does the same job as @code{@var{p} + @var{i}} with the proper
|
||
pointer type for @var{p}. It uses the type @code{intptr_t}, which is
|
||
defined in the header file @file{stdint.h}. (In practice, @code{long
|
||
long} would always work, but it is cleaner to use @code{intptr_t}.)
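
To see that the explicit calculation agrees with ordinary pointer
arithmetic, here is a minimal sketch that compares the two while
staying inside one array, where both are fully defined. It repeats
the calculation of @code{ptr_add} so that it is self-contained; the
name @code{ptr_add_matches_plus} is invented for this example, and the
sketch assumes a platform where converting a pointer to
@code{intptr_t} and back yields the original pointer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same calculation as ptr_add above, repeated so that this
   sketch is self-contained.  */
static void *
ptr_add (void *p, int i, int objsize)
{
  intptr_t p_address = (intptr_t) p;
  intptr_t totalsize = (intptr_t) i * objsize;
  return (void *) (p_address + totalsize);
}

/* Hypothetical check: compare the explicit calculation
   with the built-in + operator on a pointer to double.  */
bool
ptr_add_matches_plus (void)
{
  double values[4] = { 0.5, 1.5, 2.5, 3.5 };
  double *by_hand = ptr_add (values, 2, sizeof (double));
  double *by_plus = values + 2;

  return by_hand == by_plus && *by_hand == 2.5;
}
```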
|
||
|
||
@node Pointer Increment/Decrement
|
||
@section Pointer Increment and Decrement
|
||
@cindex pointer increment and decrement
|
||
@cindex incrementing pointers
|
||
@cindex decrementing pointers
|
||
|
||
The @samp{++} operator adds 1 to a variable. We have seen it for
|
||
integers (@pxref{Increment/Decrement}), but it works for pointers too.
|
||
For instance, suppose we have a series of positive integers,
|
||
terminated by a zero, and we want to add them up. Here is a simple
|
||
way to step forward through the array by advancing a pointer.
|
||
|
||
@example
|
||
int
|
||
sum_array_till_0 (int *p)
|
||
@{
|
||
int sum = 0;
|
||
|
||
for (;;)
|
||
@{
|
||
/* @r{Fetch the next integer.} */
|
||
int next = *p++;
|
||
/* @r{Exit the loop if it's 0.} */
|
||
if (next == 0)
|
||
break;
|
||
/* @r{Add it into running total.} */
|
||
sum += next;
|
||
@}
|
||
|
||
return sum;
|
||
@}
|
||
@end example
|
||
|
||
@noindent
|
||
The statement @samp{break;} will be explained further on (@pxref{break
|
||
Statement}). Used in this way, it immediately exits the surrounding
|
||
@code{for} statement.
|
||
|
||
@code{*p++} uses postincrement (@code{++};
|
||
@pxref{Postincrement/Postdecrement}) on the pointer @code{p}. That
|
||
expression parses as @code{*(p++)}, because a postfix operator always
|
||
takes precedence over a prefix operator. Therefore, it dereferences
|
||
the entering value of @code{p}, then increments @code{p} afterwards.
|
||
|
||
Incrementing a variable means adding 1 to it, as in @code{p = p + 1}.
|
||
Since @code{p} is a pointer, adding 1 to it advances it by the width
|
||
of the datum it points to---in this case, @code{sizeof (int)}.
|
||
Therefore, each iteration of the loop picks up the next integer from
|
||
the series and puts it into @code{next}.
|
||
|
||
This @code{for}-loop has no initialization expression since @code{p}
|
||
and @code{sum} are already initialized, has no end-test since the
|
||
@samp{break;} statement will exit it, and needs no expression to
|
||
advance it since that's done within the loop by incrementing @code{p}.
Thus, those three expressions after @code{for} are
|
||
left empty.
|
||
|
||
Another way to write this function is by keeping the parameter value unchanged
|
||
and using indexing to access the integers in the table.
|
||
|
||
@example
|
||
int
|
||
sum_array_till_0_indexing (int *p)
|
||
@{
|
||
int i;
|
||
int sum = 0;
|
||
|
||
for (i = 0; ; i++)
|
||
@{
|
||
/* @r{Fetch the next integer.} */
|
||
int next = p[i];
|
||
/* @r{Exit the loop if it's 0.} */
|
||
if (next == 0)
|
||
break;
|
||
/* @r{Add it into running total.} */
|
||
sum += next;
|
||
@}
|
||
|
||
return sum;
|
||
@}
|
||
@end example
|
||
|
||
In this program, instead of advancing @code{p}, we advance @code{i}
|
||
and add it to @code{p}. (Recall that @code{p[i]} means @code{*(p +
|
||
i)}.) Either way, it uses the same address to get the next integer.
|
||
|
||
It makes no difference in this program whether we write @code{i++} or
|
||
@code{++i}, because the value @emph{of that expression} is not used.
|
||
We use it for its effect, to increment @code{i}.
|
||
|
||
The @samp{--} operator also works on pointers; it can be used
|
||
to step backwards through an array, like this:
|
||
|
||
@example
|
||
int
|
||
after_last_nonzero (int *p, int len)
|
||
@{
|
||
/* @r{Set up @code{q} to point just after the last array element.} */
|
||
int *q = p + len;
|
||
|
||
while (q != p)
|
||
/* @r{Step @code{q} back until it reaches a nonzero element.} */
|
||
if (*--q != 0)
|
||
/* @r{Return the index of the element after that nonzero.} */
|
||
return q - p + 1;
|
||
|
||
return 0;
|
||
@}
|
||
@end example
|
||
|
||
That function returns the length of the nonzero part of the
|
||
array specified by its arguments; that is, the index of the
|
||
first zero of the run of zeros at the end.
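
Here is a sketch of how these functions might be called. It includes
a condensed version of @code{sum_array_till_0} and a copy of
@code{after_last_nonzero} so that it stands alone; the array
@code{sample} and the two wrapper functions are invented for this
example.

```c
/* Condensed version of sum_array_till_0 from above.  */
static int
sum_array_till_0 (int *p)
{
  int sum = 0;

  while (*p != 0)
    sum += *p++;
  return sum;
}

/* Copy of after_last_nonzero from above.  */
static int
after_last_nonzero (int *p, int len)
{
  int *q = p + len;

  while (q != p)
    if (*--q != 0)
      return q - p + 1;
  return 0;
}

/* Hypothetical data: three nonzero entries, then zeros.  */
static int sample[6] = { 4, 7, 9, 0, 0, 0 };

int
sample_sum (void)
{
  return sum_array_till_0 (sample);
}

int
sample_length (void)
{
  return after_last_nonzero (sample, 6);
}
```

With this data, the sum stops at the first zero, and the length counts
the three nonzero entries at the front.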
|
||
|
||
@node Pointer Arithmetic Drawbacks
|
||
@section Drawbacks of Pointer Arithmetic
|
||
@cindex drawbacks of pointer arithmetic
|
||
@cindex pointer arithmetic, drawbacks
|
||
|
||
Pointer arithmetic is clean and elegant, but it is also the cause of a
|
||
major security flaw in the C language. Theoretically, it is only
|
||
valid to adjust a pointer within one object allocated as a unit in
|
||
memory. However, if you unintentionally adjust a pointer across the
|
||
bounds of the object and into some other object, the system has no way
|
||
to detect this error.
|
||
|
||
A bug which does that can easily result in clobbering (overwriting)
|
||
part of another object. For example, with @code{array[-1]} you can
|
||
read or write the nonexistent element before the beginning of an
|
||
array---probably part of some other data.
|
||
|
||
Combining pointer arithmetic with casts between pointer types, you can
|
||
create a pointer that fails to be properly aligned for its type. For
|
||
example,
|
||
|
||
@example
|
||
int a[2];
|
||
char *pa = (char *)a;
|
||
int *p = (int *)(pa + 1);
|
||
@end example
|
||
|
||
@noindent
|
||
gives @code{p} a value pointing to an ``integer'' that includes part
|
||
of @code{a[0]} and part of @code{a[1]}. Dereferencing that with
|
||
@code{*p} can cause a fatal @code{SIGSEGV} signal or it can return the
|
||
contents of that badly aligned @code{int} (@pxref{Signals}). If it
|
||
``works,'' it may be quite slow. It can also cause aliasing
|
||
confusions (@pxref{Aliasing}).
|
||
|
||
@strong{Warning:} Using improperly aligned pointers is risky---don't do it
|
||
unless it is really necessary.
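
When a program really must fetch an @code{int} from an address that
may not be aligned, one common way to sidestep the problem is to copy
the bytes with @code{memcpy} instead of dereferencing a misaligned
@code{int *}. The following is a minimal sketch of that idea, not the
only way to do it; the function names are invented for this example.

```c
#include <string.h>

/* Hypothetical helper: fetch an int stored at any byte address.
   memcpy copies byte by byte, so alignment does not matter.  */
int
load_int_unaligned (const void *address)
{
  int value;

  memcpy (&value, address, sizeof value);
  return value;
}

int
demo_unaligned (void)
{
  /* A byte buffer with an int stored at offset 1.  */
  unsigned char buffer[1 + sizeof (int)];
  int original = 12345;

  memcpy (buffer + 1, &original, sizeof original);
  return load_int_unaligned (buffer + 1);
}
```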
|
||
|
||
@node Pointer-Integer Conversion
|
||
@section Pointer-Integer Conversion
|
||
@cindex pointer-integer conversion
|
||
@cindex conversion between pointers and integers
|
||
@cindex @code{uintptr_t}
|
||
|
||
On modern computers, an address is simply a number. It occupies the
|
||
same space as some size of integer. In C, you can convert a pointer
|
||
to the appropriate integer types and vice versa, without losing
|
||
information. The appropriate integer types are @code{uintptr_t} (an
|
||
unsigned type) and @code{intptr_t} (a signed type). Both are defined
|
||
in @file{stdint.h}.
|
||
|
||
For instance,
|
||
|
||
@example
|
||
#include <stdint.h>
|
||
#include <stdio.h>
|
||
|
||
void
|
||
print_pointer (void *ptr)
|
||
@{
|
||
uintptr_t converted = (uintptr_t) ptr;
|
||
|
||
  printf ("Pointer value is 0x%lx\n",
          (unsigned long) converted);
@}
@end example

@noindent
The specification @samp{%lx} in the template (the first argument) for
@code{printf} means to represent this argument, an @code{unsigned
long}, using hexadecimal notation. Casting to @code{unsigned long}
rather than @code{unsigned int} keeps all the bits of the address on
platforms where @code{long} is as wide as a pointer, as it is on
GNU/Linux systems. It's cleaner to use @code{uintptr_t}, since hexadecimal
printing treats the number as unsigned, but it won't actually matter:
all @code{printf} gets to see is the series of bits in the number.
|
||
|
||
@strong{Warning:} Converting pointers to integers is risky---don't do
|
||
it unless it is really necessary.
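
Here is a sketch of a round trip from pointer to integer and back. It
goes through @code{void *}, which is the conversion the C standard
describes for @code{uintptr_t}, and prints the integer with the
@code{PRIxPTR} macro from @file{inttypes.h}, which expands to the
right @code{printf} conversion for that type. The function name is
invented for this example.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical check: convert a pointer to an integer and back,
   and verify that nothing was lost.  */
bool
round_trip_preserves_pointer (void)
{
  int object = 42;
  int *before = &object;
  uintptr_t as_integer = (uintptr_t) (void *) before;
  int *after = (int *) (void *) as_integer;

  /* PRIxPTR is the hexadecimal conversion for uintptr_t.  */
  printf ("Address as integer: 0x%" PRIxPTR "\n", as_integer);

  return after == before && *after == 42;
}
```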
|
||
|
||
@node Printing Pointers
|
||
@section Printing Pointers
|
||
|
||
To print the numeric value of a pointer, use the @samp{%p} specifier.
|
||
For example:
|
||
|
||
@example
|
||
#include <stdio.h>

void
|
||
print_pointer (void *ptr)
|
||
@{
|
||
printf ("Pointer value is %p\n", ptr);
|
||
@}
|
||
@end example
|
||
|
||
The specification @samp{%p} expects an argument of type @code{void *};
in practice, on the usual platforms, it works with any pointer type.
On GNU systems it typically prints
@samp{0x} followed by the address in hexadecimal, printed as the
appropriate unsigned integer type.
|
||
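
Here is a sketch that formats the address of an @code{int} variable
with @samp{%p}. It writes into a buffer with @code{snprintf} rather
than printing, and it casts the pointer to @code{void *} to match
what @samp{%p} expects. Both function names are invented for this
example.

```c
#include <stdio.h>

/* Hypothetical helper: format PTR into BUF, which has room for
   SIZE bytes.  Returns what snprintf returns: the number of
   characters the full text needs.  */
int
format_pointer (char *buf, size_t size, void *ptr)
{
  return snprintf (buf, size, "%p", ptr);
}

int
demo_format_pointer (void)
{
  int x = 0;
  char text[64];

  /* The cast to void * matches what %p expects.  */
  return format_pointer (text, sizeof text, (void *) &x);
}
```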
|
||
@node Structures
|
||
@chapter Structures
|
||
@cindex structures
|
||
@findex struct
|
||
@cindex fields in structures
|
||
@cindex compound type
|
||
|
||
A @dfn{structure} is a user-defined data type that holds various
|
||
@dfn{fields} of data. Each field has a name and a data type specified
|
||
in the structure's definition. Because a structure combines various
|
||
fields, each of its own type, we call a structure type a @dfn{compound
|
||
type}. (Union types are also compound types; @pxref{Unions}.)
|
||
|
||
Here we define a structure suitable for storing a linked list of
|
||
integers. Each list item will hold one integer, plus a pointer
|
||
to the next item.
|
||
|
||
@example
|
||
struct intlistlink
|
||
@{
|
||
int datum;
|
||
struct intlistlink *next;
|
||
@};
|
||
@end example
|
||
|
||
The structure definition has a @dfn{type tag} so that the code can
|
||
refer to this structure. The type tag here is @code{intlistlink}.
|
||
The definition refers recursively to the same structure through that
|
||
tag.
|
||
|
||
You can define a structure without a type tag, but then you can't
|
||
refer to it again. That is useful only in some special contexts, such
|
||
as inside a @code{typedef} or a @code{union}.
|
||
|
||
The contents of the structure are specified by the @dfn{field
|
||
declarations} inside the braces. Each field in the structure needs a
|
||
declaration there. The fields in one structure definition must have
|
||
distinct names, but these names do not conflict with any other names
|
||
in the program.
|
||
|
||
A field declaration looks just like a variable declaration. You can
|
||
combine field declarations with the same beginning, just as you can
|
||
combine variable declarations.
|
||
|
||
This structure has two fields. One, named @code{datum}, has type
|
||
@code{int} and will hold one integer in the list. The other, named
|
||
@code{next}, is a pointer to another @code{struct intlistlink}
|
||
which would be the rest of the list. In the last list item, it would
|
||
be @code{NULL}.
|
||
|
||
This structure definition is recursive, since the type of the
|
||
@code{next} field refers to the structure type. Such recursion is not
|
||
a problem; in fact, you can use the type @code{struct intlistlink *}
|
||
before the definition of the type @code{struct intlistlink} itself.
|
||
That works because pointers to all kinds of structures really look the
|
||
same at the machine level.
|
||
|
||
After defining the structure, you can declare a variable of type
|
||
@code{struct intlistlink} like this:
|
||
|
||
@example
|
||
struct intlistlink foo;
|
||
@end example
|
||
|
||
The structure definition itself can serve as the beginning of a
|
||
variable declaration, so you can declare variables immediately after,
|
||
like this:
|
||
|
||
@example
|
||
struct intlistlink
|
||
@{
|
||
int datum;
|
||
struct intlistlink *next;
|
||
@} foo;
|
||
@end example
|
||
|
||
@noindent
|
||
But that is ugly. It is almost always clearer to separate the
|
||
definition of the structure from its uses.
|
||
|
||
Declaring a structure type inside a block (@pxref{Blocks}) limits
|
||
the scope of the structure type name to that block. That means the
|
||
structure type is recognized only within that block. Declaring it in
|
||
a function parameter list, as here,
|
||
|
||
@example
|
||
int f (struct foo @{int a, b;@} parm);
|
||
@end example
|
||
|
||
@noindent
|
||
(assuming that @code{struct foo} is not already defined) limits the
|
||
scope of the structure type @code{struct foo} to that parameter list;
|
||
that is basically useless, so it triggers a warning.
|
||
|
||
Standard C requires at least one field in a structure.
|
||
GNU C does not require this.
|
||
|
||
@menu
|
||
* Referencing Fields:: Accessing field values in a structure object.
|
||
* Arrays as Fields:: Accessing arrays as structure fields.
|
||
* Dynamic Memory Allocation:: Allocating space for objects
|
||
while the program is running.
|
||
* Field Offset:: Memory layout of fields within a structure.
|
||
* Structure Layout:: Planning the memory layout of fields.
|
||
* Packed Structures:: Packing structure fields as close as possible.
|
||
* Bit Fields:: Dividing integer fields
|
||
into fields with fewer bits.
|
||
* Bit Field Packing:: How bit fields pack together in integers.
|
||
* const Fields:: Making structure fields immutable.
|
||
* Zero Length:: Zero-length array as a variable-length object.
|
||
* Flexible Array Fields:: Another approach to variable-length objects.
|
||
* Overlaying Structures:: Casting one structure type
|
||
over an object of another structure type.
|
||
* Structure Assignment:: Assigning values to structure objects.
|
||
* Unions:: Viewing the same object in different types.
|
||
* Packing With Unions:: Using a union type to pack various types into
|
||
the same memory space.
|
||
* Cast to Union:: Casting a value of one of the union's alternative
|
||
types to the type of the union itself.
|
||
* Structure Constructors:: Building new structure objects.
|
||
* Unnamed Types as Fields:: Fields' types do not always need names.
|
||
* Incomplete Types:: Types which have not been fully defined.
|
||
* Intertwined Incomplete Types:: Defining mutually-recursive structure types.
|
||
* Type Tags:: Scope of structure and union type tags.
|
||
@end menu
|
||
|
||
@node Referencing Fields
|
||
@section Referencing Structure Fields
|
||
@cindex referencing structure fields
|
||
@cindex structure fields, referencing
|
||
|
||
To make a structure useful, there has to be a way to examine and store
|
||
its fields. The @samp{.} (period) operator does that; its use looks
|
||
like @code{@var{object}.@var{field}}.
|
||
|
||
Given this structure and variable,
|
||
|
||
@example
|
||
struct intlistlink
|
||
@{
|
||
int datum;
|
||
struct intlistlink *next;
|
||
@};
|
||
|
||
struct intlistlink foo;
|
||
@end example
|
||
|
||
@noindent
|
||
you can write @code{foo.datum} and @code{foo.next} to refer to the two
|
||
fields in the value of @code{foo}. These fields are lvalues, so you
|
||
can store values into them, and read the values out again.
|
||
|
||
Most often, structures are dynamically allocated (@pxref{Dynamic
Memory Allocation}), and we refer to the objects via pointers.
|
||
@code{(*p).@var{field}} is somewhat cumbersome, so there is an
|
||
abbreviation: @code{p->@var{field}}. For instance, assume the program
|
||
contains this declaration:
|
||
|
||
@example
|
||
struct intlistlink *ptr;
|
||
@end example
|
||
|
||
@noindent
|
||
You can write @code{ptr->datum} and @code{ptr->next} to refer
|
||
to the two fields in the object that @code{ptr} points to.
|
||
|
||
If a unary operator precedes an expression using @samp{->},
|
||
the @samp{->} nests inside:
|
||
|
||
@example
|
||
-ptr->datum @r{is equivalent to} -(ptr->datum)
|
||
@end example
|
||
|
||
You can intermix @samp{->} and @samp{.} without parentheses,
|
||
as shown here:
|
||
|
||
@example
|
||
struct @{ double d; struct intlistlink l; @} foo;
|
||
|
||
@r{@dots{}}foo.l.next->next->datum@r{@dots{}}
|
||
@end example
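
To pull these pieces together, here is a short sketch that stores into
fields with @samp{.} and @samp{->} and then reads a field through a
chain of pointers. The function name @code{second_datum_negated} is
invented for this example.

```c
#include <stddef.h>

struct intlistlink
{
  int datum;
  struct intlistlink *next;
};

int
second_datum_negated (void)
{
  struct intlistlink tail = { 7, NULL };
  struct intlistlink head;
  struct intlistlink *ptr = &head;

  head.datum = 3;        /* Store through the object itself.  */
  ptr->next = &tail;     /* Store through a pointer.  */

  /* The unary minus applies last: -(ptr->next->datum).  */
  return -ptr->next->datum;
}
```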
|
||
|
||
@node Arrays as Fields
|
||
@section Arrays as Fields
|
||
|
||
You can declare a field in a structure as an array, as here:
|
||
|
||
@example
|
||
struct record
|
||
@{
|
||
char *name;
|
||
int data[4];
|
||
@};
|
||
@end example
|
||
|
||
@noindent
|
||
Each @code{struct record} object holds one string (a pointer, of
course) and four integers; the four integers make up a field called @code{data}. If
|
||
@code{recptr} is a pointer of type @code{struct record *}, then it
|
||
points to a @code{struct record} which contains those things; you can
|
||
access the second integer in that record with @code{recptr->data[1]}.
|
||
|
||
If you have two objects of type @code{struct record}, each one contains
|
||
an array. With this declaration,
|
||
|
||
@example
|
||
struct record r1, r2;
|
||
@end example
|
||
|
||
@code{r1.data} holds space for 4 @code{int}s, and @code{r2.data} holds
|
||
space for another 4 @code{int}s.
|
||
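
The following sketch shows that each object's array is separate
storage: storing into one record's @code{data} leaves the other
record's @code{data} alone. The function name and the initial values
are invented for this example.

```c
struct record
{
  char *name;
  int data[4];
};

int
arrays_are_separate (void)
{
  struct record r1 = { "first", { 1, 2, 3, 4 } };
  struct record r2 = { "second", { 0, 0, 0, 0 } };
  struct record *recptr = &r1;

  r2.data[1] = 50;      /* Does not affect r1.data.  */

  /* Second integer of r1 plus second integer of r2.  */
  return recptr->data[1] + r2.data[1];
}
```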
|
||
@node Dynamic Memory Allocation
|
||
@section Dynamic Memory Allocation
|
||
@cindex dynamic memory allocation
|
||
@cindex memory allocation, dynamic
|
||
@cindex allocating memory dynamically
|
||
|
||
To allocate an object dynamically, call the library function
|
||
@code{malloc} (@pxref{Basic Allocation, The GNU C Library,, libc, The GNU C Library
|
||
Reference Manual}). Here is how to allocate an object of type
|
||
@code{struct intlistlink}. To make this code work, include the file
|
||
@file{stdlib.h}, like this:
|
||
|
||
@example
|
||
#include <stddef.h> /* @r{Defines @code{NULL}.} */
|
||
#include <stdlib.h> /* @r{Declares @code{malloc}.} */
|
||
|
||
@dots{}
|
||
|
||
struct intlistlink *
|
||
alloc_intlistlink (void)
|
||
@{
|
||
struct intlistlink *p;
|
||
|
||
p = malloc (sizeof (struct intlistlink));
|
||
|
||
if (p == NULL)
|
||
fatal ("Ran out of storage");
|
||
|
||
/* @r{Initialize the contents.} */
|
||
p->datum = 0;
|
||
p->next = NULL;
|
||
|
||
return p;
|
||
@}
|
||
@end example
|
||
|
||
@noindent
|
||
@code{malloc} returns @code{void *}, so the assignment to @code{p}
|
||
will automatically convert it to type @code{struct intlistlink *}.
|
||
The return value of @code{malloc} is always sufficiently aligned
|
||
(@pxref{Type Alignment}) that it is valid for any data type.
|
||
|
||
The test for @code{p == NULL} is necessary because @code{malloc}
|
||
returns a null pointer if it cannot get any storage. We assume that
|
||
the program defines the function @code{fatal} to report a fatal error
|
||
to the user.
|
||
|
||
Here's how to add one more integer to the front of such a list:
|
||
|
||
@example
|
||
struct intlistlink *mylist = NULL;
|
||
|
||
void
|
||
add_to_mylist (int my_int)
|
||
@{
|
||
struct intlistlink *p = alloc_intlistlink ();
|
||
|
||
p->datum = my_int;
|
||
p->next = mylist;
|
||
mylist = p;
|
||
@}
|
||
@end example
|
||
|
||
The way to free the objects is by calling @code{free}. Here's
|
||
a function to free all the links in one of these lists:
|
||
|
||
@example
|
||
void
|
||
free_intlist (struct intlistlink *p)
|
||
@{
|
||
while (p)
|
||
@{
|
||
struct intlistlink *q = p;
|
||
p = p->next;
|
||
free (q);
|
||
@}
|
||
@}
|
||
@end example
|
||
|
||
We must extract the @code{next} pointer from the object before freeing
|
||
it, because @code{free} can clobber the data that was in the object.
|
||
For the same reason, the program must not use the list any more after
|
||
freeing its elements. To make sure it won't, it is best to clear out
|
||
the variable where the list was stored, like this:
|
||
|
||
@example
|
||
free_intlist (mylist);
|
||
|
||
mylist = NULL;
|
||
@end example
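
Here is a sketch of the whole life cycle in one place: allocate three
links, walk the list to add up the data, then free every link. The
function @code{fatal} shown here is only a stand-in for whatever the
program really uses to report fatal errors, and
@code{build_sum_and_free} is a name invented for this example.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

struct intlistlink
{
  int datum;
  struct intlistlink *next;
};

/* Stand-in for the program's own fatal-error function.  */
static void
fatal (const char *message)
{
  fprintf (stderr, "%s\n", message);
  exit (EXIT_FAILURE);
}

/* Build the list 3 -> 2 -> 1, add up its data, then free it.  */
int
build_sum_and_free (void)
{
  struct intlistlink *list = NULL;
  int sum = 0;

  for (int i = 1; i <= 3; i++)
    {
      struct intlistlink *p = malloc (sizeof *p);

      if (p == NULL)
        fatal ("Ran out of storage");
      p->datum = i;
      p->next = list;   /* New link goes on the front.  */
      list = p;
    }

  for (struct intlistlink *p = list; p != NULL; p = p->next)
    sum += p->datum;

  while (list != NULL)
    {
      /* Save the next pointer before freeing the link.  */
      struct intlistlink *next = list->next;

      free (list);
      list = next;
    }

  return sum;
}
```

When the last loop finishes, @code{list} is a null pointer, so no
dangling pointer to the freed links remains in that variable.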
|
||
|
||
@node Field Offset
|
||
@section Field Offset
|
||
@cindex field offset
|
||
@cindex structure field offset
|
||
@cindex offset of structure fields
|
||
|
||
To determine the offset of a given field @var{field} in a structure
|
||
type @var{type}, use the macro @code{offsetof}, which is defined in
|
||
the file @file{stddef.h}. It is used like this:
|
||
|
||
@example
|
||
offsetof (@var{type}, @var{field})
|
||
@end example
|
||
|
||
Here is an example:
|
||
|
||
@example
|
||
struct foo
|
||
@{
|
||
int element;
|
||
struct foo *next;
|
||
@};
|
||
|
||
offsetof (struct foo, next)
|
||
/* @r{With 4-byte pointers that is 4. With 8-byte pointers it is usually 8.} */
|
||
@end example
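
One way to understand what @code{offsetof} reports is that the address
of a field equals the address of the structure plus the field's
offset, counted in bytes. Here is a sketch that checks this; the
function name @code{offset_locates_field} is invented for this
example.

```c
#include <stdbool.h>
#include <stddef.h>

struct foo
{
  int element;
  struct foo *next;
};

/* Hypothetical check: locate the field `next' by adding its
   offset to the address of the structure, in bytes.  */
bool
offset_locates_field (void)
{
  struct foo object = { 0, NULL };
  char *base = (char *) &object;

  return (offsetof (struct foo, element) == 0
          && (base + offsetof (struct foo, next)
              == (char *) &object.next));
}
```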
|
||
|
||
@node Structure Layout
|
||
@section Structure Layout
|
||
@cindex structure layout
|
||
@cindex layout of structures
|
||
|
||
The rest of this chapter covers advanced topics about structures. If
|
||
you are just learning C, you can skip it.
|
||
|
||
The precise layout of a @code{struct} type is crucial when using it to
|
||
overlay hardware registers, to access data structures in shared
|
||
memory, or to assemble and disassemble packets for network
|
||
communication. It is also important for avoiding memory waste when
|
||
the program makes many objects of that type. However, the layout
|
||
depends on the target platform. Each platform has conventions for
|
||
structure layout, which compilers need to follow.
|
||
|
||
Here are the conventions used on most platforms.
|
||
|
||
The structure's fields appear in the structure layout in the order
|
||
they are declared. When possible, consecutive fields occupy
|
||
consecutive bytes within the structure. However, if a field's type
|
||
demands more alignment than it would get that way, C gives it the
|
||
alignment it requires by leaving a gap after the previous field.
|
||
|
||
@cindex structure alignment
|
||
@cindex alignment of structures
|
||
Once all the fields have been laid out, it is possible to determine
|
||
the structure's alignment and size. The structure's alignment is the
|
||
maximum alignment of any of the fields in it. Then the structure's
|
||
size is rounded up to a multiple of its alignment. That may require
|
||
leaving a gap at the end of the structure.
|
||
|
||
Here are some examples, where we assume that @code{char} has size and
|
||
alignment 1 (always true), and @code{int} has size and alignment 4
|
||
(true on most kinds of computers):
|
||
|
||
@example
|
||
struct foo
|
||
@{
|
||
char a, b;
|
||
int c;
|
||
@};
|
||
@end example
|
||
|
||
@noindent
|
||
This structure occupies 8 bytes, with an alignment of 4. @code{a} is
|
||
at offset 0, @code{b} is at offset 1, and @code{c} is at offset 4.
|
||
There is a gap of 2 bytes before @code{c}.
|
||
|
||
Contrast that with this structure:
|
||
|
||
@example
|
||
struct foo
|
||
@{
|
||
char a;
|
||
int c;
|
||
char b;
|
||
@};
|
||
@end example
|
||
|
||
This structure has size 12 and alignment 4. @code{a} is at offset 0,
|
||
@code{c} is at offset 4, and @code{b} is at offset 8. There are two
|
||
gaps: three bytes before @code{c}, and three bytes at the end.
|
||
|
||
These two structures have the same contents at the C level, but one
|
||
takes 8 bytes and the other takes 12 bytes due to the ordering of the
|
||
fields. A reliable way to avoid this sort of wastage is to order the
|
||
fields by size, biggest fields first.
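
Rather than assume particular sizes, a program can check the layout
rules directly. The following sketch does so with @code{offsetof},
@code{sizeof} and the C11 operator @code{_Alignof}. The two structure
tags and the function name are invented for this example; they
correspond to the two orderings shown above.

```c
#include <stdbool.h>
#include <stddef.h>

struct chars_first { char a, b; int c; };
struct int_between { char a; int c; char b; };

bool
layout_follows_rules (void)
{
  return (/* Each int field starts at a multiple of its alignment.  */
          offsetof (struct chars_first, c) % _Alignof (int) == 0
          && offsetof (struct int_between, c) % _Alignof (int) == 0
          /* The size is a multiple of the structure's alignment.  */
          && (sizeof (struct int_between)
              % _Alignof (struct int_between)) == 0
          /* Grouping the chars takes no more space.  */
          && (sizeof (struct chars_first)
              <= sizeof (struct int_between)));
}
```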
|
||
|
||
@node Packed Structures
|
||
@section Packed Structures
|
||
@cindex packed structures
|
||
@cindex @code{__attribute__((packed))}
|
||
|
||
In GNU C you can force a structure to be laid out with no gaps by
|
||
adding @code{__attribute__((packed))} after @code{struct} (or at the
|
||
end of the structure type declaration). Here's an example:
|
||
|
||
@example
|
||
struct __attribute__((packed)) foo
|
||
@{
|
||
char a;
|
||
int c;
|
||
char b;
|
||
@};
|
||
@end example
|
||
|
||
Without @code{__attribute__((packed))}, this structure occupies 12
|
||
bytes (as described in the previous section), assuming 4-byte
|
||
alignment for @code{int}. With @code{__attribute__((packed))}, it is
|
||
only 6 bytes long---the sum of the lengths of its fields.
|
||
|
||
Use of @code{__attribute__((packed))} often results in fields that
|
||
don't have the normal alignment for their types. Taking the address
|
||
of such a field can result in an invalid pointer because of its
|
||
improper alignment. Dereferencing such a pointer can cause a
|
||
@code{SIGSEGV} signal on a machine that doesn't, in general, allow
|
||
unaligned pointers.
|
||
|
||
@xref{Attributes}.
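
Here is a sketch of working with a packed structure in GNU C. It
accesses the misaligned field only through the structure itself, so
that the compiler can generate a suitable access, and copies the value
into an ordinary @code{int} instead of passing the field's address
around. The tag @code{packed_foo} and the function names are invented
for this example.

```c
#include <stddef.h>

struct __attribute__((packed)) packed_foo
{
  char a;
  int c;
  char b;
};

/* With no gaps, the size is the sum of the field sizes.  */
size_t
packed_size (void)
{
  return sizeof (struct packed_foo);
}

int
demo_packed (void)
{
  struct packed_foo object = { 'x', 1000, 'y' };
  int copy;

  /* Access through the structure: the compiler knows
     that the field c may be misaligned.  */
  object.c += 1;

  /* Copy the value out, rather than using &object.c
     as an int * elsewhere.  */
  copy = object.c;
  return copy;
}
```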
|
||
|
||
@node Bit Fields
|
||
@section Bit Fields
|
||
@cindex bit fields
|
||
|
||
A structure field declaration with an integer type can specify the
|
||
number of bits the field should occupy. We call that a @dfn{bit
|
||
field}. These are useful because consecutive bit fields are packed
|
||
into a larger storage unit. For instance,
|
||
|
||
@example
|
||
unsigned char opcode: 4;
|
||
@end example
|
||
|
||
@noindent
|
||
specifies that this field takes just 4 bits.
|
||
Since it is unsigned, its possible values range
|
||
from 0 to 15. A signed field with 4 bits, such as this,
|
||
|
||
@example
|
||
signed char small: 4;
|
||
@end example
|
||
|
||
@noindent
|
||
can hold values from -8 to 7.
|
||
|
||
You can subdivide a single byte into those two parts by writing
|
||
|
||
@example
|
||
unsigned char opcode: 4;
|
||
signed char small: 4;
|
||
@end example
|
||
|
||
@noindent
|
||
in the structure. With bit fields, these two numbers fit into
|
||
a single @code{char}.
|
||
|
||
Here's how to declare a one-bit field that can hold either 0 or 1:
|
||
|
||
@example
|
||
unsigned char special_flag: 1;
|
||
@end example
|
||
|
||
You can also use the @code{bool} type for bit fields:
|
||
|
||
@example
|
||
bool special_flag: 1;
|
||
@end example
|
||
|
||
Except when using @code{bool} (which is always unsigned,
|
||
@pxref{Boolean Type}), always specify @code{signed} or @code{unsigned}
|
||
for a bit field. There is a default, if that's not specified: the bit
|
||
field is signed if plain @code{char} is signed, except that the option
|
||
@option{-funsigned-bitfields} forces unsigned as the default. But it
|
||
is cleaner not to depend on this default.
|
||
|
||
Bit fields are special in that you cannot take their address with
|
||
@samp{&}. They are not stored with the size and alignment appropriate
|
||
for the specified type, so they cannot be addressed through pointers
|
||
to that type.
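
The following sketch puts the two 4-bit fields into a structure and
stores the extreme values in them. It also shows that storing a value
too large for an unsigned bit field keeps only the low bits. The tag
@code{instruction} and the function name are invented for this
example.

```c
struct instruction
{
  unsigned char opcode: 4;
  signed char small: 4;
};

int
demo_bit_fields (void)
{
  struct instruction insn;

  insn.opcode = 15;     /* Largest value for 4 unsigned bits.  */
  insn.small = -8;      /* Smallest value for 4 signed bits.  */

  /* 15 + 1 is 16, which needs 5 bits; the unsigned
     4-bit field keeps 16 modulo 16, which is 0.  */
  insn.opcode += 1;

  return insn.opcode * 100 + insn.small;
}
```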
|
||
|
||
@node Bit Field Packing
|
||
@section Bit Field Packing
|
||
|
||
Programs to communicate with low-level hardware interfaces need to
|
||
define bit fields laid out to match the hardware data. This section
|
||
explains how to do that.
|
||
|
||
Consecutive bit fields are packed together, but each bit field must
|
||
fit within a single object of its specified type. In this example,
|
||
|
||
@example
|
||
unsigned short a : 3, b : 3, c : 3, d : 3, e : 3;
|
||
@end example
|
||
|
||
@noindent
|
||
all five fields fit consecutively into one two-byte @code{short}.
|
||
They need 15 bits, and one @code{short} provides 16. By contrast,
|
||
|
||
@example
|
||
unsigned char a : 3, b : 3, c : 3, d : 3, e : 3;
|
||
@end example
|
||
|
||
@noindent
|
||
needs three bytes. It fits @code{a} and @code{b} into one
|
||
@code{char}, but @code{c} won't fit in that @code{char} (they would
|
||
add up to 9 bits). So @code{c} and @code{d} go into a second
|
||
@code{char}, leaving a gap of two bits between @code{b} and @code{c}.
|
||
Then @code{e} needs a third @code{char}. By contrast,
|
||
|
||
@example
|
||
unsigned char a : 3, b : 3;
|
||
unsigned int c : 3;
|
||
unsigned char d : 3, e : 3;
|
||
@end example
|
||
|
||
@noindent
|
||
needs only two bytes: the type @code{unsigned int}
|
||
allows @code{c} to straddle bytes that are in the same word.
|
||
|
||
You can leave a gap of a specified number of bits by defining a
|
||
nameless bit field. This looks like @code{@var{type} : @var{nbits};}.
|
||
It is allocated space in the structure just as a named bit field would
|
||
be allocated.
|
||
|
||
You can force the next bit field to start at the next aligned memory
object of type @var{type} by writing @code{@var{type} : 0;}.
|
||
|
||
Both of these constructs can syntactically share @var{type} with
|
||
ordinary bit fields. This example illustrates both:
|
||
|
||
@example
|
||
unsigned int a : 5, : 3, b : 5, : 0, c : 5, : 3, d : 5;
|
||
@end example
|
||
|
||
@noindent
|
||
It puts @code{a} and @code{b} into one @code{int}, with a 3-bit gap
|
||
between them. Then @code{: 0} advances to the next @code{int},
|
||
so @code{c} and @code{d} fit into that one.
|
||
|
||
These rules for packing bit fields apply to most target platforms,
|
||
including all the usual real computers. A few embedded controllers
|
||
have special layout rules.
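
Since the packing depends on the platform, it can be worth checking
the result with @code{sizeof}. This sketch wraps three of the
declarations above in structures and reports their sizes. The tags
and function names are invented for this example. On the usual
platforms, with 2-byte @code{short}, we would expect 2 bytes, 3 bytes,
and the size of two @code{unsigned int}s respectively, but treat those
numbers as an assumption to verify on your own target.

```c
#include <stddef.h>

struct five_in_short
{
  unsigned short a : 3, b : 3, c : 3, d : 3, e : 3;
};

struct five_in_chars
{
  unsigned char a : 3, b : 3, c : 3, d : 3, e : 3;
};

struct with_gaps
{
  unsigned int a : 5, : 3, b : 5, : 0, c : 5, : 3, d : 5;
};

/* Sizes in bytes, to compare with the discussion above.  */
size_t
size_five_in_short (void)
{
  return sizeof (struct five_in_short);
}

size_t
size_five_in_chars (void)
{
  return sizeof (struct five_in_chars);
}

size_t
size_with_gaps (void)
{
  return sizeof (struct with_gaps);
}
```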
|
||
|
||
@node const Fields
|
||
@section @code{const} Fields
|
||
@cindex const fields
|
||
@cindex structure fields, constant
|
||
|
||
@c ??? Is this a C standard feature?
|
||
|
||
A structure field declared @code{const} cannot be assigned to
|
||
(@pxref{const}). For instance, let's define this modified version of
|
||
@code{struct intlistlink}:
|
||
|
||
@example
|
||
struct intlistlink_ro /* @r{``ro'' for read-only.} */
|
||
@{
|
||
const int datum;
|
||
struct intlistlink *next;
|
||
@};
|
||
@end example
|
||
|
||
This structure can be used to prevent part of the code from modifying
|
||
the @code{datum} field:
|
||
|
||
@example
|
||
/* @r{@code{p} has type @code{struct intlistlink *}.}
|
||
@r{Convert it to @code{struct intlistlink_ro *}.} */
|
||
struct intlistlink_ro *q
|
||
= (struct intlistlink_ro *) p;
|
||
|
||
q->datum = 5; /* @r{Error!} */
|
||
p->datum = 5; /* @r{Valid since @code{*p} is}
|
||
@r{not a @code{struct intlistlink_ro}.} */
|
||
@end example
|
||
|
||
A @code{const} field can get a value in two ways: by initialization of
|
||
the whole structure, and by making a pointer-to-structure point to an object
|
||
in which that field already has a value.
|
||
|
||
Any @code{const} field in a structure type makes assignment impossible
|
||
for structures of that type (@pxref{Structure Assignment}). That is
|
||
because structure assignment works by assigning the structure's
|
||
fields, one by one.
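
Here is a sketch of the two ways a @code{const} field can get its
value, with the operations that the compiler would reject left in
comments. The function name @code{demo_const_field} is invented for
this example.

```c
#include <stddef.h>

struct intlistlink;     /* Only pointers to it are needed here.  */

struct intlistlink_ro
{
  const int datum;
  struct intlistlink *next;
};

int
demo_const_field (void)
{
  /* First way: initialization of the whole structure.  */
  struct intlistlink_ro first = { 5, NULL };

  /* Second way: point at an object in which the field
     already has a value.  */
  struct intlistlink_ro *q = &first;

  /* Initialization from another structure is allowed...  */
  struct intlistlink_ro other = first;

  /* ...but these would be rejected by the compiler:
       q->datum = 6;
       other = first;  */

  return q->datum + other.datum;
}
```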
|
||
|
||
@node Zero Length
|
||
@section Arrays of Length Zero
|
||
@cindex array of length zero
|
||
@cindex zero-length arrays
|
||
@cindex length-zero arrays
|
||
|
||
GNU C allows zero-length arrays. They are useful as the last field
|
||
of a structure that is really a header for a variable-length object.
|
||
Here's an example, where we construct a variable-size structure
|
||
to hold a line which is @code{this_length} characters long:
|
||
|
||
@example
|
||
struct line @{
|
||
int length;
|
||
char contents[0];
|
||
@};
|
||
|
||
struct line *thisline
|
||
= ((struct line *)
|
||
malloc (sizeof (struct line)
|
||
+ this_length));
|
||
thisline->length = this_length;
|
||
@end example
|
||
|
||
In ISO C90, we would have to give @code{contents} a length of 1, which
|
||
means either wasting space or complicating the argument to @code{malloc}.
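
Here is a fuller sketch of the same technique in GNU C: a function
that allocates a @code{struct line} big enough for a given text and
copies the text into it. The names @code{make_line} and
@code{demo_make_line} are invented for this example, and the sketch
chooses to return a null pointer when @code{malloc} fails.

```c
#include <stdlib.h>
#include <string.h>

struct line
{
  int length;
  char contents[0];     /* Zero-length array: GNU C extension.  */
};

/* Hypothetical helper: allocate a line holding a copy of TEXT,
   without a terminating null.  Returns NULL if malloc fails.
   The caller frees the result with free.  */
struct line *
make_line (const char *text)
{
  int this_length = strlen (text);
  struct line *thisline
    = malloc (sizeof (struct line) + this_length);

  if (thisline == NULL)
    return NULL;
  thisline->length = this_length;
  memcpy (thisline->contents, text, this_length);
  return thisline;
}

/* Returns 1 if the line was built as intended, else 0.  */
int
demo_make_line (void)
{
  struct line *l = make_line ("hello");
  int ok = (l != NULL
            && l->length == 5
            && memcmp (l->contents, "hello", 5) == 0);

  free (l);
  return ok;
}
```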
|
||
|
||
@node Flexible Array Fields
|
||
@section Flexible Array Fields
|
||
@cindex flexible array fields
|
||
@cindex array fields, flexible
|
||
|
||
The C99 standard adopted a more complex equivalent of zero-length
|
||
array fields. It's called a @dfn{flexible array}, and it's indicated
|
||
by omitting the length, like this:
|
||
|
||
@example
|
||
struct line
|
||
@{
|
||
int length;
|
||
char contents[];
|
||
@};
|
||
@end example
|
||
|
||
The flexible array has to be the last field in the structure, and there
|
||
must be other fields before it.
|
||
|
||
Under the C standard, a structure with a flexible array can't be part
|
||
of another structure, and can't be an element of an array.
|
||
|
||
GNU C allows static initialization of flexible array fields. The effect
|
||
is to ``make the array long enough'' for the initializer.
|
||
|
||
@example
|
||
struct f1 @{ int x; int y[]; @} f1
|
||
= @{ 1, @{ 2, 3, 4 @} @};
|
||
@end example
|
||
|
||
@noindent
|
||
This defines a structure variable named @code{f1}
|
||
whose type is @code{struct f1}. In C, a variable name or function name
|
||
never conflicts with a structure type tag.
|
||
|
||
Omitting the flexible array field's size lets the initializer
|
||
determine it. This is allowed only when the flexible array is defined
|
||
in the outermost structure and you declare a variable of that
|
||
structure type. For example:
|
||
|
||
@example
|
||
struct foo @{ int x; int y[]; @};
|
||
struct bar @{ struct foo z; @};
|
||
|
||
struct foo a = @{ 1, @{ 2, 3, 4 @} @}; // @r{Valid.}
|
||
struct bar b = @{ @{ 1, @{ 2, 3, 4 @} @} @}; // @r{Invalid.}
|
||
struct bar c = @{ @{ 1, @{ @} @} @}; // @r{Valid.}
|
||
struct foo d[1] = @{ @{ 1, @{ 2, 3, 4 @} @} @}; // @r{Invalid.}
|
||
@end example
|
||
|
||
@node Overlaying Structures
|
||
@section Overlaying Different Structures
|
||
@cindex overlaying structures
|
||
@cindex structures, overlaying
|
||
|
||
Be careful about using different structure types to refer to the same
|
||
memory within one function, because GNU C can optimize code assuming
|
||
it never does that. @xref{Aliasing}. Here's an example of the kind of
|
||
aliasing that can cause the problem:
|
||
|
||
@example
|
||
struct a @{ int size; char *data; @};
|
||
struct b @{ int size; char *data; @};
|
||
struct a foo;
|
||
struct a *p = &foo;
|
||
struct b *q = (struct b *) &foo;
|
||
@end example
|
||
|
||
Here @code{q} points to the same memory that the variable @code{foo}
|
||
occupies, but they have two different types. The two types
|
||
@code{struct a} and @code{struct b} are defined alike, but they are
|
||
not the same type. Interspersing references using the two types,
|
||
like this,
|
||
|
||
@example
|
||
p->size = 0;
|
||
q->size = 1;
|
||
x = p->size;
|
||
@end example
|
||
|
||
@noindent
|
||
allows GNU C to assume that @code{p->size} is still zero when it is
|
||
copied into @code{x}. The GNU C compiler ``knows'' that @code{q}
|
||
points to a @code{struct b} and this is not supposed to overlap with a
|
||
@code{struct a}. Other compilers might also do this optimization.
|
||
|
||
The ISO C standard considers such code erroneous, precisely so that
|
||
this optimization will not be incorrect.
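
One way to stay clear of the problem, sketched here under the
assumption that you really need a value of the other type, is to copy
the fields one by one.  The function name @code{b_from_a} is
hypothetical.

```c
struct a { int size; char *data; };
struct b { int size; char *data; };

/* Hypothetical helper: build a struct b from a struct a by copying
   each field, so no object is ever accessed through the wrong type.  */
struct b
b_from_a (const struct a *in)
{
  struct b out;

  out.size = in->size;
  out.data = in->data;
  return out;
}
```

Each access here uses the type that the object was declared with, so
the compiler's assumption holds.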
|
||
|
||
@node Structure Assignment
|
||
@section Structure Assignment
|
||
@cindex structure assignment
|
||
@cindex assigning structures
|
||
|
||
Assignment operating on a structure type copies the structure. The
|
||
left and right operands must have the same type. Here is an example:
|
||
|
||
@example
|
||
#include <stddef.h> /* @r{Defines @code{NULL}.} */
|
||
#include <stdlib.h> /* @r{Declares @code{malloc}.} */
|
||
@r{@dots{}}
|
||
|
||
struct point @{ double x, y; @};
|
||
|
||
struct point *
|
||
copy_point (struct point point)
|
||
@{
|
||
struct point *p
|
||
= (struct point *) malloc (sizeof (struct point));
|
||
if (p == NULL)
|
||
fatal ("Out of memory");
|
||
*p = point;
|
||
return p;
|
||
@}
|
||
@end example
|
||
|
||
Notionally, assignment on a structure type works by copying each of
|
||
the fields. Thus, if any of the fields has the @code{const}
|
||
qualifier, that structure type does not allow assignment:
|
||
|
||
@example
|
||
struct point @{ const double x, y; @};
|
||
|
||
struct point a, b;
|
||
|
||
a = b; /* @r{Error!} */
|
||
@end example
|
||
|
||
@xref{Assignment Expressions}.
|
||
|
||
When a structure type has a field which is an array, as here,
|
||
|
||
@example
|
||
struct record
|
||
@{
|
||
char *name;
|
||
int data[4];
|
||
@};
|
||
|
||
struct record r1, r2;
|
||
@end example
|
||
|
||
@noindent
|
||
structure assignment such as @code{r1 = r2} copies array fields'
|
||
contents just as it copies all the other fields.
|
||
|
||
This is the only way in C that you can operate on the whole contents
|
||
of an array with one operation: when the array is contained in a
|
||
@code{struct}. You can't copy the contents of the @code{data} field
|
||
as an array, because
|
||
|
||
@example
|
||
r1.data = r2.data;
|
||
@end example
|
||
|
||
@noindent
|
||
would convert the array objects (as always) to pointers to the zeroth
|
||
elements of the arrays (of type @code{int *}), and the
|
||
assignment would be invalid because the left operand is not an lvalue.
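
The following sketch shows that structure assignment copies the array
field's contents rather than sharing them.  The function name
@code{first_after_copy} is hypothetical.

```c
struct record
{
  char *name;
  int data[4];
};

int
first_after_copy (void)
{
  struct record r2 = { "second", { 1, 2, 3, 4 } };
  struct record r1;

  r1 = r2;           /* Copies name (the pointer) and all of data.  */
  r2.data[0] = 99;   /* Changes only r2's copy of the array.  */
  return r1.data[0];
}
```

Note that only the pointer in @code{name} is copied, so after the
assignment both structures point to the same string.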
|
||
|
||
@node Unions
|
||
@section Unions
|
||
@cindex unions
|
||
@findex union
|
||
|
||
A @dfn{union type} defines alternative ways of looking at the same
|
||
piece of memory. Each alternative view is defined with a data type,
|
||
and identified by a name. Because a union combines various types, it
|
||
is considered a @dfn{compound type}, like structures
|
||
(@pxref{Structures}). A union definition looks like this:
|
||
|
||
@example
|
||
union @var{name}
|
||
@{
|
||
@var{alternative declarations}@r{@dots{}}
|
||
@};
|
||
@end example
|
||
|
||
Each alternative declaration looks like a structure field declaration,
|
||
except that it can't be a bit field. For instance,
|
||
|
||
@example
|
||
union number
|
||
@{
|
||
long int integer;
|
||
double floating;
|
||
@};
|
||
@end example
|
||
|
||
@noindent
|
||
lets you store either an integer (type @code{long int}) or a floating
|
||
point number (type @code{double}) in the same place in memory. The
|
||
length and alignment of the union type are the maximum of all the
|
||
alternatives---they do not have to be the same. In this union
|
||
example, @code{double} probably takes more space than @code{long int},
|
||
but that doesn't cause a problem in programs that use the union in the
|
||
normal way.
|
||
|
||
The members don't have to be different in data type. Sometimes
|
||
each member pertains to a way the data will be used. For instance,
|
||
|
||
@example
|
||
union datum
|
||
@{
|
||
double latitude;
|
||
double longitude;
|
||
double height;
|
||
double weight;
|
||
int continent;
|
||
@};
|
||
@end example
|
||
|
||
This union holds one of several kinds of data; most kinds are
floating-point numbers, but the value can also be a code for a
continent, which is an integer.  You @emph{could} use one member of type @code{double} to
|
||
access all the values which have that type, but the different member
|
||
names will make the program clearer.
|
||
|
||
The alignment of a union type is the maximum of the alignments of the
|
||
alternatives. The size of the union type is the maximum of the sizes
|
||
of the alternatives, rounded up to a multiple of the alignment
|
||
(because every type's size must be a multiple of its alignment).
|
||
|
||
All the union alternatives start at the address of the union itself.
|
||
If an alternative is shorter than the union as a whole, it occupies
|
||
the first part of the union's storage, leaving the last part unused
|
||
@emph{for that alternative}.
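
These layout rules can be checked in code.  The sketch below uses
illustrative member names and a hypothetical function,
@code{union_layout_ok}; it relies on @code{offsetof} from
@file{stddef.h} and on @code{_Alignof}.

```c
#include <stddef.h>

union number
{
  long int integer;
  double floating;
};

/* Hypothetical check: nonzero if the union is at least as large as
   each alternative, its size is a multiple of its alignment, and
   every alternative starts at offset zero.  */
int
union_layout_ok (void)
{
  return (sizeof (union number) >= sizeof (long int)
          && sizeof (union number) >= sizeof (double)
          && sizeof (union number) % _Alignof (union number) == 0
          && offsetof (union number, integer) == 0
          && offsetof (union number, floating) == 0);
}
```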
|
||
|
||
@strong{Warning:} If the code stores data using one union alternative
|
||
and accesses it with another, the results depend on the kind of
|
||
computer in use. Only wizards should try to do this. However, when
|
||
you need to do this, a union is a clean way to do it.
|
||
|
||
Assignment works on any union type by copying the entire value.
|
||
|
||
@node Packing With Unions
|
||
@section Packing With Unions
|
||
|
||
Sometimes we design a union with the intention of packing various
|
||
kinds of objects into a certain amount of memory space. For example,
|
||
|
||
@example
|
||
union bytes8
|
||
@{
|
||
long long big_int_elt;
|
||
double double_elt;
|
||
struct @{ int first, second; @} two_ints;
|
||
struct @{ void *first, *second; @} two_ptrs;
|
||
@};
|
||
|
||
union bytes8 *p;
|
||
@end example
|
||
|
||
This union makes it possible to look at 8 bytes of data that @code{p}
|
||
points to as a single 8-byte integer (@code{p->big_int_elt}), as a
|
||
single floating-point number (@code{p->double_elt}), as a pair of
|
||
integers (@code{p->two_ints.first} and @code{p->two_ints.second}), or
|
||
as a pair of pointers (@code{p->two_ptrs.first} and
|
||
@code{p->two_ptrs.second}).
|
||
|
||
To pack storage with such a union makes assumptions about the sizes of
|
||
all the types involved. This particular union was written expecting a
|
||
pointer to have the same size as @code{int}. On a machine where one
|
||
pointer takes 8 bytes, the code using this union probably won't work
|
||
as expected. The union, as such, will function correctly---if you
|
||
store two values through @code{two_ints} and extract them through
|
||
@code{two_ints}, you will get the same integers back---but the part of
|
||
the program that expects the union to be 8 bytes long could
|
||
malfunction, or at least use too much space.
|
||
|
||
The above example shows one case where a @code{struct} type with no
|
||
tag can be useful. Another way to get effectively the same result
|
||
is with arrays as members of the union:
|
||
|
||
@example
|
||
union eight_bytes
|
||
@{
|
||
long long big_int_elt;
|
||
double double_elt;
|
||
int two_ints[2];
|
||
void *two_ptrs[2];
|
||
@};
|
||
@end example
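
A program that depends on such a union occupying exactly 8 bytes can
test that assumption rather than trust it.  Here is a sketch; the
function name @code{bytes8_is_8_bytes} is hypothetical.

```c
union bytes8
{
  long long big_int_elt;
  double double_elt;
  struct { int first, second; } two_ints;
  struct { void *first, *second; } two_ptrs;
};

/* Hypothetical helper: nonzero if the union really occupies 8 bytes
   on the machine the program is compiled for.  */
int
bytes8_is_8_bytes (void)
{
  return sizeof (union bytes8) == 8;
}
```

On a machine with 8-byte pointers, the @code{two_ptrs} alternative
alone needs 16 bytes, so the function returns zero there.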
|
||
|
||
@node Cast to Union
|
||
@section Cast to a Union Type
|
||
@cindex cast to a union
|
||
@cindex union, casting to a
|
||
|
||
In GNU C, you can explicitly cast any of the alternative types to the
|
||
union type; for instance,
|
||
|
||
@example
|
||
(union eight_bytes) (long long) 5
|
||
@end example
|
||
|
||
@noindent
|
||
makes a value of type @code{union eight_bytes} which gets its contents
|
||
through the alternative named @code{big_int_elt}.
|
||
|
||
The value being cast must exactly match the type of the alternative,
|
||
so this is not valid:
|
||
|
||
@example
|
||
(union eight_bytes) 5 /* @r{Error! 5 is @code{int}.} */
|
||
@end example
|
||
|
||
A cast to union type looks like any other cast, except that the type
|
||
specified is a union type. You can specify the type either with
|
||
@code{union @var{tag}} or with a typedef name (@pxref{Defining
|
||
Typedef Names}).
|
||
|
||
Using the cast as the right-hand side of an assignment to a variable of
|
||
union type is equivalent to storing in an alternative of the union:
|
||
|
||
@example
|
||
/* @r{Define the union @code{foo}.} */
|
||
union foo @{ int i; double d; @};
|
||
|
||
/* @r{Declare the union-valued variable, @code{u}.} */
|
||
union foo u;
|
||
|
||
int x; double y;
|
||
|
||
u = (union foo) x @r{means} u.i = x
|
||
|
||
u = (union foo) y @r{means} u.d = y
|
||
@end example
|
||
|
||
You can also use the union cast as a function argument:
|
||
|
||
@example
|
||
void hack (union foo);
|
||
@r{@dots{}}
|
||
hack ((union foo) x);
|
||
@end example
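
Here is a small complete sketch of passing a union cast as an
argument.  It depends on the GNU C extension described in this
section, so other compilers may reject it.  The function names
@code{stored_int} and @code{pass_int} are hypothetical.

```c
union foo { int i; double d; };

static int
stored_int (union foo u)
{
  return u.i;
}

int
pass_int (int x)
{
  /* The cast builds a union foo whose alternative i holds x.  */
  return stored_int ((union foo) x);
}
```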
|
||
|
||
@node Structure Constructors
|
||
@section Structure Constructors
|
||
@cindex structure constructors
|
||
@cindex constructors, structure
|
||
|
||
You can construct a structure value by writing its type in
|
||
parentheses, followed by an initializer that would be valid in a
|
||
declaration for that type. For instance, given this declaration,
|
||
|
||
@example
|
||
struct foo @{int a; char b[2];@} structure;
|
||
@end example
|
||
|
||
@noindent
|
||
you can create a @code{struct foo} value as follows:
|
||
|
||
@example
|
||
((struct foo) @{x + y, 'a', 0@})
|
||
@end example
|
||
|
||
@noindent
|
||
This specifies @code{x + y} for field @code{a},
|
||
the character @samp{a} for field @code{b}'s element 0,
|
||
and the null character for field @code{b}'s element 1.
|
||
|
||
The parentheses around that constructor are not necessary, but we
|
||
recommend writing them to make the nesting of the containing
|
||
expression clearer.
|
||
|
||
You can also show the nesting of the two by writing it like
|
||
this:
|
||
|
||
@example
|
||
((struct foo) @{x + y, @{'a', 0@} @})
|
||
@end example
|
||
|
||
Each of those is equivalent to writing the following statement
|
||
expression (@pxref{Statement Exprs}):
|
||
|
||
@example
|
||
(@{
|
||
struct foo temp = @{x + y, 'a', 0@};
|
||
temp;
|
||
@})
|
||
@end example
|
||
|
||
You can also use field labels in the structure constructor to indicate
|
||
which fields you're specifying values for, instead of using the order
|
||
of the fields to specify that:
|
||
|
||
@example
|
||
(struct foo) @{.a = x + y, .b = @{'a', 0@}@}
|
||
@end example
|
||
|
||
You can also create a union value this way, but it is not especially
|
||
useful since that is equivalent to doing a cast:
|
||
|
||
@example
|
||
((union whosis) @{@var{value}@})
|
||
@r{is equivalent to}
|
||
((union whosis) (@var{value}))
|
||
@end example
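
A structure constructor is convenient for passing a structure value
to a function without declaring a variable for it.  A minimal sketch,
with hypothetical function names @code{total} and
@code{total_of_constructor}:

```c
struct foo { int a; char b[2]; };

static int
total (struct foo f)
{
  return f.a + f.b[0] + f.b[1];
}

int
total_of_constructor (int x, int y)
{
  /* The constructor makes a temporary struct foo for the call.  */
  return total ((struct foo) { .a = x + y, .b = { 'a', 0 } });
}
```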
|
||
|
||
@node Unnamed Types as Fields
|
||
@section Unnamed Types as Fields
|
||
@cindex unnamed structures
|
||
@cindex unnamed unions
|
||
@cindex structures, unnamed
|
||
@cindex unions, unnamed
|
||
|
||
A structure or a union can contain, as fields,
|
||
unnamed structures and unions. Here's an example:
|
||
|
||
@example
|
||
struct
|
||
@{
|
||
int a;
|
||
union
|
||
@{
|
||
int b;
|
||
float c;
|
||
@};
|
||
int d;
|
||
@} foo;
|
||
@end example
|
||
|
||
@noindent
|
||
You can access the fields of the unnamed union within @code{foo} as if they
|
||
were individual fields at the same level as the union definition:
|
||
|
||
@example
|
||
foo.a = 42;
|
||
foo.b = 47;
|
||
foo.c = 5.25; // @r{Overwrites the value in @code{foo.b}.}
|
||
foo.d = 314;
|
||
@end example
|
||
|
||
Avoid using field names that could cause ambiguity. For example, with
|
||
this definition:
|
||
|
||
@example
|
||
struct
|
||
@{
|
||
int a;
|
||
struct
|
||
@{
|
||
int a;
|
||
float b;
|
||
@};
|
||
@} foo;
|
||
@end example
|
||
|
||
@noindent
|
||
it is impossible to tell what @code{foo.a} refers to. GNU C reports
|
||
an error when a definition is ambiguous in this way.
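
One common use of an unnamed union, sketched here with hypothetical
names (@code{struct value}, @code{enum kind}, @code{as_float}), is to
pair it with a field that records which alternative currently holds
the data.

```c
enum kind { KIND_INT, KIND_FLOAT };

struct value
{
  enum kind kind;       /* Says which alternative below is in use.  */
  union
  {
    int b;
    float c;
  };
};

/* Hypothetical helper: read the value as a float, whichever
   alternative was stored.  */
float
as_float (struct value v)
{
  return v.kind == KIND_INT ? (float) v.b : v.c;
}
```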
|
||
|
||
@node Incomplete Types
|
||
@section Incomplete Types
|
||
@cindex incomplete types
|
||
@cindex types, incomplete
|
||
|
||
A type that has not been fully defined is called an @dfn{incomplete
|
||
type}. Structure and union types are incomplete when the code makes a
|
||
forward reference, such as @code{struct foo}, before defining the
|
||
type. An array type is incomplete when its length is unspecified.
|
||
|
||
You can't use an incomplete type to declare a variable or field, or
|
||
use it for a function parameter or return type. The operators
|
||
@code{sizeof} and @code{_Alignof} give errors when used on an
|
||
incomplete type.
|
||
|
||
However, you can define a pointer to an incomplete type, and declare a
|
||
variable or field with such a pointer type. In general, you can do
|
||
everything with such pointers except dereference them, increment or
|
||
decrement them, or do pointer arithmetic with them (not even @code{p +
|
||
0}). For example:
|
||
|
||
@example
|
||
struct mysterious_value;   /* @r{Forward reference; the type is incomplete.} */
extern void bar (struct mysterious_value *);
|
||
|
||
void
|
||
foo (struct mysterious_value *arg)
|
||
@{
|
||
bar (arg);
|
||
@}
|
||
|
||
@r{@dots{}}
|
||
|
||
@{
|
||
struct mysterious_value *p, **q;
|
||
|
||
p = *q;
|
||
foo (p);
|
||
@}
|
||
@end example
|
||
|
||
@noindent
|
||
These examples are valid because the code doesn't try to understand
|
||
what @code{p} points to; it just passes the pointer around.
|
||
(Presumably @code{bar} is defined in some other file that really does
|
||
have a definition for @code{struct mysterious_value}.) However,
|
||
dereferencing the pointer would get an error; that requires a
|
||
definition for the structure type.
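
The sketch below puts both sides in one file to show where the type
becomes complete.  All the names (@code{struct counter},
@code{counter_new}, @code{counter_next}, @code{use_twice}) are
hypothetical; in a real program the definition would normally live in
a separate source file.

```c
#include <stdlib.h>

struct counter;                         /* Incomplete type.  */
struct counter *counter_new (void);
int counter_next (struct counter *c);

/* This code only passes the pointer around, so it compiles
   while the type is still incomplete.  */
int
use_twice (struct counter *c)
{
  counter_next (c);
  return counter_next (c);
}

/* From here on the type is complete.  */
struct counter { int n; };

struct counter *
counter_new (void)
{
  struct counter *c = malloc (sizeof *c);
  if (c != NULL)
    c->n = 0;
  return c;
}

int
counter_next (struct counter *c)
{
  return ++c->n;      /* Dereferencing needs the complete type.  */
}
```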
|
||
|
||
@node Intertwined Incomplete Types
|
||
@section Intertwined Incomplete Types
|
||
|
||
When several structure types contain pointers to each other, you can
|
||
define the types in any order because pointers to types that come
|
||
later are incomplete types. Here is an example.
|
||
|
||
@example
|
||
/* @r{An employee record points to a group.} */
|
||
struct employee
|
||
@{
|
||
char *name;
|
||
@r{@dots{}}
|
||
struct group *group; /* @r{incomplete type.} */
|
||
@r{@dots{}}
|
||
@};
|
||
|
||
/* @r{An employee list points to employees.} */
|
||
struct employee_list
|
||
@{
|
||
struct employee *this_one;
|
||
struct employee_list *next; /* @r{incomplete type.} */
|
||
@r{@dots{}}
|
||
@};
|
||
|
||
/* @r{A group points to one employee_list.} */
|
||
struct group
|
||
@{
|
||
char *name;
|
||
@r{@dots{}}
|
||
struct employee_list *employees;
|
||
@r{@dots{}}
|
||
@};
|
||
@end example
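
Here is a cut-down, compilable version of those types, with the
elided fields left out, together with a hypothetical function
@code{group_size} that follows the pointers.

```c
#include <stddef.h>

struct employee
{
  const char *name;
  struct group *group;            /* Incomplete type at this point.  */
};

struct employee_list
{
  struct employee *this_one;
  struct employee_list *next;
};

struct group
{
  const char *name;
  struct employee_list *employees;
};

/* Hypothetical helper: count the employees listed in group G.  */
int
group_size (const struct group *g)
{
  int count = 0;
  const struct employee_list *l;

  for (l = g->employees; l != NULL; l = l->next)
    count++;
  return count;
}
```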
|
||
|
||
@node Type Tags
|
||
@section Type Tags
|
||
@cindex type tags
|
||
|
||
The name that follows @code{struct} (@pxref{Structures}), @code{union}
|
||
(@pxref{Unions}), or @code{enum} (@pxref{Enumeration Types}) is called
|
||
a @dfn{type tag}. In C, a type tag never conflicts with a variable
|
||
name or function name; the type tags have a separate @dfn{name space}.
|
||
Thus, there is no name conflict in this code:
|
||
|
||
@example
|
||
struct pair @{ int a, b; @};
|
||
int pair = 1;
|
||
@end example
|
||
|
||
@noindent
|
||
nor in this one:
|
||
|
||
@example
|
||
struct pair @{ int a, b; @} pair;
|
||
@end example
|
||
|
||
@noindent
|
||
where @code{pair} is both a structure type tag and a variable name.
|
||
|
||
However, @code{struct}, @code{union}, and @code{enum} share the same
|
||
name space of tags, so this is a conflict:
|
||
|
||
@example
|
||
struct pair @{ int a, b; @};
|
||
enum pair @{ c, d @};
|
||
@end example
|
||
|
||
@noindent
|
||
and so is this:
|
||
|
||
@example
|
||
struct pair @{ int a, b; @};
|
||
struct pair @{ int c, d; @};
|
||
@end example
|
||
|
||
When the code defines a type tag inside a block, the tag's scope is
|
||
limited to that block (as for local variables). Two definitions for
|
||
one type tag do not conflict if they are in different scopes; rather,
|
||
each is valid in its scope. For example,
|
||
|
||
@example
|
||
struct pair @{ int a, b; @};
|
||
|
||
void
|
||
pair_up_doubles (int len, double array[])
|
||
@{
|
||
struct pair @{ double a, b; @};
|
||
@r{@dots{}}
|
||
@}
|
||
@end example
|
||
|
||
@noindent
|
||
has two definitions for @code{struct pair} which do not conflict. The
|
||
one inside the function applies only within the definition of
|
||
@code{pair_up_doubles}. Within its scope, that definition
|
||
@dfn{shadows} the outer definition.
|
||
|
||
If @code{struct pair} appears inside the function body, before the
|
||
inner definition, it refers to the outer definition---the only one
|
||
that has been seen at that point. Thus, in this code,
|
||
|
||
@example
|
||
struct pair @{ int a, b; @};
|
||
|
||
void
|
||
pair_up_doubles (int len, double array[])
|
||
@{
|
||
struct two_pairs @{ struct pair *p, *q; @};
|
||
struct pair @{ double a, b; @};
|
||
@r{@dots{}}
|
||
@}
|
||
@end example
|
||
|
||
@noindent
|
||
the structure @code{two_pairs} has pointers to the outer definition of
|
||
@code{struct pair}, which is probably not desirable.
|
||
|
||
To prevent that, you can write @code{struct pair;} inside the function
|
||
body as a variable declaration with no variables. This is a
|
||
@dfn{forward declaration} of the type tag @code{pair}: it makes the
|
||
type tag local to the current block, with the details of the type to
|
||
come later. Here's an example:
|
||
|
||
@example
|
||
void
|
||
pair_up_doubles (int len, double array[])
|
||
@{
|
||
/* @r{Forward declaration for @code{pair}.} */
|
||
struct pair;
|
||
struct two_pairs @{ struct pair *p, *q; @};
|
||
/* @r{Give the details.} */
|
||
struct pair @{ double a, b; @};
|
||
@r{@dots{}}
|
||
@}
|
||
@end example
|
||
|
||
However, the cleanest practice is to avoid shadowing type tags.
|
||
|
||
@node Arrays
|
||
@chapter Arrays
|
||
@cindex array
|
||
@cindex elements of arrays
|
||
|
||
An @dfn{array} is a data object that holds a series of @dfn{elements},
|
||
all of the same data type. Each element is identified by its numeric
|
||
@var{index} within the array.
|
||
|
||
We presented arrays of numbers in the sample programs early in this
|
||
manual (@pxref{Array Example}). However, arrays can have elements of
|
||
any data type, including pointers, structures, unions, and other
|
||
arrays.
|
||
|
||
If you know another programming language, you may suppose that you know all
|
||
about arrays, but C arrays have special quirks, so in this chapter we
|
||
collect all the information about arrays in C@.
|
||
|
||
The elements of a C array are allocated consecutively in memory,
|
||
with no gaps between them. Each element is aligned as required
|
||
for its data type (@pxref{Type Alignment}).
|
||
|
||
@menu
|
||
* Accessing Array Elements:: How to access individual elements of an array.
|
||
* Declaring an Array:: How to name and reserve space for a new array.
|
||
* Strings:: A string in C is a special case of array.
|
||
* Array Type Designators:: Referring to a specific array type.
|
||
* Incomplete Array Types:: Naming, but not allocating, a new array.
|
||
* Limitations of C Arrays:: Arrays are not first-class objects.
|
||
* Multidimensional Arrays:: Arrays of arrays.
|
||
* Constructing Array Values:: Assigning values to an entire array at once.
|
||
* Arrays of Variable Length:: Declaring arrays of non-constant size.
|
||
@end menu
|
||
|
||
@node Accessing Array Elements
|
||
@section Accessing Array Elements
|
||
@cindex accessing array elements
|
||
@cindex array elements, accessing
|
||
|
||
If the variable @code{a} is an array, the @var{n}th element of
|
||
@code{a} is @code{a[@var{n}]}. You can use that expression to access
|
||
an element's value or to assign to it:
|
||
|
||
@example
|
||
x = a[5];
|
||
a[6] = 1;
|
||
@end example
|
||
|
||
@noindent
|
||
Since the variable @code{a} is an lvalue, @code{a[@var{n}]} is also an
|
||
lvalue.
|
||
|
||
The lowest valid index in an array is 0, @emph{not} 1, and the highest
|
||
valid index is one less than the number of elements.
|
||
|
||
The C language does not check whether array indices are in bounds, so
|
||
if the code uses an out-of-range index, it will access memory outside the
|
||
array.
|
||
|
||
@strong{Warning:} Using only valid index values in C is the
|
||
programmer's responsibility.
|
||
|
||
Array indexing in C is not a primitive operation: it is defined in
|
||
terms of pointer arithmetic and dereferencing. Now that we know
|
||
@emph{what} @code{a[i]} does, we can ask @emph{how} @code{a[i]} does
|
||
its job.
|
||
|
||
In C, @code{@var{x}[@var{y}]} is an abbreviation for
|
||
@code{*(@var{x}+@var{y})}. Thus, @code{a[i]} really means
|
||
@code{*(a+i)}. @xref{Pointers and Arrays}.
|
||
|
||
When an expression with array type (such as @code{a}) appears as part
|
||
of a larger C expression, it is converted automatically to a pointer
|
||
to element zero of that array. For instance, @code{a} in an
|
||
expression is equivalent to @code{&a[0]}. Thus, @code{*(a+i)} is
|
||
computed as @code{*(&a[0]+i)}.
|
||
|
||
Now we can analyze how that expression gives us the desired element of
|
||
the array. It makes a pointer to element 0 of @code{a}, advances it
|
||
by the value of @code{i}, and dereferences that pointer.
|
||
|
||
Another equivalent way to write the expression is @code{(&a[0])[i]}.
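
The following sketch checks that the forms discussed above all reach
the same element.  The function name @code{forms_agree} is
hypothetical.

```c
int
forms_agree (void)
{
  int a[4] = { 10, 20, 30, 40 };
  int i = 2;

  /* Each expression designates element 2 of a.  */
  return (a[i] == *(a + i)
          && a[i] == *(&a[0] + i)
          && a[i] == (&a[0])[i]);
}
```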
|
||
|
||
@node Declaring an Array
|
||
@section Declaring an Array
|
||
@cindex declaring an array
|
||
@cindex array, declaring
|
||
|
||
To make an array declaration, write @code{[@var{length}]} after the
|
||
name being declared. This construct is valid in the declaration of a
|
||
variable, a function parameter, a function value type (the value can't
|
||
be an array, but it can be a pointer to one), a structure field, or a
|
||
union alternative.
|
||
|
||
The surrounding declaration specifies the element type of the array;
|
||
that can be any type of data, but not @code{void} or a function type.
|
||
For instance,
|
||
|
||
@example
|
||
double a[5];
|
||
@end example
|
||
|
||
@noindent
|
||
declares @code{a} as an array of 5 @code{double}s.
|
||
|
||
@example
|
||
struct foo bstruct[length];
|
||
@end example
|
||
|
||
@noindent
|
||
declares @code{bstruct} as an array of @code{length} objects of type
|
||
@code{struct foo}. A variable array size like this is allowed when
|
||
the array is not file-scope.
|
||
|
||
Other declaration constructs can nest within the array declaration
|
||
construct. For instance:
|
||
|
||
@example
|
||
struct foo *b[length];
|
||
@end example
|
||
|
||
@noindent
|
||
declares @code{b} as an array of @code{length} pointers to
|
||
@code{struct foo}. This shows that the length need not be a constant
|
||
(@pxref{Arrays of Variable Length}).
|
||
|
||
@example
|
||
double (*c)[5];
|
||
@end example
|
||
|
||
@noindent
|
||
declares @code{c} as a pointer to an array of 5 @code{double}s, and
|
||
|
||
@example
|
||
char *(*f (int))[5];
|
||
@end example
|
||
|
||
@noindent
|
||
declares @code{f} as a function taking an @code{int} argument and
|
||
returning a pointer to an array of 5 strings (pointers to
|
||
@code{char}s).
|
||
|
||
@example
|
||
double aa[5][10];
|
||
@end example
|
||
|
||
@noindent
|
||
declares @code{aa} as an array of 5 elements, each of which is an
|
||
array of 10 @code{double}s. This shows how to declare a
|
||
multidimensional array in C (@pxref{Multidimensional Arrays}).
|
||
|
||
All these declarations specify the array's length, which is needed in
|
||
these cases in order to allocate storage for the array.
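
Because the declared length is part of the array's type, a program
can recover it with @code{sizeof}.  This sketch uses the declaration
of @code{aa} from above; the function names @code{aa_rows} and
@code{aa_columns} are hypothetical.

```c
#include <stddef.h>

size_t
aa_rows (void)
{
  double aa[5][10];

  /* Total size divided by the size of one element (one row).  */
  return sizeof aa / sizeof aa[0];
}

size_t
aa_columns (void)
{
  double aa[5][10];

  /* Size of one row divided by the size of one double.  */
  return sizeof aa[0] / sizeof aa[0][0];
}
```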
|
||
|
||
@node Strings
|
||
@section Strings
|
||
@cindex string
|
||
|
||
A string in C is a sequence of elements of type @code{char},
|
||
terminated with the null character, the character with code zero.
|
||
However, the C code that operates on strings normally uses
|
||
the pointer type @code{char *} to do it.
|
||
|
||
Programs often need to use strings with specific, fixed contents. To
|
||
write one in a C program, use a @dfn{string constant} such as
|
||
@code{"Take me to your leader!"}. The data type of a string constant
|
||
is an array of @code{char}, which in most contexts is converted
automatically to @code{char *}.  For the full syntactic details of
writing string constants, @ref{String Constants}.
|
||
|
||
To declare a place to store a non-constant string, declare an array of
|
||
@code{char}. Keep in mind that it must include one extra @code{char}
|
||
for the terminating null. For instance,
|
||
|
||
@example
|
||
char text[] = @{ 'H', 'e', 'l', 'l', 'o', 0 @};
|
||
@end example
|
||
|
||
@noindent
|
||
declares an array named @samp{text} with six elements---five letters
|
||
and the terminating null character. An equivalent way to get the same
|
||
result is this,
|
||
|
||
@example
|
||
char text[] = "Hello";
|
||
@end example
|
||
|
||
@noindent
|
||
which copies the elements of the string constant, including @emph{its}
|
||
terminating null character.
|
||
|
||
@example
|
||
char message[200];
|
||
@end example
|
||
|
||
@noindent
|
||
declares an array long enough to hold a string of 199 ASCII characters
|
||
plus the terminating null character.
|
||
|
||
When you store a string into @code{message} be sure to check or prove
|
||
that the length does not exceed its size. For example,
|
||
|
||
@example
|
||
void
|
||
set_message (char *text)
|
||
@{
|
||
int i;
|
||
/* @r{Recall that @code{message} is declared above.} */
|
||
for (i = 0; i < sizeof (message); i++)
|
||
@{
|
||
message[i] = text[i];
|
||
if (text[i] == 0)
|
||
return;
|
||
@}
|
||
fatal_error ("Message is too long for `message'\n");
|
||
@}
|
||
@end example
|
||
|
||
It's easy to do this with the standard library function
|
||
@code{strncpy}, which fills out the whole destination array (up to a
|
||
specified length) with null characters. Thus, if the last character
|
||
of the destination is not null, the string did not fit. Many system
|
||
libraries, including the GNU C library, hand-optimize @code{strncpy}
|
||
to run faster than an explicit @code{for}-loop.
|
||
|
||
Here's what the code looks like:
|
||
|
||
@example
|
||
void
|
||
set_message (char *text)
|
||
@{
|
||
strncpy (message, text, sizeof (message));
|
||
if (message[sizeof (message) - 1] != 0)
|
||
fatal_error ("Message is too long for `message'\n");
|
||
@}
|
||
@end example
|
||
|
||
@xref{String and Array Utilities, The GNU C Library, , libc, The GNU C
|
||
Library Reference Manual}, for more information about the standard
|
||
library functions for operating on strings.
|
||
|
||
You can avoid putting a fixed length limit on strings you construct or
|
||
operate on by allocating the space for them dynamically.
|
||
@xref{Dynamic Memory Allocation}.
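
As a minimal sketch of that approach, the hypothetical function
@code{copy_string} below allocates exactly as much space as the
string needs, including the terminating null character.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of TEXT,
   or NULL if the allocation fails.  */
char *
copy_string (const char *text)
{
  size_t size = strlen (text) + 1;    /* One extra for the null.  */
  char *copy = malloc (size);

  if (copy != NULL)
    memcpy (copy, text, size);
  return copy;
}
```

The caller is responsible for passing the result to @code{free} when
the copy is no longer needed.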
|
||
|
||
@node Array Type Designators
|
||
@section Array Type Designators
|
||
|
||
Every C type has a type designator, which you make by deleting the
|
||
variable name and the semicolon from a declaration (@pxref{Type
|
||
Designators}). The designators for array types follow this rule, but
|
||
they may appear surprising.
|
||
|
||
@example
|
||
@r{type} int a[5]; @r{designator} int [5]
|
||
@r{type} double a[5][3]; @r{designator} double [5][3]
|
||
@r{type} struct foo *a[5]; @r{designator} struct foo *[5]
|
||
@end example
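
One place these designators appear is as the operand of
@code{sizeof}.  In this sketch, @code{struct foo} is a placeholder
definition and the function name @code{designators_agree} is
hypothetical.

```c
struct foo { int x; };    /* Placeholder type for illustration.  */

int
designators_agree (void)
{
  int a[5];
  double m[5][3];
  struct foo *p[5];

  /* Each designator names the same type as the variable beside it.  */
  return (sizeof (int [5]) == sizeof a
          && sizeof (double [5][3]) == sizeof m
          && sizeof (struct foo *[5]) == sizeof p);
}
```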
|
||
|
||
@node Incomplete Array Types
|
||
@section Incomplete Array Types
|
||
@cindex incomplete array types
|
||
@cindex array types, incomplete
|
||
|
||
An array is equivalent, for most purposes, to a pointer to its zeroth
|
||
element. When that is true, the length of the array is irrelevant.
|
||
The length needs to be known only for allocating space for the array, or
|
||
for @code{sizeof} and @code{typeof} (@pxref{Auto Type}). Thus, in some
|
||
contexts C allows you to omit the length:
|
||
|
||
@itemize @bullet
|
||
@item
|
||
An @code{extern} declaration says how to refer to a variable allocated
|
||
elsewhere. It does not need to allocate space for the variable,
|
||
so if it is an array, you can omit the length. For example,
|
||
|
||
@example
|
||
extern int foo[];
|
||
@end example
|
||
|
||
@item
|
||
When declaring a function parameter as an array, the argument value
|
||
passed to the function is really a pointer to the array's zeroth
|
||
element.  Since this value does not say how long the array really is,
there is no need to declare the length.  For example,
|
||
|
||
@example
|
||
int
|
||
func (int foo[])
|
||
@end example
|
||
@end itemize
|
||
|
||
These declarations are examples of @dfn{incomplete} array types, types
|
||
that are not fully specified. The incompleteness makes no difference
|
||
for accessing elements of the array, but it matters for some other
|
||
things. For instance, @code{sizeof} is not allowed on an incomplete
|
||
type.
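
The sketch below combines both cases in one file.  The names
@code{table}, @code{sum}, @code{first_entry} and @code{table_sum} are
hypothetical.  Element access works while the type is incomplete, but
@code{sizeof} is used only after the definition supplies the length.

```c
extern int table[];    /* Incomplete type: the length is not known yet.  */

int
sum (int values[], int length)
{
  int total = 0;

  for (int i = 0; i < length; i++)
    total += values[i];
  return total;
}

int
first_entry (void)
{
  return table[0];     /* Valid even though the type is incomplete.  */
}

int table[] = { 1, 2, 3 };   /* This definition supplies the length.  */

int
table_sum (void)
{
  /* sizeof is allowed here because the type is now complete.  */
  return sum (table, sizeof table / sizeof table[0]);
}
```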
|
||
|
||
With multidimensional arrays, only the first dimension can be omitted.
|
||
For example, suppose we want to represent the positions of pieces on a
|
||
chessboard which has the usual 8 files (columns), but more (or fewer)
|
||
ranks (rows) than the usual 8. This declaration could hold a pointer
|
||
to a two-dimensional array that can hold that data. Each element of
|
||
the array holds one row.
|
||
|
||
@example
|
||
struct chesspiece *funnyboard[][8];
|
||
@end example
|
||
|
||
Since it is just a pointer to the start of an array, its type can be
|
||
incomplete, but it must state how big each array element is---the
|
||
number of elements in each row.
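
The same rule applies to function parameters.  In this sketch the
elements are plain @code{int} values rather than pointers to
@code{struct chesspiece}, to keep it short; the function name
@code{count_occupied} is hypothetical.

```c
/* Hypothetical helper: count the nonzero squares on a board that
   has 8 files and RANKS ranks.  The first dimension is omitted;
   the second must be given.  */
int
count_occupied (int board[][8], int ranks)
{
  int count = 0;

  for (int r = 0; r < ranks; r++)
    for (int f = 0; f < 8; f++)
      if (board[r][f] != 0)
        count++;
  return count;
}
```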
|
||
|
||
@node Limitations of C Arrays
|
||
@section Limitations of C Arrays
|
||
@cindex limitations of C arrays
|
||
@cindex first-class object
|
||
|
||
Arrays have quirks in C because they are not ``first-class objects'':
|
||
there is no way in C to operate on an array as a unit.
|
||
|
||
The other composite objects in C, structures and unions, are
|
||
first-class objects: a C program can copy a structure or union value
|
||
in an assignment, or pass one as an argument to a function, or make a
|
||
function return one. You can't do those things with an array in C@.
|
||
That is because a value you can operate on never has an array type.
|
||
|
||
An expression in C can have an array type, but that doesn't produce
|
||
the array as a value. Instead it is converted automatically to a
|
||
pointer to the array's element at index zero. The code can operate
|
||
on the pointer, and through that on individual elements of the array,
|
||
but it can't get and operate on the array as a unit.
|
||
|
||
There are three exceptions to this conversion rule, but none of them
|
||
offers a way to operate on the array as a whole.
|
||
|
||
First, @samp{&} applied to an expression with array type gives you the
|
||
address of the array, as a pointer to the array type. However, you can't operate on the
|
||
whole array that way---if you apply @samp{*} to get the array back,
|
||
that expression converts, as usual, to a pointer to its zeroth
|
||
element.
|
||
|
||
Second, the operators @code{sizeof}, @code{_Alignof}, and
|
||
@code{typeof} do not convert the array to a pointer; they leave it as
|
||
an array. But they don't operate on the array's data---they only give
|
||
information about its type.
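
As a rough illustration of this second exception, the following sketch
contrasts @code{sizeof} applied to an array with @code{sizeof} applied
to a pointer to its zeroth element. The names @code{table},
@code{table_bytes}, @code{table_length} and @code{pointer_bytes} are
invented for this example.

```c
#include <stddef.h>

/* A hypothetical array used only for this illustration.  */
static int table[10];

/* sizeof sees the array type, so this is the size of all 10 elements.  */
size_t
table_bytes (void)
{
  return sizeof table;
}

/* Dividing by the element size gives the number of elements.  */
size_t
table_length (void)
{
  return sizeof table / sizeof table[0];
}

/* Here the array has already been converted to a pointer,
   so sizeof reports the size of the pointer itself.  */
size_t
pointer_bytes (void)
{
  int *p = table;
  return sizeof p;
}
```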
|
||
|
||
Third, a string constant used as an initializer for an array is not
|
||
converted to a pointer---rather, the declaration copies the
|
||
@emph{contents} of that string in that one special case.
|
||
|
||
You @emph{can} copy the contents of an array, just not with an
|
||
assignment operator. You can do it by calling the library function
|
||
@code{memcpy} or @code{memmove} (@pxref{Copying and Concatenation, The
|
||
GNU C Library, , libc, The GNU C Library Reference Manual}). Also,
|
||
when a structure contains just an array, you can copy that structure.
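
Both techniques can be sketched as follows. The structure type
@code{struct int4} and the functions @code{copy_ints} and
@code{copy_wrapped} are hypothetical names chosen for this example;
only @code{memcpy} comes from the C library.

```c
#include <string.h>

/* Hypothetical wrapper: a structure containing just an array.  */
struct int4
{
  int v[4];
};

/* Copy N ints from SRC to DST; the two regions must not overlap.  */
void
copy_ints (int *dst, const int *src, size_t n)
{
  memcpy (dst, src, n * sizeof *src);
}

/* Structure assignment copies the whole array inside it.  */
struct int4
copy_wrapped (struct int4 original)
{
  struct int4 duplicate;
  duplicate = original;
  return duplicate;
}
```

Note that @code{copy_wrapped} also passes and returns the array by
value, since it travels inside the structure.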
|
||
|
||
An array itself is an lvalue if it is a declared variable, or part of
|
||
a structure or union that is an lvalue. When you construct an array
|
||
from elements (@pxref{Constructing Array Values}), that array is not
|
||
an lvalue.
|
||
|
||
@node Multidimensional Arrays
|
||
@section Multidimensional Arrays
|
||
@cindex multidimensional arrays
|
||
@cindex array, multidimensional
|
||
|
||
Strictly speaking, all arrays in C are unidimensional. However, you
|
||
can create an array of arrays, which is more or less equivalent to a
|
||
multidimensional array. For example,
|
||
|
||
@example
|
||
struct chesspiece *board[8][8];
|
||
@end example
|
||
|
||
@noindent
|
||
declares an array of 8 arrays of 8 pointers to @code{struct
|
||
chesspiece}. This data type could represent the state of a chess
|
||
game. To access one square's contents requires two array index
|
||
operations, one for each dimension. For instance, you can write
|
||
@code{board[row][column]}, assuming @code{row} and @code{column}
|
||
are variables with integer values in the proper range.
|
||
|
||
How does C understand @code{board[row][column]}? First of all,
|
||
@code{board} is converted automatically to a pointer to the zeroth
|
||
element (at index zero) of @code{board}. Adding @code{row} to that
|
||
makes it point to the desired element. Thus, @code{board[row]}'s
|
||
value is an element of @code{board}---an array of 8 pointers.
|
||
|
||
However, as an expression with array type, it is converted
|
||
automatically to a pointer to the array's zeroth element. The second
|
||
array index operation, @code{[column]}, accesses the chosen element
|
||
from that array.
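
The two steps just described can be spelled out with explicit pointer
arithmetic. This is only a sketch: it uses a hypothetical array
@code{grid} of @code{int} rather than @code{board}, and the function
name @code{same_element} is invented here.

```c
/* Hypothetical 3-by-4 array of int, standing in for board.  */
static int grid[3][4];

/* Return 1 if grid[row][col] and the spelled-out pointer
   arithmetic designate the same element.  */
int
same_element (int row, int col)
{
  int (*rowp)[4] = grid + row;   /* Points to the chosen subarray.  */
  int *eltp = *rowp + col;       /* Points to the chosen element.  */
  return eltp == &grid[row][col];
}
```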
|
||
|
||
As this shows, pointer-to-array types are meaningful in C@.
|
||
You can declare a variable that points to a row in a chess board
|
||
like this:
|
||
|
||
@example
|
||
struct chesspiece *(*rowptr)[8];
|
||
@end example
|
||
|
||
@noindent
|
||
This points to an array of 8 pointers to @code{struct chesspiece}.
|
||
You can assign to it as follows:
|
||
|
||
@example
|
||
rowptr = &board[5];
|
||
@end example
|
||
|
||
The dimensions don't have to be equal in length. Here we declare
|
||
@code{statepop} as an array to hold the population of each state in
|
||
the United States for each year since 1900:
|
||
|
||
@example
|
||
#define NSTATES 50
|
||
@{
|
||
int nyears = current_year - 1900 + 1;
|
||
int statepop[NSTATES][nyears];
|
||
@r{@dots{}}
|
||
@}
|
||
@end example
|
||
|
||
The variable @code{statepop} is an array of @code{NSTATES} subarrays,
|
||
each indexed by the year (counting from 1900). Thus, to get the
|
||
element for a particular state and year, we must subscript it first
|
||
by the number that indicates the state, and second by the index for
|
||
the year:
|
||
|
||
@example
|
||
statepop[state][year - 1900]
|
||
@end example
|
||
|
||
@cindex array, layout in memory
|
||
The subarrays within the multidimensional array are allocated
|
||
consecutively in memory, and within each subarray, its elements are
|
||
allocated consecutively in memory. The most efficient way to process
|
||
all the elements in the array is to scan the last subscript in the
|
||
innermost loop. This means consecutive accesses go to consecutive
|
||
memory locations, which optimizes use of the processor's memory cache.
|
||
For example:
|
||
|
||
@example
|
||
int total = 0;
|
||
float average;
|
||
|
||
for (int state = 0; state < NSTATES; ++state)
|
||
@{
|
||
for (int year = 0; year < nyears; ++year)
|
||
@{
|
||
total += statepop[state][year];
|
||
@}
|
||
@}
|
||
|
||
average = (float) total / nyears;
|
||
@end example
|
||
|
||
C's layout for multidimensional arrays is different from Fortran's
|
||
layout. In Fortran, a multidimensional array is not an array of
|
||
arrays; rather, multidimensional arrays are a primitive feature, and
|
||
it is the first index that varies most rapidly between consecutive
|
||
memory locations. Thus, the memory layout of a 50x114 array in C
|
||
matches that of a 114x50 array in Fortran.
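
With C's layout, the element with subscripts @var{i} and @var{j} in an
array of @var{ncols} columns lies @code{@var{i} * @var{ncols} + @var{j}}
elements from the start. The following sketch measures that position
from the actual addresses; the names @code{NROWS}, @code{NCOLS},
@code{a} and @code{flat_index} are assumptions made for the example.

```c
#include <stddef.h>

enum { NROWS = 50, NCOLS = 114 };

static int a[NROWS][NCOLS];

/* Return the position of a[i][j], counted in elements from the
   start of the array, as measured from actual addresses.  */
size_t
flat_index (int i, int j)
{
  size_t offset = (size_t) ((char *) &a[i][j] - (char *) a);
  return offset / sizeof a[0][0];
}
```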
|
||
|
||
@node Constructing Array Values
|
||
@section Constructing Array Values
|
||
@cindex constructing array values
|
||
@cindex array values, constructing
|
||
|
||
You can construct an array from elements by writing them inside
|
||
braces, and preceding all that with the array type's designator in
|
||
parentheses. There is no need to specify the array length, since the
|
||
number of elements determines that. The constructor looks like this:
|
||
|
||
@example
|
||
(@var{elttype}[]) @{ @var{elements} @};
|
||
@end example
|
||
|
||
Here is an example, which constructs an array of string pointers:
|
||
|
||
@example
|
||
(char *[]) @{ "x", "y", "z" @};
|
||
@end example
|
||
|
||
That's equivalent in effect to declaring an array with the same
|
||
initializer, like this:
|
||
|
||
@example
|
||
char *array[] = @{ "x", "y", "z" @};
|
||
@end example
|
||
|
||
and then using the array.
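
One plausible use for an array constructor is to pass a short list of
values to a function without declaring a variable for it. In this
sketch the functions @code{sum} and @code{sum_of_three} are
hypothetical; the constructed array is converted to a pointer to its
zeroth element, like any other array-valued expression.

```c
#include <stddef.h>

/* Add up the N elements that V points to.  */
int
sum (const int *v, size_t n)
{
  int total = 0;
  for (size_t i = 0; i < n; i++)
    total += v[i];
  return total;
}

/* Build an unnamed array from the three arguments and pass it.  */
int
sum_of_three (int a, int b, int c)
{
  return sum ((int[]) { a, b, c }, 3);
}
```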
|
||
|
||
If all the elements are simple constant expressions, or made up of
|
||
such, then the compound literal can be coerced to a pointer to its
|
||
zeroth element and used to initialize a file-scope variable
|
||
(@pxref{File-Scope Variables}), as shown here:
|
||
|
||
@example
|
||
char **foo = (char *[]) @{ "x", "y", "z" @};
|
||
@end example
|
||
|
||
@noindent
|
||
The data type of @code{foo} is @code{char **}, which is a pointer
|
||
type, not an array type. The declaration is equivalent to defining
|
||
and then using an array-type variable:
|
||
|
||
@example
|
||
char *nameless_array[] = @{ "x", "y", "z" @};
|
||
char **foo = &nameless_array[0];
|
||
@end example
|
||
|
||
@node Arrays of Variable Length
|
||
@section Arrays of Variable Length
|
||
@cindex array of variable length
|
||
@cindex variable-length arrays
|
||
|
||
In GNU C, you can declare variable-length arrays like any other
|
||
arrays, but with a length that is not a constant expression. The
|
||
storage is allocated at the point of declaration and deallocated when
|
||
the block scope containing the declaration exits. For example:
|
||
|
||
@example
|
||
#include <stdio.h> /* @r{Defines @code{FILE}.} */
|
||
#include <string.h> /* @r{Declares @code{strlen}, @code{strcpy}, @code{strcat}.} */
|
||
|
||
FILE *
|
||
concat_fopen (char *s1, char *s2, char *mode)
|
||
@{
|
||
char str[strlen (s1) + strlen (s2) + 1];
|
||
strcpy (str, s1);
|
||
strcat (str, s2);
|
||
return fopen (str, mode);
|
||
@}
|
||
@end example
|
||
|
||
@noindent
|
||
(This uses some standard library functions; see @ref{String and Array
|
||
Utilities, , , libc, The GNU C Library Reference Manual}.)
|
||
|
||
The length of an array is computed once when the storage is allocated
|
||
and is remembered for the scope of the array in case it is used in
|
||
@code{sizeof}.
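
The following sketch shows one consequence: changing the variable that
supplied the length does not change the array afterward. The function
name @code{vla_size_after_change} is invented for this example, and
its argument is assumed to be positive.

```c
#include <stddef.h>

/* Return the size of a variable-length array whose length
   variable is changed after the declaration.  N must be positive.  */
size_t
vla_size_after_change (int n)
{
  int len = n;
  char buf[len];        /* Length is fixed here, at the declaration.  */
  len *= 2;             /* This does not resize buf.  */
  buf[0] = '\0';
  return sizeof buf;    /* Still the original length.  */
}
```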
|
||
|
||
@strong{Warning:} Don't allocate a variable-length array if the size
|
||
might be very large (more than 100,000), or in a recursive function,
|
||
because that is likely to cause stack overflow. Allocate the array
|
||
dynamically instead (@pxref{Dynamic Memory Allocation}).
|
||
|
||
Jumping or breaking out of the scope of the array name deallocates the
|
||
storage. Jumping into the scope is not allowed; that gives an error
|
||
message.
|
||
|
||
You can also use variable-length arrays as arguments to functions:
|
||
|
||
@example
|
||
struct entry
|
||
tester (int len, char data[len][len])
|
||
@{
|
||
@r{@dots{}}
|
||
@}
|
||
@end example
|
||
|
||
As usual, a function argument declared with an array type
|
||
is really a pointer to an array that already exists.
|
||
Calling the function does not allocate the array, so there's no
|
||
particular danger of stack overflow in using this construct.
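
Here is a small sketch of such a function, with a hypothetical name
@code{trace} and an element type of @code{int} instead of
@code{char}. The caller supplies an array that already exists, and
the function indexes it using the length it was given.

```c
/* Sum the main diagonal of a LEN-by-LEN array.  */
int
trace (int len, int data[len][len])
{
  int total = 0;
  for (int i = 0; i < len; i++)
    total += data[i][i];
  return total;
}
```

A caller with @code{int m[3][3]} could write @code{trace (3, m)}.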
|
||
|
||
To pass the array first and the length afterward, use a forward
|
||
declaration in the function's parameter list (another GNU extension).
|
||
For example,
|
||
|
||
@example
|
||
struct entry
|
||
tester (int len; char data[len][len], int len)
|
||
@{
|
||
@r{@dots{}}
|
||
@}
|
||
@end example
|
||
|
||
The @code{int len} before the semicolon is a @dfn{parameter forward
|
||
declaration}, and it serves the purpose of making the name @code{len}
|
||
known when the declaration of @code{data} is parsed.
|
||
|
||
You can write any number of such parameter forward declarations in the
|
||
parameter list. They can be separated by commas or semicolons, but
|
||
the last one must end with a semicolon, which is followed by the
|
||
``real'' parameter declarations. Each forward declaration must match
|
||
a ``real'' declaration in parameter name and data type. ISO C11 does
|
||
not support parameter forward declarations.
|
||
|
||
@node Enumeration Types
|
||
@chapter Enumeration Types
|
||
@cindex enumeration types
|
||
@cindex types, enumeration
|
||
@cindex enumerator
|
||
|
||
An @dfn{enumeration type} represents a limited set of integer values,
|
||
each with a name. It is effectively equivalent to a primitive integer
|
||
type.
|
||
|
||
Suppose we have a list of possible emotional states to store in an
|
||
integer variable. We can give names to these alternative values with
|
||
an enumeration:
|
||
|
||
@example
|
||
enum emotion_state @{ neutral, happy, sad, worried,
|
||
calm, nervous @};
|
||
@end example
|
||
|
||
@noindent
|
||
(Never mind that this is a simplistic way to classify emotional states;
|
||
it's just a code example.)
|
||
|
||
The names inside the enumeration are called @dfn{enumerators}. The
|
||
enumeration type defines them as constants, and their values are
|
||
consecutive integers; @code{neutral} is 0, @code{happy} is 1,
|
||
@code{sad} is 2, and so on. Alternatively, you can specify values for
|
||
the enumerators explicitly like this:
|
||
|
||
@example
|
||
enum emotion_state @{ neutral = 2, happy = 5,
|
||
sad = 20, worried = 10,
|
||
calm = -5, nervous = -300 @};
|
||
@end example
|
||
|
||
Each enumerator which does not specify a value gets value zero
|
||
(if it is at the beginning) or the next consecutive integer.
|
||
|
||
@example
|
||
/* @r{@code{neutral} is 0 by default,}
|
||
@r{and @code{worried} is 21 by default.} */
|
||
enum emotion_state @{ neutral,
|
||
happy = 5, sad = 20, worried,
|
||
calm = -5, nervous = -300 @};
|
||
@end example
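
Since enumerators are integer constants, they can serve as
@code{case} labels. The sketch below reuses the enumeration above and
adds a hypothetical function @code{emotion_name}, which is not part of
any library.

```c
enum emotion_state { neutral,
                     happy = 5, sad = 20, worried,
                     calm = -5, nervous = -300 };

/* Return a printable name for E, or "unknown" if E does not
   match any enumerator.  */
const char *
emotion_name (enum emotion_state e)
{
  switch (e)
    {
    case neutral: return "neutral";
    case happy:   return "happy";
    case sad:     return "sad";
    case worried: return "worried";
    case calm:    return "calm";
    case nervous: return "nervous";
    }
  return "unknown";
}
```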
|
||
|
||
If an enumerator is obsolete, you can specify that using it should
|
||
cause a warning, by including an attribute in the enumerator's
|
||
declaration. Here is how @code{happy} would look with this
|
||
attribute:
|
||
|
||
@example
|
||
happy __attribute__
|
||
((deprecated
|
||
("impossible under plutocratic rule")))
|
||
= 5,
|
||
@end example
|
||
|
||
@xref{Attributes}.
|
||
|
||
You can declare variables with the enumeration type:
|
||
|
||
@example
|
||
enum emotion_state feelings_now;
|
||
@end example
|
||
|
||
In the C code itself, this is equivalent to declaring the variable
|
||
@code{int}. (If all the enumeration values are nonnegative, it is
|
||
equivalent to @code{unsigned int}.) However, declaring it with the
|
||
enumeration type has an advantage in debugging, because GDB knows it
|
||
should display the current value of the variable using the
|
||
corresponding name. If the variable's type is @code{int}, GDB can
|
||
only show the value as a number.
|
||
|
||
The identifier that follows @code{enum} is called a @dfn{type tag}
|
||
since it distinguishes different enumeration types. Type tags are in
|
||
a separate name space and belong to scopes like most other names in C@.
|
||
@xref{Type Tags}, for explanation.
|
||
|
||
You can predeclare an @code{enum} type tag like a structure or union
|
||
type tag, like this:
|
||
|
||
@example
|
||
enum foo;
|
||
@end example
|
||
|
||
@noindent
|
||
The @code{enum} type is incomplete until you finish defining it.
|
||
|
||
You can optionally include a trailing comma at the end of a list of
|
||
enumeration values:
|
||
|
||
@example
|
||
enum emotion_state @{ neutral, happy, sad, worried,
|
||
calm, nervous, @};
|
||
@end example
|
||
|
||
@noindent
|
||
This is useful in some macro definitions, since it enables you to
|
||
assemble the list of enumerators without knowing which one is last.
|
||
The extra comma does not change the meaning of the enumeration in any
|
||
way.
|
||
|
||
@node Defining Typedef Names
|
||
@chapter Defining Typedef Names
|
||
@cindex typedef names
|
||
@findex typedef
|
||
|
||
You can define a data type keyword as an alias for any type, and then
|
||
use the alias syntactically like a built-in type keyword such as
|
||
@code{int}. You do this using @code{typedef}, so these aliases are
|
||
also called @dfn{typedef names}.
|
||
|
||
@code{typedef} is followed by text that looks just like a variable
|
||
declaration, but instead of declaring variables it defines data type
|
||
keywords.
|
||
|
||
Here's how to define @code{fooptr} as a typedef alias for the type
|
||
@code{struct foo *}, then declare @code{x} and @code{y} as variables
|
||
with that type:
|
||
|
||
@example
|
||
typedef struct foo *fooptr;
|
||
|
||
fooptr x, y;
|
||
@end example
|
||
|
||
@noindent
|
||
That declaration is equivalent to the following one:
|
||
|
||
@example
|
||
struct foo *x, *y;
|
||
@end example
|
||
|
||
You can define a typedef alias for any type. For instance, this makes
|
||
@code{frobcount} an alias for type @code{int}:
|
||
|
||
@example
|
||
typedef int frobcount;
|
||
@end example
|
||
|
||
@noindent
|
||
This doesn't define a new type distinct from @code{int}. Rather,
|
||
@code{frobcount} is another name for the type @code{int}. Once the
|
||
variable is declared, it makes no difference which name the
|
||
declaration used.
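
One way to see this is that a pointer to a @code{frobcount} variable
can be stored in an @code{int *} variable with no cast. The function
name @code{read_through_alias} below is made up for this sketch.

```c
typedef int frobcount;

/* Because frobcount and int are the same type, a pointer to one
   is a pointer to the other; no cast is needed.  */
int
read_through_alias (void)
{
  frobcount f = 7;
  int *p = &f;
  return *p;
}
```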
|
||
|
||
There is a syntactic difference, however, between @code{frobcount} and
|
||
@code{int}: A typedef name cannot be used with
|
||
@code{signed}, @code{unsigned}, @code{long} or @code{short}. It has
|
||
to specify the type all by itself. So you can't write this:
|
||
|
||
@example
|
||
unsigned frobcount f1; /* @r{Error!} */
|
||
@end example
|
||
|
||
But you can write this:
|
||
|
||
@example
|
||
typedef unsigned int unsigned_frobcount;
|
||
|
||
unsigned_frobcount f1;
|
||
@end example
|
||
|
||
In other words, a typedef name is not an alias for @emph{a keyword}
|
||
such as @code{int}. It stands for a @emph{type}, and that could be
|
||
the type @code{int}.
|
||
|
||
Typedef names are in the same namespace as functions and variables, so
|
||
you can't use the same name for a typedef and a function, or a typedef
|
||
and a variable. When a typedef is declared inside a code block, it is
|
||
in scope only in that block.
|
||
|
||
@strong{Warning:} Avoid defining typedef names that end in @samp{_t},
|
||
because many of these have standard meanings.
|
||
|
||
You can redefine a typedef name to the exact same type as its first
|
||
definition, but you cannot redefine a typedef name to a
|
||
different type, even if the two types are compatible. For example, this
|
||
is valid:
|
||
|
||
@example
|
||
typedef int frobcount;
|
||
typedef int frotzcount;
|
||
typedef frotzcount frobcount;
|
||
typedef frobcount frotzcount;
|
||
@end example
|
||
|
||
@noindent
|
||
because each typedef name is always defined with the same type
|
||
(@code{int}), but this is not valid:
|
||
|
||
@example
|
||
enum foo @{f1, f2, f3@};
|
||
typedef enum foo frobcount;
|
||
typedef int frobcount;
|
||
@end example
|
||
|
||
@noindent
|
||
Even though the type @code{enum foo} is compatible with @code{int},
|
||
they are not the @emph{same} type.
|
||
|
||
@node Statements
|
||
@chapter Statements
|
||
@cindex statements
|
||
|
||
A @dfn{statement} specifies computations to be done for effect; it
|
||
does not produce a value, as an expression would. In general a
|
||
statement ends with a semicolon (@samp{;}), but blocks (which are
|
||
statements, more or less) are an exception to that rule.
|
||
@ifnottex
|
||
@xref{Blocks}.
|
||
@end ifnottex
|
||
|
||
The places to use statements are inside a block, and inside a
|
||
complex statement. A @dfn{complex statement} contains one or two
|
||
components that are nested statements. Each such component must
|
||
consist of one and only one statement. The way to put multiple
|
||
statements in such a component is to group them into a @dfn{block}
|
||
(@pxref{Blocks}), which counts as one statement.
|
||
|
||
The following sections describe the various kinds of statement.
|
||
|
||
@menu
|
||
* Expression Statement:: Evaluate an expression, as a statement,
|
||
usually done for a side effect.
|
||
* if Statement:: Basic conditional execution.
|
||
* if-else Statement:: Multiple branches for conditional execution.
|
||
* Blocks:: Grouping multiple statements together.
|
||
* return Statement:: Return a value from a function.
|
||
* Loop Statements:: Repeatedly executing a statement or block.
|
||
* switch Statement:: Multi-way conditional choices.
|
||
* switch Example:: A plausible example of using @code{switch}.
|
||
* Duffs Device:: A special way to use @code{switch}.
|
||
* Case Ranges:: Ranges of values for @code{switch} cases.
|
||
* Null Statement:: A statement that does nothing.
|
||
* goto Statement:: Jump to another point in the source code,
|
||
identified by a label.
|
||
* Local Labels:: Labels with limited scope.
|
||
* Labels as Values:: Getting the address of a label.
|
||
* Statement Exprs:: A series of statements used as an expression.
|
||
@end menu
|
||
|
||
@node Expression Statement
|
||
@section Expression Statement
|
||
@cindex expression statement
|
||
@cindex statement, expression
|
||
|
||
The most common kind of statement in C is an @dfn{expression statement}.
|
||
It consists of an expression followed by a
|
||
semicolon. The expression's value is discarded, so the expressions
|
||
that are useful are those that have side effects: assignment
|
||
expressions, increment and decrement expressions, and function calls.
|
||
Here are examples of expression statements:
|
||
|
||
@smallexample
|
||
x = 5; /* @r{Assignment expression.} */
|
||
p++; /* @r{Increment expression.} */
|
||
printf ("Done\n"); /* @r{Function call expression.} */
|
||
*p; /* @r{Cause @code{SIGSEGV} signal if @code{p} is null.} */
|
||
x + y; /* @r{Useless statement without effect.} */
|
||
@end smallexample
|
||
|
||
In very unusual circumstances we use an expression statement
|
||
whose purpose is to get a fault if an address is invalid:
|
||
|
||
@smallexample
|
||
volatile char *p;
|
||
@r{@dots{}}
|
||
*p; /* @r{Cause signal if @code{p} is null.} */
|
||
@end smallexample
|
||
|
||
If the target of @code{p} is not declared @code{volatile}, the
|
||
compiler might optimize away the memory access, since it knows that
|
||
the value isn't really used. @xref{volatile}.
|
||
|
||
@node if Statement
|
||
@section @code{if} Statement
|
||
@cindex @code{if} statement
|
||
@cindex statement, @code{if}
|
||
@findex if
|
||
|
||
An @code{if} statement computes an expression to decide
|
||
whether to execute the following statement or not.
|
||
It looks like this:
|
||
|
||
@example
|
||
if (@var{condition})
|
||
@var{execute-if-true}
|
||
@end example
|
||
|
||
The first thing this does is compute the value of @var{condition}. If
|
||
that is true (nonzero), then it executes the statement
|
||
@var{execute-if-true}. If the value of @var{condition} is false
|
||
(zero), it doesn't execute @var{execute-if-true}; instead, it does
|
||
nothing.
|
||
|
||
This is a @dfn{complex statement} because it contains a component
|
||
@var{execute-if-true} that is a nested statement. It must be one
|
||
and only one statement. The way to put multiple statements there is
|
||
to group them into a @dfn{block} (@pxref{Blocks}).
|
||
|
||
@node if-else Statement
|
||
@section @code{if-else} Statement
|
||
@cindex @code{if}@dots{}@code{else} statement
|
||
@cindex statement, @code{if}@dots{}@code{else}
|
||
@findex else
|
||
|
||
An @code{if}-@code{else} statement computes an expression to decide
|
||
which of two nested statements to execute.
|
||
It looks like this:
|
||
|
||
@example
|
||
if (@var{condition})
|
||
@var{if-true-substatement}
|
||
else
|
||
@var{if-false-substatement}
|
||
@end example
|
||
|
||
The first thing this does is compute the value of @var{condition}. If
|
||
that is true (nonzero), then it executes the statement
|
||
@var{if-true-substatement}. If the value of @var{condition} is false
|
||
(zero), then it executes the statement @var{if-false-substatement} instead.
|
||
|
||
This is a @dfn{complex statement} because it contains components
|
||
@var{if-true-substatement} and @var{if-false-substatement} that are
|
||
nested statements. Each must be one and only one statement. The way
|
||
to put multiple statements in such a component is to group them into a
|
||
@dfn{block} (@pxref{Blocks}).
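
As a small sketch, the hypothetical function @code{clamp} below uses a
single statement for the true branch and a block for the false branch,
because the false branch needs more than one statement.

```c
/* Force VALUE into the range from LOW to HIGH.  */
int
clamp (int value, int low, int high)
{
  int result;

  if (value < low)
    result = low;
  else
    {
      /* More than one statement here, so a block is needed.  */
      if (value > high)
        value = high;
      result = value;
    }
  return result;
}
```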
|
||
|
||
@node Blocks
|
||
@section Blocks
|
||
@cindex block
|
||
@cindex compound statement
|
||
|
||
A @dfn{block} is a construct that contains multiple statements of any
|
||
kind. It begins with @samp{@{} and ends with @samp{@}}, and has a
|
||
series of statements and declarations in between. Another name for
|
||
blocks is @dfn{compound statements}.
|
||
|
||
Is a block a statement? Yes and no. It doesn't @emph{look} like a
|
||
normal statement---it does not end with a semicolon. But you can
|
||
@emph{use} it like a statement; anywhere that a statement is required
|
||
or allowed, you can write a block and consider that block a statement.
|
||
|
||
So far it seems that a block is a kind of statement with an unusual
|
||
syntax. But that is not entirely true: a function body is also a
|
||
block, and that block is definitely not a statement. The text after a
|
||
function header is not treated as a statement; only a function body is
|
||
allowed there, and nothing else would be meaningful there.
|
||
|
||
In a formal grammar we would have to choose---either a block is a kind
|
||
of statement or it is not. But this manual is meant for humans, not
|
||
for parser generators. The clearest answer for humans is, ``a block
|
||
is a statement, in some ways.''
|
||
|
||
@cindex nested block
|
||
@cindex internal block
|
||
A block that isn't a function body is called an @dfn{internal block}
|
||
or a @dfn{nested block}. You can put a nested block directly inside
|
||
another block, but more often the nested block is inside some complex
|
||
statement, such as a @code{for} statement or an @code{if} statement.
|
||
|
||
There are two uses for nested blocks in C:
|
||
|
||
@itemize @bullet
|
||
@item
|
||
To specify the scope for local declarations. For instance, a local
|
||
variable's scope is the rest of the innermost containing block.
|
||
|
||
@item
|
||
To write a series of statements where, syntactically, one statement is
|
||
called for. For instance, the @var{execute-if-true} of an @code{if}
|
||
statement is one statement. To put multiple statements there, they
|
||
have to be wrapped in a block, like this:
|
||
|
||
@example
|
||
if (x < 0)
|
||
@{
|
||
printf ("x was negative\n");
|
||
x = -x;
|
||
@}
|
||
@end example
|
||
@end itemize
|
||
|
||
This example (repeated from above) shows a nested block which serves
|
||
both purposes: it includes two statements (plus a declaration) in the
|
||
body of a @code{while} statement, and it provides the scope for the
|
||
declaration of @code{q}.
|
||
|
||
@example
|
||
void
|
||
free_intlist (struct intlistlink *p)
|
||
@{
|
||
while (p)
|
||
@{
|
||
struct intlistlink *q = p;
|
||
p = p->next;
|
||
free (q);
|
||
@}
|
||
@}
|
||
@end example
|
||
|
||
@node return Statement
|
||
@section @code{return} Statement
|
||
@cindex @code{return} statement
|
||
@cindex statement, @code{return}
|
||
@findex return
|
||
|
||
The @code{return} statement makes the containing function return
|
||
immediately. It has two forms. This one specifies no value to
|
||
return:
|
||
|
||
@example
|
||
return;
|
||
@end example
|
||
|
||
@noindent
|
||
That form is meant for functions whose return type is @code{void}
|
||
(@pxref{The Void Type}). You can also use it in a function that
|
||
returns nonvoid data, but that's a bad idea, since it makes the
|
||
function return garbage.
|
||
|
||
The form that specifies a value looks like this:
|
||
|
||
@example
|
||
return @var{value};
|
||
@end example
|
||
|
||
@noindent
|
||
which computes the expression @var{value} and makes the function
|
||
return that. If necessary, the value undergoes type conversion to
|
||
the function's declared return value type, which works like
|
||
assigning the value to a variable of that type.
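
For instance, in this sketch (the function name @code{whole_part} is
invented here), the @code{double} value is converted to @code{int} as
part of returning it, which discards the fractional part.

```c
/* The double value is converted to int on return,
   just as if it had been assigned to an int variable.  */
int
whole_part (double x)
{
  return x;
}
```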
|
||
|
||
@node Loop Statements
|
||
@section Loop Statements
|
||
@cindex loop statements
|
||
@cindex statements, loop
|
||
@cindex iteration
|
||
|
||
You can use a loop statement when you need to execute a series of
|
||
statements repeatedly, making an @dfn{iteration}. C provides several
|
||
different kinds of loop statements, described in the following
|
||
subsections.
|
||
|
||
Every kind of loop statement is a complex statement because it contains a
|
||
component, here called @var{body}, which is a nested statement.
|
||
Most often the body is a block.
|
||
|
||
@menu
|
||
* while Statement:: Loop as long as a test expression is true.
|
||
* do-while Statement:: Execute a loop once, with further looping
|
||
as long as a test expression is true.
|
||
* break Statement:: End a loop immediately.
|
||
* for Statement:: Iterative looping.
|
||
* Example of for:: An example of iterative looping.
|
||
* Omitted for-Expressions:: for-loop expression options.
|
||
* for-Index Declarations:: for-loop declaration options.
|
||
* continue Statement:: Begin the next cycle of a loop.
|
||
@end menu
|
||
|
||
@node while Statement
|
||
@subsection @code{while} Statement
|
||
@cindex @code{while} statement
|
||
@cindex statement, @code{while}
|
||
@findex while
|
||
|
||
The @code{while} statement is the simplest loop construct.
|
||
It looks like this:
|
||
|
||
@example
|
||
while (@var{test})
|
||
@var{body}
|
||
@end example
|
||
|
||
Here, @var{body} is a statement (often a nested block) to repeat, and
|
||
@var{test} is the test expression that controls whether to repeat it again.
|
||
Each iteration of the loop starts by computing @var{test} and, if it
|
||
is true (nonzero), that means the loop should execute @var{body} again
|
||
and then start over.
|
||
|
||
Here's an example of advancing to the last structure in a chain of
|
||
structures chained through the @code{next} field:
|
||
|
||
@example
|
||
#include <stddef.h> /* @r{Defines @code{NULL}.} */
|
||
@r{@dots{}}
|
||
while (chain->next != NULL)
|
||
chain = chain->next;
|
||
@end example
|
||
|
||
@noindent
|
||
This code assumes the chain isn't empty to start with; if the chain is
|
||
empty (that is, if @code{chain} is a null pointer), the code gets a
|
||
@code{SIGSEGV} signal trying to dereference that null pointer (@pxref{Signals}).
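
A version that tolerates an empty chain could test for the null
pointer first. This is only a sketch; the structure type
@code{struct link} and the function @code{last_link} are assumptions
made for the example.

```c
#include <stddef.h>   /* Defines NULL.  */

/* Hypothetical chain element.  */
struct link
{
  struct link *next;
  int value;
};

/* Return the last structure in CHAIN, or NULL if CHAIN is empty.  */
struct link *
last_link (struct link *chain)
{
  if (chain == NULL)
    return NULL;
  while (chain->next != NULL)
    chain = chain->next;
  return chain;
}
```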
|
||
|
||
@node do-while Statement
|
||
@subsection @code{do-while} Statement
|
||
@cindex @code{do}--@code{while} statement
|
||
@cindex statement, @code{do}--@code{while}
|
||
@findex do
|
||
|
||
The @code{do}--@code{while} statement is a simple loop construct that
|
||
performs the test at the end of the iteration.
|
||
|
||
@example
|
||
do
|
||
@var{body}
|
||
while (@var{test});
|
||
@end example
|
||
|
||
Here, @var{body} is a statement (possibly a block) to repeat, and
|
||
@var{test} is an expression that controls whether to repeat it again.
|
||
|
||
Each iteration of the loop starts by executing @var{body}. Then it
|
||
computes @var{test} and, if it is true (nonzero), that means to go
|
||
back and start over with @var{body}. If @var{test} is false (zero),
|
||
then the loop stops repeating and execution moves on past it.
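
Testing at the end suits loops whose body must run at least once. As
a sketch, the hypothetical function @code{count_digits} counts decimal
digits, and it must report one digit even when the number is zero.

```c
/* Return the number of decimal digits needed to write N.  */
int
count_digits (unsigned int n)
{
  int digits = 0;

  do
    {
      ++digits;     /* Runs once even when N is zero.  */
      n /= 10;
    }
  while (n != 0);

  return digits;
}
```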
|
||
|
||
@strong{Warning:} Human beings tend to confuse the @code{do}--@code{while}
|
||
statement with the @code{while} statement using the null statement
|
||
as its @var{body} (@pxref{Null Statement}). To avoid that, consistently
|
||
mark such constructs with a specific comment or with clearly different
|
||
indent styles.
|
||
|
||
@node break Statement
|
||
@subsection @code{break} Statement
|
||
@cindex @code{break} statement
|
||
@cindex statement, @code{break}
|
||
@findex break
|
||
|
||
The @code{break} statement looks like @samp{break;}. Its effect is to
|
||
exit immediately from the innermost loop construct or @code{switch}
|
||
statement (@pxref{switch Statement}).
|
||
|
||
For example, this loop advances @code{p} until the next null
|
||
character or newline.
|
||
|
||
@example
|
||
while (*p)
|
||
@{
|
||
/* @r{End loop if we have reached a newline.} */
|
||
if (*p == '\n')
|
||
break;
|
||
p++;
|
||
@}
|
||
@end example
|
||
|
||
When there are nested loops, the @code{break} statement exits from the
|
||
innermost loop containing it.
|
||
|
||
@example
|
||
struct list_if_tuples
|
||
@{
|
||
struct list_if_tuples *next;
|
||
int length;
|
||
data *contents;
|
||
@};
|
||
|
||
void
|
||
process_all_elements (struct list_if_tuples *list)
|
||
@{
|
||
while (list)
|
||
@{
|
||
/* @r{Process all the elements in this node's vector,}
|
||
@r{stopping when we reach one that is null.} */
|
||
for (int i = 0; i < list->length; i++)
|
||
@{
|
||
/* @r{Null element terminates this node's vector.} */
|
||
if (list->contents[i] == NULL)
|
||
/* @r{Exit the @code{for} loop.} */
|
||
break;
|
||
/* @r{Operate on the next element.} */
|
||
process_element (list->contents[i]);
|
||
@}
|
||
|
||
list = list->next;
|
||
@}
|
||
@}
|
||
@end example
|
||
|
||
The only way in C to exit from an outer loop is with
|
||
@code{goto} (@pxref{goto Statement}).
|
||
|
||
@node for Statement
|
||
@subsection @code{for} Statement
|
||
@cindex @code{for} statement
|
||
@cindex statement, @code{for}
|
||
@findex for
|
||
|
||
A @code{for} statement uses three expressions written inside a
|
||
parenthetical group to define the repetition of the loop. The first
|
||
expression says how to prepare to start the loop. The second says how
|
||
to test, before each iteration, whether to continue looping. The
|
||
third says how to advance, at the end of an iteration, for the next
|
||
iteration. All together, it looks like this:
|
||
|
||
@example
|
||
for (@var{start}; @var{continue-test}; @var{advance})
|
||
@var{body}
|
||
@end example
|
||
|
||
The first thing the @code{for} statement does is compute @var{start}.
|
||
The next thing it does is compute the expression @var{continue-test}.
|
||
If that expression is false (zero), the @code{for} statement finishes
|
||
immediately, so @var{body} is executed zero times.
|
||
|
||
However, if @var{continue-test} is true (nonzero), the @code{for}
|
||
statement executes @var{body}, then @var{advance}. Then it loops back
|
||
to the not-quite-top to test @var{continue-test} again. But it does
|
||
not compute @var{start} again.
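
The order of these steps roughly matches a @code{while} loop with the
preparation written before it and the advance written at the end of
its body. The two hypothetical functions below are meant to compute
the same sum; the comparison ignores the effect of a @code{continue}
statement inside the body.

```c
/* Sum the integers from 1 to N using for.  */
int
sum_with_for (int n)
{
  int total = 0;
  int i;

  for (i = 1; i <= n; ++i)
    total += i;
  return total;
}

/* The same computation, with each part of the for written out.  */
int
sum_with_while (int n)
{
  int total = 0;
  int i;

  i = 1;                 /* start */
  while (i <= n)         /* continue-test */
    {
      total += i;        /* body */
      ++i;               /* advance */
    }
  return total;
}
```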
|
||
|
||
@node Example of for
|
||
@subsection Example of @code{for}
|
||
|
||
Here is the @code{for} statement from the iterative Fibonacci
|
||
function:
|
||
|
||
@example
|
||
int i;
|
||
for (i = 1; i < n; ++i)
|
||
/* @r{If @code{n} is 1 or less, the loop runs zero times,} */
|
||
/* @r{since @code{i < n} is false the first time.} */
|
||
@{
|
||
/* @r{Now @var{last} is @code{fib (@var{i})}}
|
||
@r{and @var{prev} is @code{fib (@var{i} @minus{} 1)}.} */
|
||
/* @r{Compute @code{fib (@var{i} + 1)}.} */
|
||
int next = prev + last;
|
||
/* @r{Shift the values down.} */
|
||
prev = last;
|
||
last = next;
|
||
/* @r{Now @var{last} is @code{fib (@var{i} + 1)}}
|
||
@r{and @var{prev} is @code{fib (@var{i})}.}
|
||
@r{But that won't stay true for long,}
|
||
@r{because we are about to increment @var{i}.} */
|
||
@}
|
||
@end example
|
||
|
||
In this example, @var{start} is @code{i = 1}, meaning set @code{i} to
|
||
1. @var{continue-test} is @code{i < n}, meaning keep repeating the
|
||
loop as long as @code{i} is less than @code{n}. @var{advance} is
|
||
@code{++i}, meaning increment @code{i} by 1. The body is a block
|
||
that contains a declaration and two statements.
|
||
|
||
@node Omitted for-Expressions
|
||
@subsection Omitted @code{for}-Expressions
|
||
|
||
A fully-fleshed @code{for} statement contains all these parts,
|
||
|
||
@example
|
||
for (@var{start}; @var{continue-test}; @var{advance})
|
||
@var{body}
|
||
@end example
|
||
|
||
@noindent
|
||
but you can omit any of the three expressions inside the parentheses.
|
||
The parentheses and the two semicolons are required syntactically, but
|
||
the expressions between them may be missing. A missing expression
|
||
means this loop doesn't use that particular feature of the @code{for}
|
||
statement.
|
||
|
||
@c ??? You can't do this if START is a declaration.
|
||
Instead of using @var{start}, you can do the loop preparation
|
||
before the @code{for} statement: the effect is the same. So we
|
||
could have written the beginning of the previous example this way:
|
||
|
||
@example
|
||
int i = 1;
for (; i < n; ++i)
@end example

@noindent
instead of this way:

@example
int i;
for (i = 1; i < n; ++i)
|
||
@end example
|
||
|
||
Omitting @var{continue-test} means the loop runs forever (or until
|
||
something else causes exit from it). Statements inside the loop can
|
||
test conditions for termination and use @samp{break;} to exit. This
|
||
is more flexible since you can put those tests anywhere in the loop,
|
||
not solely at the beginning.
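
Here is a small sketch of a loop with no @var{continue-test}, where the
exit test sits at the end of the body instead. The function
@code{count_digits} is a hypothetical example, not part of any library.

```c
/* Return the number of decimal digits needed to print n.  */
int
count_digits (unsigned n)
{
  int digits = 0;

  for (;;)
    {
      ++digits;           /* Every number has at least one digit.  */
      n /= 10;
      if (n == 0)
        break;            /* The only way out of this loop.  */
    }
  return digits;
}
```

Testing at the end lets the body run once even when @code{n} is zero,
which a test at the top could not do as directly.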
|
||
|
||
Putting an expression in @var{advance} is almost equivalent to writing
it at the end of the loop body. The only difference arises with the
@code{continue} statement (@pxref{continue Statement}), which skips
the rest of the body but still executes @var{advance}. So we could
have written this:
|
||
|
||
@example
|
||
for (i = 0; i < n;)
|
||
@{
|
||
@r{@dots{}}
|
||
++i;
|
||
@}
|
||
@end example
|
||
|
||
@noindent
|
||
instead of this:
|
||
|
||
@example
|
||
for (i = 0; i < n; ++i)
|
||
@{
|
||
@r{@dots{}}
|
||
@}
|
||
@end example
|
||
|
||
The choice is mainly a matter of what is more readable for
|
||
programmers. However, there is also a syntactic difference:
|
||
@var{advance} is an expression, not a statement. It can't include
|
||
loops, blocks, declarations, etc.
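
The following sketch shows the difference that @code{continue} makes.
Both functions (the names are invented for this illustration) count
the characters of a string other than spaces. In the second one the
advance step lives in the body, so every path that reaches
@code{continue} has to perform it explicitly.

```c
/* Advance is in the header: continue still executes ++p.  */
int
count_nonspaces (const char *p)
{
  int count = 0;

  for (; *p; ++p)
    {
      if (*p == ' ')
        continue;
      ++count;
    }
  return count;
}

/* Advance is in the body: continue would skip it.  */
int
count_nonspaces_manual (const char *p)
{
  int count = 0;

  for (; *p;)
    {
      if (*p == ' ')
        {
          ++p;            /* Without this, the loop would never end.  */
          continue;
        }
      ++count;
      ++p;
    }
  return count;
}
```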
|
||
|
||
@node for-Index Declarations
|
||
@subsection @code{for}-Index Declarations
|
||
|
||
You can declare loop-index variables directly in the @var{start}
|
||
portion of the @code{for}-loop, like this:
|
||
|
||
@example
|
||
for (int i = 0; i < n; ++i)
|
||
@{
|
||
@r{@dots{}}
|
||
@}
|
||
@end example
|
||
|
||
This kind of @var{start} is limited to a single declaration; it can
|
||
declare one or more variables, separated by commas, all of which are
|
||
the same @var{basetype} (@code{int}, in this example):
|
||
|
||
@example
|
||
for (int i = 0, j = 1, *p = NULL; i < n; ++i, ++j, ++p)
|
||
@{
|
||
@r{@dots{}}
|
||
@}
|
||
@end example
|
||
|
||
@noindent
|
||
The scope of these variables is the @code{for} statement as a whole.
|
||
See @ref{Variable Declarations} for an explanation of @var{basetype}.
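
As a short sketch, here is a loop that declares two index variables of
the same @var{basetype} and moves them toward each other. The function
@code{reverse_ints} is a hypothetical example.

```c
/* Reverse the n elements of array in place.  */
void
reverse_ints (int *array, int n)
{
  for (int i = 0, j = n - 1; i < j; ++i, --j)
    {
      int tmp = array[i];
      array[i] = array[j];
      array[j] = tmp;
    }
  /* Neither i nor j is in scope here.  */
}
```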
|
||
|
||
Variables declared in @code{for} statements should have initializers.
|
||
Omitting the initialization gives the variables unpredictable initial
|
||
values, so this code is erroneous:
|
||
|
||
@example
|
||
for (int i; i < n; ++i)
|
||
@{
|
||
@r{@dots{}}
|
||
@}
|
||
@end example
|
||
|
||
@node continue Statement
|
||
@subsection @code{continue} Statement
|
||
@cindex @code{continue} statement
|
||
@cindex statement, @code{continue}
|
||
@findex continue
|
||
|
||
The @code{continue} statement looks like @samp{continue;}, and its
|
||
effect is to jump immediately to the end of the innermost loop
|
||
construct. If it is a @code{for}-loop, the next thing that happens
|
||
is to execute the loop's @var{advance} expression.
|
||
|
||
For example, this loop increments @code{p} until the next null character
|
||
or newline, and operates (in some way not shown) on all the characters
|
||
in the line except for spaces. All it does with spaces is skip them.
|
||
|
||
@example
|
||
for (;*p; ++p)
|
||
@{
|
||
/* @r{End loop if we have reached a newline.} */
|
||
if (*p == '\n')
|
||
break;
|
||
/* @r{Pay no attention to spaces.} */
|
||
if (*p == ' ')
|
||
continue;
|
||
/* @r{Operate on the next character.} */
|
||
@r{@dots{}}
|
||
@}
|
||
@end example
|
||
|
||
@noindent
|
||
Executing @samp{continue;} skips the rest of the loop body, but it does not
skip the @var{advance} expression, @code{++p}.
|
||
|
||
We could also write it like this:
|
||
|
||
@example
|
||
for (;*p; ++p)
|
||
@{
|
||
/* @r{Exit if we have reached a newline.} */
|
||
if (*p == '\n')
|
||
break;
|
||
/* @r{Pay no attention to spaces.} */
|
||
if (*p != ' ')
|
||
@{
|
||
/* @r{Operate on the next character.} */
|
||
@r{@dots{}}
|
||
@}
|
||
@}
|
||
@end example
|
||
|
||
The advantage of using @code{continue} is that it reduces the
|
||
depth of nesting.
|
||
|
||
Contrast @code{continue} with the @code{break} statement. @xref{break
|
||
Statement}.
|
||
|
||
@node switch Statement
|
||
@section @code{switch} Statement
|
||
@cindex @code{switch} statement
|
||
@cindex statement, @code{switch}
|
||
@findex switch
|
||
@findex case
|
||
@findex default
|
||
|
||
The @code{switch} statement selects code to run according to the value
|
||
of an expression. The expression, in parentheses, follows the keyword
|
||
@code{switch}. After that come all the cases to select among,
|
||
inside braces. It looks like this:
|
||
|
||
@example
|
||
switch (@var{selector})
|
||
@{
|
||
@var{cases}@r{@dots{}}
|
||
@}
|
||
@end example
|
||
|
||
A case can look like this:
|
||
|
||
@example
|
||
case @var{value}:
|
||
@var{statements}
|
||
break;
|
||
@end example
|
||
|
||
@noindent
|
||
which means ``come here if @var{selector} happens to have the value
|
||
@var{value},'' or like this (a GNU C extension):
|
||
|
||
@example
|
||
case @var{rangestart} ... @var{rangeend}:
|
||
@var{statements}
|
||
break;
|
||
@end example
|
||
|
||
@noindent
|
||
which means ``come here if @var{selector} happens to have a value
|
||
between @var{rangestart} and @var{rangeend} (inclusive).'' @xref{Case
|
||
Ranges}.
|
||
|
||
The values in @code{case} labels must reduce to integer constants.
|
||
They can use arithmetic, and @code{enum} constants, but they cannot
|
||
refer to data in memory, because they have to be computed at compile
|
||
time. It is an error if two @code{case} labels specify the same
|
||
value, or ranges that overlap, or if one is a range and the other is a
|
||
value in that range.
|
||
|
||
You can also define a default case to handle ``any other value,'' like
|
||
this:
|
||
|
||
@example
|
||
default:
|
||
@var{statements}
|
||
break;
|
||
@end example
|
||
|
||
If the @code{switch} statement has no @code{default:} label, then it
|
||
does nothing when the value matches none of the cases.
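
Putting these pieces together, a complete @code{switch} statement might
look like the following sketch. The function @code{days_in_month} and
its convention of returning @minus{}1 for an invalid month are
assumptions made for the example.

```c
/* Return the number of days in MONTH (1 through 12),
   or -1 if MONTH is out of range.  */
int
days_in_month (int month, int leap_year)
{
  int days;

  switch (month)
    {
    case 2:
      days = leap_year ? 29 : 28;
      break;

    case 4:
    case 6:
    case 9:
    case 11:
      days = 30;
      break;

    case 1:
    case 3:
    case 5:
    case 7:
    case 8:
    case 10:
    case 12:
      days = 31;
      break;

    default:
      /* Any other value.  */
      days = -1;
      break;
    }
  return days;
}
```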
|
||
|
||
The brace-group inside the @code{switch} statement is a block, and you
|
||
can declare variables with that scope just as in any other block
|
||
(@pxref{Blocks}). However, initializers in these declarations won't
|
||
necessarily be executed every time the @code{switch} statement runs,
|
||
so it is best to avoid giving them initializers.
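
A sketch of the safe pattern: declare the variable without an
initializer at the top of the block, and assign to it inside each case
that uses it. The function @code{scale} and its meaning of @code{kind}
are hypothetical.

```c
int
scale (int kind, int x)
{
  switch (kind)
    {
      /* Declared for the whole block.  Control jumps straight to a
         case label, so an initializer written here would be skipped.  */
      int factor;

    case 1:
      factor = 10;
      return x * factor;

    case 2:
      factor = 100;
      return x * factor;

    default:
      return x;
    }
}
```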
|
||
|
||
@code{break;} inside a @code{switch} statement exits immediately from
|
||
the @code{switch} statement. @xref{break Statement}.
|
||
|
||
If there is no @code{break;} at the end of the code for a case,
|
||
execution continues into the code for the following case. This
|
||
happens more often by mistake than intentionally, but since this
|
||
feature is used in real code, we cannot eliminate it.
|
||
|
||
@strong{Warning:} When one case is intended to fall through to the
|
||
next, write a comment like @samp{falls through} to say it's
|
||
intentional. That way, other programmers won't assume it was an error
|
||
and ``fix'' it erroneously.
|
||
|
||
Consecutive @code{case} labels could, pedantically, be considered
|
||
an instance of falling through, but we don't consider or treat them that
|
||
way because they won't confuse anyone.
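
Here is a sketch of intentional falling through, where each level
includes everything granted to the levels below it. The function
@code{access_mask} and the bit values are invented for this example.

```c
/* Return the permission bits for LEVEL (0, 1 or 2).
   Higher levels include the bits of all lower levels.  */
int
access_mask (int level)
{
  int mask = 0;

  switch (level)
    {
    case 2:
      mask |= 4;
      /* Falls through */

    case 1:
      mask |= 2;
      /* Falls through */

    case 0:
      mask |= 1;
      break;

    default:
      break;
    }
  return mask;
}
```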
|
||
|
||
@node switch Example
|
||
@section Example of @code{switch}
|
||
|
||
Here's an example of using the @code{switch} statement
|
||
to distinguish among characters:
|
||
|
||
@cindex counting vowels and punctuation
|
||
@example
|
||
struct vp @{ int vowels, punct; @};
|
||
|
||
struct vp
|
||
count_vowels_and_punct (char *string)
|
||
@{
|
||
int c;
|
||
int vowels = 0;
|
||
int punct = 0;
|
||
/* @r{Don't change the parameter itself.} */
|
||
/* @r{That helps in debugging.} */
|
||
char *p = string;
|
||
struct vp value;
|
||
|
||
while (c = *p++)
|
||
switch (c)
|
||
@{
|
||
case 'y':
|
||
case 'Y':
|
||
/* @r{We assume @code{y_is_consonant} will check surrounding
|
||
letters to determine whether this y is a vowel.} */
|
||
if (y_is_consonant (p - 1))
|
||
break;
|
||
|
||
/* @r{Falls through} */
|
||
|
||
case 'a':
|
||
case 'e':
|
||
case 'i':
|
||
case 'o':
|
||
case 'u':
|
||
case 'A':
|
||
case 'E':
|
||
case 'I':
|
||
case 'O':
|
||
case 'U':
|
||
vowels++;
|
||
break;
|
||
|
||
case '.':
|
||
case ',':
|
||
case ':':
|
||
case ';':
|
||
case '?':
|
||
case '!':
|
||
case '\"':
|
||
case '\'':
|
||
punct++;
|
||
break;
|
||
@}
|
||
|
||
value.vowels = vowels;
|
||
value.punct = punct;
|
||
|
||
return value;
|
||
@}
|
||
@end example
|
||
|
||
@node Duffs Device
|
||
@section Duff's Device
|
||
@cindex Duff's device
|
||
|
||
The cases in a @code{switch} statement can be inside other control
|
||
constructs. For instance, we can use a technique known as @dfn{Duff's
|
||
device} to optimize this simple function,
|
||
|
||
@example
|
||
void
|
||
copy (char *to, char *from, int count)
|
||
@{
|
||
while (count > 0)
|
||
*to++ = *from++, count--;
|
||
@}
|
||
@end example
|
||
|
||
@noindent
|
||
which copies memory starting at @var{from} to memory starting at
|
||
@var{to}.
|
||
|
||
Duff's device involves unrolling the loop so that it copies
|
||
several characters each time around, and using a @code{switch} statement
|
||
to enter the loop body at the proper point:
|
||
|
||
@example
|
||
void
|
||
copy (char *to, char *from, int count)
|
||
@{
|
||
if (count <= 0)
|
||
return;
|
||
int n = (count + 7) / 8;
|
||
switch (count % 8)
|
||
@{
|
||
do @{
|
||
case 0: *to++ = *from++;
|
||
case 7: *to++ = *from++;
|
||
case 6: *to++ = *from++;
|
||
case 5: *to++ = *from++;
|
||
case 4: *to++ = *from++;
|
||
case 3: *to++ = *from++;
|
||
case 2: *to++ = *from++;
|
||
case 1: *to++ = *from++;
|
||
@} while (--n > 0);
|
||
@}
|
||
@}
|
||
@end example
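
To check that this control structure performs exactly @code{count}
copies, we can run the same @code{switch} and loop with a counter in
place of each copy. The function @code{duff_steps} below is a
hypothetical test aid, not part of the technique itself.

```c
/* Return how many times the unrolled body would copy an element
   when asked to copy COUNT elements.  */
int
duff_steps (int count)
{
  int steps = 0;

  if (count <= 0)
    return 0;
  int n = (count + 7) / 8;      /* Number of trips through the loop.  */
  switch (count % 8)            /* Where to enter the first trip.  */
    {
      do {
    case 0: steps++;
    case 7: steps++;
    case 6: steps++;
    case 5: steps++;
    case 4: steps++;
    case 3: steps++;
    case 2: steps++;
    case 1: steps++;
      } while (--n > 0);
    }
  return steps;
}
```

For a @code{count} of 11, @code{n} is 2 and the @code{switch} enters at
@code{case 3}, so the first trip does 3 steps and the second trip does
all 8, for 11 in total.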
|
||
|
||
@node Case Ranges
|
||
@section Case Ranges
|
||
@cindex case ranges
|
||
@cindex ranges in case statements
|
||
|
||
You can specify a range of consecutive values in a single @code{case} label,
|
||
like this:
|
||
|
||
@example
|
||
case @var{low} ... @var{high}:
|
||
@end example
|
||
|
||
@noindent
|
||
This has the same effect as the proper number of individual @code{case}
|
||
labels, one for each integer value from @var{low} to @var{high}, inclusive.
|
||
|
||
This feature is especially useful for ranges of ASCII character codes:
|
||
|
||
@example
|
||
case 'A' ... 'Z':
|
||
@end example
|
||
|
||
@strong{Be careful:} with integers, write spaces around the @code{...}
so that the digits and dots are not read together as a single malformed
number. For example, write this:
|
||
|
||
@example
|
||
case 1 ... 5:
|
||
@end example
|
||
|
||
@noindent
|
||
rather than this:
|
||
|
||
@example
|
||
case 1...5:
|
||
@end example
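
As a sketch of case ranges in a complete function, the following
classifies a character code. It relies on the GNU C extension described
in this section and assumes an ASCII-compatible character set; the
function name and return codes are made up for the example.

```c
/* Return 1 for an upper-case letter, 2 for a lower-case letter,
   3 for a digit, and 0 for anything else.  */
int
classify_char (int c)
{
  switch (c)
    {
    case 'A' ... 'Z':
      return 1;

    case 'a' ... 'z':
      return 2;

    case '0' ... '9':
      return 3;

    default:
      return 0;
    }
}
```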
|
||
|
||
@node Null Statement
|
||
@section Null Statement
|
||
@cindex null statement
|
||
@cindex statement, null
|
||
|
||
A @dfn{null statement} is just a semicolon. It does nothing.
|
||
|
||
A null statement is a placeholder for use where a statement is
|
||
grammatically required, but there is nothing to be done. For
|
||
instance, sometimes all the work of a @code{for}-loop is done in the
|
||
@code{for}-header itself, leaving no work for the body. Here is an
|
||
example that searches for the first newline in @code{array}:
|
||
|
||
@example
|
||
for (p = array; *p != '\n'; p++)
|
||
;
|
||
@end example
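
That loop assumes the array really contains a newline. A slightly more
defensive sketch, for a null-terminated string, also stops at the
terminating null character. The function @code{find_newline} is a
hypothetical example.

```c
/* Return a pointer to the first newline in the string array,
   or to its terminating null character if there is no newline.  */
const char *
find_newline (const char *array)
{
  const char *p;

  for (p = array; *p != '\n' && *p != '\0'; p++)
    ;                     /* Null statement: the header does the work.  */
  return p;
}
```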
|
||
|
||
@node goto Statement
|
||
@section @code{goto} Statement and Labels
|
||
@cindex @code{goto} statement
|
||
@cindex statement, @code{goto}
|
||
@cindex label
|
||
@findex goto
|
||
|
||
The @code{goto} statement looks like this:
|
||
|
||
@example
|
||
goto @var{label};
|
||
@end example
|
||
|
||
@noindent
|
||
Its effect is to transfer control immediately to another part of the
|
||
current function---where the label named @var{label} is defined.
|
||
|
||
An ordinary label definition looks like this:
|
||
|
||
@example
|
||
@var{label}:
|
||
@end example
|
||
|
||
@noindent
|
||
and it can appear before any statement. You can't use @code{default}
|
||
as a label, since that has a special meaning for @code{switch}
|
||
statements.
|
||
|
||
An ordinary label doesn't need a separate declaration; defining it is
|
||
enough.
|
||
|
||
Here's an example of using @code{goto} to implement a loop
|
||
equivalent to @code{do}--@code{while}:
|
||
|
||
@example
|
||
@{
|
||
loop_restart:
|
||
@var{body}
|
||
if (@var{condition})
|
||
goto loop_restart;
|
||
@}
|
||
@end example
|
||
|
||
The name space of labels is separate from that of variables and functions.
|
||
Thus, there is no error in using a single name in both ways:
|
||
|
||
@example
|
||
@{
|
||
int foo; // @r{Variable @code{foo}.}
|
||
foo: // @r{Label @code{foo}.}
|
||
@var{body}
|
||
if (foo > 0) // @r{Variable @code{foo}.}
|
||
goto foo; // @r{Label @code{foo}.}
|
||
@}
|
||
@end example
|
||
|
||
Blocks have no effect on ordinary labels; each label name is defined
|
||
throughout the whole of the function it appears in. It looks strange to
|
||
jump into a block with @code{goto}, but it works. For example,
|
||
|
||
@example
|
||
if (x < 0)
|
||
goto negative;
|
||
if (y < 0)
|
||
@{
|
||
negative:
|
||
printf ("Negative\n");
|
||
return;
|
||
@}
|
||
@end example
|
||
|
||
If the @code{goto} jumps into the scope of a variable, it does not
|
||
initialize the variable. For example, if @code{x} is negative,
|
||
|
||
@example
|
||
if (x < 0)
|
||
goto negative;
|
||
if (y < 0)
|
||
@{
|
||
int i = 5;
|
||
negative:
|
||
printf ("Negative, and i is %d\n", i);
|
||
return;
|
||
@}
|
||
@end example
|
||
|
||
@noindent
|
||
prints junk because @code{i} was not initialized.
|
||
|
||
If the block declares a variable-length automatic array, jumping into
|
||
it gives a compilation error. However, jumping out of the scope of a
|
||
variable-length array works fine, and deallocates its storage.
|
||
|
||
A label can't come directly before a declaration, so the code can't
|
||
jump directly to one. For example, this is not allowed:
|
||
|
||
@example
|
||
@{
|
||
goto foo;
|
||
foo:
|
||
int x = 5;
|
||
bar(&x);
|
||
@}
|
||
@end example
|
||
|
||
@noindent
|
||
The workaround is to add a statement, even an empty statement,
|
||
directly after the label. For example:
|
||
|
||
@example
|
||
@{
|
||
goto foo;
|
||
foo:
|
||
;
|
||
int x = 5;
|
||
bar(&x);
|
||
@}
|
||
@end example
|
||
|
||
Likewise, a label can't be the last thing in a block. The workaround
is the same: add a semicolon after the label.
|
||
|
||
These unnecessary restrictions on labels make no sense, and ought in
|
||
principle to be removed; but they do only a little harm since labels
|
||
and @code{goto} are rarely the best way to write a program.
|
||
|
||
These examples are all artificial; it would be more natural to
|
||
write them in other ways, without @code{goto}. For instance,
|
||
the clean way to write the example that prints @samp{Negative} is this:
|
||
|
||
@example
|
||
if (x < 0 || y < 0)
|
||
@{
|
||
printf ("Negative\n");
|
||
return;
|
||
@}
|
||
@end example
|
||
|
||
@noindent
|
||
It is hard to construct simple examples where @code{goto} is actually
|
||
the best way to write a program. Its rare good uses tend to be in
|
||
complex code, thus not apt for the purpose of explaining the meaning
|
||
of @code{goto}.
|
||
|
||
The only good time to use @code{goto} is when it makes the code
|
||
simpler than any alternative. Jumping backward is rarely desirable,
|
||
because usually the other looping and control constructs give simpler
|
||
code. Using @code{goto} to jump forward is more often desirable, for
|
||
instance when a function needs to do some processing in an error case
|
||
and errors can occur at various different places within the function.
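
A minimal sketch of that forward-jumping pattern follows. The function
@code{parse_time}, the counter @code{error_count} and the label
@code{invalid} are all assumptions made for illustration; the point is
only that three separate checks share one piece of error handling.

```c
#include <stdio.h>

/* Number of invalid inputs seen so far.  */
int error_count;

/* Parse TEXT of the form "HH:MM".  Return 0 on success,
   -1 on failure.  On failure, store -1 in *hour and *minute.  */
int
parse_time (const char *text, int *hour, int *minute)
{
  int h, m;

  if (sscanf (text, "%d:%d", &h, &m) != 2)
    goto invalid;
  if (h < 0 || h > 23)
    goto invalid;
  if (m < 0 || m > 59)
    goto invalid;

  *hour = h;
  *minute = m;
  return 0;

 invalid:
  /* Common processing for every error case.  */
  ++error_count;
  *hour = *minute = -1;
  return -1;
}
```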
|
||
|
||
@node Local Labels
|
||
@section Locally Declared Labels
|
||
@cindex local labels
|
||
@cindex macros, local labels
|
||
@findex __label__
|
||
|
||
In GNU C you can declare @dfn{local labels} in any nested block
|
||
scope. A local label is used in a @code{goto} statement just like an
|
||
ordinary label, but you can only reference it within the block in
|
||
which it was declared.
|
||
|
||
A local label declaration looks like this:
|
||
|
||
@example
|
||
__label__ @var{label};
|
||
@end example
|
||
|
||
@noindent
|
||
or
|
||
|
||
@example
|
||
__label__ @var{label1}, @var{label2}, @r{@dots{}};
|
||
@end example
|
||
|
||
Local label declarations must come at the beginning of the block,
|
||
before any ordinary declarations or statements.
|
||
|
||
The label declaration declares the label @emph{name}, but does not define
|
||
the label itself. That's done in the usual way, with
|
||
@code{@var{label}:}, before one of the statements in the block.
|
||
|
||
The local label feature is useful for complex macros. If a macro
|
||
contains nested loops, a @code{goto} can be useful for breaking out of
|
||
them. However, an ordinary label whose scope is the whole function
|
||
cannot be used: if the macro can be expanded several times in one
|
||
function, the label will be multiply defined in that function. A
|
||
local label avoids this problem. For example:
|
||
|
||
@example
|
||
#define SEARCH(value, array, target) \
|
||
do @{ \
|
||
__label__ found; \
|
||
__auto_type _SEARCH_target = (target); \
|
||
__auto_type _SEARCH_array = (array); \
|
||
  int i, j; \
|
||
for (i = 0; i < max; i++) \
|
||
for (j = 0; j < max; j++) \
|
||
if (_SEARCH_array[i][j] == _SEARCH_target) \
|
||
@{ (value) = i; goto found; @} \
|
||
(value) = -1; \
|
||
found:; \
|
||
@} while (0)
|
||
@end example
|
||
|
||
This could also be written using a statement expression
|
||
(@pxref{Statement Exprs}):
|
||
|
||
@example
|
||
#define SEARCH(array, target) \
|
||
(@{ \
|
||
__label__ found; \
|
||
__auto_type _SEARCH_target = (target); \
|
||
__auto_type _SEARCH_array = (array); \
|
||
int i, j; \
|
||
int value; \
|
||
for (i = 0; i < max; i++) \
|
||
for (j = 0; j < max; j++) \
|
||
if (_SEARCH_array[i][j] == _SEARCH_target) \
|
||
@{ value = i; goto found; @} \
|
||
value = -1; \
|
||
found: \
|
||
value; \
|
||
@})
|
||
@end example
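
To see why the local label matters, here is a self-contained sketch of
a similar macro that is expanded twice in one function. The macro
@code{FIND_ROW}, its explicit @code{size} argument and the function
@code{both_present} are hypothetical. With an ordinary label in place
of the local one, the second expansion would define the same label
again and the function would not compile.

```c
/* Store in RESULT the row of the first element of GRID equal to
   TARGET, or -1 if there is none.  GRID is SIZE by SIZE.  */
#define FIND_ROW(result, grid, size, target)              \
  do {                                                    \
    __label__ done;                                       \
    (result) = -1;                                        \
    for (int r_ = 0; r_ < (size); r_++)                   \
      for (int c_ = 0; c_ < (size); c_++)                 \
        if ((grid)[r_][c_] == (target))                   \
          { (result) = r_; goto done; }                   \
  done:;                                                  \
  } while (0)

/* Return 1 if both X and Y occur somewhere in GRID.  */
int
both_present (int grid[3][3], int x, int y)
{
  int row_x, row_y;

  FIND_ROW (row_x, grid, 3, x);
  FIND_ROW (row_y, grid, 3, y);   /* Has its own label done.  */
  return row_x >= 0 && row_y >= 0;
}
```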
|
||
|
||
Ordinary labels are visible throughout the function where they are
|
||
defined, and only in that function. However, explicitly declared
|
||
local labels of a block are visible in nested function definitions
|
||
inside that block. @xref{Nested Functions}, for details.
|
||
|
||
@xref{goto Statement}.
|
||
|
||
@node Labels as Values
|
||
@section Labels as Values
|
||
@cindex labels as values
|
||
@cindex computed gotos
|
||
@cindex goto with computed label
|
||
@cindex address of a label
|
||
|
||
In GNU C, you can get the address of a label defined in the current
|
||
function (or a local label defined in the containing function) with
|
||
the unary operator @samp{&&}. The value has type @code{void *}. This
|
||
value is a constant and can be used wherever a constant of that type
|
||
is valid. For example:
|
||
|
||
@example
|
||
void *ptr;
|
||
@r{@dots{}}
|
||
ptr = &&foo;
|
||
@end example
|
||
|
||
To use these values requires a way to jump to one. This is done
|
||
with the computed goto statement@footnote{The analogous feature in
|
||
Fortran is called an assigned goto, but that name seems inappropriate in
|
||
C, since you can do more with label addresses than store them in special label
|
||
variables.}, @code{goto *@var{exp};}. For example,
|
||
|
||
@example
|
||
goto *ptr;
|
||
@end example
|
||
|
||
@noindent
|
||
Any expression of type @code{void *} is allowed.
|
||
|
||
@xref{goto Statement}.
|
||
|
||
@menu
|
||
* Label Value Uses:: Examples of using label values.
|
||
* Label Value Caveats:: Limitations of label values.
|
||
@end menu
|
||
|
||
@node Label Value Uses
|
||
@subsection Label Value Uses
|
||
|
||
One use for label-valued constants is to initialize a static array to
|
||
serve as a jump table:
|
||
|
||
@example
|
||
static void *array[] = @{ &&foo, &&bar, &&hack @};
|
||
@end example
|
||
|
||
Then you can select a label with indexing, like this:
|
||
|
||
@example
|
||
goto *array[i];
|
||
@end example
|
||
|
||
@noindent
|
||
Note that this does not check whether the subscript is in bounds---array
|
||
indexing in C never checks that.
|
||
|
||
You can make the table entries offsets instead of addresses
|
||
by subtracting one label from the others. Here is an example:
|
||
|
||
@example
|
||
static const int array[] = @{ &&foo - &&foo, &&bar - &&foo,
|
||
&&hack - &&foo @};
|
||
goto *(&&foo + array[i]);
|
||
@end example
|
||
|
||
@noindent
|
||
Using offsets is preferable in shared libraries, as it avoids the need
|
||
for dynamic relocation of the array elements; therefore, the array can
|
||
be read-only.
|
||
|
||
An array of label values or offsets serves a purpose much like that of
|
||
the @code{switch} statement. The @code{switch} statement is cleaner,
|
||
so use @code{switch} by preference when feasible.
|
||
|
||
Another use of label values is in an interpreter for threaded code.
|
||
The labels within the interpreter function can be stored in the
|
||
threaded code for super-fast dispatching.
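
As a rough sketch of that idea, here is a tiny interpreter that
dispatches through a table of label addresses. The three opcodes and
the function @code{run} are invented for this example, and the caller
is assumed to supply only valid opcodes ending with a halt, since
nothing checks the table index.

```c
/* Hypothetical opcodes: 0 = halt, 1 = increment, 2 = double.
   Run CODE and return the final accumulator value.  */
int
run (const unsigned char *code)
{
  static void *dispatch[] = { &&op_halt, &&op_inc, &&op_double };
  int acc = 0;

  /* Jump to the handler for the first opcode.  */
  goto *dispatch[*code++];

 op_inc:
  acc++;
  goto *dispatch[*code++];

 op_double:
  acc *= 2;
  goto *dispatch[*code++];

 op_halt:
  return acc;
}
```

Each handler jumps directly to the next one, so there is no central
loop or @code{switch} to return to between instructions.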
|
||
|
||
@node Label Value Caveats
|
||
@subsection Label Value Caveats
|
||
|
||
Jumping to a label defined in another function does not work.
|
||
It can cause unpredictable results.
|
||
|
||
The best way to avoid this is to store label values only in
|
||
automatic variables, or static variables whose names are declared
|
||
within the function. Never pass them as arguments.
|
||
|
||
@cindex cloning
|
||
An optimization known as @dfn{cloning} generates multiple simplified
|
||
variants of a function's code, for use with specific fixed arguments.
|
||
Using label values in certain ways, such as saving the address in one
|
||
call to the function and using it again in another call, would make cloning
|
||
give incorrect results. These functions must disable cloning.
|
||
|
||
Inlining calls to the function would also result in multiple copies of
|
||
the code, each with its own value of the same label. Using the label
|
||
in a computed goto is no problem, because the computed goto inhibits
|
||
inlining. However, using the label value in some other way, such as
|
||
an indication of where an error occurred, would be optimized wrong.
|
||
These functions must disable inlining.
|
||
|
||
To prevent inlining or cloning of a function, specify
|
||
@code{__attribute__((__noinline__,__noclone__))} in its definition.
|
||
@xref{Attributes}.
|
||
|
||
When a function uses a label value in a static variable initializer,
|
||
that automatically prevents inlining or cloning the function.
|
||
|
||
@node Statement Exprs
|
||
@section Statements and Declarations in Expressions
|
||
@cindex statements inside expressions
|
||
@cindex declarations inside expressions
|
||
@cindex expressions containing statements
|
||
|
||
@c the above section title wrapped and causes an underfull hbox.. i
|
||
@c changed it from "within" to "in". --mew 4feb93
|
||
A block enclosed in parentheses can be used as an expression in GNU
|
||
C@. This provides a way to use local variables, loops and switches within
|
||
an expression. We call it a @dfn{statement expression}.
|
||
|
||
Recall that a block is a sequence of statements
|
||
surrounded by braces. In this construct, parentheses go around the
|
||
braces. For example:
|
||
|
||
@example
|
||
(@{ int y = foo (); int z;
|
||
if (y > 0) z = y;
|
||
else z = - y;
|
||
z; @})
|
||
@end example
|
||
|
||
@noindent
|
||
is a valid (though slightly more complex than necessary) expression
|
||
for the absolute value of @code{foo ()}.
|
||
|
||
The last statement in the block should be an expression statement; that
is, an expression followed by a semicolon. The value of this
expression serves as the value of the statement expression. If the last
statement is anything else, the statement expression has type
@code{void} and yields no value.
|
||
|
||
This feature is mainly useful in making macro definitions compute each
|
||
operand exactly once. @xref{Macros and Auto Type}.
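
For instance, a maximum macro can be sketched this way. The names
@code{MAX_ONCE}, @code{next_value}, @code{calls} and @code{demo} are
hypothetical; the counter exists only to show that the operand with a
side effect is computed a single time.

```c
/* Yield the larger of A and B, computing each exactly once.  */
#define MAX_ONCE(a, b)                        \
  ({ __auto_type max_a_ = (a);                \
     __auto_type max_b_ = (b);                \
     max_a_ > max_b_ ? max_a_ : max_b_; })

/* Counts how many times next_value has been called.  */
int calls;

int
next_value (void)
{
  return ++calls * 10;
}

int
demo (void)
{
  /* A conventional ((a) > (b) ? (a) : (b)) macro would call
     next_value twice here.  */
  return MAX_ONCE (next_value (), 5);
}
```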
|
||
|
||
Statement expressions are not allowed in expressions that must be
|
||
constant, such as the value for an enumerator, the width of a
|
||
bit-field, or the initial value of a static variable.
|
||
|
||
Jumping into a statement expression---with @code{goto}, or using a
|
||
@code{switch} statement outside the statement expression---is an
|
||
error. With a computed @code{goto} (@pxref{Labels as Values}), the
|
||
compiler can't detect the error, but it still won't work.
|
||
|
||
Jumping out of a statement expression is permitted, but since
|
||
subexpressions in C are not computed in a strict order, it is
|
||
unpredictable which other subexpressions will have been computed by
|
||
then. For example,
|
||
|
||
@example
|
||
foo (), ((@{ bar1 (); goto a; 0; @}) + bar2 ()), baz();
|
||
@end example
|
||
|
||
@noindent
|
||
calls @code{foo} and @code{bar1} before it jumps, and never
|
||
calls @code{baz}, but may or may not call @code{bar2}. If @code{bar2}
|
||
does get called, that occurs after @code{foo} and before @code{bar1}.
|
||
|
||
@node Variables
|
||
@chapter Variables
|
||
@cindex variables
|
||
|
||
Every variable used in a C program needs to be made known by a
|
||
@dfn{declaration}. It can be used only after it has been declared.
|
||
It is an error to declare a variable name more than once in the same
|
||
scope; an exception is that @code{extern} declarations and tentative
|
||
definitions can coexist with another declaration of the same
|
||
variable.
|
||
|
||
Variables can be declared anywhere within a block or file. (Older
|
||
versions of C required that all variable declarations within a block
|
||
occur before any statements.)
|
||
|
||
Variables declared within a function or block are @dfn{local} to
|
||
it. This means that the variable name is visible only until the end
|
||
of that function or block, and the memory space is allocated only
|
||
while control is within it.
|
||
|
||
Variables declared at the top level in a file are called @dfn{file-scope} variables.
|
||
They are assigned fixed, distinct memory locations, so they retain
|
||
their values for the whole execution of the program.
|
||
|
||
@menu
|
||
* Variable Declarations:: Name a variable and reserve space for it.
|
||
* Initializers:: Assigning initial values to variables.
|
||
* Designated Inits:: Assigning initial values to array elements
|
||
at particular array indices.
|
||
* Auto Type:: Obtaining the type of a variable.
|
||
* Local Variables:: Variables declared in function definitions.
|
||
* File-Scope Variables:: Variables declared outside of
|
||
function definitions.
|
||
* Static Local Variables:: Variables declared within functions,
|
||
but with permanent storage allocation.
|
||
* Extern Declarations:: Declaring a variable
|
||
which is allocated somewhere else.
|
||
* Allocating File-Scope:: When is space allocated
|
||
for file-scope variables?
|
||
* auto and register:: Historically used storage directions.
|
||
* Omitting Types:: The bad practice of declaring variables
|
||
with implicit type.
|
||
@end menu
|
||
|
||
@node Variable Declarations
|
||
@section Variable Declarations
|
||
@cindex variable declarations
|
||
@cindex declaration of variables
|
||
|
||
Here's what a variable declaration looks like:
|
||
|
||
@example
|
||
@var{keywords} @var{basetype} @var{decorated-variable} @r{[}= @var{init}@r{]};
|
||
@end example
|
||
|
||
The @var{keywords} specify how to handle the scope of the variable
|
||
name and the allocation of its storage. Most declarations have
|
||
no keywords because the defaults are right for them.
|
||
|
||
C allows these keywords to come before or after @var{basetype}, or
|
||
even in the middle of it as in @code{unsigned static int}, but don't
|
||
do that---it would surprise other programmers. Always write the
|
||
keywords first.
|
||
|
||
The @var{basetype} can be any of the predefined types of C, or a type
|
||
name defined with @code{typedef}. It can also be @code{struct
|
||
@var{tag}}, @code{union @var{tag}}, or @code{enum @var{tag}}. In
|
||
addition, it can include type qualifiers such as @code{const} and
|
||
@code{volatile} (@pxref{Type Qualifiers}).
|
||
|
||
In the simplest case, @var{decorated-variable} is just the variable
|
||
name. That declares the variable with the type specified by
|
||
@var{basetype}. For instance,
|
||
|
||
@example
|
||
int foo;
|
||
@end example
|
||
|
||
@noindent
|
||
uses @code{int} as the @var{basetype} and @code{foo} as the
|
||
@var{decorated-variable}. It declares @code{foo} with type
|
||
@code{int}.
|
||
|
||
@example
|
||
struct tree_node foo;
|
||
@end example
|
||
|
||
@noindent
|
||
declares @code{foo} with type @code{struct tree_node}.
|
||
|
||
@menu
|
||
* Declaring Arrays and Pointers:: Declaration syntax for variables of
|
||
array and pointer types.
|
||
* Combining Variable Declarations:: More than one variable declaration
|
||
in a single statement.
|
||
@end menu
|
||
|
||
@node Declaring Arrays and Pointers
|
||
@subsection Declaring Arrays and Pointers
|
||
@cindex declaring arrays and pointers
|
||
@cindex array, declaring
|
||
@cindex pointers, declaring
|
||
|
||
To declare a variable that is an array, write
|
||
@code{@var{variable}[@var{length}]} for @var{decorated-variable}:
|
||
|
||
@example
|
||
int foo[5];
|
||
@end example
|
||
|
||
To declare a variable that has a pointer type, write
|
||
@code{*@var{variable}} for @var{decorated-variable}:
|
||
|
||
@example
|
||
struct list_elt *foo;
|
||
@end example
|
||
|
||
These constructs nest. For instance,
|
||
|
||
@example
|
||
int foo[3][5];
|
||
@end example
|
||
|
||
@noindent
|
||
declares @code{foo} as an array of 3 arrays of 5 integers each,
|
||
|
||
@example
|
||
struct list_elt *foo[5];
|
||
@end example
|
||
|
||
@noindent
|
||
declares @code{foo} as an array of 5 pointers to structures, and
|
||
|
||
@example
|
||
struct list_elt **foo;
|
||
@end example
|
||
|
||
@noindent
|
||
declares @code{foo} as a pointer to a pointer to a structure.
|
||
|
||
@example
|
||
int **(*foo[30])(int, double);
|
||
@end example
|
||
|
||
@noindent
|
||
declares @code{foo} as an array of 30 pointers to functions
|
||
(@pxref{Function Pointers}), each of which must accept two arguments
|
||
(one @code{int} and one @code{double}) and return type @code{int **}.
|
||
|
||
@example
|
||
void
|
||
bar (int size)
|
||
@{
|
||
int foo[size];
|
||
@r{@dots{}}
|
||
@}
|
||
@end example
|
||
|
||
@noindent
|
||
declares @code{foo} as an array of integers with a size specified at
|
||
run time when the function @code{bar} is called.
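As a rough illustration of how these declarators combine, the following sketch declares an array of arrays and an array of function pointers, then reports their sizes. The names @code{grid}, @code{ops}, @code{add}, @code{subtract}, @code{apply}, @code{grid_rows}, @code{grid_columns} and @code{vla_bytes} are invented for this example; they do not appear elsewhere in this manual.

```c
#include <stddef.h>

/* An array of 3 arrays of 5 integers each: 15 integers in all.  */
int grid[3][5];

static int add (int a, int b) { return a + b; }
static int subtract (int a, int b) { return a - b; }

/* An array of 2 pointers to functions, each of which accepts
   two int arguments and returns an int.  */
int (*ops[2]) (int, int) = { add, subtract };

/* Call the function that ops[which] points to.  */
int
apply (int which, int a, int b)
{
  return ops[which] (a, b);
}

/* Number of rows in grid, computed from the array types.  */
size_t
grid_rows (void)
{
  return sizeof grid / sizeof grid[0];
}

/* Number of integers in each row of grid.  */
size_t
grid_columns (void)
{
  return sizeof grid[0] / sizeof grid[0][0];
}

/* The array foo gets its size at run time, so its sizeof is
   computed when the function runs.  */
size_t
vla_bytes (int size)
{
  int foo[size];
  return sizeof foo;
}
```

Reading each declaration from the variable name outward, as the compiler does, is usually the easiest way to work out what a nested declarator means.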
|
||
|
||
@node Combining Variable Declarations
|
||
@subsection Combining Variable Declarations
|
||
@cindex combining variable declarations
|
||
@cindex variable declarations, combining
|
||
@cindex declarations, combining
|
||
|
||
When multiple declarations have the same @var{keywords} and
|
||
@var{basetype}, you can combine them using commas. Thus,
|
||
|
||
@example
|
||
@var{keywords} @var{basetype}
|
||
@var{decorated-variable-1} @r{[}= @var{init1}@r{]},
|
||
@var{decorated-variable-2} @r{[}= @var{init2}@r{]};
|
||
@end example
|
||
|
||
@noindent
|
||
is equivalent to
|
||
|
||
@example
|
||
@var{keywords} @var{basetype}
|
||
@var{decorated-variable-1} @r{[}= @var{init1}@r{]};
|
||
@var{keywords} @var{basetype}
|
||
@var{decorated-variable-2} @r{[}= @var{init2}@r{]};
|
||
@end example
|
||
|
||
Here are some simple examples:
|
||
|
||
@example
|
||
int a, b;
|
||
int a = 1, b = 2;
|
||
int a, *p, array[5];
|
||
int a = 0, *p = &a, array[5] = @{1, 2@};
|
||
@end example
|
||
|
||
@noindent
|
||
In the last two examples, @code{a} is an @code{int}, @code{p} is a
|
||
pointer to @code{int}, and @code{array} is an array of 5 @code{int}s.
|
||
Since the initializer for @code{array} specifies only two elements,
|
||
the other three elements are initialized to zero.
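One consequence of this rule deserves a sketch: the @samp{*} belongs to the individual @var{decorated-variable}, not to the @var{basetype}. In the hypothetical function below (all the names are made up for illustration), only @code{p} and @code{r} are pointers.

```c
/* Return 1 if the combined declarations behave as described,
   0 otherwise.  */
int
check_combined_declaration (void)
{
  /* p is a pointer to int, but q is a plain int: the `*'
     decorates only the variable it is written next to.  */
  int *p, q = 7;

  /* array has 5 elements; only the first two have explicit values.  */
  int a = 0, *r = &a, array[5] = { 1, 2 };

  p = &q;
  *r = *p + 1;                  /* Stores 8 into a through r.  */

  return (sizeof q == sizeof (int)
          && sizeof p == sizeof (int *)
          && a == 8
          && array[0] == 1 && array[1] == 2
          && array[2] == 0 && array[3] == 0 && array[4] == 0);
}
```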
|
||
|
||
@node Initializers
|
||
@section Initializers
|
||
@cindex initializers
|
||
|
||
A variable's declaration, unless it is @code{extern}, should also
|
||
specify its initial value. For numeric and pointer-type variables,
|
||
the initializer is an expression for the value. If necessary, it is
|
||
converted to the variable's type, just as in an assignment.
|
||
|
||
You can also initialize a local structure-type (@pxref{Structures}) or
|
||
local union-type (@pxref{Unions}) variable this way, from an
|
||
expression whose value has the same type. But you can't initialize an
|
||
array this way (@pxref{Arrays}), since arrays are not first-class
|
||
objects in C (@pxref{Limitations of C Arrays}) and there is no array
|
||
assignment.
|
||
|
||
You can initialize arrays and structures component-wise,
|
||
with a list of the elements or components. You can initialize
|
||
a union with any one of its alternatives.
|
||
|
||
@itemize @bullet
|
||
@item
|
||
A component-wise initializer for an array consists of element values
|
||
surrounded by @samp{@{@r{@dots{}}@}}. If the values in the initializer
|
||
don't cover all the elements in the array, the remaining elements are
|
||
initialized to zero.
|
||
|
||
You can omit the size of the array when you declare it, and let
|
||
the initializer specify the size:
|
||
|
||
@example
|
||
int array[] = @{ 3, 9, 12 @};
|
||
@end example
|
||
|
||
@item
|
||
A component-wise initializer for a structure consists of field values
|
||
surrounded by @samp{@{@r{@dots{}}@}}. Write the field values in the same
|
||
order as the fields are declared in the structure. If the values in
|
||
the initializer don't cover all the fields in the structure, the
|
||
remaining fields are initialized to zero.
|
||
|
||
@item
|
||
The initializer for a union-type variable has the form @code{@{
|
||
@var{value} @}}, where @var{value} initializes the @emph{first alternative}
|
||
in the union definition.
|
||
@end itemize
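The following sketch puts the three kinds of component-wise initializer side by side. The types @code{struct point3} and @code{union number}, and the function names, are assumptions made for this example only.

```c
struct point3 { int x, y, z; };
union number { int i; double d; };

/* Sum of the elements of a partly initialized array.  */
int
array_sum (void)
{
  int array[5] = { 3, 9, 12 };  /* array[3] and array[4] are zero.  */
  int sum = 0;
  for (int i = 0; i < 5; i++)
    sum += array[i];
  return sum;
}

/* The z field has no value in the initializer, so it is zero.  */
int
point_z (void)
{
  struct point3 p = { 1, 2 };
  return p.z;
}

/* The value initializes the first alternative, i.  */
int
number_i (void)
{
  union number n = { 5 };
  return n.i;
}
```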
|
||
|
||
For an array of arrays, a structure containing arrays, an array of
|
||
structures, etc., you can nest these constructs. For example,
|
||
|
||
@example
|
||
struct point @{ double x, y; @};
|
||
|
||
struct point series[]
|
||
= @{ @{0, 0@}, @{1.5, 2.8@}, @{99, 100.0004@} @};
|
||
@end example
|
||
|
||
You can omit a pair of inner braces if they contain the right
|
||
number of elements for the sub-value they initialize, so that
|
||
no elements or fields need to be filled in with zeros.
|
||
But don't do that very much, as it gets confusing.
|
||
|
||
An array of @code{char} can be initialized using a string constant.
|
||
Recall that the string constant includes an implicit null character at
|
||
the end (@pxref{String Constants}). Using a string constant as
|
||
initializer means to use its contents as the initial values of the
|
||
array elements. Here are examples:
|
||
|
||
@example
|
||
char text[6] = "text!"; /* @r{Includes the null.} */
|
||
char text[5] = "text!"; /* @r{Excludes the null.} */
|
||
char text[] = "text!"; /* @r{Gets length 6.} */
|
||
char text[]
|
||
= @{ 't', 'e', 'x', 't', '!', 0 @}; /* @r{same as above.} */
|
||
char text[] = @{ "text!" @}; /* @r{Braces are optional.} */
|
||
@end example
|
||
|
||
@noindent
|
||
and this kind of initializer can be nested inside braces to initialize
|
||
structures or arrays that contain a @code{char}-array.
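To see the difference the null character makes, one can compare the sizes of these arrays. This is only a sketch; the variable and function names are invented for the example. Note that @code{exact} does not hold a null-terminated string, so it must not be passed to functions such as @code{strlen}.

```c
#include <stddef.h>

char with_null[6] = "text!";    /* 5 characters plus the null.  */
char exact[5] = "text!";        /* No room for the null.  */
char counted[] = "text!";       /* The compiler counts: 6.  */

size_t size_with_null (void) { return sizeof with_null; }
size_t size_exact (void) { return sizeof exact; }
size_t size_counted (void) { return sizeof counted; }

/* Return the last element of counted, which is the null.  */
char
last_of_counted (void)
{
  return counted[sizeof counted - 1];
}
```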
|
||
|
||
In like manner, you can use a wide string constant to initialize
|
||
an array of @code{wchar_t}.
|
||
|
||
@node Designated Inits
|
||
@section Designated Initializers
|
||
@cindex initializers with labeled elements
|
||
@cindex labeled elements in initializers
|
||
@cindex case labels in initializers
|
||
@cindex designated initializers
|
||
|
||
In a complex structure or long array, it's useful to indicate
|
||
which field or element we are initializing.
|
||
|
||
To designate specific array elements during initialization, include
|
||
the array index in brackets, and an assignment operator, for each
|
||
element:
|
||
|
||
@example
|
||
int foo[10] = @{ [3] = 42, [7] = 58 @};
|
||
@end example
|
||
|
||
@noindent
|
||
This does the same thing as:
|
||
|
||
@example
|
||
int foo[10] = @{ 0, 0, 0, 42, 0, 0, 0, 58, 0, 0 @};
|
||
@end example
|
||
|
||
The array initialization can include non-designated element values
|
||
alongside designated indices; these follow the expected ordering
|
||
of the array initialization, so that
|
||
|
||
@example
|
||
int foo[10] = @{ [3] = 42, 43, 44, [7] = 58 @};
|
||
@end example
|
||
|
||
@noindent
|
||
does the same thing as:
|
||
|
||
@example
|
||
int foo[10] = @{ 0, 0, 0, 42, 43, 44, 0, 58, 0, 0 @};
|
||
@end example
|
||
|
||
Note that you can only use constant expressions as array index values,
|
||
not variables.
|
||
|
||
If you need to initialize a subsequence of sequential array elements to
|
||
the same value, you can specify a range:
|
||
|
||
@example
|
||
int foo[100] = @{ [0 ... 19] = 42, [20 ... 99] = 43 @};
|
||
@end example
|
||
|
||
@noindent
|
||
Using a range this way is a GNU C extension.
|
||
|
||
When subsequence ranges overlap, each element is initialized by the
|
||
last specification that applies to it. Thus, this initialization is
|
||
equivalent to the previous one.
|
||
|
||
@example
|
||
int foo[100] = @{ [0 ... 99] = 43, [0 ... 19] = 42 @};
|
||
@end example
|
||
|
||
@noindent
|
||
as the second overrides the first for elements 0 through 19.
|
||
|
||
The value used to initialize a range of elements is evaluated only
|
||
once, for the first element in the range. So for example, this code
|
||
|
||
@example
|
||
int random_values[100]
|
||
= @{ [0 ... 99] = get_random_number() @};
|
||
@end example
|
||
|
||
@noindent
|
||
would initialize all 100 elements of the array @code{random_values} to
|
||
the same value---probably not what is intended.
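If each element should get its own value, one option is to fill the array with a loop instead. The sketch below assumes a stand-in function @code{next_value}, which simply counts upward; it takes the place of @code{get_random_number}, whose definition this manual does not show. The name @code{fill_values} is likewise hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for get_random_number: returns
   1, 2, 3, ... on successive calls.  */
static int
next_value (void)
{
  static int last = 0;
  return ++last;
}

/* Store a separately computed value in each of the n elements
   of values.  The call is repeated once per element.  */
void
fill_values (int *values, size_t n)
{
  for (size_t i = 0; i < n; i++)
    values[i] = next_value ();
}
```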
|
||
|
||
Similarly, you can initialize specific fields of a structure variable
|
||
by specifying the field name prefixed with a dot:
|
||
|
||
@example
|
||
struct point @{ int x; int y; @};
|
||
|
||
struct point foo = @{ .y = 42 @};
|
||
@end example
|
||
|
||
@noindent
|
||
The same syntax works for union variables as well:
|
||
|
||
@example
|
||
union int_double @{ int i; double d; @};
|
||
|
||
union int_double foo = @{ .d = 34 @};
|
||
@end example
|
||
|
||
@noindent
|
||
This converts the integer value 34 to @code{double} and stores it
|
||
in the union variable @code{foo}.
|
||
|
||
You can designate both array elements and structure elements in
|
||
the same initialization; for example, here's an array of point
|
||
structures:
|
||
|
||
@example
|
||
struct point point_array[10] = @{ [4].y = 32, [6].y = 39 @};
|
||
@end example
|
||
|
||
Along with the capability to specify particular array and structure
|
||
elements to initialize comes the possibility of initializing the same
|
||
element more than once:
|
||
|
||
@example
|
||
int foo[10] = @{ [4] = 42, [4] = 98 @};
|
||
@end example
|
||
|
||
@noindent
|
||
In such a case, the last initialization value is retained.
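As a small check of these rules, the sketch below reads back a few of the values. It repeats the declaration of @code{struct point} from above so that it is complete in itself; the function names are made up for the example.

```c
struct point { int x; int y; };

/* Fields and elements that are not designated start out zero.  */
int
undesignated_values (void)
{
  struct point point_array[10] = { [4].y = 32, [6].y = 39 };
  return point_array[4].x + point_array[5].y;
}

/* A designated field holds the value given for it.  */
int
designated_y (void)
{
  struct point point_array[10] = { [4].y = 32, [6].y = 39 };
  return point_array[6].y;
}

/* When an element is initialized twice, the last value wins.  */
int
repeated_element (void)
{
  int foo[10] = { [4] = 42, [4] = 98 };
  return foo[4];
}
```

Some compilers can warn about the repeated designator in @code{repeated_element}, since it is often a mistake.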
|
||
|
||
@node Auto Type
|
||
@section Referring to a Type with @code{__auto_type}
|
||
@findex __auto_type
|
||
@findex typeof
|
||
@cindex macros, types of arguments
|
||
|
||
You can declare a variable copying the type from
|
||
the initializer by using @code{__auto_type} instead of a particular type.
|
||
Here's an example:
|
||
|
||
@example
|
||
#define max(a,b) \
|
||
(@{ __auto_type _a = (a); \
|
||
__auto_type _b = (b); \
|
||
     _a > _b ? _a : _b; @})
|
||
@end example
|
||
|
||
This defines @code{_a} to be of the same type as @code{a}, and
|
||
@code{_b} to be of the same type as @code{b}. This is a useful thing
|
||
to do in a macro that ought to be able to handle any type of data
|
||
(@pxref{Macros and Auto Type}).
|
||
|
||
The original GNU C method for obtaining the type of a value is to use
|
||
@code{typeof}, which takes as an argument either a value or the name of
|
||
a type. The previous example could also be written as:
|
||
|
||
@example
|
||
#define max(a,b) \
|
||
(@{ typeof(a) _a = (a); \
|
||
typeof(b) _b = (b); \
|
||
     _a > _b ? _a : _b; @})
|
||
@end example
|
||
|
||
@code{typeof} is more flexible than @code{__auto_type}; however, the
|
||
principal use case for @code{typeof} is in variable declarations with
|
||
initialization, which is exactly what @code{__auto_type} handles.
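One reason to copy each macro argument into a local variable is that the argument is then evaluated only once. The following sketch shows this with an argument that has a side effect. It relies on GNU C statement expressions as well as @code{__auto_type}; the function name @code{max_with_side_effect} is an assumption for the example.

```c
#define max(a,b)                \
  ({ __auto_type _a = (a);      \
     __auto_type _b = (b);      \
     _a > _b ? _a : _b; })

/* Compute the maximum of i++ and 2, where i starts at 3.
   Store the final value of i in *counter_after and return
   the maximum.  */
int
max_with_side_effect (int *counter_after)
{
  int i = 3;
  int m = max (i++, 2);         /* i++ is evaluated exactly once.  */
  *counter_after = i;
  return m;
}
```

A macro written as @code{((a) > (b) ? (a) : (b))} would instead evaluate the larger argument twice, and so would increment @code{i} twice in this case.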
|
||
|
||
@node Local Variables
|
||
@section Local Variables
|
||
@cindex local variables
|
||
@cindex variables, local
|
||
|
||
Declaring a variable inside a function definition (@pxref{Function
|
||
Definitions}) makes the variable name @dfn{local} to the containing
|
||
block---that is, the containing pair of braces. More precisely, the
|
||
variable's name is visible starting just after where it appears in the
|
||
declaration, and its visibility continues until the end of the block.
|
||
|
||
Local variables in C are generally @dfn{automatic} variables: each
|
||
variable's storage exists only from the declaration to the end of the
|
||
block. Execution of the declaration allocates the storage, computes
|
||
the initial value, and stores it in the variable. The end of the
|
||
block deallocates the storage.@footnote{Due to compiler optimizations,
|
||
allocation and deallocation don't necessarily really happen at
|
||
those times.}
|
||
|
||
@strong{Warning:} Two declarations for the same local variable
|
||
in the same scope are an error.
|
||
|
||
@strong{Warning:} Automatic variables are stored in the run-time stack.
|
||
The total space for the program's stack may be limited; therefore,
|
||
in using very large arrays, it may be necessary to allocate
|
||
them in some other way to stop the program from crashing.
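One such way is dynamic allocation. This sketch assumes the standard library functions @code{calloc} and @code{free} are suitable for the program; the function name @code{make_table} is hypothetical.

```c
#include <stdlib.h>

/* Allocate a table of n doubles outside the stack, with all
   bytes zero (which represents 0.0 in the usual floating-point
   formats).  Return a null pointer if the memory is not
   available.  The caller must pass the result to free.  */
double *
make_table (size_t n)
{
  return calloc (n, sizeof (double));
}
```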
|
||
|
||
@strong{Warning:} If the declaration of an automatic variable does not
|
||
specify an initial value, the variable starts out containing garbage.
|
||
In this example, the value printed could be anything at all:
|
||
|
||
@example
|
||
@{
|
||
int i;
|
||
|
||
printf ("Print junk %d\n", i);
|
||
@}
|
||
@end example
|
||
|
||
In a simple test program, that statement is likely to print 0, simply
|
||
because every process starts with memory zeroed. But don't rely on it
|
||
to be zero---that is erroneous.
|
||
|
||
@strong{Note:} Make sure to store a value into each local variable (by
|
||
assignment, or by initialization) before referring to its value.
|
||
|
||
@node File-Scope Variables
|
||
@section File-Scope Variables
|
||
@cindex file-scope variables
|
||
@cindex global variables
|
||
@cindex variables, file-scope
|
||
@cindex variables, global
|
||
|
||
A variable declaration at the top level in a file (not inside a
|
||
function definition) declares a @dfn{file-scope variable}. Loading a
|
||
program allocates the storage for all the file-scope variables in it,
|
||
and initializes them too.
|
||
|
||
Each file-scope variable is either @dfn{static} (limited to one
|
||
compilation module) or @dfn{global} (shared with all compilation
|
||
modules in the program). To make the variable static, write the
|
||
keyword @code{static} at the start of the declaration. Omitting
|
||
@code{static} makes the variable global.
|
||
|
||
The initial value for a file-scope variable can't depend on the
|
||
contents of storage, and can't call any functions.
|
||
|
||
@example
|
||
int foo = 5; /* @r{Valid.} */
|
||
int bar = foo; /* @r{Invalid!} */
|
||
int bar = sin (1.0); /* @r{Invalid!} */
|
||
@end example
|
||
|
||
But it can use the address of another file-scope variable:
|
||
|
||
@example
|
||
int foo;
|
||
int *bar = &foo; /* @r{Valid.} */
|
||
int arr[5];
|
||
int *bar3 = &arr[3]; /* @r{Valid.} */
|
||
int *bar4 = arr + 4; /* @r{Valid.} */
|
||
@end example
|
||
|
||
It is valid for a module to have multiple declarations for a
|
||
file-scope variable, as long as they are all global or all static, but
|
||
at most one declaration can specify an initial value for it.
|
||
|
||
@node Static Local Variables
|
||
@section Static Local Variables
|
||
@cindex static local variables
|
||
@cindex variables, static local
|
||
@findex static
|
||
|
||
The keyword @code{static} in a local variable declaration says to
|
||
allocate the storage for the variable permanently, just like a
|
||
file-scope variable, even if the declaration is within a function.
|
||
|
||
Here's an example:
|
||
|
||
@example
|
||
int
|
||
increment_counter ()
|
||
@{
|
||
static int counter = 0;
|
||
return ++counter;
|
||
@}
|
||
@end example
|
||
|
||
The scope of the name @code{counter} runs from the declaration to the
|
||
end of the containing block, just like an automatic local variable,
|
||
but its storage is permanent, so the value persists from one call to
|
||
the next. As a result, each call to @code{increment_counter}
|
||
returns a different, unique value.
|
||
|
||
The initial value of a static local variable has the same limitations
|
||
as for file-scope variables: it can't depend on the contents of
|
||
storage or call any functions. It can use the address of a file-scope
|
||
variable or a static local variable, because those addresses are
|
||
determined before the program runs.
|
||
|
||
@node Extern Declarations
|
||
@section @code{extern} Declarations
|
||
@cindex @code{extern} declarations
|
||
@cindex declarations, @code{extern}
|
||
@findex extern
|
||
|
||
An @code{extern} declaration is used to refer to a global variable
|
||
whose principal declaration comes elsewhere---in the same module, or in
|
||
another compilation module. It looks like this:
|
||
|
||
@example
|
||
extern @var{basetype} @var{decorated-variable};
|
||
@end example
|
||
|
||
Its meaning is that, in the current scope, the variable name refers to
|
||
the file-scope variable of that name---which needs to be declared in a
|
||
non-@code{extern}, non-@code{static} way somewhere else.
|
||
|
||
For instance, if one compilation module has this global variable
|
||
declaration
|
||
|
||
@example
|
||
int error_count = 0;
|
||
@end example
|
||
|
||
@noindent
|
||
then other compilation modules can specify this
|
||
|
||
@example
|
||
extern int error_count;
|
||
@end example
|
||
|
||
@noindent
|
||
to allow reference to the same variable.
|
||
|
||
The usual place to write an @code{extern} declaration is at top level
|
||
in a source file, but you can write an @code{extern} declaration
|
||
inside a block to make a global or static file-scope variable
|
||
accessible in that block.
|
||
|
||
Since an @code{extern} declaration does not allocate space for the
|
||
variable, it can omit the size of an array:
|
||
|
||
@example
|
||
extern int array[];
|
||
@end example
|
||
|
||
You can use @code{array} normally in all contexts where it is
|
||
converted automatically to a pointer. However, to use it as the
|
||
operand of @code{sizeof} is an error, since the size is unknown.
|
||
|
||
It is valid to have multiple @code{extern} declarations for the same
|
||
variable, even in the same scope, if they give the same type. They do
|
||
not conflict---they agree. For an array, it is legitimate for some
|
||
@code{extern} declarations to specify the size while others omit it.
|
||
However, if two declarations give different sizes, that is an error.
|
||
|
||
Likewise, you can use @code{extern} declarations at file scope
|
||
(@pxref{File-Scope Variables}) followed by an ordinary global
|
||
(non-static) declaration of the same variable. They do not conflict,
|
||
because they say compatible things about the same meaning of the variable.
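Several of these rules can be seen at work within a single source file. The sketch below is one assumed arrangement of such declarations; the function names are invented, and @code{error_count} stands for the variable used in the example above.

```c
extern int error_count;         /* Refers to the definition below.  */
extern int error_count;         /* Repeating it is harmless.  */

extern int array[];             /* Size omitted.  */

/* Record one error and return the new count.  */
int
note_error (void)
{
  return ++error_count;
}

/* Return element i of array.  Indexing works even though the
   size is not yet known at this point in the file.  */
int
array_element (int i)
{
  return array[i];
}

int error_count = 0;            /* The ordinary global declaration.  */
int array[3] = { 10, 20, 30 };  /* This declaration gives the size.  */

/* An extern declaration inside a block.  */
int
current_error_count (void)
{
  extern int error_count;
  return error_count;
}
```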
|
||
|
||
@node Allocating File-Scope
|
||
@section Allocating File-Scope Variables
|
||
@cindex allocating file-scope variables
|
||
@cindex file-scope variables, allocating
|
||
|
||
Some file-scope declarations allocate space for the variable, and some
|
||
don't.
|
||
|
||
A file-scope declaration with an initial value @emph{must} allocate
|
||
space for the variable; if there are two such declarations for the
|
||
same variable, even in different compilation modules, they conflict.
|
||
|
||
An @code{extern} declaration @emph{never} allocates space for the variable.
|
||
If all the top-level declarations of a certain variable are
|
||
@code{extern}, the variable never gets memory space. If that variable
|
||
is used anywhere in the program, the use will be reported as an error,
|
||
saying that the variable is not defined.
|
||
|
||
@cindex tentative definition
|
||
A file-scope declaration without an initial value is called a
|
||
@dfn{tentative definition}. This is a strange hybrid: it @emph{can}
|
||
allocate space for the variable, but does not insist. So it causes no
|
||
conflict, no error, if the variable has another declaration that
|
||
allocates space for it, perhaps in another compilation module. But if
|
||
nothing else allocates space for the variable, the tentative
|
||
definition will do it. Any number of compilation modules can declare
|
||
the same variable in this way, and that is sufficient for all of them
|
||
to use the variable.
|
||
|
||
@c @opindex -fno-common
|
||
@c @opindex -fcommon
|
||
In programs that are very large or have many contributors, it may be
|
||
wise to adopt the convention of never using tentative definitions.
|
||
You can use the compilation option @option{-fno-common} to make it a
link-time error when more than one compilation module defines the same
variable this way, or @option{-fcommon} to allow it.  The default
depends on the version of GCC and its target.
|
||
|
||
If a file-scope variable gets its space through a tentative
|
||
definition, it starts out containing all zeros.
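Here is a sketch, confined to one source file, of tentative definitions that agree with each other. The names @code{shared_total} and @code{add_to_total} are hypothetical.

```c
int shared_total;               /* Tentative definition.  */
int shared_total;               /* A second one; no conflict.  */

/* Add amount to shared_total and return the result.  Since no
   declaration gives an initial value, shared_total starts at zero.  */
int
add_to_total (int amount)
{
  shared_total += amount;
  return shared_total;
}
```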
|
||
|
||
@node auto and register
|
||
@section @code{auto} and @code{register}
|
||
@cindex @code{auto} declarations
|
||
@cindex @code{register} declarations
|
||
@findex auto
|
||
@findex register
|
||
|
||
For historical reasons, you can write @code{auto} or @code{register}
|
||
before a local variable declaration. @code{auto} merely emphasizes
|
||
that the variable isn't static; it changes nothing.
|
||
|
||
@code{register} suggests that the compiler store this variable in a
|
||
register. However, GNU C ignores this suggestion, since it can
|
||
choose the best variables to store in registers without any hints.
|
||
|
||
It is an error to take the address of a variable declared
|
||
@code{register}, so you cannot use the unary @samp{&} operator on it.
|
||
If the variable is an array, you can't use it at all (other than as
|
||
the operand of @code{sizeof}), which makes it rather useless.
|
||
|
||
@node Omitting Types
|
||
@section Omitting Types in Declarations
|
||
@cindex omitting types in declarations
|
||
|
||
The syntax of C traditionally allows omitting the data type in a
|
||
declaration if it specifies a storage class, a type qualifier (see the
|
||
next chapter), or @code{auto} or @code{register}. Then the type
|
||
defaults to @code{int}. For example:
|
||
|
||
@example
|
||
auto foo = 42;
|
||
@end example
|
||
|
||
This is bad practice; if you see it, fix it.
|
||
|
||
@node Type Qualifiers
|
||
@chapter Type Qualifiers
|
||
|
||
A declaration can include type qualifiers to advise the compiler
|
||
about how the variable will be used. There are three different
|
||
qualifiers, @code{const}, @code{volatile} and @code{restrict}. They
|
||
pertain to different issues, so you can use more than one together.
|
||
For instance, @code{const volatile} describes a value that the
|
||
program is not allowed to change, but might have a different value
|
||
each time the program examines it. (This might perhaps be a special
|
||
hardware register, or part of shared memory.)
|
||
|
||
If you are just learning C, you can skip this chapter.
|
||
|
||
@menu
|
||
* const:: Variables whose values don't change.
|
||
* volatile:: Variables whose values may be accessed
|
||
or changed outside of the control of
|
||
this program.
|
||
* restrict Pointers:: Restricted pointers for code optimization.
|
||
* restrict Pointer Example:: Example of how that works.
|
||
@end menu
|
||
|
||
@node const
|
||
@section @code{const} Variables and Fields
|
||
@cindex @code{const} variables and fields
|
||
@cindex variables, @code{const}
|
||
@findex const
|
||
|
||
You can mark a variable as ``constant'' by writing @code{const} in
|
||
front of the declaration. This says to treat any assignment to that
|
||
variable as an error. It may also permit some compiler
|
||
optimizations---for instance, to fetch the value only once to satisfy
|
||
multiple references to it. The construct looks like this:
|
||
|
||
@example
|
||
const double pi = 3.14159;
|
||
@end example
|
||
|
||
After this definition, the code can use the variable @code{pi}
|
||
but cannot assign a different value to it.
|
||
|
||
@example
|
||
pi = 3.0; /* @r{Error!} */
|
||
@end example
|
||
|
||
Simple variables that are constant can be used for many of the same
purposes as enumeration constants, and they are not limited to
integers.  The constantness of the variable propagates into pointers,
too.
|
||
|
||
A pointer type can specify that the @emph{target} is constant. For
|
||
example, the pointer type @code{const double *} stands for a pointer
|
||
to a constant @code{double}. That's the type that results from taking
|
||
the address of @code{pi}. Such a pointer can't be dereferenced in the
|
||
left side of an assignment.
|
||
|
||
@example
|
||
*(&pi) = 3.0; /* @r{Error!} */
|
||
@end example
|
||
|
||
Nonconstant pointers can be converted automatically to constant
|
||
pointers, but not vice versa. For instance,
|
||
|
||
@example
|
||
const double *cptr;
|
||
double *ptr;
|
||
|
||
cptr = &pi;    /* @r{Valid.} */
|
||
cptr = ptr; /* @r{Valid.} */
|
||
ptr = cptr; /* @r{Error!} */
|
||
ptr = &pi;     /* @r{Error!} */
|
||
@end example
|
||
|
||
This is not an ironclad protection against modifying the value. You
|
||
can always cast the constant pointer to a nonconstant pointer type:
|
||
|
||
@example
|
||
ptr = (double *)cptr; /* @r{Valid.} */
|
||
ptr = (double *)&pi;     /* @r{Valid.} */
|
||
@end example
|
||
|
||
However, @code{const} provides a way to show that a certain function
|
||
won't modify the data structure whose address is passed to it. Here's
|
||
an example:
|
||
|
||
@example
|
||
int
|
||
string_length (const char *string)
|
||
@{
|
||
  int count = 0;
  while (*string++)
    count++;
  return count;
|
||
@}
|
||
@end example
|
||
|
||
@noindent
|
||
Using @code{const char *} for the parameter is a way of saying this
|
||
function never modifies the memory of the string itself.
|
||
|
||
In calling @code{string_length}, you can specify an ordinary
|
||
@code{char *} since that can be converted automatically to @code{const
|
||
char *}.
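For instance, a caller might look like the following sketch. It repeats the definition of @code{string_length} so that it is complete in itself; the function @code{length_of_buffer} is made up for illustration.

```c
int
string_length (const char *string)
{
  int count = 0;
  while (*string++)
    count++;
  return count;
}

/* buffer is an ordinary, modifiable array of char.  Passing it
   converts char * to const char * automatically.  */
int
length_of_buffer (void)
{
  char buffer[] = "hello";
  buffer[0] = 'H';              /* The caller may still modify it.  */
  return string_length (buffer);
}
```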
|
||
|
||
@node volatile
|
||
@section @code{volatile} Variables and Fields
|
||
@cindex @code{volatile} variables and fields
|
||
@cindex variables, @code{volatile}
|
||
@findex volatile
|
||
|
||
The GNU C compiler often performs optimizations that eliminate the
|
||
need to write or read a variable. For instance,
|
||
|
||
@example
|
||
int foo;
|
||
foo = 1;
|
||
foo++;
|
||
@end example
|
||
|
||
@noindent
|
||
might simply store the value 2 into @code{foo}, without ever storing 1.
|
||
These optimizations can also apply to structure fields in some cases.
|
||
|
||
If the memory containing @code{foo} is shared with another program,
|
||
or if it is examined asynchronously by hardware, such optimizations
|
||
could confuse the communication. Using @code{volatile} is one way
|
||
to prevent them.
|
||
|
||
Writing @code{volatile} with the type in a variable or field declaration
|
||
says that the value may be examined or changed for reasons outside the
|
||
control of the program at any moment. Therefore, the program must
|
||
execute in a careful way to assure correct interaction with those
|
||
accesses, whenever they may occur.
|
||
|
||
The simplest use looks like this:
|
||
|
||
@example
|
||
volatile int lock;
|
||
@end example
|
||
|
||
This directs the compiler not to do certain common optimizations on
|
||
use of the variable @code{lock}. All the reads and writes for a volatile
|
||
variable or field are really done, and done in the order specified
|
||
by the source code. Thus, this code:
|
||
|
||
@example
|
||
lock = 1;
|
||
list = list->next;
|
||
if (lock)
|
||
  lock_broken (&lock);
|
||
lock = 0;
|
||
@end example
|
||
|
||
@noindent
|
||
really stores the value 1 in @code{lock}, even though there is no
|
||
sign it is really used, and the @code{if} statement reads and
|
||
checks the value of @code{lock}, rather than assuming it is still 1.
|
||
|
||
A limited amount of optimization can be done, in principle, on
|
||
@code{volatile} variables and fields: multiple references between two
|
||
sequence points (@pxref{Sequence Points}) can be simplified together.
|
||
|
||
Use of @code{volatile} does not eliminate the flexibility in ordering
|
||
the computation of the operands of most operators. For instance, in
|
||
@code{lock + foo ()}, the order of accessing @code{lock} and calling
|
||
@code{foo} is not specified, so they may be done in either order; the
|
||
fact that @code{lock} is @code{volatile} has no effect on that.
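Another case where a value changes outside the normal flow of the program is a variable set by a signal handler. The following sketch is an illustration under that assumption, not something this section describes. It uses the standard @code{signal} and @code{raise} functions and the type @code{sig_atomic_t}; the names @code{interrupted}, @code{handle_interrupt} and @code{deliver_and_check} are invented.

```c
#include <signal.h>

/* Set asynchronously by the handler, so every read in the
   program must really fetch it from memory.  */
static volatile sig_atomic_t interrupted = 0;

static void
handle_interrupt (int signum)
{
  (void) signum;
  interrupted = 1;
}

/* Install the handler, send the signal to this process, and
   return the value of the flag afterward.  Return -1 if the
   handler could not be installed or the signal not sent.  */
int
deliver_and_check (void)
{
  if (signal (SIGINT, handle_interrupt) == SIG_ERR)
    return -1;
  if (raise (SIGINT) != 0)
    return -1;
  return interrupted;
}
```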
|
||
|
||
@node restrict Pointers
|
||
@section @code{restrict}-Qualified Pointers
|
||
@cindex @code{restrict} pointers
|
||
@cindex pointers, @code{restrict}-qualified
|
||
@findex restrict
|
||
|
||
You can declare a pointer as ``restricted'' using the @code{restrict}
|
||
type qualifier, like this:
|
||
|
||
@example
|
||
int *restrict p = x;
|
||
@end example
|
||
|
||
@noindent
|
||
This enables better optimization of code that uses the pointer.
|
||
|
||
If @code{p} is declared with @code{restrict}, and then the code
|
||
references the object that @code{p} points to (using @code{*p} or
|
||
@code{p[@var{i}]}), the @code{restrict} declaration promises that the
|
||
code will not access that object in any other way---only through
|
||
@code{p}.
|
||
|
||
For instance, it means the code must not use another pointer
|
||
to access the same space, as shown here:
|
||
|
||
@example
|
||
int *restrict p = @var{whatever};
|
||
int *q = p;
|
||
foo (*p, *q);
|
||
@end example
|
||
|
||
@noindent
|
||
That contradicts the @code{restrict} promise by accessing the object
|
||
that @code{p} points to using @code{q}, which bypasses @code{p}.
|
||
Likewise, it must not do this:
|
||
|
||
@example
|
||
int *restrict p = @var{whatever};
|
||
struct @{ int *a, *b; @} s;
|
||
s.a = p;
|
||
foo (*p, *s.a);
|
||
@end example
|
||
|
||
@noindent
|
||
This example uses a structure field instead of the variable @code{q}
|
||
to hold the other pointer, and that contradicts the promise just the
|
||
same.
|
||
|
||
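The keyword @code{restrict} also promises that the code won't access
that object directly by name.  If @code{p} points to the allocated
space of an automatic or static variable, the code must not refer to
that variable while @code{p} is in use.  So the code must not do
this: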
|
||
|
||
@example
|
||
int a;
|
||
int *restrict p = &a;
|
||
foo (*p, a);
|
||
@end example
|
||
|
||
@noindent
|
||
because that does direct access to the object (@code{a}) that @code{p}
|
||
points to, which bypasses @code{p}.
|
||
|
||
If the code makes such promises with @code{restrict} then breaks them,
|
||
execution is unpredictable.
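By contrast, here is a sketch of a use that keeps the promise. The function @code{add_arrays} is hypothetical; it assumes the caller passes three arrays that do not overlap.

```c
#include <stddef.h>

/* Store a[i] + b[i] in dst[i] for each i below n.  Each array is
   accessed only through its own parameter, so the restrict
   promises hold as long as the caller's arrays do not overlap.  */
void
add_arrays (int *restrict dst, const int *restrict a,
            const int *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = a[i] + b[i];
}
```

Keeping the promise is the caller's job: a call such as @code{add_arrays (x, x, y, n)} would break it, because @code{dst} and @code{a} would then refer to the same array.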
|
||
|
||
@node restrict Pointer Example
|
||
@section @code{restrict} Pointer Example
|
||
|
||
Here are examples where @code{restrict} enables real optimization.
|
||
|
||
In this example, @code{restrict} assures GCC that the array @code{out}
|
||
points to does not overlap with the array @code{in} points to.
|
||
|
||
@example
|
||
void
|
||
process_data (const char *in,
|
||
char * restrict out,
|
||
size_t size)
|
||
@{
|
||
  for (size_t i = 0; i < size; i++)
    out[i] = in[i] + in[i + 1];
|
||
@}
|
||
@end example
|
||
|
||
Here's a simple tree structure, where each tree node holds data of
|
||
type @code{PAYLOAD} plus two subtrees.
|
||
|
||
@example
|
||
struct foo
|
||
@{
|
||
PAYLOAD payload;
|
||
struct foo *left;
|
||
struct foo *right;
|
||
@};
|
||
@end example
|
||
|
||
Now here's a function to null out both pointers in the @code{left}
|
||
subtree.
|
||
|
||
@example
|
||
void
|
||
null_left (struct foo *a)
|
||
@{
|
||
a->left->left = NULL;
|
||
a->left->right = NULL;
|
||
@}
|
||
@end example
|
||
|
||
Since @code{*a} and @code{*a->left} have the same data type,
|
||
they could legitimately alias (@pxref{Aliasing}). Therefore,
|
||
the compiled code for @code{null_left} must read @code{a->left}
|
||
again from memory when executing the second assignment statement.
|
||
|
||
We can enable optimization, so that it does not need to read
|
||
@code{a->left} again, by writing @code{null_left} in a less
|
||
obvious way.
|
||
|
||
@example
|
||
void
|
||
null_left (struct foo *a)
|
||
@{
|
||
struct foo *b = a->left;
|
||
b->left = NULL;
|
||
b->right = NULL;
|
||
@}
|
||
@end example
|
||
|
||
A more elegant way to fix this is with @code{restrict}.
|
||
|
||
@example
|
||
void
|
||
null_left (struct foo *restrict a)
|
||
@{
|
||
a->left->left = NULL;
|
||
a->left->right = NULL;
|
||
@}
|
||
@end example
|
||
|
||
Declaring @code{a} as @code{restrict} asserts that other pointers such
|
||
as @code{a->left} will not point to the same memory space as @code{a}.
|
||
Therefore, the memory location @code{a->left->left} cannot be the same
|
||
memory as @code{a->left}. Knowing this, the compiled code may avoid
|
||
reloading @code{a->left} for the second statement.
|
||
|
||
@node Functions
|
||
@chapter Functions
|
||
@cindex functions
|
||
|
||
We have already presented many examples of functions, so if you've
|
||
read this far, you basically understand the concept of a function. It
|
||
is vital, nonetheless, to have a chapter in the manual that collects
|
||
all the information about functions.
|
||
|
||
@menu
|
||
* Function Definitions:: Writing the body of a function.
|
||
* Function Declarations:: Declaring the interface of a function.
|
||
* Function Calls:: Using functions.
|
||
* Function Call Semantics:: Call-by-value argument passing.
|
||
* Function Pointers:: Using references to functions.
|
||
* The main Function:: Where execution of a GNU C program begins.
|
||
* Advanced Definitions:: Advanced features of function definitions.
|
||
* Obsolete Definitions:: Obsolete features still used
|
||
in function definitions in old code.
|
||
@end menu
|
||
|
||
@node Function Definitions
|
||
@section Function Definitions
|
||
@cindex function definitions
|
||
@cindex defining functions
|
||
|
||
We have already presented many examples of function definitions. To
|
||
summarize the rules, a function definition looks like this:
|
||
|
||
@example
|
||
@var{returntype}
|
||
@var{functionname} (@var{parm_declarations}@r{@dots{}})
|
||
@{
|
||
@var{body}
|
||
@}
|
||
@end example
|
||
|
||
The part before the open-brace is called the @dfn{function header}.
|
||
|
||
Write @code{void} as the @var{returntype} if the function does
|
||
not return a value.
|
||
|
||
@menu
|
||
* Function Parameter Variables:: Syntax and semantics
|
||
of function parameters.
|
||
* Forward Function Declarations:: Functions can only be called after
|
||
they have been defined or declared.
|
||
* Static Functions:: Limiting visibility of a function.
|
||
* Arrays as Parameters:: Functions that accept array arguments.
|
||
* Structs as Parameters:: Functions that accept structure arguments.
|
||
@end menu
|
||
|
||
@node Function Parameter Variables
|
||
@subsection Function Parameter Variables
|
||
@cindex function parameter variables
|
||
@cindex parameter variables in functions
|
||
@cindex parameter list
|
||
|
||
A function parameter variable is a local variable (@pxref{Local
|
||
Variables}) used within the function to store the value passed as an
|
||
argument in a call to the function. Usually we say ``function
|
||
parameter'' or ``parameter'' for short, not mentioning the fact that
|
||
it's a variable.
|
||
|
||
We declare these variables in the beginning of the function
|
||
definition, in the @dfn{parameter list}. For example,
|
||
|
||
@example
|
||
fib (int n)
|
||
@end example
|
||
|
||
@noindent
|
||
has a parameter list with one function parameter @code{n}, which has
|
||
type @code{int}.
|
||
|
||
Function parameter declarations differ from ordinary variable
|
||
declarations in several ways:
|
||
|
||
@itemize @bullet
|
||
@item
|
||
Inside the function definition header, commas separate parameter
|
||
declarations, and each parameter needs a complete declaration
|
||
including the type. For instance, if a function @code{foo} has two
|
||
@code{int} parameters, write this:
|
||
|
||
@example
|
||
foo (int a, int b)
|
||
@end example
|
||
|
||
You can't share the common @code{int} between the two declarations:
|
||
|
||
@example
|
||
foo (int a, b) /* @r{Invalid!} */
|
||
@end example
|
||
|
||
@item
|
||
A function parameter variable is initialized to whatever value is
|
||
passed in the function call, so its declaration cannot specify an
|
||
initial value.
|
||
|
||
@item
|
||
Writing an array type in a function parameter declaration has the
|
||
effect of declaring it as a pointer. The size specified for the array
|
||
has no effect at all, and we normally omit the size. Thus,
|
||
|
||
@example
|
||
foo (int a[5])
|
||
foo (int a[])
|
||
foo (int *a)
|
||
@end example
|
||
|
||
@noindent
|
||
are equivalent.
|
||
|
||
@item
|
||
The scope of the parameter variables is the entire function body,
|
||
notwithstanding the fact that they are written in the function header,
|
||
which is just outside the function body.
|
||
@end itemize
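
As a minimal sketch of the point that a parameter is an ordinary local
variable, the function below uses its parameter @code{n} as a loop
counter. The name @code{sum_to} and its body are assumptions made
up for this illustration; they do not appear elsewhere in this manual.

```c
/* Return 1 + 2 + ... + n, or 0 if n is not positive.
   (sum_to is a hypothetical name used only for illustration.)  */
int
sum_to (int n)
{
  int total = 0;

  /* n is a local variable of sum_to, so decrementing it here
     changes only this function's copy of the argument.  */
  while (n > 0)
    total += n--;

  return total;
}
```

If a caller writes @code{sum_to (k)}, the variable @code{k} keeps its
value after the call, even though @code{n} has counted down to zero
inside the function.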
|
||
|
||
If a function has no parameters, it would be most natural for the
|
||
list of parameters in its definition to be empty. But that, in C, has
|
||
a special meaning for historical reasons: ``Do not check that calls to
|
||
this function have the right number of arguments.'' Thus,
|
||
|
||
@example
|
||
int
|
||
foo ()
|
||
@{
|
||
return 5;
|
||
@}
|
||
|
||
int
|
||
bar (int x)
|
||
@{
|
||
return foo (x);
|
||
@}
|
||
@end example
|
||
|
||
@noindent
|
||
would not report a compilation error in passing @code{x} as an
|
||
argument to @code{foo}. By contrast,
|
||
|
||
@example
|
||
int
|
||
foo (void)
|
||
@{
|
||
return 5;
|
||
@}
|
||
|
||
int
|
||
bar (int x)
|
||
@{
|
||
return foo (x);
|
||
@}
|
||
@end example
|
||
|
||
@noindent
|
||
would report an error because @code{foo} is supposed to receive
|
||
no arguments.
|
||
|
||
@node Forward Function Declarations
|
||
@subsection Forward Function Declarations
|
||
@cindex forward function declarations
|
||
@cindex function declarations, forward
|
||
|
||
The order of the function definitions in the source code makes no
|
||
difference, except that each function needs to be defined or declared
|
||
before code uses it.
|
||
|
||
The definition of a function also declares its name for the rest of
|
||
the containing scope. But what if you want to call the function
|
||
before its definition? To permit that, write a compatible declaration
|
||
of the same function, before the first call. A declaration that
|
||
prefigures a subsequent definition in this way is called a
|
||
@dfn{forward declaration}. The function declaration can be at top
|
||
@c ??? file scope
|
||
level or within a block, and it applies until the end of the containing
|
||
scope.
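
One case where a forward declaration is hard to avoid is a pair of
functions that call each other. The following sketch uses two
hypothetical functions, @code{is_even} and @code{is_odd}, invented for
this illustration; whichever is defined first has to call the other
before that one's definition.

```c
#include <stdbool.h>   /* Defines bool, true and false.  */

/* Forward declaration: is_odd is called below before it is defined.  */
bool is_odd (unsigned int n);

bool
is_even (unsigned int n)
{
  return n == 0 ? true : is_odd (n - 1);
}

/* The definition must be compatible with the forward declaration.  */
bool
is_odd (unsigned int n)
{
  return n == 0 ? false : is_even (n - 1);
}
```

Without the forward declaration, the call to @code{is_odd} inside
@code{is_even} would refer to a function that has not yet been declared.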
|
||
|
||
@xref{Function Declarations}, for more information about these
|
||
declarations.
|
||
|
||
@node Static Functions
|
||
@subsection Static Functions
|
||
@cindex static functions
|
||
@cindex functions, static
|
||
@findex static
|
||
|
||
The keyword @code{static} in a function definition limits the
|
||
visibility of the name to the current compilation module. (That's the
|
||
same thing @code{static} does in variable declarations;
|
||
@pxref{File-Scope Variables}.) For instance, if one compilation module
|
||
contains this code:
|
||
|
||
@example
|
||
static int
|
||
foo (void)
|
||
@{
|
||
@r{@dots{}}
|
||
@}
|
||
@end example
|
||
|
||
@noindent
|
||
then the code of that compilation module can call @code{foo} anywhere
|
||
after the definition, but other compilation modules cannot refer to it
|
||
at all.
|
||
|
||
@cindex forward declaration
|
||
@cindex static function, declaration
|
||
To call @code{foo} before its definition, it needs a forward
|
||
declaration, which should use @code{static} since the function
|
||
definition does. For this function, it looks like this:
|
||
|
||
@example
|
||
static int foo (void);
|
||
@end example
|
||
|
||
It is generally wise to use @code{static} on the definitions of
|
||
functions that won't be called from outside the same compilation
|
||
module. This makes sure that calls are not added in other modules.
|
||
If programmers decide to change the function's calling convention, or
|
||
need to understand all the consequences of its use, they will only have to
|
||
check for calls in the same compilation module.
|
||
|
||
@node Arrays as Parameters
|
||
@subsection Arrays as Parameters
|
||
@cindex arrays as parameters
|
||
@cindex functions with array parameters
|
||
|
||
Arrays in C are not first-class objects: it is impossible to copy
|
||
them. So they cannot be passed as arguments like other values.
|
||
@xref{Limitations of C Arrays}. Rather, array parameters work in
|
||
a special way.
|
||
|
||
@menu
|
||
* Array Params are Ptrs::
|
||
* Passing Array Args::
|
||
* Array Parm Qualifiers::
|
||
@end menu
|
||
|
||
@node Array Params are Ptrs
|
||
@subsubsection Array parameters are pointers
|
||
|
||
Declaring a function parameter variable as an array really gives it a
|
||
pointer type. C does this because an expression with array type, if
|
||
used as an argument in a function call, is converted automatically to
|
||
a pointer (to the zeroth element of the array). If you declare the
|
||
corresponding parameter as an ``array'', it will work correctly with
|
||
the pointer value that really gets passed.
|
||
|
||
This relates to the fact that C does not check array bounds in access
|
||
to elements of the array (@pxref{Accessing Array Elements}).
|
||
|
||
For example, in this function,
|
||
|
||
@example
|
||
void
|
||
clobber4 (int array[20])
|
||
@{
|
||
array[4] = 0;
|
||
@}
|
||
@end example
|
||
|
||
@noindent
|
||
the parameter @code{array}'s real type is @code{int *}; the specified
|
||
length, 20, has no effect on the program. You can leave out the length
|
||
and write this:
|
||
|
||
@example
|
||
void
|
||
clobber4 (int array[])
|
||
@{
|
||
array[4] = 0;
|
||
@}
|
||
@end example
|
||
|
||
@noindent
|
||
or write the parameter declaration explicitly as a pointer:
|
||
|
||
@example
|
||
void
|
||
clobber4 (int *array)
|
||
@{
|
||
array[4] = 0;
|
||
@}
|
||
@end example
|
||
|
||
They are all equivalent.
|
||
|
||
@node Passing Array Args
|
||
@subsubsection Passing array arguments
|
||
|
||
The function call passes this pointer by
|
||
value, like all argument values in C@. However, the result is
|
||
paradoxical in that the array itself is passed by reference: its
|
||
contents are treated as shared memory---shared between the caller and
|
||
the called function, that is. When @code{clobber4} assigns to element
|
||
4 of @code{array}, the effect is to alter element 4 of the array
|
||
specified in the call.
|
||
|
||
@example
|
||
#include <stdio.h> /* @r{Declares @code{printf}.} */
|
||
#include <stdlib.h> /* @r{Declares @code{malloc},} */
|
||
/* @r{Defines @code{EXIT_SUCCESS}.} */
|
||
|
||
int
|
||
main (void)
|
||
@{
|
||
int data[] = @{1, 2, 3, 4, 5, 6@};
|
||
int i;
|
||
|
||
/* @r{Show the initial values of all the elements.} */
|
||
for (i = 0; i < 6; i++)
|
||
printf ("data[%d] = %d\n", i, data[i]);
|
||
|
||
printf ("\n");
|
||
|
||
clobber4 (data);
|
||
|
||
/* @r{Show that element 4 has been changed.} */
|
||
for (i = 0; i < 6; i++)
|
||
printf ("data[%d] = %d\n", i, data[i]);
|
||
|
||
printf ("\n");
|
||
|
||
return EXIT_SUCCESS;
|
||
@}
|
||
@end example
|
||
|
||
@noindent
|
||
shows that @code{data[4]} has become zero after the call to
|
||
@code{clobber4}.
|
||
|
||
The array @code{data} has 6 elements, but passing it to a function
|
||
whose parameter type is written as @code{int [20]} is not an error,
|
||
because that really stands for @code{int *}. The pointer that is the
|
||
real argument carries no indication of the length of the array it
|
||
points into. It is not required to point to the beginning of the
|
||
array, either. For instance,
|
||
|
||
@example
|
||
clobber4 (data+1);
|
||
@end example
|
||
|
||
@noindent
|
||
passes an ``array'' that starts at element 1 of @code{data}, and the
|
||
effect is to zero @code{data[5]} instead of @code{data[4]}.
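
Because the pointer carries no length, a common convention is to pass
the number of elements as a separate argument. The sketch below
assumes a hypothetical function @code{sum_ints}, which is not part of
the examples above.

```c
#include <stddef.h>   /* Defines size_t.  */

/* Add up the COUNT elements starting at VALUES.  The caller must
   supply COUNT, since the pointer alone does not record it.  */
int
sum_ints (const int *values, size_t count)
{
  int total = 0;

  for (size_t i = 0; i < count; i++)
    total += values[i];

  return total;
}
```

With the array @code{data} from the example above, @code{sum_ints
(data, 6)} adds all six elements, while @code{sum_ints (data+1, 5)}
skips element 0 and adds the remaining five.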
|
||
|
||
If all calls to the function will provide an array of at least a particular
|
||
size, you can specify the size of the array to be @code{static}:
|
||
|
||
@example
|
||
void
|
||
clobber4 (int array[static 20])
|
||
@r{@dots{}}
|
||
@end example
|
||
|
||
@noindent
|
||
This is a promise to the compiler that the function will always be
|
||
called with an array of at least 20 elements, so that the compiler can optimize
|
||
code accordingly. If the code breaks this promise and calls the
|
||
function with, for example, a shorter array, unpredictable things may
|
||
happen.
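
Here is a small sketch of a complete function that uses this notation.
The function @code{sum_first4} is a hypothetical example; its
parameter declaration promises that at least four elements are
accessible through the pointer passed.

```c
/* Add the first four elements of ARRAY.  The `static 4' promises
   that every caller passes a pointer to at least four ints.
   (sum_first4 is a hypothetical name used only for illustration.)  */
int
sum_first4 (const int array[static 4])
{
  return array[0] + array[1] + array[2] + array[3];
}
```

Passing a longer array, or a pointer into the middle of one that still
has four elements after it, keeps the promise. Passing an array with
only three elements would break it.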
|
||
|
||
@node Array Parm Qualifiers
|
||
@subsubsection Type qualifiers on array parameters
|
||
|
||
You can use the type qualifiers @code{const}, @code{restrict}, and
|
||
@code{volatile} with array parameters; for example:
|
||
|
||
@example
|
||
void
|
||
clobber4 (volatile int array[20])
|
||
@r{@dots{}}
|
||
@end example
|
||
|
||
@noindent
|
||
denotes that @code{array} is equivalent to a pointer to a volatile
|
||
@code{int}. Alternatively:
|
||
|
||
@example
|
||
void
|
||
clobber4 (int array[const 20])
|
||
@r{@dots{}}
|
||
@end example
|
||
|
||
@noindent
|
||
makes the array parameter equivalent to a constant pointer to an
|
||
@code{int}. If we want the @code{clobber4} function to succeed, it
|
||
would not make sense to write
|
||
|
||
@example
|
||
void
|
||
clobber4 (const int array[20])
|
||
@r{@dots{}}
|
||
@end example
|
||
|
||
@noindent
|
||
as this would tell the compiler that the parameter should point to an
|
||
array of constant @code{int} values, and then we would not be able to
|
||
store zeros in them.
|
||
|
||
In a function with multiple array parameters, you can use @code{restrict}
|
||
to tell the compiler that each array parameter passed in will be distinct:
|
||
|
||
@example
|
||
void
|
||
foo (int array1[restrict 10], int array2[restrict 10])
|
||
@r{@dots{}}
|
||
@end example
|
||
|
||
@noindent
|
||
Using @code{restrict} promises the compiler that callers will
|
||
not pass in the same array for more than one @code{restrict} array
|
||
parameter. Knowing this enables the compiler to perform better code
|
||
optimization. This is the same effect as using @code{restrict}
|
||
pointers (@pxref{restrict Pointers}), but makes it clear when reading
|
||
the code that an array of a specific size is expected.
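
As an illustrative sketch, the hypothetical function @code{add_into}
below takes two @code{restrict} array parameters. The name and body
are assumptions made for this example.

```c
/* Add each of the 10 elements of SRC into the matching element of
   DST.  The restrict qualifiers promise that the two arrays do not
   overlap, so a store into dst[i] cannot change any element of SRC.
   (add_into is a hypothetical name used only for illustration.)  */
void
add_into (int dst[restrict 10], const int src[restrict 10])
{
  for (int i = 0; i < 10; i++)
    dst[i] += src[i];
}
```

A call such as @code{add_into (a, b)} with two separate arrays keeps
the promise. A call such as @code{add_into (a, a)} would break it.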
|
||
|
||
@node Structs as Parameters
|
||
@subsection Functions That Accept Structure Arguments
|
||
|
||
Structures in GNU C are first-class objects, so using them as function
|
||
parameters and arguments works in the natural way. This function
|
||
@code{swapfoo} takes a @code{struct foo} with two fields as argument,
|
||
and returns a structure of the same type but with the fields
|
||
exchanged.
|
||
|
||
@example
|
||
struct foo @{ int a, b; @};
|
||
|
||
struct foo x;
|
||
|
||
struct foo
|
||
swapfoo (struct foo inval)
|
||
@{
|
||
struct foo outval;
|
||
outval.a = inval.b;
|
||
outval.b = inval.a;
|
||
return outval;
|
||
@}
|
||
@end example
|
||
|
||
This simpler definition of @code{swapfoo} avoids using a local
|
||
variable to hold the result about to be returned, by using a structure
|
||
constructor (@pxref{Structure Constructors}), like this:
|
||
|
||
@example
|
||
struct foo
|
||
swapfoo (struct foo inval)
|
||
@{
|
||
return (struct foo) @{ inval.b, inval.a @};
|
||
@}
|
||
@end example
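
Since a structure argument is copied like any other value, the called
function can modify its parameter without affecting the caller's
structure. The following sketch assumes a hypothetical type
@code{struct point} and function @code{translate}, invented for this
illustration.

```c
struct point { int x, y; };

/* Return a copy of P moved by (DX, DY).  P is passed by value, so
   the caller's structure is left unchanged.
   (struct point and translate are hypothetical names.)  */
struct point
translate (struct point p, int dx, int dy)
{
  p.x += dx;
  p.y += dy;
  return p;
}
```

After @code{b = translate (a, 10, 20);}, the fields of @code{a} still
hold their old values, and only @code{b} holds the moved point.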
|
||
|
||
It is valid to define a structure type in a function's parameter list,
|
||
as in
|
||
|
||
@example
|
||
int
|
||
frob_bar (struct bar @{ int a, b; @} inval)
|
||
@{
|
||
@var{body}
|
||
@}
|
||
@end example
|
||
|
||
@noindent
|
||
and @var{body} can access the fields of @code{inval} since the
|
||
structure type @code{struct bar} is defined for the whole function
|
||
body. However, there is no way to create a @code{struct bar} argument
|
||
to pass to @code{frob_bar}, except with kludges. As a result,
|
||
defining a structure type in a parameter list is useless in practice.
|
||
|
||
@node Function Declarations
|
||
@section Function Declarations
|
||
@cindex function declarations
|
||
@cindex declaring functions
|
||
|
||
To call a function, or use its name as a pointer, a @dfn{function
|
||
declaration} for the function name must be in effect at that point in
|
||
the code. The function's definition serves as a declaration of that
|
||
function for the rest of the containing scope, but to use the function
|
||
in code before the definition, or from another compilation module, a
|
||
separate function declaration must precede the use.
|
||
|
||
A function declaration looks like the start of a function definition.
|
||
It begins with the return value type (@code{void} if none) and the
|
||
function name, followed by argument declarations in parentheses
|
||
(though these can sometimes be omitted). But that's as far as the
|
||
similarity goes: instead of the function body, the declaration uses a
|
||
semicolon.
|
||
|
||
@cindex function prototype
|
||
@cindex prototype of a function
|
||
A declaration that specifies argument types is called a @dfn{function
|
||
prototype}. You can include the argument names or omit them. The
|
||
names, if included in the declaration, have no effect, but they may
|
||
serve as documentation.
|
||
|
||
This form of prototype specifies fixed argument types:
|
||
|
||
@example
|
||
@var{rettype} @var{function} (@var{argtypes}@r{@dots{}});
|
||
@end example
|
||
|
||
@noindent
|
||
This form says the function takes no arguments:
|
||
|
||
@example
|
||
@var{rettype} @var{function} (void);
|
||
@end example
|
||
|
||
@noindent
|
||
This form declares types for some arguments, and allows additional
|
||
arguments whose types are not specified:
|
||
|
||
@example
|
||
@var{rettype} @var{function} (@var{argtypes}@r{@dots{}}, ...);
|
||
@end example
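
To make the last form concrete, here is a sketch of a function whose
prototype ends in @samp{...}. The function @code{sum_args} is a
hypothetical example. It relies on the standard macros from
@file{stdarg.h}, and on the assumption that every additional argument
is an @code{int}, which the compiler cannot check.

```c
#include <stdarg.h>   /* Defines va_list, va_start, va_arg, va_end.  */

/* Add up COUNT additional arguments, each of which must be an int.
   (sum_args is a hypothetical name used only for illustration.)  */
int
sum_args (int count, ...)
{
  va_list ap;
  int total = 0;

  va_start (ap, count);
  for (int i = 0; i < count; i++)
    total += va_arg (ap, int);
  va_end (ap);

  return total;
}
```

A call such as @code{sum_args (3, 1, 2, 3)} passes one argument with a
declared type followed by three whose types are unspecified.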
|
||
|
||
For a parameter that's an array of variable length, you can write
|
||
its declaration with @samp{*} where the ``length'' of the array would
|
||
normally go; for example, these are all equivalent.
|
||
|
||
@example
|
||
double maximum (int n, int m, double a[n][m]);
|
||
double maximum (int n, int m, double a[*][*]);
|
||
double maximum (int n, int m, double a[ ][*]);
|
||
double maximum (int n, int m, double a[ ][m]);
|
||
@end example
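
The manual shows only declarations of @code{maximum}. A matching
definition might look like the sketch below, whose body is an
assumption made for illustration. The definition names the sizes as
@code{n} and @code{m}, since @samp{*} is meant for declarations that
are not definitions.

```c
/* Return the largest element of the N-by-M array A.
   Assumes N and M are both at least 1.  */
double
maximum (int n, int m, double a[n][m])
{
  double largest = a[0][0];

  for (int i = 0; i < n; i++)
    for (int j = 0; j < m; j++)
      if (a[i][j] > largest)
        largest = a[i][j];

  return largest;
}
```

The parameters @code{n} and @code{m} must come before @code{a} in the
parameter list so that they are in scope when the array's type is
written.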
|
||
|
||
@noindent
|
||
The old-fashioned form of declaration, which is not a prototype, says
|
||
nothing about the types of arguments or how many they should be:
|
||
|
||
@example
|
||
@var{rettype} @var{function} ();
|
||
@end example
|
||
|
||
@strong{Warning:} Arguments passed to a function declared without a
|
||
prototype are converted with the default argument promotions
|
||
(@pxref{Argument Promotions}). Likewise for additional arguments whose
|
||
types are unspecified.
|
||
|
||
Function declarations are usually written at the top level in a source file,
|
||
but you can also put them inside code blocks. Then the function name
|
||
is visible for the rest of the containing scope. For example:
|
||
|
||
@example
|
||
void
|
||
foo (char *file_name)
|
||
@{
|
||
void save_file (char *);
|
||
save_file (file_name);
|
||
@}
|
||
@end example
|
||
|
||
If another part of the code tries to call the function
|
||
@code{save_file}, this declaration won't be in effect there. So the
|
||
function will get an implicit declaration of the form @code{extern int
|
||
save_file ();}. That conflicts with the explicit declaration
|
||
here, and the discrepancy generates a warning.
|
||
|
||
The syntax of C traditionally allows omitting the data type in a
|
||
function declaration if it specifies a storage class or a qualifier.
|
||
Then the type defaults to @code{int}. For example:
|
||
|
||
@example
|
||
static foo (double x);
|
||
@end example
|
||
|
||
@noindent
|
||
defaults the return type to @code{int}.
|
||
This is bad practice; if you see it, fix it.
|
||
|
||
Calling a function that is undeclared has the effect of creating
|
||
an @dfn{implicit} declaration in the innermost containing scope,
|
||
equivalent to this:
|
||
|
||
@example
|
||
extern int @var{function} ();
|
||
@end example
|
||
|
||
@noindent
|
||
This declaration says that the function returns @code{int} but leaves
|
||
its argument types unspecified. If that does not accurately fit the
|
||
function, then the program @strong{needs} an explicit declaration of
|
||
the function with argument types in order to call it correctly.
|
||
|
||
Implicit declarations are deprecated, and a function call that creates one
|
||
causes a warning.
|
||
|
||
@node Function Calls
|
||
@section Function Calls
|
||
@cindex function calls
|
||
@cindex calling functions
|
||
|
||
Starting a program automatically calls the function named @code{main}
|
||
(@pxref{The main Function}). Aside from that, a function does nothing
|
||
except when it is @dfn{called}. That occurs during the execution of a
|
||
function-call expression specifying that function.
|
||
|
||
A function-call expression looks like this:
|
||
|
||
@example
|
||
@var{function} (@var{arguments}@r{@dots{}})
|
||
@end example
|
||
|
||
Most of the time, @var{function} is a function name. However, it can
|
||
also be an expression with a function pointer value; that way, the
|
||
program can determine at run time which function to call.
|
||
|
||
The @var{arguments} are a series of expressions separated by commas.
|
||
Each expression specifies one argument to pass to the function.
|
||
|
||
The list of arguments in a function call looks just like use of the
|
||
comma operator (@pxref{Comma Operator}), but the fact that it fills
|
||
the parentheses of a function call gives it a different meaning.
|
||
|
||
Here's an example of a function call, taken from an example near the
|
||
beginning (@pxref{Complete Program}).
|
||
|
||
@example
|
||
printf ("Fibonacci series item %d is %d\n",
|
||
19, fib (19));
|
||
@end example
|
||
|
||
The three arguments given to @code{printf} are a constant string, the
|
||
integer 19, and the integer returned by @code{fib (19)}.
|
||
|
||
@node Function Call Semantics
|
||
@section Function Call Semantics
|
||
@cindex function call semantics
|
||
@cindex semantics of function calls
|
||
@cindex call-by-value
|
||
|
||
The meaning of a function call is to compute the specified argument
|
||
expressions, convert their values according to the function's
|
||
declaration, then run the function giving it copies of the converted
|
||
values. (This method of argument passing is known as
|
||
@dfn{call-by-value}.) When the function finishes, the value it
|
||
returns becomes the value of the function-call expression.
|
||
|
||
Call-by-value implies that an assignment to the function argument
|
||
variable has no direct effect on the caller. For instance,
|
||
|
||
@example
|
||
#include <stdlib.h> /* @r{Defines @code{EXIT_SUCCESS}.} */
|
||
#include <stdio.h> /* @r{Declares @code{printf}.} */
|
||
|
||
void
|
||
subroutine (int x)
|
||
@{
|
||
x = 5;
|
||
@}
|
||
|
||
int
|
||
main (void)
|
||
@{
|
||
int y = 20;
|
||
subroutine (y);
|
||
printf ("y is %d\n", y);
|
||
return EXIT_SUCCESS;
|
||
@}
|
||
@end example
|
||
|
||
@noindent
|
||
prints @samp{y is 20}. Calling @code{subroutine} initializes @code{x}
|
||
from the value of @code{y}, but this does not establish any other
|
||
relationship between the two variables. Thus, the assignment to
|
||
@code{x}, inside @code{subroutine}, changes only @emph{that} @code{x}.
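
When a function does need to change a variable that belongs to its
caller, the usual technique is to pass a pointer to that variable. The
pointer itself is still passed by value. The sketch below contrasts
the two cases; the names @code{set_copy} and
@code{set_through_pointer} are hypothetical.

```c
/* Assigning to the parameter changes only the local copy.  */
void
set_copy (int x)
{
  x = 5;
}

/* Assigning through the pointer changes the caller's variable,
   because the copied pointer value still points at it.  */
void
set_through_pointer (int *p)
{
  *p = 5;
}
```

If @code{y} is 20, then @code{set_copy (y)} leaves it at 20, whereas
@code{set_through_pointer (&y)} changes it to 5.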
|
||
|
||
If an argument's type is specified by the function's declaration, the
|
||
function call converts the argument expression to that type if
|
||
possible. If the conversion is impossible, that is an error.
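
As a brief sketch of such a conversion, the hypothetical function
@code{half} below declares its parameter as @code{double}, and a
second hypothetical function calls it with an integer constant.

```c
/* The prototype says X is a double.  */
double
half (double x)
{
  return x / 2;
}

/* The int argument 5 is converted to 5.0 before the call, so the
   division is done in floating point.  */
double
half_of_five (void)
{
  return half (5);
}
```

The value returned by @code{half_of_five} is 2.5 rather than 2, since
the conversion to @code{double} happens before @code{half} runs.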
|
||
|
||
If the function's declaration doesn't specify the type of that
|
||
argument, then the @emph{default argument promotions} apply.
|
||
@xref{Argument Promotions}.
|
||
|
||
@node Function Pointers
|
||
@section Function Pointers
|
||
@cindex function pointers
|
||
@cindex pointers to functions
|
||
|
||
A function name refers to a fixed function. Sometimes it is useful to
|
||
call a function to be determined at run time; to do this, you can use
|
||
a @dfn{function pointer value} that points to the chosen function
|
||
(@pxref{Pointers}).
|
||
|
||
Pointer-to-function types can be used to declare variables and other
|
||
data, including array elements, structure fields, and union
|
||
alternatives. They can also be used for function arguments and return
|
||
values. These types have the peculiarity that they are never
|
||
converted automatically to @code{void *} or vice versa. However, you
|
||
can do that conversion with a cast.
|
||
|
||
@menu
|
||
* Declaring Function Pointers:: How to declare a pointer to a function.
|
||
* Assigning Function Pointers:: How to assign values to function pointers.
|
||
* Calling Function Pointers:: How to call functions through pointers.
|
||
@end menu
|
||
|
||
@node Declaring Function Pointers
|
||
@subsection Declaring Function Pointers
|
||
@cindex declaring function pointers
|
||
@cindex function pointers, declaring
|
||
|
||
The declaration of a function pointer variable (or structure field)
|
||
looks almost like a function declaration, except it has an additional
|
||
@samp{*} just before the variable name. Proper nesting requires a
|
||
pair of parentheses around the two of them. For instance, @code{int
|
||
(*a) ();} says, ``Declare @code{a} as a pointer such that @code{*a} is
|
||
an @code{int}-returning function.''
|
||
|
||
Contrast these three declarations:
|
||
|
||
@example
|
||
/* @r{Declare a function returning @code{char *}.} */
|
||
char *a (char *);
|
||
/* @r{Declare a pointer to a function returning @code{char}.} */
|
||
char (*a) (char *);
|
||
/* @r{Declare a pointer to a function returning @code{char *}.} */
|
||
char *(*a) (char *);
|
||
@end example
|
||
|
||
The possible argument types of the function pointed to are the same
|
||
as in a function declaration. You can write a prototype
|
||
that specifies all the argument types:
|
||
|
||
@example
|
||
@var{rettype} (*@var{function}) (@var{arguments}@r{@dots{}});
|
||
@end example
|
||
|
||
@noindent
|
||
or one that specifies some and leaves the rest unspecified:
|
||
|
||
@example
|
||
@var{rettype} (*@var{function}) (@var{arguments}@r{@dots{}}, ...);
|
||
@end example
|
||
|
||
@noindent
|
||
or one that says there are no arguments:
|
||
|
||
@example
|
||
@var{rettype} (*@var{function}) (void);
|
||
@end example
|
||
|
||
You can also write a non-prototype declaration that says
|
||
nothing about the argument types:
|
||
|
||
@example
|
||
@var{rettype} (*@var{function}) ();
|
||
@end example
|
||
|
||
For example, here's a declaration for a variable that should
|
||
point to some arithmetic function that operates on two @code{double}s:
|
||
|
||
@example
|
||
double (*binary_op) (double, double);
|
||
@end example
|
||
|
||
Structure fields, union alternatives, and array elements can be
|
||
function pointers; so can parameter variables. The function pointer
|
||
declaration construct can also be combined with other operators
|
||
allowed in declarations. For instance,
|
||
|
||
@example
|
||
int **(*foo)();
|
||
@end example
|
||
|
||
@noindent
|
||
declares @code{foo} as a pointer to a function that returns
|
||
type @code{int **}, and
|
||
|
||
@example
|
||
int **(*foo[30])();
|
||
@end example
|
||
|
||
@noindent
|
||
declares @code{foo} as an array of 30 pointers to functions that
|
||
return type @code{int **}.
|
||
|
||
@example
|
||
int **(**foo)();
|
||
@end example
|
||
|
||
@noindent
|
||
declares @code{foo} as a pointer to a pointer to a function that
|
||
returns type @code{int **}.
|
||
|
||
@node Assigning Function Pointers
|
||
@subsection Assigning Function Pointers
|
||
@cindex assigning function pointers
|
||
@cindex function pointers, assigning
|
||
|
||
Assuming we have declared the variable @code{binary_op} as in the
|
||
previous section, giving it a value requires a suitable function to
|
||
use. So let's define a function suitable for the variable to point
|
||
to. Here's one:
|
||
|
||
@example
|
||
double
|
||
double_add (double a, double b)
|
||
@{
|
||
return a+b;
|
||
@}
|
||
@end example
|
||
|
||
Now we can give it a value:
|
||
|
||
@example
|
||
binary_op = double_add;
|
||
@end example
|
||
|
||
The target type of the function pointer must be upward compatible with
|
||
the type of the function (@pxref{Compatible Types}).
|
||
|
||
There is no need for @samp{&} in front of @code{double_add}.
|
||
Using a function name such as @code{double_add} as an expression
|
||
automatically converts it to the function's address, with the
|
||
appropriate function pointer type. However, it is ok to use
|
||
@samp{&} if you feel that is clearer:
|
||
|
||
@example
|
||
binary_op = &double_add;
|
||
@end example
|
||
|
||
@node Calling Function Pointers
|
||
@subsection Calling Function Pointers
|
||
@cindex calling function pointers
|
||
@cindex function pointers, calling
|
||
|
||
To call the function specified by a function pointer, just write the
|
||
function pointer value in a function call. For instance, here's a
|
||
call to the function that @code{binary_op} points to:
|
||
|
||
@example
|
||
binary_op (x, 5)
|
||
@end example
|
||
|
||
Since the data type of @code{binary_op} explicitly specifies type
|
||
@code{double} for the arguments, the call converts @code{x} and 5 to
|
||
@code{double}.
|
||
|
||
The call conceptually dereferences the pointer @code{binary_op} to
|
||
``get'' the function it points to, and calls that function. If you
|
||
wish, you can explicitly represent the dereference by writing the
|
||
@code{*} operator:
|
||
|
||
@example
|
||
(*binary_op) (x, 5)
|
||
@end example
|
||
|
||
The @samp{*} reminds people reading the code that @code{binary_op} is
|
||
a function pointer rather than the name of a specific function.
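
The pieces from this section and the preceding ones can be combined
into one sketch. The function @code{double_add} is the one defined
earlier; @code{double_mul}, @code{apply} and @code{combine} are
hypothetical names added for this illustration.

```c
double
double_add (double a, double b)
{
  return a + b;
}

/* double_mul, apply and combine are hypothetical names.  */
double
double_mul (double a, double b)
{
  return a * b;
}

/* Call whichever function OP points to, passing X and Y.  */
double
apply (double (*op) (double, double), double x, double y)
{
  return op (x, y);
}

/* Choose the operation at run time, then call it via apply.  */
double
combine (int multiply, double x, double y)
{
  double (*binary_op) (double, double);

  binary_op = multiply ? double_mul : double_add;
  return apply (binary_op, x, y);
}
```

Here the choice between addition and multiplication is made when
@code{combine} runs, not when the program is compiled.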
|
||
|
||
@node The main Function
|
||
@section The @code{main} Function
|
||
@cindex @code{main} function
|
||
@findex main
|
||
|
||
Every complete executable program requires at least one function,
|
||
called @code{main}, which is where execution begins. You do not have
|
||
to explicitly declare @code{main}, though GNU C permits you to do so.
|
||
Conventionally, @code{main} should be defined to follow one of these
|
||
calling conventions:
|
||
|
||
@example
|
||
int main (void) @{@r{@dots{}}@}
|
||
int main (int argc, char *argv[]) @{@r{@dots{}}@}
|
||
int main (int argc, char *argv[], char *envp[]) @{@r{@dots{}}@}
|
||
@end example
|
||
|
||
@noindent
|
||
Using @code{void} as the parameter list means that @code{main} does
|
||
not use the arguments. You can write @code{char **argv} instead of
|
||
@code{char *argv[]}, and likewise for @code{envp}, as the two
|
||
constructs are equivalent.
|
||
|
||
@ignore @c Not so at present
|
||
Defining @code{main} in any other way generates a warning. Your
|
||
program will still compile, but you may get unexpected results when
|
||
executing it.
|
||
@end ignore
|
||
|
||
You can call @code{main} from C code, as you can call any other
|
||
function, though that is an unusual thing to do. When you do that,
|
||
you must write the call to pass arguments that match the parameters in
|
||
the definition of @code{main}.
|
||
|
||
The @code{main} function is not actually the first code that runs when
|
||
a program starts. In fact, the first code that runs is system code
|
||
from the file @file{crt0.o}. In Unix, this was hand-written assembler
|
||
code, but in GNU we replaced it with C code. Its job is to find
|
||
the arguments for @code{main} and call that.
|
||
|
||
@menu
|
||
* Values from main:: Returning values from the main function.
|
||
* Command-Line Parameters:: Accessing command-line parameters
|
||
provided to the program.
|
||
* Environment Variables:: Accessing system environment variables.
|
||
@end menu
|
||
|
||
@node Values from main
|
||
@subsection Returning Values from @code{main}
|
||
@cindex returning values from @code{main}
|
||
@cindex success
|
||
@cindex failure
|
||
@cindex exit status
|
||
|
||
When @code{main} returns, the process terminates. Whatever value
|
||
@code{main} returns becomes the exit status which is reported to the
|
||
parent process. While nominally the return value is of type
|
||
@code{int}, in fact the exit status gets truncated to eight bits; if
|
||
@code{main} returns the value 256, the exit status is 0.
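
The truncation can be modeled with a little arithmetic. The sketch
below assumes a system that keeps only the low eight bits of the
value, as described above; @code{observed_status} is a hypothetical
helper, not a library function.

```c
/* Return the exit status a parent would typically observe if main
   returned VALUE, assuming only the low eight bits are kept.
   (observed_status is a hypothetical helper.)  */
int
observed_status (int value)
{
  return value & 0xFF;
}
```

Under that assumption, @code{observed_status (256)} is 0 and
@code{observed_status (257)} is 1, which is why exit values outside
the range 0 to 255 are best avoided.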
|
||
|
||
Normally, programs return only one of two values: 0 for success,
|
||
and 1 for failure. For maximum portability, use the macro
|
||
values @code{EXIT_SUCCESS} and @code{EXIT_FAILURE} defined in
|
||
@file{stdlib.h}. Here's an example:
|
||
|
||
@cindex @code{EXIT_FAILURE}
|
||
@cindex @code{EXIT_SUCCESS}
|
||
@example
|
||
#include <stdlib.h> /* @r{Defines @code{EXIT_SUCCESS}} */
|
||
/* @r{and @code{EXIT_FAILURE}.} */
|
||
|
||
int
|
||
main (void)
|
||
@{
|
||
@r{@dots{}}
|
||
if (foo)
|
||
return EXIT_SUCCESS;
|
||
else
|
||
return EXIT_FAILURE;
|
||
@}
|
||
@end example
|
||
|
||
Some types of programs maintain special conventions for various return
|
||
values; for example, comparison programs including @code{cmp} and
|
||
@code{diff} return 1 to indicate a mismatch, and 2 to indicate that
|
||
the comparison couldn't be performed.
|
||
|
||
@node Command-Line Parameters
|
||
@subsection Accessing Command-Line Parameters
|
||
@cindex command-line parameters
|
||
@cindex parameters, command-line
|
||
|
||
If the program was invoked with any command-line arguments, it can
|
||
access them through the arguments of @code{main}, @code{argc} and
|
||
@code{argv}. (You can give these arguments any names, but the names
|
||
@code{argc} and @code{argv} are customary.)
|
||
|
||
The value of @code{argv} is an array containing all of the
|
||
command-line arguments as strings, with the name of the command
|
||
invoked as the first string. @code{argc} is an integer that says how
|
||
many strings @code{argv} contains. Here is an example of accessing
|
||
the command-line parameters, retrieving the program's name and
|
||
checking for the standard @option{--version} and @option{--help} options:
|
||
|
||
@example
|
||
#include <string.h>   /* @r{Declares @code{strcmp}.} */
|
||
|
||
int
|
||
main (int argc, char *argv[])
|
||
@{
|
||
char *program_name = argv[0];
|
||
|
||
for (int i = 1; i < argc; i++)
|
||
@{
|
||
if (!strcmp (argv[i], "--version"))
|
||
@{
|
||
/* @r{Print version information and exit.} */
|
||
@r{@dots{}}
|
||
@}
|
||
else if (!strcmp (argv[i], "--help"))
|
||
@{
|
||
/* @r{Print help information and exit.} */
|
||
@r{@dots{}}
|
||
@}
|
||
@}
|
||
@r{@dots{}}
|
||
@}
|
||
@end example
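
When a program tests for several options, the scan over @code{argv} can be factored into a helper.  This is only a sketch; the function name @code{has_option} is hypothetical, and it does exact string matching rather than full option parsing.

```c
#include <string.h>   /* Declares strcmp.  */

/* Return 1 if OPTION appears among the command-line arguments
   (skipping argv[0], the program name), and 0 otherwise.  */
int
has_option (int argc, char *argv[], const char *option)
{
  for (int i = 1; i < argc; i++)
    if (strcmp (argv[i], option) == 0)
      return 1;
  return 0;
}
```

From @code{main}, a call such as @code{has_option (argc, argv, "--help")} would then replace the explicit loop.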
|
||
|
||
@node Environment Variables
|
||
@subsection Accessing Environment Variables
|
||
@cindex environment variables
|
||
|
||
You can optionally include a third parameter to @code{main}, another
|
||
array of strings, to capture the environment variables available to
|
||
the program. Unlike what happens with @code{argv}, there is no
|
||
additional parameter for the count of environment variables; rather,
|
||
the array of environment variables concludes with a null pointer.
|
||
|
||
@example
|
||
#include <stdio.h> /* @r{Declares @code{printf}.} */
|
||
|
||
int
|
||
main (int argc, char *argv[], char *envp[])
|
||
@{
|
||
/* @r{Print out all environment variables.} */
|
||
int i = 0;
|
||
while (envp[i])
|
||
@{
|
||
printf ("%s\n", envp[i]);
|
||
i++;
|
||
@}
|
||
@}
|
||
@end example
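
Because the array ends with a null pointer, its length has to be computed by scanning.  Here is a small sketch of that; the name @code{count_strings} is hypothetical, and the function works on any array of strings laid out the way @code{envp} is.

```c
#include <stddef.h>   /* Defines size_t.  */

/* Count the strings in LIST, an array terminated by a null
   pointer (the same layout as envp).  */
size_t
count_strings (char *const list[])
{
  size_t count = 0;
  while (list[count])
    count++;
  return count;
}
```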
|
||
|
||
Another method of retrieving environment variables is to use the
|
||
library function @code{getenv}, which is declared in @code{stdlib.h}.
|
||
Using @code{getenv} does not require defining @code{main} to accept the
|
||
@code{envp} pointer. For example, here is a program that fetches and prints
|
||
the user's home directory (if defined):
|
||
|
||
@example
|
||
#include <stdlib.h> /* @r{Declares @code{getenv}.} */
|
||
#include <stdio.h> /* @r{Declares @code{printf}.} */
|
||
|
||
int
|
||
main (void)
|
||
@{
|
||
char *home_directory = getenv ("HOME");
|
||
if (home_directory)
|
||
printf ("My home directory is: %s\n", home_directory);
|
||
else
|
||
printf ("My home directory is not defined!\n");
|
||
@}
|
||
@end example
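
Since @code{getenv} returns a null pointer for an undefined variable, callers often want a default value instead.  A minimal sketch of such a wrapper follows; the name @code{getenv_or} is hypothetical and not part of any library.

```c
#include <stdlib.h>   /* Declares getenv.  */

/* Return the value of the environment variable NAME, or
   FALLBACK if that variable is not defined.  */
const char *
getenv_or (const char *name, const char *fallback)
{
  const char *value = getenv (name);
  return value ? value : fallback;
}
```

For instance, @code{getenv_or ("HOME", "/")} gives the home directory when it is defined and @code{"/"} otherwise.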
|
||
|
||
@node Advanced Definitions
|
||
@section Advanced Function Features
|
||
|
||
This section describes some advanced or obscure features for GNU C
|
||
function definitions. If you are just learning C, you can skip the
|
||
rest of this chapter.
|
||
|
||
@menu
|
||
* Variable-Length Array Parameters:: Functions that accept arrays
|
||
of variable length.
|
||
* Variable Number of Arguments:: Variadic functions.
|
||
* Nested Functions:: Defining functions within functions.
|
||
* Inline Function Definitions:: A function call optimization technique.
|
||
@end menu
|
||
|
||
@node Variable-Length Array Parameters
|
||
@subsection Variable-Length Array Parameters
|
||
@cindex variable-length array parameters
|
||
@cindex array parameters, variable-length
|
||
@cindex functions that accept variable-length arrays
|
||
|
||
An array parameter can have variable length: simply declare the array
|
||
type with a size that isn't constant. In a nested function, the
|
||
length can refer to a variable defined in a containing scope. In any
|
||
function, it can refer to a previous parameter, like this:
|
||
|
||
@example
|
||
struct entry
|
||
tester (int len, char data[len][len])
|
||
@{
|
||
@r{@dots{}}
|
||
@}
|
||
@end example
|
||
|
||
Alternatively, in function declarations (but not in function
|
||
definitions), you can use @code{[*]} to denote that the array
|
||
parameter is of a variable length, such that these two declarations
|
||
mean the same thing:
|
||
|
||
@example
|
||
struct entry
|
||
tester (int len, char data[len][len]);
|
||
@end example
|
||
|
||
@example
|
||
struct entry
|
||
tester (int len, char data[*][*]);
|
||
@end example
|
||
|
||
@noindent
|
||
The two forms are equivalent in GNU C, but emphasizing that
|
||
the array parameter is variable-length may be helpful to those
|
||
studying the code.
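
To make this concrete, here is a sketch that pairs a @code{[*]} declaration with a matching definition.  The function @code{sum_square} and its element type are chosen only for illustration.

```c
/* Declaration: [*] marks both dimensions as variable-length.  */
long sum_square (int len, int data[*][*]);

/* Definition: the dimensions refer to the earlier parameter LEN.  */
long
sum_square (int len, int data[len][len])
{
  long total = 0;
  for (int row = 0; row < len; row++)
    for (int col = 0; col < len; col++)
      total += data[row][col];
  return total;
}
```

A caller passes the length first and then an array whose dimensions match it, as in @code{sum_square (2, m)} for an array declared @code{int m[2][2]}.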
|
||
|
||
You can also omit the length parameter, and instead use some other
|
||
in-scope variable for the length in the function definition:
|
||
|
||
@example
|
||
struct entry
|
||
tester (char data[*][*]);
|
||
@r{@dots{}}
|
||
int data_length = 20;
|
||
@r{@dots{}}
|
||
struct entry
|
||
tester (char data[data_length][data_length])
|
||
@{
|
||
@r{@dots{}}
|
||
@}
|
||
@end example
|
||
|
||
@c ??? check text above
|
||
|
||
@cindex parameter forward declaration
|
||
In GNU C, to pass the array first and the length afterward, you can
|
||
use a @dfn{parameter forward declaration}, like this:
|
||
|
||
@example
|
||
struct entry
|
||
tester (int len; char data[len][len], int len)
|
||
@{
|
||
@r{@dots{}}
|
||
@}
|
||
@end example
|
||
|
||
The @samp{int len} before the semicolon is the parameter forward
|
||
declaration; it serves the purpose of making the name @code{len} known
|
||
when the declaration of @code{data} is parsed.
|
||
|
||
You can write any number of such parameter forward declarations in the
|
||
parameter list. They can be separated by commas or semicolons, but
|
||
the last one must end with a semicolon, which is followed by the
|
||
``real'' parameter declarations. Each forward declaration must match
|
||
a subsequent ``real'' declaration in parameter name and data type.
|
||
|
||
Standard C does not support parameter forward declarations.
|
||
|
||
@node Variable Number of Arguments
|
||
@subsection Variable-Length Parameter Lists
|
||
@cindex variable-length parameter lists
|
||
@cindex parameter lists, variable length
|
||
@cindex function parameter lists, variable length
|
||
|
||
@cindex variadic function
|
||
A function that takes a variable number of arguments is called a
|
||
@dfn{variadic function}. In C, a variadic function must specify at
|
||
least one fixed argument with an explicitly declared data type.
|
||
Additional arguments can follow, and can vary in both quantity and
|
||
data type.
|
||
|
||
In the function header, declare the fixed parameters in the normal
|
||
way, then write a comma and an ellipsis: @samp{, ...}. Here is an
|
||
example of a variadic function header:
|
||
|
||
@example
|
||
int add_multiple_values (int number, ...)
|
||
@end example
|
||
|
||
@cindex @code{va_list}
|
||
@cindex @code{va_start}
|
||
@cindex @code{va_end}
|
||
The function body can refer to fixed arguments by their parameter
|
||
names, but the additional arguments have no names. Accessing them in
|
||
the function body uses certain standard macros. They are defined in
|
||
the library header file @file{stdarg.h}, so the code must
|
||
@code{#include} that file.
|
||
|
||
In the body, write
|
||
|
||
@example
|
||
va_list ap;
|
||
va_start (ap, @var{last_fixed_parameter});
|
||
@end example
|
||
|
||
@noindent
|
||
This declares the variable @code{ap} (you can use any name for it)
|
||
and then sets it up to point before the first additional argument.
|
||
|
||
Then, to fetch the next consecutive additional argument, write this:
|
||
|
||
@example
|
||
va_arg (ap, @var{type})
|
||
@end example
|
||
|
||
After fetching all the additional arguments (or as many as need to be
|
||
used), write this:
|
||
|
||
@example
|
||
va_end (ap);
|
||
@end example
|
||
|
||
Here's an example of a variadic function definition that adds any
|
||
number of @code{int} arguments. The first (fixed) argument says how
|
||
many more arguments follow.
|
||
|
||
@example
|
||
#include <stdarg.h> /* @r{Defines @code{va}@r{@dots{}} macros.} */
|
||
@r{@dots{}}
|
||
|
||
int
|
||
add_multiple_values (int argcount, ...)
|
||
@{
|
||
int counter, total = 0;
|
||
|
||
/* @r{Declare a variable of type @code{va_list}.} */
|
||
va_list argptr;
|
||
|
||
  /* @r{Initialize that variable.} */
|
||
va_start (argptr, argcount);
|
||
|
||
for (counter = 0; counter < argcount; counter++)
|
||
@{
|
||
/* @r{Get the next additional argument.} */
|
||
total += va_arg (argptr, int);
|
||
@}
|
||
|
||
/* @r{End use of the @code{argptr} variable.} */
|
||
va_end (argptr);
|
||
|
||
return total;
|
||
@}
|
||
@end example
|
||
|
||
With GNU C, @code{va_end} is superfluous, but some other compilers
|
||
might make @code{va_start} allocate memory so that calling
|
||
@code{va_end} is necessary to avoid a memory leak. Before doing
|
||
@code{va_start} again with the same variable, do @code{va_end}
|
||
first.
|
||
|
||
@cindex @code{va_copy}
|
||
Because of this possible memory allocation, it is risky (in principle)
|
||
to copy one @code{va_list} variable to another with assignment.
|
||
Instead, use @code{va_copy}, which copies the substance but allocates
|
||
separate memory in the variable you copy to. The call looks like
|
||
@code{va_copy (@var{to}, @var{from})}, where both @var{to} and
|
||
@var{from} should be variables of type @code{va_list}. In principle,
|
||
do @code{va_end} on each of these variables before its scope ends.
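
One use for @code{va_copy} is to traverse the additional arguments twice.  The following sketch, with the hypothetical name @code{count_above_mean}, sums the arguments on a first pass and compares each one with the mean on a second pass.

```c
#include <stdarg.h>   /* Defines the va_... macros.  */

/* Return how many of the COUNT additional int arguments are
   greater than their mean.  */
int
count_above_mean (int count, ...)
{
  va_list first_pass, second_pass;

  va_start (first_pass, count);
  /* Copy before consuming anything, so that the second pass
     starts at the first additional argument too.  */
  va_copy (second_pass, first_pass);

  long total = 0;
  for (int i = 0; i < count; i++)
    total += va_arg (first_pass, int);
  va_end (first_pass);

  int above = 0;
  for (int i = 0; i < count; i++)
    /* value > total / count, written without division.  */
    if ((long) va_arg (second_pass, int) * count > total)
      above++;
  va_end (second_pass);

  return above;
}
```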
|
||
|
||
Since the additional arguments' types are not specified in the
|
||
function's definition, the default argument promotions
|
||
(@pxref{Argument Promotions}) apply to them in function calls. The
|
||
function definition must take account of this; thus, if an argument
|
||
was passed as @code{short}, the function should get it as @code{int}.
|
||
If an argument was passed as @code{float}, the function should get it
|
||
as @code{double}.
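
As an illustration of the @code{float} case, here is a sketch of a variadic function that callers may invoke with @code{float} or @code{double} values.  The name @code{average} is hypothetical; the point is that @code{va_arg} asks for @code{double} in either case.

```c
#include <stdarg.h>   /* Defines the va_... macros.  */

/* Return the mean of COUNT additional arguments.  Callers may
   pass float or double values; float arguments arrive as double.  */
double
average (int count, ...)
{
  if (count <= 0)
    return 0.0;

  va_list ap;
  double total = 0.0;

  va_start (ap, count);
  for (int i = 0; i < count; i++)
    total += va_arg (ap, double);   /* Not float.  */
  va_end (ap);

  return total / count;
}
```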
|
||
|
||
C has no mechanism to tell the variadic function how many arguments
|
||
were passed to it, so its calling convention must give it a way to
|
||
determine this. That's why @code{add_multiple_values} takes a fixed
|
||
argument that says how many more arguments follow. Thus, you can
|
||
call the function like this:
|
||
|
||
@example
|
||
sum = add_multiple_values (3, 12, 34, 190);
|
||
/* @r{Value is 12+34+190.} */
|
||
@end example
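
A count is not the only possible convention.  Another common one is to end the argument list with a sentinel value.  The sketch below assumes every additional argument has type @code{char *} and that the caller ends the list with a null pointer of that type; the name @code{total_length} is hypothetical.

```c
#include <stdarg.h>   /* Defines the va_... macros.  */
#include <stddef.h>   /* Defines size_t and NULL.  */
#include <string.h>   /* Declares strlen.  */

/* Return the combined length of the strings passed as arguments.
   The list must end with a null pointer of type char *.  */
size_t
total_length (const char *first, ...)
{
  size_t total = 0;
  va_list ap;

  va_start (ap, first);
  for (const char *s = first; s != NULL; s = va_arg (ap, char *))
    total += strlen (s);
  va_end (ap);

  return total;
}
```

A call looks like @code{total_length ("ab", "cde", (char *) NULL)}.  The cast on the sentinel matters, because a bare @code{NULL} is not certain to be passed with pointer type.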
|
||
|
||
In GNU C, there is no actual need to use the @code{va_end} macro.
|
||
In fact, it does nothing. It's used for compatibility with other
|
||
compilers, when that matters.
|
||
|
||
It is a mistake to access variables declared as @code{va_list} except
|
||
in the specific ways described here. Just what that type consists of
|
||
is an implementation detail, which could vary from one platform to
|
||
another.
|
||
|
||
@node Nested Functions
|
||
@subsection Nested Functions
|
||
@cindex nested functions
|
||
@cindex functions, nested
|
||
@cindex downward funargs
|
||
@cindex thunks
|
||
|
||
A @dfn{nested function} is a function defined inside another function.
|
||
(The ability to do this is indispensable for automatic translation of
|
||
certain programming languages into C.) The nested function's name is
|
||
local to the block where it is defined. For example, here we define a
|
||
nested function named @code{square}, then call it twice:
|
||
|
||
@example
|
||
@group
|
||
foo (double a, double b)
|
||
@{
|
||
double square (double z) @{ return z * z; @}
|
||
|
||
return square (a) + square (b);
|
||
@}
|
||
@end group
|
||
@end example
|
||
|
||
The nested function definition can access all the variables of the containing
|
||
function that are visible at the point of its definition. This is
|
||
called @dfn{lexical scoping}. For example, here we show a nested
|
||
function that uses an inherited variable named @code{offset}:
|
||
|
||
@example
|
||
@group
|
||
bar (int *array, int offset, int size)
|
||
@{
|
||
int access (int *array, int index)
|
||
@{ return array[index + offset]; @}
|
||
int i;
|
||
@r{@dots{}}
|
||
for (i = 0; i < size; i++)
|
||
@r{@dots{}} access (array, i) @r{@dots{}}
|
||
@}
|
||
@end group
|
||
@end example
|
||
|
||
Nested function definitions can appear wherever automatic variable
|
||
declarations are allowed; that is, in any block, interspersed with the
|
||
other declarations and statements in the block.
|
||
|
||
The nested function's name is visible only within the parent block;
|
||
the name's scope starts from its definition and continues to the end
|
||
of the containing block. If the nested function's name
|
||
is the same as the parent function's name, there will be
|
||
no way to refer to the parent function inside the scope of the
|
||
name of the nested function.
|
||
|
||
Using @code{extern} or @code{static} on a nested function definition
|
||
is an error.
|
||
|
||
It is possible to call the nested function from outside the scope of its
|
||
name by storing its address or passing the address to another function.
|
||
You can do this safely, but you must be careful:
|
||
|
||
@example
|
||
@group
|
||
hack (int *array, int size, int addition)
|
||
@{
|
||
void store (int index, int value)
|
||
@{ array[index] = value + addition; @}
|
||
|
||
intermediate (store, size);
|
||
@}
|
||
@end group
|
||
@end example
|
||
|
||
Here, the function @code{intermediate} receives the address of
|
||
@code{store} as an argument. If @code{intermediate} calls @code{store},
|
||
the arguments given to @code{store} are used to store into @code{array}.
|
||
@code{store} also accesses @code{hack}'s local variable @code{addition}.
|
||
|
||
It is safe for @code{intermediate} to call @code{store} because
|
||
@code{hack}'s stack frame, with its arguments and local variables,
|
||
continues to exist during the call to @code{intermediate}.
|
||
|
||
Calling the nested function through its address after the containing
|
||
function has exited is asking for trouble. If it is called after a
|
||
containing scope level has exited, and if it refers to some of the
|
||
variables that are no longer in scope, it will refer to memory
|
||
containing junk or other data. It's not wise to take the risk.
|
||
|
||
The GNU C Compiler implements taking the address of a nested function
|
||
using a technique called @dfn{trampolines}. This technique was
|
||
described in @cite{Lexical Closures for C@t{++}} (Thomas M. Breuel,
|
||
USENIX C@t{++} Conference Proceedings, October 17--21, 1988).
|
||
|
||
A nested function can jump to a label inherited from a containing
|
||
function, provided the label was explicitly declared in the containing
|
||
function (@pxref{Local Labels}). Such a jump returns instantly to the
|
||
containing function, exiting the nested function that did the
|
||
@code{goto} and any intermediate function invocations as well. Here
|
||
is an example:
|
||
|
||
@example
|
||
@group
|
||
bar (int *array, int offset, int size)
|
||
@{
|
||
/* @r{Explicitly declare the label @code{failure}.} */
|
||
__label__ failure;
|
||
int access (int *array, int index)
|
||
@{
|
||
if (index > size)
|
||
/* @r{Exit this function,}
|
||
@r{and return to @code{bar}.} */
|
||
goto failure;
|
||
return array[index + offset];
|
||
@}
|
||
@end group
|
||
|
||
@group
|
||
int i;
|
||
@r{@dots{}}
|
||
for (i = 0; i < size; i++)
|
||
@r{@dots{}} access (array, i) @r{@dots{}}
|
||
@r{@dots{}}
|
||
return 0;
|
||
|
||
/* @r{Control comes here from @code{access}
|
||
if it does the @code{goto}.} */
|
||
failure:
|
||
return -1;
|
||
@}
|
||
@end group
|
||
@end example
|
||
|
||
To declare the nested function before its definition, use
|
||
@code{auto} (which is otherwise meaningless for function declarations;
|
||
@pxref{auto and register}). For example,
|
||
|
||
@example
|
||
bar (int *array, int offset, int size)
|
||
@{
|
||
auto int access (int *, int);
|
||
@r{@dots{}}
|
||
@r{@dots{}} access (array, i) @r{@dots{}}
|
||
@r{@dots{}}
|
||
int access (int *array, int index)
|
||
@{
|
||
@r{@dots{}}
|
||
@}
|
||
@r{@dots{}}
|
||
@}
|
||
@end example
|
||
|
||
@node Inline Function Definitions
|
||
@subsection Inline Function Definitions
|
||
@cindex inline function definitions
|
||
@cindex function definitions, inline
|
||
@findex inline
|
||
|
||
To declare a function inline, use the @code{inline} keyword in its
|
||
definition. Here's code to define functions to access two fields
|
||
in a structure, and to inline them so that there is no cost to accessing
|
||
them by calling the functions.
|
||
|
||
@example
|
||
struct list
|
||
@{
|
||
struct list *first, *second;
|
||
@};
|
||
|
||
inline struct list *
|
||
list_first (struct list *p)
|
||
@{
|
||
return p->first;
|
||
@}
|
||
|
||
inline struct list *
|
||
list_second (struct list *p)
|
||
@{
|
||
return p->second;
|
||
@}
|
||
@end example
|
||
|
||
Optimized compilation can substitute the inline function's body for
|
||
any call to it. This is called @emph{inlining} the function. It
|
||
makes the code that contains the call run faster, significantly so if
|
||
the inline function is small.
|
||
|
||
Here's a function that uses @code{list_second}:
|
||
|
||
@example
|
||
int
|
||
pairlist_length (struct list *l)
|
||
@{
|
||
int length = 0;
|
||
while (l)
|
||
@{
|
||
length++;
|
||
l = list_second (l);
|
||
@}
|
||
return length;
|
||
@}
|
||
@end example
|
||
|
||
Substituting the code of @code{list_second} into the definition of
|
||
@code{pairlist_length} results in this code, in effect:
|
||
|
||
@example
|
||
int
|
||
pairlist_length (struct list *l)
|
||
@{
|
||
int length = 0;
|
||
while (l)
|
||
@{
|
||
length++;
|
||
l = l->second;
|
||
@}
|
||
return length;
|
||
@}
|
||
@end example
|
||
|
||
Since the definition of @code{list_second} does not say @code{extern}
|
||
or @code{static}, that definition is used only for inlining. It
|
||
doesn't generate code that can be called at run time. If not all the
|
||
calls to the function are inlined, there must be a definition of the
|
||
same function name in another module for them to call.
|
||
|
||
@cindex inline functions, omission of
|
||
@c @opindex fkeep-inline-functions
|
||
Adding @code{static} to an inline function definition means the
|
||
function definition is limited to this compilation module. Also, it
|
||
generates run-time code if necessary for the sake of any calls that
|
||
were not inlined. If all calls are inlined then the function
|
||
definition does not generate run-time code, but you can force
|
||
generation of run-time code with the option
|
||
@option{-fkeep-inline-functions}.
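
Here is a sketch of the @code{static inline} pattern, typical for small helpers defined in a header file.  Both function names are hypothetical.

```c
/* A small helper, private to this compilation module.  Run-time
   code for it is generated only if some call is not inlined.  */
static inline int
max_int (int a, int b)
{
  return a > b ? a : b;
}

/* Return the largest of the COUNT elements of VALUES.
   COUNT must be at least 1.  */
int
largest (const int *values, int count)
{
  int best = values[0];
  for (int i = 1; i < count; i++)
    best = max_int (best, values[i]);   /* Candidate for inlining.  */
  return best;
}
```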
|
||
|
||
@cindex extern inline function
|
||
Specifying @code{extern} along with @code{inline} means the function is
|
||
external and generates run-time code to be called from other
|
||
separately compiled modules, as well as inlined. You can define the
|
||
function as @code{inline} without @code{extern} in other modules so as
|
||
to inline calls to the same function in those modules.
|
||
|
||
Why are some calls not inlined? First of all, inlining is an
|
||
optimization, so non-optimized compilation does not inline.
|
||
|
||
Some calls cannot be inlined for technical reasons. Also, certain
|
||
usages in a function definition can make it unsuitable for inline
|
||
substitution. Among these usages are: variadic functions, use of
|
||
@code{alloca}, use of computed goto (@pxref{Labels as Values}), and
|
||
use of nonlocal goto. The option @option{-Winline} requests a warning
|
||
when a function marked @code{inline} is unsuitable to be inlined. The
|
||
warning explains what obstacle makes it unsuitable.
|
||
|
||
Just because a call @emph{can} be inlined does not mean it
|
||
@emph{should} be inlined. The GNU C compiler weighs costs and
|
||
benefits to decide whether inlining a particular call is advantageous.
|
||
|
||
You can force inlining of all calls to a given function that can be
|
||
inlined, even in a non-optimized compilation, by specifying the
|
||
@samp{always_inline} attribute for the function, like this:
|
||
|
||
@example
|
||
/* @r{Prototype.} */
|
||
inline void foo (const char) __attribute__((always_inline));
|
||
@end example
|
||
|
||
@noindent
|
||
This is a GNU C extension. @xref{Attributes}.
|
||
|
||
A function call may be inlined even if not declared @code{inline} in
|
||
special cases where the compiler can determine this is correct and
|
||
desirable. For instance, when a static function is called only once,
|
||
it will very likely be inlined. With @option{-flto}, link-time
|
||
optimization, any function might be inlined. To absolutely prevent
|
||
inlining of a specific function, specify
|
||
@code{__attribute__((__noinline__))} in the function's definition.
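
The two attributes can be seen side by side in the following sketch.  The functions are hypothetical, and both are declared @code{static} here so that the example does not depend on a definition in another module.

```c
/* Substitute the body at every call, even without optimization.  */
static inline __attribute__ ((always_inline)) int
square (int x)
{
  return x * x;
}

/* Never substitute the body; each call is a real function call.  */
static __attribute__ ((__noinline__)) int
cube (int x)
{
  return x * square (x);
}

/* Return X squared plus X cubed.  */
int
sum_of_powers (int x)
{
  return square (x) + cube (x);
}
```

The attributes affect only how the calls are compiled, not the values computed.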
|
||
|
||
@node Obsolete Definitions
|
||
@section Obsolete Function Features
|
||
|
||
These features of function definitions are still used in old
|
||
programs, but you shouldn't write code this way today.
|
||
If you are just learning C, you can skip this section.
|
||
|
||
@menu
|
||
* Old GNU Inlining:: An older inlining technique.
|
||
* Old-Style Function Definitions:: Original K&R style functions.
|
||
@end menu
|
||
|
||
@node Old GNU Inlining
|
||
@subsection Older GNU C Inlining
|
||
|
||
The GNU C spec for inline functions, before GCC version 5, defined
|
||
@code{extern inline} on a function definition to mean to inline calls
|
||
to it but @emph{not} generate code for the function that could be
|
||
called at run time. By contrast, @code{inline} without @code{extern}
|
||
specified to generate run-time code for the function. In effect, ISO
|
||
incompatibly flipped the meanings of these two cases. We changed GCC
|
||
in version 5 to adopt the ISO specification.
|
||
|
||
Many programs still use these cases with the previous GNU C meanings.
|
||
You can specify use of those meanings with the option
|
||
@option{-fgnu89-inline}. You can also specify this for a single
|
||
function with @code{__attribute__ ((gnu_inline))}. Here's an example:
|
||
|
||
@example
|
||
inline __attribute__ ((gnu_inline))
|
||
int
|
||
inc (int *a)
|
||
@{
|
||
  return (*a)++;
|
||
@}
|
||
@end example
|
||
|
||
@node Old-Style Function Definitions
|
||
@subsection Old-Style Function Definitions
|
||
@cindex old-style function definitions
|
||
@cindex function definitions, old-style
|
||
@cindex K&R-style function definitions
|
||
|
||
The syntax of C traditionally allows omitting the data type in a
|
||
function declaration if it specifies a storage class or a qualifier.
|
||
Then the type defaults to @code{int}. For example:
|
||
|
||
@example
|
||
static foo (double x);
|
||
@end example
|
||
|
||
@noindent
|
||
defaults the return type to @code{int}. This is bad practice; if you
|
||
see it, fix it.
|
||
|
||
An @dfn{old-style} (or ``K&R'') function definition is the way
|
||
function definitions were written in the 1980s. It looks like this:
|
||
|
||
@example
|
||
@var{rettype}
|
||
@var{function} (@var{parmnames})
|
||
@var{parm_declarations}
|
||
@{
|
||
@var{body}
|
||
@}
|
||
@end example
|
||
|
||
In @var{parmnames}, only the parameter names are listed, separated by
|
||
commas. Then @var{parm_declarations} declares their data types; these
|
||
declarations look just like variable declarations. If a parameter is
|
||
listed in @var{parmnames} but has no declaration, it is implicitly
|
||
declared @code{int}.
|
||
|
||
There is no reason to write definitions this way nowadays, but they
|
||
can still be seen in older GNU programs.
|
||
|
||
An old-style variadic function definition looks like this:
|
||
|
||
@example
|
||
#include <varargs.h>
|
||
|
||
int
|
||
add_multiple_values (va_alist)
|
||
va_dcl
|
||
@{
|
||
int argcount;
|
||
int counter, total = 0;
|
||
|
||
/* @r{Declare a variable of type @code{va_list}.} */
|
||
va_list argptr;
|
||
|
||
/* @r{Initialize that variable.} */
|
||
va_start (argptr);
|
||
|
||
/* @r{Get the first argument (fixed).} */
|
||
  argcount = va_arg (argptr, int);
|
||
|
||
for (counter = 0; counter < argcount; counter++)
|
||
@{
|
||
/* @r{Get the next additional argument.} */
|
||
total += va_arg (argptr, int);
|
||
@}
|
||
|
||
/* @r{End use of the @code{argptr} variable.} */
|
||
va_end (argptr);
|
||
|
||
return total;
|
||
@}
|
||
@end example
|
||
|
||
Note that the old-style variadic function definition has no fixed
|
||
parameter variables; all arguments must be obtained with
|
||
@code{va_arg}.
|
||
|
||
@node Compatible Types
|
||
@chapter Compatible Types
|
||
@cindex compatible types
|
||
@cindex types, compatible
|
||
|
||
Declaring a function or variable twice is valid in C only if the two
|
||
declarations specify @dfn{compatible} types. In addition, some
|
||
operations on pointers require operands to have compatible target
|
||
types.
|
||
|
||
In C, two different primitive types are never compatible. Likewise for
|
||
the defined types @code{struct}, @code{union} and @code{enum}: two
|
||
separately defined types are incompatible unless they are defined
|
||
exactly the same way.
|
||
|
||
However, there are a few cases where different types can be
|
||
compatible:
|
||
|
||
@itemize @bullet
|
||
@item
|
||
Every enumeration type is compatible with some integer type. In GNU
|
||
C, the choice of integer type depends on the largest enumeration
|
||
value.
|
||
|
||
@c ??? Which one, in GCC?
|
||
@c ??? ... it varies, depending on the enum values. Testing on
|
||
@c ??? fencepost, it appears to use a 4-byte signed integer first,
|
||
@c ??? then moves on to an 8-byte signed integer. These details
|
||
@c ??? might be platform-dependent, as the C standard says that even
|
||
@c ??? char could be used as an enum type, but it's at least true
|
||
@c ??? that GCC chooses a type that is at least large enough to
|
||
@c ??? hold the largest enum value.
|
||
|
||
@item
|
||
Array types are compatible if the element types are compatible
|
||
and the sizes (when specified) match.
|
||
|
||
@item
|
||
Pointer types are compatible if the pointer target types are
|
||
compatible.
|
||
|
||
@item
|
||
Function types that specify argument types are compatible if the
|
||
return types are compatible and the argument types are compatible,
|
||
argument by argument. In addition, they must all agree in whether
|
||
they use @code{...} to allow additional arguments.
|
||
|
||
@item
|
||
Function types that don't specify argument types are compatible if the
|
||
return types are.
|
||
|
||
@item
|
||
Function types that specify the argument types are compatible with
|
||
function types that omit them, if the return types are compatible and
|
||
the specified argument types are unaltered by the argument promotions
|
||
(@pxref{Argument Promotions}).
|
||
@end itemize
|
||
|
||
In order for types to be compatible, they must agree in their type
|
||
qualifiers. Thus, @code{const int} and @code{int} are incompatible.
|
||
It follows that @code{const int *} and @code{int *} are incompatible
|
||
too (they are pointers to types that are not compatible).
|
||
|
||
If two types are compatible ignoring the qualifiers, we call them
|
||
@dfn{nearly compatible}. (If they are array types, we ignore
|
||
qualifiers on the element types.@footnote{This is a GNU C extension.})
|
||
Comparison of pointers is valid if the pointers' target types are
|
||
nearly compatible. Likewise, the two branches of a conditional
|
||
expression may be pointers to nearly compatible target types.
|
||
|
||
If two types are compatible ignoring the qualifiers, and the first
|
||
type has all the qualifiers of the second type, we say the first is
|
||
@dfn{upward compatible} with the second. Assignment of pointers
|
||
requires the assigned pointer's target type to be upward compatible
|
||
with the target type of the right operand (the new value).
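
The rule for pointer assignment can be illustrated with a short sketch.  The function name @code{sum_elements} is hypothetical.

```c
/* Return the sum of COUNT elements, reading them through a
   pointer to const.  */
int
sum_elements (int *values, int count)
{
  /* Valid: the target type of READER, const int, has all the
     qualifiers of the target type of VALUES (which has none),
     so it is upward compatible with it.  */
  const int *reader = values;

  int total = 0;
  for (int i = 0; i < count; i++)
    total += reader[i];

  /* The reverse assignment, from READER to a plain int *,
     would drop const and is not allowed by this rule.  */
  return total;
}
```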
|
||
|
||
@node Type Conversions
|
||
@chapter Type Conversions
|
||
@cindex type conversions
|
||
@cindex conversions, type
|
||
|
||
C converts between data types automatically when that seems clearly
|
||
necessary. In addition, you can convert explicitly with a @dfn{cast}.
|
||
|
||
@menu
|
||
* Explicit Type Conversion:: Casting a value from one type to another.
|
||
* Assignment Type Conversions:: Automatic conversion by assignment operation.
|
||
* Argument Promotions:: Automatic conversion of function parameters.
|
||
* Operand Promotions:: Automatic conversion of arithmetic operands.
|
||
* Common Type:: When operand types differ, which one is used?
|
||
@end menu
|
||
|
||
@node Explicit Type Conversion
|
||
@section Explicit Type Conversion
|
||
@cindex cast
|
||
@cindex explicit type conversion
|
||
|
||
You can do explicit conversions using the unary @dfn{cast} operator,
|
||
which is written as a type designator (@pxref{Type Designators}) in
|
||
parentheses. For example, @code{(int)} is the operator to cast to
|
||
type @code{int}. Here's an example of using it:
|
||
|
||
@example
|
||
@{
|
||
double d = 5.5;
|
||
|
||
printf ("Floating point value: %f\n", d);
|
||
  printf ("Truncated to integer: %d\n", (int) d);
|
||
@}
|
||
@end example
|
||
|
||
Using @code{(int) d} passes an @code{int} value as argument to
|
||
@code{printf}, so you can print it with @samp{%d}. Using just
|
||
@code{d} without the cast would pass the value as @code{double}.
|
||
That won't work at all with @samp{%d}; the results would be gibberish.
|
||
|
||
To divide one integer by another without discarding the fractional part,
|
||
cast either of the integers to @code{double} first:
|
||
|
||
@example
|
||
(double) @var{dividend} / @var{divisor}
|
||
@var{dividend} / (double) @var{divisor}
|
||
@end example
|
||
|
||
It is enough to cast one of them, because that forces the common type
|
||
to @code{double} so the other will be converted automatically.
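
Both uses of the cast shown so far can be sketched as small functions.  The names @code{exact_quotient} and @code{whole_part} are hypothetical.

```c
/* Divide two integers, keeping the fractional part of the
   quotient.  Casting one operand is enough.  */
double
exact_quotient (int dividend, int divisor)
{
  return (double) dividend / divisor;
}

/* Convert D to int.  The cast discards the fractional part,
   so the result moves toward zero.  */
int
whole_part (double d)
{
  return (int) d;
}
```

Thus @code{exact_quotient (7, 2)} is 3.5, whereas the integer expression @code{7 / 2} is 3; and @code{whole_part (5.5)} is 5, while @code{whole_part (-5.5)} is @minus{}5.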
|
||
|
||
The valid cast conversions are:
|
||
|
||
@itemize @bullet
|
||
@item
|
||
One numerical type to another.
|
||
|
||
@item
|
||
One pointer type to another.
|
||
(Converting between pointers that point to functions
|
||
and pointers that point to data is not standard C.)
|
||
|
||
@item
|
||
A pointer type to an integer type.
|
||
|
||
@item
|
||
An integer type to a pointer type.
|
||
|
||
@item
|
||
To a union type, from the type of any alternative in the union
|
||
(@pxref{Unions}). (This is a GNU extension.)
|
||
|
||
@item
|
||
Anything, to @code{void}.
|
||
@end itemize
|
||
|
||
@node Assignment Type Conversions
|
||
@section Assignment Type Conversions
|
||
@cindex assignment type conversions
|
||
|
||
Certain type conversions occur automatically in assignments
|
||
and certain other contexts. These are the conversions
|
||
assignments can do:
|
||
|
||
@itemize @bullet
|
||
@item
|
||
Converting any numeric type to any other numeric type.
|
||
|
||
@item
|
||
Converting @code{void *} to any other pointer type
|
||
(except pointer-to-function types).
|
||
|
||
@item
|
||
Converting any other pointer type to @code{void *}
|
||
(except pointer-to-function types).
|
||
|
||
@item
|
||
Converting 0 (a null pointer constant) to any pointer type.
|
||
|
||
@item
|
||
Converting any pointer type to @code{bool}. (The result is
|
||
1 if the pointer is not null, and 0 if it is null.)
|
||
|
||
@item
|
||
Converting between pointer types when the left-hand target type is
|
||
upward compatible with the right-hand target type. @xref{Compatible
|
||
Types}.
|
||
@end itemize
|
||
|
||
These type conversions occur automatically in certain contexts,
|
||
which are:
|
||
|
||
@itemize @bullet
|
||
@item
|
||
An assignment converts the type of the right-hand expression
|
||
to the type wanted by the left-hand expression. For example,
|
||
|
||
@example
|
||
double i;
|
||
i = 5;
|
||
@end example
|
||
|
||
@noindent
|
||
converts 5 to @code{double}.
|
||
|
||
@item
|
||
A function call, when the function specifies the type for that
|
||
argument, converts the argument value to that type. For example,
|
||
|
||
@example
|
||
void foo (double);
|
||
foo (5);
|
||
@end example
|
||
|
||
@noindent
|
||
converts 5 to @code{double}.
|
||
|
||
@item
|
||
A @code{return} statement converts the specified value to the type
|
||
that the function is declared to return. For example,
|
||
|
||
@example
|
||
double
|
||
foo ()
|
||
@{
|
||
return 5;
|
||
@}
|
||
@end example
|
||
|
||
@noindent
|
||
also converts 5 to @code{double}.
|
||
@end itemize
|
||
|
||
In all three contexts, if the conversion is impossible, that
|
||
constitutes an error.
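
The following sketch shows several of these conversions, using initializations, which convert the same way assignments do.  All three function names are hypothetical.

```c
#include <stdbool.h>  /* Defines bool.  */
#include <stdlib.h>   /* Declares malloc.  */

/* Allocate an int holding VALUE.  Return a null pointer if the
   allocation fails.  */
int *
new_int (int value)
{
  int *p = malloc (sizeof *p);    /* void * converts to int *.  */
  if (p)
    *p = value;
  return p;
}

/* Report whether P is non-null.  */
bool
is_set (const void *p)
{
  bool result = p;                /* Pointer converts to bool.  */
  return result;
}

/* Return half of N as a double.  */
double
half (int n)
{
  double d = n;                   /* int converts to double.  */
  return d / 2;
}
```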
|
||
|
||
@node Argument Promotions
|
||
@section Argument Promotions
|
||
@cindex argument promotions
|
||
@cindex promotion of arguments
|
||
|
||
When a function's definition or declaration does not specify the type
|
||
of an argument, that argument is passed without conversion in whatever
|
||
type it has, with these exceptions:
|
||
|
||
@itemize @bullet
|
||
@item
|
||
Some narrow numeric values are @dfn{promoted} to a wider type. If the
|
||
expression is a narrow integer, such as @code{char} or @code{short},
|
||
the call converts it automatically to @code{int} (@pxref{Integer
|
||
Types}).@footnote{On an embedded controller where @code{char}
|
||
or @code{short} is the same width as @code{int}, @code{unsigned char}
|
||
or @code{unsigned short} promotes to @code{unsigned int}, but that
|
||
never occurs in GNU C on real computers.}
|
||
|
||
In this example, the expression @code{c} is passed as an @code{int}:
|
||
|
||
@example
|
||
char c = '$';
|
||
|
||
printf ("Character c is '%c'\n", c);
|
||
@end example
|
||
|
||
@item
|
||
If the expression
|
||
has type @code{float}, the call converts it automatically to
|
||
@code{double}.
|
||
|
||
@item
|
||
An array as argument is converted to a pointer to its zeroth element.
|
||
|
||
@item
|
||
A function name as argument is converted to a pointer to that function.
|
||
@end itemize
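
These promotions matter most inside a function that accepts a variable
number of arguments, since the types of those arguments are not
specified.  The following is a minimal sketch that assumes the
standard macros from @file{stdarg.h}; the function @code{sum_pairs} is
hypothetical.

@example
#include <stdarg.h>

/* Add up COUNT pairs of arguments.  The caller passes each pair
   as a char followed by a float.  */
double
sum_pairs (int count, ...)
@{
  va_list ap;
  double total = 0;

  va_start (ap, count);
  for (int i = 0; i < count; i++)
    @{
      int c = va_arg (ap, int);        /* The char arrived as int.  */
      double f = va_arg (ap, double);  /* The float arrived as double.  */
      total += c + f;
    @}
  va_end (ap);
  return total;
@}
@end example

@noindent
Writing @code{va_arg (ap, char)} or @code{va_arg (ap, float)} here
would be incorrect, because no argument ever arrives with those types.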
|
||
|
||
@node Operand Promotions
|
||
@section Operand Promotions
|
||
@cindex operand promotions
|
||
|
||
The operands in arithmetic operations undergo type conversion automatically.
|
||
These @dfn{operand promotions} are the same as the argument promotions
|
||
except without converting @code{float} to @code{double}. In other words,
|
||
the operand promotions convert
|
||
|
||
@itemize @bullet
|
||
@item
|
||
@code{char} or @code{short} (whether signed or not) to @code{int},
|
||
|
||
@item
|
||
an array to a pointer to its zeroth element, and
|
||
|
||
@item
|
||
a function name to a pointer to that function.
|
||
@end itemize
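
The effect of promoting narrow integers is easy to observe.  In this
sketch (the function name @code{add_bytes} is hypothetical), each
operand becomes an @code{int} before the addition, so the sum is not
limited to the range of @code{unsigned char}:

@example
/* Add two unsigned char values.  Both operands are promoted
   to int before the addition takes place.  */
int
add_bytes (unsigned char a, unsigned char b)
@{
  return a + b;
@}
@end example

@noindent
Thus @code{add_bytes (200, 100)} returns 300, a value that would not
fit in an 8-bit @code{unsigned char}.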
|
||
|
||
@node Common Type
|
||
@section Common Type
|
||
@cindex common type
|
||
|
||
Arithmetic binary operators (except the shift operators) convert their
|
||
operands to the @dfn{common type} before operating on them.
|
||
Conditional expressions also convert the two possible results to their
|
||
common type. Here are the rules for determining the common type.
|
||
|
||
If one of the numbers has a floating-point type and the other is an
|
||
integer, the common type is that floating-point type. For instance,
|
||
|
||
@example
|
||
5.6 * 2 @result{} 11.2 /* @r{a @code{double} value} */
|
||
@end example
|
||
|
||
If both are floating point, the type with the larger range is the
|
||
common type.
|
||
|
||
If both are integers but of different widths, the common type
|
||
is the wider of the two.
|
||
|
||
If they are integer types of the same width, the common type is
|
||
unsigned if either operand is unsigned, and it's @code{long} if either
|
||
operand is @code{long}. It's @code{long long} if either operand is
|
||
@code{long long}.
|
||
|
||
These rules apply to addition, subtraction, multiplication, division,
|
||
remainder, comparisons, and bitwise operations. They also apply to
|
||
the two branches of a conditional expression, and to the arithmetic
|
||
done in a modifying assignment operation.
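
The rule for integer types of the same width can give surprising
results in comparisons.  Here is a small sketch, assuming that
@code{int} and @code{unsigned int} have the same width; the function
name is hypothetical.

@example
/* Return 1 if A is less than B after both are converted
   to their common type, which is unsigned int.  */
int
less_as_unsigned (int a, unsigned int b)
@{
  return a < b;
@}
@end example

@noindent
The call @code{less_as_unsigned (-1, 1)} returns 0, because converting
@minus{}1 to @code{unsigned int} yields the largest value of that
type.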
|
||
|
||
@node Scope
|
||
@chapter Scope
|
||
@cindex scope
|
||
@cindex block scope
|
||
@cindex function scope
|
||
@cindex function prototype scope
|
||
|
||
Each definition or declaration of an identifier is visible
|
||
in certain parts of the program, typically less than the whole
|
||
of the program. The parts where it is visible are called its @dfn{scope}.
|
||
|
||
Normally, declarations made at the top-level in the source---that is,
|
||
not within any blocks or function definitions---are visible for the
|
||
entire contents of the source file after that point. This is called
|
||
@dfn{file scope} (@pxref{File-Scope Variables}).
|
||
|
||
Declarations made within blocks of code, including within function
|
||
definitions, are visible only within those blocks. This is called
|
||
@dfn{block scope}. Here is an example:
|
||
|
||
@example
|
||
@group
|
||
void
|
||
foo (void)
|
||
@{
|
||
int x = 42;
|
||
@}
|
||
@end group
|
||
@end example
|
||
|
||
@noindent
|
||
In this example, the variable @code{x} has block scope; it is visible
|
||
only within the @code{foo} function definition block. Thus, other
|
||
blocks could have their own variables, also named @code{x}, without
|
||
any conflict between those variables.
|
||
|
||
A variable declared inside a subblock has a scope limited to
|
||
that subblock:
|
||
|
||
@example
|
||
@group
|
||
void
|
||
foo (void)
|
||
@{
|
||
@{
|
||
int x = 42;
|
||
@}
|
||
// @r{@code{x} is out of scope here.}
|
||
@}
|
||
@end group
|
||
@end example
|
||
|
||
If a variable declared within a block has the same name as a variable
|
||
declared outside of that block, the definition within the block
|
||
takes precedence during its scope:
|
||
|
||
@example
|
||
@group
|
||
int x = 42;
|
||
|
||
void
|
||
foo (void)
|
||
@{
|
||
int x = 17;
|
||
printf ("%d\n", x);
|
||
@}
|
||
@end group
|
||
@end example
|
||
|
||
@noindent
|
||
This prints 17, the value of the variable @code{x} declared in the
|
||
function body block, rather than the value of the variable @code{x} at
|
||
file scope. We say that the inner declaration of @code{x}
|
||
@dfn{shadows} the outer declaration, for the extent of the inner
|
||
declaration's scope.
|
||
|
||
A declaration with block scope can be shadowed by another declaration
|
||
with the same name in a subblock.
|
||
|
||
@example
|
||
@group
|
||
void
|
||
foo (void)
|
||
@{
|
||
char *x = "foo";
|
||
@{
|
||
int x = 42;
|
||
@r{@dots{}}
|
||
exit (x / 6);
|
||
@}
|
||
@}
|
||
@end group
|
||
@end example
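
Shadowing lasts only as long as the scope of the inner declaration.
The following sketch (with a hypothetical function
@code{scope_demo}) reads the file-scope variable both before and
after a subblock that shadows it:

@example
int x = 42;                 /* File scope.  */

int
scope_demo (void)
@{
  int total = x;            /* The file-scope x.  */
  @{
    int x = 17;             /* Shadows the file-scope x.  */
    total += x;
  @}
  total += x;               /* The file-scope x again.  */
  return total;
@}
@end example

@noindent
The function returns 42 + 17 + 42, which is 101.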
|
||
|
||
A function parameter's scope is the entire function body, but it can
|
||
be shadowed. For example:
|
||
|
||
@example
|
||
@group
|
||
int x = 42;
|
||
|
||
void
|
||
foo (int x)
|
||
@{
|
||
printf ("%d\n", x);
|
||
@}
|
||
@end group
|
||
@end example
|
||
|
||
@noindent
|
||
This prints the value of the function parameter @code{x}, rather than
|
||
the value of the file-scope variable @code{x}.
|
||
|
||
Labels (@pxref{goto Statement}) have @dfn{function scope}: each label
|
||
is visible for the whole of the containing function body, both before
|
||
and after the label declaration:
|
||
|
||
@example
|
||
@group
|
||
void
|
||
foo (void)
|
||
@{
|
||
@r{@dots{}}
|
||
goto bar;
|
||
@r{@dots{}}
|
||
@{ // @r{Subblock does not affect labels.}
|
||
bar:
|
||
@r{@dots{}}
|
||
@}
|
||
goto bar;
|
||
@}
|
||
@end group
|
||
@end example
|
||
|
||
Except for labels, a declared identifier is not
|
||
visible to code before its declaration. For example:
|
||
|
||
@example
|
||
@group
|
||
int x = 5;
|
||
int y = x + 10;
|
||
@end group
|
||
@end example
|
||
|
||
@noindent
|
||
will work, but:
|
||
|
||
@example
|
||
@group
|
||
int x = y + 10;
|
||
int y = 5;
|
||
@end group
|
||
@end example
|
||
|
||
@noindent
|
||
cannot refer to the variable @code{y} before its declaration.
|
||
|
||
@include cpp.texi
|
||
|
||
@node Integers in Depth
|
||
@chapter Integers in Depth
|
||
|
||
This chapter explains the machine-level details of integer types: how
|
||
they are represented as bits in memory, and the range of possible
|
||
values for each integer type.
|
||
|
||
@menu
|
||
* Integer Representations:: How integer values appear in memory.
|
||
* Maximum and Minimum Values:: Value ranges of integer types.
|
||
@end menu
|
||
|
||
@node Integer Representations
|
||
@section Integer Representations
|
||
|
||
@cindex integer representations
|
||
@cindex representation of integers
|
||
|
||
Modern computers store integer values as binary (base-2) numbers that
|
||
occupy a single unit of storage, typically either as an 8-bit
|
||
@code{char}, a 16-bit @code{short int}, a 32-bit @code{int}, or
|
||
possibly, a 64-bit @code{long long int}. Whether a @code{long int} is
|
||
a 32-bit or a 64-bit value is system dependent.@footnote{In theory,
|
||
any of these types could have some other size, but it's not worth even
|
||
a minute to cater to that possibility. It never happens on
|
||
GNU/Linux.}
|
||
|
||
@cindex @code{CHAR_BIT}
|
||
The macro @code{CHAR_BIT}, defined in @file{limits.h}, gives the number
|
||
of bits in type @code{char}. On any real operating system, the value
|
||
is 8.
|
||
|
||
The fixed sizes of numeric types necessarily limit their @dfn{range
|
||
of values}, and the particular encoding of integers decides what that
|
||
range is.
|
||
|
||
@cindex two's-complement representation
|
||
For unsigned integers, the entire space is used to represent a
|
||
nonnegative value. Signed integers are stored using
|
||
@dfn{two's-complement representation}: a signed integer with @var{n}
|
||
bits has a range from @math{-2@sup{(@var{n} - 1)}} to @minus{}1 to 0
|
||
to 1 to @math{+2@sup{(@var{n} - 1)} - 1}, inclusive. The leftmost, or
|
||
high-order, bit is called the @dfn{sign bit}.
|
||
|
||
In two's-complement representation, there is only one value that means
|
||
zero, and the most negative number lacks a positive counterpart. As a
|
||
result, negating that number causes overflow; in practice, its result
|
||
is that number back again. We will revisit that peculiarity shortly.
|
||
|
||
For example, a two's-complement signed 8-bit integer can represent all
|
||
decimal numbers from @minus{}128 to +127. Negating @minus{}128 ought
|
||
to give +128, but that value won't fit in 8 bits, so the operation
|
||
yields @minus{}128.
|
||
|
||
Decades ago, there were computers that used other representations for
|
||
signed integers, but they are long gone and not worth any effort to
|
||
support. The GNU C language does not support them.
|
||
|
||
@c ??? Is this duplicate?
|
||
|
||
When an arithmetic operation produces a value that is too big to
|
||
represent, the operation is said to @dfn{overflow}. In C, integer
|
||
overflow does not interrupt the control flow or signal an error.
|
||
What it does depends on signedness.
|
||
|
||
For unsigned arithmetic, the result of an operation that overflows is
|
||
the @var{n} low-order bits of the correct value. If the correct value
|
||
is representable in @var{n} bits, that is always the result;
|
||
thus we often say that ``integer arithmetic is exact,'' omitting the
|
||
crucial qualifying phrase ``as long as the exact result is
|
||
representable.''
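
Here is a small sketch of unsigned overflow; the function name
@code{wrap_add} is hypothetical.

@example
#include <limits.h>

/* Add two unsigned values.  If the exact sum is too big,
   the result is its low-order bits.  */
unsigned int
wrap_add (unsigned int a, unsigned int b)
@{
  return a + b;
@}
@end example

@noindent
Thus @code{wrap_add (UINT_MAX, 1)} returns 0, and
@code{wrap_add (UINT_MAX, 2)} returns 1.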
|
||
|
||
In principle, a C program should be written so that overflow never
|
||
occurs for signed integers, but in GNU C you can specify various ways
|
||
of handling such overflow (@pxref{Integer Overflow}).
|
||
|
||
Integer representations are best understood by looking at a table for
|
||
a tiny integer size; here are the possible values for an integer with
|
||
three bits:
|
||
|
||
@multitable @columnfractions .25 .25 .25 .25
|
||
@headitem Unsigned @tab Signed @tab Bits @tab 2s Complement
|
||
@item 0 @tab 0 @tab 000 @tab 000 (0)
|
||
@item 1 @tab 1 @tab 001 @tab 111 (-1)
|
||
@item 2 @tab 2 @tab 010 @tab 110 (-2)
|
||
@item 3 @tab 3 @tab 011 @tab 101 (-3)
|
||
@item 4 @tab -4 @tab 100 @tab 100 (-4)
|
||
@item 5 @tab -3 @tab 101 @tab 011 (3)
|
||
@item 6 @tab -2 @tab 110 @tab 010 (2)
|
||
@item 7 @tab -1 @tab 111 @tab 001 (1)
|
||
@end multitable
|
||
|
||
The parenthesized decimal numbers in the last column represent the
|
||
signed meanings of the two's-complement of the line's value. Recall
|
||
that, in two's-complement encoding, the high-order bit is 0 when
|
||
the number is nonnegative.
|
||
|
||
We can now understand the peculiar behavior of negation of the
|
||
most negative two's-complement integer: start with 0b100,
|
||
invert the bits to get 0b011, and add 1: we get
|
||
0b100, the value we started with.
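
The same steps can be sketched in C for 8-bit patterns.  This example
does the arithmetic on unsigned values, so that no signed overflow
occurs; the function name @code{negate_bits} is hypothetical.

@example
/* Return the two's-complement negation of the 8-bit pattern BITS:
   invert the bits, add 1, and keep the 8 low-order bits.  */
unsigned char
negate_bits (unsigned char bits)
@{
  return (~bits + 1u) & 0xFFu;
@}
@end example

@noindent
Negating the pattern @code{0x80}, which stands for @minus{}128, gives
@code{0x80} back again, while negating @code{0x01} gives @code{0xFF},
which stands for @minus{}1.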
|
||
|
||
We can also see overflow behavior in two's-complement:
|
||
|
||
@example
|
||
3 + 1 = 0b011 + 0b001 = 0b100 = (-4)
|
||
3 + 2 = 0b011 + 0b010 = 0b101 = (-3)
|
||
3 + 3 = 0b011 + 0b011 = 0b110 = (-2)
|
||
@end example
|
||
|
||
@noindent
|
||
A sum of two nonnegative signed values that overflows has a 1 in the
|
||
sign bit, so the exact positive result is truncated to a negative
|
||
value.
|
||
|
||
@c =====================================================================
|
||
|
||
@node Maximum and Minimum Values
|
||
@section Maximum and Minimum Values
|
||
@cindex maximum integer values
|
||
@cindex minimum integer values
|
||
@cindex integer ranges
|
||
@cindex ranges of integer types
|
||
@findex INT_MAX
|
||
@findex UINT_MAX
|
||
@findex SHRT_MAX
|
||
@findex LONG_MAX
|
||
@findex LLONG_MAX
|
||
@findex USHRT_MAX
|
||
@findex ULONG_MAX
|
||
@findex ULLONG_MAX
|
||
@findex CHAR_MAX
|
||
@findex SCHAR_MAX
|
||
@findex UCHAR_MAX
|
||
|
||
For each primitive integer type, there is a standard macro defined in
|
||
@file{limits.h} that gives the largest value that type can hold. For
|
||
instance, for type @code{int}, the maximum value is @code{INT_MAX}.
|
||
On a 32-bit computer, that is equal to 2,147,483,647. The
|
||
maximum value for @code{unsigned int} is @code{UINT_MAX}, which on a
|
||
32-bit computer is equal to 4,294,967,295. Likewise, there are
|
||
@code{SHRT_MAX}, @code{LONG_MAX}, and @code{LLONG_MAX}, and
|
||
corresponding unsigned limits @code{USHRT_MAX}, @code{ULONG_MAX}, and
|
||
@code{ULLONG_MAX}.
|
||
|
||
Since there are three ways to specify a @code{char} type, there are
|
||
also three limits: @code{CHAR_MAX}, @code{SCHAR_MAX}, and
|
||
@code{UCHAR_MAX}.
|
||
|
||
@findex INT_MIN
|
||
For each type that is or might be signed, there is another symbol that
|
||
gives the minimum value it can hold. (Just replace @code{MAX} with
|
||
@code{MIN} in the names listed above.) There is no minimum limit
|
||
symbol for types specified with @code{unsigned} because the
|
||
minimum for them is universally zero.
|
||
|
||
@code{INT_MIN} is not the negative of @code{INT_MAX}. In
|
||
two's-complement representation, the most negative number is 1 less
|
||
than the negative of the most positive number. Thus, @code{INT_MIN}
|
||
on a 32-bit computer has the value @minus{}2,147,483,648. You can't
|
||
actually write the value that way in C, since it would overflow.
|
||
That's a good reason to use @code{INT_MIN} to specify
|
||
that value. Its definition is written to avoid overflow.
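
These limits make it possible to test for overflow before it happens.
The following is a sketch with hypothetical function names; the
expression @code{-INT_MAX - 1} is one overflow-free way to write the
most negative value, as the comment notes.

@example
#include <limits.h>

/* Return 1 if A + B can be computed in type int
   without overflow, otherwise 0.  */
int
add_fits_in_int (int a, int b)
@{
  if (b > 0)
    return a <= INT_MAX - b;
  return a >= INT_MIN - b;
@}

/* Compute the most negative int without overflow.  */
int
most_negative (void)
@{
  return -INT_MAX - 1;
@}
@end example

@noindent
For instance, @code{add_fits_in_int (INT_MAX, 1)} returns 0, and
@code{most_negative ()} returns the same value as @code{INT_MIN}.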
|
||
|
||
@include fp.texi
|
||
|
||
@node Compilation
|
||
@chapter Compilation
|
||
@cindex object file
|
||
@cindex compilation module
|
||
@cindex make rules
|
||
@cindex link
|
||
|
||
Early in the manual we explained how to compile a simple C program
|
||
that consists of a single source file (@pxref{Compile Example}).
|
||
However, we handle only short programs that way. A typical C program
|
||
consists of many source files, each of which is usually a separate
|
||
@dfn{compilation module}---meaning that it has to be compiled
|
||
separately. (The source files that are not separate compilation
|
||
modules are those that are used via @code{#include}; see @ref{Header
|
||
Files}.)
|
||
|
||
To compile a multi-module program, you compile each of the program's
|
||
compilation modules, making an @dfn{object file} for that module. The
|
||
last step is to @dfn{link} the many object files together into a
|
||
single executable for the whole program.
|
||
|
||
For the full details of how to compile C programs (and other
|
||
languages' programs) with GCC, see @ref{Top,,, gcc, Using the GNU
|
||
Compiler Collection}. On the Web, it is all available at
|
||
@url{https://gcc.gnu.org/onlinedocs/}. Here we give only a simple
|
||
introduction.
|
||
|
||
These commands compile two compilation modules, @file{foo.c} and
|
||
@file{bar.c}, running the compiler for each module:
|
||
|
||
@example
|
||
gcc -c -O -g foo.c
|
||
gcc -c -O -g bar.c
|
||
@end example
|
||
|
||
@noindent
|
||
In these commands, @option{-g} says to generate debugging information,
|
||
@option{-O} says to do some optimization, and @option{-c} says to put
|
||
the compiled code for that module into a corresponding object file and
|
||
go no further. The object file for @file{foo.c} is automatically
|
||
called @file{foo.o}, and so on.
|
||
|
||
If you wish, you can specify additional compilation options. For
|
||
instance, @option{-Wformat -Wparentheses -Wstrict-prototypes} request
|
||
additional warnings.
|
||
|
||
@cindex linking object files
|
||
After you compile all the program's modules, you link the object files
|
||
into a combined executable, like this:
|
||
|
||
@example
|
||
gcc -o foo foo.o bar.o
|
||
@end example
|
||
|
||
@noindent
|
||
In this command, @option{-o foo} specifies the file name for the
|
||
executable file, and the other arguments are the object files to link.
|
||
Always specify the executable file name in a command that generates
|
||
one.
|
||
|
||
One reason to divide a large program into multiple compilation modules
|
||
is to control how each module can access the internals of the others.
|
||
When a module declares a function or variable @code{extern}, other
|
||
modules can access it. The other functions and variables defined in a
|
||
module can't be accessed from outside that module.
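
As a sketch, here are the relevant parts of two modules shown in one
listing.  The file names follow the example above; the function and
variable names are hypothetical.  The variable @code{counter} is
declared @code{static}, which keeps it private to its module.

@example
/* ---- bar.c ---- */
static int counter;          /* Visible only inside bar.c.  */

int
next_id (void)               /* Callable from other modules.  */
@{
  return ++counter;
@}

/* ---- foo.c ---- */
extern int next_id (void);   /* Declares the function in bar.c.  */

int
first_two_ids (void)
@{
  int a = next_id ();
  int b = next_id ();
  return a * 10 + b;
@}
@end example

@noindent
Code in @file{foo.c} can call @code{next_id}, but it has no way to
refer to @code{counter} directly.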
|
||
|
||
The other reason for using multiple modules is so that changing one
|
||
source file does not require recompiling all of them in order to try
|
||
the modified program. It is sufficient to recompile the source file
|
||
that you changed, then link them all again. Dividing a large program
|
||
into many substantial modules in this way typically makes
|
||
recompilation much faster.
|
||
|
||
Normally we don't run any of these commands directly. Instead we
|
||
write a set of @dfn{make rules} for the program, then use the
|
||
@command{make} program to recompile only the source files that need to
|
||
be recompiled, by following those rules. @xref{Top, The GNU Make
|
||
Manual, , make, The GNU Make Manual}.
|
||
|
||
@node Directing Compilation
|
||
@chapter Directing Compilation
|
||
|
||
This chapter describes C constructs that don't alter the program's
|
||
meaning @emph{as such}, but rather direct the compiler how to treat
|
||
some aspects of the program.
|
||
|
||
@menu
|
||
* Pragmas:: Controlling compilation of some constructs.
|
||
* Static Assertions:: Compile-time tests for conditions.
|
||
@end menu
|
||
|
||
@node Pragmas
|
||
@section Pragmas
|
||
|
||
A @dfn{pragma} is an annotation in a program that gives direction to
|
||
the compiler.
|
||
|
||
@menu
|
||
* Pragma Basics:: Pragma syntax and usage.
|
||
* Severity Pragmas:: Settings for compile-time pragma output.
|
||
* Optimization Pragmas:: Controlling optimizations.
|
||
@end menu
|
||
|
||
@c See also @ref{Macro Pragmas}, which save and restore macro definitions.
|
||
|
||
@node Pragma Basics
|
||
@subsection Pragma Basics
|
||
|
||
C defines two syntactical forms for pragmas, the line form and the
|
||
token form. You can write any pragma in either form, with the same
|
||
meaning.
|
||
|
||
The line form is a line in the source code, like this:
|
||
|
||
@example
|
||
#pragma @var{line}
|
||
@end example
|
||
|
||
@noindent
|
||
The line pragma has no effect on the parsing of the lines around it.
|
||
This form has the drawback that it can't be generated by a macro expansion.
|
||
|
||
The token form is a series of tokens; it can appear anywhere in the
|
||
program between the other tokens.
|
||
|
||
@example
|
||
_Pragma (@var{stringconstant})
|
||
@end example
|
||
|
||
@noindent
|
||
The pragma has no effect on the syntax of the tokens that surround it;
|
||
thus, here's a pragma in the middle of an @code{if} statement:
|
||
|
||
@example
|
||
if _Pragma ("hello") (x > 1)
|
||
@end example
|
||
|
||
@noindent
|
||
However, that's an unclear thing to do; for the sake of
|
||
understandability, it is better to put a pragma on a line by itself
|
||
and not embedded in the middle of another construct.
|
||
|
||
Both forms of pragma have a textual argument. In a line pragma, the
|
||
text is the rest of the line. The textual argument to @code{_Pragma}
|
||
uses the same syntax as a C string constant: surround the text with
|
||
two @samp{"} characters, and add a backslash before each @samp{"} or
|
||
@samp{\} character in it.
|
||
|
||
With either syntax, the textual argument specifies what to do.
|
||
It begins with one or several words that specify the operation.
|
||
If the compiler does not recognize them, it ignores the pragma.
|
||
|
||
Here are the pragma operations supported in GNU C@.
|
||
|
||
@c ??? Verify font for []
|
||
@table @code
|
||
@item #pragma GCC dependency "@var{file}" [@var{message}]
|
||
@itemx _Pragma ("GCC dependency \"@var{file}\" [@var{message}]")
|
||
Declares that the current source file depends on @var{file}, so GNU C
|
||
compares the file times and gives a warning if @var{file} is newer
|
||
than the current source file.
|
||
|
||
This directive searches for @var{file} the way @code{#include}
|
||
searches for a non-system header file.
|
||
|
||
If @var{message} is given, the warning message includes that text.
|
||
|
||
Examples:
|
||
|
||
@example
|
||
#pragma GCC dependency "parse.y"
|
||
_Pragma ("GCC dependency \"/usr/include/time.h\" \
|
||
rerun fixincludes")
|
||
@end example
|
||
|
||
@item #pragma GCC poison @var{identifiers}
|
||
@itemx _Pragma ("GCC poison @var{identifiers}")
|
||
Poisons the identifiers listed in @var{identifiers}.
|
||
|
||
This is useful to make sure all mention of @var{identifiers} has been
|
||
deleted from the program and that no reference to them creeps back in.
|
||
If any of those identifiers appears anywhere in the source after the
|
||
directive, it causes a compilation error. For example,
|
||
|
||
@example
|
||
#pragma GCC poison printf sprintf fprintf
|
||
sprintf(some_string, "hello");
|
||
@end example
|
||
|
||
@noindent
|
||
generates an error.
|
||
|
||
If a poisoned identifier appears as part of the expansion of a macro
|
||
that was defined before the identifier was poisoned, it will @emph{not}
|
||
cause an error. Thus, system headers that define macros that use
|
||
the identifier will not cause errors.
|
||
|
||
For example,
|
||
|
||
@example
|
||
#define strrchr rindex
|
||
_Pragma ("GCC poison rindex")
|
||
strrchr(some_string, 'h');
|
||
@end example
|
||
|
||
@noindent
|
||
does not cause a compilation error.
|
||
|
||
@item #pragma GCC system_header
|
||
@itemx _Pragma ("GCC system_header")
|
||
Specify treating the rest of the current source file as if it came
|
||
from a system header file. @xref{System Headers, System Headers,
|
||
System Headers, gcc, Using the GNU Compiler Collection}.
|
||
|
||
@item #pragma GCC warning @var{message}
|
||
@itemx _Pragma ("GCC warning @var{message}")
|
||
Equivalent to @code{#warning}. Its advantage is that the
|
||
@code{_Pragma} form can be included in a macro definition.
|
||
|
||
@item #pragma GCC error @var{message}
|
||
@itemx _Pragma ("GCC error @var{message}")
|
||
Equivalent to @code{#error}. Its advantage is that the
|
||
@code{_Pragma} form can be included in a macro definition.
|
||
|
||
@item #pragma message @var{message}
|
||
@itemx _Pragma ("message @var{message}")
|
||
Similar to @samp{GCC warning} and @samp{GCC error}, this simply prints an
|
||
informational message, and could be used to include additional warning
|
||
or error text without triggering more warnings or errors. (Note that
|
||
unlike @samp{GCC warning} and @samp{GCC error}, @samp{message} does not include
|
||
@samp{GCC} as part of the pragma.)
|
||
@end table
|
||
|
||
@node Severity Pragmas
|
||
@subsection Severity Pragmas
|
||
|
||
These pragmas control the severity of classes of diagnostics.
|
||
You can specify the class of diagnostic with the GCC option that causes
|
||
those diagnostics to be generated.
|
||
|
||
@table @code
|
||
@item #pragma GCC diagnostic error @var{option}
|
||
@itemx _Pragma ("GCC diagnostic error @var{option}")
|
||
For code following this pragma, treat diagnostics of the variety
|
||
specified by @var{option} as errors. For example:
|
||
|
||
@example
|
||
_Pragma ("GCC diagnostic error \"-Wformat\"")
|
||
@end example
|
||
|
||
@noindent
|
||
specifies to treat diagnostics enabled by the @option{-Wformat} option
|
||
as errors rather than warnings.
|
||
|
||
@item #pragma GCC diagnostic warning @var{option}
|
||
@itemx _Pragma ("GCC diagnostic warning @var{option}")
|
||
For code following this pragma, treat diagnostics of the variety
|
||
specified by @var{option} as warnings. This overrides the
|
||
@option{-Werror} option, which says to treat warnings as errors.
|
||
|
||
@item #pragma GCC diagnostic ignored @var{option}
@itemx _Pragma ("GCC diagnostic ignored @var{option}")
|
||
For code following this pragma, refrain from reporting any diagnostics
|
||
of the variety specified by @var{option}.
|
||
|
||
@item #pragma GCC diagnostic push
|
||
@itemx _Pragma ("GCC diagnostic push")
|
||
@itemx #pragma GCC diagnostic pop
|
||
@itemx _Pragma ("GCC diagnostic pop")
|
||
These pragmas maintain a stack of states for severity settings.
|
||
@samp{GCC diagnostic push} saves the current settings on the stack,
|
||
and @samp{GCC diagnostic pop} pops the last stack item and restores
|
||
the current settings from that.
|
||
|
||
@samp{GCC diagnostic pop} when the severity setting stack is empty
|
||
restores the settings to what they were at the start of compilation.
|
||
|
||
Here is an example:
|
||
|
||
@example
|
||
_Pragma ("GCC diagnostic error \"-Wformat\"")
|
||
|
||
/* @r{@option{-Wformat} messages treated as errors. } */
|
||
|
||
_Pragma ("GCC diagnostic push")
|
||
_Pragma ("GCC diagnostic warning \"-Wformat\"")
|
||
|
||
/* @r{@option{-Wformat} messages treated as warnings. } */
|
||
|
||
_Pragma ("GCC diagnostic push")
|
||
_Pragma ("GCC diagnostic ignored \"-Wformat\"")
|
||
|
||
/* @r{@option{-Wformat} messages suppressed. } */
|
||
|
||
_Pragma ("GCC diagnostic pop")
|
||
|
||
/* @r{@option{-Wformat} messages treated as warnings again. } */
|
||
|
||
_Pragma ("GCC diagnostic pop")
|
||
|
||
/* @r{@option{-Wformat} messages treated as errors again. } */
|
||
|
||
/* @r{This is an excess @samp{pop} that matches no @samp{push}. } */
|
||
_Pragma ("GCC diagnostic pop")
|
||
|
||
/* @r{@option{-Wformat} messages treated once again}
|
||
@r{as specified by the GCC command-line options.} */
|
||
@end example
|
||
@end table
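
A common use of this stack is to suppress one kind of warning for a
single definition.  The following sketch uses the line form of the
pragmas; the option @option{-Wunused-variable} and the function
@code{answer} are chosen only for illustration.

@example
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-variable"

int
answer (void)
@{
  int unused;     /* No warning is reported for this variable.  */
  return 42;
@}

#pragma GCC diagnostic pop
/* Unused variables after this point are reported as before.  */
@end example

@noindent
Pairing each @samp{push} with a @samp{pop} keeps the change from
affecting the rest of the source file.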
|
||
|
||
@node Optimization Pragmas
|
||
@subsection Optimization Pragmas
|
||
|
||
These pragmas enable a particular optimization for specific function
|
||
definitions. The settings take effect at the end of a function
|
||
definition, so the clean place to use these pragmas is between
|
||
function definitions.
|
||
|
||
@table @code
|
||
@item #pragma GCC optimize @var{optimization}
|
||
@itemx _Pragma ("GCC optimize @var{optimization}")
|
||
These pragmas enable the optimization @var{optimization} for the
|
||
following functions. For example,
|
||
|
||
@example
|
||
_Pragma ("GCC optimize \"-fforward-propagate\"")
|
||
@end example
|
||
|
||
@noindent
|
||
says to apply the @samp{forward-propagate} optimization to all
|
||
following function definitions. Specifying optimizations for
|
||
individual functions, rather than for the entire program, is rare but
|
||
can be useful for getting around a bug in the compiler.
|
||
|
||
If @var{optimization} does not correspond to a defined optimization
|
||
option, the pragma is erroneous. To turn off an optimization, use the
|
||
corresponding @samp{-fno-} option, such as
|
||
@samp{-fno-forward-propagate}.
|
||
|
||
@item #pragma GCC target @var{optimizations}
|
||
@itemx _Pragma ("GCC target @var{optimizations}")
|
||
The pragma @samp{GCC target} is similar to @samp{GCC optimize} but is
|
||
used for platform-specific optimizations. Thus,
|
||
|
||
@example
|
||
_Pragma ("GCC target \"popcnt\"")
|
||
@end example
|
||
|
||
@noindent
|
||
activates the optimization @samp{popcnt} for all
|
||
following function definitions. This optimization is supported
|
||
on a few common targets but not on others.
|
||
|
||
@item #pragma GCC push_options
|
||
@itemx _Pragma ("GCC push_options")
|
||
The @samp{push_options} pragma saves on a stack the current settings
|
||
specified with the @samp{target} and @samp{optimize} pragmas.
|
||
|
||
@item #pragma GCC pop_options
|
||
@itemx _Pragma ("GCC pop_options")
|
||
The @samp{pop_options} pragma pops saved settings from that stack.
|
||
|
||
Here's an example of using this stack.
|
||
|
||
@example
|
||
_Pragma ("GCC push_options")
|
||
_Pragma ("GCC optimize \"forward-propagate\"")
|
||
|
||
/* @r{Functions to compile}
|
||
@r{with the @code{forward-propagate} optimization.} */
|
||
|
||
_Pragma ("GCC pop_options")
|
||
/* @r{Ends enablement of @code{forward-propagate}.} */
|
||
@end example
|
||
|
||
@item #pragma GCC reset_options
|
||
@itemx _Pragma ("GCC reset_options")
|
||
Clears all pragma-defined @samp{target} and @samp{optimize}
|
||
optimization settings.
|
||
@end table
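
Here is a sketch of the same stack written with the line form, placed
around a single function definition.  It assumes GCC, where the string
@code{"O0"} selects the optimization level of the @option{-O0} option;
the function @code{sum_to} is hypothetical.

@example
#pragma GCC push_options
#pragma GCC optimize ("O0")

/* Compiled without optimization, whatever -O option was given.  */
int
sum_to (int n)
@{
  int total = 0;

  for (int i = 1; i <= n; i++)
    total += i;
  return total;
@}

#pragma GCC pop_options
/* Functions defined below use the command-line settings again.  */
@end example

@noindent
The pragmas change only how the function is compiled, not what it
computes: @code{sum_to (4)} returns 10 either way.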
|
||
|
||
@node Static Assertions
|
||
@section Static Assertions
|
||
@cindex static assertions
|
||
@findex _Static_assert
|
||
|
||
You can add compile-time tests for necessary conditions into your
|
||
code using @code{_Static_assert}. This can be useful, for example, to
|
||
check that the compilation target platform supports the type sizes
|
||
that the code expects. For example,
|
||
|
||
@example
|
||
_Static_assert ((sizeof (long int) >= 8),
|
||
"long int needs to be at least 8 bytes");
|
||
@end example
|
||
|
||
@noindent
|
||
reports a compile-time error if compiled on a system with long
|
||
integers smaller than 8 bytes, with @samp{long int needs to be at
|
||
least 8 bytes} as the error message.
|
||
|
||
Since calls to @code{_Static_assert} are processed at compile time, the
|
||
expression must be computable at compile time and the error message
|
||
must be a literal string. The expression can refer to the sizes of
|
||
variables, but can't refer to their values. For example, the
|
||
following static assertion is invalid for two reasons:
|
||
|
||
@example
|
||
char *error_message
|
||
= "long int needs to be at least 8 bytes";
|
||
int size_of_long_int = sizeof (long int);
|
||
|
||
_Static_assert (size_of_long_int == 8, error_message);
|
||
@end example
|
||
|
||
@noindent
|
||
The expression @code{size_of_long_int == 8} isn't computable at
|
||
compile time, and the error message isn't a literal string.
|
||
|
||
You can, though, use preprocessor definition values with
|
||
@code{_Static_assert}:
|
||
|
||
@example
|
||
#define LONG_INT_ERROR_MESSAGE "long int needs to be \
|
||
at least 8 bytes"
|
||
|
||
_Static_assert ((sizeof (long int) == 8),
|
||
LONG_INT_ERROR_MESSAGE);
|
||
@end example
|
||
|
||
@strong{Note:} The @code{==} used here instead of @code{>=} probably
makes the program wrong but not invalid.

Static assertions are permitted wherever a statement or declaration is
permitted, including at top level in the file, and also inside the
definition of a type. For example:
|
||
|
||
@example
|
||
union y
|
||
@{
|
||
int i;
|
||
int *ptr;
|
||
_Static_assert (sizeof (int *) == sizeof (int),
|
||
"Pointer and int not same size");
|
||
@};
|
||
@end example
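
Static assertions at top level are a convenient way to record the
assumptions that a source file makes about its data layout.  The
following is a sketch; the structure @code{packet_header} is
hypothetical, and @code{CHAR_BIT} comes from @file{limits.h}.

@example
#include <limits.h>

_Static_assert (CHAR_BIT == 8, "char needs to have 8 bits");

struct packet_header
@{
  unsigned char tag;
  unsigned char length;
@};

_Static_assert (sizeof (struct packet_header) == 2,
                "packet_header needs to occupy 2 bytes");
@end example

@noindent
If either condition is false on the target platform, compilation
stops with the corresponding message.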
|
||
|
||
@node Type Alignment
@appendix Type Alignment
@cindex type alignment
@cindex alignment of type
@findex _Alignof
@findex __alignof__

Code for device drivers and other communication with low-level
hardware sometimes needs to be concerned with the alignment of
data objects in memory.

Each data type has a required @dfn{alignment}, always a power of 2,
that says at which memory addresses an object of that type can validly
start. A valid address for the type must be a multiple of its
alignment. If a type's alignment is 1, that means it can validly
start at any address. If a type's alignment is 2, that means it can
only start at an even address. If a type's alignment is 4, that means
it can only start at an address that is a multiple of 4.

The alignment of a type (except @code{char}) can vary depending on the
kind of computer in use. To refer to the alignment of a type in a C
program, use @code{_Alignof}, whose syntax parallels that of
@code{sizeof}. Like @code{sizeof}, @code{_Alignof} is a compile-time
operation, and it doesn't compute the value of the expression used
as its argument.

Nominally, each integer and floating-point type has an alignment equal to
the largest power of 2 that divides its size. Thus, @code{int} with
size 4 has a nominal alignment of 4, and @code{long long int} with
size 8 has a nominal alignment of 8.

However, each kind of computer generally has a maximum alignment, and
no type needs more alignment than that. If the computer's maximum
alignment is 4 (which is common), then no type's alignment is more
than 4.

The size of any type is always a multiple of its alignment; that way,
in an array whose elements have that type, all the elements are
properly aligned if the first one is.

These rules apply to all real computers today, but some embedded
controllers have odd exceptions. We don't have references to cite for
them.
@c We can't cite a nonfree manual as documentation.

Ordinary C code guarantees that every object of a given type is in
fact aligned as that type requires.

If the operand of @code{_Alignof} is a structure field, the value
is the alignment it requires. It may have a greater alignment by
coincidence, due to the other fields, but @code{_Alignof} is not
concerned about that. @xref{Structures}.

Older versions of GNU C used the keyword @code{__alignof__} for this,
but now that the feature has been standardized, it is better
to use the standard keyword @code{_Alignof}.

@findex _Alignas
@findex __aligned__
You can explicitly specify an alignment requirement for a particular
variable or structure field by adding @code{_Alignas
(@var{alignment})} to the declaration, where @var{alignment} is a
power of 2 or a type name. For instance:
@example
char _Alignas (8) x;
@end example

@noindent
or

@example
char _Alignas (double) x;
@end example

@noindent
specifies that @code{x} must start on an address that is a multiple of
8 (assuming, in the second case, that @code{double} has alignment 8).
However, if @var{alignment} exceeds the maximum alignment for the
machine, that maximum is how much alignment @code{x} will get.

The older GNU C syntax for this feature was to add
@code{__attribute__ ((__aligned__ (@var{alignment})))} to the
declaration, after the variable name (@pxref{Attributes}). For
instance:

@example
char x __attribute__ ((__aligned__ (8)));
@end example
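The rules in this appendix can be checked in a program. Here is a
minimal sketch that verifies two of them at compile time and one at
run time. The structure name @code{sample} and the function
@code{is_aligned_to_8} are hypothetical, and we assume that an
alignment of 8 does not exceed the machine's maximum and that
converting a pointer to @code{uintptr_t} yields its address, as it
does on typical platforms.

```c
#include <stdint.h>

/* The size of a type is always a multiple of its alignment.  */
_Static_assert (sizeof (double) % _Alignof (double) == 0,
                "size must be a multiple of alignment");

/* char can start at any address.  */
_Static_assert (_Alignof (char) == 1, "char has alignment 1");

/* Hypothetical structure with one over-aligned field.  */
struct sample
{
  char c;
  _Alignas (8) char buf[3];
};

/* Return 1 if the field buf of *s starts at a multiple of 8.  */
int
is_aligned_to_8 (const struct sample *s)
{
  return (uintptr_t) s->buf % 8 == 0;
}
```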
@node Aliasing
@appendix Aliasing
@cindex aliasing (of storage)
@cindex pointer type conversion
@cindex type conversion, pointer

We have already presented examples of casting a @code{void *} pointer
to another pointer type, and casting another pointer type to
@code{void *}.

One common kind of pointer cast is guaranteed safe: casting the value
returned by @code{malloc} and related functions (@pxref{Dynamic Memory
Allocation}). It is safe because these functions do not save the
pointer anywhere else; the only way the program will access the newly
allocated memory is via the pointer just returned.

In fact, C allows casting any pointer type to any other pointer type.
Using this to access the same place in memory using two
different data types is called @dfn{aliasing}.

Aliasing is necessary in some programs that do sophisticated memory
management, such as GNU Emacs, but most C programs don't need to do
aliasing. When it isn't needed, @strong{stay away from it!} To do
aliasing correctly requires following the rules stated below.
Otherwise, the aliasing may result in malfunctions when the program
runs.

The rest of this appendix explains the pitfalls and rules of aliasing.

@menu
* Aliasing Alignment::          Memory alignment considerations for
                                  casting between pointer types.
* Aliasing Length::             Type size considerations for
                                  casting between pointer types.
* Aliasing Type Rules::         Even when type alignment and size match,
                                  aliasing can still have surprising results.

@end menu
@node Aliasing Alignment
@appendixsection Aliasing and Alignment

In order for a type-converted pointer to be valid, it must have the
alignment that the new pointer type requires. For instance, on most
computers, @code{int} has alignment 4; the address of an @code{int}
must be a multiple of 4. However, @code{char} has alignment 1, so the
address of a @code{char} is usually not a multiple of 4. Taking the
address of such a @code{char} and casting it to @code{int *} probably
results in an invalid pointer. Trying to dereference it may cause a
@code{SIGBUS} signal, depending on the platform in use (@pxref{Signals}).

@example
int
foo (void)
@{
  char i[4];
  int *p = (int *) &i[1]; /* @r{Misaligned pointer!} */
  return *p;              /* @r{Crash!} */
@}
@end example
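When the bytes of an @code{int} really do sit at an address that may
be misaligned, one common way to read them without creating a
misaligned pointer is to copy them into a properly aligned object with
@code{memcpy}. The following is a sketch of that approach, not the
only possible one; the function name @code{load_int} is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: fetch an int whose bytes start at SRC,
   which need not be aligned for int.  */
int
load_int (const char *src)
{
  int value;
  /* memcpy accesses the bytes as characters, so it imposes
     no alignment requirement on SRC.  */
  memcpy (&value, src, sizeof value);
  return value;
}
```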
The alignment requirement is never a problem when casting the return
value of @code{malloc}, because that function always returns a pointer
with as much alignment as any type can require.
@node Aliasing Length
@appendixsection Aliasing and Length

When converting a pointer to a different pointer type, make sure the
object it really points to is at least as long as the target of the
converted pointer. For instance, suppose @code{p} has type @code{int
*} and it's cast as follows:

@example
int *p;

struct foo
@{
  double d, e, f;
@};

struct foo *q = (struct foo *)p;

q->f = 5.14159;
@end example

@noindent
Then @code{q->f} lies beyond the end of the @code{int} that
@code{p} points to. If @code{p} was initialized to the start of an
array of type @code{int[6]}, the object is long enough for three
@code{double}s (assuming 4-byte @code{int} and 8-byte @code{double}).
But if @code{p} points to something shorter,
@code{q->f} will run on beyond the end of that, overlaying some other
data. Storing into it will garble that other data. Or it could extend
past the end of memory space and cause a @code{SIGSEGV} signal
(@pxref{Signals}).
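
One way to respect this rule is to compare the size of the underlying
object with the size of the target type before storing through the
converted pointer. Here is a minimal sketch using memory from
@code{malloc}, which also satisfies the alignment requirement; the
function name @code{make_block} is hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

struct foo
{
  double d, e, f;
};

/* Hypothetical helper: allocate SIZE bytes and, only if that is
   enough to hold a struct foo, store into its last field.
   Returns the block, or a null pointer if allocation fails.  */
void *
make_block (size_t size)
{
  void *block = malloc (size);
  if (block == NULL)
    return NULL;
  if (size >= sizeof (struct foo))
    {
      /* The object is long enough, so q->f stays inside it.  */
      struct foo *q = block;
      q->f = 5.14159;
    }
  return block;
}
```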
@node Aliasing Type Rules
@appendixsection Type Rules for Aliasing

C code that converts a pointer to a different pointer type can use the
pointers to access the same memory locations with two different data
types. If the same address is accessed with different types in a
single control thread, optimization can make the code do surprising
things (in effect, make it malfunction).

Here's a concrete example where aliasing can change the code's
behavior when it is optimized. We assume that @code{float} is 4 bytes
long, like @code{int}, and so is every pointer. Thus, the structures
@code{struct a} and @code{struct b} are both 8 bytes.
@example
#include <stdio.h>
struct a @{ int size; char *data; @};
struct b @{ float size; char *data; @};

void sub (struct a *p, struct b *q)
@{
  int x;
  p->size = 0;
  q->size = 1;
  x = p->size;
  printf("x       =%d\n", x);
  printf("p->size =%d\n", (int)p->size);
  printf("q->size =%d\n", (int)q->size);
@}

int main(void)
@{
  struct a foo;
  struct a *p = &foo;
  struct b *q = (struct b *) &foo;

  sub (p, q);
@}
@end example
This code works as intended when compiled without optimization. All
the operations are carried out sequentially as written. The code
sets @code{x} to @code{p->size}, but what it actually gets is the
bits of the floating point number 1, as type @code{int}.

However, when optimizing, the compiler is allowed to assume
(mistakenly, here) that @code{q} does not point to the same storage as
@code{p}, because their data types are not allowed to alias.

From this assumption, the compiler can deduce (falsely, here) that the
assignment into @code{q->size} has no effect on the value of
@code{p->size}, which must therefore still be 0. Thus, @code{x} will
be set to 0.

GNU C, following the C standard, @emph{defines} this optimization as
legitimate. Code that misbehaves when optimized following these rules
is, by definition, incorrect C code.

The rules for storage aliasing in C are based on the two data types:
the type of the object, and the type it is accessed through. The
rules permit accessing part of a storage object of type @var{t} using
only these types:
@itemize @bullet
@item
@var{t}.

@item
A type compatible with @var{t}. @xref{Compatible Types}.

@item
A signed or unsigned version of one of the above.

@item
A qualified version of one of the above.
@xref{Type Qualifiers}.

@item
An array, structure (@pxref{Structures}), or union type
(@pxref{Unions}) that contains one of the above, either directly as a
field or through multiple levels of fields. If @var{t} is
@code{double}, this would include @code{struct s @{ union @{ double
d[2]; int i[4]; @} u; int i; @};} because there's a @code{double}
inside it somewhere.
@c For structures, shouldn't it be the first field?

@item
A character type.
@end itemize
What do these rules say about the example in this subsection?

For @code{foo.size} (equivalently, @code{p->size}), @var{t} is
@code{int}. The type @code{float} is not allowed as an aliasing type
by those rules, so @code{q->size} is not supposed to alias with
fields of @code{struct a}. Based on that assumption, GNU C makes a
permitted optimization that was not, in this case, consistent with
what the programmer intended the program to do.

Whether GCC actually performs type-based aliasing analysis depends on
the details of the code. GCC has other ways to determine (in some cases)
whether objects alias, and if it gets a reliable answer that way, it won't
fall back on type-based heuristics.

@c @opindex -fno-strict-aliasing
The point of knowing the type-based aliasing rules is not to ensure
that the optimization is done where it would be safe, but to ensure
that it is @emph{not} done in a way that would break the
program. You can turn off type-based aliasing analysis by giving GCC
the option @option{-fno-strict-aliasing}.
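
To inspect the bits of a @code{float} as an integer without violating
these rules, one option is to copy the bytes with @code{memcpy}, which
accesses the storage as characters. The following is a sketch under
stated assumptions: @code{float} and @code{uint32_t} are both 4 bytes,
and @code{float} uses the IEEE 754 single-precision format. The
function name @code{float_bits} is hypothetical.

```c
#include <stdint.h>
#include <string.h>

_Static_assert (sizeof (float) == sizeof (uint32_t),
                "this sketch assumes a 4-byte float");

/* Hypothetical helper: return the bit pattern of F.  */
uint32_t
float_bits (float f)
{
  uint32_t bits;
  /* Copying bytes is a character-type access, which the
     aliasing rules always permit.  */
  memcpy (&bits, &f, sizeof bits);
  return bits;
}
```

On a system that meets those assumptions, @code{float_bits (1.0f)}
returns @code{0x3f800000}.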
@node Digraphs
@appendix Digraphs
@cindex digraphs

C accepts two-character aliases, called @dfn{digraphs}, for certain
characters. Apparently in the 1990s
some computer systems had trouble inputting these characters, or
trouble displaying them. These digraphs almost never appear in C
programs nowadays, but we mention them for completeness.

@table @samp
@item <:
An alias for @samp{[}.
@item :>
An alias for @samp{]}.
@item <%
An alias for @samp{@{}.
@item %>
An alias for @samp{@}}.
@item %:
An alias for @samp{#},
used for preprocessing directives (@pxref{Directives}) and
macros (@pxref{Macros}).
@end table
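For illustration only, here is a small function written with all five
digraphs. We assume the compiler is in a mode that recognizes
digraphs, which we believe is the case for GCC by default; the names
@code{COUNT} and @code{sum_digraph_array} are hypothetical.

```c
/* Each digraph below stands for one of the characters
   [ ] { } and #.  */
%:define COUNT 3

int
sum_digraph_array (void)
<%
  int a<:COUNT:> = <% 1, 2, 3 %>;
  int total = 0;
  for (int i = 0; i < COUNT; i++)
    total += a<:i:>;
  return total;
%>
```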
@node Attributes
@appendix Attributes in Declarations
@cindex attributes
@findex __attribute__

You can specify certain additional requirements in a declaration, to
get fine-grained control over code generation, and helpful
informational messages during compilation. We use a few attributes in
code examples throughout this manual, including:
@table @code
@item aligned
The @code{aligned} attribute specifies a minimum alignment for a
variable or structure field, measured in bytes:

@example
int foo __attribute__ ((aligned (8))) = 0;
@end example

@noindent
This directs GNU C to allocate @code{foo} at an address that is a
multiple of 8 bytes. However, you can't force an alignment bigger
than the computer's maximum meaningful alignment.
@item packed
The @code{packed} attribute specifies to compact the fields of a
structure by not leaving gaps between fields. For example,

@example
struct __attribute__ ((packed)) bar
@{
  char a;
  int b;
@};
@end example

@noindent
allocates the integer field @code{b} at byte 1 in the structure,
immediately after the character field @code{a}. The packed structure
is just 5 bytes long (assuming @code{int} is 4 bytes) and its
alignment is 1, that of @code{char}.
@item deprecated
Applicable to both variables and functions, the @code{deprecated}
attribute tells the compiler to issue a warning if the variable or
function is ever used in the source file.

@example
int old_foo __attribute__ ((deprecated));

int old_quux () __attribute__ ((deprecated));
@end example
@item __noinline__
The @code{__noinline__} attribute, in a function's declaration or
definition, specifies never to inline calls to that function. All
calls to that function, in a compilation unit where it has this
attribute, will be compiled to invoke the separately compiled
function. @xref{Inline Function Definitions}.

@item __noclone__
The @code{__noclone__} attribute, in a function's declaration or
definition, specifies never to clone that function. Thus, there will
be only one compiled version of the function. @xref{Label Value
Caveats}, for more information about cloning.

@item always_inline
The @code{always_inline} attribute, in a function's declaration or
definition, specifies to inline all calls to that function (unless
something about the function makes inlining impossible). This applies
to all calls to that function in a compilation unit where it has this
attribute. @xref{Inline Function Definitions}.

@item gnu_inline
The @code{gnu_inline} attribute, in a function's declaration or
definition, specifies to handle the @code{inline} keyword the way GNU
C originally implemented it, many years before ISO C said anything
about inlining. @xref{Inline Function Definitions}.
@end table
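The layout claims made above for @code{packed} and @code{aligned} can
be checked by the compiler itself. The following sketch assumes GNU C
(the attributes are extensions) and a machine whose maximum alignment
is at least 8; the function name @code{foo_misalignment} is
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* The packed structure described under the packed attribute.  */
struct __attribute__ ((packed)) bar
{
  char a;
  int b;
};

/* No gap is left after a, so b starts at byte 1.  */
_Static_assert (offsetof (struct bar, b) == 1,
                "b directly follows a");
_Static_assert (sizeof (struct bar) == 1 + sizeof (int),
                "no padding in packed structure");
_Static_assert (_Alignof (struct bar) == 1,
                "packed structure has alignment 1");

/* The aligned attribute raises the alignment of one variable.  */
int foo __attribute__ ((aligned (8))) = 0;

/* Hypothetical helper: the address of foo modulo 8,
   which should be 0.  */
unsigned
foo_misalignment (void)
{
  return (unsigned) ((uintptr_t) &foo % 8);
}
```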
For full documentation of attributes, see the GCC manual.
@xref{Attribute Syntax, Attribute Syntax, System Headers, gcc, Using
the GNU Compiler Collection}.
@node Signals
@appendix Signals
@cindex signal
@cindex handler (for signal)
@cindex @code{SIGSEGV}
@cindex @code{SIGFPE}
@cindex @code{SIGBUS}

Some program operations bring about an error condition called a
@dfn{signal}. These signals terminate the program, by default.

There are various different kinds of signals, each with a name. We
have seen several such error conditions throughout this manual:
@table @code
@item SIGSEGV
This signal is generated when a program tries to read or write outside
the memory that is allocated for it, or to write memory that can only
be read. The name is an abbreviation for ``segmentation violation''.

@item SIGFPE
This signal indicates a fatal arithmetic error. The name is an
abbreviation for ``floating-point exception'', but covers all types of
arithmetic errors, including division by zero and overflow.

@item SIGBUS
This signal is generated when an invalid pointer is dereferenced,
typically the result of dereferencing an uninitialized pointer. It is
similar to @code{SIGSEGV}, except that @code{SIGSEGV} indicates
invalid access to valid memory, while @code{SIGBUS} indicates an
attempt to access an invalid address.
@end table
These kinds of signal allow the program to specify a function as a
@dfn{signal handler}. When a signal has a handler, it doesn't
terminate the program; instead it calls the handler.

There are many other kinds of signal; here we list only those that
come from run-time errors in C operations. The rest have to do with
the functioning of the operating system. The GNU C Library Reference
Manual gives more explanation about signals (@pxref{Program Signal
Handling, The GNU C Library, , libc, The GNU C Library Reference
Manual}).
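
As a minimal sketch of the handler mechanism, the following code
installs a handler for @code{SIGFPE} with the standard library
function @code{signal} and then delivers that signal artificially with
@code{raise}, so that no real arithmetic error is involved. The names
@code{last_signal}, @code{record_signal} and @code{catch_sigfpe} are
hypothetical.

```c
#include <signal.h>

/* Set by the handler.  volatile sig_atomic_t is the type that
   can safely be written from a signal handler.  */
static volatile sig_atomic_t last_signal = 0;

/* Hypothetical handler: just record which signal arrived.
   Returning is fine here because the signal comes from raise;
   returning after a real arithmetic fault would not be.  */
static void
record_signal (int signum)
{
  last_signal = signum;
}

/* Install the handler for SIGFPE, deliver that signal with
   raise, and return the signal number the handler recorded,
   or -1 if the handler could not be installed.  */
int
catch_sigfpe (void)
{
  if (signal (SIGFPE, record_signal) == SIG_ERR)
    return -1;
  raise (SIGFPE);
  return last_signal;
}
```

Because the signal has a handler, the call to @code{raise} returns
instead of terminating the program.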
@node GNU Free Documentation License
@appendix GNU Free Documentation License

@include fdl.texi

@node GNU General Public License
@appendix GNU General Public License

@include gpl.texi

@node Symbol Index
@unnumbered Index of Symbols and Keywords

@printindex fn

@node Concept Index
@unnumbered Concept Index

@printindex cp

@bye