List: New Yorker GNU Linux Scene
Introduction to C on GNU/Linux
When working with GNU-based free operating systems, most
implementations are based on the original work of earlier UNIX
systems. Unix was originally developed at AT&T. One of the key
developments which made Unix an early success was the
co-development of the C programming language. This article hopes to
introduce C to beginning users of GNU/Linux and BSD.
When C was developed, part of its goal was to provide a highly
portable syntax which still gave low-level access to the memory and
the CPU. The result was a three-tier development system. All C
programs are compiled from some text into a binary program. It is the
binary which runs on your computer. The program which creates the
binary is called a compiler. The compiler on GNU systems is gcc,
originally written by Richard Stallman. The compiler parses a text
file, or a series of text files, processes all the instructions, and
builds a binary program from the instructions. It does this by
combining binary code. Some of the code it produces is imported from
external libraries. Some of it is new binary code. Patched together,
this most often produces a single binary application.
The three-tier system of C includes libraries, source code and header
files. Header files declare the functions and data types which the
source code uses, so the compiler knows how to call code that lives
elsewhere. Sometimes we also need to tell the compiler where to find
the libraries which hold the code that the headers describe. While
this can seem confusing, as you become familiar with C, it will
become more natural. Let's look at a simple example to see how these
three tiers interact with each other.
We start by opening a simple text
file called prog1.c with the VI editor:
#include <stdio.h>
int main(int argc, char **argv){
    printf("Welcome to NYLXS\n");
    return 0;  /* 0 tells the shell that the program succeeded */
}
Exit the file and now run the compiler with the following command:
ruben$: gcc prog1.c
This command starts the compiler and creates a new file called a.out.
a.out is the executable program. Run it from the command line:
ruben$: ./a.out
Welcome to NYLXS
ruben$:
We can now examine all three
components of our C program.
The first line in our program tells the compiler to look for a file
called stdio.h and to bring it into our program. stdio.h is the
header file for C's standard input and output library. It declares
many functions in C, including the printf function. Without this file
our compiler cannot find printf. After this line we drop into our own
code. In our case we begin with the definition of the main
subroutine. All C programs have a subroutine 'main'. Main has a
defined prototype
int main (int argc, char* argv[]);
This should not change. The only standard alternative is
int main(void), for programs which ignore their arguments. Main is
the launcher of all activity within your C program. Lastly, our
compiler accessed libraries on the system in order to build your
binary. Despite the fact that our command to gcc did not explicitly
introduce any libraries, our C program was built from them anyway.
Sometimes the compiler needs libraries it cannot find on its own.
Under these conditions our gcc command needs an option to tell it
which library to link against. For example, if we need to use an
advanced math function, we need to tell gcc to link with the math
library (libm) like this:
gcc program1.c -lm
Let's examine the nature of C more
closely by looking at a slightly more complex program:
#include <stdio.h>
#include <string.h>
char name[255] = {'\0'};
int main(int argc, char **argv){
    printf("Welcome to NYLXS\n");
    printf("Enter your name-->\n");
    fgets(name, sizeof(name), stdin);
    while(strcmp("\n", name) != 0){
        /* sizeof yields a size_t, which printf prints with %zu */
        printf("value ->%s size->%zu\n", name, sizeof(name));
        fgets(name, sizeof(name), stdin);
    }
    return 0;  /* 0 tells the shell that the program succeeded */
}
This program includes two external header files to declare library
functions. The first one we saw before, stdio.h. The second include
file, string.h, declares the standard C library functions for
strings. The function strcmp is used to test each string we receive
from standard input.
Before we declare main(), we define and initialize a symbol called
'name'. C is a strongly typed language. Every variable in C needs to
be pre-declared as one which stores a particular kind of data. If we
try to assign to the variable data which is different from its
predefined type, the gcc compiler will complain and probably not
create a binary file.
In this case, the symbol 'name' is marked as a variable of type char.
The words int, char, double, float are examples of keywords in C
which define data types. In our editor they are marked in green.
char name means that this variable is marked as a character type
variable. It stores only characters. In the example of 'name' the
declaration also declares this variable as an array. An array is a
group of data accessible through an index.
Let's look at this line more closely:
char name[255] = {'\0'};
name is declared as a char data type through the keyword char.
name is declared as an array because of the square brackets to the
right of the symbol.
name is declared as an array with 255 chars because of the number in
the square brackets in the declaration. Different data types are
stored in different sized memory locations. A char is defined as
being 1 byte in size, which is 8 bits on practically every system you
will meet. By declaring name to be an array of 255 characters in
length, we essentially tell the computer to please allocate a space
in memory with 255 bytes. We will look at this closer in a minute.
When we declare the array, we can fill it with data. This is done
through the curly braces {}.
The array is initially filled with the 'zero' byte: 00000000. We do
this by initializing the array with the character constant '\0', the
null character.
Character constants are defined using single quotes (string constants
use double quotes). The \0 is a special character which means
00000000.
When we initialize the array with fewer entries than it has elements,
C fills the rest of the array with null characters.
It is not necessary to initialize an array in a declaration. It is
usually necessary to define the size of the array when you declare
it, with a few exceptions as will be noted.
One such exception to the above rule would be if we initialize the
array and declare it together like this:
int numbers[]={1,2,3,4,5,6,7,8};
In this case, the array is declared with 8 elements, even without the
number in the square brackets.
The next line is where we define our main function. As we said
before, all C programs require a main function. Main is the jumping
off point for all C programs. However, in most regards, main looks
like any other function in C. Let's look more closely at the main
declaration:
int main(int argc, char **argv){
The int in green before the symbol 'main' tells C that main is
returning an integer. In fact, this integer is returned to the shell
when you run a program on the command line. You can check its value
after your program is finished by entering: echo $? on the command
line of a bash shell.
All functions are defined with a symbol(). The parentheses tell C
this symbol is a function, just as the square brackets tell C a
symbol is an array. Within the parentheses we put parameters which
are expected to be passed to our function. Unlike other languages,
such as Perl, the parameters defined in our function must be supplied
whenever the function is called. In the case of main, the function is
called by the operating system or shell, and our two arguments (argc
and argv) are automatically filled in by the operating system or
shell when the program is called.
argc represents the number of arguments with which the program was
called, counting the program name itself. argv holds the arguments
themselves, represented as arrays of chars. Hence, argc is declared
as an int data type and argv as a pointer to pointers to char
(char **).
Inside of main, our program begins to work. Our program now processes
these lines from top to bottom in order. The first line prints the
greeting, "Welcome to NYLXS" and adds a line feed. The \n is a
special character combination, in some ways like \0, which means add
a line feed and start at the new line. We will look at the printf
function in more detail later. The next line prints to standard out a
prompt for user input: "Enter your name-->". The next line retrieves
information from the Standard Input Device, most often a keyboard,
and stores that information into the array of characters which we
asked to be previously allocated with the symbol 'name'. We can store
up to 254 characters in our array, since fgets reserves the last byte
for a terminating '\0'.
Let's look at the fgets function. Like most C functions, fgets is
documented in the man pages of your GNU/Linux system. Let's look at
the manual page:
ruben$: man fgets
GETS(3) Linux Programmer's Manual GETS(3)
NAME
fgetc, fgets, getc, getchar, gets, ungetc - input of characters
and strings
SYNOPSIS
#include <stdio.h>
int fgetc(FILE *stream);
char *fgets(char *s, int size, FILE *stream);
int getc(FILE *stream);
int getchar(void);
char *gets(char *s);
int ungetc(int c, FILE *stream);
DESCRIPTION
fgetc() reads the next character from stream and returns
it as an unsigned char cast to an int, or EOF on end of
file or error.
getc() is equivalent to fgetc() except that it may be
implemented as a macro which evaluates stream more than
once.
getchar() is equivalent to getc(stdin).
gets() reads a line from stdin into the buffer pointed to
by s until either a terminating newline or EOF, which it
replaces with '\0'. No check for buffer overrun is
performed (see BUGS below).
fgets() reads in at most one less than size characters
from stream and stores them into the buffer pointed to by
s. Reading stops after an EOF or a newline. If a newline
is read, it is stored into the buffer. A '\0' is stored
after the last character in the buffer.
ungetc() pushes c back to stream, cast to unsigned char,
where it is available for subsequent read operations.
Pushed - back characters will be returned in reverse
order; only one pushback is guaranteed.
Calls to the functions described here can be mixed with
each other and with calls to other input functions from
the stdio library for the same input stream.
For non-locking counterparts, see unlocked_stdio(3).
RETURN VALUE
fgetc(), getc() and getchar() return the character read as
an unsigned char cast to an int or EOF on end of file or
error.
gets() and fgets() return s on success, and NULL on error
or when end of file occurs while no characters have been
read.
ungetc() returns c on success, or EOF on error.
CONFORMING TO
ANSI - C, POSIX.1
BUGS
Never use gets(). Because it is impossible to tell without
knowing the data in advance how many characters gets() will
read, and because gets() will continue to store characters
past the end of the buffer, it is extremely dangerous to use.
It has been used to break computer security. Use fgets()
instead.
It is not advisable to mix calls to input functions from
the stdio library with low - level calls to read() for the
file descriptor associated with the input stream; the
results will be undefined and very probably not what you
want.
SEE ALSO
read(2), write(2), ferror(3), fopen(3), fread(3),
fseek(3), puts(3), scanf(3), unlocked_stdio(3)
The man page tells us several important things about this function
and its use in C. All functions (in all programming languages)
represent a process. A process has 3 components: input, output and
side effect.

[Figure: Diagram of a process]
The inputs of functions are the parameters. The output is the return
value, which for main is an int. The side effects are all the work
the function does which is not its return value.
From the man page, we can see that fgets is one of a group of C
functions which include gets, getc and others. In addition, the man
page tells us that fgets is in the stdio library. It tells us to put
#include <stdio.h> into our code to gain access to the function. It
defines the function for us as follows:
char *fgets(char *s, int size, FILE *stream);
fgets takes 3 parameters for input: a pointer to character data, an
integer, and a file stream. Let's look at all three definitions:
char *s: A pointer to character data. A pointer is a symbol which has
a memory address as its value. In this case, the memory address has
to be an allocated area in memory which is typed as char data. In our
example, we have a char array called 'name'. With arrays, C will
convert the symbol of an array to a pointer to the address where the
array is located. C does this for us automatically. This is a
specific property of arrays and cannot be depended upon to happen
with other kinds of data constructions unless specified in the C
programming specification.

int size: An integer giving the size of the buffer that s points to.
In our program we pass it the result of sizeof, which has the type
size_t. size_t is a special data type in C which is used to store and
describe the size of data constructions in our programs.

FILE *stream: File streams are pointers to devices and/or other
programming constructions which provide a stream of data into and out
of our program. All programs in Unix inherit three streams:

Standard In (stdin) - usually the keyboard
Standard Out (stdout) - most normally the screen
Standard Error (stderr) - another output, most normally to the
screen, but in this case it is used only for error messages and the
like
Because C has strict data typing, a function definition is very clear
and specific about the use of a function. Other information which is
described in the man page mostly concerns the side effect of the
function. In the case of fgets we are told it reads in at most one
less than size characters from stream and stores them into the buffer
pointed to by s. In addition, we are told it stops reading when it
receives an End of File marker (EOF) or a new line (line feed)
character. We are told the line feed character is added to the
buffer, and then fgets adds an additional character '\0'.
Let's now see how we used it in our program:
fgets(name, sizeof(name), stdin);
We call fgets with the parameter 'name', which is the symbol which
defines our array of chars. It is automatically converted for us to a
pointer to a char data construction, namely our array of chars. The
second argument is sizeof(name). sizeof is an operator in C (it looks
similar to a function call) which returns the size of a data
construction. In this case, that data construction is name, which is
of size 255 (which means it has 255 bytes). The third parameter is
stdin. stdin is the default symbol for our Standard Input File Stream
pointer. We inherit it from the environment.
Finally, you might notice that we disregard the return value of
fgets. Since the function stores the input into 'name', we can do
this. However it is often prudent to test the return value of a
function to assure that it worked properly. If fgets returns NULL, it
means that an error occurred or that the end of the input was reached
before any characters were read.
It is critical that fgets cannot try to put more characters into our
array than is allocated for it in memory. If it did, it would
overwrite whatever memory sits next to the array, which can crash the
program or create a security problem. This is bad. Therefore, we
limit the input ability of fgets by the size of our array. This is
good and proper programming practice which you must adopt.
The next section of our program introduces looping and flow control.
Much of our time programming involves working on conditional actions
(do this if you hear a click) or loops (do this over and over until
the user says uncle). The while keyword in C creates a conditional
loop. The expression inside the parentheses is tested. If it
evaluates to any nonzero value, the program enters the loop; a zero
ends it. The actions within the loop are inside the curly braces.
When the last action within the braces is evaluated, the program
returns to the top and tests the expression in the parentheses again.
Inside the parentheses of our while loop, we call a function called
strcmp (do a man strcmp now). strcmp looks at two strings and
compares them. It then returns a positive number, a negative number
or a 0 (zero) depending upon if the first string is greater than,
less than or equal to the second.
Characters in a string are represented by integer numbers which are
one byte in size. Since there are eight bits in a byte, a char can
represent at most 256 different values. There is a standard integer
which represents each character on the keyboard. This standard
association of characters to byte integers is called the ASCII
standard table. In this table, the letter A is 65 and Z is 90. All
the rest of the capital letters fall in between in order. The letter
'a' is 97, and 'z' is 122. Again, all the lower case letters fall in
between in order. In this manner, strcmp can compare the strings by
their ASCII representation. It is important to note at this point
that there is a very tight relationship between small integers
(integers stored in a single byte) and characters in C. It should
also be noted that strcmp reads the arrays of chars until it reaches
a '\0' (nul) character. Anything stored after the nul is ignored.
Our program checks if our input buffer (name) is equal to "\n". "\n"
is a string constant. All string constants add a '\0' to the end of
their allocated array. So the comparison is actually against the two
bytes '\n' and '\0'. Since fgets adds the null to the end of the
string, everyone is happy with this comparison.
Our program now repeats all the steps in the curly braces until the
user presses Enter on an empty line, which sends a lone "\n".

This sample program and the explanation is a good introduction to C
for a beginner. But there is far more to learn, even for a beginner,
which I hope to explore in the coming months of the NYLXS journal. In
the meantime, I challenge you to try a few things with this program.
First, change the size integer in
fgets to 5 and try to enter 10 characters into your keyboard. What
happens with your program?
Second, try rewriting this program so that you fill the char array
with 255 characters and NO NULL value at the end. How does this
affect the strcmp function?
(hint: try adding this code into your program and comment out the
fgets:
char *i;  /* walks over the array one char at a time */
for(i = name; i < (name + sizeof(name)); i++){
    *i = getchar();
    printf("Char entered->%c\n", *i);
}
)
Third - Try changing the size of
the name array to 5 and enter 10 characters on the prompt.
What happens?
____________________________
NYLXS: New Yorker Free Software Users Scene
Fair Use -
because it's either fair use or useless....
NYLXS is a trademark of NYLXS, Inc