C++ Syntax

Before exploring how we actually represent data in our C++ programs, I want to introduce a formal discussion of basic C++ syntax, which, like data types, doesn't get enough attention in most standard texts. All programming languages require syntax rules so that their compilers can parse the source and create working machine code. Understanding these rules requires a basic grasp of the core components: files, structure, statements, data and operators.

Starting from the top, first you have files, and files usually have naming rules. C++ inherits nearly all of its file structure from C, and actually requires knowledge of it in greater detail and at an earlier level of expertise. Because of its object oriented design, C++ depends heavily on library creation. In fact, even for beginners, most of your work happens at the library level.

C co-developed with the Unix environment, and all C programs inherit from it the need for a main function definition. All C and C++ programs start with main, and main then pulls in and uses all the other parts of the system's libraries and your own code to produce your completed program. Main is located in your topmost programming file.

Standard Programming Files:

A standard programming file is where the topmost programming takes place. In the C language, most of your code, especially as a beginner, takes place in this file. Most commonly these files have a suffix of either .cc or .C. file.C, for example, is a standard C++ file name.

A standard programming file will have several components:

1) Include Preprocessor Directives - These import header files, which supply the definitions of the symbols that your program uses but that you are not creating yourself. An include directive might look like this:

    #include <iostream>

which tells the compiler to load the declarations of all the functions and objects defined in the iostream library. Standard C++ libraries are included using the angle bracket notation as above.
They are searched for by your compiler in a set of standard locations which are defined by your compiler and programming environment (something I wouldn't mind understanding better on modern Linux and GNU environments). If you use the syntax

    #include "myheader"

with double quotes, the compiler will look for these headers in the local directory first. C libraries are accessible in C++ and can either use the standard C language notation, such as

    #include <stdio.h>
    #include <stdlib.h>

***Note the .h suffix being included*** or use the C++ version, such as

    #include <cstdio>
    #include <cstdlib>

2) Macro and other Preprocessor Compiler Directives - These help set up the conditions under which libraries and header files are brought into your code, to help prevent duplication and to create different versions of a program as might be needed for differing architectures or conditions. The most common Preprocessor Directives are as follows:

    #define
    #endif
    #ifdef
    #ifndef
    #include (as discussed above)

A Macro directive might look like this:

    #ifndef HEAD
    #define HEAD
    #include <iostream>
    #include <string>
    #endif

Developing skill with these directives, which form a language within a language, is one of the things that separates advanced C and C++ coders from amateurs.
This Macro tells the compiler to include the libraries and symbols for iostream and string from the core C++ library if, and ONLY IF, the symbol HEAD has not already been defined. There are also constants which the compiler predefines for your program, including

    __cplusplus
    __DATE__
    __FILE__
    __LINE__
    __STDC__
    __TIME__

__DATE__ and __TIME__ are the date and time the program is compiled.

3) Original Code and runtime directives starting with main. C++ has added a new programming directive called the "using" directive, which brings the symbols of a namespace into your code. A namespace gives finer grained control over which symbols your code recognizes in a specified space. It's really important and in many ways was a long time coming to the C family of languages. Most importantly, it prevents you from accidentally stepping on library symbols or words that you might not have been aware of, or that programmers after you might not be aware of. It also allows you to define the same symbol in multiple locations of your code without stepping on your own toes. So today's modern C++ main program files might look something like this:

    #ifndef TOP_H
    #include <iostream>
    #define TOP_H
    #endif

    #ifndef INTARRAY_H
    #include "intarray.h"
    #define INTARRAY_H
    #endif

    using namespace std;

    int main( int argc, const char* argv[] )
    {
        //YOUR PROGRAMMING CODE
    }

There is a catch to the namespace usage though. It might very well be that your library files, especially if you are creating them yourself, which you will in C++, have the using directive. If so, you will likely depend on them.

Header Files:

Header files normally have a .h suffix. file1.h would be an example of a header file for C or C++. These are the files that are being included by your #include preprocessor directives. These files are often distributed with a program and you can examine them.
They are useful for discovering how the programming objects in libraries are defined and used, and often programmers will point you to them as a form of documentation, a practice I'm not happy about because many programmers mistake them for a substitute for real documentation.

Library Files:

After researching this, it has occurred to me that there is an ambiguity about the structure of C and C++ programming files. Professional programs generally have the header files described above, but there is no proper name for the code files that are associated with the headers and which produce object binary files and static or dynamically linked libraries. For a beginner this is all confusing, and the lack of proper nomenclature makes it all the harder to learn. A little bit of compiler theory is needed to understand the file structure and binary construction of your program.

For now, I just want to point out that the programming objects declared in your header file must have source code somewhere to produce the actual machine code that is represented by the symbols in your header file. Those library source files will not have the main function. But the compiler can be asked to create what are called object files, which are partially processed binary code for later inclusion in your program. When we look closer at the gcc compiler we will examine these object files and learn why they are so important. What is important to say, however, is that in C++, because of its object orientation and its emphasis on creating Application Programming Interfaces (APIs), most of the C++ coding you will do takes place in these library C++ source files (which I will refer to as Library Code from here on out). There are two kinds of Library Code files that you will work with: those which you create, and those which you borrow from your system for inclusion in your programs.
User defined: User defined library files define the code to create working programming objects that are normally declared in your matching header files. These source code files look just like your main programming file except that they don't have the main function. Your topmost main programming file is dependent on these library code files. The code they produce has to be linked into your program by your compiler and linker.

Standard C++ or Packaged third party: These are the standard libraries, either in source or in object files, that define standard language needs and are usually found somewhere in /lib or /usr/lib on your system.

Standard C++ File Creation:

All our C++ programs have to be created with a standard text editor. The code that the compiler works on, known to the compiler as translation units, is straight ASCII text. You can NOT use a word processor. My preferred text editor is VIM or GVIM, which is a derivative of VI. VI is the standard text editor on Unix like systems and there are many tutorials for it around the internet. Other editors include EMACS, and then there are C++ working environments like Anjuta, which I strongly discourage. I discourage the Integrated Programming Environments because with GNU and Unix like systems, your OS is your integrated environment, and I believe one should learn to use the standard tools that are on your GNU/Linux system.

A standard C++ file needs to have at least one function defined. We will look at functions (also called methods) more closely later, but a new programmer should get used to looking at them from the start, since every C++ program runs from a function called main. Functions are defined by the following structure:

    "return type" "function name (the symbol)" ( Argument list )
    {
        Statements that end in semi-colon;
    }

Functions do not take a semi-colon after the closing curly brace.
The main function looks like this:

    int main(int argc, char * argv[]){
        return 0;
    }

A realistic C++ main program file, including preprocessor directives, would look as follows:

    #include <iostream>
    using namespace std;

    int main(int argc, char * argv[]){
        return 0;
    }

The curly braces form a block in which the coder can add as many instructions as they choose. These blocks of statements are seen in many C++ syntax structures including functions, if statements, for loops and other structures. While the above is a minimal C++ source file structure, generally most of the heavy lifting of your code takes place outside of main, in user defined functions which you create, as well as in objects. A more realistic first program skeleton might well look something like this:

    #include <iostream>
    using namespace std;

    void oxygen(){ cout << "oxygen()\n";}
    void hydrogen(){ cout << "hydrogen()\n";}
    void helium(){ cout << "helium()\n";}
    void neon(){ cout << "neon()\n";}

    int main(int argc, char * argv[]){
        oxygen();
        hydrogen();
        helium();
        neon();
        return 0;
    }

Here we can see the declaration and definition of 4 user defined functions that sit outside of main and run only when called. The four functions are called oxygen, hydrogen, helium and neon. And notice that we are using the standard namespace called std.

Statement Structure:

All C and C++ statements (although not all syntax) end with a semi-colon. You can even put two statements on a single line, separated by a semi-colon, but in general this isn't recommended. Statements are constructed with Data, Operators and Keywords. C++ has a larger set of Keywords than C.

Keywords:

Keywords are any symbols that Standard C++ recognizes as having instructional meaning, that is, they tell the compiler to do something. The Keywords in C++ are as follows, and learning the exact meaning of all the keywords is essential to learning C++.
These are inherited from C:

    auto      const     double    float     int       short     struct    unsigned
    break     continue  else      for       long      signed    switch    void
    case      default   enum      goto      register  sizeof    typedef   volatile
    char      do        extern    if        return    static    union     while

These are the extended set added to C++:

    asm         dynamic_cast  namespace  reinterpret_cast  try
    bool        explicit      new        static_cast       typeid
    catch       false         operator   template          typename
    class       friend        private    this              using
    const_cast  inline        public     throw             virtual
    delete      mutable       protected  true              wchar_t

and most C++ compilers also recognize the following Keywords:

    and     bitand  compl  not_eq  or_eq  xor_eq
    and_eq  bitor   not    or      xor

Keywords are completely reserved and can not be used as symbols by any user defined variables in your program. They are exclusive to the language and compilers.

There are other important predefined symbols that C++ uses as well. These are not strictly exclusive to the language, however, overloading them or using them as symbols for variables is a very bad idea. There are a lot of them, but some of them include

    cin   endl     INT_MIN  iomanip   main      npos  std
    cout  include  INT_MAX  iostream  RAND_MAX  NULL  string

not to mention the Macros like __DATE__ and __TIME__.

Operators:

Operators are very much like functions or methods in that they define processes, taking in arguments and returning outputs (and having side effects). In the C language, Operators are immutable. You can't change their meaning. In C++ many of them can be overloaded, that is, you can create new meanings for them and change their meaning. A lot of C++ study involves discussing the overloading of Operators. All Operators, as they do in Mathematics, have precedence and associativity. For example, in arithmetic:

    4 x 3 - 10 = 2 and not -28.

That is because multiplication has a higher precedence than subtraction, so 4 x 3 is evaluated first, and operators of equal precedence are then grouped by their associativity, which here is left to right.
A complete list of C++ operators is considerable and as follows, from highest precedence to lowest:

┌──────────────────────┬─────────────────────────────────────┬───────────────┐
│ Operator             │ Type                                │ Associativity │
├──────────────────────┼─────────────────────────────────────┼───────────────┤
│ ::                   │ binary scope resolution             │ left to right │
│ ::                   │ unary scope resolution              │               │
├──────────────────────┼─────────────────────────────────────┼───────────────┤
│ ()                   │ parentheses                         │ left to right │
│ []                   │ array subscript                     │               │
│ .                    │ member selection via object         │               │
│ ->                   │ member selection via pointer        │               │
│ ++                   │ unary postincrement                 │               │
│ --                   │ unary postdecrement                 │               │
│ typeid               │ run-time type information           │               │
│ dynamic_cast< type > │ run-time type-checked cast          │               │
│ static_cast          │ compile-time type-checked cast      │               │
│ reinterpret_cast     │ cast for non-standard conversions   │               │
│ const_cast           │ cast away const-ness                │               │
├──────────────────────┼─────────────────────────────────────┼───────────────┤
│ ++                   │ unary preincrement                  │ right to left │
│ --                   │ unary predecrement                  │               │
│ +                    │ unary plus                          │               │
│ -                    │ unary minus                         │               │
│ !                    │ unary logical negation              │               │
│ ~                    │ unary bitwise complement            │               │
│ ( type )             │ C-style unary cast                  │               │
│ sizeof               │ determine size in bytes             │               │
│ &                    │ address                             │               │
│ *                    │ dereference                         │               │
│ new                  │ dynamic memory allocation           │               │
│ new[]                │ dynamic array allocation            │               │
│ delete               │ dynamic memory deallocation         │               │
│ delete[]             │ dynamic array deallocation          │               │
├──────────────────────┼─────────────────────────────────────┼───────────────┤
│ .*                   │ pointer to member via object        │ left to right │
│ ->*                  │ pointer to member via pointer       │               │
├──────────────────────┼─────────────────────────────────────┼───────────────┤
│ *                    │ multiplication                      │ left to right │
│ /                    │ division                            │               │
│ %                    │ modulus                             │               │
├──────────────────────┼─────────────────────────────────────┼───────────────┤
│ +                    │ addition                            │ left to right │
│ -                    │ subtraction                         │               │
├──────────────────────┼─────────────────────────────────────┼───────────────┤
│ <<                   │ bitwise left shift                  │ left to right │
│ >>                   │ bitwise right shift                 │               │
├──────────────────────┼─────────────────────────────────────┼───────────────┤
│ <                    │ relational less than                │ left to right │
│ <=                   │ relational less than or equal to    │               │
│ >                    │ relational greater than             │               │
│ >=                   │ relational greater than or equal to │               │
├──────────────────────┼─────────────────────────────────────┼───────────────┤
│ ==                   │ relational is equal to              │ left to right │
│ !=                   │ relational is not equal to          │               │
├──────────────────────┼─────────────────────────────────────┼───────────────┤
│ &                    │ bitwise AND                         │ left to right │
├──────────────────────┼─────────────────────────────────────┼───────────────┤
│ ^                    │ bitwise exclusive OR                │ left to right │
├──────────────────────┼─────────────────────────────────────┼───────────────┤
│ |                    │ bitwise inclusive OR                │ left to right │
├──────────────────────┼─────────────────────────────────────┼───────────────┤
│ &&                   │ logical AND                         │ left to right │
├──────────────────────┼─────────────────────────────────────┼───────────────┤
│ ||                   │ logical OR                          │ left to right │
├──────────────────────┼─────────────────────────────────────┼───────────────┤
│ ?:                   │ ternary conditional                 │ right to left │
├──────────────────────┼─────────────────────────────────────┼───────────────┤
│ =                    │ assignment                          │ right to left │
│ +=                   │ addition assignment                 │               │
│ -=                   │ subtraction assignment              │               │
│ *=                   │ multiplication assignment           │               │
│ /=                   │ division assignment                 │               │
│ %=                   │ modulus assignment                  │               │
│ &=                   │ bitwise AND assignment              │               │
│ ^=                   │ bitwise exclusive OR assignment     │               │
│ |=                   │ bitwise inclusive OR assignment     │               │
│ <<=                  │ bitwise left shift assignment       │               │
│ >>=                  │ bitwise right shift assignment      │               │
├──────────────────────┼─────────────────────────────────────┼───────────────┤
│ ,                    │ comma                               │ left to right │
└──────────────────────┴─────────────────────────────────────┴───────────────┘

We will walk through this complete list of operators later.

Data Assignments in C++ Statements:

We've now bootstrapped enough background information to examine data assignment and variables in C++ statements, and to compile small programs to explore C++ data better. C and C++ are known as compiled, typed languages. What this means is that the source code itself does not directly run as a program, unlike scripting languages like Perl, Python, Ruby, Bourne Shell scripting and such. The program has to be compiled, linked with the standard binary libraries, and output into a binary file that the machine and operating system can run. The TYPED aspect of C++ means that all variables have to be defined as some specific data type, such as the ones we discussed earlier (int, char, short, long, double, float). Functions and Methods are data and have a type. It's a code type. But for the purposes of syntax they are typed according to the kind of data that they return.

When we create variables in C and C++ they have to be typed. In addition, there are 3 phases in variable creation, which can be either in separate statements or combined into one statement. The three phases are:

    Declaration
    Definition (needed for functions and methods)
    Initialization

We declare a variable using its type and its name (its symbol), and when initializing it, we include an assignment of data.
For example:

    int i;

declares a variable i which is of type int, and most usually defines it.

    float j;

declares a variable j of type float, as float is defined specifically for your architecture and environment. We can then assign an integer to i and a float to j using the following syntax.

    #include <iostream>
    using namespace std;

    int i; float j; //Declarations of variables

    int main(int argc, char * argv[])
    {
        i = 10;
        j = 5.6;
        return 0;
    }

Now here is the catch. WHEN WE DECLARE A VARIABLE in an ordinary C++ coding file, one that has the function main within it, and place the variable in the global namespace, above or outside of the "main" function, then we are both declaring and defining the variable, and the variable exists globally for your program, from the point that the variable is declared to the end of the file, for all the other functions and code that you bring into your program from that point forward.

However, it is normal in real programming to use at least three files for your program: your main C++ code file, your library source file which is used to create reusable objects in your code base, and finally HEADER files, the .h files that are imported with

    #ifndef MYOWNLIB
    #include "myownlibrary"
    #define MYOWNLIB
    #endif

statements. However, it is a programming error to ***define*** variables within your headers. It can confuse the linker, because the compiler will then make space for all the defined variables in every object file that uses the header in your program, and all those objects may not agree as to their values later on (in practice the linker usually stops with a multiple definition error). So you want to declare the variable, so that everyone knows what the common type and symbol is, but you don't want to define it, so that the linker doesn't come along and fill in all those definitions, possibly incorrectly, and separately if not independently, in all the objects of your complex program.
In order to allow for the declaration without the definition of a variable in the header file, you must use the "extern" keyword in header files to create proper global scope of variables. This is not needed for functions or classes, which we will see later. They have implicit "extern"s.

To beat a dead horse, since this is one of the most common areas of programming bugs, understand this: Where we declare a variable is important and determines where your variable is visible in your program. Complex programs have so many variables in them that it is critical to restrict their access and usage to as small a part of the program as is practical. This is accomplished through two mechanisms, Scope and Namespace. We've seen namespace already with the using directive:

    using namespace std;

imports into our section of programming all the symbols of the standard namespace. Scope, on the other hand, restricts variable access to specific blocks of our programs, not just the symbols. When we declare our variables outside of main, as we did in the program file1.cc, then the scope is considered to be global and the variables i and j are available throughout our program.

Scope is a bit more complex. We've looked at global scope already and we will run up against it time and again. According to Lippman there are three kinds of Scope: local, namespace and class. Scope in C++ is fairly complicated, and not well described in the standard texts. Anywhere you have a block you can declare and define variables, and they are local to that block. This is particularly important in functions. A block is a group of statements which are surrounded by curly braces. Like the Unix shell, a block inherits the variables of the scopes that enclose it, including blocks that wrap it. So you can have blocks within blocks. If you declare a variable whose name is already in play, it creates a new variable that hides the outer one until the block ends.
Variables defined within a function are laid out by the compiler, but don't become active until the function is called. Each function call adds a plate to the runtime stack. The topmost plate, ignoring threading and forked processes for the moment, needs to run its course and end before the calling function can continue. Since everything begins with a function called main, a runtime stack might look like this, each plate on the stack having its own unique environment and variables scoped within, in addition to the global variables, which they can all see and affect.

    function2
        |
    function1
        |
    main

In this scenario, when function2 finishes, all of its local variables die and no longer exist at runtime, and control returns to function1. function1 can then call function3, and add new plates to the runtime stack. We will explore this more when we look at functions and methods.

Now let's return to the representation of data in C++ and look at programming examples. I will now increasingly be showing example code on the NYLXS website because of a nice HTML tool in VIM that can show code in syntax color. See http://www.nylxs.com/docs/workshops/?C=M;O=D for the coding examples mentioned here.

We can start by building a nice example program, which will have 4 working files: a header file, a main programming file, a library programming file and a make file. There are easier ways of showing these data type examples, but I feel it is important that new C and C++ programmers become comfortable and familiar with multiple file programming projects, especially for C++. Also, I might mention a tip or two with VIM and GVIM, in order to outline more general GNU tool sets to help the C and C++ coder. There are, no doubt, similar examples in EMACS and other editors, but I'm not going to mention them, and I really don't want an editor war to break out in the workshop.
Data Representation in Detail:

chars:

Characters are the most basic data type in C and C++, being stored in memory in a single signed or unsigned byte. chars can be represented in several ways, and internally are represented as small ints, either signed or unsigned. We can create character constants by putting keyboard characters into single quotes ==> 'B' and assign them to either a char or unsigned char variable, or indirectly, through a pointer, to a char variable or some other memory construction that stores a char or unsigned char.

Characters can be represented in programming code not only with the char var = 'X'; syntax, but also by integer, octal and hexadecimal codes which represent ASCII mapped characters. The syntax for these codes includes the following (note the single quotes):

    char letter = 65;          //stores an ASCII A
    char letterOct = '\102';   //octal escape in single quotes
    char letterHex = '\x43';   //hexadecimal escape in single quotes
    char letterOct2 = 0102;    //octal integer constant
    char letterHex2 = 0x43;    //hexadecimal integer constant

Octals have the generic form of up to 3 digits preceded by a backslash if quoted, but the integer syntax can also be used. An integer that starts with a 0 is interpreted as non-decimal. Remember that you feel more comfortable with decimal numbers, but the machine couldn't care less. If the 0 is followed by an x or X, then it is hexadecimal; otherwise it is octal. If you put the code in single quotes, it needs a backslash first, and the hexadecimal form drops the leading 0.

There is also a shorthand for special characters that can not be readily typed from a US 105 key keyboard. These are also backslashed. I'll show a complete list in the code example, but the most important two, by far, are '\n', the line feed, and '\t', the horizontal tab. The '\n' can also be produced in C++ with the endl symbol (which stands for end of line), which additionally flushes the output stream. I'm not going to discuss the broken MS end of line.

A character literal can also have an 'L' in front of it for wide characters, such as in Chinese etc. It has to be stored in an appropriate variable of type wchar_t.
You can't use any of these literal representations on the left side of an assignment operator. For some, it might seem obvious that you can't do this assignment:

    '0' = 'G';

but trust me, some day you will do just this for some twisted reason.

Here is a sample program to show all the character variations:

Three Files:

First the Header File
http://www.nylxs.com/docs/workshops/cpp/data.h.html
http://www.nylxs.com/docs/workshops/cpp/data.h

    #ifndef DATA_H
    #define DATA_H

    void show_chars();
    void show_ints();
    void show_floats();
    void show_arrays();
    void show_cstrings();
    void show_strings();

    #endif /* DATA_H */

The Library Source File:
http://www.nylxs.com/docs/workshops/cpp/file_1.cc.html
http://www.nylxs.com/docs/workshops/cpp/file_1.cc

    #include <iostream>

    using namespace std;

    void show_chars()
    {
        //declare and define char types in a function
        char letter;
        unsigned char letterU;
        //assigning chars means using a single quote mark
        letter = 'R';
        letterU = 'u';
        //declare, define and assign
        char letterNull = 0;
        char letterZero = '0';
        char letterINT = 65;
        char letterAlert = '\a';
        char letterBackspace = '\b';
        char letterFormFeed = '\f';
        char letterNewLine = '\n';
        char letterCarriageReturn = '\r';
        char letterVertTab = '\v';
        char letterHorzTab = '\t';
        char letterBackslash = '\\';
        char letterQuestionMark = '\?';
        char letterSingleQuote = '\'';
        char letterDoubleQuote = '\"';
        char letterOct = '\103';
        char letterHex = '\x44';
        char letterOct2 = 0104;
        char letterHex2 = 0x45;
        //deprecated and causes a segfault char * letterptr = "\x48";
        cout << "signed char ==> " << letter << endl;
        cout << "unsigned char==> " << letterU << endl;
        cout << "NULL char ==> " << letterNull << endl;
        cout << "ZERO char ==> " << letterZero << endl;
        cout << "INT char ==> " << letterINT << endl;
        cout << "Alert char ==> " << letterAlert << endl;
        cout << "Backspace char ==> ::" << letterBackspace << "end" << endl;
        cout << "FormFeed char ==> " << letterFormFeed << "end" << endl;
        cout << "New Line char ==> " << letterNewLine << "end" << endl;
        cout << "Carriage Return char ==> " << letterCarriageReturn << "end" << endl;
        cout << "Vertical Tab char ==> " << letterVertTab << "end" << endl;
        cout << "Horizontal Tab char ==> " << letterHorzTab << letterHorzTab << "end" << endl;
        cout << "Backslash char ==> " << letterBackslash << endl;
        cout << "Question Mark char ==> " << letterQuestionMark << endl;
        cout << "Single Quotes char ==> " << letterSingleQuote << endl;
        cout << "Double Quote char ==> " << letterDoubleQuote << endl;
        cout << "Octal char ==> " << letterOct << endl;
        cout << "Hexadecimal char ==> " << letterHex << endl;
        cout << "Octal char 2 ==> " << letterOct2 << endl;
        cout << "Hexadecimal char 2 ==> " << letterHex2 << endl;
        // cout << "NOT REALLY a pointer to char but a string ==> " << *letterptr << endl;

        //can't assign a value to a string literal or const char: *letterptr = 'd';
        // cout << "pointer to char ==> " << *letterptr << endl;
    }

    void show_ints()
    {
    }

    void show_floats()
    {
    }

    void show_arrays()
    {
    }

    void show_cstrings()
    {
    }

    void show_strings()
    {
    }

The Main programming file:
http://www.nylxs.com/docs/workshops/cpp/file_1.cc.html
http://www.nylxs.com/docs/workshops/cpp/file_1.cc

    #include <iostream>
    #include "data.h"

    int main(int argc, char * argv[]){
        show_chars();
        show_ints();
        show_floats();
        show_arrays();
        show_cstrings();
        show_strings();
        return 0;
    }

and the Makefile to compile
http://www.nylxs.com/docs/workshops/cpp/makefile.html
http://www.nylxs.com/docs/workshops/cpp/makefile

    # each command line below must begin with a tab character
    data : data.o data_main.o
    	g++ -o data data.o data_main.o
    data.o : file_1.cc data.h
    	g++ -Wall -o data.o -c file_1.cc
    data_main.o : file_1_main.cc
    	g++ -Wall -o data_main.o -c file_1_main.cc

Just to say it, there are some differences in character handling between modern C++ and C, specific to pointers. This syntax, which is almost always wrong

    char * ptr = "A"; //Double QUOTES THERE

needs to specify itself as a const

    const char * ptr = "A";

and you should be aware that what sits in the double quotes is not a character, but a string with the size of 2 chars, since a null char is implied, something we will be exploring more closely in the near future. Furthermore, there is no direct way to assign the address of a literal char to a pointer.

    char * ptr = 'A'; // Wrong
    char * ptr = "A"; // Wrong and deprecated, and a string

You can do this:

    char letter = 'A';
    char * ptr = &letter; //we will look at this syntax when looking at
                          //pointers in full

Workshop Assignment:

Print out a complete set of ASCII chars and the decimal numbers associated with them. You can use a for loop

    int i = 0;
    for( i = 0; i < 128; i++){
        //your code in here
    }

and then using unsigned chars, for fun, extend it to a complete set of 256 chars.

C++ Integer types - Data Representation

Constant and Literal Integers are represented as numbers without any quotes. They can be represented in decimal, octal, and hexadecimal forms. They can be assigned to int, short, long and signed and unsigned variables, and they are literals, and therefore can not be left values: ie values that go on the left of an assignment operator.

Decimal forms look like these examples:

    int i = 255;
    short i = 255;
    long i = 4000;
    unsigned short i = 32;
    signed int i = -273;

Octal examples are similar to the integers that we saw with chars and begin with zeros:

    int o = 042;
    short o = 0101;
    long o = 0076;
    unsigned o = 0101;
    signed long o = 011102;

Hexadecimal examples begin with 0x or 0X and again are without quotes:

    int h = 0xFF;
    short h = 0x12;
    long h = 0xA2E44D;
    unsigned h = 0xA2;
    signed long h = 0xAF23E4;

Integers also have the ability to be expressed as Unsigned or Long literal values, if the need arises to do so.
    unsigned regist = 121U;
    long lightspeed = 123456789L;
    #define MAXVAL 1234567U

While discussing integer values, the programmer also needs to be aware of the size_t typedef that C and C++ use for the sizeof() operator. The sizeof() operator returns the size of any data object in bytes. Its return value is an unsigned integer of type size_t. Because in C and C++ we manage memory directly, the sizeof() operator plays a significant role in your programming. An example of sizeof() is:

    size_t i = sizeof(int);

One of the unusual properties of integers and chars is that both represent real integer values. As a result, many of the mathematical operators can be used with them and they can be assigned to each other. There are automated rules for "recasting" the data types as they interact with each other. We'll look at these rules in detail later. But for now one should be aware that statements like these are common in C and C++.

    char letter, letter2;
    int number;
    short num2;
    long num3;
    letter = 'c';
    letter2 = 'G';
    letter++; //now stores 'd'
    cout << letter << " " << endl; //prints 'd' and a line feed
    num2 = letter;
    cout << num2 << " " << endl; //prints '100' and a line feed
    cout << letter + letter2; //prints '171', the int sum of 100 and 71

Floating Point

Floating Point data essentially can be represented in the two styles already discussed, as decimal and scientific notation. A floating point literal is a double unless told otherwise. It can be followed by an 'F' or 'f' for single precision, or by an 'L' or 'l' for long double (there is no 'D' for double). Here are some examples:

    double avog = 6.022E23;
    float pizza = 0.125;
    float trip = 102.7F;
    long double population = 8323456.0L;
    float pop_brklyn = 8.32E6;

Casting:

C and C++ are typed languages, but they have some flexibility built into their design in this regard. This can be good and bad, because it also means that the language will give you room to hang yourself if you don't learn the explicit rules for the accommodations that C++ will make for you.
For example, one can assign a float into an integer, but then you are left to understand what the resulting outcome is. These are very difficult bugs to catch because the code looks correct, seems logical, and is not a raw syntax error. For example, what does this legal code do?

    #include <iostream>
    #include <cmath>
    using namespace std;

    int main(){
        int a,b,c,z;
        float d,e,f;
        a = 1.25;      // stored as 1, the fraction is dropped
        b = 2.50;      // stored as 2
        c = 5;
        d = 0.6125;
        e = 0.30625;
        f = 0.153125;
        z = 25U + 75;
        z = z * a;
        cout << b/2 << endl;
        z=((pow(b,2)) + e)/((pow(f,e)) * z);
        cout << "I have no clue what this results in and can't be paid enough to debug it " << z << endl;
        return 0;
    }

Rules for Casting:

Implicit Rules: When the compiler is confronted with two different data types, it tries to make the operation work by casting one data type to another which is compatible with the expression. The programmer can also do such casting manually. The implicit casting rules are as follows. There are 4 events that trigger C and C++ to do implicit type conversion:

A) When the operands in a mathematical or logical expression are of two different types. Example:

    char a = 'A';
    int b = 9;
    long c;
    c = a + b;
    if(c == b){
        //do something
    }

B) When the value on the right side of an assignment doesn't match the type of the variable or lvalue on the left. Example:

    char a = 'a';
    double b;
    b = a;

C) When the argument to a function or method doesn't match the parameter. We will see this when we look at functions.

D) When the return statement doesn't match the function type. Again, we will see this when we examine functions.

Rules: Generally the implicit cast rules are designed to lose the least amount of precision possible. If an arithmetic operation includes floating point data, then the implicit casting of data follows this chain:

    float --promotion--> double --> long double

If neither operand in an arithmetic operation is a floating point type:

    int --> unsigned int --> long --> unsigned long

Note that these promotions can lose their sign (lose negativity).
I also note that when I attempted to verify these promotion rules in C and in C++, they did not appear to hold. I could not create an example program that promotes a signed data type to an unsigned one and then visibly loses the sign. (On typical two's complement machines the conversion still takes place inside the expression; storing the result in a signed variable of the same width converts it back, which hides the effect.)

According to Lippman, the C++ rules for implicit promotion are as follows. Arithmetic operations involving binary operators and data of mixed type convert to the widest data type present. All arithmetic expressions involving types less wide than an int are promoted to an int before processing.

When a long double is present in the expression, everything is converted to long double.

Otherwise, if neither operand is a long double and one is of type double, everything is converted to double.

Otherwise, if neither is a double and one is a float, everything is converted to float.

Otherwise, when there are no floats involved, integer promotion is evaluated. At the beginning of evaluation all integers smaller than an int are promoted to int. Unsigned short ints are promoted to int as well, unless they can't fit, in which case they are promoted to unsigned int. After that, the larger ints are evaluated for promotion:

If there is an unsigned long, all are converted to unsigned long (note the loss of negative values).

Otherwise, if there is no unsigned long and we have a long, the others are converted to long. An unsigned int is converted to long if long is large enough to hold its values; otherwise both are converted to unsigned long.

Otherwise, if there is an unsigned int, everything is promoted to unsigned int.

Again, I will repeat that I have not been able to confirm the documented implicit conversion in arithmetic operations in cases where sign is lost (conversions from unsigned ints to long, for example), unless the value is being assigned to an unsigned variable.

Explicit Cast: For a variety of reasons, one might need to cast data intentionally.
There are two styles for doing this, the older C style and the newer C++ standard. First the new style. The kinds of new style casts are static_cast, dynamic_cast, const_cast, and reinterpret_cast. Syntax for these casts follows this convention:

    int int_variable = static_cast<int>(char_variable);

or cast_name<type>(variable) in the general form. I'm not going to explain the differences at this point, but will come back to them soon enough. I will say that the result is to forcefully convert the data from one type to another, in the case above from a char to an int. In the C style, parentheses are used to make the cast:

    char letter;
    int var = (int) letter;

casts the value of letter to an int.

Aggregate Data Types:

Most of the action in your program will involve more than a single independent integer, char or float. Grouping data types together creates most of what is useful. C and C++ give you multiple tools for handling these aggregate data types. The key element is the C style array. An array is declared, defined and assigned like the elementary data types, using the square bracket operator, and looks like this:

    char mystring[];     // Declares an array of chars without dimension
                         // (legal only as an extern declaration or as a
                         // function parameter)
    char mystring[100];  // Declares an array of chars with 100 chars within
                         // it
    char * mystring[];   // Declares an array of pointers to chars, similar to
                         // the parameter of main, char * argv[]

One can declare and assign an array in a single statement. When doing so, C and C++ have several syntax tools to help you create the many necessary, subtle data constructions that you need for your programming. The comments below outline these examples and behaviors.

    char mystring[] = "This is our first string"; // Declares a char array of
                         // 25 chars, the last of which is a null value
    char mystring[] = {'a','b','c','d','e'};      // Creates an array of 5 chars.
    int matrix[100] = {1,2,3,4,5};
                         // This creates an array of 100 integers, filling the
                         // first 5 locations with 1,2,3,4,5 and then adding
                         // 0's or NULLs to the remaining 95 indexed locations
    int matrix[100] = {'1','2','3','4','5'};
                         // This creates an array of 100 integers holding the
                         // integer equivalents of the ASCII values for the
                         // characters '1', '2', etc., and then fills the rest
                         // of the array with zeros. It is similar to the next
                         // statement (but not exactly)
    char matrix[100] = "12345";
                         // This example takes the string literal "12345",
                         // which ends in a null, and then pads the rest of
                         // the array with nulls. The result is similar to the
                         // above, but via a different mechanism, because all
                         // string literals end in null. The examples above
                         // have implicit promotion from char to integer
                         // types. This example must be a char type, otherwise
                         // the compiler will not accept the assignment.
                         // Furthermore, only the char type will print a
                         // string when asked. The example above it needs an
                         // explicit cast. See and try this example for a
                         // demonstration.

    #include <iostream>
    using namespace std;

    int main(int argc, char * argv[]){
        unsigned short int matrix[100] = {'1','2','3','4','5'};
        char matrix2[100] = "12345";
        cout << "First Matrix " << matrix << endl;
        cout << "Second Matrix " << matrix2 << endl;
        for(int i=0;i<5;i++){
            cout << matrix[i] << endl;
        }
        for(int i=0;i<5;i++){
            cout << static_cast<char>(matrix[i]) << endl;
        }
        return 0;
    }

    ruben@www2:~/cplus> g++ -Wall test.cc -o test.bin
    ruben@www2:~/cplus> ./test.bin
    First Matrix 0xbfc98c3c
    Second Matrix 12345
    49
    50
    51
    52
    53
    1
    2
    3
    4
    5
    ruben@www2:~/cplus>

Notice that the first matrix prints a seemingly random number. That number is actually the memory address of matrix: the array acts like a pointer in the context of cout. The for loop itself will be looked at more closely when we discuss flow control operators. We can not mix data types in an array.
An array is defined as a single data type only. Arrays are indexed starting with zero. You have to know the size of your arrays, otherwise you can walk past the end of them into the undefined sections of your memory. Usually this will cause a segmentation fault, but not always. Arrays have syntax that allows them to be converted to pointers. Pointers are the next section, after we look at arrays, and we will look closely at pointers and arrays together soon.

Arrays can have two dimensions, like this:

    float matrix[4][7];

That declares an array of 4 rows of 7 columns (rows before columns). For example, we can initialize such an array like this:

    float matrix[4][7] = {
        { 2.11, 2.22, 2.33, 2.44, 2.55, 2.66, 2.77 },
        { 3.11, 3.33, 3.33, 3.44, 3.55, 3.66, 3.77 },
        { 4.11, 4.44, 4.33, 4.44, 4.55, 4.66, 4.77 },
        { 5.11, 5.55, 5.33, 5.44, 5.55, 5.66, 5.77 }
    };

or you can drop the inner curly braces and the compiler will do the rest:

    float matrix[4][7] = {
        2.11, 2.22, 2.33, 2.44, 2.55, 2.66, 2.77,
        3.11, 3.33, 3.33, 3.44, 3.55, 3.66, 3.77,
        4.11, 4.44, 4.33, 4.44, 4.55, 4.66, 4.77,
        5.11, 5.55, 5.33, 5.44, 5.55, 5.66, 5.77
    };

Although we stupid humans conceptualize this as columns and rows, in RAM this is stored as a single linear block of memory.
There are a lot of minefields with two dimensional arrays, and this program shows some of them:

    #include <iostream>
    using namespace std;

    int main(int argc, char * argv[]){
        unsigned short int matrix[100] = {'1','2','3','4','5'};
        char matrix2[1000] = "12345";
        float dmatrix[4][7] = {
            { 2.11, 2.22, 2.33, 2.44, 2.55, 2.66, 2.77 },
            { 3.11, 3.33, 3.33, 3.44, 3.55, 3.66, 3.77 },
            { 4.11, 4.44, 4.33, 4.44, 4.55, 4.66, 4.77 },
            { 5.11, 5.55, 5.33, 5.44, 5.55, 5.66, 5.77 }
        };
        float * track;

        cout << "First Matrix " << matrix << endl;
        cout << "Second Matrix " << matrix2 << endl;
        for(int i=0;i<5;i++){
            cout << matrix[i] << endl;
        }
        for(int i=0;i<5;i++){
            cout << static_cast<char>(matrix[i]) << endl;
        }
        for(int i=0;i<100;i++){
            cout << &matrix[i] << endl;
        }
        for(int i=0;i<5;i++){
            cout << "STRING " << reinterpret_cast<void *>(&matrix2[i]) << endl;
        }

        // Walk the whole two dimensional array with a plain float pointer.
        track = *dmatrix;
        float * last = &dmatrix[3][6];
        for(int count = 0; track <= last; track++){
            cout << "Position with track ==>" << count++
                 << "\tMemory Location==>" << track
                 << "\tValue==>" << *track << endl;
        }

        // The declaration of track2 did not survive in this listing;
        // pointing it at the first element is an assumption.
        float * track2 = &dmatrix[0][0];
        for(int count = 0; track2 <= last; track2++){
            cout << "Position with track2 ==>" << count++
                 << "\tMemory Location==>" << track2
                 << "\tValue==>" << *track2 << endl;
        }

        // Sizes of the array, of the pointers, and of a single row.
        size_t size_dmatrix = sizeof(dmatrix);
        size_t size_track = sizeof(track);
        size_t size_track2 = sizeof(track2);
        size_t size_dmatrix_row = sizeof(dmatrix[0]);
        cout << "Size of dmatrix ==>" << size_dmatrix << endl;
        cout << "Size of Track ==>" << size_track << endl;
        cout << "Size of Track2==>" << size_track2 << endl;
        cout << "Size of dmatrix[0] ==>" << size_dmatrix_row << endl;
        return 0;
    }

A particularly special array is the character array, which in C forms the basis for strings. We already know that a single char is a C and C++ built in data type, that we can have an array of chars, and lastly that we can have string literals, which are constant. For review, let's look at code examples of each:

Example A:

    char car = 'A';  // a single character assigned to a char variable. Note
                     // the single quote

Example B:

    char cararray[] = {'A', 'B', 'C', 'D'};
                     // The definition and assignment of an array of 4 chars,
                     // which got its size from the initialization of the
                     // array, and each element of which can be accessed
                     // through indexing, i.e.
                     //     char b = cararray[3]
                     // or through pointers, such as
                     //     char b = *(cararray + 3)

Example C:

    const char *stringo = "ABCD";
                     // This is the assignment of a string constant literal
                     // to a pointer to a char constant. This is a real
                     // string that differs from the above example because
                     // it creates an array of chars not 4 chars long but 5
                     // chars long, because it appends a NULL character to
                     // the end

Example D:

    char dogstring[] = "My Dog Has Fleas\n";
                     // similar to above, with 18 chars assigned to the array,
                     // ending with a NULL char, but not a constant literal

There are some important but subtle differences between "true" string literals and strings formed by manually creating arrays of chars, as shown in the technique of Example B and Example D. We can see an example of this difference in the following code.
Make a new directory and, in GVIM or the editor of your choice, create the following files:

test.cc
------------------------------------------
#include <iostream>
#include "test.h"
using namespace std;

int main(int argc, char * argv[]){
    stringexample();
    return 0;
}
----------------------------------------------

test.h
----------------------------------------------
#ifndef TEST_H
#define TEST_H

void stringexample();

#endif /* TEST_H */
--------------------------------------------------

string_ex.cc
--------------------------------------------------
#include <iostream>
#include "test.h"
using namespace std;

void stringexample()
{
    char test[] = "My Dog has Fleas\n";
    const char * test2 = "My Dog has Fleas\n";
    cout << test;
    test[3] = 'C';
    test[4] = 'a';
    test[5] = 't';
    cout << test;
    cout << test2;
    const_cast<char>(test2[3]) = 'C';
    const_cast<char>(test2[4]) = 'a';
    const_cast<char>(test2[5]) = 't';
    cout << test2;
}
--------------------------------------------------

and create the following makefile

----------------------------------------------------
test.bin : test.o string_ex.o
	g++ -Wall -o test.bin test.o string_ex.o

test.o : test.cc
	g++ -Wall -c test.cc

string_ex.o : string_ex.cc test.h
	g++ -Wall -c string_ex.cc
----------------------------------------------------

Note that the makefile MUST have those TABS and not spaces. Then run 'make'. gcc gives you the following output:

    g++ -Wall -c string_ex.cc
    string_ex.cc: In function ‘void stringexample()’:
    string_ex.cc:15: error: assignment of read-only location ‘*(test2 + 3u)’
    string_ex.cc:15: error: invalid use of const_cast with type ‘char’, which is not a pointer, reference, nor a pointer-to-data-member type
    string_ex.cc:16: error: assignment of read-only location ‘*(test2 + 4u)’
    string_ex.cc:16: error: invalid use of const_cast with type ‘char’, which is not a pointer, reference, nor a pointer-to-data-member type
    string_ex.cc:17: error: assignment of read-only location ‘*(test2 + 5u)’
    string_ex.cc:17: error: invalid use of const_cast with type ‘char’, which is not a pointer, reference, nor a pointer-to-data-member type
    make: *** [string_ex.o] Error 1

This is a very useful error message, and the GCC compiler is now taking the programmer to school. Let's look at the complaints of the compiler about our code. The first complaint gcc makes is about line 15 in string_ex.cc, which is this line:

    const_cast<char>(test2[3]) = 'C';

The compiler is telling us that the array (or string) that test2 points to is read only. That variable is defined on line 8:

    const char * test2 = "My Dog has Fleas\n";

It is obvious from the code that the data is defined as a "const". Less obvious is that the compiler will complain if you do NOT make test2 a "const": assigning a string literal to a plain char pointer is deprecated in C++98 and was removed from the language in C++11. Because of the assignment of the string literal to the char pointer, it must be a const. Therefore, we tried to cast the const away with const_cast, and that fails as well because, as the compiler says to us: invalid use of const_cast with type ‘char’, which is not a pointer, reference, nor a pointer-to-data-member type. We can not just cast away the constness of the string literal assigned to test2. So again we see that arrays and pointers have differences, and arrays and strings have even greater differences. The standard iostream object "cout" will recognize both as strings for printing to standard output.
Incidentally, we can compile this substitution for string_ex.cc

--------------------------------------------------
#include <iostream>
#include "test.h"
using namespace std;

void stringexample(){
    char test[] = "My Dog has Fleas\n";
    const char * test2 = "My Dog has Fleas\n";
    char * test3;
    cout << test;
    test[3] = 'C';
    test[4] = 'a';
    test[5] = 't';
    cout << test;
    cout << test2;
    test3 = const_cast<char *>(test2);
    test3[3] = 'C';
    test3[4] = 'a';
    test3[5] = 't';
    cout << test2;
}
---------------------------------------------

but it creates a segmentation fault on the line:

    test3[3] = 'C';

If you add the compiler option -ggdb to the g++ commands in your makefile, you can trace this error. This UNDERSCORES how dangerous explicit casting can be, even when allowed.

The next aggregate data types that C++ provides are structs and unions. Structs and unions have largely been superseded in C++ by classes. They are inherited from C and are designed as the key multi-type aggregate data type that C uses to create related groups of data, similar to what a database can provide. Through coding algorithms, their importance has grown far greater than that of mere static records, but in this section we will look only at their basic syntax and usage.

The real limitation of arrays is that the data must all be of the same type. structs create a new type which contains many types within. The basic format of a struct is as follows:

Declaration:

    struct struct_type_name {
        data type;
        data type;
        ....
    } [optional object name];

Notice the semicolon on the end of the struct declaration, which is unusual and particular to structs in that it comes after the curly brace.
structs create a new user defined data type, and after their declaration can be used like any other data type:

Example:

    struct birds{
        char species[30];
        char gender[1];
        char color[20];
        int size_in_inches;
        double weight;
        char diet[30];
    };
    birds african_grey;
    birds canary;

or you can make the instances with the declaration:

    struct birds{
        char species[30];
        char gender[1];
        char color[20];
        int size_in_inches;
        double weight;
        char diet[30];
    } canary, african_grey;

Now it might be noted here that there is an important difference between C and C++. In C, you need the struct keyword to create instances of your structs.

C EXAMPLE:

    struct birds african_grey;
    struct birds canary;

Otherwise, you need to use typedef in C:

    typedef struct birds parrot;
    parrot african_grey, scarlet_macaw, conure, monk, bundie;

or even use typedef in the declaration:

    typedef struct {
        char species[30];
        char gender[1];
        char color[20];
        int size_in_inches;
        double weight;
        char diet[30];
    } birds;
    birds african_grey;
    birds canary;

This is a variation of typedef, which we haven't covered. In general typedef creates an alias for a data type of any kind:

    typedef data_type alias;
    typedef int BOOL;
    typedef char buffer[300];

thus the above example:

    typedef struct {
        char species[30];
        char gender[1];
        char color[20];
        int size_in_inches;
        double weight;
        char diet[30];
    } birds;

is not the same as

    struct birds {
        char species[30];
        char gender[1];
        char color[20];
        int size_in_inches;
        double weight;
        char diet[30];
    } canary, african_grey;

One creates a new data type name, "birds", and the other creates new instances of the struct, "canary" and "african_grey". This distinction matters in C.

structs can be initialized in C++ as follows, with the use of the assignment operator and curly braces. Each string must fit in its member array, including the terminating null:

    birds african_grey = { "African Grey Congo", 'M', "Light Grey", 13, 8.34, "Fruit and Seed" };
    birds canary = { "Red Factor Canary", 'M', "Red to Orange", 2, 1.2, "Seed and Grass" };

In C, remember to add the struct keyword. The internal data types are reached by use of the dot operator.

    char * color = canary.color;
    cout << canary.color << endl;
    strcpy(canary.color, "Deep Red"); // can not assign to a char[] directly

You can make a pointer to a struct, and this is often useful:

    birds * canary;

but remember that you have no memory allocated for the members yet. The pointer has to be aimed at a real struct:

    birds *finches, parrot = {"Conure", 'F', "Green", 6, 60, "Oranges and Peanuts"};
    finches = &parrot;

When you use a pointer, access to members is gained using the infix operator "->".

Example:

    struct birds{
        char species[30];
        char gender[1];
        char color[20];
        int size_in_inches;
        double weight;
        char diet[30];
    } *finches, parrots = {"Conure", 'F', "Yellow", 6, 60, "Peanuts and Oranges"};

    finches = &parrots;
    cout << "Your " << finches->species << " is " << finches->color << endl;

By creating arrays of structs, large databases of records can be stored in your program:

    birds finches[100];
    strcpy(finches[0].species, "Zebra Finch");

    birds *parrots[100];       // 100 pointers, but no structs behind them yet
    parrots[0] = &finches[1];  // point at a real struct before using ->
    strcpy(parrots[0]->species, "Amazon");