C++ Syntax

Before exploring how we actually represent data in our C++ programs, I want to introduce a formal discussion of basic C++ syntax, which, like data types, doesn't get enough attention in most standard texts. All programming languages require syntax rules so that compilers can parse the source and create working machine code. Understanding these rules requires a basic understanding of the core components: files, structure, statements, data and operators.

Starting from the top, first you have files, and files usually have naming rules. C++ inherits nearly all of its file structure from C, and actually requires knowing it in greater detail and at an earlier level of expertise. C++, because of its object oriented design, depends heavily on library creation. In fact, even for beginners, most of your work happens on the library level.

C co-developed with the Unix environment, and from it all C and C++ programs inherit the need for a definition of the main function as their starting point. All C programs start with main, and main then pulls in and uses all the other parts of the system's libraries and your own code to produce your completed program. Main is located in your topmost programming file.

Standard Programming Files:

A standard programming file is where the topmost programming takes place. In C, most of your code, especially as a beginner, goes in this file. Most commonly these files have a suffix of either .cc or .C; file.C, for example, is a standard C++ file name. A standard programming file will have several components:

1) Include Preprocessor Directives - These import header files, which supply the definitions of the symbols your program uses but that you are not creating yourself. An include directive might look like this:

    #include <iostream>

which tells the compiler to load the definitions of all the functions and objects defined in the iostream library. Standard C++ libraries are included using the angle bracket notation as above.
They are searched for by your compiler in a set of standard locations which are defined by your compiler and programming environment (something I wouldn't mind understanding better on modern Linux and GNU environments). If you use the syntax

    #include "myheader"

with double quotes, the compiler will look for these headers in the local directory first. C libraries are accessible in C++ and can be included either with the standard C language notation

    #include <header.h>

***Note the .h suffix being included*** or with the C++ version of the same header

    #include <cheader>

which drops the .h suffix and puts a c in front of the name (header here stands for the name of whichever C library you need).

2) Macro and other Preprocessor Compiler Directives - These help set up the conditions under which libraries and header files are brought into your code, to help prevent duplication and to create different versions of a program as might be needed for differing architectures or conditions. The most common Preprocessor Directives are as follows:

    #define
    #endif
    #ifdef
    #ifndef
    #include (as discussed above)

A Macro directive might look like this:

    #ifndef HEAD
    #define HEAD
    #include <iostream>
    #include <string>
    #endif

Skill with these directives, which form a language within a language, is one of the things that separates advanced C and C++ coders from amateurs.
This Macro tells the compiler to include the libraries and symbols for iostream and string from the core C++ library if and ONLY IF the symbol HEAD has not already been defined in the compiler instructions. There are also constants which the compiler itself defines for your program, which include

    __cplusplus
    __DATE__
    __FILE__
    __LINE__
    __STDC__
    __TIME__

__DATE__ and __TIME__ are the date and time the program is compiled.

3) Original Code and runtime directives starting with main. C++ has added a new programming directive called the "using" directive, which brings the symbols of a namespace into your code. Namespaces give finer grained control over which symbols your code recognizes in a specified space. It's really important and in many ways was a long time coming to the C family of languages. Most importantly, it prevents you from accidentally stepping on library symbols or words that you might not have been aware of, or that programmers after you might not be aware of. It also allows you to define the same symbol in multiple locations of your code without stepping on your own toes.

So today's modern C++ main program files might look something like this:

    #ifndef TOP_H
    #include <iostream>
    #define TOP_H
    #endif

    #ifndef INTARRAY_H
    #include "intarray.h"
    #define INTARRAY_H
    #endif

    using namespace std;

    int main( int argc, char* argv[] )
    {
        //YOUR PROGRAMMING CODE
    }

There is a catch to the namespace usage though. It might very well be that your library files, especially if you are creating them yourself, which you will in C++, have the using directive. If so, every file that includes them inherits that directive, and your code will likely come to depend on it.

Header Files:

Header files normally have a .h suffix. file1.h would be an example of a header file for C or C++. These are the files that are being included by your #include preprocessor directives. These files are often distributed with a program and you can examine them.
They are useful for discovering how the programming objects in libraries are defined and used, and often programmers will point them out to you as a form of documentation, which itself is a practice I'm not happy about because many programmers mistake them for a substitute for real documentation.

Library Files:

After researching this, it has occurred to me that there is an ambiguity about the structure of C and C++ programming files. Professional programs generally have the header files described above, but there is no proper name for the coding files that are associated with the headers and which produce object binary files and static or dynamically linked libraries. For a beginner this is all confusing, and the lack of proper nomenclature makes it all the harder to learn. A little bit of compiler theory is needed to understand the file structure and binary construction of your program. For now, I just want to point out that the programming objects declared in your header file for use in your programming have to have source code to produce the actual machine code that is represented by the symbols in your header file. Those library source files will not have the main function. But the compiler can be asked to create what are called object files, which are partially processed binary code for later inclusion in your program. When we look closer at the gcc compiler we will examine these object files and learn why they are so important. What is important to say, however, is that in C++, because of its object orientation and its emphasis on creating Application Programming Interfaces (APIs), most of the C++ coding you will do takes place in these library C++ source files (which I will refer to as Library Code from here on out). There are two kinds of Library Code files that you will work with: those which you create, and those which you borrow from your system for inclusion in your programs.
User defined: User defined library files define the code to create working programming objects that are normally declared in your matching header files. These programming source code files look just like your main programming file except they don't have the main function. Your topmost main programming file is dependent on these library code files. The code they produce has to be linked into your program by your compiler.

Standard C++ or Packaged third party: These are the standard libraries, either in source or in object files, that define standard language needs and are usually found somewhere in /lib or /usr/lib on your system.

Standard C++ File Creation:

All our C++ programs have to be created with a standard text editor. The code that the compiler works on, also known as translation units, is straight ASCII text. You can NOT use a word processor. My preferred text editor is VIM or GVIM, which is a derivative of VI. VI is the standard text editor on Unix like systems and there are many tutorials for it around the internet. Other editors include EMACS, and then there are C++ working environments like Anjuta, which I strongly discourage. I discourage the Integrated Development Environments because with GNU and Unix like systems, your OS is your integrated environment, and I believe one should learn to use the standard tools that are on your GNU/Linux system.

A standard C++ file needs to have at least one function defined. We will look at functions (also called methods) more closely later, but a new programmer should get used to looking at them from the start, since every C++ program begins inside a function called main. Functions are defined by the following structure:

    "return type" "function name (the symbol)" ( Argument list )
    {
        Statements that end in semi-colon;
    }

Functions do not take a semi-colon after the closing curly brace.
The main function looks like this

    int main(int argc, char * argv[]){
        return 0;
    }

A realistic C++ main program file, including preprocessor directives, would look as follows

    #include <iostream>
    using namespace std;

    int main(int argc, char * argv[]){
        return 0;
    }

The curly braces form a block in which the coder can add really as many instructions as they choose to. These blocks of statements are seen in many C++ syntax structures including functions, if statements, for loops and other structures. While the above is a minimal C++ source file structure, generally most of the heavy lifting of your code takes place outside of main, in user defined functions which you create, as well as in objects. A more realistic first program skeleton might well look something like this:

    #include <iostream>
    using namespace std;

    void oxygen(){ cout << "oxygen()\n";}
    void hydrogen(){ cout << "hydrogen()\n";}
    void helium(){ cout << "helium()\n";}
    void neon(){ cout << "neon()\n";}

    int main(int argc, char * argv[]){
        oxygen(); hydrogen(); helium(); neon();
        return 0;
    }

Here we can see the declaring and defining of 4 user defined functions that sit outside of main and are executed only when called. The four functions are called oxygen, hydrogen, helium and neon. And notice that we are using the standard namespace called std.

Statement Structure:

All C and C++ statements (although not all syntax) end with a semi-colon. You can even put two statements on a single line, separated by a semi-colon, but in general this isn't recommended. Statements are constructed with Data, Operators and Keywords. C++ has a larger set of Keywords than C.

Keywords:

Keywords are any symbols that Standard C++ recognizes as having instructional meaning, that is, they tell the compiler to do something. The Keywords in C++ are as follows, and learning the exact meaning of all the keywords is essential to learning C++.
These are inherited from C:

    auto      break     case      char      const     continue  default   do
    double    else      enum      extern    float     for       goto      if
    int       long      register  return    short     signed    sizeof    static
    struct    switch    typedef   union     unsigned  void      volatile  while

These are the extended set added to C++

    asm               bool          catch      class     const_cast
    delete            dynamic_cast  explicit   false     friend
    inline            mutable       namespace  new       operator
    private           protected     public     reinterpret_cast
    static_cast       template      this       throw     true
    try               typeid        typename   using     virtual
    wchar_t

and most C++ Compilers also recognize the following Keywords

    and     and_eq  bitand  bitor   compl   not
    not_eq  or      or_eq   xor     xor_eq

Keywords are completely reserved and can not be used as symbols by any user defined variables in your program. They are exclusive to the language and compilers.

There are other important predefined symbols that C++ uses as well. These are not strictly exclusive to the Language; however, overloading them or using them as symbols for variables is a very bad idea. There are a lot of them, but some of them include

    cin      cout      endl     include  INT_MAX  INT_MIN  iomanip
    iostream main      npos     NULL     RAND_MAX std      string

not to mention the Macros like __DATE__ and __TIME__

Operators:

Operators are very much like functions or methods in that they define processes, taking in arguments and returning outputs (and having side effects). In the C Language, Operators are immutable. You can't change their meaning. In C++ many of them can be overloaded, that is, you can define your own meaning for them. A lot of C++ study involves discussing the overloading of Operators. All Operators, as they do in Mathematics, have precedence and associativity. For example, in arithmetic:

    4 x 3 - 10 = 2 and not -28.

That is because multiplication has a higher precedence than subtraction, so 4 x 3 is evaluated first. Associativity decides the order among operators of equal precedence, and for these arithmetic operators it is left to right.
A complete list of C++ operators is considerable and as follows:

┌────────────────────────┬─────────────────────────────────────────┬────────────────┐
│ Operator               │ Type                                    │ Associativity  │
├────────────────────────┼─────────────────────────────────────────┼────────────────┤
│ ::                     │ binary scope resolution                 │                │
│ ::                     │ unary scope resolution                  │                │
├────────────────────────┼─────────────────────────────────────────┤                │
│ ()                     │ parentheses                             │                │
│ []                     │ array subscript                         │                │
│ .                      │ member selection via object             │                │
│ ->                     │ member selection via pointer            │ left to right  │
│ ++                     │ unary postincrement                     │                │
│ --                     │ unary postdecrement                     │                │
│ typeid                 │ run-time type information               │                │
│ dynamic_cast< type >   │ run-time type-checked cast              │                │
│ static_cast            │ compile-time type-checked cast          │                │
│ reinterpret_cast       │ cast for non-standard conversions       │                │
│ const_cast             │ cast away const-ness                    │                │
├────────────────────────┼─────────────────────────────────────────┼────────────────┤
│ ++                     │ unary preincrement                      │                │
│ --                     │ unary predecrement                      │                │
│ +                      │ unary plus                              │                │
│ -                      │ unary minus                             │                │
│ !                      │ unary logical negation                  │                │
│ ~                      │ unary bitwise complement                │                │
│ ( type )               │ C-style unary cast                      │ right to left  │
│ sizeof                 │ determine size in bytes                 │                │
│ &                      │ address                                 │                │
│ *                      │ dereference                             │                │
│ new                    │ dynamic memory allocation               │                │
│ new[]                  │ dynamic array allocation                │                │
│ delete                 │ dynamic memory deallocation             │                │
│ delete[]               │ dynamic array deallocation              │                │
├────────────────────────┼─────────────────────────────────────────┼────────────────┤
│ .*                     │ pointer to member via object            │                │
│ ->*                    │ pointer to member via pointer           │                │
├────────────────────────┼─────────────────────────────────────────┤                │
│ *                      │ multiplication                          │                │
│ /                      │ division                                │                │
│ %                      │ modulus                                 │                │
├────────────────────────┼─────────────────────────────────────────┤                │
│ +                      │ addition                                │                │
│ -                      │ subtraction                             │                │
├────────────────────────┼─────────────────────────────────────────┤                │
│ <<                     │ bitwise left shift                      │                │
│ >>                     │ bitwise right shift                     │                │
├────────────────────────┼─────────────────────────────────────────┤                │
│ <                      │ relational less than                    │                │
│ <=                     │ relational less than or equal to        │ left to right  │
│ >                      │ relational greater than                 │                │
│ >=                     │ relational greater than or equal to     │                │
├────────────────────────┼─────────────────────────────────────────┤                │
│ ==                     │ relational is equal to                  │                │
│ !=                     │ relational is not equal to              │                │
├────────────────────────┼─────────────────────────────────────────┤                │
│ &                      │ bitwise AND                             │                │
├────────────────────────┼─────────────────────────────────────────┤                │
│ ^                      │ bitwise exclusive OR                    │                │
├────────────────────────┼─────────────────────────────────────────┤                │
│ |                      │ bitwise inclusive OR                    │                │
├────────────────────────┼─────────────────────────────────────────┤                │
│ &&                     │ logical AND                             │                │
├────────────────────────┼─────────────────────────────────────────┤                │
│ ||                     │ logical OR                              │                │
├────────────────────────┼─────────────────────────────────────────┼────────────────┤
│ ?:                     │ ternary conditional                     │                │
├────────────────────────┼─────────────────────────────────────────┤                │
│ =                      │ assignment                              │                │
│ +=                     │ addition assignment                     │                │
│ -=                     │ subtraction assignment                  │                │
│ *=                     │ multiplication assignment               │                │
│ /=                     │ division assignment                     │ right to left  │
│ %=                     │ modulus assignment                      │                │
│ &=                     │ bitwise AND assignment                  │                │
│ ^=                     │ bitwise exclusive OR assignment         │                │
│ |=                     │ bitwise inclusive OR assignment         │                │
│ <<=                    │ bitwise left shift assignment           │                │
│ >>=                    │ bitwise right shift assignment          │                │
├────────────────────────┼─────────────────────────────────────────┼────────────────┤
│ ,                      │ comma                                   │ left to right  │
└────────────────────────┴─────────────────────────────────────────┴────────────────┘

We will walk through this complete list of operators later.

Data Assignments in C++ Statements:

We've now bootstrapped enough background information to examine data assignment and variables in C++ statements, and to compile small programs to explore C++ data better. C and C++ are known as compiled, typed languages. What this means is that the C source code itself does not directly run as a program, unlike scripting languages like Perl, Python, Ruby, Bourne shell scripting and such. The program has to be compiled, linked with the standard binary libraries, and output into a binary file that the machine and operating system can run.

The TYPED aspect of C++ means that all variables have to be defined as some specific data type, one of the data types we discussed earlier (such as int, char, short, long, double, float). Functions and Methods are data and have a type. It's a code type. But for the purposes of syntax they are typed according to the kind of data that they return.

When we create variables in C and C++ they have to be typed. In addition, there are three phases in variable creation, which can be either in separate statements or combined into one statement. The three phases are:

    Declaration
    Definition (needed for functions and methods)
    Initialization

We declare a variable by giving its type and its name (its symbol). To initialize it, we include an assignment of data.
For example:

    int i;

declares a variable i which is of type int.

    float j;

declares a variable j of type float, as float is defined for your specific architecture and environment. We can then assign an integer to i and a float to j using the following syntax.

    #include <iostream>
    using namespace std;

    int i; float j;  //Declarations of variables

    int main(int argc, char * argv[])
    {
        i = 10;
        j = 5.6;
        return 0;
    }

Where we declare a variable is important and determines where your variable is visible in your program. Complex programs have so many variables in them that it is critical to restrict their access and usage to the smallest subset of the program that is practical. This is accomplished through two mechanisms, Scope and Namespace. We've seen namespace already with the using directive:

    using namespace std;

imports into our section of programming all the symbols of the standard namespace. Scope, on the other hand, restricts variable access to specific blocks of our programs, not just the symbols. When we declare our variables outside of main, as we did in the program above (file1.cc), then the scope is considered to be global and the variables i and j are available throughout our program.