const char ASTERISK = '*';
char ch;
char letter;
letter = 'a';
cin >> ch;
cout << letter;
The char data type sets aside one byte to store a character
value.
Recall that one byte equals 8 bits,
where each bit contains either a zero or a one.
We will number the bits of a byte as shown in this diagram:
b7 | b6 | b5 | b4 | b3 | b2 | b1 | b0 |
What values do the 8 bits have when they represent various characters? For example, what pattern of zeroes and ones is used to represent, say, the letter A? A common encoding scheme for character data is ASCII, which stands for American Standard Code for Information Interchange. This standard describes what bit patterns represent various symbols, digits, letters, and so forth. By and large, all the computer systems you use encode characters using ASCII. (We will discuss why this is a bit of a simplification later in this lab.)
Before we examine an ASCII code chart, we must realize that
any pattern of zeroes and ones in a byte is also a binary number
and thus has a numeric value!
For example, the bit pattern of all zeroes
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
has the value 0. The next four patterns have the values 1, 2, 3, and 4:
0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
and the pattern
0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
has the value 127.
In the ASCII code, bit b7 is always a zero. Since this leaves us with 7 bits (b6 through b0) for our code, this means there are 128 possible patterns. (2^7 = 128.) An ASCII code chart will thus have 128 entries; because we start numbering from zero, the entries will be numbered 0 through 127.
Please click on the ASCII Code Chart link to open a new window that will display the chart.
Examine the entries in the ASCII Code Chart. Notice that the
128 entries are divided into four sections of 32 characters.
The first section, in green, consists of the control characters.
Many of these characters (for example, the horizontal tab, HT)
cannot be displayed directly on a screen.
The next section, in light brown, consists of various symbols,
punctuation marks, and numerals. The third section, in blue,
consists mainly of uppercase letters of the alphabet. The fourth and final
section, in light blue, consists mainly of the lowercase letters of the
alphabet.
Each entry has a (very small red) number identifying it. For example, the
uppercase letter A (in the blue third section) has the number
65. This number represents the value of the bit pattern
that represents the letter A.
Here is the exciting part: in C++ a char is stored as its numeric code,
so on an ASCII system the character 'A' and the number 65 compare as equal!
Thus, in the following if statement:
if ('A'==65)
the boolean expression (aka predicate) would evaluate to true.
Note in particular the very first ASCII character: the NULL character (abbreviated NUL). The bit pattern for the NULL character is all zeroes and thus the NULL character has a value of zero. Note also that the blank character is called space (abbreviated SP) in the ASCII code and has a value of 32.
In ASCII,
the digits '0' to '9' have the codes 48 through 57.
Thus the boolean expression
'0'<'1'
evaluates to true because 48 is less than 49.
Similarly the relations
'2'<'3'
through
'8'<'9'
will also evaluate to true.
In ASCII,
the uppercase characters 'A' to 'Z' have the codes 65 through 90.
Thus the boolean expression
'A'<'B'
will evaluate to true.
In ASCII,
the lowercase letters have the codes 97 through 122.
Thus the boolean expression
'a'<'b'
will evaluate to true.
Because of the ASCII encoding scheme,
an uppercase letter is less than a lowercase letter.
Thus the boolean expression
'A'<'a'
will evaluate to true.
Note also that digits (numerals) are less than letters.
Thus the boolean expression
'9'<'A'
will evaluate to true.
Because char values are just a form of integer data,
it is possible to do arithmetic with them. For example,
'C'-'A'
evaluates to 2, because 67-65 equals two.
Can you predict what the following would print?
char ch;
ch = 'a' + 10;
cout << ch;
It would print the character k, because 'k' is the tenth
character after 'a'.
Or, put another way, because 97 (the ASCII value of 'a')
plus 10 equals 107, which is the ASCII value of 'k'.
Usually the C++ compiler can determine from context if we
are using a char item as an ASCII character or as a numeric quantity.
Sometimes, however, we need to be explicit in a program that we want to
use an ASCII character as a numeric quantity.
In those instances, we can use the C++ type cast operator int.
It is used much like a function.
Thus int('0') is the value 48,
int('A') is the value 65,
and int('a') is the value 97.
Similarly, the C++ type cast operator char
will produce a character whose code is the given integer.
Hence char(97) is 'a', char(48) is '0', etc.
Thus the following code would first print the letter A on one line,
the number 65 on the second line,
the number 37 on the third line,
and the symbol % (percent sign) on the fourth line.
int ac = 37;
char ch = 'A';
cout << ch << endl; // prints A
cout << int(ch) << endl; // prints 65
cout << ac << endl; // prints 37
cout << char(ac) << endl; // prints %
Function | Purpose |
---|---|
tolower(ch) | If ch is uppercase, the function returns the corresponding lowercase letter. Otherwise, ch is returned. |
toupper(ch) | If ch is lowercase, the function returns the corresponding uppercase letter. Otherwise, ch is returned. |
isalpha(ch) | Returns true if ch is an upper or lower case letter. Otherwise false is returned. |
isdigit(ch) | Returns true if ch is a digit. Otherwise false is returned. |
islower(ch) | Returns true if ch is a lowercase letter. Otherwise false is returned. |
isupper(ch) | Returns true if ch is an uppercase letter. Otherwise false is returned. |
isspace(ch) | Returns true if ch is a space (SP), newline (LF), formfeed (FF), carriage return (CR), tab (HT), or vertical tab (VT). Otherwise false is returned. |
bool isdigit(char ch);
If the input consists of several blanks followed by the character j, the statement
cin >> ch;
causes, by default, the blanks to be skipped
and the character j to be read into ch.
We can ask C++ not to skip whitespace on input by setting a format state flag. The following C++ statement would cause the flag skipws to be turned off (it defaults to on, which means that whitespace is skipped over):
cin.unsetf(ios::skipws);
After the flag is turned off, if the input again begins with a blank, the statement
cin >> ch;
will cause ch to contain a blank
since whitespace is no longer skipped.
Turning on and off a format state flag is a hassle.
C++ provides an easier mechanism to allow one to input one character
at a time with whitespace treated the same way as any other character.
The get() function reads the very next character in the input stream
without skipping any whitespace.
Using get() instead of the >> operator in the above example,
if the input again begins with a blank, the statement
cin.get(ch);
will read a blank into the variable ch.
$ script lab13ex4.log
$ pr -n -t -e4 cla13a.cpp
$ c++ cla13a.cpp -o recase
$ recase < $CLA/cla13a.cpp
$ exit
(Be sure to properly exit the script session!)
Note: this section is optional reading.
If you are pressed for time, you may skip to
the next section (C-style Character Strings).
If you are French and your name is Thévenod, or you are German and your name is Günter, or you are Italian and your name is Pezzè (note the special characters in each name), clearly ASCII is an inadequate encoding scheme---it simply does not have the characters or symbols necessary to write many foreign words and names. Extra characters and symbols can be grafted atop the ASCII code by relaxing the assumption that bit b7 always be zero. A discussion of other character sets is beyond the scope of this closed lab. However, if you are interested in the subject, you are encouraged to follow the following link: Character sets & Encodings.
But what if you need to write in Chinese or Japanese? Languages such as these have an enormous number of characters. UNICODE is a code that can accommodate even such enormous character sets. UNICODE began as a 16-bit encoding scheme with 65536 characters (2^16 = 65536) and has since been extended beyond 16 bits; text is usually stored using an encoding of it such as UTF-8 or UTF-16. The use of UNICODE is now common. (For example, strings in the languages Python 3 and Java hold UNICODE characters.) If you are interested in the subject, you are encouraged to visit http://www.unicode.org for more information.
cout << "Enter the test score:";
or
cout << "x=";
Thus a list of characters contained in double quotes
is a C string, also sometimes called a
C-style character string. Because C strings
are stored as an array, the C string
"x="
is stored as an array of three characters: x in position 0, = in position 1, and \0 in position 2.
A C string is terminated by the special character '\0' called the NULL character. The NULL character acts as a sentinel and marks the end of a C-style character string.
Note the difference between a C string and a character. For example, the C string "x" is stored as an array of two characters (x followed by \0),
whereas the character 'x' is stored as a single character.
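Character arrays are used to store C strings. For example:
char lastName[10] = "Jackson";
will cause the array lastName to contain the seven characters of Jackson in positions 0 through 6, followed by the NULL character in position 7.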
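If instead we had tried to initialize lastName with a string bigger
than can be stored in the array, for example:
char lastName[10] = "Washington";
then the ten letters fill the entire array and leave no room for the terminating NULL character.
Standard C++ compilers reject this declaration with an error message.
C compilers accept it and store the ten letters with no terminator,
so any code that searches for the NULL character will run past the end of the array.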
"x="
on the answer sheet, one would type:
x = \0
0 1 2
char line[80];
char ch;
int charCount = 0;
// Read a line of text into the character
// array "line". If the line is too long,
// only read the first 79 characters
cin.get(ch);
while ( ch!='\n' && charCount<79 )
{
line[charCount] = ch;
charCount++;
cin.get(ch);
}
// Place the NULL terminating character in the last position
line[charCount] = '\0';
Alternately, the extraction (input) operator >>
can be used to read in a C string.
As when reading numeric and character data,
beginning whitespace characters (blanks, newline, tab, etc.) are ignored;
next, all characters up to the first whitespace character
are read and placed in the variable.
When the first whitespace character is encountered,
reading terminates and the NULL character is placed in the string.
For example, if we have:
char name[20];
cout << "Please enter the name: ";
cin >> name;
and our input is: Mouse, Mickey
then internally, name will contain the six characters Mouse, (including the comma) followed by the NULL character.
The same result will occur no matter how many blanks, tabs, etc. preceded Mouse. As you can
see, the first whitespace terminates the string. We must remember that whitespace acts as a
separator of data items. For example, if we read
cin >> name >> test;
where test is an int variable and the text after the first whitespace is not a number (as with Mickey in the input above),
then name will contain only the first word
and the variable test will not have any value read into it.
Output of C Strings
A C string can be printed one character at a time,
for example using the put() function.
Alternately, the C string can be printed all at once using cout.
Assume name is a character string initialized as follows:
char name[15] = "Washington";
The output statement
cout << endl << name;
will cause the string "Washington" to be printed in the
leftmost 10 positions of a new line.
Assuming the header file, iomanip, has been added to a program,
the instruction:
cout << setw(15) << name;
will cause "Washington" to be printed
right justified in a field of width 15 with 5 preceding blanks.
To left justify the name, we can use
cout << left << setw(15) << name;
This will cause "Washington" to be printed left-justified
with 5 blanks after the name.
Left justification stays in effect until it is changed.
To resume right justification, we can either use the right manipulator
(cout << right;) or turn off left justification.
The following statement does the latter:
cout.unsetf(ios::left);
$ script lab13ex6.log
$ pr -n -t -e4 cla13b.cpp
$ c++ cla13b.cpp
$ a.out
...the data you enter...
$ exit
(Be sure to properly exit the script session!)
Comparison of C Strings
The function strcmp() has been provided to compare two C strings,
say str1 and str2.
If str1 < str2 (based on a character by character comparison),
a value less than 0 is returned.
If str1 and str2 are the same, a value of 0 is returned.
If str1 > str2, a value greater than 0 is returned.
The function makes character comparisons of the elements in
str1 and str2 starting with the 0th character.
It stops when it finds characters that are not equal
or when it reaches the end of one of the strings.
Examples:
strcmp("A","B") returns < 0 (negative)
strcmp("James","Jami") returns < 0 (negative) because 'e' < 'i'
strcmp("135", "24") returns < 0 (negative) because '1' < '2'
strcmp("ABCD","ABC") returns > 0 (positive)
strcmp("ABC","ABCD") returns < 0 (negative)
strcmp("89", "89") returns 0 (zero)
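Example:
char name[15];
strcpy(name, "Mr. Mouse");
will cause the variable name to contain the nine characters Mr. Mouse in positions 0 through 8, followed by the NULL character in position 9.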
If the source string is shorter than the destination string, the remaining characters in the destination string remain unchanged. If the source string is longer than the destination string, storage locations following those allocated to the destination string will be overwritten. This may cause surprising (and hard to debug) results.
For example, strlen("Mr. Mouse") returns the value 9.
char str1[5];
char str2[5]="mom";
int length;
strcpy(str1, "Joe");
length = strlen(str1);
strcpy(str1, "Joseph");
length = strlen(str1);
int MyStrLen(char str[]);
Write a main program to test your function. The main program should contain a sentinel-controlled loop that
The loop should terminate when the sentinel string END is read.
(That is, the three input characters "END".)
Remember to use the function strcmp()
to test for the terminating condition.
Use as your source file name "cla13c.cpp".
As the first line in the source file, have a comment
(much like the programs $CLA/cla13a.cc and $CLA/cla13d.cpp)
containing your name and CSCI 2170 section number.
Use as your executable file name "length".
You are to submit a listing of the source code,
the compilation results, and a sample run of the program using input
of your choosing.
Something like the following UNIX commands will let you create what is required:
$ script lab13ex9.log
$ pr -n -t -e4 cla13c.cpp
$ c++ cla13c.cpp -o length
$ length
...the data you enter...
$ exit
(Be sure to properly exit the script session!)
$ man string
On the answer sheet, give the names of five additional C string functions not described above.
Submit the log files you have created in Lab 13 by typing
$ handin lab13log lab13ex4.log lab13ex6.log lab13ex9.log
From the PC you are working on, you must also submit the answer sheet (AnswerSheet13.pdf) using the following directions:
Congratulations!
You have finished Lab 13.