5.4: Character Data Type


Overview of Character Data Type

The character data type represents individual characters. Characters comprise a variety of symbols such as the letters of the alphabet (both upper and lower case), the numeral digits (0 to 9), punctuation, etc. In C++, character data is stored in a one-byte field as an integer value. Because a byte typically consists of 8 bits, this one-byte field has 2^8 or 256 possibilities, for example the non-negative values 0 to 255.

Most microcomputers use the ASCII (American Standard Code for Information Interchange, pronounced "ask-key") Character Set, which has established values for 0 to 127. For the values of 128 to 255 they usually use an Extended ASCII Character Set. When we hit the capital A on the keyboard, the character is stored as a byte with the bit pattern equal to the integer 65. When the byte is sent from memory to the monitor, the integer value 65 is converted into the symbol of the capital A for display.

The character data type attributes include:

C++ Reserved Word	char
Represent	Single characters
Size	1 byte
Normal Signage	Implementation-defined for plain char (signed on many platforms); use unsigned char for positive values only
Domain (Values Allowed)	Values from 0 to 127 as shown in the standard ASCII Character Set, plus values 128 to 255 from Extended ASCII Character Set
C++ syntax rule	Single quote marks - Example: 'A'

Notice that signed char and unsigned char are both 1 byte, while a wide character (wchar_t) is 2 or 4 bytes, depending on the platform.

DATA TYPE	SIZE (IN BYTES)	RANGE
signed char	1	-128 to 127
unsigned char	1	0 to 255
wchar_t	2 or 4	1 wide character

Since some languages cannot represent all of their alphabet's characters in an 8-bit value, wide characters were created to solve this issue. In 1989, the International Organization for Standardization began work on the Universal Character Set (UCS), a multilingual character set that could be encoded using either a 16-bit (2-byte) or 32-bit (4-byte) value. These larger values required a data type larger than 8 bits to store the new character values in memory. Thus the term wide character was used to differentiate them from traditional 8-bit character data types.

Character arithmetic in C++

As noted above, the range of a character is either -128 to 127 or 0 to 255. This has to be kept in mind while doing character arithmetic. Look at this example to understand better.

// A C++ program to demonstrate character
// arithmetic in C++.
#include <iostream>
using namespace std;
  
int main()
{
    char ch = 65;
    // The numerical value is 65, BUT...this is declared as a char, so it outputs the char that is 65
    // See https://www.ascii-code.com/ - scroll down to 65 and look at the 5th column
    cout << ch << endl;
    
    // Now we add zero and C++ will see it as an integer value - it gets promoted.
    cout << ch + 0 << endl;
    
    // We add 32 but force it back to a char with "char(ch + 32)" 65 + 32 = 97 
    // Look again at the https://www.ascii-code.com/ table for 97
    cout << char(ch + 32) << endl;
    return 0;
}

Output:

A
65
a

When ch is printed on its own, the character is printed. When ch is used with the ‘+’ operator, the result is different: the char operand is implicitly promoted to ‘int’, so the expression ch + 0 has type int and prints as a number. So to conclude, in character arithmetic, conversion of a char to ‘int’ is implicit, while conversion of the result back to ‘char’ must be explicit, as in char(ch + 32).

Adapted from:
"C++ Data Types" by Harsh Agarwal, Geeks for Geeks is licensed under CC BY-SA 4.0
"Character Data Type" by Kenneth Leroy Busbee, (Download for free at http://cnx.org/contents/303800f3-07f...93e8948c5@22.2) is licensed under CC BY 4.0
"Character arithmetic in C and C++" by Parveen Kumar, Geeks for Geeks is licensed under CC BY-SA 4.0