Engineering LibreTexts

5.4: Character Data Type

  • Overview of Character Data Type

    The character data type represents individual (single) characters. Characters comprise a variety of symbols such as the alphabet (both upper and lower case), the numeral digits (0 to 9), punctuation, etc. Most computers store character data in a one-byte field as an integer value. Because a byte typically consists of 8 bits, this one-byte field has 2^8 or 256 possibilities, using the non-negative values 0 to 255.

    Most microcomputers use the ASCII (American Standard Code for Information Interchange, pronounced "ask-key") Character Set, which has established values for 0 to 127. For the values 128 to 255 they usually use an Extended ASCII Character Set. When we hit the capital A on the keyboard, the keyboard sends a byte with the bit pattern equal to the integer 65. When the byte is sent from memory to the monitor, the integer value 65 is converted into the symbol of the capital A for display.

    The character data type attributes include:

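    C++ Reserved Word: char
    Represents: Single characters
    Size: 1 byte
    Normal Signage: Implementation-defined for plain char (signed on some compilers, unsigned on others); the ASCII values 0 to 127 are non-negative either way
    Domain (Values Allowed): Values from 0 to 127 as shown in the standard ASCII Character Set, plus values 128 to 255 from the Extended ASCII Character Set when the byte is treated as unsigned
    C++ syntax rule: Single quote marks - Example: 'A'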

    Notice that signed char and unsigned char are both 1 byte, while a wide character (wchar_t) is 2 or 4 bytes, depending on the platform.

    DATA TYPE        SIZE (IN BYTES)    RANGE
    signed char      1                  -128 to 127
    unsigned char    1                  0 to 255
    wchar_t          2 or 4             1 wide character

    Since some languages cannot represent all of their alphabet's characters in an 8-bit value, wide characters were created to solve this issue. In 1989, the International Organization for Standardization began work on the Universal Character Set (UCS), a multilingual character set that could be encoded using either a 16-bit (2-byte) or 32-bit (4-byte) value. These larger values required a datatype larger than 8 bits to store the new character values in memory. Thus the term wide character was used to differentiate them from traditional 8-bit character datatypes.

    Character arithmetic in C++

    As noted above, a character's value ranges from -128 to 127 (signed) or from 0 to 255 (unsigned). Keep this range in mind when doing character arithmetic. The following example illustrates the behavior.

    // A C++ program to demonstrate character
    // arithmetic in C++.
    #include <iostream>
    using namespace std;

    int main()
    {
        char ch = 65;
        // The numerical value is 65, but ch is declared as a char, so cout prints the character whose code is 65.
        // See https://www.ascii-code.com/ - scroll down to 65 and look at the 5th column
        cout << ch << endl;

        // Adding zero promotes ch to an int, so the integer value is printed.
        cout << ch + 0 << endl;

        // Adding 32 gives 65 + 32 = 97; char(ch + 32) converts the result back to a char.
        // Look again at the https://www.ascii-code.com/ table for 97
        cout << char(ch + 32) << endl;
        return 0;
    }

    Output:

    A
    65
    a 
    

    When ch is sent to cout by itself, its character is printed. When ch appears in an expression with the '+' operator, it is implicitly promoted to an int, so the result prints as a number. In short, in character arithmetic the conversion from char to int is implicit, while converting the result back to char requires an explicit cast.

    Adapted from:
    "C++ Data Types" by Harsh Agarwal, Geeks for Geeks is licensed under CC BY-SA 4.0
    "Character Data Type" by Kenneth Leroy Busbee, (Download for free at http://cnx.org/contents/303800f3-07f...93e8948c5@22.2) is licensed under CC BY 4.0
    "Character arithmetic in C and C++" by Parveen Kumar, Geeks for Geeks is licensed under CC BY-SA 4.0