5.5: Representing strings
- Page ID
- 40736
Related issues sometimes come up with strings. First, remember that C strings are null-terminated. When you allocate space for a string, don’t forget the extra byte at the end.
Also, the letters and numbers in C strings are encoded in ASCII. The ASCII codes for the digits “0” through “9” are 48 through 57, not 0 through 9. The ASCII code 0 is the NUL character that marks the end of a string. And the ASCII codes 1 through 9 are special characters used in some communication protocols. ASCII code 7 is a bell; on some terminals, printing it makes a sound.
The ASCII code for the letter “A” is 65; the code for “a” is 97. Here are those codes in binary:
65 = b0100 0001 97 = b0110 0001
A careful observer will notice that they differ by a single bit. And this pattern holds for the rest of the letters; the sixth bit (counting from the right) acts as a “case bit”, 0 for upper-case letters and 1 for lower case letters.
As an exercise, write a function that takes a string and converts from lower-case to upper-case by flipping the sixth bit. As a challenge, you can make a faster version by reading the string 32 or 64 bits at a time, rather than one character at a time. This optimization is made easier if the length of the string is a multiple of 4 or 8 bytes.
If you read past the end of a string, you are likely to see strange characters. Conversely, if you write a string and then accidentally read it as an int or float, the results will be hard to interpret.
For example, if you run:
char array[] = "allen"; float *p = array; printf("%f\n", *p);
You will find that the ASCII representation of the first 8 characters of my name, interpreted as a double-precision floating point number, is 69779713878800585457664.