Advanced C Arrays
In C, an array is formed by laying out all the elements contiguously in memory. The square bracket syntax can be used to refer to the elements in the array. The array as a whole is referred to by the address of the first element which is also known as the "base address" of the whole array.
{
int array[6] = {0}; // initialize so the reads below are well defined
int sum = 0;
sum += array[0] + array[1]; // refer to elements using []
}
[Figure: array drawn as six contiguous boxes, indices 0 through 5, labeled array[0], array[1], array[2], ...]
The array name acts like a pointer to the first element, in this case an (int*). The programmer can refer to elements in the array with the simple [ ] syntax, such as array[1]. This scheme works by combining the base address of the whole array with the index to compute the address of the desired element. It just requires a little arithmetic. Each element takes up a fixed number of bytes, which is known at compile time. So, using 0-based indexing, element n is at an offset of (n * element_size) bytes from the base address of the whole array.
address of nth element = address_of_0th_element + (n * element_size_in_bytes)
The [ ] syntax handles this address arithmetic for you, but it's useful to know what it's doing. The [ ] takes the integer index, multiplies by the element size, adds the resulting offset to the array base address, and finally dereferences the resulting pointer to get to the desired element.
{
int intArray[6];
intArray[3] = 13;
}
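[Figure: intArray drawn as six boxes, indices 0 through 5, at byte offsets 0, 4, 8, 12, 16, 20 (offset in bytes = n * elem_size). Assume sizeof(int) = 4, i.e. each array element takes up 4 bytes. The value 13 sits at index 3, at 12 bytes of offset, which is where (intArray+3) points.]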
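The offset rule above can be checked by doing the arithmetic by hand. The following is a minimal sketch, not part of the original text: element_address(), element_offset() and check_offsets() are hypothetical helper names, and the code makes no assumption about sizeof(int) beyond what the compiler reports.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: computes the address of element n "by hand",
   using byte arithmetic on a char* and then casting back to int*. */
int *element_address(int *base, size_t n) {
    char *byte_base = (char *)base;               /* view the array as raw bytes */
    return (int *)(byte_base + n * sizeof(int));  /* base + n * element_size */
}

/* Hypothetical helper: offset in bytes of element n from the array start. */
size_t element_offset(int *base, size_t n) {
    return (size_t)((char *)&base[n] - (char *)base);
}

/* Checks that the manual computation agrees with the [] syntax. */
void check_offsets(void) {
    int intArray[6] = {0};
    intArray[3] = 13;
    assert(element_address(intArray, 3) == &intArray[3]);
    assert(*element_address(intArray, 3) == 13);
    assert(element_offset(intArray, 3) == 3 * sizeof(int));
}
```

Calling check_offsets() from main() should run silently. On a machine where sizeof(int) is 4, element_offset(intArray, 3) is the 12 bytes shown in the figure.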
'+' Syntax
In a closely related piece of syntax, a + between a pointer and an integer does the same offset computation, but leaves the result as a pointer. The square bracket syntax gives the nth element, while the + syntax gives a pointer to the nth element. So the expression (intArray + 3) is a pointer to the integer intArray[3]. (intArray + 3) is of type (int*), while intArray[3] is of type int. The two expressions differ only in whether the pointer is dereferenced. So the expression (intArray + 3) is exactly equivalent to the expression (&(intArray[3])). In fact, those two probably compile to exactly the same code. They both represent a pointer to the element at index 3.
Any [] expression can be written with the + syntax instead. We just need to add in the pointer dereference. So intArray[3] is exactly equivalent to *(intArray + 3). For most purposes, it's easiest and most readable to use the [] syntax. Every once in a while the + is convenient if you need a pointer to the element instead of the element itself.
Pointer++ Style -- strcpy()
If p is a pointer to an element in an array, then (p+1) points to the next element in the array. Code can exploit this using the construct p++ to step a pointer over the elements in an array. It doesn't help readability any, so I can't recommend the technique, but you may see it in code written by others.
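As a small illustration of the + syntax and of stepping with p++, here is a hedged sketch. The names sum_with_pointer() and check_plus_syntax() are made up for this example and do not come from any library.

```c
#include <assert.h>

/* Hypothetical helper: sums n ints by stepping a pointer with p++
   instead of indexing with []. */
int sum_with_pointer(const int *p, int n) {
    const int *end = p + n;   /* points one past the last element */
    int sum = 0;
    while (p < end) {
        sum += *p++;          /* read the current element, then advance p */
    }
    return sum;
}

/* Checks the equivalences between the [] and + syntax. */
void check_plus_syntax(void) {
    int intArray[6] = {1, 2, 3, 13, 5, 6};
    int *p = intArray;
    assert((intArray + 3) == &intArray[3]);   /* same pointer */
    assert(*(intArray + 3) == intArray[3]);   /* same element */
    assert(*(p + 1) == intArray[1]);          /* p+1 is the next element */
    assert(sum_with_pointer(intArray, 6) == 30);
}
```

The indexed loop (for i = 0; i < n; i++) computes the same sum and is usually easier to read, which is the point made above.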
(This example was originally inspired by Mike Cleron.) There's a library function called strcpy(char* destination, const char* source) which copies the bytes of a C string from one place to another. Below are four different implementations of strcpy(), written in order from most verbose to most cryptic. In the first one, the normally straightforward while loop is actually sort of tricky, because it must ensure that the terminating null character is copied over. The second removes that trickiness by moving the assignment into the test. The last two are cute (and they demonstrate using ++ on pointers), but not really the sort of code you want to maintain. Among the four, I think strcpy2() is the best stylistically. With a smart compiler, all four will compile to basically the same code with the same efficiency.
// Unfortunately, a straight while or for loop won't work.
// The best we can do is use a while (1) with the test
// in the middle of the loop.
void strcpy1(char dest[], const char source[]) {
int i = 0;
while (1) {
dest[i] = source[i];
if (dest[i] == '\0') break; // we're done
i++;
}
}
// Move the assignment into the test
void strcpy2(char dest[], const char source[]) {
int i = 0;
while ((dest[i] = source[i]) != '\0') {
i++;
}
}
// Get rid of i and just move the pointers.
// Relies on the precedence of * and ++.
void strcpy3(char dest[], const char source[])
{
while ((*dest++ = *source++) != '\0') ;
}
// Rely on the fact that '\0' is equivalent to FALSE
void strcpy4(char dest[], const char source[])
{
while (*dest++ = *source++) ;
}
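The comments on strcpy3() and strcpy4() lean on two facts: *p++ parses as *(p++), and a '\0' character tests as false. The sketch below illustrates both under those assumptions; check_precedence() and count_chars() are hypothetical names used only for this example.

```c
#include <assert.h>

/* Shows the difference between *p++ and (*p)++. */
void check_precedence(void) {
    char text[] = "ab";

    char *p = text;
    char c = *p++;            /* same as *(p++): reads 'a', then p advances */
    assert(c == 'a');
    assert(p == text + 1);
    assert(text[0] == 'a');   /* the character itself is unchanged */

    char *q = text;
    (*q)++;                   /* increments the char q points to: 'a' -> 'b' */
    assert(q == text);        /* q itself did not move */
    assert(text[0] == 'b');
}

/* Hypothetical helper: counts the characters before the terminator,
   using the same "'\0' is false" test as strcpy4(). */
int count_chars(const char *s) {
    int n = 0;
    while (*s++) {
        n++;
    }
    return n;
}
```

The parenthesized form (*q)++ is rarely what a copy loop wants, which is why the unparenthesized *dest++ idiom works in strcpy3().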
Pointer Type Effects
Both [ ] and + implicitly use the compile time type of the pointer to compute the element_size which affects the offset arithmetic. When looking at code, it's easy to assume that everything is in the units of bytes.
int *p;
p = p + 12; // at run-time, what does this add to p? 12?
The above code does not add the number 12 to the address in p-- that would increment p by 12 bytes. The code above increments p by 12 ints. Each int probably takes 4 bytes, so at run time the code will effectively increment the address in p by 48. The compiler figures all this out based on the type of the pointer.
Using casts, the following code really does just add 12 to the address in the pointer p. It works by telling the compiler that the pointer points to char instead of int. The size of char is defined to be exactly 1 byte (or whatever the smallest addressable unit is on the computer). In other words, sizeof(char) is always 1. We then cast the resulting (char*) back to an (int*). The programmer is allowed to cast any pointer type to any other pointer type like this to change the code the compiler generates.
p = (int*) ( ((char*)p) + 12);
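To see the units at work, the following sketch measures how far a pointer actually moves. It is an illustration only: byte_distance(), advance_bytes() and check_pointer_units() are hypothetical names. The byte count passed to advance_bytes() is assumed to be a multiple of sizeof(int), so that the resulting int* stays properly aligned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes between two pointers into the same array. */
ptrdiff_t byte_distance(const int *from, const int *to) {
    return (const char *)to - (const char *)from;
}

/* Hypothetical helper: advances an int pointer by a raw byte count,
   in the style of the cast example above.
   Assumes nbytes is a multiple of sizeof(int). */
int *advance_bytes(int *p, size_t nbytes) {
    return (int *)((char *)p + nbytes);
}

void check_pointer_units(void) {
    int ints[16] = {0};
    int *p = ints;

    /* p + 12 moves by 12 ints, i.e. 12 * sizeof(int) bytes
       (48 bytes if an int is 4 bytes). */
    assert(byte_distance(p, p + 12) == (ptrdiff_t)(12 * sizeof(int)));

    /* Going through char* moves by bytes instead of ints. */
    assert(advance_bytes(p, 3 * sizeof(int)) == p + 3);
}
```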
Arrays and Pointers
One effect of the C array scheme is that the compiler does not distinguish meaningfully between arrays and pointers-- they both just look like pointers. In the following example, the value of intArray is a pointer to the first element in the array, so it's an (int*). The value of the variable intPtr is also (int*), and it is set to point to a single integer i. So what's the difference between intArray and intPtr? Not much as far as the compiler is concerned. They are both just (int*) pointers, and the compiler is perfectly happy to apply the [] or + syntax to either. It's the programmer's responsibility to ensure that the elements referred to by a [] or + operation really are there. Really it's just the same old rule that C doesn't do any bounds checking. C thinks of the single integer i as just a sort of degenerate array of size 1.
{
int intArray[6];
int *intPtr;
int i;
intPtr = &i;
intArray[3] = 13; // ok
intPtr[0] = 12; // odd, but ok. Changes i.
intPtr[3] = 13; // BAD! There is no integer reserved here!
}
[Figure: intArray drawn as six boxes, indices 0 through 5, with 13 stored at (intArray+3). intPtr points to the single integer i, which holds 12. (intPtr+3) points three ints past i, where the stray 13 is written.]
These bytes exist, but they have not been explicitly reserved. They are the bytes which happen to be adjacent to the memory for i. They are probably being used to store something already, such as a smashed looking smiley face. The 13 just gets blindly written over the smiley face. This error will only be apparent later when the program tries to read the smiley face data.
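The valid cases from the example above can be exercised with a short sketch. store_at() and check_array_vs_pointer() are hypothetical names for this illustration. The out-of-bounds write is deliberately left as a comment, because running it would corrupt whatever memory sits next to i.

```c
#include <assert.h>

/* Hypothetical helper: writes through a pointer using [] syntax.
   The caller must guarantee that index lies within the memory
   the pointer refers to. C will not check. */
void store_at(int *ptr, int index, int value) {
    ptr[index] = value;
}

void check_array_vs_pointer(void) {
    int intArray[6] = {0};
    int i = 0;
    int *intPtr = &i;

    store_at(intArray, 3, 13);   /* array name passed as an (int*) */
    assert(intArray[3] == 13);

    store_at(intPtr, 0, 12);     /* i treated as an array of size 1 */
    assert(i == 12);

    intPtr = intArray;           /* a pointer can refer to an array too */
    store_at(intPtr, 3, 99);
    assert(intArray[3] == 99);

    /* store_at(&i, 3, 13) would compile, but it writes past i. */
}
```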
Array Names Are Const
One subtle distinction between an array and a pointer is that the pointer which represents the base address of an array cannot be changed in the code. The array base address behaves like a const pointer. The constraint applies to the name of the array where it is declared in the code-- the variable ints in the example below.
{
int ints[100];
int *p;
int i;
ints = NULL; // NO, cannot change the base addr ptr
ints = &i; // NO
ints = ints + 1; // NO
ints++; // NO
p = ints; // OK, p is a regular pointer which can be changed
// here it is getting a copy of the ints pointer
p++; // OK, p can still be changed (and ints cannot)
p = NULL; // OK
p = &i; // OK
foo(ints); // OK (possible foo definitions are below)
}
Array parameters are passed as pointers. The following two definitions of foo look different, but to the compiler they mean exactly the same thing. It's preferable to use whichever syntax is more accurate for readability. If the pointer coming in really is the base address of a whole array, then use [ ].
void foo(int arrayParam[]) {
arrayParam = NULL; // Silly but valid. Just changes the local pointer
}
void foo(int *arrayParam) {
arrayParam = NULL; // ditto
}
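As a rough check that the two parameter forms behave identically, the sketch below uses two hypothetical functions, set_first() and set_first_ptr(), which differ only in how the parameter is written. Both can be stored in the same function pointer type, and in both the assignment to arrayParam changes only the local copy of the pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical function using the [] parameter form. */
void set_first(int arrayParam[]) {
    arrayParam[0] = 42;   /* writes into the caller's array */
    arrayParam = NULL;    /* changes only the local pointer */
}

/* Hypothetical function using the * parameter form. */
void set_first_ptr(int *arrayParam) {
    arrayParam[0] = 42;   /* writes into the caller's array */
    arrayParam = NULL;    /* changes only the local pointer */
}

void check_array_params(void) {
    int ints[3] = {1, 2, 3};

    /* Both functions have the type void (int *). */
    void (*f)(int *) = set_first;
    void (*g)(int *) = set_first_ptr;

    f(ints);              /* ints is passed as a pointer to ints[0] */
    assert(ints[0] == 42);

    g(ints + 1);          /* a pointer into the middle works as well */
    assert(ints[1] == 42);
    assert(ints[2] == 3); /* untouched */
}
```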