The C integer mystery
Consider the following C program:
#include <stdio.h>
int main()
{
unsigned long bigNum = 50000000000; // 50 billion
printf("%lu", bigNum);
}
Will bigNum print out what we expect?
Let’s run this as a 64-bit program, built with GCC on macOS.
$ ./a.out
50000000000
Great, this looks correct. The program prints what we expect.
Now, let’s run this as a 64-bit program again, but this time with MSVC on Windows 10.
C:\test>main
2755359744
C:\test>
Huh? Why did the value get messed up?
What just happened?
The macOS compiler (GCC) and the Windows 10 compiler (MSVC) defined unsigned long to be different sizes when compiling their respective 64-bit programs.
When it comes to 64-bit computing, MSVC follows the LLP64 data model while GCC uses the LP64 data model.
This simply means that the MSVC definition of a long type was 4 bytes while the GCC definition was 8 bytes.
With GCC, an unsigned long could hold values up to 2^64 - 1 (18,446,744,073,709,551,615); with MSVC, it could only hold values up to 2^32 - 1 (4,294,967,295).
So with MSVC, our value of 50,000,000,000 went past the defined max value of 4,294,967,295.
Because an unsigned value that exceeds the maximum is reduced modulo (max + 1), the value “wrapped around” 11 times and finally ended up as:
50,000,000,000 % (4,294,967,295 + 1)
or
2755359744.
So is MSVC the problem?
No, not at all. The problem is with the code.
We’d have the same problem with GCC too if we were to compile our code as a 32-bit GCC program.
In the table below (from the GCC manual), we can see that long types also differ in size between GCC 32-bit and 64-bit programs.
C Integer Type | Size in bytes (32-bit GCC program) | Size in bytes (64-bit GCC program) | Range |
---|---|---|---|
char | 1 | 1 | Same as signed char |
short | 2 | 2 | Same as signed short |
int | 4 | 4 | Same as signed int |
long | 4 | 8 | Same as signed long |
long long | 8 | 8 | Same as signed long long |
signed char | 1 | 1 | -128 to 127 |
signed short | 2 | 2 | -32,768 to 32,767 |
signed int | 4 | 4 | -2,147,483,648 to 2,147,483,647 |
signed long | 4 | At least 4, but could be 8 | -2,147,483,648 to 2,147,483,647 (32-bit representation) or -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 (64-bit representation) |
signed long long | 8 | 8 | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
unsigned char | 1 | 1 | 0 to 255 |
unsigned short | 2 | 2 | 0 to 65,535 |
unsigned int | 4 | 4 | 0 to 4,294,967,295 |
unsigned long | 4 | At least 4, but could be 8 | 0 to 4,294,967,295 (32-bit representation) or 0 to 18,446,744,073,709,551,615 (64-bit representation) |
unsigned long long | 8 | 8 | 0 to 18,446,744,073,709,551,615 |
The lesson here is that, since the size of long depends on the compiler (which further depends on the operating system and processor), we can’t make assumptions about its size.
How do we check an integer type’s actual size?
We can use sizeof to check each C integer type.
Below is an example program that we can use for comparing the output among GCC 32-bit, GCC 64-bit, and MSVC 64-bit programs.
#include <stdio.h>
int main ()
{
printf("char %zu\n", sizeof(char));
printf("short %zu\n", sizeof(short));
printf("int %zu\n", sizeof(int));
printf("long %zu\n", sizeof(long));
printf("long long %zu\n", sizeof(long long));
printf("signed char %zu\n", sizeof(signed char));
printf("signed short %zu\n", sizeof(signed short));
printf("signed int %zu\n", sizeof(signed int));
printf("signed long %zu\n", sizeof(signed long));
printf("signed long long %zu\n", sizeof(signed long long));
printf("unsigned char %zu\n", sizeof(unsigned char));
printf("unsigned short %zu\n", sizeof(unsigned short));
printf("unsigned int %zu\n", sizeof(unsigned int));
printf("unsigned long %zu\n", sizeof(unsigned long));
printf("unsigned long long %zu\n", sizeof(unsigned long long));
}
NOTE: sizeof actually returns the size as size_t, so we use %zu as the format specifier. Technically the definition of size_t also depends on libraries such as the GNU C Library.
Output of 32-bit executable built with x86-64 gcc 11.2, using -m32 compilation option:
From online result: https://godbolt.org/z/obxcqGhMG
char 1
short 2
int 4
long 4
long long 8
signed char 1
signed short 2
signed int 4
signed long 4
signed long long 8
unsigned char 1
unsigned short 2
unsigned int 4
unsigned long 4
unsigned long long 8
Output of 64-bit executable built with x86-64 gcc 11.2, using -m64 compilation option:
From online result: https://godbolt.org/z/zej1a36rT
char 1
short 2
int 4
long 8
long long 8
signed char 1
signed short 2
signed int 4
signed long 8
signed long long 8
unsigned char 1
unsigned short 2
unsigned int 4
unsigned long 8
unsigned long long 8
Output of 64-bit executable (MSVC for x64) on Windows 10:
C:\test>test
char 1
short 2
int 4
long 4
long long 8
signed char 1
signed short 2
signed int 4
signed long 4
signed long long 8
unsigned char 1
unsigned short 2
unsigned int 4
unsigned long 4
unsigned long long 8
C:\test>c:\sigcheck\sigcheck.exe test.exe
Sigcheck v2.82 - File version and signature viewer
Copyright (C) 2004-2021 Mark Russinovich
Sysinternals - www.sysinternals.com
C:\test\test.exe:
Verified: Unsigned
Link date: 2:00 PM 9/16/2021
Publisher: n/a
Company: n/a
Description: n/a
Product: n/a
Prod version: n/a
File version: n/a
MachineType: 64-bit
C:\test>
The above results confirm that the size of long varies between:
- 32-bit versus 64-bit compilation options, passed to the GCC compiler
- 64-bit GCC programs and 64-bit MSVC programs
In other words, long is not portable.
Alright, so how do we fix this?
An obvious solution is to choose a more appropriate C integer type.
#include <stdio.h>
int main()
{
unsigned long long bigNum = 50000000000; // 50 billion
printf("%llu", bigNum);
}
This should work for 64-bit and 32-bit programs with any C99-conforming compiler, because the standard guarantees long long to be at least 64 bits.
An alternative solution would be to use the explicit fixed-width types available through the <inttypes.h> header (which includes <stdint.h>, where the types themselves are defined):
- int8_t - signed 8-bit integer
- uint16_t - unsigned 16-bit integer
- int32_t - signed 32-bit integer
- uint64_t - unsigned 64-bit integer
- … and so on.
These types explicitly tell us how many bits represent the type, leaving no mystery as to their actual size.
These sizes will not change across compilers either, which addresses our portability problem.
We can re-purpose our sizeof program to confirm this:
#include <stdio.h>
#include <inttypes.h>
int main ()
{
printf("int8_t %zu\n", sizeof(int8_t));
printf("int16_t %zu\n", sizeof(int16_t));
printf("int32_t %zu\n", sizeof(int32_t));
printf("int64_t %zu\n", sizeof(int64_t));
printf("uint8_t %zu\n", sizeof(uint8_t));
printf("uint16_t %zu\n", sizeof(uint16_t));
printf("uint32_t %zu\n", sizeof(uint32_t));
printf("uint64_t %zu\n", sizeof(uint64_t));
}
Output of 32-bit executable built with x86-64 gcc 11.2, using -m32 compilation option:
From online result: https://godbolt.org/z/MTd8nf1n1
int8_t 1
int16_t 2
int32_t 4
int64_t 8
uint8_t 1
uint16_t 2
uint32_t 4
uint64_t 8
Output of 64-bit executable built with x86-64 gcc 11.2, using -m64
compilation option:
From online result: https://godbolt.org/z/TPrq6TYYW
int8_t 1
int16_t 2
int32_t 4
int64_t 8
uint8_t 1
uint16_t 2
uint32_t 4
uint64_t 8
Output of 64-bit executable built with MSVC for x64 on Windows 10:
C:\test>test
int8_t 1
int16_t 2
int32_t 4
int64_t 8
uint8_t 1
uint16_t 2
uint32_t 4
uint64_t 8
C:\test>c:\sigcheck\sigcheck.exe test.exe
Sigcheck v2.82 - File version and signature viewer
Copyright (C) 2004-2021 Mark Russinovich
Sysinternals - www.sysinternals.com
C:\test\test.exe:
Verified: Unsigned
Link date: 12:01 AM 9/17/2021
Publisher: n/a
Company: n/a
Description: n/a
Product: n/a
Prod version: n/a
File version: n/a
MachineType: 64-bit
C:\test>
The above 3 results are all the same, which should give us confidence that the definitions from <inttypes.h> are portable.
References
C Data Types - Wikipedia
https://en.wikipedia.org/wiki/C_data_types
Integer Overflow - Wikipedia
https://en.wikipedia.org/wiki/Integer_overflow
64-bit Data Models - Wikipedia
https://en.wikipedia.org/wiki/64-bit_computing#64-bit_data_models
GNU Compiler Collection (GCC) Primitive Data Types
https://www.gnu.org/software/gnu-c-manual/gnu-c-manual.html#Primitive-Types
GNU C Library Important Data Types
https://www.gnu.org/software/libc/manual/html_node/Important-Data-Types.html