C++ - the fastest integer type? - 【StackMirror】|c++|performance

I've been benchmarking an algorithm; it's not necessary to know the details. The main components are a buffer (a raw array of integers) and an indexer (an integer used to access the elements in the buffer).

The fastest types for the buffer seem to be unsigned char and both the signed and unsigned versions of short, int and long. However, char/signed char was slower by a factor of 1.07.

For the indexer, there was no difference between signed and unsigned types. However, int and long were 1.21x faster than char and short.

Is there a type that should be used by default when considering performance and not memory consumption?

NOTE: The operations used on the elements of the buffer and the indexer were assignment, increment, decrement and comparison.
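The setup described above can be sketched as a kernel templated on both types. This is only an illustration under stated assumptions, not the algorithm from the question: `run_kernel`, `Elem` and `Index` are hypothetical names, and the loop body is made up to use just the four operations listed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel: Elem is the buffer element type, Index is the indexer type.
// It uses only assignment, increment, decrement and comparison.
template <typename Elem, typename Index>
long long run_kernel(std::vector<Elem>& buffer) {
    // The indexer must be able to represent the buffer size.
    const Index n = static_cast<Index>(buffer.size());
    assert(static_cast<std::size_t>(n) == buffer.size());

    long long matches = 0;
    for (Index i = 0; i < n; ++i) {
        buffer[i] = 1;        // assignment
        ++buffer[i];          // increment: 2
        ++buffer[i];          // increment: 3
        --buffer[i];          // decrement: 2
        if (buffer[i] == 2)   // comparison
            ++matches;
    }
    return matches;
}
```

Instantiating it as, say, `run_kernel<signed char, short>` and `run_kernel<int, unsigned int>` keeps everything identical except the two types being compared.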

2012-04-04 16:49
by NFRCR

How are you measuring this? Is it on a system with next to no other processing running? Are you counting it using timers, or are you using a JTAG connection to a dev board and counting CPU cycles? - DevNull 2012-04-04 16:52

Yes it's important to know the details, because you're probably actually measuring memory bandwidth and type conversion at some point - Jem 2012-04-04 16:53

Have a look at stdint.h. You might be interested in the int_fast32_t type (or whatever size you prefer) - Mysticial 2012-04-04 16:53

I'm using std::clock() to measure the clocks taken to run the algorithm. The algorithm runs long enough for the results obtained with std::clock() to be valid - NFRCR 2012-04-04 16:55

Is your algorithm multi-threaded? - Branko Dimitrijevic 2012-04-04 17:36

No, it's single threaded - NFRCR 2012-04-04 17:46

Generally the biggest win comes from caching.

If your data values are small enough that they fit in 8 bits then you can fit more of the data in the CPU cache than if you used ints and wasted 3 bytes/value. If you are processing a block of data you get a huge speed advantage for cache hits.
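The footprint argument can be made concrete with a little arithmetic. This is a minimal sketch: the 32 KiB figure is an assumed, typical L1 data cache size (not a number from this thread), and `footprint_bytes`, `fits_in_assumed_l1` and `check_footprint_example` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// ASSUMPTION: a 32 KiB L1 data cache. Check the actual size for your CPU.
constexpr std::size_t kAssumedL1Bytes = 32 * 1024;

// Bytes occupied by `count` elements of type Elem stored contiguously.
template <typename Elem>
constexpr std::size_t footprint_bytes(std::size_t count) {
    return count * sizeof(Elem);
}

// True if the whole buffer would fit in the assumed cache size.
template <typename Elem>
constexpr bool fits_in_assumed_l1(std::size_t count) {
    return footprint_bytes<Elem>(count) <= kAssumedL1Bytes;
}

// 20000 values: 20000 bytes as 8-bit elements, 80000 bytes as 32-bit elements.
void check_footprint_example() {
    assert(fits_in_assumed_l1<std::uint8_t>(20000));
    assert(!fits_in_assumed_l1<std::int32_t>(20000));
}
```

The same 20000 values fit in the assumed cache as bytes but not as 32-bit integers, which is the effect the paragraph above describes.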

The type of the index is less important: as long as it fits in a CPU register (i.e. don't try using a long long on an 8-bit CPU), it will have the same speed.

Edit: it's also worth mentioning that measuring speed is tricky. You need to run the algorithm several times to allow for caching, and you need to watch what else is running on the CPU and even what other hardware might be interrupting. Speed differences of 10% might be considered noise unless you are very careful.
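One way to act on this advice, sketched under assumptions: do a warm-up run, repeat the measurement, and keep the fastest time. Taking the minimum is a common convention rather than something this answer prescribes, and `best_time_ns` is a hypothetical helper name. The sketch uses `std::chrono::steady_clock` instead of `std::clock()`.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>

// Hypothetical helper: calls `work` once as a warm-up, then `repeats` more
// times, and returns the fastest wall-clock time in nanoseconds.
template <typename Work>
long long best_time_ns(Work work, int repeats) {
    assert(repeats > 0);
    using clock = std::chrono::steady_clock;

    work();  // warm-up run, so caches are populated before timing starts

    long long best = std::numeric_limits<long long>::max();
    for (int r = 0; r < repeats; ++r) {
        const auto start = clock::now();
        work();
        const auto stop = clock::now();
        const long long ns =
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
        best = std::min(best, ns);  // keep the least-disturbed run
    }
    return best;
}
```

The work being timed has to produce a result the compiler cannot discard (for example by writing to a `volatile` variable), otherwise the loop may be optimized away and the timing means nothing.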

2012-04-04 16:53
by Martin Beckett

Could it be that for the indexer unsigned int is faster than unsigned short, because on x86 the array [] operator expects an unsigned int? - NFRCR 2012-04-04 16:57

The OP is noticing that the larger datatypes are faster. Wouldn't that work against your argument that the difference is caused by cache? - Mysticial 2012-04-04 17:02

@Mysticial "The fastest types for the buffer seem to be unsigned char". The OP is also talking about 7% differences, unless you are really careful this is difficult to measure on a general purpose OS+P - Martin Beckett 2012-04-04 17:06

@NFRCR - any decent compiler will handle indexing arrays very well, anything that matters is done at compile time - Martin Beckett 2012-04-04 17:07

Oh right, I overlooked the first part of the question. I retract my statement. : - Mysticial 2012-04-04 17:10

If I read the question correctly, we have the following situation: unsigned char < int < signed char (where "<" means "faster"). This can't be explained by caching. Your point about measurement is valid though - Branko Dimitrijevic 2012-04-04 17:34

I once measured a 4x speedup from replacing char with int in a crc8 implementation (32-bit system). I think the native type (with size == size of pointer) is fastest when no caching is involved - Andriy Tylychko 2015-07-15 17:11

It depends heavily on the underlying architecture. Usually the fastest data types are those that are word-wide. In my experience with IA32 (x86-32), data types smaller or bigger than a word incur penalties, sometimes even more than one memory read for a single datum.

Once the data is in CPU registers, its length usually doesn't matter (as long as the whole value fits in one register); what matters is which operations you perform on it. Of course floating point operations are the most costly; the fastest are adding, subtracting (which is also comparing), bit-wise operations (shifts and the like), and logical operations (and, or...).

2012-04-04 18:09
by m0skit0

Generally use int for local temporary variables (unless you need them to be wider in 64bit, then use size_t or whatever). Loading/storing to char or short is near-free. (movzx / movsx loads don't take an ALU uop at all, and are handled in the load port.) So arrays should use narrow types to minimize cache consumption - Peter Cordes 2016-03-01 04:35
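The pattern in the comment above (narrow types for storage, int for locals) might look like the following. This is a sketch only: `count_above` and `check_count_above` are hypothetical names, and whether the load compiles to a movzx depends on the compiler and target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts the elements greater than `threshold`.
// Storage is narrow (one byte per element); the locals are plain int.
int count_above(const std::vector<std::uint8_t>& data, int threshold) {
    int count = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        const int value = data[i];  // widened (zero-extended) to int on load
        if (value > threshold)
            ++count;
    }
    return count;
}

// Small self-check of the helper above.
void check_count_above() {
    const std::vector<std::uint8_t> sample{1, 200, 50, 255};
    assert(count_above(sample, 100) == 2);
}
```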

There are no promises about which type is faster or slower. int is supposed to represent the natural word length of the machine, whatever that might mean, so it might go faster. Or slower, depending upon other factors.

2012-04-04 16:54
by Robᵩ

Can you please comment on some of those factors that might make word-wide types slower? - m0skit0 2012-04-04 18:10

Not with any authority - Robᵩ 2012-04-04 18:13

@m0skit0: Bus speed, caching, or if the CPU can't natively handle that type, probably other factors - Mooing Duck 2012-04-04 18:13

Bus speed and cache should be optimized for word-size types in a well designed architecture. The CPU must excel at that data type because I said "word-size" types - m0skit0 2012-04-05 10:52

@m0skit0: Bus speed and cache are often optimized for multiples of the word-size types rather than actually the word-size type, and even if it were 1:1, using data types that are half the word size causes them to load twice as fast through the cache/bus. On my pc, I can load ~8 chars in about the same speed as a single int - Mooing Duck 2017-09-21 22:29

The header cstdint provides typedefs of fundamental integral types or extended integral types.

Check the "fast" variants (int_fastN_t / uint_fastN_t); there are "fast" variants for the other sizes (e.g. char-sized) as well.

Header: cstdint

My suggestion: uint_fast8_t

http://www.cplusplus.com/reference/cstdint/

You may also need to know about the architecture of the machine you are using.
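Because the "fast" typedefs are chosen per platform, the simplest way to see what they mean on a given machine is to print their widths. A minimal sketch; `width_bits` and `print_fast_type_widths` are hypothetical helper names, and the output differs between platforms, so none is shown.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Width of a type in bits on the current platform.
template <typename T>
constexpr std::size_t width_bits() {
    return sizeof(T) * CHAR_BIT;
}

// The standard only guarantees a minimum width for the "fast" types.
static_assert(width_bits<std::uint_fast8_t>() >= 8, "at least 8 bits");
static_assert(width_bits<std::int_fast16_t>() >= 16, "at least 16 bits");
static_assert(width_bits<std::int_fast32_t>() >= 32, "at least 32 bits");

// Prints what the implementation actually picked.
void print_fast_type_widths() {
    assert(width_bits<std::int_fast64_t>() >= 64);
    std::printf("int           : %zu bits\n", width_bits<int>());
    std::printf("long          : %zu bits\n", width_bits<long>());
    std::printf("void*         : %zu bits\n", width_bits<void*>());
    std::printf("uint_fast8_t  : %zu bits\n", width_bits<std::uint_fast8_t>());
    std::printf("int_fast16_t  : %zu bits\n", width_bits<std::int_fast16_t>());
    std::printf("int_fast32_t  : %zu bits\n", width_bits<std::int_fast32_t>());
    std::printf("int_fast64_t  : %zu bits\n", width_bits<std::int_fast64_t>());
}
```

Note that a "fast" typedef reflects the library implementer's choice for the platform; it is not a guarantee that the type is fastest for a particular workload.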

2017-05-14 08:59
by rulf

-1

As was said, int in most cases represents the machine word. So int has the same length as a processor register, and no additional work is needed to load an int into a register and then store it back to RAM.

If you use char, it is 4 times smaller than int (on x86 systems) and also 4 times smaller than a processor register. So before it is stored to RAM it has to be truncated. As a result, more time is used.

Furthermore, a processor with 32-bit registers can't perform operations on 8-bit numbers. If a char is added to a char, both are loaded into registers, so each register holds 8 bits of char value and 24 bits of garbage. The two 32-bit values are added and the result is then truncated back to 8 bits. The reason char and short take the same time is that the same number of additional operations is needed, while for int no additional operations are done.

I would like to add that, for the processor, int and unsigned int are exactly the same, as it treats them in the same way. For some compilers int and long int may also be the same.

So the fastest integer type is the one whose length is the same as the machine word. If you use types smaller than the machine word, the program will run slower.

2012-04-04 17:06
by Seagull

None of this is actually true on most modern machines. Machine words are 64 bits, but int is 32. Intel architecture has instructions to load and store bytes, and to operate on bytes, so truncation has 0 cost. Intel architecture also supports direct operations on bytes. And the "fastest integer type" will depend on what you're doing; on a modern machine, memory accesses and locality generally play a predominant role - James Kanze 2012-04-04 18:06

movzx / movsx zero or sign-extending loads to integer registers are nearly free. On Intel CPUs, they don't even take an ALU uop; it's all done in the load unit. And even if you do have high garbage in a register, that doesn't stop you from doing an add/sub or whatever and then storing the low 8 of the result, which won't be affected by high garbage - Peter Cordes 2016-03-01 04:30
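The point about high garbage can be checked with ordinary integer arithmetic: carries in an addition only propagate upward, so the low 8 bits of the sum depend only on the low 8 bits of the operands. A minimal sketch that models a 32-bit register with `std::uint32_t`; `add_low_bytes` and `check_high_garbage_is_harmless` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Adds two "registers" at full width and keeps only the low 8 bits,
// like a 32-bit add followed by a byte store.
std::uint8_t add_low_bytes(std::uint32_t reg_a, std::uint32_t reg_b) {
    const std::uint32_t wide_sum = reg_a + reg_b;  // wraps modulo 2^32
    return static_cast<std::uint8_t>(wide_sum);    // truncate to the low byte
}

void check_high_garbage_is_harmless() {
    // Clean operands: 200 + 100 = 300, and 300 mod 256 = 44.
    assert(add_low_bytes(200u, 100u) == 44);
    // Same low bytes with arbitrary values in the upper 24 bits.
    assert(add_low_bytes(0xDEADBE00u | 200u, 0x12345600u | 100u) == 44);
}
```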