Windows C++ nanosecond timing?


2

Is there a way in C++ on Windows to measure time in nanoseconds?

All I can find are Linux solutions.

2012-04-04 21:56
by user997112
See Boost.Chrono - ildjarn 2012-04-04 21:59
QueryPerformanceCounter is Windows-specific, though Boost is as good and also portable - Mooing Duck 2012-04-04 21:59
You can not accurately measure execution time on most systems beyond units of seconds - AJG85 2012-04-04 22:23
@AJG85 On Windows you can get down to ~10ns resolution in WinN - Mooing Duck 2012-04-04 22:26
If you're using VS11 you should use the chrono library, and you should go and upvote this issue on MS connect - bames53 2012-04-04 22:26
I highly doubt that you need nanoseconds. That's mostly like writing down the results of a physical experiment with 20 digits. If you use nanoseconds, you have to watch out for every memory access, because a full cache miss can add 30ns for a single memory access if your access pattern is totally random - Lothar 2012-04-04 22:38
@bames53 : Thanks for the bug link, upvoted - ildjarn 2012-04-04 23:13
The Fastest timing resolution system thread at SO discusses this matter as well - Arno 2012-07-31 14:00
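The Boost.Chrono and `<chrono>` suggestions in the comments can be sketched as follows. This is a minimal, portable example using `std::chrono::steady_clock` (a monotonic clock), not code taken from any of the commenters. The result is expressed in nanoseconds, but the clock's real resolution may be coarser; older Visual C++ releases in particular had coarse chrono clocks (the MS Connect issue bames53 mentions), so check on your own toolchain.

```cpp
#include <chrono>
#include <cstdint>

// Measure how long a callable takes, in nanoseconds, with the monotonic
// steady_clock. The value is in ns units; the true tick may be coarser.
template <class F>
std::int64_t elapsed_ns(F&& f)
{
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    f();
    const auto stop = clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

// Nominal tick period of steady_clock in nanoseconds, taken from its
// compile-time period ratio. This is a declared unit, not a measured resolution.
constexpr double steady_period_ns()
{
    using P = std::chrono::steady_clock::period;
    return 1e9 * static_cast<double>(P::num) / static_cast<double>(P::den);
}
```

Usage: `auto ns = elapsed_ns([] { /* code under test */ });`.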


5

Use the QueryPerformanceFrequency function to see what speed the QueryPerformanceCounter runs at. I think it might be in the nanosecond range.

2012-04-04 21:59
by Mark Ransom
Very old API call and does not adjust for task switches - Lothar 2012-04-04 22:01
@Lothar: If you measure execution time in nanoseconds you will notice when a task switch occurs and you can skip that measurement sample - Andreas Magnusson 2012-04-04 22:23
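Andreas Magnusson's suggestion of skipping samples hit by a task switch is often implemented by repeating the measurement and keeping the fastest run: an interrupt or context switch can only make a sample slower, never faster. The sketch below assumes that approach and uses `std::chrono::steady_clock` as a portable stand-in; the same loop works with `QueryPerformanceCounter`. The name `min_elapsed_ns` is hypothetical.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <limits>

// Run f() `runs` times (runs must be > 0) and return the fastest run in ns.
// Samples inflated by task switches or interrupts are discarded implicitly
// because they can only be slower than an undisturbed run.
template <class F>
std::int64_t min_elapsed_ns(F&& f, int runs)
{
    using clock = std::chrono::steady_clock;
    std::int64_t best = std::numeric_limits<std::int64_t>::max();
    for (int i = 0; i < runs; ++i) {
        const auto start = clock::now();
        f();
        const auto stop = clock::now();
        const std::int64_t ns =
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
        best = std::min(best, ns);
    }
    return best;
}
```

The minimum tells you how fast the code can run when undisturbed, not its typical cost under load. Choose that statistic deliberately.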


4

Look into QueryPerformanceCounter on Windows.

When timing code to identify performance bottlenecks, you want to use the highest resolution timer the system has to offer. This article describes how to use the QueryPerformanceCounter function to time application code.

http://support.microsoft.com/kb/172338

2012-04-04 22:00
by Dan P
This is a comment, not a good answer - Mooing Duck 2012-04-04 22:01
@MooingDuck: Why not - Mehrdad 2012-04-04 22:02
@Mehrdad: Because when I read this, I have made no progress towards understanding the answer to the question. This is a link. It should have at least a summary - Mooing Duck 2012-04-04 22:08
@MooingDuck: That just makes it a small answer. You do have progress because now you have a resource to go to and read. I would fully expect readers of answers to be able to click a link and read. Summary might be nice, but it's not necessary, and certainly doesn't make this a comment. (What's not okay is a single link to something that might later die.) - GManNickG 2012-04-04 22:11
Must not have clicked on the link or even bothered to read what it was about. Here is a summary. When timing code to identify performance bottlenecks, you want to use the highest resolution timer the system has to offer. This article describes how to use the QueryPerformanceCounter function to time application code - Dan P 2012-04-04 22:16
@DanP: That right there should have been part of the answer. Being "correct" does not mean something is a "good answer". Links are great, but they are not everything - Mooing Duck 2012-04-04 22:19
Added summary to the answer - Dan P 2012-04-04 22:21
@MooingDuck: I think clicking links and reading documentation is a skill which this answer (correctly) assumed, and/or tried to teach - Mehrdad 2012-04-04 22:31
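Since this answer is a pointer to the KB article, here is a short sketch of the usual QueryPerformanceCounter pattern. It is not the article's code. `QueryPerformanceFrequency` gives ticks per second, and a tick delta is converted to nanoseconds. The conversion is split into whole seconds plus a remainder so that `ticks * 1e9` cannot overflow for long intervals. The helper names `ticks_to_ns` and `time_ns` are hypothetical. Microsoft documents the counter frequency as fixed at system boot, so it could be queried once and cached. On many recent systems it is often 10,000,000 per second (100 ns per tick), in which case the result is expressed in nanoseconds without having nanosecond resolution.

```cpp
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// Convert a tick delta to nanoseconds given the counter frequency (ticks/s).
// Whole seconds and the remainder are scaled separately to avoid overflow.
inline std::int64_t ticks_to_ns(std::int64_t ticks, std::int64_t freq)
{
    const std::int64_t ns_per_s = 1000000000;
    const std::int64_t whole = ticks / freq;
    const std::int64_t rem = ticks % freq;
    return whole * ns_per_s + rem * ns_per_s / freq;
}

#ifdef _WIN32
// Time a callable with QueryPerformanceCounter; returns elapsed nanoseconds.
template <class F>
std::int64_t time_ns(F&& f)
{
    LARGE_INTEGER freq, start, stop;
    QueryPerformanceFrequency(&freq);  // counter ticks per second
    QueryPerformanceCounter(&start);
    f();
    QueryPerformanceCounter(&stop);
    return ticks_to_ns(stop.QuadPart - start.QuadPart, freq.QuadPart);
}
#endif
```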


1

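If you can run your own assembly, you could read the CPU's cycle counter and divide a cycle difference by the CPU's clock rate:

#include <stdint.h>

// GCC/Clang inline asm: rdtsc puts the low 32 bits in EAX, the high 32 in EDX.
// ("=A" only means EDX:EAX on 32-bit x86, so read the two halves explicitly.)
static inline uint64_t get_cycles()
{
  uint32_t lo, hi;
  __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
  return ((uint64_t)hi << 32) | lo;
}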
2012-04-04 21:59
by Kerrek SB
IIRC, there might be a gotcha with this... it's either only available on newer CPUs, or it's only available in kernel mode, or something like that - Mehrdad 2012-04-04 22:02
@Mehrdad: rdtsc has been available since P6-family CPUs, possibly even the original Pentium. It can be restricted to kernel mode, but I don't know if Windows does that - cHao 2012-04-04 22:03
@Mehrdad: Nope, it's not restricted to kernel mode on Windows. If you use VC the syntax is long long ticks() { __asm {rdtsc}; } And if by newer CPUs you mean Pentium, yeah, then it's only available on "newer" CPUs. Personally, though, it's been a while since I coded for the 486 and earlier - Andreas Magnusson 2012-04-04 22:21
There is, however, an issue on multi-core systems, since the value returned will not be synchronized between CPUs or CPU cores. But if you only run on a single core/CPU, you'll be fine - Andreas Magnusson 2012-04-04 22:29
@Mehrdad: You'll also need to get the CPU ID and somehow manage a global association of CPU ID and tick counter, but of course if you want to time an operation that gets moved across CPUs you're in trouble. A good OS shouldn't do that, though, since it wouldn't want to spoil the hot cache - Kerrek SB 2012-04-04 22:40
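The "divide a cycle difference by the clock rate" step can be sketched as below. Instead of inline assembly it uses the `__rdtsc()` intrinsic, which both MSVC (`<intrin.h>`) and GCC/Clang (`<x86intrin.h>`) provide on x86/x64. This matters on Windows because MSVC does not support inline assembly when targeting x64. The counter rate `tsc_hz` is an assumed input: this sketch does not try to discover it, and on many modern CPUs the counter ticks at a fixed nominal rate rather than the current core clock. The per-core synchronization caveats in the comments above still apply. `read_tsc` and `cycles_to_ns` are hypothetical names.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>
#else
#include <x86intrin.h>
#endif

// Read the x86 time-stamp counter via the compiler intrinsic.
inline std::uint64_t read_tsc()
{
    return __rdtsc();
}

// Convert a cycle difference to nanoseconds for a counter running at
// tsc_hz cycles per second (an assumed, caller-supplied rate).
// Whole seconds and the remainder are scaled separately to avoid overflow.
inline std::uint64_t cycles_to_ns(std::uint64_t cycles, std::uint64_t tsc_hz)
{
    const std::uint64_t ns_per_s = 1000000000ULL;
    return (cycles / tsc_hz) * ns_per_s + (cycles % tsc_hz) * ns_per_s / tsc_hz;
}
```

Usage: `auto t0 = read_tsc(); /* work */ auto ns = cycles_to_ns(read_tsc() - t0, tsc_hz);`. As Lothar notes further down, comparing raw cycle counts between implementations avoids needing `tsc_hz` at all.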


1

Use Windows 7 and the Hardware Counter Profiling API http://msdn.microsoft.com/en-us/library/windows/desktop/dd796395(v=vs.85).aspx

Both rdtsc and QueryPerformanceCounter/QueryPerformanceFrequency are not accurate enough because of the large overhead, interrupts and task switches.

[EDIT]: Sorry, I mixed up the link for Performance Counters with Hardware Counters. I have used it only once and this was a quick answer.

2012-04-04 22:01
by Lothar
How does one use that? I can't figure it out - Mooing Duck 2012-04-04 22:04
Here's something: http://msdn.microsoft.com/en-us/magazine/cc163996.asp - Mooing Duck 2012-04-04 22:06
And if you want to compare the runtime of different code implementations, use the execution cycles, not the absolute times - Lothar 2012-04-04 22:06
Why -1 on the answer? - GManNickG 2012-04-04 22:09
@GManNickG: Because there was no code, no description, and the link was (past tense) absolutely useless - Mooing Duck 2012-04-04 22:21
@Lothar: Actually, the problem with both rdtsc and QueryPerformanceCounter (et al.) (most likely QPC is implemented in terms of rdtsc) isn't the overhead, it's the synchronization across CPU cores. Each core has its own time-stamp counter and there is no synchronization between them - Andreas Magnusson 2012-04-04 22:38
Another good thing about hardware counters is that you can use them to measure the L1 and L2 cache miss rates. With nanoseconds this is important to keep in mind - Lothar 2012-04-04 22:39
Yes @Andreas, but if, for example, you just want to add nanosecond timing around each function, overhead is a problem. QueryPerformanceCounter takes a few thousand clock cycles. The synchronization between CPUs can be handled by setting the thread/CPU affinity. Writing a small program that sets the affinity of all other running processes/threads to other CPUs helps a lot, but it's still a hack. It's a 1996 (Windows 2000) state of API - Lothar 2012-04-04 22:43
@Lothar: Firstly, I'm not saying that anyone should (or shouldn't) use rdtsc (or QPC). It's a tool and as most tools it has pros and cons. It's up to the reader to weigh them and make a judgement. Secondly, calling rdtsc twice in a row takes 24 cycles (and that includes the necessary instructions to save the value from the first call), not so much of an overhead IMHO. Thirdly, last I checked W2K was released in 2000, you must be thinking of Windows NT4 - Andreas Magnusson 2012-04-04 23:14
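Whether timer overhead matters (the disagreement between Lothar and Andreas above) can be checked empirically by reading a clock twice back-to-back many times and keeping the smallest gap. The sketch below does this for `std::chrono::steady_clock`. It is an assumed illustration, not either commenter's measurement, and the same loop could wrap `QueryPerformanceCounter` or `__rdtsc()`. A result of 0 means the clock's resolution is coarser than the cost of one call. `clock_overhead_ns` is a hypothetical name.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <limits>

// Estimate the cost of one steady_clock::now() call (pairs must be > 0):
// take many back-to-back readings and keep the smallest difference in ns.
std::int64_t clock_overhead_ns(int pairs)
{
    using clock = std::chrono::steady_clock;
    std::int64_t best = std::numeric_limits<std::int64_t>::max();
    for (int i = 0; i < pairs; ++i) {
        const auto a = clock::now();
        const auto b = clock::now();
        best = std::min<std::int64_t>(
            best, std::chrono::duration_cast<std::chrono::nanoseconds>(b - a).count());
    }
    return best;
}
```

When timing very short code, this figure gives a rough floor to subtract from, or compare against, each measurement.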