cudaEventRecord() does not time correctly on Visual Studio CPU code


While working through some basic CUDA examples made by NVIDIA, I copied some code to test the speedup of CPU vs. GPU computing for matrix multiplication.

After 30 minutes of looking at the results and seeing my CPU (yes, the CPU) doing the computations 1000 times faster than the GPU, I realised that the timing was not working correctly. A snippet of the code looks like this (this is NVIDIA's code):

    // create timers
    cudaEvent_t start;
    cudaEvent_t stop;
    float simplekerneltime;
    float optimisedkerneltime;
    float elapsedtime;  // declared here so the snippet compiles

    // start timer
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);

    matrixmultkernel<<<grid, block>>>(a_d, b_d, c_d, n);

    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&elapsedtime, start, stop);

    // print time and other things

    cudaEventRecord(start, 0);

    matrixmultcpu(a_h, b_h, d_, n);

    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&elapsedtime, start, stop);

    // print time

This code works fine on a Linux machine (I copied the same code as the person next to me, and he gets correct timings), but on my Windows 8 machine with Visual Studio 2013 the timing of the CPU part (the second half of the snippet) does not work: it always gives ~0.003 ms.

Why is this happening? I fixed it using <time.h> (removing the cudaEventRecord() calls and using standard C timing approaches), so I don't want to know how to fix it, but rather why it happens.

From what I understand, CUDA events are not designed to measure CPU-only (host-only) time per se, but rather kernel execution and CUDA API calls. From the CUDA C Programming Guide, 3.2.5.6. Events (emphasis mine):

The runtime also provides a way to closely monitor the device's progress, as well as perform accurate timing, by letting the application asynchronously record events at any point in the program and query when these events are completed.

I'm surprised you got any time at all for the kernel (kernel launches are asynchronous), as your code was missing cudaEventSynchronize():

    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&elapsedtime, start, stop);

See How to Implement Performance Metrics in CUDA C/C++.

For CPU-only time measurement, see this thread.

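Edit:

To correctly time matrixmultcpu(), you need to add synchronization on the start event. cudaEventRecord() only enqueues the event in the stream, and the event gets its timestamp when the GPU actually processes it. Under the Windows WDDM driver model, commands are typically batched and may not be submitted to the GPU until something forces a flush, such as cudaEventSynchronize(stop). The start and stop events can then be timestamped almost back to back, which would explain the ~0.003 ms you see on Windows but not on Linux. Synchronizing on start makes sure it is recorded before the CPU work begins: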

    cudaEventRecord(start, 0);
    cudaEventSynchronize(start);
