c - cudaEventRecord() does not time CPU code correctly in Visual Studio
While working through some basic CUDA examples from NVIDIA, I copied some code to test the speedup of GPU over CPU computation for matrix multiplication.
After 30 minutes of looking at the results and seeing my CPU (yes, the CPU) doing computations 1000 times faster than my GPU, I realised that the timing was not working correctly. A snippet of the code looks like this (this is NVIDIA's code):
    // create timers
    cudaEvent_t start;
    cudaEvent_t stop;
    float simplekerneltime;
    float optimisedkerneltime;

    // start timer
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    matrixmultkernel<<<grid, block>>>(a_d, b_d, c_d, n);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&elapsedtime, start, stop);
    // print time and other things

    cudaEventRecord(start, 0);
    matrixmultcpu(a_h, b_h, d_, n);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&elapsedtime, start, stop);
    // print time
This code works fine on a Linux machine (I copied the same code as the person next to me, who was getting sensible timings), but on a Windows 8 machine with Visual Studio 2013 the timing of the CPU part (the second half of the snippet) was not working: it always gave ~0.003 ms.
Why is this happening? I fixed it by using <time.h> (removing the cudaEventRecord() calls and using standard C timing approaches), so I don't want to know how to fix it, but rather why it is happening.
From what I understand, CUDA events are not designed to measure CPU-only (host-only) time per se, but rather kernel execution and CUDA API calls. From the CUDA C Programming Guide, section 3.2.5.6, Events (emphasis mine):
The runtime also provides a way to closely monitor *the device's progress*, as well as perform accurate timing, by letting the application asynchronously record events at any point in the program and query when these events are completed.
I am also surprised that you got any time at all (kernel launches are asynchronous), because your code is missing cudaEventSynchronize():

    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&elapsedtime, start, stop);
See How to Implement Performance Metrics in CUDA C/C++.
For CPU-only time measurement, see this thread.
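As an illustration of host-only timing without CUDA events, here is a minimal sketch in standard C using clock() from <time.h>. The names matrix_mult_cpu and time_matrix_mult_cpu_ms are hypothetical stand-ins, not the functions from the question, and the input values are arbitrary. Note that clock() reports processor time on POSIX systems but elapsed wall-clock time with the Microsoft C runtime, so the numbers are not directly comparable across platforms.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for the question's matrixmultcpu():
   computes C = A * B for n x n row-major matrices on the host. */
void matrix_mult_cpu(const float *a, const float *b, float *c, int n)
{
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Times one host-side multiplication with clock().
   Returns the elapsed time in milliseconds, or -1.0 if allocation fails.
   The first element of the result is written to *first_element so the
   compiler cannot discard the multiplication as unused work. */
double time_matrix_mult_cpu_ms(int n, float *first_element)
{
    size_t count = (size_t)n * (size_t)n;
    float *a = malloc(count * sizeof *a);
    float *b = malloc(count * sizeof *b);
    float *c = malloc(count * sizeof *c);
    if (a == NULL || b == NULL || c == NULL) {
        free(a);
        free(b);
        free(c);
        return -1.0;
    }

    /* Arbitrary test data: every entry of A is 1, every entry of B is 2. */
    for (size_t i = 0; i < count; ++i) {
        a[i] = 1.0f;
        b[i] = 2.0f;
    }

    clock_t begin = clock();          /* host timestamp before the work */
    matrix_mult_cpu(a, b, c, n);
    clock_t end = clock();            /* host timestamp after the work  */

    *first_element = c[0];

    free(a);
    free(b);
    free(c);
    return 1000.0 * (double)(end - begin) / CLOCKS_PER_SEC;
}
```

Calling time_matrix_mult_cpu_ms(n, &value) from main and printing the return value gives the host time in milliseconds. For small n the result may be 0 because of the limited resolution of clock(), so a larger matrix or several repetitions may be needed for a meaningful figure.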
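Edit:
To get the correct time for matrixmultcpu(), you need to add synchronization on the start event. This makes the host wait until the start event has actually been recorded before the CPU work begins:

    cudaEventRecord(start, 0);
    cudaEventSynchronize(start);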