Difference between revisions of the page "PC Lab 5"
Attached file: Callgrind.out.20485.zip
Current version as of 19 April 2018, 16:04
Session 5
Task: run a profiler (the open-source Valgrind and gprof, or Visual Studio) and improve the performance of keypoint extraction in the ASIFT C++ code
1. Download the ASIFT project from http://www.ipol.im/pub/art/2011/my-asift/
2. Run demo_ASIFT with the two included Adam images from the Sistine Chapel as input. The horizontal result should look like this: [image: Hadam.png]
3. Modify the code so that it only runs "compute_asift_keypoints" (matching is not of interest here, since it was covered in the previous session).
4. Run the Valgrind profiler.
For example, for a dummy program:
g++ -std=c++11 -g dummy.cpp -o dummy (compile the program dummy; -g adds debug symbols so the profile can be mapped to source lines)
valgrind --tool=callgrind ./dummy (run the program under callgrind; this generates a file named callgrind.out.<pid>, e.g. callgrind.out.12345, that can be viewed with kcachegrind)
kcachegrind callgrind.out.12345 (open the generated profile with kcachegrind)
5. Look over the report and propose 3 leaf functions (functions that do not call other functions) to offload to a coprocessor. For each one, give the reason for choosing it and how much time is gained by offloading it. Assume the coprocessor runs at an infinite clock rate, but data is transferred at 200 GB/s. Hint: use the call graph (this requires installing the graphviz package). Keep a snapshot of the analysis report as proof. Send the results, comments and snapshot(s) by e-mail to the teacher.
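One way to estimate the gain asked for in step 5 is sketched below. With an infinitely fast coprocessor the compute time drops to zero, so the gain is the CPU time the function currently takes minus the time spent moving its data. This is only a sketch under stated assumptions: 200 GB/s is taken as 200e9 bytes per second, the CPU time is approximated as the function's self-cost share of the profile multiplied by the measured total run time, and all function names here are hypothetical.

```cpp
#include <cstdint>

// Transfer rate from the assignment, taken as 200e9 bytes per second (assumption).
constexpr double kBandwidthBytesPerSec = 200e9;

// Time needed to send the inputs to the coprocessor and bring the results back.
double transfer_seconds(std::uint64_t bytes_in, std::uint64_t bytes_out) {
    return static_cast<double>(bytes_in + bytes_out) / kBandwidthBytesPerSec;
}

// CPU time attributed to one function: its self-cost fraction of the
// profile (0.0 to 1.0) multiplied by the measured total run time.
double cpu_seconds(double self_cost_fraction, double total_run_seconds) {
    return self_cost_fraction * total_run_seconds;
}

// Gain from offloading; a negative value means the transfer costs more
// than the CPU time that is saved.
double offload_gain_seconds(double self_cost_fraction, double total_run_seconds,
                            std::uint64_t bytes_in, std::uint64_t bytes_out) {
    return cpu_seconds(self_cost_fraction, total_run_seconds)
         - transfer_seconds(bytes_in, bytes_out);
}
```

For instance, a function that accounts for 25% of an 8 s run and moves 1 GB in each direction would save 2 s of CPU time and cost 0.01 s of transfer, for a gain of about 1.99 s. These figures are illustrative only and are not taken from the ASIFT profile.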
Note: Valgrind is also great for checking memory leaks:
valgrind --leak-check=full <path>
valgrind --tool=memcheck <path>
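As an illustration of what the leak check looks for, the hypothetical snippet below contains one function that allocates a buffer and never frees it, next to a version that lets std::vector release the memory. Neither function comes from the ASIFT code. When built without optimization and run under valgrind --leak-check=full, the first one should show up in the report as a lost block.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical leaky function: the buffer allocated with new[] is never
// released, so the pointer is lost when the function returns.
double leaky_sum(std::size_t n) {
    double* buf = new double[n];
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        buf[i] = static_cast<double>(i);
        total += buf[i];
    }
    return total;  // missing: delete[] buf;
}

// Same computation, but std::vector frees its storage automatically
// when it goes out of scope.
double safe_sum(std::size_t n) {
    std::vector<double> buf(n);
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        buf[i] = static_cast<double>(i);
        total += buf[i];
    }
    return total;
}
```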
Points (out of 10) vs. expected performance:
10 points for identifying the 3 heaviest leaf functions and a correct (within 10%) computation of the offloading impact. DESCRIBE_INSTR.
9 points for identifying the 3 heaviest leaf functions and an acceptable (within 20%) computation of the offloading impact. DESCRIBE_INSTR.
8 points for identifying the 2 heaviest leaf functions and a correct (within 10%) computation of the offloading impact. DESCRIBE_INSTR.
7 points for identifying the 2 heaviest leaf functions and a reasonable (within 30%) computation of the offloading impact. DESCRIBE_INSTR.
6 points for identifying the heaviest leaf function and a reasonable (within 30%) computation of the offloading impact. DESCRIBE_INSTR.
5 points for identifying the heaviest leaf function and a coarse (within 50%) computation of the offloading impact. DESCRIBE_INSTR.
DESCRIBE_INSTR = Write each proposed function as a prototype giving the result, name, number of operands, and their types and sizes (similar to the Intel Intrinsics Guide). Add a natural-language description of the behaviour (or, alternatively, a formal description as in the Intel Intrinsics Guide).
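A DESCRIBE_INSTR entry could look roughly like the sketch below. The instruction cop_dot_f32 is entirely hypothetical and was not taken from the ASIFT profile; it only shows the expected level of detail (prototype, operand types and sizes, description, formal operation), with a plain C++ reference implementation standing in for the coprocessor.

```cpp
#include <cstdint>

// Synopsis (hypothetical coprocessor instruction):
//   float cop_dot_f32(const float* a, const float* b, std::uint32_t n)
// Result:
//   one 32-bit float
// Operands (3):
//   a, b : pointers to n packed 32-bit floats (4*n bytes each)
//   n    : 32-bit unsigned element count
// Description:
//   Multiplies corresponding elements of a and b and returns the sum
//   of the products.
// Operation:
//   acc := 0
//   FOR i := 0 to n-1
//       acc := acc + a[i] * b[i]
//   ENDFOR
//   RETURN acc
float cop_dot_f32(const float* a, const float* b, std::uint32_t n) {
    float acc = 0.0f;
    for (std::uint32_t i = 0; i < n; ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}
```

The data volume for the offloading estimate follows directly from such a description: here 8*n bytes are sent to the coprocessor and 4 bytes come back.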