Performance analysis and optimization: Difference between revisions

From WikiLabs
(15 intermediate revisions by the same user are not shown)
Session 1, x86 optimization:

C/C++: increase the execution speed of the code that reverses the bit order of each of 100 M unsigned 32-bit samples (e.g. 10111...11 -> 11...11101)

Expected/Presented techniques:

first implementation ~ 10 seconds

compiler optimized ~ 5 seconds

loop unrolling

bit-tricks

optimizing variables into registers

8-bit table of 256 entries (1 KB)

16-bit table of 64 K entries (256 KB)

32-bit table of 4 G entries (16 GB)
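
The techniques listed above could be sketched roughly as follows. This is only an illustration, not the code used in the lab: the function names (`reverse_naive`, `reverse_swap`, `make_table`, `reverse_table`, `reverse_all`) are hypothetical, and the table is assumed to hold 32-bit entries, which is what the 1 KB figure for 256 entries suggests.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Baseline: move one bit per iteration, 32 iterations per sample.
std::uint32_t reverse_naive(std::uint32_t x) {
    std::uint32_t r = 0;
    for (int i = 0; i < 32; ++i) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

// Bit tricks: swap adjacent bits, then pairs, nibbles, bytes and half-words.
std::uint32_t reverse_swap(std::uint32_t x) {
    x = ((x >> 1) & 0x55555555u) | ((x & 0x55555555u) << 1);
    x = ((x >> 2) & 0x33333333u) | ((x & 0x33333333u) << 2);
    x = ((x >> 4) & 0x0F0F0F0Fu) | ((x & 0x0F0F0F0Fu) << 4);
    x = ((x >> 8) & 0x00FF00FFu) | ((x & 0x00FF00FFu) << 8);
    return (x >> 16) | (x << 16);
}

// Assumed layout: 256 entries of 32 bits (1 KB). Entry i holds the
// reversed bits of byte i, already placed in the top byte.
using Table = std::array<std::uint32_t, 256>;

Table make_table() {
    Table t{};
    for (std::uint32_t i = 0; i < 256; ++i) {
        t[i] = reverse_naive(i);
    }
    return t;
}

// Table lookup: one lookup per input byte, shifted into its mirrored position.
std::uint32_t reverse_table(std::uint32_t x, const Table& t) {
    return t[x & 0xFFu]
         | (t[(x >> 8) & 0xFFu] >> 8)
         | (t[(x >> 16) & 0xFFu] >> 16)
         | (t[x >> 24] >> 24);
}

// Apply the table-based reversal in place to a whole buffer of samples.
void reverse_all(std::vector<std::uint32_t>& samples, const Table& t) {
    for (auto& s : samples) {
        s = reverse_table(s, t);
    }
}
```

Building the table once with `make_table()` and then calling `reverse_all` on the sample buffer keeps the table resident in L1 cache. The 64 K-entry and 4 G-entry variants trade fewer lookups per sample for a much larger memory footprint, so which one is fastest has to be measured on the target machine.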

Current revision as of 26 April 2018, 16:29

Results

https://docs.google.com/spreadsheets/d/1GvZ-P-MEA9iPuBFx2onQ8qg0o2cDQSoWFV2Y0T11Fls/edit?usp=sharing

Support materials

http://www.agner.org/optimize/optimizing_cpp.pdf

https://www.arm.com/files/pdf/AT_-_Better_C_Code_for_ARM_Devices.pdf

Lab sessions

  1. PC Lab 1 - x86 C++ optimizations
  2. PC Lab 2 - 8-bit MCU: PIC 10F200 asm optimizations
  3. PC Lab 3 - x86 ???
  4. PC Lab 4 - x86 U8 SAD/SSD
  5. PC Lab 5 - x86 ASIFT C++ profiling and optimizations
  6. PC Lab 6 - OpenCL

Contact: calin.bira_AT_upb.ro