PC Lab 6
Session 6
Task: run matrix-column normalization using OpenCL (https://www.khronos.org/opencl)
Matrix-column normalization means that, at the end of the process, the sum of the squared elements in each column is 1.
Example: assuming the matrix is
[ 1, 2 ]
[ 3, 4 ]
the result of the normalization is:
[ 0.3162 0.4472 ]
[ 0.9487 0.8944 ]
- That is: 0.3162 * 0.3162 + 0.9487 * 0.9487 = 1 and, of course, the 0.3162 / 0.9487 ratio is kept at 1 / 3
- That is: 0.4472 * 0.4472 + 0.8944 * 0.8944 = 1 and, of course, the 0.4472 / 0.8944 ratio is kept at 2 / 4
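A possible CPU reference is sketched below (the row-major layout and the function name normalize_columns_cpu are assumptions for illustration; adapt them to your own code):

#include <cmath>
#include <cstddef>
#include <vector>

// Normalize every column so that the sum of its squared elements becomes 1.
// Assumed row-major storage: element (r, c) is stored at m[r * cols + c].
void normalize_columns_cpu(std::vector<float>& m, std::size_t rows, std::size_t cols) {
    for (std::size_t c = 0; c < cols; ++c) {
        float sumsq = 0.0f;
        for (std::size_t r = 0; r < rows; ++r)
            sumsq += m[r * cols + c] * m[r * cols + c];
        const float norm = std::sqrt(sumsq);
        if (norm > 0.0f)                              // leave an all-zero column unchanged
            for (std::size_t r = 0; r < rows; ++r)
                m[r * cols + c] /= norm;
    }
}

For the 2 x 2 example above, the row-major input { 1, 2, 3, 4 } becomes approximately { 0.3162, 0.4472, 0.9487, 0.8944 }.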
- Install the OpenCL drivers for your platform.
- Check which OpenCL-capable devices are available with the command clinfo.
- Run the VectorAddOpenCL app (http://wiki.dcae.pub.ro/images/4/4a/VectorAddOpenCL.cpp) to see that everything works OK.
- Implement the normalization operation on the CPU, for reference (see the CPU sketch above).
- Implement the normalization operation on 1 OpenCL thread of a single device. Check the result against the CPU reference.
- Implement the normalization operation across multiple OpenCL threads of the same device. Check the result against the CPU reference (a kernel sketch follows this list).
- How much faster is the OpenCL operation when performed on all threads vs. 1 thread on the same OpenCL device? (A timing sketch also follows this list.)
- Send an e-mail to the teacher, with the subject PAO_Lab_6, stating your x86 CPU configuration (e.g. i7-2670QM, 4C/8T @ 2.2 GHz) and GPU configuration (e.g. nVidia GT 540M, 96 CUDA cores @ 1344 MHz).
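For the OpenCL variants, a minimal kernel sketch (illustrative names; it assumes the same row-major float matrix as the CPU reference, with one work-item per column) could look like:

__kernel void normalize_columns(__global float* m, const uint rows, const uint cols) {
    const uint c = get_global_id(0);          // each work-item handles one column
    if (c >= cols)
        return;
    float sumsq = 0.0f;
    for (uint r = 0; r < rows; ++r)
        sumsq += m[r * cols + c] * m[r * cols + c];
    const float norm = sqrt(sumsq);
    if (norm > 0.0f)
        for (uint r = 0; r < rows; ++r)
            m[r * cols + c] /= norm;
}

Enqueued with a global work size equal to the number of columns, this is the multi-threaded variant; for the 1-thread variant, enqueue a similar kernel with a global work size of 1 and loop over all the columns inside it.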
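To compare the two variants you can use OpenCL event profiling. The sketch below assumes the command queue was created with CL_QUEUE_PROFILING_ENABLE and that kernel, queue, matrix_buf, rows and cols (cl_uint) have already been set up, e.g. following the VectorAddOpenCL example:

// Enqueue the kernel with one work-item per column and time it through its event.
size_t global_size = cols;
cl_event ev;
clSetKernelArg(kernel, 0, sizeof(cl_mem), &matrix_buf);
clSetKernelArg(kernel, 1, sizeof(cl_uint), &rows);
clSetKernelArg(kernel, 2, sizeof(cl_uint), &cols);
clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_size, NULL, 0, NULL, &ev);
clWaitForEvents(1, &ev);

cl_ulong start = 0, end = 0;
clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_START, sizeof(start), &start, NULL);
clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_END, sizeof(end), &end, NULL);
// (end - start) is the kernel execution time in nanoseconds; repeat the
// measurement for the 1-thread and the multi-thread kernel and compare.

Error checking is omitted for brevity; in the lab code every cl* call should be verified against CL_SUCCESS.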
Note: in order to use the ACS GPGPU Cluster, see Using ACS Cluster.
Please read chapters 1, 2, 4 and 5 (skip chapter 3, which is very specific to CUDA) of the GP-GPU Programming Guide: https://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/CUDA_C_Programming_Guide.pdf
Points (out of 10) vs. expected performance: