Difference between revisions of the page "PC Lab 6"
From WikiLabs
Cbira (talk | contributions)
Revision as of 26 April 2018, 17:16
Session 6
Task: run matrix-column normalization using OpenCL (https://www.khronos.org/opencl)
Matrix-column normalization means that, at the end of the process, the sum of the squared elements in each column is 1.
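The definition can also be written as a formula. This is only a sketch in notation introduced here (not taken from the lab text): a_ij is the input element in row i and column j, b_ij is the normalized element, m is the number of rows, and each column is assumed to contain at least one nonzero element.

```latex
b_{ij} = \frac{a_{ij}}{\sqrt{\sum_{k=1}^{m} a_{kj}^{2}}},
\qquad \text{so that} \qquad
\sum_{i=1}^{m} b_{ij}^{2} = 1 \quad \text{for every column } j.
```

Because every element of a column is divided by the same factor, the ratios between elements of that column do not change.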
Example: Assuming the matrix is
[ 1, 2 ]
[ 3, 4 ]
the result of normalization is:
[ 0.3162 0.4472 ]
[ 0.9487 0.8944 ]
- First column: 0.3162 * 0.3162 + 0.9487 * 0.9487 = 1 (up to rounding), and the ratio 0.3162 / 0.9487 is still 1 / 3.
- Second column: 0.4472 * 0.4472 + 0.8944 * 0.8944 = 1 (up to rounding), and the ratio 0.4472 / 0.8944 is still 2 / 4.
- Install the OpenCL drivers for your platform.
- Check which OpenCL-capable devices are available with the command clinfo.
- Run the VectorAddOpenCL app [1] to verify that everything works.
- Implement the normalization operation on a CPU, for reference.
- Implement the normalization operation on a single OpenCL thread of a single device. Check the result.
- Implement the normalization operation across multiple OpenCL threads of the same device. Check the result.
- How much faster is the OpenCL operation when run on all threads compared with 1 thread on the same OpenCL device?
Note: To use the ACS GPGPU Cluster, see Using ACS Cluster.
Please read chapters 1, 2, 4 and 5 (skip chapter 3, which is very specific to CUDA) of the GP-GPU Programming guide [2].
Points (out of 10) vs. expected performance: