A GPU is composed of several Streaming Multiprocessors (SMX); the NVIDIA Tesla K40 (2013), for example, contains 15.
An SMX executes several thread blocks. The number of thread blocks resident on an SMX is not directly controlled by the programmer: the kernel's shared-memory and register usage are the two factors limiting how many blocks an SMX can execute. On the Kepler architecture, 65,536 registers and 48 KB of shared memory are available per SMX.
The L2 cache is shared by all SMX, and the L1 cache is shared by all threads within an SMX.
A CUDA block contains several threads (up to 1,024). The block size is set by the programmer in 3 dimensions, and it is a parameter to optimize. Shared memory is an on-chip memory bank accessible to all threads of a block.
P. Borwein and T. Erdélyi. Polynomials and polynomial inequalities. Graduate Texts in Mathematics, vol. 161, 2012.
R. de la Briandais. File searching using variable length keys.
E. Lindholm, J. Nickolls, S. Oberman, and J. Montrym. NVIDIA Tesla: a unified graphics and computing architecture. IEEE Micro, vol. 28, no. 2, 2008.
INTEL. Intel® Xeon® Processor E5-2600 v2 Product Family, 2012.
NVIDIA. Tesla K40 GPU active accelerator, 2013.
J.-É. Pin. Mathematical foundations of automata theory. Lecture notes, LIAFA, 2010.