A GPU is composed of several Streaming Multiprocessors (SMX); for example, the NVIDIA Tesla K40 (2013) contains 15 of them.
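As a minimal illustration (not taken from the referenced documents), the SMX count of the current device can be queried at run time through the CUDA runtime API:

    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);                     /* device 0 */
        printf("SMX count: %d\n", prop.multiProcessorCount);   /* 15 on a Tesla K40 */
        return 0;
    }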

An SMX executes several thread blocks. The number of thread blocks executed on an SMX is not directly available to the programmer: the shared memory and register usage of the kernel are the two factors limiting the number of blocks resident on an SMX. On the Kepler architecture, 65 536 registers and 48 KB of shared memory are available per SMX.
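As a minimal sketch of how this limit can be inspected (the kernel kernel_smx below is purely illustrative), the CUDA occupancy API reports the number of resident blocks per SMX implied by a kernel's register and shared-memory usage:

    #include <stdio.h>
    #include <cuda_runtime.h>

    __global__ void kernel_smx(float *out) {
        __shared__ float buffer[256];        /* static shared memory used per block    */
        buffer[threadIdx.x] = (float)blockIdx.x;   /* assumes a block of 256 threads   */
        __syncthreads();
        out[blockIdx.x * blockDim.x + threadIdx.x] = buffer[threadIdx.x];
    }

    int main(void) {
        int blocksPerSMX = 0;
        /* 256 threads per block, no dynamic shared memory */
        cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSMX, kernel_smx, 256, 0);
        printf("Resident blocks per SMX: %d\n", blocksPerSMX);
        return 0;
    }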

The L2 cache is shared by all SMXs, while the L1 cache is shared by all threads within an SMX.

A CUDA block contains several threads (up to 1 024). The block size is set by the programmer in up to 3 dimensions and is a parameter to tune. Shared memory is a memory bank.
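As a minimal example of this tuning parameter (the kernel and array size below are illustrative assumptions, not from the original text), a launch with a 2-dimensional block of 32 x 8 = 256 threads:

    #include <cuda_runtime.h>

    __global__ void init_kernel(int *grid, int width, int height) {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x < width && y < height)
            grid[y * width + x] = 0;
    }

    int main(void) {
        const int width = 4096, height = 4096;
        int *d_grid;
        cudaMalloc(&d_grid, (size_t)width * height * sizeof(int));

        dim3 block(32, 8);      /* 256 threads per block; the total must not exceed 1 024 */
        dim3 gridSize((width + block.x - 1) / block.x,
                      (height + block.y - 1) / block.y);
        init_kernel<<<gridSize, block>>>(d_grid, width, height);
        cudaDeviceSynchronize();
        cudaFree(d_grid);
        return 0;
    }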

P. Borwein and T. Erdélyi, Polynomials and Polynomial Inequalities, Graduate Texts in Mathematics, vol. 161, 2012.

R. de la Briandais, File searching using variable length keys, Proceedings of the Western Joint Computer Conference, 1959.

E. Lindholm, J. Nickolls, S. Oberman, and J. Montrym, NVIDIA Tesla: A unified graphics and computing architecture, IEEE Micro, vol. 28, no. 2, 2008.

Google, Sparsehash, 2018.

F. Hivert, HPCombi, 2018.

Intel, Intel® Xeon® Processor E5-2600 v2 Product Family, 2012.

J. D. Mitchell and M. Torpey, , 2018.

NVIDIA, Tesla K40 GPU Active Accelerator, 2013.

J.-É. Pin, Mathematical foundations of automata theory, Lecture notes, LIAFA, 2010.

W., , 2011.