c++ - Why are 4 processes faster than 4 threads?
void task1(void* arg) {
    static volatile long res = 1;
    for (long i = 0; i < 100000000; ++i) {
        res ^= (i + 1) * 3 >> 2;
    }
}
4 threads, working simultaneously, perform task1 193 times in 30 seconds. 4 processes, working simultaneously, perform task1 348 times in 30 seconds. Why is there such a big difference? I tested on Mac OS X 10.7.5 with an Intel Core i5 (4 logical cores). I think the difference would be the same on Windows and Linux.
The res variable is static, which means it is shared by all of the threads in the same process. In the case of 4 threads, each modification of res in one thread has to be made available to the other threads, which involves some sort of locking on the bus, invalidation of the level 1 cache, and a reload in the other CPUs.
In the case of 4 processes, the variable is not shared between the processes, so they can run in parallel without interfering with each other.
Note that the main difference is not thread versus process, but the fact that in one case all workers access the same variable while in the other they access different ones. Also, in the case of threads, the real issue is not performance, but the fact that the final result will be incorrect:
res ^= x;
That is not an atomic operation: the processor will load the old value of res, modify it in a register, and write it back. Without synchronization primitives, multiple threads can load the same value, modify it independently, and write it back to the variable, in which case the work of some of the threads will be overwritten by the others. The end result will depend on the execution pattern of the different threads, not on the code of the program.
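As a rough sketch of what an indivisible update could look like, the version below swaps the static volatile long for a std::atomic<long> and uses fetch_xor. The names step_value, xor_sequential and xor_atomic are made up for this illustration and are not part of the code in the question. Because XOR is commutative and associative, the threaded result should match a single-threaded run regardless of how the threads interleave.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Value folded in at step i; same expression as in task1.
inline long step_value(long i) { return (i + 1) * 3 >> 2; }

// Reference: a single thread applies every update itself.
long xor_sequential(int workers, long iters) {
    long res = 1;
    for (int w = 0; w < workers; ++w) {
        for (long i = 0; i < iters; ++i) {
            res ^= step_value(i);
        }
    }
    return res;
}

// Shared variable, but each update is one indivisible read-modify-write,
// so no thread can overwrite another thread's work.
long xor_atomic(int workers, long iters) {
    std::atomic<long> res{1};
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) {
        pool.emplace_back([&res, iters] {
            for (long i = 0; i < iters; ++i) {
                res.fetch_xor(step_value(i), std::memory_order_relaxed);
            }
        });
    }
    for (auto& t : pool) t.join();
    return res.load();
}
```

This only addresses correctness. All threads still hammer the same cache line, so it should not be expected to close the gap with the 4-process run.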
To simulate the non-sharing of variables, you need to make sure that the threads access different cache lines. The simplest change would be to drop the static qualifier from the variable, so that each thread updates a variable inside its own stack, which will be at a different memory address than the variables of the other threads, and should map to a different cache line. Another option is creating the 4 variables together, but adding enough padding between them that they are spread across different cache lines:
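A minimal sketch of the first option (dropping static) might look like the following. The helper names task1_local and run_local, and the choice to return the value so it can be inspected, are assumptions added for illustration; the original task1 returns nothing.

```cpp
#include <thread>
#include <vector>

// task1 without the static qualifier: res lives on the calling thread's
// own stack, so no other thread ever touches it.
long task1_local(long iters) {
    volatile long res = 1;
    for (long i = 0; i < iters; ++i) {
        res = res ^ ((i + 1) * 3 >> 2);
    }
    return res;
}

// Run task1_local on `workers` threads and collect each thread's result.
// Each thread writes its slot in `results` exactly once, at the very end.
std::vector<long> run_local(int workers, long iters) {
    std::vector<long> results(workers);
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) {
        pool.emplace_back([&results, w, iters] {
            results[w] = task1_local(iters);
        });
    }
    for (auto& t : pool) t.join();
    return results;
}
```

Since nothing is shared inside the loop, every thread should compute the same value that a single-threaded call to task1_local produces.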
struct padded_long {
    volatile unsigned long res;
    char padding[cache_line_size - sizeof(long)]; // find cache_line_size in your processor documentation
};
void f(void *) {
    static padded_long res[4];
    // detect which thread is running based on the argument,
    // and use res[0]..res[3] for the different threads
}
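For a variant that actually compiles and runs, here is one possible sketch that uses alignas instead of a manual padding array. The 64-byte line size is an assumption (common on x86, but check your processor), and PaddedLong, task1_padded and run_padded are hypothetical names, not something from the question.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

constexpr std::size_t kCacheLine = 64;  // assumed line size; check your CPU
constexpr int kWorkers = 4;

// alignas rounds the struct size up to the alignment, so each array
// element occupies a cache line of its own.
struct alignas(kCacheLine) PaddedLong {
    volatile long res = 1;
};
static_assert(sizeof(PaddedLong) == kCacheLine, "each slot should fill one line");

// Each worker updates only the slot selected by its index.
void task1_padded(PaddedLong* slots, int index, long iters) {
    volatile long& res = slots[index].res;
    for (long i = 0; i < iters; ++i) {
        res = res ^ ((i + 1) * 3 >> 2);
    }
}

// Start kWorkers threads, one slot each, and return the final slot values.
std::vector<long> run_padded(long iters) {
    PaddedLong slots[kWorkers];
    std::vector<std::thread> pool;
    for (int w = 0; w < kWorkers; ++w) {
        pool.emplace_back(task1_padded, slots, w, iters);
    }
    for (auto& t : pool) t.join();
    std::vector<long> out;
    for (const PaddedLong& s : slots) {
        long v = s.res;  // explicit read of the volatile member
        out.push_back(v);
    }
    return out;
}
```

The variables are created together, as in the padded struct above, but no two of them can land on the same cache line as long as the assumed line size is not smaller than the real one.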