I did some benchmarking for a dictionary app a while back, and the fastest I could get was strstr(), if I recall correctly. It substantially beat std::string::find(). Using a loop is bound to give terrible performance; for large data you should be using one of the string-search algorithms (which strstr() surely does). And given the very large search string, those string-search algorithms should be able to skip large chunks at a time.
Here's a small version that allocates 200 MB of uninitialized memory.
#include <stdlib.h>
#include <string.h>

int main(void)
{
    size_t size = 200 * 1024 * 1024;
    char *bytes = malloc(size);          /* uninitialized 200 MB */
    if (!bytes)
        return 1;
    bytes[size - 1] = '\0';              /* strstr needs a terminator */
    strstr(bytes, "788bbd6b12d79a4ee18e36a8a264fb496465c71b7932a965ed731397b7c97d14");
    return 0;
}
> gcc test.c
> time ./a.out
real 0m0.006s
user 0m0.002s
sys 0m0.002s
(Since the author ignores time to read off disk, I figure uninitialized memory should be comparable. The chances of strstr() actually finding anything are pretty slim, but I ran it a number of times just to be sure, with the same results.)
Note that strstr() stops as soon as it hits a 0 byte, so this is not a fair comparison.
Uninitialized memory (which in practice is often zeroed) also produces different results from random and explicitly zeroed memory, since I assume it affects how often the search can fail fast.
#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

int main(int nargs, char **args)
{
    struct timespec start;
    struct timespec end;
    int mode = nargs > 1 ? atoi(args[1]) : 2;
    const char *needle = "788bbd6b12d79a4ee18e36a8a264fb496465c71b7932a965ed731397b7c97d14";
    const size_t needle_len = strlen(needle);   /* NB: sizeof(needle) is just the pointer size */
    const size_t N = 200 * 1024 * 1024;
    char *haystack = malloc(N + needle_len + 1);
    strcpy(haystack + N, needle);               /* plant the needle at the very end */
    printf("Mode is %s\n", mode == 0 ? "zeroed" :
                           mode == 1 ? "random" :
                                       "uninitialized");
    if (mode == 0)
    {
        memset(haystack, 0, N);
    }
    if (mode == 1)
    {
        FILE *urandom = fopen("/dev/urandom", "r");
        size_t nread = fread(haystack, 1, N, urandom);
        fclose(urandom);
        if (nread < N)
        {
            fprintf(stderr, "read returned %zu\n", nread);
        }
    }
    clock_gettime(CLOCK_REALTIME, &start);
    char *found = memmem(haystack, N + needle_len, needle, needle_len);
    clock_gettime(CLOCK_REALTIME, &end);
    printf("Took %g secs to find string at offset of %td\n",
           end.tv_sec - start.tv_sec + 1e-9 * (end.tv_nsec - start.tv_nsec),
           found - haystack);
    return 0;
}
$ gcc test_find.c -o test_find
$ ./test_find 0
Mode is zeroed
Took 0.0147023 secs to find string at offset of 209715200
$ ./test_find 1
Mode is random
Took 0.129951 secs to find string at offset of 209715200
$ ./test_find 2
Mode is uninitialized
Took 0.0371922 secs to find string at offset of 209715200
Note that if you malloc that big of a block, it's going to come directly from a MAP_ANONYMOUS mmap, and so it will be zero initialized!
Your uninitialized mode is slower than zeroed, but nowhere near as slow as random.
The difference between initialized and uninitialized may be due to page faults: in the uninitialized case, the search is touching the memory for the first time. Maybe we're seeing TLB misses?
Yeah I was expecting the uninitialized to be the same as the zeroed case for that reason, but was surprised to see a (slightly) different result.
Edit: you are almost certainly right. I added a getrusage call before and after, and we have ~50k soft page faults in the uninitialized case and ~2 in the zeroed case during the memmem.
This is on a 2012 MacBook Pro.