I did some benchmarking for a dictionary app a while back, and the fastest I could get was strstr(), if I recall correctly. It substantially beat std::string::find(). Using a loop is bound to give terrible performance; for large data you should be using one of the string-search algorithms (which strstr() surely does). And given the very large search string, those string-search algorithms should be able to skip large chunks at a time.
Here's a small version that allocates 200 MB of uninitialized memory.
#include <stdlib.h>
#include <string.h>

int main(void)
{
    size_t size = 200 * 1024 * 1024;
    char *bytes = malloc(size);          /* uninitialized 200 MB */
    if (!bytes)
        return 1;
    bytes[size - 1] = '\0';              /* strstr needs a terminator */
    strstr(bytes, "788bbd6b12d79a4ee18e36a8a264fb496465c71b7932a965ed731397b7c97d14");
    return 0;
}
> gcc test.c
> time ./a.out
real 0m0.006s
user 0m0.002s
sys 0m0.002s
(Since the author ignores time to read off disk, I figure uninitialized memory should be comparable. The chances of strstr() actually finding anything are pretty slim, but I ran it a number of times just to be sure, with the same results.)
Note that strstr() stops as soon as it hits a 0 byte, so this is not a fair comparison.
Uninitialized memory (which in practice is often zeroed) also produces different results from random and explicitly zeroed memory, since I assume it affects how often the search can fail fast.
#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

int main(int nargs, char **args)
{
    struct timespec start;
    struct timespec end;
    int mode = nargs > 1 ? atoi(args[1]) : 2;
    const char *needle = "788bbd6b12d79a4ee18e36a8a264fb496465c71b7932a965ed731397b7c97d14";
    const size_t needle_len = strlen(needle);   /* NB: sizeof(needle) is just the pointer size */
    const size_t N = 200 * 1024 * 1024;
    char *haystack = malloc(N + needle_len + 1);
    strcpy(haystack + N, needle);               /* plant the needle at the very end */
    printf("Mode is %s\n", mode == 0 ? "zeroed" :
                           mode == 1 ? "random" :
                                       "uninitialized");
    if (mode == 0)
    {
        memset(haystack, 0, N);
    }
    if (mode == 1)
    {
        FILE *urandom = fopen("/dev/urandom", "r");
        size_t nread = fread(haystack, 1, N, urandom);
        fclose(urandom);
        if (nread < N)
        {
            fprintf(stderr, "read returned %zu\n", nread);
        }
    }
    clock_gettime(CLOCK_REALTIME, &start);
    char *found = memmem(haystack, N + needle_len, needle, needle_len);
    clock_gettime(CLOCK_REALTIME, &end);
    printf("Took %g secs to find string at offset of %td\n",
           end.tv_sec - start.tv_sec + 1e-9 * (end.tv_nsec - start.tv_nsec),
           found - haystack);
    return 0;
}
$ gcc test_find.c -o test_find
$ ./test_find 0
Mode is zeroed
Took 0.0147023 secs to find string at offset of 209715200
$ ./test_find 1
Mode is random
Took 0.129951 secs to find string at offset of 209715200
$ ./test_find 2
Mode is uninitialized
Took 0.0371922 secs to find string at offset of 209715200
Note that if you malloc that big of a block, it's going to come directly from a MAP_ANONYMOUS mmap, and so it will be zero initialized!
Your uninitialized mode is slower than zeroed, but nowhere near as slow as random.
The difference between initialized and uninitialized may be due to page faults: in the uninitialized case, the search is touching the memory for the first time. Maybe we're seeing TLB misses?
Yeah I was expecting the uninitialized to be the same as the zeroed case for that reason, but was surprised to see a (slightly) different result.
Edit: you are almost certainly right. I added a getrusage call before and after, and we have ~50k soft page faults in the uninitialized case and ~2 in the zeroed case during the memmem.
This is on a 2012 MacBook Pro.