c - What's the runtime of this algorithm? -
i wrote answer question:
longest common subsequence: why wrong?
this function supposed find longest substring between 2 strings, when tried figure out worst-case runtime , input cause that, realized didn't know. consider code c pseudocode.
// assume shorter string passed in int lcs(char * a, char * b) { int length_a = strlen(a); int length_b = strlen(b); // holds length of longest common substring found far int longest_length_found = 0; // each character in 1 string (doesn't matter which), // incrementally larger strings in other // once longer substring can no longer found, stop (int a_index = 0; a_index < length_a - longest_length_found; a_index++) { (int b_index = 0; b_index < length_b - longest_length_found; b_index++) { // check next letter until mismatch found or 1 of strings ends. (int offset = 0; a[a_index+offset] != '\0' && b[b_index+offset] != '\0' && a[a_index+offset] == b[b_index+offset]; offset++) { longest_length_found = longest_length_found > offset ? longest_length_found : offset; } } } return longest_found_length; } here's thinking far:
below, i'll assuming , b equivalent size not have aba, i'll n^3. if terribly bad, can update question.
without of optimizations in code, believe runtime aba n^3 runtime.
however, if strings dissimilar , long substring never found, inner-most loop drop out constant leaving a*b, right?
if strings same, algorithm takes linear time, there 1 simultaneous pass through each of strings.
if strings similar, not identical, longest_length_found become significant fraction of smaller of or b, divide out 1 of factors in n^3 leaving n^2, right? i'm trying understand happens when remarkably similar not identical.
thinking out loud, if on first letter, find substring length around half of length of a. mean run a/2 iterations of first loop, b-(a/2) iterations of second loop, , a/2 iterations in third loop (assuming strings similar) without finding longer substring. assuming length strings, that's n/2 * n/2 * n/2 = o(n^3).
sample strings show behavior:
a a b a b a b a b a a b a a b a a b am close or missing or misapplying something?
i'm pretty sure better using trie/prefix tree, again, i'm interested in understanding behavior of specific code.
i think roliu said in comments bang on money. think algorithm o(n3) best-case of o(n2).
what wanted point out over-indulgence of algorithm. see, every possible starting offset in each string, test every subsequent matching character count number of matches. consider this:
a = "01111111" b = "11111110" almost first thing find maximum matching substring starting @ a[1] , b[0], , later on test parts of exact overlap, beginning @ a[2], b[1] , on... what's important here relative offset. can drop n3 part of algorithm realising this. becomes matter of shifting 1 of arrays beneath other.
a 01111111 b 11111110 b 11111110 b 11111110 b ... --> b 11111110 to make code less complicated, can test half of system, swap arrays , test other half:
// shift b under a 01111111 b 11111110 b ... --> b 11111110 // shift under b b 11111110 01111111 ... --> 01111111 if this, have o((a+b-2) * min(a,b) / 2), or more conveniently o(n2)
int lcs_half(char * a, char * b) { int maxlen = 0, len = 0; int offset, i; for( offset = 0; b[offset]; offset++ ) { len = 0; for( = 0; a[i] && b[i+offset]; i++ ) { if( a[i] == b[i+offset] ) { len++; if( len > maxlen ) maxlen = len; } else len = 0; } } return maxlen; } int lcs(char * a, char * b) { int run1 = lcs_half(a,b); int run2 = lcs_half(b,a); return run1 > run2 ? run1 : run2; }
Comments
Post a Comment