Interview Questions

I need some code to do regular expression and wildcard matching.

C Interview Questions and Answers


(Continued from previous question...)

I need some code to do regular expression and wildcard matching.

Make sure you recognize the difference between:
* Classic regular expressions, variants of which are used in such Unix utilities as ed and grep. In regular expressions, a dot . usually matches any single character, and the sequence .* usually matches any string of characters. (Of course, full-blown regular expressions have several more features than these two.)
* Filename wildcards, variants of which are used by most operating systems. There is considerably more variation here (in particular, MS-DOS wildcards are somewhat stunted), but it is often the case that ? matches any single character, and * matches any string of characters.
There are a number of packages available for matching regular expressions. Most packages use a pair of functions, one for ``compiling'' the regular expression, and one for ``executing'' it (i.e. matching strings against it). Look for header files named <regex.h> or <regexp.h>, and functions called regcmp/regex, regcomp/regexec, or re_comp/re_exec. (These functions may exist in a separate regexp library.) A popular, freely-redistributable regexp package by Henry Spencer is available from ftp.cs.toronto.edu in pub/regexp.shar.Z or in several other archives. The GNU project has a package called rx.
Filename wildcard matching (sometimes called ``globbing'') is done in a variety of ways on different systems. On Unix, wildcards are automatically expanded by the shell before a process is invoked, so programs rarely have to worry about them explicitly. Under MS-DOS compilers, there is often a special object file which can be linked in to a program to expand wildcards while argv is being built. Several systems (including MS-DOS and VMS) provide system services for listing or opening files specified by wildcards. Check your compiler/library documentation.
Here is a quick little wildcard matcher by Arjan Kenter:
int match(char *pat, char *str)
{
switch(*pat) {
case '\0': return !*str;
case '*': return match(pat+1, str) ||
*str && match(pat, str+1);
case '?': return *str && match(pat+1, str+1);
default: return *pat == *str && match(pat+1, str+1);
}
}

With this definition, the call match("a*b.c", "aplomb.c") would return 1.

(Continued on next question...)

Other Interview Questions