Tokenized Search

What does the user want?

We need to decide what we think the user means when they enter multiple search terms into a single form field. For example, what do you think the user is searching for when they enter the following into the Easy Search field?

Search anywhere: Blue Foxtrot IV

I think the natural reading is that every one of these search terms should appear in each matching item. That is, the user is looking for entries that match Blue and Foxtrot and IV.

We are unlikely to find any database record that has the exact string Blue Foxtrot IV in any one of its fields, so we search token by token rather than for the whole string. A record matches if it contains the string Blue, the string Foxtrot, and the string IV; these need not be contiguous, and they do not even need to be in the same data field.

Easy Search

This requires a slightly more sophisticated MySQL query. We need to search for each search term, which I call a token, in each database field. For a single search token, we expect a positive match if the token matches any database field. For all the tokens together, we expect a positive match only if every one of the tokens matches. So we combine the database fields by OR and the search tokens by AND.

So, for the search above, we need to construct a MySQL query that looks like this:

  my $sql = "SELECT * FROM gen44_rom WHERE " .

    # Token 1: Blue must appear in at least one field
    "(DanceName LIKE '%Blue%' OR Rhythm LIKE '%Blue%' OR " .
    "Phase LIKE '%Blue%' OR Choreographer LIKE '%Blue%' OR " .
    "Label LIKE '%Blue%' OR Record LIKE '%Blue%' OR " .
    "Year LIKE '%Blue%' OR Extra LIKE '%Blue%') AND " .

    # Token 2: Foxtrot must appear in at least one field
    "(DanceName LIKE '%Foxtrot%' OR Rhythm LIKE '%Foxtrot%' OR " .
    "Phase LIKE '%Foxtrot%' OR Choreographer LIKE '%Foxtrot%' OR " .
    "Label LIKE '%Foxtrot%' OR Record LIKE '%Foxtrot%' OR " .
    "Year LIKE '%Foxtrot%' OR Extra LIKE '%Foxtrot%') AND " .

    # Token 3: IV must appear in at least one field
    "(DanceName LIKE '%IV%' OR Rhythm LIKE '%IV%' OR " .
    "Phase LIKE '%IV%' OR Choreographer LIKE '%IV%' OR " .
    "Label LIKE '%IV%' OR Record LIKE '%IV%' OR " .
    "Year LIKE '%IV%' OR Extra LIKE '%IV%') ";

Note that the query has several larger clauses connected by AND, because we want to match all the tokens the user entered. Each larger clause is in parentheses so we can preserve the logic. Each larger clause consists of several smaller clauses connected by OR, because we need to find the token in only one of the database fields in order to find a match.

How to construct this query? Patiently and methodically. I constructed the larger clause for each token in a subroutine, passing each token to the subroutine. The calling function connected all the resulting clauses together with AND keywords. If the user puts many search tokens in the box, this query can become quite long. MySQL has no trouble parsing and executing a long query like this, although a LIKE pattern that begins with a wildcard cannot use an ordinary index, so every row in the table is examined. For a table of modest size that is not a problem.
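
The construction described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the subroutine names token_clause and easy_search_sql are hypothetical, and the sketch uses ? placeholders with a separate list of bind values instead of pasting the tokens into the SQL text, which is a substitution on my part to avoid quoting problems when a token contains an apostrophe.

```perl
use strict;
use warnings;

# Field names taken from the example query above.
my @fields = qw(DanceName Rhythm Phase Choreographer Label Record Year Extra);

# Hypothetical helper: build the parenthesized OR clause for one token.
# Returns the clause text followed by one bind value per field.
sub token_clause {
    my ($token) = @_;
    my $clause = '(' . join(' OR ', map { "$_ LIKE ?" } @fields) . ')';
    my @binds  = ("%$token%") x @fields;    # same pattern for every field
    return ($clause, @binds);
}

# Hypothetical caller: split the search box on whitespace and
# connect the per-token clauses with AND.
sub easy_search_sql {
    my ($input) = @_;
    my @tokens = grep { length } split /\s+/, $input;
    my (@clauses, @binds);
    for my $token (@tokens) {
        my ($clause, @token_binds) = token_clause($token);
        push @clauses, $clause;
        push @binds,   @token_binds;
    }
    return unless @clauses;                 # nothing to search for
    my $sql = 'SELECT * FROM gen44_rom WHERE ' . join(' AND ', @clauses);
    return ($sql, @binds);
}

my ($sql, @binds) = easy_search_sql('Blue Foxtrot IV');
print "$sql\n";
print scalar(@binds), " bind values\n";     # 3 tokens x 8 fields = 24
```

With DBI, the returned pair would typically be handed to $dbh->prepare($sql) and then $sth->execute(@binds). Note that a token containing % or _ would still act as a LIKE wildcard in this sketch.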

Advanced Search

Advanced search does something similar. The logic is a little more complicated, but the resulting MySQL queries are actually simpler, because each search term applies to only one database field. For example, if I do this search:

Dance Name: Blue Eyes

The resulting MySQL query will be:

  my $sql = "SELECT * FROM gen44_rom WHERE " .
    "(DanceName LIKE '%Blue%' AND DanceName LIKE '%Eyes%')";

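As a rough sketch of how such a per-field query might be assembled: the subroutine name advanced_search_sql is hypothetical, the use of ? placeholders is my substitution for literal values, and joining the clauses for different form fields with AND is an assumption, since only a single-field example is shown here.

```perl
use strict;
use warnings;

# Hypothetical builder: takes a list of [field, input] pairs, one per
# form field. Field names are interpolated into the SQL, so they must
# come from a fixed list in the program, never from the user.
sub advanced_search_sql {
    my (@pairs) = @_;
    my (@clauses, @binds);
    for my $pair (@pairs) {
        my ($field, $input) = @$pair;
        $input = '' unless defined $input;
        my @tokens = grep { length } split /\s+/, $input;
        next unless @tokens;                # skip form fields left blank
        # Every token must match within this one database field.
        push @clauses,
            '(' . join(' AND ', map { "$field LIKE ?" } @tokens) . ')';
        push @binds, map { "%$_%" } @tokens;
    }
    return unless @clauses;
    # Assumption: clauses for different form fields are combined with AND.
    my $sql = 'SELECT * FROM gen44_rom WHERE ' . join(' AND ', @clauses);
    return ($sql, @binds);
}

my ($sql, @binds) = advanced_search_sql(
    [ DanceName => 'Blue Eyes' ],
    [ Rhythm    => '' ],
);
print "$sql\n";
print join(', ', @binds), "\n";
```

For the Dance Name search above, this produces a single parenthesized clause with two LIKE conditions on DanceName and the bind values %Blue% and %Eyes%.
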
Other possible search features

It is possible to add more search features. For example, we could use a pipe symbol (|) to indicate OR and a minus sign (-) to indicate NOT. I did not implement this kind of search in this project, but with such a syntax the following could indicate a search for the word Blue or the word Eyes, but not the word Foxtrot.

Dance Name