Friday, December 9, 2011

Re: help with full-text 3 letter "words"

My current solution is the following.
1. Added new column in table called "searchable" that has all of the
words from the columns that used to be full-text searched. This new
column has all words with less than 4 letters suffixed with
underscores (_). The Python code I used to do that is:
temp = temp.replace(',', ' ').split()
searchable = [word.ljust(4, '_') for word in temp]
searchable = ' '.join(searchable)
2. Before querying Cloud SQL using MATCH I put the user entered query
through the same Python code above to add the underscore suffix to
short words being searched for.
3. Now I can find words that are less than 4 letters without needing
to modify the Cloud SQL configuration file for my instance which is
currently not supported.
So, to summarize, I am syncing data in Cloud SQL with the Google
datastore that needs geo and full-text searching (to get around the
datastore not supporting full-text or geo searches). I am making a
custom searchable column in Cloud SQL (to get around the 4 characters
minimum for full-text search) that Cloud SQL has as the non
configurable default. This new column is the only column I am full-
text indexing. When a user runs a query, I pre-process it to have it
match the underscore suffix I am using on short words stored in the
searchable column.
Example: "zap" becomes "zap_" in both the searchable column and the
query that is used in MATCH.
Now I can do multi-word (phrase) full-text searches even with short
words (i.e. "zip zap") which in my real app are abbreviations.

No comments:

Post a Comment