Yes, PicoSearch now offers its most powerful feature yet for controlling
your users' queries. We call it Query Rules, and it lets you rewrite user
queries to your own specifications. This is admittedly an advanced
option that most people will never want to touch, but in rare cases it
can offer the perfect fix. Common uses include normalizing or enforcing
both punctuation and vocabulary, although for vocabulary you should
first consider the easier tools for User Synonym Groups and spelling
corrections/variations. All of these related features for augmenting
queries can be found together in the "Synonyms, Spelling & Query Rules"
section of your account manager.
Query Rules are entered in a text box in your account manager. Each
rule is a before and after pattern that follows the standard syntax of
regular expressions. A tutorial on regular expression syntax is beyond
the scope of this FAQ, but we will provide some examples. Because Query
Rules modify what the user types for a search, you should know what
you're doing so you don't break your searches. Regular expressions are
common in popular languages like Perl, so you can find plenty of
information on the internet, and we welcome you to contact us for
guidance in using this feature.
Basic syntax
Each Query Rule has a before pattern and an after pattern. Wherever the
before pattern matches the current query, the after pattern replaces the
matched section. The patterns are delimited by forward slashes (/): one
between them and one on either end. Here is a simple example that
replaces every - in your queries with a space. The backslash (\) is
commonly used to precede literal punctuation that might have another
meaning in regular expressions. A dash happens to be safe without a
backslash, but using one is good practice, since a backslash is needed
for a period, question mark, etc.
/\-/ /
This Query Rule might be useful on a product catalog search where dashes
separate catalog numbers that can combine in many ways. Turning dash
into space will find the independent parts of whatever catalog numbers
the user types. This would be far more flexible than user synonyms,
which can't specify all combinations, or hyphen breaking, which can only
find adjacent parts in the same order as typed with or without the
dashes.
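To see how a /before/after/ rule maps onto an ordinary regular expression substitution, here is a minimal Python sketch for trying rules on your own machine before entering them. It is not PicoSearch's implementation, and Python's regex dialect differs from others in some details. The helper names parse_rule and apply_rule are hypothetical; the sketch assumes the rule can be split on slashes not preceded by a backslash, that $n in the after pattern is filled from capture n, and that a backslash before any other character makes it literal. Rules with a MSG part are not handled.

```python
import re


def parse_rule(rule):
    """Split a '/before/after/' rule on slashes not preceded by a backslash.

    Hypothetical helper; rules with a MSG part are not handled.
    """
    parts = re.split(r"(?<!\\)/", rule.strip())
    if len(parts) < 4 or parts[0] != "":
        raise ValueError(f"not a /before/after/ rule: {rule!r}")
    return parts[1], parts[2]


def apply_rule(rule, query):
    """Replace every match of the before pattern, left to right."""
    before, after = parse_rule(rule)

    def expand(match):
        # In the after pattern, $n becomes the text captured by group n,
        # and a backslash before any other character makes it literal.
        return re.sub(
            r"\$(\d+)|\\(.)",
            lambda t: (match.group(int(t.group(1))) or "")
            if t.group(1)
            else t.group(2),
            after,
        )

    return re.sub(before, expand, query)


print(apply_rule(r"/\-/ /", "4410-20-B"))  # 4410 20 B
```

The same helper accepts the capture and grouping rules later in this FAQ, for example /(\d)\-(\d)/$1 $2/.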
Suppose you want to make sure that the dashes are on the inside of a
search term, and only between numbers. Being on the inside is important
if you want to allow the NOT logic that excludes a term, commonly typed
as a minus sign (dash) before a word. For example, [paint -spray] means
search for all pages with the word "paint" but not "spray". Requiring
digits on both sides of the dash works if you know that the parts of
your catalog numbers never begin or end with a letter at a dash. Making your rules as specific as
possible is a good approach to avoid unintended side effects on other
searches.
/(\d)\-(\d)/$1 $2/
Here we're getting into regular expression tools. \d is a shorthand for
the digits [0-9]. Parentheses that are functional (without backslash)
perform a capture, bringing the matching contents to the replacement
side in the order of parentheses with $1, $2, etc. Captures are
necessary because the match is fully replaced, so the digits would get
lost otherwise. Thus a search for [product a12-34x9] would match, and
the "2-3" is replaced by "2 3", resulting in the query [product a12
34x9].
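The same rule can be tried with Python's re.sub, where the capture references are spelled \1 and \2 rather than $1 and $2. This sketch also shows that the NOT syntax in [paint -spray] is left alone, because no digit precedes that dash, and that every match in the query is replaced.

```python
import re

BEFORE = r"(\d)\-(\d)"  # a dash with a digit captured on each side
AFTER = r"\1 \2"        # Python's spelling of $1 $2

print(re.sub(BEFORE, AFTER, "product a12-34x9"))  # product a12 34x9
print(re.sub(BEFORE, AFTER, "paint -spray"))      # paint -spray
print(re.sub(BEFORE, AFTER, "12-34-56"))          # 12 34 56
```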
MATCHn for number of matches
Query Rules support regular expression syntax inside the slashes, but
not outside them. So controls that languages like Perl place outside
the slashes, such as flags, are replaced here by a smaller set of Query
Rule modifiers. Most notably, a Query Rule by definition acts on every
match it can find, going left to right through the query (much like
Perl's /g flag). If you want it to act on only the first n matches, you
can add the prefix MATCHn like this:
MATCH1 /(\d)\-(\d)/$1 $2/
Now a search for [product a12-34x9-5] would match only once and become [product a12 34x9-5].
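In Python terms, MATCH1 corresponds roughly to the count argument of re.sub. This is our analogy for experimenting locally, not PicoSearch code; without a count, every match is replaced.

```python
import re

BEFORE, AFTER = r"(\d)\-(\d)", r"\1 \2"
query = "product a12-34x9-5"

print(re.sub(BEFORE, AFTER, query, count=1))  # product a12 34x9-5 (like MATCH1)
print(re.sub(BEFORE, AFTER, query))           # product a12 34x9 5 (no prefix)
```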
INCL_PHRASES or ONLY_PHRASES
Another important modifier to consider is whether the match will be
performed inside quotes. Quoted searches are supposed to find exact
phrases only, so it could be confusing to change what was typed exactly.
A regular expression can't easily tell whether it is matching inside a
quoted span or not, so by default Query Rules do not act on phrases. We
provide the modifiers INCL_PHRASES to additionally match within quoted
spans, and ONLY_PHRASES to match only within quoted spans. So, for
example, the following rule would remove question marks in all
searches:
INCL_PHRASES /\?//
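The three modes can be modeled by splitting the query into quoted and unquoted spans and applying the rule only to the spans each mode selects. This Python sketch assumes a quoted span is a pair of double quotes with no quote inside; how PicoSearch itself locates phrases (for example, with an unmatched quote) is not described here, and the function name apply_with_phrases is hypothetical.

```python
import re


def apply_with_phrases(query, before, after, mode="DEFAULT"):
    """Apply re.sub only to the spans selected by mode (hypothetical model).

    DEFAULT: outside quotes; INCL_PHRASES: everywhere;
    ONLY_PHRASES: inside quotes only.
    """
    # The capturing group keeps the quoted spans in the list, so even
    # indexes hold unquoted text and odd indexes hold "quoted" spans.
    pieces = re.split(r'("[^"]*")', query)
    for i, piece in enumerate(pieces):
        quoted = i % 2 == 1
        if (mode == "INCL_PHRASES"
                or (mode == "DEFAULT" and not quoted)
                or (mode == "ONLY_PHRASES" and quoted)):
            pieces[i] = re.sub(before, after, piece)
    return "".join(pieces)


q = 'why? "how?"'
print(apply_with_phrases(q, r"\?", ""))                  # why "how?"
print(apply_with_phrases(q, r"\?", "", "INCL_PHRASES"))  # why "how"
print(apply_with_phrases(q, r"\?", "", "ONLY_PHRASES"))  # why? "how"
```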
Speaking of phrases, here's an example of a different kind of
normalization. Suppose that the phrase "roll over" is very important on
your website, and while you always write it as two words, you want to be
sure to match even if the user types it as one word or hyphenated. You
also want to search for the quoted exact phrase, because you don't want
to find pages with the fairly common words "roll" or "over" in different
uses. Here are rules to enforce exact phrases, using | inside
parentheses to list alternate matches in the before pattern. We'll make
two rules: one that adds the quotes because the match is not in a phrase
(the default), and one that doesn't need quotes because the match is
already inside a phrase.
/(roll over|roll\-over|rollover)/"roll over"/
ONLY_PHRASES /(roll over|roll\-over|rollover)/roll over/
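Here is how the two rules behave on the text each one would see, sketched with Python's re.sub. The sketch assumes the span has already been selected: the unquoted query for the first rule, and the text between the quotes of a search like ["dog roll-over"] for the second.

```python
import re

ALTERNATES = r"(roll over|roll\-over|rollover)"

# First rule (default mode): the match is outside quotes, so add them.
print(re.sub(ALTERNATES, '"roll over"', "rollover tricks"))  # "roll over" tricks

# Second rule (ONLY_PHRASES): the match is already inside a quoted span,
# so only the spelling is normalized.
print(re.sub(ALTERNATES, "roll over", "dog roll-over"))  # dog roll over
```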
Parentheses block ALL for alternatives
Here's another trick to consider. You could use a Query Rule to provide a
one-way transformation of certain words into several alternate words, a
process called query enrichment. This is what User Synonym groups
accomplish more automatically for you, but the User Synonym groups act
to enrich any one word with all others. A Query Rule could enforce that
only one word triggers the others and not vice versa. So, for example,
here we turn a search for auto or automobile into a search for that word
plus car or truck, but not vice versa, i.e., a search for trucks stays
trucks. We don't worry about the -s of plurals in the after pattern
because PicoSearch's automatic plural/singular matching will handle the
matching of car or cars, truck or trucks.
/(automobiles?|autos?)/\($1 car truck\)/
What's happening here is that the parentheses in the before pattern act
to group alternates separated by | as well as to provide the capture for
$1. Each question mark makes the preceding s optional. The longer
automobile is listed first, so it is tried first and auto isn't just
chopped out of automobile. A \b word boundary could also prevent
chopping, as in /(autos?|automobiles?)\b/. Other boundary markers
(anchors) supported include ^ for the start of the query and $ for the
end, so that /^(autos?)$/ would only replace auto or autos when it is
the entire query.
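The effects of alternative order, \b, and the ^ and $ anchors can be checked with Python's re.sub. In a Python replacement string the parentheses are already literal, so the \( and \) escapes of the Query Rule's after pattern are not needed here; \1 plays the role of $1.

```python
import re

AFTER = r"(\1 car truck)"  # literal parentheses plus capture 1

print(re.sub(r"(automobiles?|autos?)", AFTER, "used autos"))    # used (autos car truck)
# Shorter alternative listed first: auto is chopped out of automobile.
print(re.sub(r"(autos?|automobiles?)", AFTER, "automobile"))    # (auto car truck)mobile
# \b forces the match to end at a word boundary, so automobile wins.
print(re.sub(r"(autos?|automobiles?)\b", AFTER, "automobile"))  # (automobile car truck)
# Anchors: only a query consisting of auto or autos alone is changed.
print(re.sub(r"^(autos?)$", AFTER, "autos"))        # (autos car truck)
print(re.sub(r"^(autos?)$", AFTER, "cheap autos"))  # cheap autos
```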
Now here is the important point: the parentheses in the after pattern
are literal so they get inserted into the query as part of PicoSearch's
own grouping syntax. Why would we want extra parentheses? We need the
parentheses because if the user is searching in ALL words mode, adding
alternate words at the top level will require them all to be present,
and that might ruin the search. But ALL mode will not distribute an
effective + inside parentheses. You will get the intended effect of
requiring that one (or more) of the alternates be found.
Alerting with Show and MSG messages
Okay, we promised this feature would be technical, so that's enough
about regular expressions. The other side of Query Rules to know about
is how they alert the user. There is a checkbox option in the account
manager to "Show the Adjusted Query"; if it is off, the adjustments you
make to the query are silent, and the query the user typed is displayed
unchanged. However, the search is performed on the adjusted query, and
that is what shows up in the results and the concordance (the text
excerpts), so depending on your Query Rules you may want to keep the
option on. Because this option is useful for debugging, the account
manager's Query Rules feature also has a tester link, so you don't have
to turn Showing on globally just to see the effects of your rules.
The other way that the Query Rule can alert the user is through
triggered messages, and in fact this ability could be used to not change
the query at all but only detect patterns and then signal messages. In
the following example, we detect the first dash without actually
changing it by omitting the after pattern, for the purpose of alerting
the user to consult an FAQ. Other punctuation can be detected in the
same rule with the | syntax in the before pattern, and the capture can
also go into the message.
MATCH1 /(\-|\=|\*)/ MSG "You typed a $1, so if you're trying to search for part numbers please check our <a href="http://www.mysite.com/syntaxfaq.html">FAQ on syntax</a>"
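A detect-only rule like this can be sketched in Python with re.search, which finds just the first match and so mirrors MATCH1. The query is returned unchanged, and $1 in the message is filled in from the capture. The function name detect is hypothetical, and where PicoSearch places the message on the results page is not modeled.

```python
import re

BEFORE = r"(\-|\=|\*)"
MSG = ("You typed a $1, so if you're trying to search for part numbers "
       'please check our <a href="http://www.mysite.com/syntaxfaq.html">'
       "FAQ on syntax</a>")


def detect(query):
    """Return (query, message); the query is never changed.

    The message is None when the before pattern does not match.
    """
    m = re.search(BEFORE, query)  # first match only, as with MATCH1
    if m is None:
        return query, None
    return query, MSG.replace("$1", m.group(1))


print(detect("xk-200")[1])
print(detect("paint")[1])  # None
```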
The message after the MSG will plug into the search results page on the
line about what was found, similar to messages about ignored stopwords
or synonyms. But if you want it to plug elsewhere in your template,
position it with the keyword _PICO_QUERYMSG_REPLACEMENT. If you want the
message to plug into both the default and your additional locations,
use the keyword _PICO_QUERYMSG_ADDITIONAL.
The contents of the message are pretty flexible, so it could include
tags and classes, but there's also a template code, called
PICO_CLASS_SPAN_QUERYMSG, that simply sets a class span on all messages
wherever they are plugged in. See the template codes FAQ for details.
Messages also benefit from two more special plugin values. _QBEFORE in
the message will be replaced by the entire text matched by the before
pattern, and _QAFTER by the entire text that the after pattern put in
its place. So, for example, the following rule would change the search
[apple] into [apple pie] and display the message:
Yum, we recommend our apple pie
/\b(apple|peach)\b/$1 pie/ MSG "Yum, we recommend our _QAFTER"
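The relationship between the two plugin values can be sketched in Python: Match.group(0) gives the text the before pattern matched, and Match.expand applies the after pattern to it. This sketch rewrites only the first match; which match PicoSearch uses for _QBEFORE and _QAFTER when there are several isn't stated here, and the function name apply_with_message is hypothetical.

```python
import re

BEFORE = r"\b(apple|peach)\b"
AFTER = r"\1 pie"  # Python's spelling of $1 pie
TEMPLATE = "Yum, we recommend our _QAFTER"


def apply_with_message(query):
    """Rewrite the first match and build the message (hypothetical model)."""
    m = re.search(BEFORE, query)
    if m is None:
        return query, None
    q_before = m.group(0)      # whole text matched by the before pattern
    q_after = m.expand(AFTER)  # that text rewritten by the after pattern
    new_query = query[:m.start()] + q_after + query[m.end():]
    message = TEMPLATE.replace("_QBEFORE", q_before).replace("_QAFTER", q_after)
    return new_query, message


print(apply_with_message("apple"))  # ('apple pie', 'Yum, we recommend our apple pie')
```

The \b anchors keep a search such as [pineapple] from triggering the rule.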
If you're writing a lot of rules with shared messages, you can also
collect your messages in one area and refer to them by number with MSGn.
So the above rule is equivalent to:
/\b(apple|peach)\b/$1 pie/ MSG1
MSG1 "Yum, we recommend our _QAFTER"
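If you keep a local copy of your rules, a small script can check that every MSGn reference has a matching definition. This Python sketch infers the two line formats from the example above (a line starting with MSGn followed by a double-quoted message, and MSGn at the end of a rule line); PicoSearch's own parser may accept more variations, and the function names are hypothetical.

```python
import re

rules_text = r'''/\b(apple|peach)\b/$1 pie/ MSG1
MSG1 "Yum, we recommend our _QAFTER"'''


def collect_messages(text):
    """Map message numbers to text from lines of the form: MSGn "text"."""
    messages = {}
    for line in text.splitlines():
        m = re.match(r'\s*MSG(\d+)\s+"(.*)"\s*$', line)
        if m:
            messages[int(m.group(1))] = m.group(2)
    return messages


def message_for(rule_line, messages):
    """Resolve a trailing MSGn reference on a rule line (None if absent)."""
    m = re.search(r"\bMSG(\d+)\s*$", rule_line)
    return messages.get(int(m.group(1))) if m else None


msgs = collect_messages(rules_text)
print(message_for(rules_text.splitlines()[0], msgs))  # Yum, we recommend our _QAFTER
```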
Commenting
With the complexity of Query Rules, you are welcome to add comments. To
add your own comments to your rules input text, begin a comment line
with either #COMMENT# or <!-- (as in an HTML comment, except that the
closing --> is not needed). For example:
#COMMENT# this is a query rule comment
<!-- this is another query rule comment
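When preparing rules offline, comment lines can be filtered out the same way. This Python sketch keeps every non-blank line that does not begin with #COMMENT# or <!--; tolerating leading spaces before the marker is our assumption, and the function name rule_lines is hypothetical.

```python
def rule_lines(text):
    """Return the non-blank lines that are not comment lines.

    A comment line begins with #COMMENT# or <!--; leading spaces before
    the marker are tolerated here (an assumption, not documented).
    """
    kept = []
    for line in text.splitlines():
        stripped = line.lstrip()
        if not stripped or stripped.startswith(("#COMMENT#", "<!--")):
            continue
        kept.append(line)
    return kept


sample = "#COMMENT# dash rule\n<!-- another note\n/\\-/ /\n"
print(rule_lines(sample))  # ['/\\-/ /']
```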
Well, that's enough to get you started if you're brave and technically
minded; otherwise, feel free to contact us for assistance in applying
the power of Query Rules. We don't expect most users to try or even need
Query Rules, and to a large extent we expect to suggest and tweak the
feature for customers whose needs we recognize could be solved this
way. We just wanted to expose the feature, as well as use it in our own
technical support, in keeping with PicoSearch's tradition of empowering
users with their own deeply customizable search engine.