Help with PicoSearch

Can I adjust the query for effects specific to the words on my site?

Yes. PicoSearch now offers its most powerful feature yet for controlling your user queries. We call it Query Rules, and it lets you rewrite your user queries to your specifications. This is an admittedly advanced option that most people will never want to touch, but in rare cases it can offer the perfect fix. Common uses include normalizing or enforcing punctuation and vocabulary, although for vocabulary the easier tools, User Synonym Groups and spelling corrections/variations, should be considered first. All of these related features for augmenting queries can be found together in the "Synonyms, Spelling & Query Rules" section of your account manager.

You enter Query Rules in a text entry box in your account manager. Each rule is a before and after pattern that follows the standard syntax of regular expressions. A tutorial on regular expression syntax is beyond the scope of this FAQ, but we will provide some examples. Because Query Rules modify what the user types for a search, you should know what you're doing so you don't break your searches. Regular expressions are common in popular languages like Perl, so you can find plenty of information on the internet, and we welcome you to contact us for guidance in your use of this feature.

Basic syntax

Each Query Rule has a before pattern and an after pattern. Wherever the before pattern matches the current query, the after pattern replaces the matched section. The patterns are delimited by forward slashes (/) between them and on either end. Here is a simple example that replaces every - in your queries with a space. The backslash (\) is commonly used to precede literal punctuation that might otherwise have a special meaning in regular expressions. The dash happens to be safe without a backslash, but using one is good practice, since a backslash is needed for the period, question mark, etc.

/\-/ /

This Query Rule might be useful on a product catalog search where dashes separate catalog numbers that can combine in many ways. Turning dash into space will find the independent parts of whatever catalog numbers the user types. This would be far more flexible than user synonyms, which can't specify all combinations, or hyphen breaking, which can only find adjacent parts in the same order as typed with or without the dashes.
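As a rough illustration of what this rule does to a query, here is a minimal Python sketch using the standard re module. It only approximates the behavior described above and is not PicoSearch's implementation. The function name dash_to_space and the sample catalog number are made up for this example.

```python
import re

def dash_to_space(query: str) -> str:
    # Python analogue of the Query Rule /\-/ /: every dash becomes a space.
    return re.sub(r"\-", " ", query)

print(dash_to_space("bracket 12-34-56"))  # bracket 12 34 56
```

Every dash is replaced, wherever it appears in the query.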

Suppose you want to make sure that the dashes are on the inside of a search term, and only between numbers. Being on the inside is important if you want to allow the NOT logic that excludes a term, commonly typed as a minus sign (dash) before a word. For example, [paint -spray] means search for all pages with the word "paint" but not "spray". Requiring digits on both sides of the dash works if you know that the dashes in your catalog numbers always sit between digits, never next to letters. Making your rules as specific as possible is a good way to avoid unintended side effects on other searches.

/(\d)\-(\d)/$1 $2/

Here we're getting into regular expression tools. \d is a shorthand for the digits [0-9]. Parentheses that are functional (without backslash) perform a capture, bringing the matching contents to the replacement side in the order of parentheses with $1, $2, etc. Captures are necessary because the match is fully replaced, so the digits would get lost otherwise. Thus a search for [product a12-34x9] would match, and the "2-3" is replaced by "2 3", resulting in the query [product a12 34x9].
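The capture behavior can be tried out in Python's re module, which is a reasonable stand-in for experimenting, though it is not PicoSearch's engine. One difference to note: Python writes captures in the replacement as \1 and \2 rather than $1 and $2. The function name split_numeric_dashes is hypothetical.

```python
import re

def split_numeric_dashes(query: str) -> str:
    # Analogue of /(\d)\-(\d)/$1 $2/ : the captured digits are put back
    # around a space, so only the dash itself is lost.
    return re.sub(r"(\d)\-(\d)", r"\1 \2", query)

print(split_numeric_dashes("product a12-34x9"))  # product a12 34x9
print(split_numeric_dashes("paint -spray"))      # paint -spray
```

The second query is left alone because its dash is not between two digits, so the NOT logic survives.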

MATCHn for number of matches

Query Rules support regular expression syntax inside the /, but not outside. So some regular expression controls that you might find in languages like Perl are more limited and controlled. Most notably, the Query Rule by definition runs on every match it can find going left to right through the query. If you want to only match n times, you can specify a prefix MATCHn like this:

MATCH1 /(\d)\-(\d)/$1 $2/

Now a search for [product a12-34x9-5] would match once and become [product a12 34x9-5].
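For comparison, Python's re.sub offers a count argument that limits the number of replacements, which behaves like the MATCH1 prefix in this example. This is an analogy for experimenting, not a description of how PicoSearch implements MATCHn. The function name is hypothetical.

```python
import re

def split_first_numeric_dash(query: str) -> str:
    # count=1 stops after the first left-to-right match, like MATCH1.
    return re.sub(r"(\d)\-(\d)", r"\1 \2", query, count=1)

print(split_first_numeric_dash("product a12-34x9-5"))  # product a12 34x9-5
```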

INCL_PHRASES or ONLY_PHRASES

Another important modifier to consider is whether the match will be performed inside quotes. Quoted searches are supposed to find exact phrases only, so it could be confusing to change what exactly was typed. Regular expressions aren't going to easily know whether they're matching inside quoted spans or not, so by default the Query Rules do not act on phrases. We provide the modifiers INCL_PHRASES to additionally match within quoted spans, and ONLY_PHRASES to match only within quoted spans. So for example, the following rule would remove question marks in all searches:

INCL_PHRASES /\?//

Speaking of phrases, here's an example of a different kind of normalization. Suppose that the phrase "roll over" is very important on your website, but while you always type it as two words, you want to be sure to match even if the user types one word or hyphenated. Also you want to search for the quoted exact phrase because you don't want to find pages with the fairly common words "roll" or "over" in different uses. Here's a rule to enforce exact phrases, using the | inside parentheses to list alternate matches in the before pattern. We'll make two rules, one for needing the quotes because it's not in a phrase (the default match), and one for not needing quotes because it's already inside a phrase.

/(roll over|roll\-over|rollover)/"roll over"/
ONLY_PHRASES /(roll over|roll\-over|rollover)/roll over/
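To make the three phrase modes concrete, here is a small Python sketch that splits a query into quoted and unquoted spans and applies a rule to the chosen spans. It is an approximation under stated assumptions: the function apply_rule and its phrases argument are invented for this example, and the handling of unbalanced quotes (treated here as unquoted text) is a guess, not documented PicoSearch behavior.

```python
import re

def apply_rule(query, pattern, replacement, phrases="exclude"):
    """phrases: "exclude" (the default), "include" (INCL_PHRASES),
    or "only" (ONLY_PHRASES)."""
    # Splitting on a capturing group keeps the quoted spans,
    # which land at the odd indexes of the result.
    parts = re.split(r'("[^"]*")', query)
    for i, part in enumerate(parts):
        quoted = i % 2 == 1
        if phrases == "include" or (phrases == "only") == quoted:
            parts[i] = re.sub(pattern, replacement, part)
    return "".join(parts)

ROLL = r"(roll over|roll\-over|rollover)"

print(apply_rule("rollover rules", ROLL, '"roll over"'))
# "roll over" rules
print(apply_rule('"roll-over rules"', ROLL, "roll over", phrases="only"))
# "roll over rules"
print(apply_rule('what? "why?"', r"\?", "", phrases="include"))
# what "why"
```

The first call adds quotes because the match is outside a phrase. The second normalizes the wording inside an existing phrase without adding a second pair of quotes.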

Parentheses block ALL for alternatives

Here's another trick to consider. You could use a Query Rule to provide a one-way transformation of certain words into several alternate words, a process called query enrichment. This is what User Synonym Groups accomplish more automatically for you, but User Synonym Groups enrich any one word in the group with all the others. A Query Rule can enforce that only one word triggers the others and not vice versa. So for example, here we turn a search for auto or automobile into a search for that word plus cars or trucks, but not the reverse, i.e. trucks stays trucks. We don't worry about the -s of plurals in the after pattern because PicoSearch's automatic plural/singular matching will handle the matching of car or cars, truck or trucks.

/(automobiles?|autos?)/\($1 car truck\)/

What's happening here is that the parentheses in the before pattern group the alternates separated by | and also provide the capture for $1. The single question mark makes the -s optional. The longer automobile is tried first so that auto isn't just chopped out of automobile. A \b could also serve as a word boundary to prevent chopping, as in /(autos?|automobiles?)\b/. Other boundary markers (anchors) supported include ^ for start of query and $ for end, so that /^(autos?)$/ would only match auto or autos when it is the entire query.
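The effect of alternate ordering, \b and the anchors can be checked in Python's re module, used here only as a stand-in for trying patterns. The helper enrich is hypothetical. In a Python replacement string the parentheses are already literal and the capture is written \1, so the after pattern looks slightly different from the Query Rule.

```python
import re

def enrich(before: str, query: str) -> str:
    # Wrap the matched word and its alternates in literal parentheses.
    return re.sub(before, r"(\1 car truck)", query)

# Longer alternate first: the whole word is captured.
print(enrich(r"(automobiles?|autos?)", "used automobiles"))
# used (automobiles car truck)

# Shorter alternate first: "auto" is chopped out of "automobiles".
print(enrich(r"(autos?|automobiles?)", "used automobiles"))
# used (auto car truck)mobiles

# A trailing \b forces the match to end at a word boundary.
print(enrich(r"(autos?|automobiles?)\b", "used automobiles"))
# used (automobiles car truck)

# Anchored: only fires when the word is the entire query.
print(enrich(r"^(autos?)$", "used autos"))  # used autos
print(enrich(r"^(autos?)$", "autos"))       # (autos car truck)
```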

Now here is the important point: the parentheses in the after pattern are literal so they get inserted into the query as part of PicoSearch's own grouping syntax. Why would we want extra parentheses? We need the parentheses because if the user is searching in ALL words mode, adding alternate words at the top level will require them all to be present, and that might ruin the search. But ALL mode will not distribute an effective + inside parentheses. You will get the intended effect of requiring that one (or more) of the alternates be found.

Alerting with Show and MSG messages

Okay, we promised this feature would be technical, so that's enough about regular expressions. The other side of Query Rules to know about is how their effects are shown to the user. There is a checkbox option in the account manager to "Show the Adjusted Query". If this is off, the adjustments you make to the query are silent and do not visibly change what the user typed. The search is still performed on the adjusted query, however, and that is what shows up in searches and concordance (the text excerpts), so depending on your Query Rules you may want to keep the option on. Because this option is useful for debugging, the Query Rules section of the account manager also has a tester link, so you don't have to turn Showing on globally just to see the effects of your rules.

The other way that the Query Rule can alert the user is through triggered messages, and in fact this ability could be used to not change the query at all but only detect patterns and then signal messages. In the following example, we detect the first dash without actually changing it by omitting the after pattern, for the purpose of alerting the user to consult an FAQ. Other punctuation can be detected in the same rule with the | syntax in the before pattern, and the capture can also go into the message.

MATCH1 /(\-|\=|\*)/ MSG "You typed a $1, so if you're trying to search for parts numbers please check our <a href="http://www.mysite.com/syntaxfaq.html">FAQ on syntax</a>"

The message after the MSG will plug into the search results page on the line about what was found, similar to messages about ignored stopwords or synonyms. But if you want it to plug elsewhere in your template, position it with the keyword _PICO_QUERYMSG_REPLACEMENT. If you want the message to plug into both the default and your additional locations, use the keyword _PICO_QUERYMSG_ADDITIONAL.

The contents of the message are pretty flexible, so it could include tags and classes, but there's also a template code that sets a class span on all messages wherever they are plugged in, called PICO_CLASS_SPAN_QUERYMSG. See the template codes FAQ for details.

Messages also benefit from two more special plugin values. _QBEFORE in the message will get the entire before pattern match, and _QAFTER will get the entire after pattern match. So for example, the following rule would change the search [apple] into [apple pie] and say the message: Yum, we recommend our apple pie

/\b(apple|peach)\b/$1 pie/ MSG "Yum, we recommend our _QAFTER"

If you're writing a lot of rules with shared messages, you can also collect your messages in one area and refer to them by number with MSGn. So the above rule is equivalent to:

/\b(apple|peach)\b/$1 pie/ MSG1
MSG1 "Yum, we recommend our _QAFTER"
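The way _QBEFORE and _QAFTER fill in a message can be sketched in Python as follows. This is an illustration under assumptions, not PicoSearch's implementation: the function apply_rule_with_message and the MESSAGES table are invented for the example, the after pattern is written with Python's \1 instead of $1, and the message is built from the first match only.

```python
import re

# Hypothetical stand-in for a shared message area referenced as MSG1.
MESSAGES = {1: "Yum, we recommend our _QAFTER"}

def apply_rule_with_message(query, before, after, message):
    """Return (adjusted_query, message_text or None)."""
    match = re.search(before, query)
    if match is None:
        return query, None                    # rule did not fire
    after_text = match.expand(after)          # after pattern with captures filled in
    adjusted = re.sub(before, after, query)
    text = (message
            .replace("_QBEFORE", match.group(0))
            .replace("_QAFTER", after_text))
    return adjusted, text

print(apply_rule_with_message("apple", r"\b(apple|peach)\b", r"\1 pie", MESSAGES[1]))
# ('apple pie', 'Yum, we recommend our apple pie')
```

A query that does not match, such as [banana], comes back unchanged with no message.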

Commenting

With the complexity of Query Rules, you are welcome to add comments. To add your own comments to your rules input text, precede a comment line with either #COMMENT# or <!-- like an HTML comment, except that it doesn't need the closing -->. For example:

#COMMENT# this is a query rule comment
<!-- this is another query rule comment
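If you keep your rules in a file and want to check them with your own scripts before pasting them into the account manager, the comment convention is easy to honor. The following Python sketch is a hypothetical helper, not part of PicoSearch. It assumes a comment marker must be the first non-blank text on its line.

```python
def strip_comments(rules_text: str) -> list[str]:
    # Keep only lines that are neither blank nor comment lines.
    kept = []
    for line in rules_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("#COMMENT#", "<!--")):
            continue
        kept.append(stripped)
    return kept

sample = """#COMMENT# split catalog numbers
MATCH1 /(\\d)\\-(\\d)/$1 $2/
<!-- strip question marks everywhere
INCL_PHRASES /\\?//
"""
print(strip_comments(sample))
```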


Well, that's enough to get you started if you're brave and technically minded. If not, feel free to contact us for assistance in applying the power of Query Rules. We don't expect most users to try or even need Query Rules, and to a large extent we expect to suggest and tweak the feature ourselves for customers whose needs we recognize could be solved this way. We just wanted to expose the feature as well as use it in our own technical support, in keeping with PicoSearch's tradition of empowering users with their own deeply customizable search engine.
