Help with PicoSearch

Can I set stopwords to ignore common words or handle questions in whole sentences?

Yes. In your account manager you can set several categories of stopwords. Stopwords are query terms that are discarded before the search runs, and they are commonly used to filter out noise words such as "the" and "a". Noise words are rarely a problem in practice: most people don't search for them, and when they are included, the search ranking naturally demotes them.

However, if certain words are so common on your site that people often type them, such as "hair" on a hair products store, you could try adding them as stopwords to get fewer, more focussed search results. In contrast, setting "hair" as a less important word would not return fewer results, but could give the other words in the search more weight in the ranking. For more ideas about influencing search results, see the influence FAQ.

Another reason to use stopwords is to encourage your users to type questions in whole sentences, because then only the most important words get searched. For example, just by setting the common stopword presets for English that we provide, querying [What is a carburetor?] will search for "carburetor", and [Tell me about the history of the presidency] will search for just "history" and "presidency". Remember, this is a presentation trick and not real natural language parsing, so don't mislead your visitors into thinking that you really understand what they're saying! But within reason, this stopword setting technique can be used to simulate an FAQ querying system.
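The effect on whole-sentence questions can be sketched in a few lines of Python. This is only an illustration of the idea, not PicoSearch's actual code: the STOPWORDS set below is a small hypothetical stand-in for the English presets, and strip_stopwords is a made-up helper name.

```python
import re

# Hypothetical stand-in for the English stopword presets (assumption:
# the real preset list is longer and is not reproduced here).
STOPWORDS = {"what", "is", "a", "tell", "me", "about", "the", "of"}

def strip_stopwords(query):
    """Split a query into (kept, ignored) terms, preserving order."""
    # Lowercase the query and pull out word-like tokens, dropping punctuation.
    terms = re.findall(r"[a-z0-9']+", query.lower())
    kept = [t for t in terms if t not in STOPWORDS]
    ignored = [t for t in terms if t in STOPWORDS]
    return kept, ignored

kept, ignored = strip_stopwords("Tell me about the history of the presidency")
print("searching for:", kept)
print("ignored:", ignored)
```

The second list is the kind of information a stopword report could show the visitor.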

There is also the option to Report Ignored Stopwords to the user so they will know what happened. And if you're worried about what the user will see if they only type a stopword, or if they type some important usage of a stopword combined with other terms, then you can also set the Direct Searches feature. A Direct Search rule can send the search directly to a certain page on a site for specified keywords, and the rule will match before stopwords are removed from the search. So for example on a hair care site, if "hair" is a stopword to remove noise for searches like [hair spray] and [hair mousse], you could still use Direct Searches to send [hair] by itself and [hair today] (say it's the name of a product) to specific result pages. You can cover all your bases this way.
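The order of operations described above (Direct Search rules first, stopword removal second) might look roughly like the following sketch. The rule table, the page paths, and the route_query function are all hypothetical and exist only to illustrate the ordering; they are not PicoSearch settings or APIs.

```python
# Hypothetical Direct Search rules: exact query -> destination page.
# The paths are invented for this example.
DIRECT_SEARCHES = {
    "hair": "/products/index.html",
    "hair today": "/products/hair-today.html",
}

# Hypothetical site-specific stopword.
STOPWORDS = {"hair"}

def route_query(query):
    """Return ("redirect", page) or ("search", remaining_terms)."""
    # Normalize case and whitespace so "Hair  Today" matches "hair today".
    normalized = " ".join(query.lower().split())
    # Direct Search rules are checked BEFORE any stopwords are removed.
    if normalized in DIRECT_SEARCHES:
        return ("redirect", DIRECT_SEARCHES[normalized])
    # Otherwise drop stopwords and search on what remains.
    terms = [t for t in normalized.split() if t not in STOPWORDS]
    return ("search", terms)

for q in ["hair", "Hair Today", "hair spray"]:
    print(q, "->", route_query(q))
```

Because the rule lookup happens first, [hair] and [hair today] still reach a page even though "hair" alone would otherwise be stripped from the query.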

There are three cases in which stopwords remain visible: (1) the concordance (sentence excerpts) still includes all stopwords for readability, although they are not bolded like searched terms; (2) queries in quotes (phrases) are still searched for all words; and (3) the stopword reporting option tells the user which words were ignored in the search.
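Cases (2) and (3) can be illustrated with a small Python sketch. It assumes a simple parser in which anything inside double quotes is kept whole and only bare words are checked against the stopword list; the STOPWORDS set and parse_query function are hypothetical, and the real query syntax may differ.

```python
import re

# Hypothetical stopword list for illustration only.
STOPWORDS = {"the", "of", "a"}

def parse_query(query):
    """Return (phrases, terms, ignored) for a query string."""
    phrases, terms, ignored = [], [], []
    # Each match is either a quoted phrase (group 1) or a bare word (group 2).
    for phrase, word in re.findall(r'"([^"]*)"|(\S+)', query.lower()):
        if phrase:
            # Quoted phrases are searched for all words, stopwords included.
            phrases.append(phrase)
        elif word in STOPWORDS:
            # Bare stopwords are dropped but remembered for reporting.
            ignored.append(word)
        elif word:
            terms.append(word)
    return phrases, terms, ignored

print(parse_query('"the history of the presidency" of the senate'))
```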

Stopwords may contain wildcards: ? matches zero or one character, and * matches any number of characters, including none. So a stopword of auto? would match "auto" or "autos" but not "automobile", whereas auto* would match all of these plus "automation", etc. Multiple-word stopwords are not allowed.
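One way to picture the wildcard rules is as a translation into a regular expression, shown here as a minimal Python sketch. The function names are hypothetical, and the sketch assumes that matching is case-insensitive and must cover the whole query term; neither detail is confirmed above.

```python
import re

def stopword_pattern(stopword):
    """Translate a stopword with ? and * wildcards into a compiled regex."""
    parts = []
    for ch in stopword:
        if ch == "?":
            parts.append(".?")   # one optional character
        elif ch == "*":
            parts.append(".*")   # any number of optional characters
        else:
            parts.append(re.escape(ch))  # literal character
    # Assumption: matching ignores case.
    return re.compile("".join(parts), re.IGNORECASE)

def is_stopword(term, stopword):
    """True if the whole term matches the wildcard stopword."""
    return stopword_pattern(stopword).fullmatch(term) is not None

for term in ["auto", "autos", "automobile", "automation"]:
    print(term, is_stopword(term, "auto?"), is_stopword(term, "auto*"))
```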
