The essential function of a search engine is to take the word or words
that a user is searching for and return the search results, a list of
the documents that contain those words. Links from the
list of search results take the user to the specific documents, and the
information in the search results (title, meta description, concordance,
etc.) helps the user decide which links to follow. PicoSearch does all
this, and gives you tools to control which documents are searched and
what order they appear in. The Promotions feature is a particularly powerful way to make sure that certain
documents appear first for certain searches in the search results list.
But what if you would like to supplement your search engine with some
Direct Searches? That is, for particular searches, you would like to
bypass the search results list and just go directly to a particular page
on your site. In a search engine that searches the entire internet, it
would be risky to assume that the first search result is the only
document you want to see. But in your customized PicoSearch search
engine, you may know very well if a certain search is best answered by
immediately going to a certain page.
PicoSearch offers the Direct Searches feature in your account manager's
Designing Section to enable immediate linking to a particular internet
address. In the PicoSearch tradition of giving you maximum power, Direct
Searches also support some very interesting capabilities, including:
- multiple direct search statements that will be applied in the order of
entry in your account manager
- run-time checking for valid URLs to jump to (look before you leap)
- keyword patterns with wildcards to match the user's searches for greater flexibility
- plug matching pieces of the user's search into the URL to create a whole range of target URLs, effectively supporting look-up dictionaries
- have wildcards in the URLs of the targeted pages to make the direct
search conditional, so that the user will go to that page only if it was
in the search results already
- specify which partitions the direct search can trigger within
- go straight to the first search result (if you're feeling lucky!)
Unconditional Direct Searches
An unconditional direct search is one that goes to a fully
specified URL whenever the keyword pattern matches the user's search.
For example:
cars = http://www.mysite.com/list_of_cars.html
Now whenever the user of your search engine types cars, they will
be sent directly to the URL for list_of_cars.html. This URL must
be complete and fully qualified, meaning that it starts with http://,
because the link comes from the PicoSearch site and must get to the new
page exactly. Notice that this URL doesn't even have to be in your
search engine, so you could potentially send people to reference pages
outside of your site, if you're confident that certain searchers should
go there.
As an unconditional rule, the direct search will go straight to the URL
at searching time. You should therefore make sure that this URL is
reliable so that PicoSearch doesn't send your users to a broken link. If
you control the page, you might want to add a comment in the HTML so
that future webmasters know that your PicoSearch is depending on the
page. To help ensure that your searches don't go directly to a broken
link, PicoSearch will check your Direct Searches at indexing time, and
invalidate any broken ones with an initial # (see the Maintenance and
Statistics topic below).
Look Before You Leap
Before we get into the various tricks that direct searches can do, including creating patterns of possible searches and URL targets, here's an important concern: what if you want to make absolutely sure the URL exists before you send the user to it? For this, replace the = with a =?= to mean "look before you leap". PicoSearch will do a run-time fetch of the head of the URL as fast as it can, and if it's not a valid page then PicoSearch will consider the next matching direct search, or just go to the regular search results page if nothing else.
A head fetch is not a full page fetch, so it is about as efficient as a look-ahead can be, but it will still take somewhat longer than not checking at all. So don't burden your direct searches with the check unless you need it. When you do need it, don't worry: the user won't notice a trivial delay but certainly will notice a page error! For example, if you're relying on the Capture Plugins technique (described below) to generate ranges of URLs on your site that might not exist, here's a pattern that will go to a definition page only if it's available, whenever the user types define before something else.
define {*} =?= http://www.mysite.com/definition_{*}.html
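To make the "look before you leap" idea concrete, here is a minimal Python sketch of a head check followed by a fall-through to the next candidate. It is not PicoSearch's code: the function names head_ok and first_live_target, the 3-second timeout, and the choice to treat any 2xx or 3xx status as "valid" are all assumptions made for illustration.

```python
import urllib.error
import urllib.request


def head_ok(url, timeout=3.0):
    """Return True if a HEAD request to url succeeds (assumed rule: 2xx/3xx)."""
    try:
        # Building the request can itself fail on a malformed URL.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, ValueError, OSError):
        # Unreachable host, HTTP error status, timeout, or malformed URL.
        return False


def first_live_target(candidates, exists=head_ok):
    """Return the first candidate URL that passes the check, else None.

    None stands for "no direct search applies, show the regular results".
    """
    for url in candidates:
        if exists(url):
            return url
    return None
```

The exists parameter is there only so the fall-through logic can be tried out without touching the network, for example by passing a function that looks URLs up in a set.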
Keywords and Wildcards
When we say keyword, we really mean a flexible pattern of one or more
words for the direct search to match. The match is case insensitive,
so Cars will match cars in the example above. The match is
insensitive to double quotes, since double quotes are optionally used
to hold words together in exact phrase searching. The match will also be
accent insensitive if you have the Use Accent Insensitivity option
turned on in your Alternate Character Options section in your account
manager (it is on by default). No other keyword variations will
match automatically, however, such as plurals/singulars, synonyms, and
being part of a larger phrase. To help with these issues, the keywords
can use commas (,), spaces ( ), and the wildcards asterisk (*) and
question mark (?).
You can add multiple keywords to a direct search, separated by commas.
Multiple word phrases with spaces are allowed. For example:
car,cars,list cars = http://www.mysite.com/list_of_cars.html
This will cause anyone who types car or cars or list cars to go directly to the page list_of_cars.html. The match must still
be exact, so for more flexibility you can use the wildcards * and ? in
the keywords. A * will match any number of optional non-space
characters, and ? matches at most one optional non-space character. For
example:
car*,list car? = http://www.mysite.com/list_of_cars.html
Now you'll match more searches, but be careful to consider what your
users will be reasonably searching for given the topic of your website.
The pattern car* will match car and cars but also match carpool and careful. The pattern car? will match car and cars but also carp and care.
Of course, it's unlikely that a site about cars will also discuss carp,
but carpools could be an issue. The keyword patterns for Promotions have the same potential for overmatching, but it is most
important to anticipate in direct searches, because the user is going directly to a
URL and won't ever see the list of other search results.
The default behavior of direct search keyword patterns is that they must
match all the terms that the user types, and in the same order, just to
trigger the direct search. Therefore, if the user types list of cars then this will not match the direct search example above. To help with
matching more of the user’s search terms, the wildcard * takes on a
special meaning when it is not connected with a word, i.e. it is
separated by spaces at the ends or in the middle of keywords. A single *
means one optional word, and a double ** means any number of optional
words. For example:
* car*,list ** car? = http://www.mysite.com/list_of_cars.html
Now the first keyword pattern with a single * by itself will match if the user types car or all cars, but it won’t match show me all cars because that’s more than one word. The second keyword pattern with a double ** will match if the user types list cars or list of cars or even list of all your cars.
Following the same logic, if you wanted to match the word car
anywhere in what the user types, you could use the pattern:
** car ** = http://www.mysite.com/list_of_cars.html
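The matching rules in this section can be summarized in a short Python sketch. This is only our reading of the rules described above, not PicoSearch's actual matcher, and the function names are hypothetical. It covers case insensitivity, double quotes, commas, attached * and ? wildcards, and the standalone * and ** word wildcards; accent insensitivity is left out.

```python
import re


def word_regex(token):
    """Compile one keyword word: attached * = any run of non-space
    characters, attached ? = at most one non-space character."""
    parts = []
    for ch in token:
        if ch == "*":
            parts.append(r"\S*")
        elif ch == "?":
            parts.append(r"\S?")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def match_tokens(tokens, words):
    """Match pattern tokens against all of the user's words, in order."""
    if not tokens:
        return not words  # every word the user typed must be consumed
    head, rest = tokens[0], tokens[1:]
    if head == "*":  # standalone *: one optional word
        return match_tokens(rest, words) or (
            bool(words) and match_tokens(rest, words[1:]))
    if head == "**":  # standalone **: any number of optional words
        return any(match_tokens(rest, words[i:])
                   for i in range(len(words) + 1))
    return (bool(words)
            and word_regex(head).fullmatch(words[0]) is not None
            and match_tokens(rest, words[1:]))


def keyword_matches(keywords, query):
    """True if any comma-separated keyword pattern matches the query."""
    words = query.replace('"', " ").split()  # double quotes are ignored
    return any(match_tokens(alt.split(), words)
               for alt in keywords.split(","))
```

With this sketch, keyword_matches("* car*,list ** car?", "all cars") is true while the same pattern rejects "show me all cars", mirroring the examples above.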
Conditional Partitions
If you want a direct search to activate only when the user is searching in one or more particular partitions, you can prefix the pattern with partitionNAME:. For example, to jump to the page about car(s) only when the user is searching in the CARS or VEHICLES partitions, use the following:
partitionCARS: partitionVEHICLES: car? = http://www.mysite.com/cars.html
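As a rough illustration of how such a prefix could be read, the Python sketch below separates the partition names from the keyword pattern. The function names are hypothetical, and two behaviors are assumptions rather than documented facts: that a rule with several prefixes applies when the user is in any one of the named partitions, and that partition names are compared exactly as written.

```python
import re


def split_partitions(keyword_side):
    """Split 'partitionA: partitionB: pattern' into (['A', 'B'], 'pattern')."""
    partitions = re.findall(r"partition(\S+?):", keyword_side)
    keywords = re.sub(r"partition\S+?:\s*", "", keyword_side).strip()
    return partitions, keywords


def rule_applies(keyword_side, active_partition):
    """A rule without prefixes applies everywhere; otherwise the active
    partition must be one of those named (assumed 'any of' semantics)."""
    partitions, _ = split_partitions(keyword_side)
    return not partitions or active_partition in partitions
```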
Conditional Direct Searches
Direct searches become even more interesting when you add the wildcard *
to the right side of the direct search, that is within the URL. The
meaning is to use the direct search only if the entire URL pattern
matches one of the actual search result URLs that come from running the
user’s search. This behavior not only allows for more generalized direct
searches, it also helps to ensure that the direct search really is
relevant for what the user typed. The more wildcards you have in the
left keyword side of the direct search, the more you might want to make
the direct search conditional to avoid overmatching and confusing the
user. And if you don’t want the wildcard to ever match more than one
URL, you can always add * to the beginning of a fully specified URL
(since nothing comes before the http anyway). For example:
** car ** = *http://www.mysite.com/list_of_cars.html
With the double ** on both sides of car, whenever the user types the
word car in any search then the direct search would normally have gone
straight to the list_of_cars.html. But by putting a * at the beginning
of the URL, the direct search becomes conditional, thus dependent upon
the list_of_cars.html being an actual search result in the first page of
results for what was typed. So if you are returning ten results for a
site about cars, then a search for car list will trigger the
direct search if list_of_cars.html was in the top ten results anyway.
This seems likely, especially if you set Promotions on that URL for some words like list. If however the user types Ford car parts,
it’s more likely that other pages will come first based on the words
Ford and parts, thus blocking the direct search which would have been
inappropriate.
If you’re more interested in using conditionality to match multiple URLs
with a single direct search statement, then you’re free to use the *
for complex URL patterns. Notice however that ? cannot be a wildcard in a
URL as it was in keywords, because ? marks the start of the CGI
arguments (the query string) in URLs. For example:
** list ** = http://www.mysite.com/list_of_*.html
This pattern could work well to go directly to the list of whatever the user is finding, as long as they use the word list somewhere in their search. The conditional search is making sure that
there is a page of form list_of_*.html that is being returned in the
search results anyway, so since it’s relevant the user will go to it
directly. Thus, a search for list of cars would likely go to list_of_cars.html, and list of trucks would likely go to list_of_trucks.html, assuming you’ve made those
pages on your site and they’re in the first page of search results.
Furthermore, as with Promotions, there is a Maximize Scope option on the
Direct Searches feature that will make your conditional searches match
beyond the first page of results.
There is one more interesting feature of conditionals involving anchors.
Anchors allow browsers to jump down to a pre-specified section of a
page depending on the URL. The URL must have #text at the end of it, and
the HTML of the page must have <a name="text">...</a>
within it, where "text" is any text. This is all standard HTML.
If you have an anchor on the end of your conditional direct search,
PicoSearch will not include the anchor in the requirement to match for a
result URL. This is good, because the URLs in your search engine are
not likely to have anchors anyway. But when the URL does match a search
result, the anchor will be used in the direct search. So to build on the
previous example:
** list ** = http://www.mysite.com/list_of_*.html#listings
Now when the user has the word list in their search, not only
will they jump to the first list page that matches, but also they will
jump directly down to the #listings anchor if there is one. This could
be handy to skip the top part of long pages.
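The conditional behavior, including the anchor handling, can be sketched in a few lines of Python. This is an illustration of the rules as described above and not PicoSearch's implementation: the function name conditional_target is hypothetical, result_urls stands for whatever result URLs are in scope (the first page, or more with Maximize Scope), and case-sensitive URL comparison is an assumption.

```python
import re


def conditional_target(url_pattern, result_urls):
    """Return the first result URL matching url_pattern, or None.

    Only * is a wildcard here; ? is left literal because it begins the
    query string. A trailing #anchor is ignored while matching and is
    re-attached to the matched URL.
    """
    pattern, sep, anchor = url_pattern.partition("#")
    # Escape the literal pieces and let each * match any run of characters.
    regex = re.compile(".*".join(re.escape(piece)
                                 for piece in pattern.split("*")))
    for url in result_urls:
        if regex.fullmatch(url):
            return url + sep + anchor
    return None  # no match: fall back to the regular results page
```

For instance, given results that include http://www.mysite.com/list_of_trucks.html, the pattern http://www.mysite.com/list_of_*.html#listings yields that URL with #listings appended, while *http://www.mysite.com/list_of_cars.html yields nothing unless that exact page is among the results.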
First Result Searches (If You Dare)
If you don’t specify a URL for a keyword pattern and only use a
*, then you will see another special behavior that we call First Result
Searching. This has been referred to in some search engines as being
lucky or instant searching. It’s kind of risky, but the meaning is to go
directly to the first search result, if there is one, no matter what
it is. In the context of your custom search engine, combined with
keyword patterns, this could still be fairly predictable and useful. For
example:
** warranty ** = *
Now whenever a user has the word warranty anywhere in their
search, they will go directly to the first search result. If you had
only one warranty page on your site, then you might have wanted to spell
out the URL in the direct search pattern, either conditionally or
unconditionally (see above). But if you have several warranty pages, and
in general the word warranty is rare on your site, you might feel
confident enough to just send the user straight to the first page the
search returns.
For the extreme case, a universal First Result Searching pattern on one
word searches would be *=* and for any number of words would be **=* on a
line by itself. When specified in the Direct Searches feature, the one
word case might be useful if you're confident that single word searches
get the right page first, while multiple word searches are less
predictable and should display all the results. The all searches line
of **=* pretty much short circuits your search engine to never show
lists of results. Either case could come after other more specific
Direct Searches lines however, since the order of Direct Searches as
entered is applied at searching time.
PicoSearch also takes a run-time argument so you can play with First
Result searching from only certain search boxes on your site. This
argument has the effect of doing a First Result search as a last resort
after running through the direct searches that are already in your
account manager. The HTML code to add to your search box for one word
searches to go to the First Result is the following:
<input type="hidden" name="ds" value="one">
And for all searches (any number of words) add this within your search box code:
<input type="hidden" name="ds" value="all">
Capture Plugins for URL patterns
Curly braces can be used to capture all or parts of the matching query that triggers a direct search, and plug those parts into the URL. This can have the effect of creating a series of direct search results that map queries to URLs on your site, thus triggering a kind of question-answer searching that doesn't even have to rely on your indexer! For example, if we want to respond directly to a search for some word of the form definition WORD by going to a URL of form definition_WORD.html we could do the following:
definition {*} =?= http://www.mysite.com/definition_{*}.html
So now if a user types definition spoongate, PicoSearch will try to send them to http://www.mysite.com/definition_spoongate.html. Because the rule uses the =?= syntax for "look before you leap" behavior, the jump happens only if that page exists, so you don't need a page on your site for every word a user might type.
You can have multiple captures too that will plug in order of appearance. For example if you wanted to enforce that definition be part of the URL, as well as constrain the words to starting with spoon, you could use this rule:
{definition} spoon{*} =?= http://www.mysite.com/{*}_spoon{*}.html
You can further constrain the matching search to a limited set of possibilities separated by a vertical bar, like this rule to only go to the definition pages for spoongate, marshjam, and wigphone. We stopped using the =?= here because we can be sure of how many URLs we need to support on the site (3).
{definition} {spoongate|marshjam|wigphone} = http://www.mysite.com/{*}_{*}.html
Capture plugins can combine with regular wildcards of * or ? on the left, and * on the right side to make the direct search conditional, so that it applies only when the target is actually in the results of PicoSearch's indexed search. Conditionality is another way to ensure that the URL is only used if it actually exists. For example, in this rule we allow a single optional letter on the end of the defined word in order to match a plural -s, plus any number of words thereafter, but only go to the matching URL if it is actually in the search results. Again we stopped using the =?= here because if the URL is in the search results, it must have been indexed and exist already.
{definition} {spoongate|marshjam|wigphone}? ** = *http://www.mysite.com/{*}_{*}.html
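To show how captures might be plugged into a URL in order of appearance, here is a small Python sketch. It is a simplified illustration and not PicoSearch's parser: the names capture_regex and plug are hypothetical, and it handles only literal words, attached * and ? wildcards, and {...} captures with | alternatives. The standalone * and ** word wildcards and the conditional * on the URL side are left out.

```python
import re


def _wild(text):
    """Translate attached wildcards: * = \\S*, ? = \\S?, rest literal."""
    return "".join(r"\S*" if c == "*" else r"\S?" if c == "?"
                   else re.escape(c) for c in text)


def capture_regex(keyword):
    """Compile a keyword pattern; each {...} becomes a capture group."""
    out = []
    for piece in re.split(r"(\{[^}]*\}|\s+)", keyword.strip()):
        if not piece:
            continue
        if piece.isspace():
            out.append(r"\s+")
        elif piece.startswith("{"):
            alternatives = piece[1:-1].split("|")
            out.append("(" + "|".join(_wild(a) for a in alternatives) + ")")
        else:
            out.append(_wild(piece))
    return re.compile("".join(out), re.IGNORECASE)


def plug(keyword, url_template, query):
    """Fill each {*} in url_template with the captures, in order.

    Returns None when the query does not match the keyword pattern.
    """
    match = capture_regex(keyword).fullmatch(" ".join(query.split()))
    if match is None:
        return None
    captured = iter(match.groups())
    return re.sub(r"\{\*\}", lambda _: next(captured, ""), url_template)
```

Under these assumptions, plug("{definition} spoon{*}", "http://www.mysite.com/{*}_spoon{*}.html", "definition spoongate") produces http://www.mysite.com/definition_spoongate.html, and a query outside the allowed alternatives produces no URL at all.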
Maintenance and Statistics
To help ensure that your unconditional direct searches really will go to
valid URLs, PicoSearch tests the URLs each time it indexes your site.
If an unconditional direct search URL is down, PicoSearch will
deactivate your direct search by placing a # at the beginning of the line. This indexing-time check cannot be done for URLs with plugins or wildcards, since there is a pattern and not a single URL to check. If you are worried about whether these URLs exist, you can use the "look before you leap" syntax.
You should not use the # yourself as a way to comment out direct searches,
however, since PicoSearch will test every unconditional search again on
the next indexing, and potentially reactivate lines as their URLs become
available. To add your own comments to your direct searches input,
precede a comment line with either #COMMENT# or <!-- like an HTML
comment, although it doesn't need the closing -->. For example:
#COMMENT# this is a direct searches comment
<!-- this is another direct searches comment
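If you keep your direct searches in a text file before pasting them into the account manager, a small script can help you see how each line would be read. The Python sketch below is our own illustration of the line formats described in this document, not a PicoSearch tool, and the function name classify and its labels are hypothetical.

```python
import re

# keywords, then "=" or "=?=", then the target; the first "=" on the line
# is taken as the operator, so a later "=" inside the URL is left alone.
RULE = re.compile(r"^(?P<keywords>.*?)\s*(?P<op>=\?=|=)\s*(?P<target>.*)$")


def classify(line):
    """Label one line as blank, comment, disabled, rule, or invalid."""
    text = line.strip()
    if not text:
        return ("blank", None)
    if text.startswith("#COMMENT#") or text.startswith("<!--"):
        return ("comment", None)
    if text.startswith("#"):
        # A bare leading # marks a rule that was invalidated at indexing.
        return ("disabled", text[1:].strip())
    match = RULE.match(text)
    if match is None:
        return ("invalid", None)
    return ("rule", (match["keywords"], match["op"], match["target"]))
```

Checking for #COMMENT# before the bare # matters here, since both kinds of line begin with the same character.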
Because Direct Searches are so distinct in the user’s experience, they
will be recorded in the statistics separately. If you have direct search
statements in your account manager, then your first page of statistics
(the one with the general categorical totals) will include a running
count of all triggered direct searches, and another total of just the
conditional direct searches. The difference between the two will thus be
the number of unconditional direct searches that user searches have
triggered.
Free accounts get just 5 direct search lines to play with, while paying
accounts get a virtually unlimited number (thousands). Furthermore, paying
accounts get an additional page of statistics just for their top direct
searches. This view is similar in format to the top pages and top
partitions statistics. You will see the top direct search URLs that your
users saw, and the top searches that triggered these. Direct Searches
will not be otherwise logged in your statistics, and in particular will
not be included in the top pages view, so you can pretty much isolate
and understand how your direct searches are being seen by your visitors.