PicoSearch Professional and Premium accounts can get their search
results delivered in datafeed formats for no additional fee. Just call the regular search with an argument that turns on the type of feed you want. Full directions are below.
Current formats available are XML and JSON (and its variant JSONP). With webservice datafeed support, you can integrate the search results into your website any way you wish, provided that you write your own code to do so (so this is not a technique for beginners). XML is typically called from server scripts, and JSON from client-side JavaScript.
With a datafeed, you could mix the search results with other dynamic information sources available to you. Or you could generate results page designs beyond the abilities of PicoSearch's template system, which is
powerful but essentially supports one main design with some dynamic
tweaking abilities (see Template FAQ).
If you're thinking of using a datafeed just for the effect of hiding the
PicoSearch links behind your own scripts, you could be better
off with the Private Domain feature.
XML datafeed usage is ideal for designing server-side scripts that combine your PicoSearch search engine results with other information in displays that are customized to your website's particular needs. This approach is not for beginners, however, since it takes some
programming experience to write server scripts that consume XML, using whatever tools you
prefer. You might rely on a standard XML parser, or you could parse the
XML on your own. Either way, it's important to look at the DTD specs and
usage as described below.
- There are two versions of the XML feed; you can use either one.
- Once the XML option has been activated, XML
results are returned whenever the xml=1 or xml=2 argument is supplied
with the search call, where the number is the version. The following example URL call passes your account number as the index argument along with the user's query. For more examples of calling and initializing PicoSearch, see this FAQ.
http://www.picosearch.com/cgi-bin/ts.pl?index=123456&query=news&xml=2
- If your XML parser is fussy about the character encoding of the datafeed, you can force the results to be in UTF-8 by calling with &outc=utf8 as another argument. Normally the datafeed is in the encoding of your index (as indicated in the "Alternate Character Options" section of your account manager), for example ISO-8859-1, because that is what the browser displays consistently in HTML within your template.
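As a rough illustration of the call described above, the following Python sketch assembles a search URL with the xml and outc arguments. The endpoint and argument names come from the example URL; the helper name build_xml_search_url and its parameters are hypothetical and only serve the example.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL above.
SEARCH_ENDPOINT = "http://www.picosearch.com/cgi-bin/ts.pl"


def build_xml_search_url(index, query, version=2, force_utf8=False):
    """Return a search URL that asks for the XML datafeed.

    Hypothetical helper for illustration; not part of PicoSearch.
    """
    if version not in (1, 2):
        raise ValueError("xml version must be 1 or 2")
    params = [("index", index), ("query", query), ("xml", version)]
    if force_utf8:
        # outc=utf8 asks for UTF-8 output instead of the index encoding.
        params.append(("outc", "utf8"))
    # urlencode escapes the user's query (spaces, ampersands, and so on).
    return SEARCH_ENDPOINT + "?" + urlencode(params)


print(build_xml_search_url(123456, "news"))
print(build_xml_search_url(123456, "breaking news", force_utf8=True))
```

Escaping the query through urlencode matters because whatever the user typed goes straight into the URL.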
JSON / JSONP datafeed usage is great for client-side JavaScript in HTML pages that need to make custom displays of PicoSearch search results data. JSON is a lightweight data specification that is human readable and compatible with JavaScript. In fact, since it is JavaScript, it can be consumed by an eval, but the recommended method is a JSON parser, so that malicious code is not allowed to run. JSONP is just a syntactic wrapper on the JSON data format, in which a function name is provided for callback purposes.
- To call JSON, use the argument json=1. To call JSONP with the callback function myfunc, use the argument jsonp=myfunc. The following example URL call passes your account number as the index argument along with the user's query. For more examples of calling and initializing PicoSearch, see this FAQ.
http://www.picosearch.com/cgi-bin/ts.pl?index=123456&query=news&json=1
- The JSON/JSONP datafeed will return the latest information available from PicoSearch. Note that because JSON tends towards single lists of things, while XML accumulates sequences of tags, the key names of some of the data elements are plural in JSON. Thus, expect to see lists named Tokens, SpellSuggs, SpellCorrects, and Matches.
JSON's JavaScript data syntax is summarized below; for more details, please consult JSON documentation on the internet.
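If a JSONP response ever needs to be read outside a browser, the callback wrapper can be removed before parsing. The Python sketch below assumes the response body has the usual JSONP shape, myfunc({...}) with an optional trailing semicolon; that shape is an assumption rather than something confirmed here, and unwrap_jsonp is a hypothetical helper.

```python
import json
import re


def unwrap_jsonp(payload, callback):
    """Strip the callback wrapper from a JSONP payload and parse the JSON inside.

    Assumes the payload looks like callback({...}) with an optional semicolon.
    """
    pattern = r"^\s*" + re.escape(callback) + r"\s*\((.*)\)\s*;?\s*$"
    match = re.match(pattern, payload, re.DOTALL)
    if match is None:
        raise ValueError("payload is not wrapped in " + callback + "(...)")
    # A real JSON parser is used here instead of eval, as recommended above.
    return json.loads(match.group(1))


# Made-up sample data using the plural JSON key names.
sample = 'myfunc({"NumDocs": 1, "Query": "news", "Tokens": ["news", "new"]});'
result = unwrap_jsonp(sample, "myfunc")
print(result["Tokens"])
```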
Arrays: Lists of comma-separated values are used whenever the value of a key is a simple sequence of data items. AvailablePartitions is an array of strings, so are Tokens, SpellSuggs, SpellCorrects, and Concordance. Matches is an array of match objects.
Objects: The entire search result is returned as an Object, which is a list of key:value pairs separated by commas and enclosed by curly braces. The value of RangeShown is also an object, and each document match is an object to hold a set of Title, Meta, etc.
Here is a simplified search result example for JSON. It all runs together on one line, but remains fairly readable if you take it piece by piece.
{"NumDocs": 1,"RangeShown": {"lower": 0, "upper": 20}, "Query": "news", "Tokens": ["news", "new"], "Matches": [{"Title": "my page", "Meta": "my description", "URL": "http://mysite.com/mypage.html", "Concordance": ["...about <b>news</b>..."]}]}
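A minimal Python sketch of consuming such a result with a standard JSON parser follows. The sample string is modeled on the simplified example above; treating Matches, SpellSuggs, and Concordance as optional with .get() is a defensive assumption, not documented behavior.

```python
import json

# Sample modeled on the simplified search result shown above.
raw = (
    '{"NumDocs": 1, "RangeShown": {"lower": 0, "upper": 20}, "Query": "news", '
    '"Tokens": ["news", "new"], "Matches": [{"Title": "my page", '
    '"Meta": "my description", "URL": "http://mysite.com/mypage.html", '
    '"Concordance": ["...about <b>news</b>..."]}]}'
)

result = json.loads(raw)

# Keys described as always present can be indexed directly.
print(result["NumDocs"], result["Query"])

# Keys that may be absent are read with .get() and a default value.
suggestions = result.get("SpellSuggs", [])

for match in result.get("Matches", []):
    print(match["Title"], match["URL"])
    for line in match.get("Concordance", []):
        print("   ", line)
```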
Latest information available from the datafeeds is summarized below. For specific datafeed notes see the sections for XML (version 1 lacks some data, version 2 is complete) or JSON / JSONP.
- All Results Data, Always Present: the following information is found in the datafeed to describe the total search result and is always present.
- NumDocs: Total number of documents found for the search result, which is anywhere from 0 to the total number of documents in the index collection. The number of search results actually returned is indicated by the RangeShown.
- RangeShown: The window of actual search results contained in this datafeed, within the total found NumDocs. RangeShown is defined by a lower and upper value, where lower+1 is the first actual document number from the total set. You refer to RangeShown to display views that cycle through all search results, so for example if NumDocs is 211 then lower=0 upper=10 means you've got results 1-10 to show the user.
The range of results returned is set by two things: (1) the doc0 argument
in the call, which defaults to 0 if not present, and (2) the number of results per page that's set in your account manager's
Ranking Options feature, which can also be overridden per call by the nr argument (number of results). You can see doc0 in use in the next-page links of PicoSearch's standard search results display, which you'll have to mimic in your scripts that use the datafeed. The connection with the datafeed is
illustrated below for an example XML call:
- &xml=1&doc0=0 (your search call includes these arguments)
10 results per page (selected in the account manager's Ranking Options)
<RangeShown lower="0" upper="10"/> (returned in the XML datafeed)
Showing results 1-10 (your results page will say this)
- &xml=1&nr=5 (doc0 defaults to 0, nr overrides any account settings)
<RangeShown lower="0" upper="5"/>
Showing results 1-5
- &xml=1&doc0=10 (back to 10 results per page in account settings)
<RangeShown lower="10" upper="20"/>
Showing results 11-20
So from RangeShown, lower=doc0, and (upper - lower) = number of results.
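The RangeShown arithmetic above can be sketched in Python as follows. The function page_info and the keys of the dictionary it returns are hypothetical, and the RangeShown element is parsed on its own here, whereas in the real feed it sits inside a larger XML document. Using upper as the doc0 of the next page is an inference from lower=doc0, not a documented rule.

```python
import xml.etree.ElementTree as ET


def page_info(lower, upper, num_docs):
    """Derive paging values from RangeShown (lower, upper) and NumDocs."""
    return {
        "doc0": lower,            # lower equals the doc0 argument of the call
        "count": upper - lower,   # number of results in this window
        "label": "Showing results %d-%d" % (lower + 1, upper),
        # Assumed: the next page starts where this window ends.
        "next_doc0": upper if upper < num_docs else None,
    }


elem = ET.fromstring('<RangeShown lower="10" upper="20"/>')
info = page_info(int(elem.get("lower")), int(elem.get("upper")), num_docs=211)
print(info["label"])
print(info["next_doc0"])
```

With NumDocs at 211 and the window 10 to 20, the label reads "Showing results 11-20", matching the third example call above.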
- Opt: This value will be the type of search that was just performed, either "any" for results with any of the search words, "all" for results with all of the search words, or "exact" for results that have the exact phrase. The difference only matters when the user typed more than one search word, but the value is always returned so that any user interface settings can be updated. This data return matches the opt argument that can then go into the next search - see Initializing the ANY/ALL/EXACT state.
- Within and Lastq: The value for Within is 1 or 0, indicating whether the search was performed within the results of the previous search. This value is suitable to feed into the next search's within argument, as well as to update your interface for the option. The value of Lastq is the last query performed, formatted for submitting with the next query to support the within=1 search within feature. See Initializing the Search Within.
- Sortsel: The value of Sortsel is either "relevancy" or "date" to show the kind of sorting that was done on the last search (additional values may be supported in the future). The value is suitable to feed into the next search's sortsel argument, as well as to update your interface for the option. See Initializing the Sort selection.
- Language: The Results Language setting from your account manager.
- Query: What the user typed and is searching for.
- All Results Data, May Be Present: the following information is found in the datafeed to describe the total search result and is present when appropriate.
- DirectURL: If you are using your account manager's Direct Searches feature and a match triggers, this is the URL that you can go directly to for the search result.
- AvailablePartitions: If you are using your account manager's Search Partitions feature, here is a list of the partitions available to query.
- SpellSugg: If you have your account manager's Spelling Corrections turned on, this will list any pairs found for SpellOriginal and SpellCorrect, which are what the user typed and what could be suggested as a spelling correction to submit for a corrected search. It's up to you whether and how you use this information in the interface that your scripts generate from the datafeed.
- Token: This is a list of all the individual terms that PicoSearch looked for in the index. You will find what the user typed, plus any grammatical expansions based on your account manager settings, including possessives, singular/plural, British/American spelling variants, and synonyms. Quoted phrases (Exact Phrase searches) are not broken apart.
- Match: A Match is a set of information about each matching document in the search results. The following pieces are returned depending on your account manager settings:
- Title: Document title. May contain HTML markup for bolding search hits, which you are welcome to strip out.
- Meta: Document meta description. May contain HTML markup for bolding search hits, which you are welcome to strip out.
- Concordance: Excerpted matching lines from the document, width depending on your account settings. May contain HTML markup for bolding search hits, which you are welcome to strip out.
- URL: Document URL.
- Date: Document date, if known and formatted according to your account settings.
- Size: Document size, if known and formatted according to your account settings.
- LinkPic: Full HTML link to a product photo or page icon, which can be associated with a search result as described in this FAQ.
- ContentInfo: Content information, if available. May include:
- ContentType: the file's mime content-type from the server's data field, examples: text/html, text/plain, application/pdf
- LinkIcon: an HTML formatted link to an icon on PicoSearch's servers suitable to display for this file type. Example for ContentType application/pdf: <img src="//www.picosearch.com/images/pdficons.gif" height="16" width="16" alt="PDF file" border="0"/>
- UserData: Extra data can be returned with each search result by the PICOSEARCH_DATA tag, equivalently spelled as PICODATA. Just put the following tag in your originating HTML page as an enrichment for the PicoSearch datafeed, and the UserData field will come back with your value whenever that page is found.
<!-- PICOSEARCH_DATA name="UserData" value="my alphanumeric information..." -->
- Partitions: List of Partitions that this document is found within, as defined in the Search Partitions section of your account manager. You can use this information to effectively categorize your documents and display them in groups or under different tabs.
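As one possible way to use the Match fields listed above, the Python sketch below strips the bolding markup from titles and groups documents by partition. The sample matches are made up, the regular expression is a simple tag remover rather than a full HTML parser, and the shape of Partitions in the JSON feed (a list of strings per match) is an assumption. The "(none)" bucket is a hypothetical label for documents without partitions.

```python
import re
from collections import defaultdict

# Matches any HTML tag, such as the <b>...</b> around search hits.
TAG_RE = re.compile(r"<[^>]+>")


def strip_markup(text):
    """Remove HTML tags from a Title, Meta, or Concordance string."""
    return TAG_RE.sub("", text)


def group_by_partition(matches):
    """Map each partition name to the plain-text titles of its documents."""
    groups = defaultdict(list)
    for match in matches:
        title = strip_markup(match.get("Title", ""))
        # Matches without a Partitions list go under the "(none)" bucket.
        for name in match.get("Partitions") or ["(none)"]:
            groups[name].append(title)
    return dict(groups)


# Made-up sample data for illustration only.
matches = [
    {"Title": "Daily <b>news</b>", "URL": "http://mysite.com/a.html",
     "Partitions": ["Blog", "Press"]},
    {"Title": "<b>News</b> archive", "URL": "http://mysite.com/b.html",
     "Partitions": ["Press"]},
    {"Title": "About us", "URL": "http://mysite.com/c.html"},
]
print(group_by_partition(matches))
```

The grouped titles could then feed a tabbed or sectioned results page, one tab or section per partition.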