Help with PicoSearch

How can I combine sites in my search engine to get a mini web portal effect?

PicoSearch offers everyone two sets of controls for building multi-site search engines: Guest URLs and Portal Lists. For an introduction to what these can do for you, read the corresponding announcement in What's New: Include Guests and become a mini Web Portal! Below, we explain in detail how and when to use these exciting new features.
 
Entry points: First there was the entry point. Free Accounts get 3 entry points; Professional and Premium Accounts get an unlimited number. Entry points tell your search engine where your website or websites begin. When the indexer runs, it starts at the entry points and follows all the links in your website(s). If your sites are large or numerous, you may run out of the pages allowed for your account type. Even if you do not, the entry points define the bounds of your search engine: when the indexer comes across a link, it will not go outside the combined directories, servers, or domain names of your entry points, depending on your link-following restriction setting. So while entry points are the way to go for packaging your own website content, they are not precise enough for including small windows into other websites, as you would need to become a mini web portal. Even if you have entry points to spare for all your friends, you never know when you might run out of pages for everyone!
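The boundary rule above can be sketched in Python as follows. This is not PicoSearch's code: the function names and the three restriction strings ("directory", "server", "domain") are hypothetical, and the domain comparison is deliberately naive (it keeps only the last two labels of the host name). The point is that a link is allowed if it falls inside the bounds of any one of your entry points.

```python
from urllib.parse import urlparse


def _directory(url: str) -> str:
    """Return scheme://host/path-up-to-the-last-slash for a URL."""
    p = urlparse(url)
    path = p.path or "/"
    return f"{p.scheme}://{p.netloc.lower()}{path[: path.rfind('/') + 1]}"


def _domain(host: str) -> str:
    # Naive assumption: "www.example.com" -> "example.com".
    # A real implementation would need a public-suffix list.
    return ".".join(host.lower().split(".")[-2:])


def in_bounds(link: str, entry_points: list[str], restriction: str) -> bool:
    """True if `link` lies within the combined bounds of the entry points.

    `restriction` is one of the hypothetical values
    "directory", "server" or "domain".
    """
    link_host = urlparse(link).hostname or ""
    for entry in entry_points:
        entry_host = urlparse(entry).hostname or ""
        if restriction == "directory" and _directory(link).startswith(_directory(entry)):
            return True
        if restriction == "server" and link_host == entry_host:
            return True
        if restriction == "domain" and _domain(link_host) == _domain(entry_host):
            return True
    return False
```

For example, with a single entry point of `http://www.mysite.com/docs/`, the page `http://www.mysite.com/blog/` is out of bounds under directory restriction but in bounds under server restriction, and `http://shop.mysite.com/` is in bounds only under domain restriction.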
 
Guest URLs: If you check this option in your Account Manager and update your settings, you will get input boxes for setting a maximum number of pages per entry point. This lets you add guest URLs to your search engine without letting them eat up your whole account. Entry points with no maximum set are still collected up to your account's total page maximum. But guest entry points with a maximum have their links followed "first come, first served" until the guest's maximum is reached. However, guests are still fully functional entry points, so their domain boundaries remain in effect for the entire indexing process. Whenever a document anywhere (on your site or theirs) contains a link within a guest's bounds, more pages are collected for that guest, up to its maximum. This enforces fairness across your index, so every site is indexed up to its limit.
 
PicoSearch collects all the links on a document before following any of them deeper; the technical term for this is "breadth-first" crawling. It ensures that your URLs are collected evenly across everyone's sites, even if the sites link back and forth a lot. If you set small maximums on your guest URLs, your guests need to make sure that the text they want indexed can be reached within the few links nearest their entry point. Where Guest URLs shine is in giving your friends a couple of hundred pages each: enough to please them for now, while ensuring that they won't suddenly crowd you out if they double or triple the size of their sites. If your maximums are very small and you have many guests, a Portal List may be what you really want.
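The interaction between breadth-first crawling and per-guest maximums can be illustrated with a small Python sketch. It is a simplified model, not PicoSearch's indexer: an in-memory dictionary stands in for fetching pages, and the rule that a page "belongs" to the entry point whose URL is its longest prefix is an assumption made to keep the example short. A first-in, first-out queue is what makes the crawl breadth-first, and a full guest's pages are simply skipped.

```python
from collections import deque


def crawl_breadth_first(links, entry_points, total_max):
    """Breadth-first crawl over an in-memory link graph.

    links:        dict mapping a URL to the URLs it links to
                  (stands in for fetching pages over HTTP).
    entry_points: dict mapping each entry-point URL to its page maximum,
                  or None for "no maximum" (e.g. your own sites).
    total_max:    the account-wide page maximum.
    """

    def owner(url):
        # Hypothetical bounds rule: a page belongs to the entry point
        # whose URL is the longest prefix of the page's URL.
        matches = [e for e in entry_points if url.startswith(e)]
        return max(matches, key=len) if matches else None

    counts = {e: 0 for e in entry_points}
    collected, seen = [], set(entry_points)
    queue = deque(entry_points)  # start from every entry point at once
    while queue and len(collected) < total_max:
        url = queue.popleft()  # FIFO order is what makes this breadth-first
        site = owner(url)
        limit = entry_points[site]
        if limit is not None and counts[site] >= limit:
            continue  # this guest has reached its maximum; skip the page
        counts[site] += 1
        collected.append(url)
        for link in links.get(url, []):
            # Queue only unseen links that fall within some entry point's bounds.
            if link not in seen and owner(link) is not None:
                seen.add(link)
                queue.append(link)
    return collected
```

With your own site unlimited and a friend's site capped at 2 pages, the friend's entry page and one deeper page are collected, and further friend pages are skipped even though your own pages keep linking to them:

```python
links = {
    "http://mysite.com/": ["http://mysite.com/a", "http://friend.com/"],
    "http://mysite.com/a": ["http://friend.com/x"],
    "http://friend.com/": ["http://friend.com/x", "http://friend.com/y"],
    "http://friend.com/x": ["http://friend.com/z"],
}
entry_points = {"http://mysite.com/": None, "http://friend.com/": 2}
print(crawl_breadth_first(links, entry_points, total_max=100))
# ['http://mysite.com/', 'http://friend.com/', 'http://mysite.com/a', 'http://friend.com/x']
```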
 
Portal Lists: A Portal List is a file on your site listing URLs from which you want to index one or more documents each. One use is to list exactly which documents you want in your index, rather than relying on link following from your entry points. Alternatively, you can have your search engine sample the Internet at chosen points, with explicit instructions to collect at most a certain number of pages and/or documents from each, then come right back. This way you can add a little of many sites to your search engine and serve your visitors as a mini web portal to the best the Internet has to offer.
 
Links in the Portal List do not behave like fully registered entry points. If the indexer reaches a Portal List URL through your entry-point pages first, the portal link is ignored as a duplicate. Like an entry point with Directory Restriction, however, a portal link only lets the indexer follow links within the same directory. In effect, PicoSearch samples each site on the Portal List, which makes it ideal for controlling a mini portal.
 
The Portal List resides on your site so that you can control the links on it at all times. It is best not to link to the Portal List from the rest of your site, so that PicoSearch doesn't find it first and store it as a normal page. The Portal List is meant to be a special document that PicoSearch reads only for its links. Each link is a normal HTML link with a fully qualified URL, like this:
<a href="http://www.thatsite.com/">That Site Description</a>
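To see what reading a Portal List involves, here is a Python sketch that pulls out each link's URL and description text using the standard `html.parser` module. The class and function names are hypothetical, and skipping relative links (such as `/local.html`) is an assumption based on the requirement above that every link use a fully qualified URL; this is not PicoSearch's own parser.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class PortalListParser(HTMLParser):
    """Collects (url, link text) pairs from <a href="..."> tags."""

    def __init__(self):
        super().__init__()
        self.links = []    # list of (url, text) tuples
        self._href = None  # href of the <a> tag we are inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            parts = urlparse(self._href)
            # Keep only fully qualified http(s) URLs.
            if parts.scheme in ("http", "https") and parts.netloc:
                text = " ".join("".join(self._text).split())  # tidy whitespace
                self.links.append((self._href, text))
            self._href = None


def read_portal_list(html: str):
    parser = PortalListParser()
    parser.feed(html)
    parser.close()
    return parser.links
```

Fed the example link above plus a relative one, `read_portal_list` keeps only `("http://www.thatsite.com/", "That Site Description")`.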
 
You give PicoSearch the full URL of your Portal List file, along with the maximum number of pages and/or documents to index per link. The maximums apply to every link on the list, but because you can set the page and document limits separately, you have more control than with a Guest URL entry point, which sets only a page limit. Either maximum can also be left blank for no limit, so you can combine them for effects like indexing up to 10 documents from each site with at most 20 pages in total from that site (in case there are multi-page PDFs or long files), or indexing up to 10 documents per site with no page limit.
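The way the two maximums combine can be sketched in Python. This models a single Portal List link and assumes that a long file or multi-page PDF counts as several pages but only one document, and that indexing for a link stops at the first document that would break either limit; the function name and both assumptions are illustrative, not documented PicoSearch behavior. `None` plays the role of a blank maximum.

```python
def apply_portal_limits(docs, max_docs=None, max_pages=None):
    """Pick which documents to keep for one Portal List link.

    docs:      (url, page_count) pairs in the order the indexer finds them;
               page_count > 1 stands in for a long file or multi-page PDF.
    max_docs:  document maximum per link, or None if left blank.
    max_pages: page maximum per link, or None if left blank.
    """
    kept, pages_used = [], 0
    for url, page_count in docs:
        if max_docs is not None and len(kept) >= max_docs:
            break  # document limit reached
        if max_pages is not None and pages_used + page_count > max_pages:
            break  # this document would push the link over its page limit
        kept.append(url)
        pages_used += page_count
    return kept
```

With the "10 documents, 20 pages" setting from the example above, a site whose fourth document is a large PDF stops after three documents, while leaving the page limit blank keeps all four:

```python
site = [
    ("http://www.thatsite.com/", 1),
    ("http://www.thatsite.com/guide.pdf", 15),
    ("http://www.thatsite.com/a.html", 1),
    ("http://www.thatsite.com/big.pdf", 8),
]
print(len(apply_portal_limits(site, max_docs=10, max_pages=20)))  # 3
print(len(apply_portal_limits(site, max_docs=10)))                # 4
```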
 
The Portal List document itself is not indexed. So with the default option, "Use text of each link in: list only", the names you give the links are not searchable. If you want your link names to be searchable, choose "searchable bodies" for the link text, and your descriptive text for each link will be added to the start of the linked document's searchable body text. That way each document can always be found under the name you gave it. Choose "subsequent titles" or "subsequent meta descs" if you want your link text to go into the title or meta description of the portal documents. Each of these choices is subject to your current settings for title, meta, and alt indexing, because those settings apply to portal documents as well as your own site's documents. See the 'Index Modes' section in your Account Manager: all accounts start with only title indexing turned on, in addition, of course, to each document's full body text.
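The interplay between the "Use text of each link in" choice and your Index Modes can be modeled with a short Python sketch. The four option strings are the ones named above; everything else is hypothetical, including the function name, the dictionary layout, and the assumption that "subsequent titles" and "subsequent meta descs" replace the existing title or meta description (the FAQ only says the link text goes "into" them). Alt-text indexing is left out for brevity.

```python
def apply_link_text(doc, link_text, option, index_modes=frozenset({"title"})):
    """Return the searchable text of one portal document.

    doc:         dict with "title", "meta_desc" and "body" strings.
    option:      one of the four "Use text of each link in" choices.
    index_modes: optional fields that are indexed; the default models the
                 starting setting (titles only). Body text is always indexed.
    """
    doc = dict(doc)  # work on a copy
    if option == "searchable bodies":
        doc["body"] = f"{link_text} {doc['body']}"  # added to the start of the body
    elif option == "subsequent titles":
        doc["title"] = link_text  # assumption: replaces the existing title
    elif option == "subsequent meta descs":
        doc["meta_desc"] = link_text  # assumption: replaces the meta description
    elif option != "list only":
        raise ValueError(f"unknown option: {option!r}")

    searchable = [doc["body"]]
    if "title" in index_modes:
        searchable.append(doc["title"])
    if "meta" in index_modes:
        searchable.append(doc["meta_desc"])
    return " ".join(searchable)
```

Under the default titles-only mode, choosing "subsequent meta descs" leaves the link text unsearchable, because meta descriptions are not indexed until meta indexing is turned on:

```python
doc = {"title": "Home", "meta_desc": "", "body": "Welcome"}
print(apply_link_text(doc, "That Site Description", "subsequent meta descs"))
# Welcome Home
print(apply_link_text(doc, "That Site Description", "subsequent meta descs", {"title", "meta"}))
# Welcome Home That Site Description
```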
 
There! Now you know all there is to know, and you too can customize your own multi-site search and mini web portal!
