PicoSearch offers all users extensive controls for building multi-site
search engines: Guest URLs and Portal Lists. For an introduction to what
these can do for you, read the corresponding announcement in What's New:
"Include Guests and become a mini Web Portal!" Below we explain in detail
how and when to use these exciting new features.
Entry points: First there was the entry point. Free Accounts get 3
entry points, and Professional and Premium Accounts get an unlimited
number. Entry points register the starting pages of your website or
websites with your search engine. When the indexer runs, it follows all
of the links on your website(s), starting from the entry points. If your
sites are large or numerous, you may run out of the number of pages
allowed for your account type. Even if you do not run out, the entry
points determine the bounds of your search engine. When the indexer
comes across a link, it will not go outside the combined directories,
servers, or domain names of your entry points, depending on your
link-following restriction setting. So while entry points are the way to
go for packaging your own website content, they are not suited to
precisely including windows into other websites, as you would need to do
to become a mini web portal. Even if you have entry points to spare for
all your friends, you never know when you might run out of pages for
everyone!
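To make the three restriction settings concrete, here is a minimal Python sketch of a bounds check over several entry points. It is not PicoSearch's actual indexer: the names `in_bounds` and `_scope` and the mode strings are hypothetical, and the "domain" rule naively assumes that a domain is the last two labels of the host name.

```python
from urllib.parse import urlsplit


def _scope(url, mode):
    """Return the part of a URL that a restriction mode compares (hypothetical rules)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if mode == "directory":
        # Directory of the path: everything up to and including the last "/".
        path = parts.path or "/"
        return host + path[: path.rfind("/") + 1]
    if mode == "server":
        return host
    if mode == "domain":
        # Naive assumption: the domain is the last two labels of the host.
        return ".".join(host.split(".")[-2:])
    raise ValueError(f"unknown restriction mode: {mode}")


def in_bounds(link, entry_points, mode):
    """True if link falls within the combined bounds of all entry points."""
    link_scope = _scope(link, mode)
    for entry in entry_points:
        entry_scope = _scope(entry, mode)
        if mode == "directory":
            # Subdirectories of an entry point's directory are in bounds.
            if link_scope.startswith(entry_scope):
                return True
        elif link_scope == entry_scope:
            return True
    return False


entry_points = ["http://www.mysite.com/docs/"]
print(in_bounds("http://www.mysite.com/other.html", entry_points, "directory"))  # False
print(in_bounds("http://www.mysite.com/other.html", entry_points, "server"))     # True
```

The same link can be out of bounds under directory restriction and in bounds under server or domain restriction, which is why the restriction setting, not just the entry point list, decides how far the indexer roams.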
Guest URLs: If you check this option in your Account Manager and update
your settings, you will get input boxes for a maximum number of pages
per entry point. This is good for adding guest URLs to your search
engine without letting them eat up your whole account. Entry points with
no maximum set will still be collected up to your account's total page
maximum. Guest entry points with a maximum, however, will have their
links followed on a "first come, first served" basis until the guest's
maximum is reached. The guests are still fully functional entry points,
meaning that their domain boundaries stay in effect for the entire
indexing process. Any time a document anywhere (on your site or theirs)
contains an in-bounds link to a guest, more of that guest's pages will be
collected, up to the guest's maximum. This helps enforce fairness across
your index, so that everyone is indexed up to their limits.
PicoSearch collects all the links of a document before going deeper into
any of them; the technical term for this is "breadth first". It ensures
that URLs are collected evenly across everyone's sites, even if the sites
link back and forth a lot. If you impose small maximums on your guest
URLs, your guests will have to make sure that the text they want indexed
can be found within the nearest links to their entry point. What Guest
URLs are good for is giving your friends a couple of hundred pages:
enough to please them for now, while ensuring that they won't suddenly
eat up your page allowance if they double or triple the size of their
site. If your maximums are very small and you have many guests, then what
you really want may be a Portal List.
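The interplay of breadth-first collection and per-guest maximums can be sketched in Python over a small in-memory link graph. This is a hypothetical model, not PicoSearch's code: `crawl`, its parameters, and the simplification that a URL belongs to whichever entry point shares its host (as under server restriction) are all assumptions for illustration.

```python
from collections import deque
from urllib.parse import urlsplit


def crawl(graph, entry_points, account_max):
    """Breadth-first crawl over an in-memory link graph (hypothetical model).

    graph:        dict mapping each URL to the list of URLs it links to.
    entry_points: dict mapping each entry URL to its page maximum, or None
                  for no per-entry maximum.
    account_max:  total pages allowed for the whole account.
    Returns the indexed URLs in the order they were collected.
    """

    def owner(url):
        # Simplifying assumption: server restriction, so a URL belongs to
        # the entry point on the same host (None means out of bounds).
        host = urlsplit(url).hostname
        for entry in entry_points:
            if urlsplit(entry).hostname == host:
                return entry
        return None

    counts = {entry: 0 for entry in entry_points}
    indexed = []
    seen = set(entry_points)
    queue = deque(entry_points)  # FIFO queue gives breadth-first order
    while queue and len(indexed) < account_max:
        url = queue.popleft()
        entry = owner(url)
        limit = entry_points[entry]
        if limit is not None and counts[entry] >= limit:
            continue  # this guest is full: skip the page and its links
        counts[entry] += 1
        indexed.append(url)
        for link in graph.get(url, []):
            # Links from any in-bounds page (yours or a guest's) are queued.
            if link not in seen and owner(link) is not None:
                seen.add(link)
                queue.append(link)
    return indexed


graph = {
    "http://me.example/": ["http://me.example/a", "http://guest.example/"],
    "http://me.example/a": ["http://guest.example/x"],
    "http://guest.example/": ["http://guest.example/y", "http://guest.example/z"],
}
entries = {"http://me.example/": None, "http://guest.example/": 2}
print(crawl(graph, entries, account_max=100))
# ['http://me.example/', 'http://guest.example/', 'http://me.example/a', 'http://guest.example/y']
```

Here the guest is capped at 2 pages, so guest.example/z and guest.example/x are skipped even though in-bounds pages link to them, while your own site, which has no maximum, keeps being collected up to the account total.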
Portal Lists: A Portal List is a file on your site that lists URLs you
want followed for one or more documents each. One use is to list exactly
which documents you want in your index, rather than relying on link
following from your entry points. Or you can effectively instruct your
search engine to sample the internet at given points, with explicit
instructions to get at most a certain number of pages and/or documents,
then come right back. This way you can add a little of many sites to
your search engine and function just like a mini web portal, offering
your visitors the best the internet has to offer.
The links in the Portal List are not fully registered entry points. If
the indexer has already come across a Portal List URL while crawling
from your entry points, the portal link will be ignored as a duplicate.
But like an entry point with Directory Restriction, a portal link
restricts the links followed from it to the same directory. PicoSearch
is effectively sampling sites from the Portal List, which makes it
perfect for mini portal control.
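A portal link's duplicate handling and directory restriction might look like the following Python sketch. `directory_of`, `sample_portal_link`, and the choice to also skip pages already indexed from your own site are hypothetical; PicoSearch's real rules may differ.

```python
from collections import deque
from urllib.parse import urlsplit


def directory_of(url):
    """Return scheme://host/dir/ for a URL, i.e. everything up to the last '/'."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return f"{parts.scheme}://{parts.netloc}{path[: path.rfind('/') + 1]}"


def sample_portal_link(start, graph, already_indexed, max_docs=None):
    """Collect documents breadth-first from one portal link (hypothetical model).

    Returns [] if the start URL was already indexed from your own entry points;
    otherwise follows only links inside the start URL's directory, stopping
    after max_docs documents (None means no document limit).
    """
    if start in already_indexed:
        return []  # ignored as a duplicate
    prefix = directory_of(start)
    collected = []
    seen = {start}
    queue = deque([start])
    while queue and (max_docs is None or len(collected) < max_docs):
        url = queue.popleft()
        collected.append(url)
        for link in graph.get(url, []):
            # Directory restriction: stay under the portal link's directory.
            if link.startswith(prefix) and link not in seen and link not in already_indexed:
                seen.add(link)
                queue.append(link)
    return collected
```

For a portal link such as http://www.thatsite.com/news/, pages under /news/ are sampled, while a link out to /shop/ on the same server is not followed.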
The Portal List resides on your site so that you can control its links
at all times. It generally should not be linked from the rest of your
site, so that PicoSearch doesn't find it first and store it as a normal
page. The Portal List is meant to be a special document just for
PicoSearch to get links from, where each link is a normal HTML link with
a fully-qualified URL, like this:
<a href="http://www.thatsite.com/">That Site Description</a>
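To illustrate what "a normal HTML link with a fully-qualified URL" means in practice, here is a minimal Python sketch that pulls (URL, link text) pairs out of a Portal List using the standard library's `html.parser`. `PortalListParser` and `portal_links` are hypothetical helpers, not part of PicoSearch; skipping relative links is an assumption based on the fully-qualified requirement above.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class PortalListParser(HTMLParser):
    """Collect (url, link text) pairs from a Portal List page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> currently open, if it qualifies
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            parts = urlsplit(href or "")
            # Keep only fully-qualified http(s) URLs; relative links are skipped.
            if parts.scheme in ("http", "https") and parts.netloc:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the link text.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def portal_links(html):
    parser = PortalListParser()
    parser.feed(html)
    parser.close()
    return parser.links


print(portal_links('<a href="http://www.thatsite.com/">That Site Description</a>'))
# [('http://www.thatsite.com/', 'That Site Description')]
```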
You supply PicoSearch with the full URL of your Portal List file, along
with the maximum number of pages and/or documents to index per link.
Although the same maximums apply to every link, being able to set pages
and documents separately gives you more control than a Guest URL entry
point, which only sets a page limit. Either portal maximum can also be
left blank for no limit, so you can combine them for effects like
indexing up to 10 documents from each site for a maximum of 20 pages
total (in case there are multi-page PDFs or long files), or indexing up
to 10 documents per site with no page limit.
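The combination of a document maximum and a page maximum can be sketched as follows. This is a hypothetical Python model, not PicoSearch's implementation: `apply_portal_limits` is an illustrative name, and stopping before a document that would exceed the page maximum (rather than skipping it or indexing part of it) is an assumption.

```python
def apply_portal_limits(docs, max_docs=None, max_pages=None):
    """Keep documents from one portal link until a limit is reached (hypothetical).

    docs: (url, page_count) pairs in the order they were collected.
    max_docs / max_pages: None means that limit is left blank (no limit).
    """
    kept, pages = [], 0
    for url, page_count in docs:
        if max_docs is not None and len(kept) >= max_docs:
            break
        if max_pages is not None and pages + page_count > max_pages:
            break  # assumption: stop rather than index part of a document
        kept.append(url)
        pages += page_count
    return kept


# A site whose second document is a 15-page PDF.
docs = [("a.html", 1), ("report.pdf", 15), ("manual.pdf", 8), ("b.html", 1)]
print(apply_portal_limits(docs, max_docs=10, max_pages=20))  # ['a.html', 'report.pdf']
print(apply_portal_limits(docs, max_docs=10))                # all four URLs
```

With "10 documents, 20 pages", the long PDFs use up the page budget after two documents; with the page limit left blank, all four documents fit under the document maximum.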
The Portal List document itself will not be indexed. Thus, with the
default option "Use text of each link in: list only", the names you give
to the links will not be searchable. If you want your link names to be
searchable, choose "searchable bodies" for the link text, and your
descriptive text for each link will be added to the start of the linked
document's searchable body text. That way you can be sure each document
can be found under the name you gave it. Choose "subsequent titles" or
"subsequent meta descs" if you want your link text to go into the title
or meta description of the portal documents. Any of these choices will
be affected by your current settings for title, meta, and alt indexing,
because those settings apply to all portal documents as well as to your
own site's documents. See the 'Index Modes' section in your Account
Manager: all accounts start with only title indexing turned on, in
addition, of course, to the entire document's body text.
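The following Python sketch models how the link-text option and the Index Modes settings interact. It is a hypothetical illustration: `apply_link_text`, `searchable_text`, and the dict layout are invented names, the title and meta options are assumed to replace rather than extend the existing value, and alt indexing is not modeled.

```python
def apply_link_text(doc, link_text, mode="list only"):
    """Return a copy of doc with the portal link text placed per mode (hypothetical).

    doc is a dict with "title", "meta_desc", and "body" keys.
    """
    doc = dict(doc)
    if mode == "searchable bodies":
        doc["body"] = f"{link_text} {doc['body']}"  # added to the start of the body
    elif mode == "subsequent titles":
        doc["title"] = link_text  # assumption: replaces the title
    elif mode == "subsequent meta descs":
        doc["meta_desc"] = link_text  # assumption: replaces the meta description
    elif mode != "list only":
        raise ValueError(f"unknown mode: {mode}")
    return doc


def searchable_text(doc, index_titles=True, index_meta=False):
    """Text a query could match, given Index Modes (defaults: titles on, meta off)."""
    parts = []
    if index_titles:
        parts.append(doc["title"])
    if index_meta:
        parts.append(doc["meta_desc"])
    parts.append(doc["body"])  # body text is always indexed
    return " ".join(p for p in parts if p)


doc = {"title": "Widgets", "meta_desc": "", "body": "All about widgets."}
print(searchable_text(apply_link_text(doc, "That Site", "searchable bodies")))
# Widgets That Site All about widgets.
```

Under the default settings, choosing "subsequent meta descs" would leave the link text unsearchable until meta indexing is turned on, which is the kind of interaction the Index Modes note warns about.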
There! Now you know all there is to know, and you too can customize your own multi-site search and mini web portal!