Yes! After you have started an account, you can use the settings
in your account manager to help PicoSearch enter the password
protected areas of your website. Then you can reindex and successfully
search all of your protected pages. Your website's protection is still
effective, because whenever a visitor clicks on a search result from the
protected directories, they will still have to log in to see the full
pages. For additional privacy, you can choose to block the concordance
(text excerpts) of any protected pages in the search results. Or, if
you want to keep your members-only searches separate from public
searches, you can set up Partitions so that only the search boxes inside
your password protection can search all of your site, while the public
search boxes outside access only the public partition.
To help PicoSearch enter your password protected directories, go to the
Password Protection & Cookie Setting feature in your account
manager. Here you can set the logins for the two primary kinds of
password protection that your website is likely to use.
1) HTAccess: If your website pops up a standard grey box for
logging in, that is HTTP authentication, often called HTAccess after
the Apache .htaccess file that configures it. For PicoSearch to get past
this protection, simply enter the directory path of a protected area, a
login name, and a password. (If your grey box also asks for a Domain,
then you need to enable Basic authentication in your Microsoft Internet
Information Server.) Free accounts get just one combination of path,
login name, and password, while paying accounts can have three. If you
need more entries, you can also list multiple paths for one master login
name and password. Just contact us if you need any assistance.
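For a rough picture of what these HTAccess settings amount to, here is a minimal Python sketch using only the standard library. It is not PicoSearch's own code, and the paths, login name, and password are hypothetical. It shows the Basic authentication header a client sends, and how one master login can be registered for several protected paths so that any page beneath them is answered with the same credentials.

```python
import base64
import urllib.request


def basic_auth_header(user: str, password: str) -> str:
    """Authorization header value for HTTP Basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


# Hypothetical values: one master login shared by several protected directories.
MASTER_USER = "john"
MASTER_PASSWORD = "magic"
PROTECTED_PATHS = [
    "http://www.mysite.com/members/",
    "http://www.mysite.com/staff/",
]

# urllib's password manager accepts a list of URIs for a single login; any URL
# under one of these paths is answered with the same credentials.
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, PROTECTED_PATHS, MASTER_USER, MASTER_PASSWORD)

# An opener that answers a server's 401 challenge with the stored login.
opener = urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(password_mgr))

if __name__ == "__main__":
    print(basic_auth_header(MASTER_USER, MASTER_PASSWORD))
    print(password_mgr.find_user_password(None, "http://www.mysite.com/members/page.html"))
```

Pages outside the registered paths get no credentials at all, which mirrors the idea of listing each protected path explicitly.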
2) Cookies: If your website's login page has a custom look, with
entry boxes built into the page, then your protection system is based
on scripts running on your web server. These scripts verify that a
login is correct, and then store the authorized state somewhere so the
user can continue through the site.
If the authorized state is stored explicitly in the URLs as extra
arguments, then the user is authenticated by the URL itself. An example
might look like
http://www.mysite.com/nextpage.asp?name=john&session=x3984f. In
this case, you may just need to copy a URL with valid arguments into
PicoSearch as the first Entry Point. PicoSearch has no problem following
URLs with extra arguments, which is why shopping carts and similar
systems are easily followed by PicoSearch. In the simplest cases, you
won't even be reading this FAQ, because PicoSearch already got around
your site just fine without knowing what all those URL arguments were for.
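To see why URL-carried sessions need no special setup, here is a small Python sketch (standard library only; the relative link below is hypothetical). The session lives in the query string, so as long as your site's scripts write it into their links, every resolved link keeps the same arguments.

```python
from urllib.parse import parse_qs, urljoin, urlsplit

# The example Entry Point from the text, with its session arguments.
ENTRY_POINT = "http://www.mysite.com/nextpage.asp?name=john&session=x3984f"


def url_args(url: str) -> dict:
    """Return the query-string arguments of a URL as a name -> value dict."""
    return {name: values[0] for name, values in parse_qs(urlsplit(url).query).items()}


# A relative link as the site's own scripts might write it on a protected page
# (hypothetical): the session argument is simply part of the link.
link = "members/list.asp?name=john&session=x3984f"
next_url = urljoin(ENTRY_POINT, link)

if __name__ == "__main__":
    print(url_args(ENTRY_POINT))
    print(next_url)
```

A crawler that treats the whole URL, arguments included, as the page address carries the session along without needing to know what the arguments mean.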
Cookies are another popular way to store login states and additional
information for the visitor. After your user logs in, your website may
give your visitor's browser a cookie, which is a small piece of
information that the browser then sends back each time it requests
more protected pages.
PicoSearch fully supports the handling of any cookies it is given
by a webpage during indexing, so if you can start PicoSearch on an
Entry Point URL of your site that sets all the cookies required,
that will work fine. Such a URL could be the public login page with
additional arguments to get in, looking something like
http://www.mysite.com/login.asp?user=john&password=magic (this is
just an example; look in the code of your pages to see the actual
argument names involved). Or, to avoid putting arguments on the URL, you
might make a special Entry URL: an orphan landing page that is not
linked from the rest of your site, so only PicoSearch knows about it.
This acts as a backdoor entrance that hands out the entry cookies
without question.
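The cookie handling described above can be sketched with Python's http.cookiejar. This is an illustration, not PicoSearch's implementation: a canned response stands in for the real server so the example runs offline, and the login URL and the sessid cookie are hypothetical. Whatever Set-Cookie header the Entry Point returns is stored in the jar and replayed on later requests to the same site.

```python
import email.message
import http.cookiejar
import urllib.request

# One cookie jar shared by every request in an indexing run.
jar = http.cookiejar.CookieJar()
# In a live crawler this opener would fetch pages and fill/replay the jar automatically.
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


class CannedResponse:
    """Offline stand-in for an HTTP response; the jar only needs .info()."""

    def __init__(self, set_cookie_headers):
        self._headers = email.message.Message()
        for value in set_cookie_headers:
            self._headers["Set-Cookie"] = value  # appends, does not replace

    def info(self):
        return self._headers


# Hypothetical backdoor Entry Point that hands out the login cookie.
entry = urllib.request.Request("http://www.mysite.com/login.asp?user=john&password=magic")
jar.extract_cookies(CannedResponse(["sessid=x3984f; Path=/"]), entry)

# Any later request to the same site now carries the cookie.
later = urllib.request.Request("http://www.mysite.com/members/page.asp")
jar.add_cookie_header(later)

if __name__ == "__main__":
    print(later.get_header("Cookie"))
```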
The tricky case is when you can't authorize PicoSearch with a single
URL, either because your system doesn't allow arguments on the login URL,
or (as is often the case) someone else set up your website and they
aren't around to help you make further changes. Here too, PicoSearch can
help by letting you initialize the cookies you need directly in your
account manager, thus making your website think that PicoSearch has
already logged in. And since cookies are name/value pairs of information
that could have many uses, by allowing the pre-setting of cookies,
PicoSearch supports any cookie-based initializations of your website
that are useful for indexing and search purposes.
A cookie initialization is entered in your account manager's cookie
entry box as follows. (Later we will look at how to make your browser
tell you what cookies your site is expecting.)
domain.com/optional_path cookie_name=cookie_value
domain.com is required, and it is the website that wants the cookie.
This is not a full URL, so don't start it with http://. It should
probably have at least two dots, so if your cookie is for
http://www.mysite.com, try .mysite.com as the setting. The
/optional_path is just that, an optional subdirectory, which makes
PicoSearch send the cookie only for pages in that subdirectory. Any
spaces in the path should be replaced by the encoded equivalent, %20.
cookie_name is the name of the cookie, and cookie_value is its value.
The name is usually something simple and short, while the value is
usually a long string of numbers and letters that tells your website's
authentication system that an authorized user is returning for another
webpage. The name and value are divided at the first equal sign, so any
equal signs inside the value are kept.
Cookies are generated by your website and quietly given to your
visitor's browser, which holds onto them and shows them each time it
returns to your site. Unless you programmed your website, you probably
have no idea what the valid cookies behind your password protection are.
But with a little snooping in your browser, you should be able to look at
your website's cookies before and after logging in, and thus tell which
cookies are vital for getting past the password protection. There may
well be many cookies associated with visiting your website, for things
like a user id or a shopping cart. Since PicoSearch accepts and holds
cookies found during normal indexing, typically you only need to
initialize manually the password protection cookies that come from
logging in. However, some systems may require you to initialize most or
all of the cookies the browser receives. Doing so couldn't hurt, so you
may want to experiment and see what works.
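Once you have copied your site's cookies out of the browser before and after logging in, the comparison is a simple difference: any cookie that is new, or whose value changed at login, is a candidate for manual initialization. Here is a minimal Python sketch with made-up cookie names and values. It also formats the results as entry lines in the domain/optional_path name=value format described earlier.

```python
from typing import Dict, List


def login_cookies(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, str]:
    """Cookies that appeared, or changed value, when you logged in."""
    return {name: value for name, value in after.items() if before.get(name) != value}


def as_setting_lines(domain: str, cookies: Dict[str, str], path: str = "") -> List[str]:
    """Format cookies as 'domain/optional_path name=value' lines.

    path, if given, should start with '/' (e.g. '/members').
    """
    return [f"{domain}{path} {name}={value}" for name, value in cookies.items()]


if __name__ == "__main__":
    # Hypothetical values copied from the browser before and after logging in.
    before = {"lang": "en", "cart": "0", "sessid": "guest"}
    after = {"lang": "en", "cart": "0", "sessid": "x3984f", "auth": "ok"}
    for line in as_setting_lines(".mysite.com", login_cookies(before, after)):
        print(line)
```

If the login cookies alone don't get PicoSearch in, the same formatter can be applied to the whole "after" set instead.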
Depending on which browser you're using, you may need to search the web
for instructions on viewing cookie values. It should certainly be
possible. In Firefox, look under Tools - Options - Privacy - Show
Cookies. Scroll to your website's name and expand the tree to see
individual cookies; you should then be able to highlight values and
cut-and-paste them directly. Microsoft's Internet Explorer is a little
harder; it gets to cookies through Tools - Internet Options - General -
Browsing History - Settings - View Files. Then scroll to find filenames
that start with "cookie" and contain your site's name. You can either
click on these files or drag them onto the browser to see a list that
starts with the name, then the value. Feel free to contact us if you
need any assistance.
Once you have your cookies set in PicoSearch, be sure to try indexing
both immediately and later, such as the next day. Cookies often come
with expiration times that the browser is supposed to enforce.
PicoSearch always uses the cookie values that you initialize, so they
won't expire on its side, but some websites may have their own ways of
ignoring old cookies. This usually won't be the case, even if the cookie
values are built from session numbers that are continually generated
(PicoSearch will simply always look like an existing user to your site).
But if your site is so rigorous that it actively refuses older cookies,
you may have to manually set new cookies in PicoSearch each time you
need to reindex. This could be annoying enough to further motivate you
to have the people who built your website create a landing page for
PicoSearch (as mentioned above in the alternatives to cookie
initialization).
If you have other kinds of ASP or cookie-based authentication that you
can't get to work directly, you may still be able to get PicoSearch to
index your site with one of the tricks used for SSL servers (see the
SSL FAQ).