Help with PicoSearch

Can I search my password protected directories?

Yes! After you have started an account, you can use the settings in your account manager to help PicoSearch enter the password protected areas of your website. Then you can reindex and successfully search all of your protected pages. Your website's protection is still effective because whenever a visitor clicks on a search result from the protected directories, they will still have to log in to see the full pages. For additional privacy, you can choose to block the concordance (text excerpts) of any protected pages in the search results. Or, if you want to keep your members-only searches separate from public searches, you could set Partitions so only the search boxes inside your password protection can search all of your site, while the outside public search boxes access only the public partition.

To help PicoSearch enter your password protected directories, go to the Password Protection & Cookie Setting feature in your account manager. Here you can set the logins for the two primary kinds of password protection that your website is likely to use.

1) HTAccess: If your website pops up a standard grey box for logging in, that would be the HTTP protection known as HTAccess. For PicoSearch to get past this protection, simply enter the directory path of a protected area, a login name, and a password. (If your grey box also asks for a Domain, then you need to select basic authentication in your Microsoft Internet Information Server.) Free accounts get just one HTTP password entry combination, while paying accounts can have three. But if you need more entries, you can also list multiple paths for one master login name and password. Just contact us if you need any assistance.
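If you are curious what happens behind the grey box, here is a minimal Python sketch of HTTP basic authentication, assuming your server uses the standard Basic scheme. The function name and the john/magic login are made up for illustration, and this is not PicoSearch's own code.

```python
import base64


def basic_auth_header(login: str, password: str) -> str:
    """Build the Authorization header value a client sends for HTTP Basic auth."""
    # The login and password are joined with a colon and base64-encoded.
    # This is an encoding, not encryption, so anyone who sees it can decode it.
    token = base64.b64encode(f"{login}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


# Hypothetical login, for illustration only.
header = basic_auth_header("john", "magic")
print("Authorization:", header)
```

A crawler that has been given a path, login name, and password would send a header like this with every request under the protected path.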

2) Cookies: If your website login page has a custom look with entry boxes built into the website, then your protection system is based on scripts running on your web server. These scripts verify that a login is correct, and then somehow store the authorized state for the user to continue through the site.

If the authorized state is explicitly stored on the URLs with extra arguments, then the user is authenticated by the URL itself. An example might look like http://www.mysite.com/nextpage.asp?name=john&session=x3984f. In this case, you may just need to copy a URL with valid arguments into PicoSearch as the first Entry Point. PicoSearch has no problem following URLs with extra arguments, which is why shopping carts and the like are easily followed by PicoSearch. In the simplest cases, you won't even be reading this FAQ, because PicoSearch already got around your site just fine without knowing what all those URL arguments were for.
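To see which arguments your own URLs carry, a small Python sketch like the following can pull them apart. It uses the example URL from above; the function name is just an illustration.

```python
from urllib.parse import parse_qs, urlsplit


def url_arguments(url: str) -> dict:
    """Return the arguments after the '?' in a URL as a name -> value mapping."""
    query = urlsplit(url).query
    # parse_qs returns a list of values per name; keep the first one of each.
    return {name: values[0] for name, values in parse_qs(query).items()}


args = url_arguments("http://www.mysite.com/nextpage.asp?name=john&session=x3984f")
print(args)
```

If an argument such as session shows up on every link after you log in, that is a good hint that your site keeps the authorized state on the URL.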

Cookies are another popular way that login states and additional information are stored for the visitor. After your user logs in, your website may give your visitor's browser a cookie, which is some information that your visitor's browser then shows each time it returns to see more protected pages.

PicoSearch fully supports the handling of any cookies which it is given by a webpage during indexing, so if you can just start PicoSearch on an Entry Point URL of your site that will set all the cookies required, that will work fine. Such a URL could be the public login page but with additional arguments to get in, looking something like http://www.mysite.com/login.asp?user=john&password=magic (this is just an example; look in the code of your pages to see the actual argument names involved). Or to avoid putting arguments on the URL, you might make a special Entry URL as an orphan landing page that is not linked to your site so only PicoSearch knows about it, i.e. a backdoor entrance that gives the cookies for entry without question.
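The give-and-show-back cycle of cookies can be sketched in a few lines of Python. This is only an illustration of the general mechanism, not PicoSearch's implementation: the cookie names and values are invented, and the sketch ignores the domain and path rules that a real browser or crawler applies.

```python
from http.cookies import SimpleCookie


def cookie_header(set_cookie_values):
    """Collect cookies from Set-Cookie header values and build the Cookie header to send back."""
    jar = SimpleCookie()
    for value in set_cookie_values:
        # Each value looks like "name=value; Path=/..."; attributes such as
        # Path are stored on the cookie but are not sent back to the server.
        jar.load(value)
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


# Hypothetical Set-Cookie values that a login page might return.
received = ["user=john; Path=/", "session=x3984f; Path=/members"]
header = cookie_header(received)
print("Cookie:", header)
```

The resulting Cookie header is what the visitor (or an indexer) presents on each later request to the protected pages.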

The tricky case is when you can't authorize PicoSearch with a single URL, either because your system doesn't allow arguments on the login URL, or (as is often the case) someone else set up your website and they aren't around to help you make further changes. Here too, PicoSearch can help by letting you initialize the cookies you need directly in your account manager, thus making your website think that PicoSearch has already logged in. And since cookies are name/value pairs of information that could have many uses, by allowing the pre-setting of cookies, PicoSearch supports any cookie-based initializations of your website that are useful for indexing and search purposes.

A cookie initialization is entered in your account manager's cookie entry box as follows. (Later we will look at how to make your browser tell you what cookies your site is expecting.)

domain.com/optional_path cookie_name=cookie_value

Domain.com is required, and it is the website that wants the cookie. This is not a full URL, so don't start it with http://, but it probably should have at least two dots, so if your cookie is for http://www.mysite.com then try .mysite.com as a setting. The /optional_path is just that, an optional subdirectory, which will make PicoSearch show the cookie only for that subdirectory's pages. Any spaces in the path should be replaced by the encoded equivalent of %20. Cookie_name is the name of the cookie, and cookie_value is the value. Name is usually something simple and short, while value is usually a long string of numbers and letters that tells your website's authentication system that an authorized user is returning for another webpage. Name and value will be divided at the first equal sign, in case there are any in the value.
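To make the entry format concrete, here is a rough Python sketch that splits one entry line into its four parts following the rules described above. PicoSearch's actual parser is not shown anywhere in this FAQ, so treat the function, the CookieEntry type, and the sample values as assumptions for illustration.

```python
from typing import NamedTuple


class CookieEntry(NamedTuple):
    domain: str
    path: str   # empty string when no optional path was given
    name: str
    value: str


def parse_cookie_entry(line: str) -> CookieEntry:
    """Split 'domain.com/optional_path cookie_name=cookie_value' into its parts."""
    # The domain/path part contains no spaces (they are written as %20),
    # so the first space separates it from the name=value pair.
    location, _, pair = line.strip().partition(" ")
    domain, slash, path = location.partition("/")
    # Divide at the FIRST equal sign only, so any later '=' stays in the value.
    name, _, value = pair.strip().partition("=")
    if not domain or not name:
        raise ValueError("expected: domain.com/optional_path cookie_name=cookie_value")
    return CookieEntry(domain, slash + path, name, value)


# Hypothetical entry with an encoded space in the path and '=' inside the value.
entry = parse_cookie_entry(".mysite.com/members%20area session=abc=123==")
print(entry)
```

Splitting only at the first equal sign matters because cookie values often end in padding characters such as ==.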

Cookies are generated by your website and quietly given to your visitor's browser, to hold onto and show each time they return to your site. Unless you programmed your website, you probably have no idea what the underlying valid cookies are for your password protection. But with a little snooping in your browser, you should be able to look at your website's cookies before and after logging in, and thus tell which cookies are vital for getting past the password protection. There could well be many cookies associated with visiting your website, having to do with a user id, shopping cart, etc. Since PicoSearch will accept and hold cookies found during normal indexing, typically it is only the password protection cookies that come from logging in which you need to initialize manually. However some systems may require you to initialize most or all of the cookies that the browser gets, which couldn't hurt, so you may want to experiment and see what works.

Depending on what browser you're using, you may need to search the web for instructions on seeing the cookie values. It should certainly be possible. In Firefox, look under Tools - Options - Privacy - Show Cookies. Scroll to your website's name and expand the tree to see individual cookies, then you should be able to highlight values and copy and paste them directly. Microsoft's Internet Explorer is a little harder; it gets to cookies through Tools - Internet Options - General - Browsing History - Settings - View Files. Then you scroll to find filenames that start with "cookie" and have your site's name. You can either click on these files or drag them onto the browser, to see a list that starts with name then value. Feel free to contact us if you need any assistance.

Once you have your cookies set in PicoSearch, be sure to try indexing both immediately and later, like the next day. Cookies often come with expiration times that the browser is supposed to enforce. While PicoSearch will always use the same cookie values that you initialize and thus they won't expire, there's also the possibility that some websites have ways of ignoring old cookies. This usually won't be the case, even if the cookie values are made from session numbers that are continually generated (PicoSearch will just always look like an old user to your site). But if your site is so rigorous that it actively refuses older cookies, you may have to manually set new cookies in PicoSearch each time you need to reindex. This could be so annoying that you'll be further motivated to force the people who made your website to build a landing page for PicoSearch (as mentioned above in the alternatives to cookie initializing).



If you have other kinds of ASP or cookie based authentication that you can't get to work directly, you may still get PicoSearch to index your site by one of the tricks used for SSL servers - see SSL FAQ
