Yes, PicoSearch can search many formats.
To begin with, PicoSearch will index any plain text or HTML file. Technically, this means files served over HTTP with a content-type of "text/html" or "text/plain". This should cover pages generated by ASP and other server-side scripts.
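As a rough illustration of the content-type rule above (not PicoSearch's actual code), the sketch below decides whether a response would count as HTML or plain text by looking only at the media type in its Content-Type header, ignoring parameters such as charset. The function name `classify_content_type` and the labels it returns are hypothetical.

```python
from typing import Optional

# The two media types named above; anything else falls through to None.
# The labels "html" and "text" are hypothetical.
_BASIC_TYPES = {
    "text/html": "html",
    "text/plain": "text",
}


def classify_content_type(header_value: str) -> Optional[str]:
    """Return "html", "text", or None for a raw Content-Type header value."""
    # Drop parameters such as "; charset=UTF-8" and normalize case/whitespace.
    media_type = header_value.split(";", 1)[0].strip().lower()
    return _BASIC_TYPES.get(media_type)


# An ASP page is still HTML as far as its content-type is concerned.
print(classify_content_type("text/html; charset=UTF-8"))  # html
print(classify_content_type("Text/Plain"))                # text
print(classify_content_type("image/png"))                 # None
```

Because only the header is consulted, a URL ending in .asp or .cgi is treated exactly like a static .html file as long as the server reports "text/html".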
Additionally, PicoSearch keeps getting smarter about more file types! Here is the current list of what PicoSearch can index. Remember that your search engine is also limited to a maximum number of pages, where one URL document (file or web page) generally equals one PicoSearch page. Extra-long HTML or text files, and multi-page non-HTML formats such as PDFs, may yield more than one PicoSearch page per document. If you need more pages, see our plan rates and services.
For All Accounts
(Free as well as Professional and Premium Accounts!)
- HTML files (.html, .htm, and .shtml, or content-type "text/html")
PicoSearch will index your HTML files, including pages generated by server scripts such as ASP or Perl CGI. This feature is on by default. You can turn the indexing of titles, meta tags, image alt tags, and even the whole page body on or off to get different search effects; see the Index Modes section of your Account Manager's indexing topics. (If your HTML files aren't indexing as you expected, see the FAQ on Finding all of your Pages.)
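To make the Index Modes idea concrete, here is a minimal sketch, assuming four on/off switches (title, meta tags, image alt text, body), that splits a page into those parts with Python's standard `html.parser` and joins only the enabled ones. The class `IndexModeParser`, the function `indexable_text`, and its parameters are hypothetical; PicoSearch's own parser is not described here.

```python
from html.parser import HTMLParser


class IndexModeParser(HTMLParser):
    """Split one HTML page into title, meta, image-alt, and body text."""

    def __init__(self):
        super().__init__()
        self.title, self.metas, self.alts, self.body = [], [], [], []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("description", "keywords") and attrs.get("content"):
                self.metas.append(attrs["content"])
        elif tag == "img" and attrs.get("alt"):
            self.alts.append(attrs["alt"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            (self.title if self._in_title else self.body).append(text)


def indexable_text(html, title=True, metas=True, alts=True, body=True):
    """Join only the parts whose (hypothetical) Index Modes switch is on."""
    parser = IndexModeParser()
    parser.feed(html)
    parser.close()
    parts = []
    for enabled, chunk in ((title, parser.title), (metas, parser.metas),
                           (alts, parser.alts), (body, parser.body)):
        if enabled:
            parts.extend(chunk)
    return " ".join(parts)


page = ('<html><head><title>Widgets</title>'
        '<meta name="keywords" content="gears, cogs"></head>'
        '<body><img src="w.png" alt="Blue widget">'
        '<p>Our catalog.</p><script>var x = 1;</script></body></html>')
print(indexable_text(page))              # Widgets gears, cogs Blue widget Our catalog.
print(indexable_text(page, body=False))  # Widgets gears, cogs Blue widget
```

Turning the body switch off, as in the second call, leaves a "keywords only" style index built from titles, metas, and alt text.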
- Plain Text files (.txt, or content-type "text/plain")
PicoSearch will index your plain text files. This feature is on by default, and you can turn it off in the Index Modes section of your Account Manager's indexing topics.
- XML files (.xml, or content-type "text/xml")
PicoSearch will index the text (not the tags) of your XML files. This feature is on by default, and you can turn it off in the Additional Formats section of your Account Manager's indexing topics.
Note: Delivery of results in XML with a DTD is a separate feature available to paid accounts; see the FAQ.
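"Text, not tags" can be pictured with Python's standard `xml.etree.ElementTree`: the sketch below keeps every run of character data in document order and drops element names and attribute values. It is only an illustration of the idea, not PicoSearch's code; `xml_text` and the sample catalog are hypothetical.

```python
import xml.etree.ElementTree as ET


def xml_text(document: str) -> str:
    """Return the character data of an XML document, with tags and attributes dropped."""
    root = ET.fromstring(document)
    # itertext() yields every text and tail string in document order.
    pieces = (chunk.strip() for chunk in root.itertext())
    return " ".join(piece for piece in pieces if piece)


catalog = """<catalog>
  <book id="b1"><title>Search Basics</title><author>A. Writer</author></book>
  <book id="b2"><title>Indexing XML</title></book>
</catalog>"""

print(xml_text(catalog))  # Search Basics A. Writer Indexing XML
```

Note that the attribute values "b1" and "b2" do not appear in the output, so data stored only in attributes would not be searchable under this approach.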
- Shockwave Flash files (.swf or content-type "application/x-shockwave-flash")
PicoSearch will follow the links in your Shockwave Flash files, so you can create exciting navigation for your site using Adobe (formerly Macromedia) Flash tools. PicoSearch will also try to extract the text fields of your Shockwave files, which may result in additional pages. We do not recommend building your site entirely in Flash, because that often makes it impossible to search, even though PicoSearch uses the latest official Flash software to read your files. For compatibility with search engines, Flash works best as components embedded within HTML pages. If your site is all Flash and its text isn't being found, note that PicoSearch will index any HTML content placed between <noscript> ... </noscript> tags, so adding such content is good practice for a site that cannot easily be redesigned. Shockwave Flash text indexing can be turned on or off in the Additional Formats section of your Account Manager's indexing topics.
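The <noscript> fallback described above can be checked on your own pages with a small script. This minimal sketch, using Python's standard `html.parser`, collects only the text that sits inside <noscript> blocks, which is roughly what an indexer that cannot read the Flash movie itself would have to work with. The class name `NoscriptText` and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class NoscriptText(HTMLParser):
    """Collect the text that appears inside <noscript> ... </noscript> blocks."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # > 0 while inside a <noscript> element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "noscript":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "noscript" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


flash_page = """<html><body>
<object data="site.swf" type="application/x-shockwave-flash"></object>
<noscript><h1>Acme Tools</h1><p>Hand-made hammers and saws.</p></noscript>
</body></html>"""

parser = NoscriptText()
parser.feed(flash_page)
parser.close()
print(" ".join(parser.chunks))  # Acme Tools Hand-made hammers and saws.
```

If this prints nothing for an all-Flash page, there is no HTML fallback text for a search engine to find.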
- MP3 Files (.mp3 or content-type "audio/mpeg")
PicoSearch will index the song title, artist, album, and other text tags in MP3 files tagged in the ID3 v1.0 or v1.1 format. This feature is on by default, and you can turn it off in the Additional Formats section of your Account Manager's indexing topics.
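For reference, an ID3 v1.0/v1.1 tag is a fixed 128-byte block at the end of the MP3 file: "TAG", then 30-byte title, artist, and album fields, a 4-byte year, a 30-byte comment, and a 1-byte genre number. In v1.1 the comment's 29th byte is zero and its 30th byte holds the track number. The sketch below reads those fields; it illustrates the tag layout, not PicoSearch's reader, and it assumes Latin-1 text. `read_id3v1` and the sample data are hypothetical.

```python
from typing import Optional


def _field(raw: bytes) -> str:
    # Fields are fixed width, padded with NULs or spaces; Latin-1 is assumed.
    return raw.split(b"\x00", 1)[0].decode("latin-1").strip()


def read_id3v1(mp3: bytes) -> Optional[dict]:
    """Parse an ID3 v1.0/v1.1 tag from the last 128 bytes of an MP3 file."""
    if len(mp3) < 128 or mp3[-128:-125] != b"TAG":
        return None
    tag = mp3[-128:]
    info = {
        "title": _field(tag[3:33]),
        "artist": _field(tag[33:63]),
        "album": _field(tag[63:93]),
        "year": _field(tag[93:97]),
        "genre": tag[127],  # numeric genre code
    }
    comment = tag[97:127]
    if comment[28] == 0 and comment[29] != 0:
        # ID3 v1.1: a zero byte followed by a track number ends the comment field.
        info["comment"] = _field(comment[:28])
        info["track"] = comment[29]
    else:
        info["comment"] = _field(comment)
    return info


def pad(text: str, width: int) -> bytes:
    return text.encode("latin-1").ljust(width, b"\x00")


# Build a fake MP3: some audio bytes followed by a v1.1 tag for track 7.
tag = (b"TAG" + pad("Blue Song", 30) + pad("The Band", 30) + pad("Hits", 30)
       + b"1999" + pad("Live", 28) + b"\x00\x07" + bytes([17]))
fake_mp3 = b"\xff\xfb" + b"\x00" * 100 + tag
info = read_id3v1(fake_mp3)
print(info["title"], info["track"])  # Blue Song 7
```

Since the tag sits at a fixed offset from the end of the file, the fields can be read without decoding any audio.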
- MIDI Files (.midi or .mid or content-type "audio/midi")
PicoSearch will index your MIDI files in two ways. First, the name of the file is indexed as "song name: filename.mid". Second, all of the MIDI standard's text events are indexed: meta-event types 1 through 7, used respectively for general text, copyright notice, track name, instrument name, lyric, marker, and cue point. This feature is on by default, and you can turn it off in the Additional Formats section of your Account Manager's indexing topics.
For Professional and Premium Accounts only
- Adobe PDF (.pdf or content-type "application/pdf")
PicoSearch will index the text of your Adobe Acrobat PDF documents. Title and meta properties will be picked up where possible for display in search results, or can be controlled from the parent link. If the title property is blank, the file name should be used instead. You may see strange default titles in PDFs exported from another application; in that case, set the Adobe title property to something better. The Adobe title and keywords document properties can also become part of the searchable text; see the options in the Index Modes section of your Account Manager. PDF indexing is on by default, and you can turn it off in the Additional Formats section of your Account Manager's indexing topics.
Maximum Pages: PDFs count toward your account's page limit according to their number of pages, so large PDFs can stretch that limit. This policy had to be enforced because too many sites were hosting massive PDFs of hundreds or even thousands of pages each. To help you control your PDFs, a page limiter option is available in the Additional Formats section of the Account Manager. NOTE: If you still have difficulty controlling your PDF sizes, please contact us about an alternate average-page-length counting formula that may be to your benefit.
Copy Protection: By default, PicoSearch will also honor the Acrobat security profile with which your files were saved, and will not index files that you have copy-protected. A separate option lets you include copy-protected PDFs. Thus, two common reasons a PDF yields no content are that it is entirely graphical, or that it is copy-protected and the option to include copy-protected PDFs is off.
- MS Word (.doc or content-type "application/msword")
MS Word Office Open XML (.docx "application/vnd.openxmlformats...")
PicoSearch will index the text of your Microsoft Word documents, including Office 2007 files. This feature is on by default, and you can turn it off in the Additional Formats section of your Account Manager's indexing topics. Title and meta properties will be picked up where possible for display in search results (subject becomes meta description, keywords become meta keywords), or can be controlled from the parent link.
- MS Excel (.xls or content-type "application/msexcel")
MS Excel Office Open XML (.xlsx "application/vnd.openxmlformats...")
PicoSearch will index the text of Microsoft Excel spreadsheets, including Office 2007 files. This feature is on by default, and you can turn it off in the Additional Formats section of your Account Manager's indexing topics. Title and meta properties will be picked up where possible for display in search results (subject becomes meta description, keywords become meta keywords), or can be controlled from the parent link.
- MS PowerPoint (.ppt or content-type "application/mspowerpoint")
MS PowerPoint Office Open XML (.pptx "application/vnd.openxmlformats...")
PicoSearch will index the text of Microsoft PowerPoint presentations, including Office 2007 files. This feature is on by default, and you can turn it off in the Additional Formats section of your Account Manager's indexing topics. Title and meta properties will be picked up where possible for display in search results (subject becomes meta description, keywords become meta keywords), or can be controlled from the parent link.
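For the Office Open XML formats (.docx, .xlsx, .pptx), the title, subject, and keywords usually live in a small XML part inside the ZIP package. This sketch, assuming the conventional part name docProps/core.xml (rather than resolving it through the package relationships), reads those properties with Python's standard `zipfile` and `xml.etree.ElementTree` and maps them the way the paragraphs above describe. It is an illustration only; `ooxml_metadata` and the sample package are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def ooxml_metadata(path_or_file) -> dict:
    """Read title/subject/keywords from docProps/core.xml of a .docx/.xlsx/.pptx."""
    with zipfile.ZipFile(path_or_file) as package:
        try:
            core = ET.fromstring(package.read("docProps/core.xml"))
        except KeyError:  # the core-properties part is optional
            return {}

    def text(tag: str) -> str:
        element = core.find(tag, NS)
        return (element.text or "").strip() if element is not None else ""

    return {
        "title": text("dc:title"),
        "description": text("dc:subject"),  # subject -> meta description
        "keywords": text("cp:keywords"),    # keywords -> meta keywords
    }


# Build a tiny in-memory package that contains only the core-properties part.
core_xml = (f'<cp:coreProperties xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}">'
            '<dc:title>Price List</dc:title>'
            '<dc:subject>Wholesale prices for dealers</dc:subject>'
            '<cp:keywords>prices, wholesale</cp:keywords></cp:coreProperties>')
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("docProps/core.xml", core_xml)
buffer.seek(0)
meta = ooxml_metadata(buffer)
print(meta["title"], "|", meta["description"])  # Price List | Wholesale prices for dealers
```

If these properties are empty in your files, filling them in through the application's document properties dialog is one way to get better titles and descriptions in search results.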
- Rich Text Format (.rtf or content-type "text/rtf" or "application/rtf")
PicoSearch will index Rich Text Format documents, commonly produced by Microsoft applications. This feature is on by default, and you can turn it off in the Additional Formats section of your Account Manager's indexing topics. Title and meta properties will be picked up where possible for display in search results, or can be controlled from the parent link.
- Adobe PostScript (.ps or content-type "application/postscript")
PicoSearch will index the text of your Adobe PostScript documents. This feature is on by default, and you can turn it off in the Additional Formats section of your Account Manager's indexing topics. Title and meta properties will be picked up where possible for display in search results, or can be controlled from the parent link.
Title and Meta Trick: Non-HTML documents may carry application-specific attributes, such as a title or keywords, that are not part of the document body. PicoSearch can usually index PDF titles and metas without problems, but for other formats it may not pick them up. In that case the title defaults to the URL (see the switch "Show Just File Name if Title Defaults to URL" under Configure Results in the Account Manager), the meta description becomes the first few lines of the document, and the keywords are simply those found in the text. Because you can also set titles and metas from the parent link, you can make a reference page of links to your non-HTML documents, with titles and metas exactly as you want them displayed in PicoSearch, and make that page the first Entry Point for full central control.
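The fallback order described above can be sketched as follows. This is only an illustration of the behavior as described, not PicoSearch's code: the parameter `show_file_name_only` stands in for the "Show Just File Name if Title Defaults to URL" switch, and `description_lines` for "the first few lines", whose actual number is not stated; both names, the default of 3, and the functions themselves are hypothetical.

```python
import posixpath
from typing import Optional
from urllib.parse import unquote, urlsplit


def default_title(url: str, title_property: Optional[str],
                  show_file_name_only: bool = False) -> str:
    """Use the document's own title if present, else fall back to the URL."""
    if title_property and title_property.strip():
        return title_property.strip()
    if show_file_name_only:
        # Keep only the last path segment, e.g. "Annual Report.rtf".
        name = posixpath.basename(unquote(urlsplit(url).path))
        return name or url
    return url


def default_description(body_text: str, description_lines: int = 3) -> str:
    """Use the first few non-blank lines of the document body."""
    lines = [line.strip() for line in body_text.splitlines() if line.strip()]
    return " ".join(lines[:description_lines])


url = "http://www.example.com/docs/Annual%20Report.rtf"
print(default_title(url, None))                            # http://www.example.com/docs/Annual%20Report.rtf
print(default_title(url, None, show_file_name_only=True))  # Annual Report.rtf
print(default_description("Line one\n\nLine two\nLine three\nLine four"))  # Line one Line two Line three
```

A title set from the parent link would take the place of `title_property` here, which is why a central reference page of links overrides these defaults.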