Appendices
Appendix A: Estimated vs. Actual Number of Results
You can use the rc search parameter to request an accurate result count for up to 1M documents, but it might introduce high latency.
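As a rough illustration, the sketch below builds a search request URL that asks for an accurate count. It assumes that rc=1 is the value that enables the accurate count (check the rc description in Search Parameters). The host gsa.example.com and the site, client, and output values are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Hypothetical appliance host; replace with your own search appliance.
APPLIANCE = "http://gsa.example.com"


def build_search_url(query: str, accurate_count: bool = False) -> str:
    """Build a search request URL, optionally requesting an accurate result count."""
    params = {
        "q": query,
        "site": "default_collection",   # assumed collection name
        "client": "default_frontend",   # assumed front end name
        "output": "xml_no_dtd",
    }
    if accurate_count:
        # Assumption: rc=1 requests an accurate count (up to 1M documents),
        # which may add noticeable latency to the request.
        params["rc"] = "1"
    return f"{APPLIANCE}/search?{urlencode(params)}"


print(build_search_url("annual report", accurate_count=True))
```

Request the accurate count only when the exact number matters, because it can slow down the search.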
Counting Results in Secure Search
How the Google Search Appliance Determines the Number of Results to Return
Navigation
Automatic Filtering
When the automatic filtering feature is active, the number of results returned can be significantly reduced. Automatic filtering removes undesirable results such as duplicate entries. You can disable this feature using the instructions in Automatic Filtering.
When results have been filtered, the response says so. For example, the <FI/> XML tag is present in XML search results when automatic document filtering occurs.
In the default output format of the Google Search Appliance, the following message appears when results have been filtered:
"In order to show you the most relevant results, we have omitted some entries very similar to the search results already displayed. If you like, you can repeat the search with the omitted results included."
The underlined text in the message should be a hyperlink that submits the same search again with the parameter filter=0. Google has found this an effective way to inform users about automatic document filtering, and the same method is used on the Google Internet search site.
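A custom front end could implement this behavior along the lines of the sketch below. It checks XML results for an <FI/> element and builds the "repeat the search" link by setting filter=0 on the original request URL. The function names and the example host are hypothetical. Existing parameters are kept byte-for-byte so that values that are already URL-encoded (or double-encoded) are not altered.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, urlunsplit


def results_were_filtered(xml_text: str) -> bool:
    """Return True if the XML search results contain an <FI/> element."""
    root = ET.fromstring(xml_text)
    return root.find(".//FI") is not None


def unfiltered_search_url(search_url: str) -> str:
    """Return the same search URL with filter=0, replacing any existing filter value."""
    parts = urlsplit(search_url)
    # Keep every other parameter exactly as sent, so its encoding is preserved.
    kept = [p for p in parts.query.split("&") if p and not p.startswith("filter=")]
    kept.append("filter=0")
    return urlunsplit(parts._replace(query="&".join(kept)))
```

For example, for http://gsa.example.com/search?q=foo&filter=1, the link target becomes http://gsa.example.com/search?q=foo&filter=0.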
Appendix B: URL Encoding
The HTTP URL syntax specifies that only alphanumeric characters, the special characters $-_.+!*'(), and the reserved characters ;/?:@=& can be used as values within an HTTP URL request. Because the search engine uses reserved characters to decode the URL, and some special characters are used to request search features, all non-alphanumeric characters used as a value to an input parameter must be URL-encoded.
To URL-encode a string, replace each non-alphanumeric character with its hexadecimal ASCII value, in the format of a percent sign (%) followed by two hexadecimal digits. Such an ASCII value may be referred to as an escape code. In query parameters, spaces can be replaced by the plus sign (+), except when requesting search results by meta names or values.
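The encoding rule above can be sketched as a small function. This is a hypothetical helper, not part of the search appliance. It keeps ASCII letters, digits, and the underscore (which, as noted below, does not need encoding) and escapes everything else. Encoding non-ASCII characters as their UTF-8 bytes is an assumption that goes beyond the ASCII-only description.

```python
def url_encode(value: str, space_as_plus: bool = True) -> str:
    """Escape every character except ASCII letters, digits, and '_' as %XX.

    space_as_plus=True writes spaces as '+'; pass False for meta name or
    value parameters, where '+' must not stand in for a space.
    Assumption: non-ASCII text is encoded byte by byte as UTF-8.
    """
    out = []
    for byte in value.encode("utf-8"):
        ch = chr(byte)
        if byte < 128 and (ch.isalnum() or ch == "_"):
            out.append(ch)                 # safe as-is
        elif ch == " " and space_as_plus:
            out.append("+")                # space shorthand for query parameters
        else:
            out.append(f"%{byte:02X}")     # percent sign + two hex digits
    return "".join(out)
```

For example, url_encode("a b$") returns a+b%24, and url_encode("a b", space_as_plus=False) returns a%20b.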
If you are using the search box on the search appliance, single-encode the special characters $-.+!*'().
If you are using special characters in a search query, double-encode the special characters $-.+!*'().
Underscores (_) do not need to be URL-encoded in the search box or in a search query.
Some input parameters require that the values passed to Google search are double-URL-encoded. This requirement means that you must apply the URL encoding to the string twice in succession to generate the final value. See the input parameter descriptions (Search Parameters) for more information.
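Double encoding simply applies the same encoding twice, as in the hypothetical sketch below. It uses the same rules as single encoding (ASCII letters, digits, and '_' kept) and writes spaces as %20 rather than +, on the assumption that double-encoded values are often meta names or values. Check each parameter's description in Search Parameters to see whether it needs double encoding.

```python
def url_encode(value: str) -> str:
    """Single URL encoding: keep ASCII letters, digits, and '_'; escape the rest as %XX."""
    return "".join(
        chr(b) if b < 128 and (chr(b).isalnum() or chr(b) == "_") else f"%{b:02X}"
        for b in value.encode("utf-8")
    )


def double_url_encode(value: str) -> str:
    """Apply URL encoding twice in succession."""
    return url_encode(url_encode(value))
```

The second pass only rewrites each % produced by the first pass as %25. For example, * becomes %2A and then %252A, and Human Resources becomes Human%2520Resources.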
Special characters in a query are the ones described as query term separators (see Special Characters: Query Term Separators) and those in meta tag names and values. Special characters within document content are not indexed, so they are not searchable. For example, an indexed document containing a paragraph ending with "the *end" cannot be found with the query "%2Aend" in the GSA search box, because only "end" is indexed.
For more information about URL encoding, see W3C (http://www.w3.org/TR/html401/interact/forms.html#form-content-type) and IETF (http://www.ietf.org/rfc/rfc1738.txt) web sites.
Examples
Appendix C: Date Formatting
Acceptable Date Formats
The following table lists date formats that you can use with the Google Search Appliance.
200903211642 (YYYYMMDDHHmm; see Note 1 below)
Date Formatting Notes
1. The YYYYMMDDHH and YYYYMMDDHHmm patterns for specifying dates are supported; however, the search appliance does not sort search results by differences in the time of day within document dates. For example, if one document has a meta tag with a value of 200910212150 and a second document has a value of 200910210900, the search appliance discards both dates and sets the document dates to their modification times (because the YYYYMMDDHHmm format does not get parsed).
3. To specify rules for dates of documents:
5. To add more rules, click the Add More Lines button.
6. After all the rules are specified, click the Save Changes button.
Examples of Rules
Because the document http://www.foo.com/example/foo.html matches the URL pattern in rule 1, the search appliance first checks for the date in the title of the document. The URL doesn't match rule 2, so if the search appliance cannot find a valid date in the title, it looks for the date in the meta tag named publication_date, according to rule 3. If it cannot find a valid date in the meta tag either, the search appliance defaults to the last-modified date from the HTTP server, according to rule 5.
The search appliance extracts the date from the http://www.foo2.com/archives/20040605/abc.html URL.
Because the document http://www.foo.com/foo.html does not match the URL pattern in rule 1, the search appliance looks for the date in the meta tag, according to rule 3, and falls back to rule 5 if it cannot find a valid date in the meta tag.
For the document http://www.foo2.com/foo.html, the search appliance looks for the date in the body and defaults to the last-modified date.
For the document http://www.foo3.com/foo.html, the search appliance looks for the date only in the last-modified header, because the URL matches only the URL pattern of rule 5.
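The rule evaluation in these examples can be modeled with the sketch below. The rule table itself is not reproduced here, so the five (URL pattern, date source) pairs are hypothetical values chosen to be consistent with the examples. URL pattern matching is simplified to a substring test, and date_from_url is an illustrative extractor for dates embedded as YYYYMMDD in a URL. None of this is the appliance's actual implementation.

```python
import re
from datetime import date, datetime
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical rules consistent with the examples: (URL pattern, date source).
RULES: List[Tuple[str, str]] = [
    ("www.foo.com/example/", "title"),          # rule 1
    ("www.foo2.com/archives/", "url"),          # rule 2
    ("www.foo.com/", "meta:publication_date"),  # rule 3
    ("www.foo2.com/", "body"),                  # rule 4
    ("/", "last-modified"),                     # rule 5
]

Extractor = Callable[[str], Optional[date]]


def sources_for(url: str, rules: List[Tuple[str, str]] = RULES) -> List[str]:
    """Return the date sources of all matching rules, in rule order.

    Simplification: a pattern matches if it occurs anywhere in the URL.
    """
    return [source for pattern, source in rules if pattern in url]


def date_from_url(url: str) -> Optional[date]:
    """Return the first standalone 8-digit run in the URL that is a valid YYYYMMDD date."""
    for match in re.finditer(r"(?<!\d)\d{8}(?!\d)", url):
        try:
            return datetime.strptime(match.group(), "%Y%m%d").date()
        except ValueError:
            continue  # eight digits, but not a real date
    return None


def resolve_date(url: str, extractors: Dict[str, Extractor]) -> Optional[Tuple[str, date]]:
    """Try each matching rule in order and return the first valid date found."""
    for source in sources_for(url):
        extract = extractors.get(source)
        found = extract(url) if extract else None
        if found is not None:
            return source, found
    return None
```

With these rules, sources_for("http://www.foo.com/example/foo.html") yields title, then meta:publication_date, then last-modified, matching the first example. For http://www.foo2.com/archives/20040605/abc.html, date_from_url returns June 5, 2004.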
Appendix D: Compressed Results
The Google Search Appliance supports serving compressed results. To request them, include the following header in the HTTP request:
Accept-Encoding: gzip
The search appliance then serves compressed results, and the browser decompresses them.
This applies to both XML and XSLT-transformed results. If the Accept-Encoding: gzip header is not present, the results are not compressed.
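A client other than a browser usually has to decompress the results itself. The sketch below uses Python's urllib, which does not decompress gzip responses automatically. It sends the header and decompresses the body when the response carries Content-Encoding: gzip. The function names are hypothetical, and decoding the body as UTF-8 is an assumption about the output encoding.

```python
import gzip
import urllib.request
from typing import Optional


def build_compressed_request(url: str) -> urllib.request.Request:
    """Create a request that asks the search appliance for gzip-compressed results."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})


def decode_body(body: bytes, content_encoding: Optional[str]) -> str:
    """Decompress the body if it was sent gzip-encoded, then decode it (assumed UTF-8)."""
    if content_encoding and content_encoding.lower() == "gzip":
        body = gzip.decompress(body)
    return body.decode("utf-8")


def fetch_results(url: str) -> str:
    """Fetch search results, requesting compression and handling either response form."""
    with urllib.request.urlopen(build_compressed_request(url)) as resp:
        return decode_body(resp.read(), resp.headers.get("Content-Encoding"))
```

Checking Content-Encoding rather than assuming compression keeps the client working if the response comes back uncompressed.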