Results Format
This section covers the following topics:
Custom HTML
This section describes the custom HTML results.
Custom HTML Output Overview
Google Search Appliance has a built-in XSLT (eXtensible Stylesheet Language Transformation) server, and can generate custom HTML using your XSL stylesheet. Search requests that include the output parameter set to xml_no_dtd and a valid proxystylesheet parameter value are automatically processed by the XSLT server as requests for custom HTML output.
Using the XSL stylesheet specified by the proxystylesheet parameter, the XSLT server applies the transformation rules found in the XSL stylesheet to the standard Google XML results. Although this document assumes that the output generated by applying the XSL stylesheet is HTML, almost any output format can be generated by using appropriate XSL stylesheet rules. For any front end, the default XSL stylesheet can be customized or replaced by the search administrator.
To customize the XSL stylesheet used to generate custom HTML output, see XML Output for the XML tags that a stylesheet can transform.
Additionally, you can use the proxycustom parameter to pass custom XML tags to the XSLT server. Because including custom XML does not generate search results, this feature is useful for implementing additional static search pages, such as an advanced search page.
• XSL stylesheets used by the XSLT server are cached for 15 minutes. To force the XSLT server to use the latest version of an XSL stylesheet, set the proxyreload input parameter to a value of 1 in your search request.
• When you request cached results in custom HTML output, the BLOB XML tag and associated value are automatically converted to the original text before the XSL stylesheet rules are applied. When using an XSL stylesheet that customizes cache results, use the values of the CACHE_LEGEND_TEXT, CACHE_LEGEND_NOTFOUND and CACHE_HTML XML tags directly instead of applying a rule on the BLOB subtag.
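As a rough illustration of the request parameters described above, the following Python sketch assembles a custom HTML search URL. The host name gsa.example.com, the /search path, the q parameter, and the helper name build_custom_html_url are assumptions made for this example; only output, proxystylesheet, and proxyreload come from this section.

```python
from urllib.parse import urlencode


def build_custom_html_url(host, query, stylesheet, reload_stylesheet=False):
    """Build a search URL that asks the XSLT server for custom HTML output."""
    params = [
        ("q", query),                     # assumed name of the query parameter
        ("output", "xml_no_dtd"),         # required for custom HTML output
        ("proxystylesheet", stylesheet),  # selects the XSL stylesheet to apply
    ]
    if reload_stylesheet:
        # Ask the XSLT server to skip its 15-minute stylesheet cache.
        params.append(("proxyreload", "1"))
    # The host and the /search path are placeholders for this sketch.
    return f"http://{host}/search?{urlencode(params)}"


url = build_custom_html_url("gsa.example.com", "token ring", "default_frontend", True)
print(url)
```

Leaving reload_stylesheet at its default omits proxyreload, which is probably what you want outside of stylesheet development.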
Internationalization
To support all the encoding schemes supported by Google, the XSLT server follows a process to ensure that the results are returned in the correct encoding scheme. When you request search results through the XSLT server, the server translates the results to the UTF8 encoding scheme before applying the selected XSL stylesheet. After the XSL stylesheet rules are applied, the results are converted to the encoding scheme that is specified by the output encoding parameter, oe. The one exception to this rule is cached result pages, which are converted to the encoding scheme of the cached document after XSLT processing.
Each front end for your search appliance is associated with an underlying stylesheet. All XSL stylesheets must be in latin1 or UTF8 formats.
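The encoding steps described above can be modeled in a few lines of Python. This is only a conceptual sketch, not the appliance's implementation: the function name render_with_encoding and its arguments are hypothetical, and the transform callable stands in for the XSL stylesheet rules.

```python
def render_with_encoding(raw_results, source_encoding, transform, oe="utf-8"):
    """Model the three steps: normalize, transform, then encode as requested."""
    # Step 1: translate the results to a common representation
    # (the appliance uses UTF8; Python str plays that role here).
    text = raw_results.decode(source_encoding)
    # Step 2: apply the stylesheet rules (any str -> str callable in this sketch).
    transformed = transform(text)
    # Step 3: convert to the encoding named by the oe parameter.
    return transformed.encode(oe)


latin1_bytes = "café".encode("latin1")
output = render_with_encoding(latin1_bytes, "latin1", str.upper, oe="utf-8")
print(output.decode("utf-8"))
```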
XML Output
The description of the XML results format contains the following sections:
XML Output Overview
For maximum flexibility, Google provides search results in XML format. Using the Google XML results, you can use your own XML parser to customize the display for your search users. If you are using an XSL stylesheet to transform the XML results instead of developing your own XML parser, proceed to Custom HTML.
• For custom parameters that contain spaces, each space is replaced with “_”. You can still retrieve the unmodified value from the original_value attribute. For example:
<param name="temp" value="token_ring" original_value="token+ring" />
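If you parse the XML results yourself, the two attributes can be read as in the following Python sketch. It uses the example element shown above; treating original_value as URL-encoded form data (so that “+” decodes to a space) is an assumption based on that example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote_plus

# The example element from this section.
snippet = '<param name="temp" value="token_ring" original_value="token+ring" />'

elem = ET.fromstring(snippet)
modified = elem.get("value")                         # spaces replaced with "_"
original = unquote_plus(elem.get("original_value"))  # decode "+" back to a space

print(modified)
print(original)
```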
Character Encoding Conventions
The first line of the XML results indicates which character encoding is used. See XML Standard for information about character encoding (http://www.w3.org/TR/1998/REC-xml-19980210#charencoding).
Certain characters must be escaped when they are included as values in XML tags. These characters are documented in XML Standard (http://www.w3.org/TR/1998/REC-xml-19980210#dt-escape), and are shown in the table that follows. All other characters in the XML results are presented without modification.
Character | Escaped form
< | either &lt; or &#60;
& | either &amp; or &#38;
> | either &gt; or &#62;
' | either &apos; or &#39;
" | either &quot; or &#34;
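When you generate or compare XML values in your own code, the same five characters can be escaped with the Python standard library. This is a minimal sketch; the helper names escape_xml_value and unescape_xml_value are made up for the example.

```python
from xml.sax.saxutils import escape, unescape

# escape() handles &, < and > by default; quotes need explicit entries.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}
QUOTE_CHARS = {"&apos;": "'", "&quot;": '"'}


def escape_xml_value(text):
    """Escape the five characters that must not appear raw in XML values."""
    return escape(text, QUOTE_ENTITIES)


def unescape_xml_value(text):
    """Reverse escape_xml_value for the named entity forms."""
    return unescape(text, QUOTE_CHARS)


escaped = escape_xml_value('Tom & "Jerry" <1>')
print(escaped)
print(unescape_xml_value(escaped))
```

An XML parser performs the reverse conversion automatically, so manual unescaping is normally needed only when handling the results as raw text.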
Google XML Results DTD
To get results in XML output format, use one of the following parameters in the search request:
• output=xml_no_dtd (recommended), or
• output=xml
When you use the xml output format, the XML results include the line:
<!DOCTYPE GSP SYSTEM "google.dtd">
The DTD is available on the Google Search Appliance at http://<appliance_hostname>/google.dtd.
Google XML Tag Definitions
This section contains an index of Google’s XML tags.
BLOB
Format/Parent
CACHE_HTML, CACHE_LEGEND_NOTFOUND, CACHE_LEGEND_TEXT
Subtags
Definition
Attributes
The encoding scheme of the HTML data (see Internationalization for a list of common encoding values).
C
Format/Parent
Subtags
Definition
Indicates that the “cache:” special query term is supported for this search result URL.
Cached results are suppressed and this element is not returned if the <head> tag of the document contains the following <meta> tag: <meta name="ROBOTS" content="noarchive">.
Attributes
"cache:" + CID text + ":" + encoded URL. The encoded URL is available in the UE tag. Send this search term normally, as you would type it into the search form.
The encoding of the document in the cache. See Internationalization for a list of common values.
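The cache search term described above is a plain string concatenation, sketched here in Python. The function name build_cache_query and the sample CID and URL values are hypothetical; in practice both values come from the search result.

```python
def build_cache_query(cid, encoded_url):
    """Join the pieces as: "cache:" + CID text + ":" + encoded URL."""
    return f"cache:{cid}:{encoded_url}"


# Placeholder values: the CID text comes from the C tag and the
# encoded URL from the UE tag of the same search result.
term = build_cache_query("Ab12Cd34EfGh", "www.example.com%2Findex.html")
print(term)
```

The resulting term is then submitted as an ordinary query, so the usual URL encoding of the request still applies on top of it.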
CACHE
Format/Parent
Subtags
CACHE_URL, CACHE_REDIR_URL, CACHE_LAST_MODIFIED, CACHE_LEGEND_FOUND?, CACHE_LEGEND_NOTFOUND?, CACHE_CONTENT_TYPE, CACHE_LANGUAGE, CACHE_ENCODING, CACHE_HTML
Definition
Encapsulates the cached version of a search result.
Attributes
CACHE_CONTENT_TYPE
Format/Parent
Subtags
Definition
Attributes
CACHE_HTML
Format/Parent
Text (HTML) (Custom HTML output only)
Subtags
BLOB? (XML output only)
Definition
The cached version of the search result. All search results are stored in HTML format.
Attributes
CACHE_ENCODING
Format/Parent
Subtags
Definition
The encoding scheme of the cached result, as specified in the HTTP header that is returned when the document is crawled. (See Internationalization for a list of common values.)
Attributes
CACHE_LANGUAGE
Format/Parent
Subtags
Definition
The language of the cached result as determined by Google’s automatic language classification algorithm. The value of this tag is the same as the values used for the automatic language collections without the “lang_” prefix (see Automatic Language Filters).
Attributes
CACHE_LAST_MODIFIED
Format/Parent
Subtags
Definition
Attributes
CACHE_LEGEND_FOUND
Format/Parent
Subtags
Definition
Encapsulates query terms that are found in the visible text of the cached result returned.
Attributes
CACHE_LEGEND_NOTFOUND
Format/Parent
Text (Custom HTML output only)
Subtags
BLOB? (XML output only)
Definition
Details of any query terms that are not visible in the cached result returned.
Attributes
CACHE_LEGEND_TEXT
Format/Parent
Text (Custom HTML output only)
Subtags
BLOB (XML output only)
Definition
Attributes
CACHE_REDIR_URL
Format/Parent
Subtags
Definition
Final URL of cached result after all redirects are resolved.
Attributes
CACHE_URL
Format/Parent
Subtags
Definition
Attributes
CRAWLDATE
Format/Parent
Subtags
Definition
Attributes
CT
Format/Parent
Subtags
Definition
Example comment: Sorry, no content found for this URL
Attributes
CUSTOM
Format/Parent
Subtags
(Custom XML specified in the search request)
Definition
Encapsulates custom XML tags that are specified in the proxycustom input parameter.
Attributes
ENT_SOURCE
Format/Parent
Subtags
Definition
Identifies the application ID (serial number) of the search appliance that contributes to a result.
<ENT_SOURCE>S5-KUB000F0ADETLA</ENT_SOURCE>
Attributes
ENTOBRESULTS
Format/Parent
Subtags
Definition
Encapsulates the results returned by OneBox modules.
Attributes
FI
Format/Parent
Subtags
Definition
Indicates that document filtering was performed during this search.
See Automatic Filtering for more details.
Attributes
FS
Format/Parent
Subtags
Definition
Additional details about the search result.
Attributes
GD
Format/Parent
Subtags
Definition
Contains the description of a KeyMatch result.
Attributes
GL
Format/Parent
Subtags
Definition
Contains the URL of a KeyMatch result.
Attributes
GM
Format/Parent
Subtags
Definition
Encapsulates a single KeyMatch result.
Attributes
GSP
Format/Parent
Subtags
(CT?, CUSTOM?, ENTOBRESULTS, GM*, PARAM+, Q, RES?, Spelling?, Synonyms?, TM) | CACHE
Definition
GSP = “Google Search Protocol”
Encapsulates all data that is returned in the Google XML search results.
Attributes
Indicates the version of the search results output. The current output version is “3.2”.
HAS
Format/Parent
Subtags
Definition
Encapsulates special features that are included for this search result.
Attributes
HN
Format/Parent
Text (URL-encoded web directory, see Appendix B: URL Encoding)
Subtags
Definition
Indicates that filtering has occurred and that additional results are available from the directory where this search result was found. The value of this tag is ready to be used with the site: query term (see Directory Restricted Search).
Attributes
L
Format/Parent
Subtags
Definition
Indicates that the “link:” special query term is supported for this search result URL.
Attributes
LANG
Format/Parent
Subtags
Definition
Indicates the language of the search result. The LANG element contains a two-letter language code. See Automatic Language Filters for language codes.
Attributes
M
Format/Parent
Subtags
Definition
The estimated total number of results for the search.
The estimate of the total number of results for a search can be too high or too low. See Appendix A: Estimated vs. Actual Number of Results.
Attributes
MT
Format/Parent
Subtags
Definition
Meta tag name and value pairs obtained from the search result.
Only meta tags (see Meta Tags) that are requested in the search request are returned.
Attributes
NB
Format/Parent
Subtags
Definition
Encapsulates the navigation information for the result set.
The NB tag is present only if previous or additional results are available.
Attributes
NU
Format/Parent
Subtags
Definition
Contains a relative URL pointing to the next results page.
The NU tag is present only when more results are available.
Attributes
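Because NU holds a relative URL, a client has to resolve it against the address of the appliance before requesting the next page. The Python sketch below shows one way to do that; the sample XML, the host name, the start parameter, and the function name next_page_url are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical, heavily trimmed results document.
SAMPLE = """<GSP><RES><NB>
  <NU>/search?q=test&amp;start=10</NU>
</NB></RES></GSP>"""


def next_page_url(base_url, results_xml):
    """Return the absolute URL of the next results page, or None if absent."""
    root = ET.fromstring(results_xml)
    nu = root.find(".//NU")          # NU is missing when no more results exist
    if nu is None or not nu.text:
        return None
    return urljoin(base_url, nu.text.strip())


next_url = next_page_url("http://gsa.example.com/search?q=test", SAMPLE)
print(next_url)
```

The same approach applies to the PU tag for the previous results page.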
OBRES
Format/Parent
Subtags
The contents of the OBRES element are provided by the OneBox module, and must conform to the OneBox Results Schema. See the specific OneBox module’s documentation for details. See also the Google OneBox for Enterprise Developer’s Guide.
Definition
Encapsulates a result returned by a OneBox module.
Attributes
OneSynonym
Format/Parent
Subtags
Definition
A related query for the submitted query, in HTML format.
Attributes
The URL-encoded version of the related query (see Appendix B: URL Encoding).
PARAM
Format/Parent
Subtags
Definition
Attributes
Original URL-encoded version of the input parameter value (see Appendix B: URL Encoding).
PARM
Format/Parent
Subtags
Definition
Encapsulates all dynamic navigation results.
Attributes
PC
Format/Parent
Subtags
Definition
Indicates whether the counts are exact or partial. 0-exact, 1-partial.
PMT
Format/Parent
Subtags
PV+
Definition
Attributes
Attribute type: 0-String, 1-Integer, 2-Float, 3-Currency, 4-Date
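Client code that reads dynamic navigation results may find it convenient to translate these numeric codes into names. The following Python sketch does so for the attribute type codes and for the exact/partial codes listed under PC; the dictionary and function names are made up for this example.

```python
# Codes as listed for the PMT attribute type.
ATTRIBUTE_TYPES = {0: "String", 1: "Integer", 2: "Float", 3: "Currency", 4: "Date"}

# Codes as listed for the PC tag.
COUNT_KINDS = {0: "exact", 1: "partial"}


def describe_attribute_type(code):
    """Map a type code (int or numeric string) to its name."""
    return ATTRIBUTE_TYPES.get(int(code), "Unknown")


def describe_count_kind(code):
    """Map a PC value (int or numeric string) to 'exact' or 'partial'."""
    return COUNT_KINDS.get(int(code), "Unknown")


print(describe_attribute_type("3"))
print(describe_count_kind(0))
```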
PU
Format/Parent
Subtags
Definition
Contains a relative URL pointing to the previous results page.
The PU tag is present only if previous results are available.
Attributes
PV
Format/Parent
Subtags
Definition
Encapsulates the count information for a single value.
Attributes
Q
Format/Parent
Subtags
Definition
The search query terms submitted to the Google Search Appliance to generate these results.
Attributes
R
Format/Parent
Subtags
CRAWLDATE, FS?, HAS, HN?, LANG, MT*, RK, S?, T?, U, UD, UE
Definition
Encapsulates the details of an individual search result.
Attributes
The recommended indentation level of the results. This value is 1 unless Duplicate Directory Filtering occurs (see Automatic Filtering). In this case, the second directory result has a value of 2.
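To show how the R subtags might be consumed by your own XML parser, here is a minimal Python sketch. The sample document is hypothetical and omits most tags and all attributes; nesting R inside RES inside GSP is an assumption based on the tag definitions in this section, and parse_results is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed results document for illustration only.
SAMPLE = """<GSP>
  <Q>token ring</Q>
  <RES>
    <R><U>http://www.example.com/a.html</U><T>Token ring basics</T><RK>10</RK></R>
    <R><U>http://www.example.com/b.html</U><RK>7</RK></R>
  </RES>
</GSP>"""


def parse_results(xml_text):
    """Collect the URL, title, and ranking of each R element."""
    root = ET.fromstring(xml_text)
    results = []
    for r in root.iter("R"):
        results.append({
            "url": r.findtext("U"),
            "title": r.findtext("T", default=""),  # T is optional
            "rank": int(r.findtext("RK", default="0")),
        })
    return results


results = parse_results(SAMPLE)
for item in results:
    print(item["rank"], item["url"], item["title"])
```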
RES
Format/Parent
Subtags
Definition
Encapsulates the set of all search results.
Attributes
The index (1-based) of the first search result returned in this result set.
The index (1-based) of the last search result returned in this result set.
RK
Format/Parent
Text (Integer in the range 0-10)
Subtags
Definition
The RK parameter assigns a ranking score to each page on a scale from 0 (least important) to 10 (most important) based on how well the result matches the query. When search results are sorted by relevancy, the RK value is in decreasing order (highest to lowest).
To see the RK values, you must view search results in raw XML, as described in the following steps:
2. If not already selected, click Sort by relevance.
3. On the Advanced Search page, edit the query parameters:
a.
b. Remove &proxystylesheet=default_frontend
c. Add &getfield=*
The XML results show the RK parameter for each result, for example: <RK>10</RK>.
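The two query parameter edits named in the steps above (removing proxystylesheet and adding getfield=*) can also be scripted. The Python sketch below covers only those two edits; the sample URL, host name, and the function name to_raw_xml_url are assumptions for this example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_raw_xml_url(url):
    """Drop proxystylesheet and add getfield=* so the raw XML is returned."""
    parts = urlsplit(url)
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "proxystylesheet"      # remove the stylesheet reference
    ]
    params.append(("getfield", "*"))     # request all meta tag fields
    # safe="*" keeps the asterisk literal instead of encoding it as %2A.
    return urlunsplit(parts._replace(query=urlencode(params, safe="*")))


raw_url = to_raw_xml_url(
    "http://gsa.example.com/search?q=test&output=xml_no_dtd"
    "&proxystylesheet=default_frontend"
)
print(raw_url)
```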
Attributes
S
Format/Parent
Subtags
Definition
The snippet for the search result.
Query terms appear in bold in the results. Line breaks are included for proper text wrapping.
Attributes
SCOREBIAS
Format/Parent
Subtags
Definition
The SCOREBIAS tag can appear zero or more times as a child of the R tag (see R) for each result. The SCOREBIAS tag appears for each result biaser that is applied.
The NAME attribute is the name of the result biaser.
The VALUE attribute indicates the effect of the biaser. For biasers where the strength is expressed symbolically, such as source or collection biasing and metadata biasing.
The following example indicates a medium increase in the PatternScorer result biaser:
<SCOREBIAS NAME="PatternScorer" VALUE="2">
Attributes
For biasers that do not use a symbolic change, such as date biasing, VALUE has these numerical values:
Spelling
Format/Parent
Subtags
Definition
Attributes
Suggestion
Format/Parent
Subtags
Definition
An alternate spelling suggestion for the submitted query, in HTML format.
Attributes
Synonyms
Format/Parent
Subtags
Definition
Attributes
T
Format/Parent
Subtags
Definition
The title of the search result.
Attributes
TM
Format/Parent
Subtags
Definition
Total server time to return search results, measured in seconds.
Attributes
U
Format/Parent
Subtags
Definition
Attributes
UD
Format/Parent
Text (URL to display for non-ASCII URLs)
Subtags
Definition
The URL string to display when the URL that is in the U parameter is non-ASCII. Displays UTF-8 characters and IDNA domain names properly.
Attributes
UE
Format/Parent
Text (URL-encoded version of the URL)
Subtags
Definition
The URL-encoded version of the URL that is in the U parameter.
Attributes
XT
Format/Parent
Subtags
Definition
Indicates that the estimated total number of results specified in this search result is exact.
See Automatic Filtering for more details.