When you search for a word or phrase, AV Search examines the selected index to determine the location of the documents that contain the word or phrase you specified. AV Search lists the results as hyperlinks on the search page, with the most relevant content at the top of the list.
AV Search provides two methods for specifying queries:
Simple search
Advanced search
When your query results in a large number of matching documents, AV Search divides the results into separate pages for display. You can click on a specific page number to see the links on that page, or use the Next or Previous links to navigate through the pages in sequence.
For a simple search, or an advanced search with ranking, AV Search displays a maximum of 200 search results regardless of how many documents matched the query.
The simple search interface allows you to search for documents containing a word or phrase.
Simple search supports the use of the + and - symbols to specify whether query terms must be included in, or excluded from, resulting documents. See Tips on Constructing Queries for Simple Searches for more information.
The results of the search are automatically ranked; see Ranking the Results of a Simple Search for more information.
To make your search more effective, consider the following tips:
Example: Querying for leather shoes footwear increases the chance of finding documents about leather shoes.
Example: bicycle "for sale" finds documents that contain either the phrase for sale or the word bicycle.
Example: quilt* finds the words quilts, quilting, and quilted.
Example: +noir +film -"pinot noir" finds documents containing both noir and film but not the phrase pinot noir.
Note: The + and - operators are unary (not binary) operators; that is, if you want to ensure that two terms are included in returned documents, you must use the + operator on both terms. For example, noir +film is not valid syntax.
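The unary behavior of + and - can be illustrated with a small parser. This is only a sketch of the syntax described above, not AV Search's own code; the function name `parse_simple_query` and the regular expression are assumptions made for illustration.

```python
import re

# An optional + or - sign followed by a double-quoted phrase or a bare word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')

def parse_simple_query(query):
    """Split a simple query into (required, excluded, optional) term lists."""
    required, excluded, optional = [], [], []
    for sign, phrase, word in TOKEN.findall(query):
        term = phrase or word
        if not term:
            continue  # skip an empty pair of quotation marks
        if sign == "+":
            required.append(term)
        elif sign == "-":
            excluded.append(term)
        else:
            optional.append(term)
    return required, excluded, optional

required, excluded, optional = parse_simple_query('+noir +film -"pinot noir"')
print(required, excluded, optional)
```

Because each operator applies only to the term it is attached to, a term without a sign lands in the optional list in this sketch.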
AV Search ranks the results of a simple search based on a computed score that includes the following criteria:
Whether the query terms are found in the first few lines of the document (for example, in the title of a Web page).
Whether all the query words or phrases appear in a document.
A document containing all three words specified in a three-word query would rank higher than a document containing only two or one of the words.
The frequency of occurrence of a query word or phrase relative to other words in the index.
Query terms that occur infrequently in the index are weighted more heavily than those that occur often.
Each document receives a weight which is the sum of the weights of each query word that appears in the document. Documents with the highest total weight are ranked highest in the results list.
The number of times a query term appears in the document does not affect its ranking; a document with a single occurrence of a query term would receive the same weight as a document with 50 occurrences of the same word.
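The weighting rules above can be sketched in a few lines. The actual formula AV Search uses is not given here, so the logarithmic weight below is an assumption chosen only because it makes rare terms weigh more; the names `term_weight` and `score` are hypothetical, and the "first few lines" criterion is left out.

```python
import math

def term_weight(term, index_counts, total_words):
    """Assumed weight: the rarer a term is in the index, the larger the weight."""
    count = index_counts.get(term, 0)
    if count == 0:
        return 0.0
    return math.log(total_words / count)

def score(document_words, query_terms, index_counts, total_words):
    """Sum the weights of the query terms that appear in the document.

    Presence is tested against a set, so repeated occurrences of a term
    in the same document add nothing to the score.
    """
    present = set(document_words)
    return sum(term_weight(term, index_counts, total_words)
               for term in query_terms if term in present)

index_counts = {"leather": 10, "shoes": 100}  # made-up index statistics
query = ["leather", "shoes"]
print(score(["leather", "shoes"], query, index_counts, 1000))
print(score(["shoes"] * 50, query, index_counts, 1000))
```

A document containing both query terms outranks one that repeats a single term many times, which matches the behavior described above.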
If the documents returned by AV Search are not as relevant as you expect, try refining your search by using a more specific query, or using the include/exclude operators (+/-). For example, to refine the leather shoes footwear query to exclude references to men's shoes, add the term -men (shoes leather footwear -men).
Advanced searches allow you to query with greater precision than is possible with simple searches. The following features are supported in the advanced search interface:
The advanced search interface allows you to use Boolean operators (AND, OR, NOT, NEAR) in the query string to obtain more precise results. For more information on Boolean operators, see Advanced Search Syntax.
For examples of advanced queries, see Tips on Constructing Queries for Advanced Searches.
You can optionally specify ranking rules to filter search results before they are displayed; see Ranking the Results of an Advanced Search for more information.
The advanced search interface allows you to restrict results to those documents that were created within a specified range of dates. See Querying Within a Range of Dates for details.
Use the advanced search interface to obtain a count of the documents that meet your search criteria. Note: Do not specify ranking rules when attempting to obtain a count of results.
The Boolean operators recognized in an advanced query string are as follows:
| Operator | Symbol Equivalent | Action |
|---|---|---|
| AND | & | Finds only documents containing all the specified words or phrases. |
| OR | \| | Finds documents containing at least one of the specified words or phrases. |
| NOT | ! | Excludes documents containing the specified word or phrase. |
| NEAR | ~ | Finds documents containing both specified words or phrases that are within 10 words of each other. |
You can enter an operator in all uppercase or all lowercase letters. Using uppercase letters allows you to more easily distinguish the operators from the query terms. You can use a symbol in place of an operator, as indicated in the table.
The following examples show how to use operators and parentheses to construct a query for an advanced search.
| Query | Description |
|---|---|
| (apple OR pear) AND (tart OR pie) | Either of the words apple or pear appears in the same document with either of the words tart or pie. |
| Krafft NEAR Ehricke | The operator NEAR ensures that both Krafft and Ehricke are within ten words of each other in any document resulting from the search. The NEAR operator is often useful in searching for names because of the possible different forms that a name can take, such as Krafft Ehricke; Ehricke, Krafft; and Krafft A. Ehricke. |
| vegetable AND (NOT broccoli) | When NOT appears in a position other than the beginning of a query, use AND to connect the NOT operator with the rest of the query. The syntax vegetable NOT broccoli (without the AND) returns a syntax error. vegetable OR NOT broccoli is valid syntax, but would probably return irrelevant results. |
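The following sketch shows how the first two example queries could be evaluated against a document that has already been split into words. It is an illustration only: the function names are hypothetical, and treating "within 10 words" as a position difference of at most 10 is an assumption.

```python
def near(words, first, second, max_distance=10):
    """Return True if first and second occur within max_distance positions."""
    first_positions = [i for i, w in enumerate(words) if w == first]
    second_positions = [i for i, w in enumerate(words) if w == second]
    return any(abs(i - j) <= max_distance
               for i in first_positions for j in second_positions)

def matches_fruit_dessert(words):
    """Evaluate (apple OR pear) AND (tart OR pie) for one document."""
    present = set(words)
    return bool(present & {"apple", "pear"}) and bool(present & {"tart", "pie"})

# Word order does not matter for NEAR, which is why it suits names.
print(near(["Ehricke", "Krafft", "A"], "Krafft", "Ehricke"))
print(matches_fruit_dessert(["warm", "apple", "pie"]))
```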
Unlike simple searches, AV Search returns the results of an advanced search in no particular order unless you specify ranking rules. In most cases, it is helpful to filter the results of your search so that the most useful documents appear at the top of the list.
To rank results, enter words or phrases in the Ranking field, separated by spaces. The words or phrases you use to rank the results do not have to match the words or phrases you used in the query. For example, you can refine a search for COBOL AND programming by using advanced and experienced to rank the results.
Note: Ranking limits the search results to the top 200 documents. When all you are interested in is a count of results (for example, when you want a count of all Web pages that contain a link to your home page), do not specify any ranking criteria. For details about the factors that influence ranking, see Ranking the Results of a Simple Search.
You can restrict your search to a particular range of dates by entering dates in the Start Date and End Date fields in the advanced search interface. AV Search finds matches for the specified range based on the time that the document was last modified. The indexing software gets this information from the Web server on which the document is stored; the date might not be accurate.
Enter the date in the format dd/mmm/yy, where dd is the day of the month, mmm is an abbreviation for the name of the month, and yy is the last two digits of the year. Be sure to use the name of the month instead of a number, for example 09/jan/96.
Enter dates with four-digit years in the format dd/mmm/yyyy, where yyyy is the year, for example, 28/aug/2001.
If you omit the year, AV Search assumes the date is in the current year. If you omit both the year and the month and specify only numbers for days, AV Search interprets the query as belonging to the current month and year. For example, entering a Start Date of 09/jan indicates that you want documents dated no earlier than 09 January of the current year. Entering a Start Date of 09 indicates that you want documents dated no earlier than the ninth day of the current month in the current year.
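The defaulting rules for omitted months and years can be sketched as follows. This is an illustration, not AV Search's parser: the function name `parse_search_date` is hypothetical, and two-digit years are rejected rather than guessed at.

```python
from datetime import date

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]

def parse_search_date(text, today):
    """Parse dd, dd/mmm, or dd/mmm/yyyy, filling gaps from today's date."""
    parts = text.strip().lower().split("/")
    if not 1 <= len(parts) <= 3:
        raise ValueError("expected dd, dd/mmm, or dd/mmm/yyyy")
    day = int(parts[0])
    month, year = today.month, today.year  # defaults when omitted
    if len(parts) >= 2:
        month = MONTHS.index(parts[1]) + 1  # ValueError for an unknown month
    if len(parts) == 3:
        if len(parts[2]) != 4:
            raise ValueError("two-digit years are not handled in this sketch")
        year = int(parts[2])
    return date(year, month, day)

today = date(2001, 8, 28)  # example value for "the current date"
print(parse_search_date("09/jan", today))
print(parse_search_date("09", today))
```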
The years between 1970 and 2037 are internally encoded in the same format used in UNIX operating systems. The external representations of dates in the search interface can take either the four-digit or two-digit form.
When you use the four-digit form in the search interface, the value is interpreted as specified.
When you specify a two-digit year in the search interface, the value is evaluated as follows:
The rules regarding phrasing, case sensitivity of query terms, and finding related words apply to both simple and advanced searches.
AV Search defines a word as any string of letters and digits that is separated by either white space (such as spaces, tabs, end-of-line characters, or document boundaries), or special characters and punctuation (such as %, $, /, #, and _).
For example, AV Search indexes HAL5000, 60258, www, http, and EasierSaidThanDone as single words. The software indexes all words that it finds in a Web document, regardless of whether the word exists in a dictionary or is spelled correctly.
AV Search ignores punctuation except to interpret it as a separator for words. Placing punctuation or special characters between each word, with no spaces, is another way to indicate a phrase. For example, consider searching for a telephone number. You can query for either 1-800-555-1212 or "1 800 555 1212". Hyphenated words, such as CD-ROM, are also treated as phrases.
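The word definition above can be approximated with a regular expression. This is a sketch under the assumption that every character other than a letter or digit acts as a separator; the name `split_words` is hypothetical.

```python
import re

# Runs of letters and digits; whitespace, punctuation, and "_" all separate words.
WORD_PATTERN = re.compile(r"[^\W_]+")

def split_words(text):
    """Return the words in text, in order of appearance."""
    return WORD_PATTERN.findall(text)

print(split_words("1-800-555-1212"))
print(split_words("HAL5000 on CD-ROM"))
```

The telephone number splits into four adjacent words, which is why the hyphenated form behaves like the quoted phrase.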
The use of double quotation marks, rather than special characters between words, is recommended to indicate a phrase because some special characters have additional meaning:
A query string that is specified in all lowercase letters results in a case-insensitive search. For example, when you search for turkey, AV Search finds all occurrences of the word turkey, including those spelled TUrkey, TURKEY, turkey, and so on.
When the query string contains any uppercase letters, the search is case-sensitive. For example, when you search for Turkey, AV Search finds only occurrences of Turkey with initial capitalization. It does not return documents containing TUrkey, TURKEY, turkey, and so on.
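The case rule can be expressed as a short comparison function. This is a minimal sketch of the rule as described, with a hypothetical function name, and it compares one query term against one indexed word.

```python
def term_matches(query_term, indexed_word):
    """Apply the case rule: lowercase queries ignore case, others are exact."""
    if query_term == query_term.lower():
        # No uppercase letters in the query: case-insensitive comparison.
        return query_term == indexed_word.lower()
    # Any uppercase letter in the query: the match must be exact.
    return query_term == indexed_word

print(term_matches("turkey", "TUrkey"))
print(term_matches("Turkey", "TURKEY"))
```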
AV Search supports exact-match searches for many characters in the languages supported by the search interface. You can enter a query string containing, for example, an accent or other diacritical mark, and AV Search finds only documents that contain the accented spelling of the word.
For example, if you search for the French word eléphant, AV Search finds only documents containing an exact match for the French spelling of the word. When you search for a word using mixed case and an accent (for example, Eléphant), AV Search produces results that match in terms of both case and accent.
AV Search maps special characters to the closest possible plain character or combination of characters. The software then indexes words in both forms: with special characters as they appear, and also with special characters replaced by the mappings. If your keyboard does not support the use of international characters, you can enter the query string without the diacritical marks. The following table lists the special characters and their mappings in AV Search:
| Character(s) | Mapping | Character(s) | Mapping |
|---|---|---|---|
| Æ | AE | æ | ae |
| Á Â À Å Ã Ä | A | á â à å ã ä | a |
| Ç | C | ç | c |
| Ð | D | ð | d |
| É Ê È Ë | E | é ê è ë | e |
| Í Î Ì Ï | I | í î ì ï | i |
| Ñ | N | ñ | n |
| Ó Ô Ò Ø Õ Ö | O | ó ô ò ø õ ö | o |
| Þ | TH | þ | th |
| Ú Û Ù Ü | U | ú û ù ü | u |
| Ý | Y | ý ÿ | y |
| ß | ss | | |
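The mapping table can be applied with Python's `str.translate`. The sketch below copies the mappings from the table; the function names `plain_form` and `indexed_forms` are hypothetical, and normalizing input to precomposed (NFC) characters first is an assumption about how the text arrives.

```python
import unicodedata

# Multi-character and one-off mappings from the table.
SPECIAL_CHARACTER_MAP = {
    "Æ": "AE", "æ": "ae", "Ç": "C", "ç": "c", "Ð": "D", "ð": "d",
    "Ñ": "N", "ñ": "n", "Þ": "TH", "þ": "th", "Ý": "Y", "ß": "ss",
}
# Groups of characters that share one plain replacement.
for chars, plain in [("ÁÂÀÅÃÄ", "A"), ("áâàåãä", "a"), ("ÉÊÈË", "E"),
                     ("éêèë", "e"), ("ÍÎÌÏ", "I"), ("íîìï", "i"),
                     ("ÓÔÒØÕÖ", "O"), ("óôòøõö", "o"), ("ÚÛÙÜ", "U"),
                     ("úûùü", "u"), ("ýÿ", "y")]:
    for ch in chars:
        SPECIAL_CHARACTER_MAP[ch] = plain

TRANSLATION = str.maketrans(SPECIAL_CHARACTER_MAP)

def plain_form(word):
    """Replace special characters with their plain mappings."""
    return unicodedata.normalize("NFC", word).translate(TRANSLATION)

def indexed_forms(word):
    """Return both forms of a word: as written and with mappings applied."""
    return {word, plain_form(word)}

print(plain_form("straße"))
print(indexed_forms("señor"))
```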
You can use the wildcard character (*) to search for a group of words that contain the same pattern. This is convenient for finding derivatives and spelling variants of a word.
For example, to look for the word sing and any derivatives (such as singer, singers, and singing), search for sing*. Searching for cantalo* produces matches for cantaloup, cantaloupe, and their plurals.
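A trailing wildcard amounts to a prefix match against the indexed words. The sketch below assumes the * appears only at the end of the term, as in the examples; `expand_wildcard` and the word list are hypothetical.

```python
def expand_wildcard(pattern, indexed_words):
    """Return the indexed words matched by a term with an optional trailing *."""
    if not pattern.endswith("*"):
        return sorted(w for w in indexed_words if w == pattern)
    prefix = pattern[:-1]
    return sorted(w for w in indexed_words if w.startswith(prefix))

words = ["sing", "singer", "singers", "singing", "sign", "ring"]
print(expand_wildcard("sing*", words))
```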
Ignored inte*: 4292323.
In this example, the message indicates that there are over four million instances in the index of words starting with inte.
AV Search provides the following linguistics tools to help you correct or refine a query:
The linguistics tools analyze the query terms you enter, and offer alternative spellings, phrases, and synonyms in the form of links. To use any of the suggested query terms, click on the link for that term. The query string is modified to include the term you selected.
You can choose as many of the terms suggested by the linguistics tools as you want. When you are satisfied with the query string, click Search to submit the modified query.
Notes:
The Spell Check tool analyzes query terms and offers suggestions for alternative spellings. For example, if you search for evolutoin, the Spell Check tool responds as follows:
Spell Check: "evolutoin". Did you mean: evolution
The Spell Check tool also performs contextual spelling analysis. For example, if you enter skating rynk, Spell Check responds:
Spell Check: "skating rynk". Did you mean: skating rink
The Spell Check tool recognizes that the word "rynk" is used in the context of "skating", and offers skating rink (rather than rink, rank, or hockey rink).
The Phrase Detection tool surrounds with quotation marks any query terms that it recognizes as a phrase. For example, when you specify New York City, the Phrase Detection tool offers "New York City" as a possible query term. When you search for the phrase instead of the individual words, documents that contain only part of the phrase (such as New Delhi or the City of Love) are omitted from the results.
Note: The Phrase Detection tool is disabled when the query string includes any of the following operators: +, -, " ", or :.
The Stemming tool expands a word stem into all words that can be formed from that stem. For example, when you enter skating, the Stemming tool offers the following additional terms that share the same stem: skated, skates, and All Stems.
To query an index in which the content of a database has been indexed, you can use simple search syntax.
To search for the contents of a field whose name contains a non-alphanumeric character (such as an underscore), omit the non-alphanumeric character from the field name in the query string and enter the next letter in the field name in uppercase. For example, to query the field full_name for the value "John Smith", specify the following:
fullName:"John Smith"
Note that full_name is specified as fullName. The non-alphanumeric character is omitted and the character following it has been capitalized.
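The field-name rule can be sketched as a small conversion helper. The function names below are hypothetical, and treating only ASCII letters and digits as alphanumeric is an assumption.

```python
import re

def query_field_name(field_name):
    """Drop non-alphanumeric characters and capitalize the character after each."""
    return re.sub(r"[^A-Za-z0-9]+(.?)",
                  lambda match: match.group(1).upper(),
                  field_name)

def field_query(field_name, value):
    """Build a field:"value" query term for a database field."""
    return '{}:"{}"'.format(query_field_name(field_name), value)

print(field_query("full_name", "John Smith"))
```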
When mail messages are included in an index, the message "envelope" information is indexed as the following fields:
When you use one or more fields to search mail folders for messages, you must enter the field names in lowercase letters. For example:
+from:david +attachment:FunctionalSpec.doc
+from:Valerie +"barbecue"
In an advanced search, you can specify a range of dates to restrict the search for mail to a particular period of time.
Both the simple and advanced search interfaces allow you to restrict the results of your search to documents written in a particular natural language. You specify the language for search results by choosing from the Language pull-down menu.
When you select Any Language, which is the default, AV Search returns all documents that match your query, regardless of language. Depending on how your system administrator configured AV Search, not all supported languages may be available to you.
AV Search can gather pages and files from Japanese-, Korean-, and Chinese-language systems that use the following encoding standards:
| Language | Encoding standard |
|---|---|
| Japanese | SJIS, EUCJP |
| Chinese | BIG5, GB |
| Korean | EUCKR |
You must configure your browser to view URLs or documents containing multinational (non-ASCII) characters.
For more information, see the online help for your browser or visit one of the following pages:
To follow a link to a URL that uses the file:// protocol and that contains non-ASCII characters, the following must be true:
For example, if you choose Chinese GB through Internet Explorer as your query language, you must be running Simplified Chinese on your client system, and the appropriate Simplified Chinese browser fonts must be installed for use by Internet Explorer. Furthermore, the document you want to view must be located on a fileshare on a Simplified Chinese system. If any one of these conditions is not met, you will not be able to follow links on the results page to view documents.
Note: Some combinations of Asian-language character sets (for example, a user on a Korean system accessing a document on a Japanese system) might work.
Both the simple and advanced search interfaces support the use of keywords to restrict your searches to Web pages that meet specific criteria regarding their structure or contents. Using keywords, you can search based on: a URL or portion of a URL; links; graphics; text; or HTML coding.
Using keywords, you can do useful things such as:
To search based on keywords, enter a query in the format keyword:search-criteria, where keyword is any of a list of special terms and search-criteria is the string or condition that you want to match.
Enter the keyword in lowercase, followed immediately by a colon. The conventions for specifying a phrase in the search criteria are the same as for specifying a phrase in a regular query; the most convenient method is to enclose a phrase in quotation marks.
Note that in the advanced search interface you can enter a logical expression (containing any combination of the AND, OR, NEAR and NOT operators) as the search criteria. For example, if you want to find a Web page whose title contains both the words spreadsheet and training, you could enter a query in the form title:(spreadsheet AND training).
For additional information on advanced search operators, see Using Advanced Searches.
The following table lists and describes the keywords that you can use to find Web pages:
| Keyword | Function |
|---|---|
| anchor:text | Finds pages that contain the specified word or phrase in the text of a hyperlink. |
| applet:class | Finds pages that contain a Java applet of the specified class. |
| domain:domain_name | Finds pages with the specified word or phrase in the domain name segment (for example, mycompany.com in the name myhost.mycompany.com) of the Web server where the page exists. |
| host:hostname | Finds pages with the specified word or phrase in the hostname segment (for example, myhost in the name myhost.mycompany.com) of the Web server where the page exists. |
| image:filename | Finds pages that contain the specified image file in an <img> tag specification. |
| link:URLtext | Finds pages that contain at least one link to a page with the specified text in its URL. |
| title:text | Finds pages that contain the specified word or phrase in the <title> tag. |
| url:text | Finds pages that contain the specified word or phrase in the URL. |
The url, host, and domain keywords all serve a similar purpose in that they search for URLs based on a specific portion of the URL itself, or on the hostname or domain name where the Web page exists.
The link and anchor keywords are similar in that they both look for information in links. The link keyword looks for text in a URL that is the target of a link (for example, http://www.abc.org/help.html), whereas the anchor keyword looks for the actual text of a hyperlink as users would see it on a Web page (for example, click here).
The title keyword restricts the search to text that the document's author coded as part of the <title> tag.
In addition, your system administrator can configure AV Search to recognize additional attributes of documents that have user-defined HTML META tags.
| Term | Definition |
|---|---|
| url:http://hostname.xyzagency.org/volunteer | Finds all pages with the text http://hostname.xyzagency.org/volunteer in the URL (the result is a listing of pages about volunteer opportunities in the XYZagency organization). |
| host:hostname.xyzagency | Matches pages with hostname.xyzagency in the hostname of the Web server. |
| domain:org | Matches pages with the domain name org in the hostname of the Web server. |
| image:demo_screens.jpg | Matches pages that contain an <img> tag with a reference to demo_screens.jpg. |
| anchor:"click here" | Matches pages with the phrase click here in the text of a hyperlink or other anchor (<A>) tag. |
| link:http://www.abc.org/mypage.html | Matches pages that contain at least one link to a page with the URL http://www.abc.org/mypage.html. |
| link:http://myhost.abc.org/mypage.html -host:myhost.abc.org | Finds only external pages containing links to the specified URL (the - operator eliminates pages on the same Web server as the page of interest). |
| title:"The Wall Street Journal" | Matches pages with the phrase The Wall Street Journal in the text of the <title> tag. |
| applet:NervousText | Matches pages containing the Java applet class named NervousText. |
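The keyword:search-criteria format used in the examples above can be split with a few lines of code. This is only a sketch of the syntax; `parse_keyword_term` is a hypothetical helper, and the keyword set is taken from the keyword table.

```python
KEYWORDS = {"anchor", "applet", "domain", "host", "image", "link", "title", "url"}

def parse_keyword_term(term):
    """Split keyword:search-criteria; return (None, term) for ordinary terms."""
    keyword, separator, criteria = term.partition(":")
    # Keywords must be lowercase and followed immediately by a colon.
    if not separator or keyword not in KEYWORDS:
        return None, term
    return keyword, criteria.strip('"')

print(parse_keyword_term('title:"The Wall Street Journal"'))
print(parse_keyword_term("url:http://hostname.xyzagency.org/volunteer"))
```

Splitting on the first colon only keeps a URL in the search criteria intact.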
This section provides answers to common questions that arise with using AV Search.
The AV Search index is a dynamic, organized collection of the contents of various types of documents (Web pages, files, mail messages, and database records) gathered from various sources (the Internet, public mail folders, corporate databases). The types and quantity of documents that are indexed depend on how the system administrator configures the software that retrieves these documents.
When you use the search interface to submit a query to the AV Search index, the entire index is examined to find the names and locations of the documents that contain the words or phrases you specify. This information is then returned to you as a list of hyperlinks to the physical documents.
A word is a combination of letters and numbers. You can separate words using spaces or tabs.
A phrase is a series of words that is treated as a single query term. You can group letters and numbers together into a single phrase if you want to find that specific combination of characters in the documents returned to you as search results. If you want to find an exact phrase, place quotation marks ("like this") around the phrase in the Search for field.
For example, to find information about the first moon landing, you can specify the phrase "Apollo 11 mission".
You can use punctuation or special characters such as dashes, underscores, commas, slashes, or dots to create phrases. For example, to search for a telephone number, use 1-800-555-1212 instead of 1 800 555 1212. The dashes group the numbers into a phrase.
Using the Language pull-down menu in the search interface, you can find all the documents about a given topic that are written in a specific language only.
For example, if you select Italian from the Language pull-down menu and search for Roma, the results include only those documents written in Italian that contain the word Roma.
You can also specify the language in which to view the search interface and on-line help; specifying a language in which to work does not affect your ability to search for documents written in other languages.
When you specify a query term using all lowercase letters, AV Search finds all documents containing this term, regardless of case. When you enter a query term using any uppercase letters, only documents containing exact matches are returned.
For example, when you search for paris, you'll find Paris, paris, and PARIS in the results. However, when you search for Paris, you'll only see Paris in the results.
To make sure that a specific word is always included in any document returned in the results, place a plus sign (+) before that word in the Search for field. To make sure that a specific word is always excluded from returned documents, place a minus sign (-) before that word.
For example, to find recipes for cookies with oatmeal but without raisins, enter the query recipe cookie +oatmeal -raisin.
You can expand search results by using the wildcard character (*). When you add * to the end of a query term, AV Search queries the index for all variants of the word.
For example, the query term wish* finds documents containing wish, wishes, wishful, and wishbone.
Use advanced search to refine the search results (when a simple search returns too many documents), when you want to rank the results, or when you want a count of the results.
Try to refine your query by specifying more words or more precise words. Also, for multiple-word queries, be sure to use the appropriate syntax.
To indicate a phrase, enclose multiple words in quotation marks. Phrasing ensures that AV Search looks for occurrences of the phrase as a unit, rather than as individual occurrences of each word in the phrase. For example, a search for "Virgin Islands" is more useful than the same query without the quotation marks, which would produce documents containing the word Virgin and the word Islands, as well as documents containing Virgin Islands.
It is likely that the query term occurs too frequently relative to the total number of words in the index. AV Search reports the number of occurrences of the term in the index following the label Ignored:. When a word occurs more frequently than a certain percentage of the total number of indexed words, AV Search considers the word a "noise" word and consequently ignores it as a query term.
By default, the frequency threshold is five percent. For example, in an index wherein 100,000 words are indexed, a word that occurs more than 5000 times is considered too common to be useful, and is ignored when specified in a simple query (or as ranking criteria in an advanced query).
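The threshold arithmetic in this example can be checked with a short function. This is a sketch with a hypothetical name; it uses integer arithmetic so that a word at exactly the threshold is not treated as noise.

```python
def is_noise_word(occurrences, total_indexed_words, threshold_percent=5):
    """Return True if a word exceeds the frequency threshold for the index."""
    return occurrences * 100 > threshold_percent * total_indexed_words

# With 100,000 indexed words, the cutoff is 5000 occurrences.
print(is_noise_word(5001, 100000))
print(is_noise_word(5000, 100000))
```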
One solution is to enter a more specific query. For example, suppose a query for the word plan was ignored. Refine the query by using a simple query such as abc +project +plan.
When you want to see results from a query that is ignored in a simple search, use the advanced search interface. In the Selection Criteria field, specify the same query terms and, in the Results Ranking Criteria field, specify the same terms except those that were ignored by the simple search.
Copyright © 2001 The AltaVista Company