USC Annenberg Online Journalism Review

Better Internet Search Engines (Part 2): Traditional Agents
Search Techniques

In my last column I looked at the newer, 'intelligent' search engines that have significantly improved Internet searches.

But the 'old timers' among Web search engines -- AltaVista, Excite, HotBot, Yahoo, etc. -- can still be useful if you know which one is best suited for a particular kind of search.

Some search engines are good at finding information on a broad topic in the news, for example, while others are best at tracking down the answer to a very specific question.

That's mainly because the search engines vary widely in how much of the Web each tries to catalog in its index of Web pages. The index is usually put together by a software program called a spider, robot or crawler, which travels the Internet compiling a record of the pages it encounters and the words on each page.
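
To make the idea of an index concrete, here is a minimal Python sketch of the kind of record a spider might compile: a table that maps each word to the pages on which it appears. The page addresses and text are invented for illustration, and the function name build_index is my own; real search engines are far more elaborate than this.

```python
from collections import defaultdict

# Hypothetical pages a spider might have visited (addresses and text are made up).
pages = {
    "example.org/senate": "the senate passed the bill on tuesday",
    "example.org/ducks": "a duck uses its bill to find food",
    "example.org/money": "the new dollar bill has a new design",
}

def build_index(pages):
    """Map each word to the set of pages on which it appears."""
    index = defaultdict(set)
    for address, text in pages.items():
        for word in text.lower().split():
            index[word].add(address)
    return index

index = build_index(pages)
print(sorted(index["bill"]))    # every page that mentions "bill"
print(sorted(index["senate"]))  # only the page that mentions "senate"
```

A search engine that indexes more pages simply has more entries in a table like this one, which is why the comprehensive engines can answer narrower questions.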

Some search engines catalog 150 to 200 million Web pages (out of a total of at least 800 million Web pages, according to a recent estimate by NEC Research Institute scientists).

These more comprehensive search engines include AltaVista, FAST, Northern Light, and Excite.

Other search engines, such as HotBot and Lycos, are far more selective in what they index. Each catalogs fewer than 50 million pages.

For more specific numbers on each search engine, check Search Engine Watch's size comparison chart and Search Engine Showdown's statistical analysis.

At the 'low' end of the spectrum is Yahoo, which relies not on a spider program but on human editors to catalog the Web. Yahoo has only about 500,000 sites in its directory.

Choosing a Search Engine

So which search engine is best suited for a particular search?

Here are some basic rules.

Use a more selective search engine, such as Yahoo, if:

You're looking for the home page of a particular government agency, private organization or company. A selective search engine is likely to yield a short list of Web sites, including the home page for the agency (rather than a long and confusing list of inside pages at the organization's site or dozens of private Web pages created by people critical of an agency). HotBot is particularly good at putting the home page of an organization or agency at the top of its list of retrieved sites.

You're researching a general topic that has been in the news a lot, such as affirmative action, abortion or presidential politics. A selective search engine will eliminate a lot of personal home pages that are just rants on a controversial topic.

If your initial search fails to retrieve what you want, then you should turn to a more comprehensive search engine. Yahoo gives you the best of both worlds. If your search of the Yahoo directory comes up empty, Yahoo automatically taps a search engine called Inktomi -- which indexes a large portion of the Internet -- and returns the results of that more comprehensive search.

On the other hand, start off with one of the more comprehensive search engines if:

You are researching a very narrowly defined or obscure subject. The more selective search engines are apt to have very little if anything on your topic, and it's probably more efficient to begin with a broader search engine.

You're trying to get an answer to a very specific question. Again, the more selective search engines will probably come up empty, while the more comprehensive ones are more likely to have indexed a page deep inside a Web site that has the answer to your particular question.

You're just trying to gather a wide variety of information on a topic, and not looking for the single best site.

If one of the more comprehensive search engines still doesn't yield the desired results, another option is to try a 'meta' search engine. That's a Web site that allows you to type in some keywords and have them run simultaneously through a number of search engines.

Among the best of the meta search engines are MetaCrawler and SavvySearch. CNET also publishes a 'Search Engine Shoot-Out' that lists the major meta search engines and evaluates each of them.
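
For readers who like to see the mechanics, a meta search can be sketched in a few lines of Python. The three "engines" below are stand-in functions that return canned results, and the merging rule (pages found by more engines rank higher) is my own assumption for illustration, not a description of how MetaCrawler or SavvySearch actually combine results.

```python
# Stand-in "search engines": each returns a ranked list of pages.
# These are hypothetical; a real meta search site queries live engines.
def engine_a(query):
    return ["site1.example", "site2.example", "site3.example"]

def engine_b(query):
    return ["site2.example", "site4.example"]

def engine_c(query):
    return ["site2.example", "site1.example", "site5.example"]

def meta_search(query, engines):
    """Run the query through every engine and merge the results.

    Pages returned by more engines rank first; ties keep first-seen order.
    """
    counts = {}
    for engine in engines:
        for page in engine(query):
            counts[page] = counts.get(page, 0) + 1
    # sorted() is stable, so pages with equal counts stay in first-seen order
    return sorted(counts, key=lambda page: -counts[page])

results = meta_search("affirmative action", [engine_a, engine_b, engine_c])
print(results)
```

The point of the sketch is only that one query goes out to several engines and the duplicate hits are folded together into a single list.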

Choosing Your Search Terms

The other key to improving your search results is knowing which words you should type into the search box.

Avoid words with double meanings (such as 'bill,' which could refer to legislation, currency, ducks or that rich guy at Microsoft) or words that are so common they are likely to appear on almost any Internet document (such as the word 'Web').

Instead try to think of words you would use if you were writing a page of information like the one you are seeking.

Thus if you are seeking background information on a person, include a word like 'biography' along with the person's name in your search. Or if you want data on an issue, include search terms like 'research,' 'study,' 'statistics' or 'percentage' -- words likely to appear in a statistical analysis of a subject.

Boolean Connectors

Most search engines also let you use the equivalent of Boolean connectors -- 'AND, OR, NOT' -- to refine your search.

Thus you include the word 'AND' between your search terms to narrow your search to Web pages on which all your search words appear.

You also can limit your search to pages on which one word but not another appears by separating the words with 'AND NOT.' This is especially useful if one of your search words has a double meaning. Thus a search for 'bill AND NOT dollar AND NOT gates AND NOT ducks' would eliminate sites on dollar bills, Bill Gates and ducks, and instead call up more sites having to do with legislation.

Alternatively, you can broaden your search by connecting your words with 'OR' to retrieve pages on which any of your words appear. This is helpful if your search word has a common synonym -- such as 'homicide OR murder.'

Finally, you can search for pages on which only an exact phrase appears. At most search engines you do this by putting your phrase in quotation marks.
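
The connectors described above behave like operations on sets of pages, which a short Python sketch can show. The five pages and the helper names having and having_phrase are invented for illustration; this is a toy model, not how any particular search engine is built.

```python
# Hypothetical pages (text is made up for illustration).
pages = {
    "A": "senate committee approves crime bill",
    "B": "bill gates unveils new software",
    "C": "the dollar bill gets a new design",
    "D": "police investigate a homicide downtown",
    "E": "jury hears murder trial testimony",
}

def having(word):
    """Set of pages whose text contains the word."""
    return {name for name, text in pages.items() if word in text.split()}

def having_phrase(phrase):
    """Set of pages containing the exact phrase, words adjacent and in order."""
    target = phrase.split()
    n = len(target)
    hits = set()
    for name, text in pages.items():
        words = text.split()
        if any(words[i:i + n] == target for i in range(len(words) - n + 1)):
            hits.add(name)
    return hits

both = having("bill") & having("senate")                         # bill AND senate
legislation = having("bill") - having("dollar") - having("gates")  # bill AND NOT dollar AND NOT gates
either = having("homicide") | having("murder")                   # homicide OR murder
exact = having_phrase("crime bill")                              # "crime bill"

print(sorted(both), sorted(legislation), sorted(either), sorted(exact))
```

AND shrinks the list, AND NOT carves pages out of it, OR enlarges it, and a quoted phrase demands that the words sit side by side.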

For a good primer on Boolean connectors and other special search terms, try the Northwestern University Library page on search engines. And for a more detailed explanation of Boolean logic, go to the University at Albany's library site.

Unfortunately, each search engine deals with these Boolean connectors slightly differently. At some you can just type in the words 'AND,' 'OR,' 'AND NOT' between your search terms. But most require that you type in a plus sign (+) for AND, and a minus sign (-) for NOT.
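
As a rough illustration of the plus and minus notation, the Python sketch below rewrites a query that uses 'AND' and 'AND NOT' into the sign form. The function name to_plus_minus is hypothetical, the sketch deliberately ignores 'OR' and quoted phrases, and the exact syntax each search engine accepts should still be checked on its help page.

```python
def to_plus_minus(query):
    """Rewrite 'a AND b AND NOT c' as '+a +b -c'.

    Handles only AND and AND NOT; OR and quoted phrases are not covered.
    """
    terms = []
    negate = False
    for token in query.split():
        if token == "AND":
            continue            # AND is implied by the plus sign on each term
        if token == "NOT":
            negate = True       # the next term gets a minus sign
            continue
        terms.append(("-" if negate else "+") + token)
        negate = False
    return " ".join(terms)

print(to_plus_minus("bill AND NOT dollar AND NOT gates AND NOT ducks"))
# +bill -dollar -gates -ducks
```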

The search engines usually have little 'Help' links on their main pages that explain how they deal with Boolean connectors. Or you can go to Search Engine Showdown's reviews of the major search engines, which detail the Boolean search features of each.

Another great resource is 'Search Engine Math,' at the Search Engine Watch Web site. This article provides a simple explanation of how to use the same plus and minus sign Boolean connectors at virtually all of the major search engines.

So how should you apply these Boolean connectors to a search?

Generally it's a good strategy to start out with as detailed a search string as possible -- one that uses a number of specific words that are connected by AND or are grouped together as a phrase. Being as specific as possible will eliminate Web pages that are only peripherally related to your topic.

If that search doesn't yield results, then try broadening it to pull in more Web pages. Thus if you searched for a phrase, try instead connecting the words with 'AND' to get pages with all those words, but in any order. Similarly, if you used a very uncommon word in your search, eliminate it from your string of search words or use the 'OR' connector with it.
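
The narrow-first, broaden-later strategy can be written out as a small Python sketch. It tries an exact phrase, then all the words in any order (AND), then any of the words (OR), and stops at the first step that finds something. The three pages and the function names are made up for illustration.

```python
# Hypothetical pages (text is made up for illustration).
pages = {
    "A": "state report on school funding",
    "B": "funding for the new school gym",
    "C": "city budget and road repairs",
}

def phrase_match(words, text_words):
    """True if the words appear adjacent and in order within text_words."""
    n = len(words)
    return any(text_words[i:i + n] == words for i in range(len(text_words) - n + 1))

def search(query):
    """Try the most specific search first, then broaden until something matches."""
    words = query.lower().split()
    strategies = [
        ("phrase", lambda tw: phrase_match(words, tw)),
        ("AND", lambda tw: all(w in tw for w in words)),
        ("OR", lambda tw: any(w in tw for w in words)),
    ]
    for label, matches in strategies:
        hits = sorted(name for name, text in pages.items() if matches(text.split()))
        if hits:
            return label, hits
    return "none", []

print(search("school funding"))   # found as an exact phrase
print(search("funding school"))   # no phrase match, so falls back to AND
print(search("budget funding"))   # no page has both words, so falls back to OR
```

Each fallback pulls in more pages than the one before it, which is the trade-off the strategy accepts only when the narrower search comes up empty.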

One final tip -- try to get away from the search engine as quickly as possible during a search.

By that I mean if your search turns up a Web site that doesn't exactly have the information you're seeking but is pretty close, look on that site for a list of links to other Web pages on the same topic.

People who create Web sites often include a 'resources' section that lists other Web pages they think are particularly valuable or informative on a topic. Following those links -- which have been selected by human beings -- will usually get you to the information you want a lot faster than a computer-driven search engine.

Curious about what others are searching for on the Internet?

Some search engines allow you to spy on those searches.

You can view the strings of keywords that people are typing into search engines in the pursuit of truth (or more often pornography). The identities of those doing the searches aren't revealed, but the information they're seeking is fascinating and often hilarious.

Excite has this feature -- called Search Voyeur -- and MetaCrawler has Metaspy.

AskJeeves displays people's search queries in a box on its home page.

In my next column I'll look at some of the other special features that various search engines are offering to make your searches easier and more focused.

 
