
How search engines and library catalogs work
You know how to use a search engine. Decide what keywords you want to search and type them into the search box. Then see if the results returned the information you expected.
Do you know how to use a library catalog? Even though you will probably see a single search box like a search engine’s, if you expect it to work the same way you will be frustrated.
That single search box is not the only way to search the catalog. It’s not even the best way.
If you see a link to “advanced search,” click on it. Once you understand the difference between a search engine and a catalog, “advanced search” will return better results with a lot less trouble.
Search engines
How does a search engine work? When you type your search, the search engine’s automated procedures have already performed two important tasks.
First, a “spider” has “crawled” the web. It has tried to look at every publicly reachable page on every server in the world (trillions of pages), unless the owner has forbidden access. The spider crawls a hundred billion times or so every month to see what’s new and whether anything on a site has changed.
The spider determines the content of the page in preparation for the next important task. It keeps track of more kinds of information about the content than is worth trying to describe. It needs to assess what the page is about, what kinds of questions it can answer, and how much authority it has, among many other things.
Second, the system takes all of this information and indexes it. The entire index requires vast “server farms” to house enough computers to contain it all.
The index is actually a vast database. It organizes the data using various fields like any other database. But you’ll never find out how they are defined, what form the data in them takes, or what process the system uses to retrieve information in response to your query.
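As a rough illustration of what “indexing” means, here is a toy index in Python. The page addresses, the page text, and the function name build_index are all invented for this sketch. A real search engine’s fields and retrieval process are not public, and they are certainly far more elaborate than this.

```python
from collections import defaultdict

# Hypothetical crawled pages: address -> the text the spider found there.
pages = {
    "example.org/adams-president": "john adams second president of the united states",
    "example.org/adams-composer": "john adams american composer of operas",
    "example.org/card-catalogs": "history of library card catalogs",
}


def build_index(pages):
    """Map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for address, text in pages.items():
        for word in text.split():
            index[word].add(address)
    return index


index = build_index(pages)

# A query is answered from the index, not by rereading every page.
print(sorted(index["composer"]))
print(sorted(index["adams"]))
```

Looking up “composer” finds one page and looking up “adams” finds two. Nothing in this toy index says which of the two is the better answer to your question.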
The top of the results page brags about how many thousands (or billions) of results the search returned, probably in about half a second. No one goes through that many. Most people choose whatever is at the top of the first page.
How does the search engine know what the best result is for your question?
It doesn’t. It can’t.
If you need anything more serious than simple fact-checking, you might find information more suitable to your needs on the fifth page of results, or the tenth, or even later.
Basically, though, the search engine returns a gazillion results, and usually no one looks past the first ten. Webmasters therefore compete vigorously to get on that first page, using a process called search engine optimization (SEO).
Many SEO “experts” try to game the system with keyword stuffing, link building, and other techniques. The team that operates the search engine tries to find the most questionable techniques and tweak the system so they won’t work any more.
Online library catalogs
You may remember card catalogs. Before that, librarians issued catalogs in book form. Now the entire catalog is online.
Whatever form a library catalog takes, each entry includes a bibliographic description and subject analysis. A person wrote the description and performed the subject analysis. Each entry in a library catalog is the fruit of someone’s personal examination of an item and whatever research it takes to describe it correctly.
The bibliographic description includes who created the item (author, for example), who else had a hand in creating it (performer, editor, publisher), what it’s called (the main title and perhaps alternative forms of the title), and when and where it was created. Subject analysis includes both subject headings and the classification (which you may know as the call number).
A library catalog is likewise a database. Unlike the search engine’s database, you know the names of some of the fields. That single search box requires you to enter keywords, just like the one in the search engine. But advanced search enables you to search only the title fields, or only the author fields, or only the subject fields.
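To make the difference concrete, here is a small Python sketch that compares the single search box with a fielded search. The three records, the field names, and the two function names are made up for illustration. A real catalog record has many more fields than this.

```python
# Hypothetical catalog records; the field names are illustrative only.
records = [
    {"author": "Adams, John", "title": "Diary and Autobiography", "subject": "Presidents"},
    {"author": "McCullough, David", "title": "John Adams", "subject": "Adams, John"},
    {"author": "Adams, John", "title": "Nixon in China", "subject": "Operas"},
]


def keyword_search(records, word):
    """Single search box: match the word anywhere in the record."""
    word = word.lower()
    return [r["title"] for r in records if word in " ".join(r.values()).lower()]


def field_search(records, field, word):
    """Advanced search: match the word in one named field only."""
    word = word.lower()
    return [r["title"] for r in records if word in r[field].lower()]


print(keyword_search(records, "adams"))          # every record mentions Adams somewhere
print(field_search(records, "author", "adams"))  # only records with Adams as author
print(field_search(records, "title", "adams"))   # only records with Adams in the title
```

The keyword search returns all three records. The author search returns two and the title search returns one. A fielded search is narrower, but it only helps if what you type matches what is stored in that field.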
But how do you know who the author is, what the title is, or what the subject headings are?
Controlled vocabulary (authors)
You might think that author and title, at least, are fairly obvious. They’re what’s on the title page of a book, for example. But in a world where dozens of people spread over several centuries have the same name, it helps to be able to tell John Adams the second President of the US from John Adams the contemporary composer from anyone else named John Adams.
In a world where publishers use different forms of the same name (John Robert Adams, John R. Adams, J. R. Adams, etc.) and people occasionally change their names, it helps to be able to put together everything they had a hand in, no matter how many forms a single person’s name may take.
In a world where literature, musical compositions, etc. are translated from one language to others and there might be more than one English name for something that originally appeared in German, identifying the title is no more straightforward than identifying authors.
Subject headings present similar difficulties.
So a cataloging record will contain two different kinds of information. There are descriptive fields that transcribe exactly what it says on the item. Then there is the so-called “controlled vocabulary” where the catalog refers to every person by one and only one form of his or her name.
Controlled vocabulary requires the creation of name authority records and subject authority records that the public never sees, but that the online catalog uses to keep most of the John Adamses separate using middle names, middle initials, dates of birth and death, and other unique information.
Or it does in principle, anyway. There might be cases where it is impossible to distinguish one John Adams from another and a few of them have to share an authority record until some librarian learns how to straighten them out.
People change their names for various reasons. Library catalogs generally use the most recent form of the name. Jacqueline Bouvier became Jacqueline Kennedy, and then Jacqueline Onassis. Whichever form you search, everything by or about her is filed under “Onassis.”
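Here is one way to picture what a name authority record does, sketched in Python. The headings and variant forms below are illustrative and simplified. The structure and the function names are assumptions for this example, not the actual format libraries use.

```python
# Hypothetical authority records: one authorized heading, many variant forms.
authority_records = {
    "Adams, John, 1735-1826": ["John Adams"],
    "Adams, John, 1947-": ["John Adams", "John Coolidge Adams"],
    "Onassis, Jacqueline Kennedy, 1929-1994": [
        "Jacqueline Bouvier",
        "Jacqueline Kennedy",
        "Jacqueline Onassis",
    ],
}


def build_lookup(authority_records):
    """Point every form of a name at its authorized heading(s)."""
    lookup = {}
    for heading, variants in authority_records.items():
        for form in [heading, *variants]:
            lookup.setdefault(form.lower(), []).append(heading)
    return lookup


def authorized_headings(lookup, name):
    """Return the authorized heading(s) for whatever form the searcher typed."""
    return lookup.get(name.lower(), [])


lookup = build_lookup(authority_records)
print(authorized_headings(lookup, "Jacqueline Bouvier"))
print(authorized_headings(lookup, "John Adams"))
```

Searching any of the three forms of Jacqueline’s name leads to the same heading. Searching “John Adams” leads to two headings, and the dates tell you which one you want.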
Controlled vocabulary (titles)
The name authority file contains name/title records, because titles can take different forms, too. Consider, for example, a book titled Milestones: Memoirs, 1927–1977 by Joseph Ratzinger, a German theologian and prolific author. He wrote in German, and his original title for this book is Aus meinem Leben: Erinnerungen (1927–1977). He later became Pope Benedict XVI.
When you look up Ratzinger’s name in the catalog, the catalog will look in the authority record and find that the correct heading is “Benedict XVI, Pope, 1927-” and take you there.
The bibliographic description includes both the English and German titles. If you search the English title, the catalog will look in the authority record, find that the authorized form of the title is “Aus meinem Leben: Erinnerungen (1927–1977). English.” and take you to the correct record for the book.
Try to do that with a search engine!
Controlled vocabulary (subjects)
There is only one name authority file for American libraries. The Library of Congress maintains it. There are numerous subject authority files, including Library of Congress Subject Headings (used in most academic libraries), Sears List of Subject Headings (used in many smaller public libraries), Medical Subject Headings (maintained by the National Library of Medicine and used in medical libraries), and more.
Back where you can’t see it, the cataloging record you find identifies which subject authority file it uses. Like the name authority file, subject authority records contain cross-references so that you can find the subject heading you want even if you don’t quite get it right. Provided, that is, that whoever wrote the authority record anticipated whatever it is you search.
Finding what you want in the catalog
If you know the author and/or title you want, go to Advanced Search and fill in the author and/or title search box. In any of the resulting records, the prescribed form of the author’s name (and any other personal names associated with the item) will be a hotlink. Click on it, and you will get a list of everything by or about that person that the library owns.
You will also notice one or more subject headings, which are also hotlinked. Subject headings often have one or more sub-headings. For example:
Cancer — Alternative treatment — Research grants — United States.
If you see the subheadings displayed as individual links, you can click on “United States” and find everything else the library owns with that exact string of terms.
If you click on “Research grants” you will find all the records with the first three parts of the heading, including any that end with another country. You can find any books etc. about Canadian or French research grants, for example.
Clicking on “Alternative treatment” shows you all the records with the first two parts of the heading, with or without additional subheadings.
Unfortunately, the companies that sell cataloging software to libraries often do not provide a way to search each element of the subject string separately. Someone in the library’s IT department has to massage the programming to enable it. If you see the entire string as one link, you can’t search for anything less unless you enter it manually in subject search.
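The way clicking on one part of a subject string narrows or widens the results can be sketched in Python. The list of headings, the “--” separator, and the function names are assumptions for this example. Your library’s catalog may store and display the strings differently.

```python
# Hypothetical subject strings found on records in a catalog.
headings = [
    "Cancer -- Alternative treatment -- Research grants -- United States",
    "Cancer -- Alternative treatment -- Research grants -- Canada",
    "Cancer -- Alternative treatment",
    "Cancer -- Chemotherapy",
]


def split_heading(heading):
    """Break a subject string into its main heading and subheadings."""
    return [part.strip() for part in heading.split("--")]


def click_subheading(headings, heading, clicked):
    """Return headings that match everything up to and including the clicked part."""
    parts = split_heading(heading)
    prefix = parts[: parts.index(clicked) + 1]
    return [h for h in headings if split_heading(h)[: len(prefix)] == prefix]


full = headings[0]
print(click_subheading(headings, full, "United States"))
print(click_subheading(headings, full, "Research grants"))
print(click_subheading(headings, full, "Alternative treatment"))
```

Clicking on “United States” matches only the full string. Clicking on “Research grants” also picks up the Canadian heading, and clicking on “Alternative treatment” adds the shorter heading that has no further subheadings.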
If you are looking for information about a topic and don’t know the author or title of any of the information you need, just enter keywords in the single search box.
Most databases don’t work quite like Google. If you want to search with more than one word or phrase, you need to use Boolean operators (and, or, not). If you want to use an author’s name and a word from the title, Google automatically supplies the Boolean “and” between them. You have to type it yourself in most if not all online library catalogs and databases where you find newspaper, magazine, or journal articles. http://bit.ly/LIyYUl
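If Boolean operators are new to you, it may help to think of each search term as a set of records. The Python sketch below uses invented record numbers and an invented index. It only shows what the three operators mean, not how any particular catalog carries them out.

```python
# Hypothetical index: search term -> numbers of the records that contain it.
index = {
    "adams": {1, 2, 3, 5},
    "operas": {3, 4},
    "presidents": {1, 2},
}

both = index["adams"] & index["operas"]         # adams AND operas
either = index["adams"] | index["operas"]       # adams OR operas
without = index["adams"] - index["presidents"]  # adams NOT presidents

print(sorted(both))
print(sorted(either))
print(sorted(without))
```

“And” narrows the search to records that contain both terms. “Or” widens it to records that contain at least one of them. “Not” removes the records that contain the second term.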
Once you have entered your search, look at everything in the results that looks even remotely useful. Even something you obviously can’t use will probably have useful subject links.
Google is great, but it doesn’t have everything. For example, you can’t find any information on Google that isn’t online. Sometimes you can’t do without information printed on paper. You need a library catalog in order to find what you need. It’s online, so you don’t even have to leave the house to search it.
This article first appeared on Reading, Writing, Research. Check it out!