AMINO
A Stateful, Daemonless Web Interface

Karl Geiger
Amgen Libraries

BRS North American User Group Conference
15 November 1996


Today I'm going to describe how our stateful, connectionless, daemonless web interface to BRS databases works. We have been using the application at the Amgen Library in Thousand Oaks, California, since February 1996, and are about to bring out the next revision. A stateful web interface to BRS database services has been highly desired.

For those of you who don't know us, we're Amgen, Inc. We make biopharmaceuticals using recombinant DNA technologies. Amgen's two flagship products are Epogen(tm) (epoetin alfa) and Neupogen(tm) (filgrastim). Epogen alleviates anemia by stimulating bone-marrow production of red blood cells. Neupogen stimulates white blood cell production, which helps prevent bacterial infections in cancer chemotherapy, HIV/AIDS, and neutropenia patients.

Ten scientists and four venture capitalists founded the company in 1980. Last year's sales were about $US 2 billion, and the company has about 4,200 full-time employees world-wide. Besides the Thousand Oaks headquarters, Amgen has labs and production facilities in Boulder, Colorado, and offices in Louisville, Kentucky, as well as in Europe, Japan, Australia, and China. If you want to know more, visit our web site at http://www.amgen.com.

Amgen Libraries serve the company by delivering information over our TCP/IP network. We serve the company's staff world-wide through our intranet to a variety of desktop computing environments: Apple Macintosh, Windows 3.1, Windows 95, and Unix workstations. Although most people use Macintosh computers, the current desktop strategy is to migrate to Windows 95 backed up by Windows NT servers. Research scientists use powerful Unix workstations, and the central, host-based computing resources tend to run on heavy-duty Unix servers. To manage development costs in this environment, the Library delivers information via web-based client-server applications and the one "universal" client, Netscape. We call our software system AMINO: AMgen INformation Online.

The information resources we distribute through AMINO are medical and scientific research databases (MEDLINE, Current Contents, EMBASE), news and corporate information databases (BioWorld Today, BioScan), our library catalog via Sirsi Webcat, and our product databases. Our product databases contain article abstracts from medical journals; we have reindexed these materials and linked scanned images of the source documents. Researchers have only to click an image icon to see the full article, delivered as an Adobe Acrobat PDF file.

AMINO also links its research databases to a document delivery system. After finding and selecting an article, the researcher can place it into the order system. When implemented in March 1996, this feature increased the number of document delivery requests by up to fifty percent.

The AMINO interface does its work as a connectionless, stateful, and daemonless web interface. What do these terms mean? Connectionless means the client (Netscape on the user's workstation) and the server (NCSA 1.5 or Netscape Enterprise Server v2) establish no permanent network link, a result of using HyperText Transfer Protocol (HTTP), the lingua franca of the web. BRS NetAnswer is also connectionless because it too uses HTTP. In contrast, a BRS Client/Server application program connects to the server computer and keeps the connection open during the entire user session.

AMINO is stateful because it keeps track of what was done. Unlike NetAnswer, the end user can build a search history and issue commands that use back-references, for example, to AND two previous search statements together.

Finally, AMINO is daemonless, that is, no extra server program is running to manage or track search state on the host computer. A daemon is a server program that hangs around and waits for people or other programs to ask for something. A common example is the web server program. If you recall Paul O'Fallon's presentation from the 1995 BRS North American User Group conference, his BRS web server implementation at Georgia Tech relies on a daemon to keep track of which BRS/Search subprocesses belong to which Netscape clients. No "AMINO" process persists on the server computer; the interface uses Common Gateway Interface (CGI) programs to process transactions and communicate with brsnetd, the BRS network interface.

How does AMINO retain state despite a connectionless protocol such as HTTP? Since the client and server processes "hang up the phone" after processing a transaction, the BRS/Search program resets itself, losing the contents of the back references (search stack), root blocks, etc. If you have used BRS/Search you know you can save your session, but recovering the saved search information means re-executing the search strategy every time a web client sends a new transaction. For large databases containing millions of documents with complex search structures such as MEDLINE, the overhead of re-running the saved search strategy is too high, resulting in response times of minutes. Moreover, as the user's search history grows, so does the response time.

Happily, BRS/Search has the "off-continue" feature. In Native mode the command is "..oc". Off-continue tells the engine to save its current state in a temporary, binary file before closing the database and terminating. This file's path is "/Search/Config/Files/.tmp". When the user reconnects, the engine reports that a restart is possible and asks whether to restart the session. If the answer is yes, BRS reloads the session's state from the temporary file, and the session proceeds as if uninterrupted.

Saving state in the temporary file works fine, but anonymous users cannot use it. Anonymous sessions erase all state at sign-off time, regardless. Erasing state makes sense for anonymous users. If an anonymous BRS user session connected and reestablished a previous session's search state, the end-user's search results may be strange or irrelevant, and the privacy of individuals' work violated. BRS requires a real user ID in order to save state with off-continue.

We wished to avoid assigning user IDs before people could use the AMINO web interface, however. User IDs are necessary to use the BRS/Menus (MNS) terminal interface. Managing the user tables, disk files, and configuration information was costly and time-consuming. When people joined or left the company, someone had to go into the BRS system to set up or delete the user ID. To get around this problem, the AMINO web interface uses a pool of 100 pre-allocated user IDs. When a web client sends a transaction, the server CGI programs allocate one of the user IDs from this pool. The web client owns this user ID for the duration of the session; BRS stores the "off-continue" state in the user ID's temporary file.

The web client must keep track of which session it owns. It does so with a cookie. A "cookie" is a unique token. AMINO's cookie is the assigned BRS user ID. When the web client connects to the server it presents the cookie as part of the search transaction. At search start-up, the user logs in with the ID present in the cookie and the BRS engine reads the temporary file to reestablish search state.

The system must guard against running out of cookies. System resources, and hence user IDs, are limited, so AMINO must recover user IDs (cookies) from those sessions that are idle. Under the connectionless scheme afforded by HTTP, "idle" means "no transaction from this user" for a certain period of time. AMINO times out sessions after an hour.

To manage the cookies, the host programs read and write a list of allocated cookies stored in a file. This file lists all the user IDs that the system has handed out as cookies along with time stamps for when they were allocated and last used. Because two transactions may come in at the same time, the file is locked and transaction processing programs enqueue on the lock before reading and writing the file. The lock file is very small, usually less than 512 bytes, and transaction rate is not high (typically we have fifteen to twenty sessions and process only a handful of transactions each minute), so lock file access has not proven a performance chokepoint.
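
The locking discipline just described can be sketched roughly as follows. This is a minimal illustration in Python, not the AMINO code: the one-line-per-ID text layout and the function name locked_cookie_table are assumptions made for the example, and fcntl.flock stands in for whichever Unix locking call the production programs actually use.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def locked_cookie_table(path):
    """Yield the cookie table under an exclusive lock, then write it back.

    Assumed file layout (hypothetical): one line per allocated user ID,
    "<user_id> <allocated_time> <last_used_time>".
    """
    # Create the file on first use; open it for reading and writing.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    with os.fdopen(fd, "r+") as f:
        # Blocks until no other transaction holds the lock.
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            table = {}
            for line in f:
                if line.strip():
                    user_id, allocated, last_used = line.split()
                    table[user_id] = (float(allocated), float(last_used))
            yield table  # the caller edits the table while the lock is held
            # Rewrite the whole file; it is only a few hundred bytes.
            f.seek(0)
            f.truncate()
            for user_id, (allocated, last_used) in sorted(table.items()):
                f.write(f"{user_id} {allocated} {last_used}\n")
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

A transaction program would wrap its whole read-modify-write of the table in "with locked_cookie_table(path) as table:", so a second transaction arriving at the same moment simply waits its turn on the lock.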

When the server CGI programs receive a transaction they check for a cookie. If no cookie accompanies the transaction, the programs allocate an unused user ID from the pool, mark the user ID as "in use" in the cookie file, process the transaction, and finally return the new cookie with the transaction results. If the client sends a cookie along with the transaction, the programs check the cookie to ensure it's still valid (hasn't timed out or expired), update the cookie's time stamp (so it won't time out), and return the updated cookie with the transaction results.
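
The allocate-or-validate decision can be sketched as a single function. Again this is only an illustration under stated assumptions: the ID names ("web000" and so on), the function name check_in, and the in-memory table are hypothetical, and the sketch recovers idle user IDs on every call, which is one possible policy rather than a description of what AMINO does.

```python
IDLE_LIMIT = 3600  # seconds; the talk cites a one-hour time-out
POOL = [f"web{n:03d}" for n in range(100)]  # hypothetical ID names


def check_in(table, cookie, now):
    """Return (user_id, is_new) for one incoming transaction.

    table maps user_id -> (allocated_time, last_used_time).
    cookie is the user ID the client presented, or None.
    """
    # Recover user IDs from sessions that have been idle too long.
    for user_id in [u for u, (_, last) in table.items()
                    if now - last > IDLE_LIMIT]:
        del table[user_id]

    # A still-valid cookie: refresh its time stamp and carry on.
    if cookie in table:
        allocated, _ = table[cookie]
        table[cookie] = (allocated, now)
        return cookie, False

    # No cookie, or an expired one: hand out an unused ID from the pool.
    for user_id in POOL:
        if user_id not in table:
            table[user_id] = (now, now)
            return user_id, True
    raise RuntimeError("all pooled user IDs are in use")
```

The caller gets back the cookie to return with the results and a flag saying whether the session is new. If the client presented a cookie and the flag says "new", the old session timed out and the caller can warn the user.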

The production AMINO system stores the user cookie in a TYPE=HIDDEN field in HTML forms. In the revision under development, the cookie is kept as a true Netscape Persistent Client State HTTP Cookie, a feature supported by Netscape 2.x clients and Netscape servers. The advantage of using a Netscape Persistent Client State Cookie is that the cookie times itself out; the client never sends expired cookies. A transaction from a session that has timed out therefore arrives with no cookie, and the server programs needn't check the time stamp. They merely allocate a new user ID and cookie, reset the engine, and proceed as if the session were a new connection (which it is). The session management code executes as part of the server CGI programs' initialization. For the most part, the cookie code returns only two pieces of information to the calling CGI module: the new cookie, and whether the session is new or a restart. The server program then decides whether to complete the transaction or issue a time-out warning message and reset the user's session. Of course the server side programs do much more; they translate transactions into BRS engine requests, check error returns, parse output, generate HTML, and so forth.
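
The two ways of carrying the cookie might look something like this. The sketch uses Python's standard http.cookies module purely for illustration; the cookie name "amino_session" and the function names are assumptions, and the real AMINO programs were of course written with the tools of the day.

```python
from html import escape
from http.cookies import SimpleCookie

COOKIE_NAME = "amino_session"  # hypothetical name


def hidden_field(user_id):
    """Production style: carry the user ID in a hidden form field."""
    return f'<INPUT TYPE="HIDDEN" NAME="{COOKIE_NAME}" VALUE="{escape(user_id)}">'


def set_cookie_header(user_id, lifetime=3600):
    """Revised style: a Set-Cookie header that expires on the client."""
    jar = SimpleCookie()
    jar[COOKIE_NAME] = user_id
    # An integer "expires" is rendered as a date that many seconds from now.
    jar[COOKIE_NAME]["expires"] = lifetime
    jar[COOKIE_NAME]["path"] = "/"
    return jar.output()


def read_cookie(http_cookie):
    """Extract the user ID from the HTTP_COOKIE value, or None if absent."""
    jar = SimpleCookie()
    jar.load(http_cookie)
    morsel = jar.get(COOKIE_NAME)
    return morsel.value if morsel is not None else None
```

With the hidden field, every generated form must repeat the token and the server must judge its age. With the header, the browser drops the token once the expiry date passes, so a missing cookie is itself the time-out signal.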

That's about all there is to it. The scheme is simple and doesn't require any fancy programming. It brings together four pieces: the engine's ability to save state in a temporary file, a pool of non-anonymous BRS user IDs to preserve the temporary files, an identifying "cookie" to track the client session, and a server cookie management scheme based on Unix file locking. The BRS engine does most of the work in saving and restoring the search session's state. All we need to do is keep track of who's using what, when; the rest of the search system takes care of itself.