Reflections on the Beginnings of Dialog
The Birth of Online Information Access
by Roger Summit
In 1964 an event occurred that would alter computing and information retrieval forever — a third-generation computer was introduced by IBM, the IBM 360 series. Third-generation computers were the first computers that combined mass random access disks, CRT terminals, and telecommunications and as such, ushered in interactive computing. What this meant for information retrieval was that massive databases could be stored centrally and access could be offered worldwide. The idea of services to a global marketplace from an efficient, centralized computer facility was unheard of at the time but was exciting beyond belief.
In the early 1960s there were several information retrieval and SDI systems in place within IBM, government agencies, and various universities, but they were running on second-generation, batch technology. Queries were generally fed into the computer on punched cards and matched against document citations processed from tape. If a hit occurred, the result was printed out on paper.
Such was the environment and the opportunity we faced with the announcement of third-generation computing technology. As one of the earliest explorers in the online world (in fact, some have over-graciously credited me with creating it), I would like to describe the gestation and birth of Dialog, and how we leapfrogged over early information retrieval services by anticipating the application of new technology to this discipline. That Dialog is a service that has continued through two wars, six U.S. presidencies, an end to the Cold War, and the creation of the European Union attests to the soundness of the concept and the endeavor of a talented staff of employees.
The Formative Years
While a doctoral candidate at Stanford in 1960, I took a summer job at Lockheed Missiles and Space Company working for the director of information processing, E. K. Fisher. One of my first assignments was to investigate the use of the computer in information retrieval. A common statement around Lockheed at the time was that it is usually easier, cheaper, and faster to redo scientific research than to determine whether it has been done previously. It appeared possible that computer-based information retrieval had a chance to change the way scientific research would be managed in the future.
A programmer named Peggy Don and I began some test programs to experiment with the application of computers to information retrieval (IR). It occurred to me that we should be able to simply parse the plain text statement of a query and match those words against a database of textual citations, identify the relevant items and then sort them according to word hit frequency (an idea that seems to have caught on with Internet search engine designers as well). The results of this process were disappointing. One of the main issues I recall had to do with the mystery of how to modify the query to obtain better results. Because the search and relevance algorithms within the search engine are unknown to the user, how to modify the query to improve the results is not apparent. We referred to this as black box searching and abandoned further work along these lines.
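As a rough illustration of the word-hit approach described above, the following Python sketch parses a plain-text query, matches its words against a few citations, and sorts the matches by hit frequency. It is a modern reconstruction under stated assumptions, not the original Lockheed test program: the function names, the tokenizing rule, and the sample citations are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words (an assumed rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_by_word_hits(query, citations):
    """Return citation ids sorted by how often the query words occur in each."""
    query_words = set(tokenize(query))
    scored = []
    for cid, text in citations.items():
        counts = Counter(tokenize(text))
        hits = sum(counts[word] for word in query_words)
        if hits:  # citations with no matching word are dropped
            scored.append((hits, cid))
    # Highest hit count first; ties are broken by citation id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [cid for _, cid in scored]


# Made-up sample citations, for illustration only.
citations = {
    "C1": "Guidance systems for missile flight control",
    "C2": "Missile guidance: a survey of missile control methods",
    "C3": "Library automation feasibility study",
}
ranked = rank_by_word_hits("missile guidance methods", citations)
print(ranked)
```

The sketch also shows the black box problem: the searcher sees only an ordered list, with nothing to indicate which words produced the ranking or how to change the query to improve it.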
IBM and technical organizations of the day hosted conferences that I attended where man-machine interaction and IR were discussed. In addition, those of us interested in library applications formed a multi-company working group to share ideas. We met one particularly influential person at these discussions, H. Peter Luhn, who in the early 1960s invented and introduced Keyword In Context (KWIC) indexing and Selective Dissemination of Information (SDI) at IBM. Dr. Luhn had an extraordinary vision for the future of computers and information retrieval. SDI has been rediscovered by dot-commers as "push technology."
These were very stimulating times, and we were excited about developing information retrieval applications with this technology. It was during these meetings that I first got the idea of using computers to access technical literature on a global scale. Often a meeting can be as thought provoking as it is informative. Thus it was that in 1964 a colleague and I proposed that Lockheed establish a laboratory to explore application of this exciting new technology to information retrieval.
Herschel Brown, Lockheed's Executive Vice President at the time, supported our proposal. He had seen what was called the Red Book, a feasibility study for automation of the Library of Congress. It is fortunate for Dialog that Herschel was a visionary executive. He felt that information retrieval was a project consistent with Lockheed's innovational approach to new technology, and he approved the establishment of the laboratory.
The computer had internal memory of 32 kilobytes (note: kilo) and ran at an approximate 1.5 microsecond cycle time. Of course, none of us knew anything about programming this new technology, but that's where IBM came in. It was said that a manager would never be embarrassed by going with IBM, and they certainly proved that to us. Their customer support was superb and served as a model for later Dialog customer service policies.
Several projects developed in the Information Systems Laboratory (as it was called) included work on speech recognition, automated flight planning, pattern recognition, language translation, an automatic bridge-playing project, and information retrieval. I was asked to head the information retrieval project.
There was enormous competition for computer time and it was rationed among the projects. The appeal of our project and the excitement among the group as we began to work in this area were intense. Not only did we envision that we could leapfrog second-generation systems, it also seemed possible that with real-time programming, telecommunications, and massive random-access memory, we could make information retrieval a human-machine process, command a worldwide market for our services, and provide access to massive amounts of the world's knowledge. We literally believed we could change the face of research and computing, and we had the skills and vision to do so.
Our main impetus, however, was to create a technology as opposed to a business. Nonetheless, we knew we were dealing with an intellectual tool akin to books and literacy (the benefits of literacy are lost if needed information cannot be found). So in the back of our minds was the thought that with this technology we might be able to substantially enhance the utilization of knowledge.
The project team consisted of six people:
Dexter Shultz — file loading software and operations
Jim Brick — telecommunications (with consultation from Len Fick)
Ken Lew — master applications programmer
Bob Mitchell — systems programmer
Ed Estes — system architect
With this group we set about designing the file structure and programming the system that was to become Dialog. Because of computer resource limitations, all coding had to be at machine language or assembler level.
Systems design. The original system design priorities we developed were as follows:
Two unique features of the design were search recursion and index word display.
Recursion. In my view of an interactive system, information retrieval should be thought of as a process, not as a probe (as is the case with batch systems). With the exception of simple, explicit searches, the searcher is neither completely aware of what is contained in the database, nor confident of just which words to use in the query to elicit a desired response. Because of this, there needs to be a high degree of interaction between the searcher and the database to gain the desired outcome.
Designing a system where searching is a process requires the saving of intermediate results of queries and allowing the use of these results as elements of subsequent queries. A particular query thus defines a concept within the search space, just as does a word or phrase, and that concept can be used for subsequent query formulations. With each query, the searcher learns how close they are to the relevant information, and so learns how to search more effectively while actually finding it.
Recursion, embodying feedback and modification, is a powerful process. In the search process, recursion with feedback allows the user to modify the query during the process of the search based on feedback from the database. Moreover, recursion allows one to mentally break up a complex task into a series of connected simple tasks to obtain a desired result.
A good example of the power of recursion with feedback is the difference between a guided missile and a ballistic missile. A ballistic missile will only hit its target if all the variables affecting its flight course are known at the outset, whereas a guided missile can adapt its flight pattern to unknown environmental factors during the course of flight. Thus, during the course of a flight of a guided missile, the missile has the intelligence to adapt. During the course of an interactive search, there is also the very important process of adaptation.
I recall great debate among the design group with regard to saving intermediate sets. Computer memory was very expensive and saving intermediate results could mean a lower user capacity. Thus, there was a very real tradeoff between the number of simultaneous users that could be accommodated online and the amount of memory that could be devoted to a single user.
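The idea of saving intermediate results and reusing them in later queries can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class, its method names, the Boolean operators, and the tiny index are hypothetical and are not Dialog's actual commands or file structure.

```python
class SearchSession:
    """Keeps every intermediate result so later queries can refer back to it."""

    def __init__(self, index):
        self.index = index  # assumed structure: term -> set of citation ids
        self.sets = []      # saved result sets; set number n is self.sets[n - 1]

    def _save(self, result):
        self.sets.append(result)
        return len(self.sets)  # the set number reported back to the searcher

    def select(self, term):
        """Look up one term and save the matching citations as a new set."""
        return self._save(set(self.index.get(term.lower(), set())))

    def combine(self, left, op, right):
        """Build a new set from two earlier set numbers."""
        a, b = self.sets[left - 1], self.sets[right - 1]
        if op == "AND":
            result = a & b
        elif op == "OR":
            result = a | b
        elif op == "NOT":
            result = a - b
        else:
            raise ValueError(f"unknown operator: {op}")
        return self._save(result)


# Made-up index, for illustration only.
index = {"missile": {1, 2, 3}, "guidance": {2, 3, 4}, "ballistic": {3}}
session = SearchSession(index)
s1 = session.select("missile")
s2 = session.select("guidance")
s3 = session.combine(s1, "AND", s2)   # narrow using two earlier sets
s4 = session.select("ballistic")
s5 = session.combine(s3, "NOT", s4)   # refine again after seeing set 3
print(s5, session.sets[s5 - 1])
```

Every saved set stays in memory for the length of the session, which is the cost the design group was debating: the more sets kept per searcher, the fewer searchers a fixed amount of memory can serve.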
Index display. The idea of providing for the display of searchable terms came to me from a visit to one of the Stanford libraries. In utilizing the card catalog, I was totally frustrated trying to guess under what classification category my topic of interest might fall. After opening drawer after drawer and pawing through entry after entry, I approached one of the librarians to ask if there was a listing of the subject entries. I was told there was not and this would also be difficult because of frequent changes and additions.
When it came to designing Dialog, one of the early requirements became that of allowing the searcher to display an alphabetic list of searchable terms near a desired term. We also included with each displayed term the number of items in the database containing that term, and if there were a thesaurus associated with the database, the number of thesaurus entries associated with the term (which could themselves be displayed with posting counts). In effect, we tried to find everything relevant to the term being searched and feed that information back to the searcher in a usable form.
Index display is particularly useful in examining corporate names and personal names, as these are often entered in a database in a great variety of forms and spellings. We called this command Expand. To my knowledge, this feature is offered by very few Web or commercial search engines today.
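A minimal Python sketch of this kind of index display follows. It assumes the index is simply a mapping from each searchable term to its posting count; the function name, its parameters, and the sample counts are hypothetical, and thesaurus entries are left out.

```python
from bisect import bisect_left


def expand(postings, term, before=2, after=3):
    """List the index terms alphabetically nearest to `term`, with posting counts."""
    terms = sorted(postings)
    pos = bisect_left(terms, term.lower())  # where the term is, or would be
    start = max(0, pos - before)
    return [(t, postings[t]) for t in terms[start:pos + after]]


# Made-up posting counts, for illustration only.
postings = {
    "lockhead": 3,
    "lockheed": 120,
    "lockheed aircraft": 45,
    "lockheed missiles": 60,
    "lockwood": 7,
    "locomotive": 12,
}
nearby = expand(postings, "lockheed", before=1, after=3)
for term, count in nearby:
    print(f"{count:5d}  {term}")
```

Because the window is taken around the position where the term would sort, the display still works when the searcher's spelling is not in the index at all, which is how variant forms of a corporate or personal name come to light.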
By 1965, the team had developed a small, working prototype of Dialog incorporating the design priorities into the following simple commands:
Searching Dialog is as simple in concept as remembering: B E S T.
The NASA Experience
Because we were being supported with Lockheed independent research funds, a highly sought-after, scarce and fickle resource, I knew that we had to move through proof-of-concept and into externally supported work rapidly if we were to survive.
There was an ideal database to test our proof-of-concept and that was the NASA Scientific and Technical Aerospace Reports (STAR) database. Not only was it the largest database around (200,000 citations), but access to it was in great demand. Though NASA was running searches against STAR on a batch, IBM 1401 computer, I knew we could surpass this effort with Dialog if given the chance. Mel Day of NASA was the key figure in this regard. He, along with Mortimer Taub of Documentation Inc., developed software to store the NASA STAR citations as a database. The announcement bulletin and catalog were printed directly from the database, which in turn was used for searching. This was an accomplishment in its own right in that it was one of the first systems in which a database was used for multiple purposes — printing and searching.
I arranged a meeting with Mel Day in Washington, D.C., in 1965. During the meeting, Mel responded to my description of the utility of Dialog by explaining that he had a dozen or so people a week describing systems that could do most anything short of reading your mind. He said he had to see it in operation to believe its effectiveness. After further discussion I offered to submit an unsolicited proposal to install Dialog on the NASA database and conduct an evaluation of the approach at the Ames Research Center in Mountain View, California. He responded by issuing a request for proposal (RFP) in April of 1965 incorporating the features we had discussed. We submitted a bid.
Much to our chagrin and enormous disappointment, we learned that Bunker Ramo had also submitted a proposal and had been awarded the prototype contract. As this contract was to be our avenue to proof-of-concept as well as a vehicle for becoming independent of Lockheed independent research funding, I felt we had lost a major opportunity, and we needed to come up with another alternative. I decided we should submit a very low-cost proposal, one within Mel Day's discretionary funding limit, for a parallel experiment, arguing that this way NASA would have a backup in case the Bunker Ramo system didn't work out to their satisfaction.
A summary of the interesting bidding process is reported as follows:
Ames Research Center Prototype. Our proposal was minimal, covering only the cost of the remote terminal equipment (an IBM 2260 display terminal with printer) and a 1200-baud leased line between the Lockheed facility in Palo Alto and the Ames Research Center. We had proposed installing the leased line in order to support a CRT display system rather than the dial-up teletype system proposed by Bunker Ramo. We were awarded a contract from NASA in 1966 and were operational in January of 1967.
At NASA/Ames, Dialog was used both by end users and librarians (on behalf of end users). There was a single database, that of NASA, and the system allowed only a single person searching at a time. An analysis of the results later showed that end users spent significantly more time online in search formulation and viewing intermediate results with smaller printouts, whereas librarians behaved just the opposite — less search time and much larger printouts. This, of course, makes sense in that end users could better determine the online results they wanted whereas the librarian, as an intermediary, tended to be more exhaustive in searching. The only complaint we got from the service was from a librarian who said demand for her services had increased to the point that she had to cut short her coffee break.
Turnaround time for searching the NASA STAR database was thus reduced from 14 hours plus mail and handling when done on the NASA headquarters IBM 1410 computer to a few minutes at the remote site. Furthermore, the search could be modified during the process without having to reformulate the entire search.
Based on the success of the Ames implementation, we extended the contract on request to install remote terminals at three other NASA facilities. At this point Dialog provided for several simultaneous users but was still configured for a single database. This project, begun in 1966, marked the very first remote, interactive, information retrieval application utilizing real people doing real searches on a very large database. We were excited beyond words!
NASA RECON. In 1967 NASA issued a competitive RFP (request for proposal) for development of the NASA RECON system. We submitted a bid of $180,000 against a dozen or so prominent software companies including such giants as Informatics, Computer Sciences, IBM, and others. We received the award, which was our first major development. The contract specified several enhanced features but otherwise was very close to the original Dialog. The result was called NASA/RECON (Remote Console Information Retrieval System). In preparation for the work, we upgraded the laboratory computer to an IBM 360/40 computer that was faster and contained more internal memory. In the bid, we included a rights-in-data clause that gave Lockheed the right to use any software developed for our own purposes. This right proved invaluable to the future success of the business as will be seen.
Following successful installation of the NASA/RECON software on the NASA facility computer, our group was awarded contracts from the Atomic Energy Commission (AEC) and the European Space Research Organization (ESRO) to install Dialog on their computers.
In 1969 we negotiated a contract with the U.S. Office of Education to provide them with a retrieval service on the ERIC database. I met with Harvey Marron and Lee Burchinal of the U.S. Office of Education to discuss installing Dialog on one of their computers (as we had done before with the other agencies). They indicated they had no interest in operating computers and asked whether we could simply mount their database on our computer and provide them access for searching. Of course we could! And so this became our first services contract and changed the group from a systems development/installation organization into a services organization.
What I learned from this transition proved profound. In the development/installation mode, one is effectively out of business at the conclusion of the contract and thus needs to scurry around for additional contracts or lay off people. In a services mode your customers become dependent on the continued supply of your service (if it's useful to them) and thus you tend to operate under renewable contracts. In this way you can build and accumulate business by adding new customers and adding useful services for existing customers.
Many businesses can operate either in a contract mode or a services mode. The latter is far more desirable from a business continuity point of view. Before its breakup by the Justice Department, IBM would not sell computers; it only leased them. This assured them a continuing, largely predictable revenue stream and discouraged competition. It was at this point in 1970 that I decided Dialog was to become a commercial services business.
Another event occurred at this time that could have changed the future of Dialog. My father-in-law, a successful patent attorney, suggested that we buy Dialog from Lockheed and set it up as an independent business. He offered to finance the $100,000 or so that we felt Lockheed would require. For various reasons my intuition dissuaded me from pursuing his offer. The decision was probably a good one in most ways other than the potential wealth we might have realized later on.
Dialog, the Business
With the demonstrated utility of Dialog during the NASA contract and as a result of requests from other organizations, in 1971 I proposed to Lockheed management that we launch a commercial business based on Dialog and the database services we were already supplying to government agencies. With a foundation of government services that covered most of our expenses, we could easily take on the risk associated with a commercial startup. I felt we had a real head start and could develop momentum as we progressed.
Lockheed management was reluctant for many reasons and deferred approval of the commercial program. Then came the trigger that spurred them into action. Carlos Cuadra of Systems Development Corporation (a Lockheed competitor) mailed a survey exploring the feasibility of establishing an information retrieval service similar to the one that I had in mind. This reassured Lockheed management that there really could be an opportunity, and they approved the commercial launch. We initiated the commercial phase of our online service in 1972 with a grand total of three databases: ERIC from the Educational Resources Information Center, NTIS from the National Technical Information Service, and PANDEX (a Science Citation Index look-alike created by Dick Kollin) from Crowell, Collier and Macmillan, together with half a dozen customers.
We had achieved our goal of becoming independent of independent research support, and the business was cash-flow positive from this time forward. Dialog illustrates that intrapreneurship (i.e., an entrepreneurial venture developed within a parent organization) can succeed with proper nurture and support.
What is presented here is in a sense the first chapter in a continuing story of the evolution and development of Dialog. The service in 2002 offers 531 databases and has been credited with changing the profession of reference librarianship.
Over the past 30 years, Dialog has changed its corporate name and parentage and now finds itself in a very solid position. Under the present management of Roy Martin, Dialog CEO, my dream of providing access to the world's important technical literature continues along its path toward fulfillment with its present custodian, The Thomson Corporation. I feel that Dialog now has a strong parent whose primary mission is consistent with my original vision. It is particularly gratifying to me to see this position stated so clearly by Richard Harrington, President and CEO of Thomson Corporation: "Our goal is to get the right information to the right people at the right time with the right applications and software, to enable our customers to make better decisions, faster."
Did You Know ...
The name for the system, "Dialog," occurred to me in 1966. My wife Ginger and I, with our two babies, Jennifer and Scott, were on our way to Portland to visit her parents. She was driving and I was dictating a project plan, for what was to become Dialog, into a small, voice-activated tape recorder. But what should the project be called? The system was to be interactive between human and machine. The searcher in a sense said, "This is what I want," and the machine replied in effect, "This is what I have." Described that way, we decided — why not call it "Dialog." And, that was it!