ITAL: Information Technology and Libraries, ISSN 0730-9295
 


An Architecture for Behavior-Based Library Recommender Systems

Andreas Geyer-Schulz, Andreas Neumann, and Anke Thede


Library systems are a very promising application area for behavior-based recommender services. By utilizing lending and searching log files from online public access catalogs through data mining, customer-oriented service portals in the style of Amazon.com could easily be developed. Reductions in the search and evaluation costs of documents for readers, as well as an improvement in customer support and collection management for the librarians, are some of the possible benefits. In this article, an architecture for distributed recommender services based on a stochastic purchase incidence model is presented. Experiences with a recommender service that has been operational within the scientific library system of the Universität Karlsruhe since June 2002 are described.

Almost all scientific libraries feature electronic library management systems. With their online public access catalogs (OPACs), they meet nearly the same requirements for electronic value-added services as digital libraries do. Recommender systems are a very promising add-on for traditional libraries; the need for them arises from scientists’ and students’ demand for efficient literature research, as shown by the survey of Klatt et al.1 Because of information overload and the difficulty of assessing quality, among other things, information seekers are increasingly unable to compile relevant literature from conventional database-oriented catalog systems in a time-efficient manner. Therefore, as the survey reveals, they rely heavily on peers for recommendations. Considering the tight schedules of many students, university teachers, and researchers, it is worth the effort to free up the valuable time they spend steering each other to the standard literature of their fields, a task that behavior-based expert advice services could easily take over. Moreover, in this scenario, they can also profit from the combined knowledge of all library users in contrast to the more restricted knowledge within their personal networks. Consumer acceptance and convenience of recommender systems are shown by the huge success of the broad variety of services offered at commercial bookstore sites (such as Amazon.com). People are getting used to these services and appreciate them. So the question to ask is: Why are these services not offered on a broader scale within scientific libraries? Discussions of this question with librarians and computer scientists revealed the following reasons:

  • Privacy. Librarians are very considerate of the privacy of their patrons. Transaction-level data as well as reading histories must be protected.
  • Budget restrictions. Public libraries in general run under tight budget restrictions. New electronic services for millions of users might require prohibitively high additional information technology (IT) investments.
  • Data size. The number of documents contained in many public or academic library systems is at least one order of magnitude higher than in most commercial organizations. This implies that transaction-level data is scattered across more documents.

While one would expect that more data implies a better chance for finding meaningful patterns, it becomes increasingly difficult to detect these patterns due to their sparsity, and because the computational complexity of counting such association rules is exponential in the number of objects. Standard association-rule algorithms reduce the complexity by deleting all objects that do not receive sufficient support. In a library context, the sparsity of the data, unfortunately, makes this approach unfeasible. Increasing the support threshold to reduce the computational complexity will lead to pruning all meaningful but weak association patterns that may be below the support threshold, but that are still statistically significant. This article presents a strategy to overcome these obstacles with behavior-based recommendations that can be efficiently generated from anonymous session data on off-the-shelf PC systems.

In digital libraries, recommender systems already have a tradition of supporting the search process of users. Fab, for example, was developed as part of the Stanford Digital Library Project.2 Fab combines a content-based and a collaborative recommender system that filters Web pages according to content analysis and creates usage profiles for user groups with similar interests. PADDLE is a system that introduces customization and personalization features to deal with the information overload caused by the mass of documents in digital libraries.3 The University of California at Berkeley Digital Library Project allows users to build personalized collections of their documents of interest.4 Recommendation services for digital libraries and their evaluation methods are discussed by Bollen and Rocha.5 The virtual university of the Wirtschaftsuniversität Wien offers a collection of categorized Web links and is equipped with many different recommendation and personalization services.6 Other projects deal with the problem of information overload and intelligent systems for information retrieval in digital libraries, such as query-relevance feedback and information alert systems.7

Today, many commercial sites compete by offering a variety of value-added services that can successfully be added to digital library applications. In fact, online bookstores that do not offer at least some of these services are not expected to survive. Amazon.com, as market leader, offers many different types of recommendation services. But others, such as bol.com, have jumped on the bandwagon of recommendations as well, thus achieving a broader range of expertise about documents than a bookstore clerk can possibly provide.

In contrast to digital libraries, no related work can be found in the field of traditional scientific libraries, although these are up-to-date and still the ones offering the broadest and largest variety of literature. Current digital library projects, compared to traditional libraries, are often specialized, comprise fewer documents, and focus on different types of documents, such as Web pages, that are not present in traditional libraries. Although this paper is focused on the context of a scientific library, the system it discusses and the underlying behavior patterns are by no means restricted to scientific environments. It should be stressed that this system could, in principle, augment any existing library system. Moreover, since the university library of the Universität Karlsruhe (TH) acts as information provider for the South-West German library network with twenty-five libraries, it provides recommendations for a variety of libraries including community libraries like the City Library of Karlsruhe and the Central Library of Baden (Badische Landesbibliothek).

This article is structured as follows. The Distributed Architecture for Library Recommender Systems section shows the scalable architecture of a recommender system for legacy libraries. This distributed-agent-based service is operational at the university library of the Universität Karlsruhe (TH). The Generating Recommendations section presents the mathematical model, adapted from Ehrenberg’s repeat-buying theory, that is used to generate the recommendations, and addresses the questions of privacy and data size involved, especially compared to other currently available techniques. An evaluation of the system is given in the Evaluation section.

Distributed Architecture for Library Recommender Systems

The recommender services implemented at the library of the Universität Karlsruhe (TH) are based on a generic, three-level architecture whose main idea is described by the pattern of a library with agents.8 An agent, as described in Russell and Norvig, perceives its environment through sensors and acts upon it using its effectors.9 Humans as well as software systems can be agents. An example of an agent is a part-picking robot with a conveyer belt and parts as its environment. The robot perceives pixels of varying color and intensity on the belt and acts by picking up the parts and placing them into bins, with the goal of correctly sorting the parts. Figure 1 shows the architecture of the system. The first, lower level consists of the data sources that reside at the legacy library system as well as at the recommendation system’s site. Additionally, external library sources are also integrated into the legacy library application (twelve library catalogs hosted at the university library, thirteen external catalogs). The dashed lines indicate the system boundaries. The second level, separated by the dotted line from the first level, contains the agents for the library system as well as for the recommendation service. The third level represents the user-interface system.

Figure 1. Architecture for a library with recommender services

Different active agents collaborate in order to make up the library and the recommendation service. The agents’ environment consists of the lower-level data sources as well as the upper-level user-interface agent. The agents detect changes in the environment through their sensors and take appropriate actions in response to such changes. The arrows in figure 1 indicate which environment resources these agents collect information from and which resources are, in turn, altered by them. Bold arrows represent information that is transported over system boundaries. The data exchanged between agents from the recommendation system and the legacy library system are transported using the hypertext transfer protocol (HTTP). The OPAC agent represents the online catalog functionality of the legacy library application. The agent waits for user-agent requests for document information, extracts that information from the corresponding library sources, and returns the data to the user agent, where it is presented to the user. At the same time, the request is written to the transaction logs for later analysis.

A second agent is involved in implementing the display of recommendations to the user: the visualization agent of the recommendation system. By incorporating corresponding links in the reply of OPAC, the visualization agent is also invoked by the user-interface agent. The visualization agent checks in the repository of pregenerated recommendation Web pages to ascertain whether recommendations are available for the currently requested document. Then, depending on the request, it returns a link pointing to the recommendation Web page or directly to the list of recommendations to be displayed to the user. The transaction agent and the aggregation agent cooperate to generate the recommendations. The transaction agent regularly sends the latest transaction logs to the aggregation agent. The aggregation agent analyzes the information contained in the logs and calculates recommendations for each possible information object. The recommendations are saved into a repository in the recommendation system.

The information objects of the university library system are described using the German standard representation for libraries’ metadata, the Maschinelles Austauschformat für Bibliotheken (MAB) format for books and journals.10 MAB is a textual format that can easily be transmitted using HTTP. Each single information object can be uniquely identified by the combination of its document number and the name of the library or catalog to which it belongs. Recommendations are generated based on the unique identifier without any additional knowledge about the document description. The task of the pregeneration agent is to examine newly generated recommendations, provide the MAB description for each of the recommended items, and generate Web pages ready to be presented to the user for each recommendation list. The agent manages a local repository of the document-description data containing the relevant subset of the information defined by the MAB format for the creation of recommendation lists. Whenever it encounters an information object for which no local description data is available, the pregeneration agent communicates with the doc-info agent residing at the library system. The doc-info agent extracts the MAB description from the corresponding library and returns it to the pregeneration agent, which in turn saves a copy of this information in its local repository to save bandwidth and time for future requests of the same document.
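The fetch-on-miss behavior of the pregeneration agent's local repository can be illustrated with a minimal Python sketch (the production agents are written in Perl). The class name `DescriptionCache`, the callback `fetch_description`, and the dictionary returned by the stand-in doc-info agent are assumptions made for this example; the real agent exchanges MAB records over HTTP.

```python
class DescriptionCache:
    """Local repository of document descriptions with fetch-on-miss."""

    def __init__(self, fetch_description):
        # fetch_description stands in for the HTTP call to the doc-info agent.
        self._fetch = fetch_description
        self._local = {}

    def get(self, catalog, doc_number):
        # Catalog name plus document number identify an information object.
        key = (catalog, doc_number)
        if key not in self._local:
            self._local[key] = self._fetch(catalog, doc_number)
        return self._local[key]


calls = []  # records every request that reaches the (fake) doc-info agent


def fake_doc_info_agent(catalog, doc_number):
    """Hypothetical stand-in that returns a tiny description record."""
    calls.append((catalog, doc_number))
    return {"title": f"Title of {catalog}/{doc_number}"}


cache = DescriptionCache(fake_doc_info_agent)
first = cache.get("UBKA", "101")
second = cache.get("UBKA", "101")  # served from the local repository
```

Only the first lookup reaches the library system; later requests for the same document are answered locally, which is the bandwidth and time saving described above.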

The evaluation agent works independently of the other agents. Its environment comprises the user interface from which it collects opinions of general and expert users and a separate repository for saving the evaluations for later analysis.

The library is regarded as a market for information in which the detailed information about each document constitutes a product. The inspection of an information object’s details is regarded as a purchase at the cost defined by the amount of time the user (consumer) spends on viewing the information. Consecutive inspections of document details by the same user can be identified through the session identifier that is created upon the first search for documents and carried along in all subsequent page requests from the same user interface (that is, the Internet browser). The session identifier is coded in the uniform resource identifiers (URIs) of the requests and is thus contained in the Web server’s transaction logs. The aggregation agent analyzes the transaction logs and identifies the market baskets for each user. These baskets contain all detailed-document inspection calls having the same session identifier. The recommendations are computed using the contents of the market baskets as described in more detail in the Generating Recommendations section.
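The basket construction described above can be sketched in a few lines of Python. The record layout `(session_id, document_id)`, the identifier format `UBKA:101`, and the function name `build_baskets` are assumptions made for the example, not details of the operational system.

```python
from collections import defaultdict


def build_baskets(inspections):
    """Group detailed-document inspections into market baskets by session ID.

    `inspections` is an iterable of (session_id, document_id) pairs, a
    hypothetical, already-parsed form of the transaction log.
    """
    baskets = defaultdict(set)
    for session_id, document_id in inspections:
        baskets[session_id].add(document_id)
    # Assumption: only baskets with at least two documents are kept,
    # because a single inspection carries no co-inspection information.
    return {sid: docs for sid, docs in baskets.items() if len(docs) >= 2}


log = [
    ("s1", "UBKA:101"), ("s1", "UBKA:205"),
    ("s2", "UBKA:101"),
    ("s3", "UBKA:205"), ("s3", "BLB:17"), ("s3", "UBKA:101"),
]
baskets = build_baskets(log)
```

Session `s2` is dropped because it contains a single inspection; the other two sessions yield baskets of two and three documents.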

Due to the nature of the Internet, extraction of market baskets from the transaction log has to be preceded by several preprocessing steps that deal with the following problems:

  • Filtering of Web robot and other automated requests. Detecting robot requests is not straightforward and has been the topic of several recent research papers.11 Not all robots identify themselves through an appropriate user-agent specification. Other patterns, such as regular or consecutive requests only a few seconds apart, as well as analysis of the remote machine’s name or IP address, can be used to filter robots. In the preprocessing phase carried out by the aggregation agent, all sessions that contain at least one HEAD request are regarded as originating from an automated process and are filtered out. HTTP defines a HEAD request as one for which the server returns only the response headers (the metadata) and no page body; Web robots of search engines use such requests during indexing to reduce network traffic. This heuristic covers all requests whose user agent identifies an automatic process as a known Web robot. Thus it seems to catch Web robots efficiently (at least in the absence of robots pretending to be humans).
  • Bookmarked sessions. Session identifiers contained in URIs have one major drawback: Users who bookmark such a URI for later access will access the server with an outdated session ID. Because the server cannot invalidate sessions after a certain timeout or detect requests that are made with invalid session IDs, sessions with a pause of at least fifteen minutes are split and assigned new, unique session IDs.
  • Requests without session IDs. Such requests cannot be used for market-basket analysis and are deleted.
  • Public-terminal access. Public terminals that can be used without personal login may generate requests with the same session identifier but with different users if the explicit closure of the session is omitted. If users change quickly and the network identifications of the public-access terminals are unknown, the appropriate end of a user session is difficult to identify. Still, combinations of specific documents inspected by different users into a larger market basket will occur randomly. Random co-inspections of the same documents are filtered by the stochastic model used for calculating recommendations; thus no false recommendations are produced from the data contained in the multi-user session.
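The first three preprocessing steps listed above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' Perl implementation: the dictionary keys (`session`, `method`, `time`, `doc`) and the `#n` suffix used to form new session IDs are hypothetical.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=15)


def preprocess(requests):
    """Filter and split sessions; `requests` is a list of dicts (assumed layout)."""
    # 1. Requests without a session ID cannot be assigned to a basket.
    with_session = [r for r in requests if r["session"]]
    # 2. A session containing at least one HEAD request is treated as a robot.
    robots = {r["session"] for r in with_session if r["method"] == "HEAD"}
    human = [r for r in with_session if r["session"] not in robots]
    # 3. Split sessions at pauses of fifteen minutes or more.
    human.sort(key=lambda r: (r["session"], r["time"]))
    last_seen = {}  # original session ID -> (time of last request, part number)
    cleaned = []
    for r in human:
        sid = r["session"]
        if sid in last_seen:
            prev_time, part = last_seen[sid]
            if r["time"] - prev_time >= SESSION_TIMEOUT:
                part += 1
        else:
            part = 0
        last_seen[sid] = (r["time"], part)
        cleaned.append({**r, "session": f"{sid}#{part}"})
    return cleaned


t0 = datetime(2003, 1, 6, 10, 0)
requests = [
    {"session": "a", "method": "GET", "time": t0, "doc": "d1"},
    {"session": "a", "method": "GET", "time": t0 + timedelta(minutes=5), "doc": "d2"},
    {"session": "a", "method": "GET", "time": t0 + timedelta(minutes=30), "doc": "d3"},
    {"session": "b", "method": "HEAD", "time": t0, "doc": "d1"},
    {"session": "b", "method": "GET", "time": t0 + timedelta(minutes=1), "doc": "d4"},
    {"session": "", "method": "GET", "time": t0, "doc": "d5"},
]
cleaned = preprocess(requests)
```

In this invented log, session `b` is discarded as automated, the request without a session ID is deleted, and session `a` is split into two parts at the 25-minute pause. Public-terminal sessions need no special handling here because, as described above, the stochastic model filters random co-inspections.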

Preprocessing reduces the number of lines in the log file by approximately three quarters. All log entries not containing an HTTP-GET request to the library’s search interface are filtered out. This includes, for example, image files or in-site navigation requests. No relevant behavioral data is lost. The aggregation agent still has to calculate recommendations from, on average, about 142,000 detailed inspections per week. The transaction logs are transmitted on a daily basis at 1 a.m. when user network traffic is low. Computation of the new recommendations is performed regularly four times a month. This update rate is a compromise between the timeliness of the recommendations and the computational expense of each update. The update is done during off-peak hours at night. An incremental update was implemented that accesses only those documents inspected by a user during the analyzed period. A complete update of the presently existing number of documents (1,171,502 documents) is not necessary and would require too much time and computational effort. Using the incremental update, on average 48,718 different documents are accessed in sessions with at least two entries for which the recommendations have to be recalculated. This is possible because the results of the underlying stochastic model for a particular document are independent of other global data. The incremental update thus reduces the computational complexity to 4.16 percent with respect to the total number of documents.
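A minimal sketch of how the set of documents for an incremental update might be determined, together with the arithmetic behind the 4.16 percent figure. The function name `documents_to_update` and the basket representation are assumptions for illustration; the two document counts are taken from the text.

```python
def documents_to_update(baskets):
    """Return all documents that occur in baskets with at least two entries.

    `baskets` maps a session ID to the set of documents inspected in that
    session during the analyzed period (hypothetical representation).
    """
    touched = set()
    for docs in baskets.values():
        if len(docs) >= 2:
            touched.update(docs)
    return touched


period_baskets = {
    "s1": {"d1", "d2"},
    "s2": {"d3"},          # single inspection: no recalculation needed
    "s3": {"d2", "d4"},
}
to_update = documents_to_update(period_baskets)

# Figures reported in the text.
updated_per_run = 48_718
documents_with_co_inspections = 1_171_502
share = updated_per_run / documents_with_co_inspections
print(f"{share:.2%}")  # prints 4.16%
```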

As can be seen from figure 1, the data repositories of the different systems are physically distributed; they are only loosely coupled by their update processes. The agents that communicate over system boundaries are more tightly coupled, in that they have to communicate using a well-defined interface. The interfaces consist of simple HTTP GET and POST requests with a defined set of parameters. For the interface agent, the answer transmitted back is standard HTML; for the pregeneration agent, MAB data; and for the aggregation agent, transaction-log data in plain text format. The visualization agent uses a special technique for displaying a link to the recommendation page. The document-detail page delivered by OPAC always contains a link to the corresponding recommendation page, activated by clicking on an image, but the image is visible only when recommendations are actually available for the current document. The image is a GIF requested from the visualization agent, which returns a transparent, 1-pixel GIF if there are no recommendations, or an image with “Empfehlungen (Others also use. . . )” if recommendations are present (see figure 2). The advantage of this method is that failure of the recommendation server does not affect the usability of the library OPAC. In the worst case, a broken image icon will be displayed, but all remaining functionalities, as well as the transmission speed of the page, remain unaffected.
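The image-link technique can be sketched as a single decision function. Everything named here is an assumption for illustration: `link_image`, the `recommendations` mapping, and the `banner_gif` argument are hypothetical, and the base64 string is a commonly used encoding of a 1-pixel transparent GIF rather than the file served by the operational system.

```python
import base64

# A widely used 1x1 transparent GIF, decoded once at import time.
TRANSPARENT_GIF = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)


def link_image(document_id, recommendations, banner_gif):
    """Return the image bytes the visualization agent would send.

    `recommendations` maps document IDs to pregenerated recommendation
    lists; `banner_gif` holds the bytes of the visible link image.
    """
    if recommendations.get(document_id):
        return banner_gif
    return TRANSPARENT_GIF


banner = b"GIF89a...banner bytes..."  # placeholder, not a real image
repo = {"UBKA:101": ["UBKA:205", "BLB:17"]}
visible = link_image("UBKA:101", repo, banner)
invisible = link_image("UBKA:999", repo, banner)
```

Because the OPAC page only embeds an image reference, the page itself never waits on the recommendation logic; an unreachable recommendation server costs at most one broken image.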

Figure 2. Detailed view of documents

Currently, the recommendation server resides on a standard PC with a 1.2 GHz AMD Athlon processor and 1.5 GB main memory, running a customized Linux (kernel version 2.4.20) based on a Mandrake 8.0 distribution. The software agents are implemented in Perl (5.6) using MySQL as the underlying database system. The system hardware costs approximately $2,500; no software license costs apply because open-source software is used throughout. Even a small community library should be able to afford these expenses.

Figure 3. Recommendation list

The user interface is shown in figures 2 and 3. Figure 2 presents the detailed-document view with all the important information. On the right hand side, a link to the recommendation list is shown indicating that recommendations are available for this document. The link points to the page shown in figure 3, with the recommendations listed in descending order with respect to their significance as defined by the number of co-inspections with the original document. Additionally, icons for the evaluation of each pair of documents, as well as the service as a whole, are displayed (see Evaluation section). The system is accessible at www.ubka.uni-karlsruhe.de.

Generating Recommendations

The frequency distribution of library document co-inspections is described by Ehrenberg’s repeat-buying theory as following a logarithmic series distribution (LSD) under the following assumptions:

  • the population of potential inspectors is large compared to the number of actual inspectors;
  • patron inspections follow a Poisson distribution;
  • the distribution of the means of Poisson processes follows a truncated gamma-distribution; and
  • the market is stationary.12

Ehrenberg’s classic book on repeat-buying theory remains a readable standard reference and a suitable introduction to consumer panel analysis for the practitioner without a mathematical background.13 However, in order to be applicable in a library context, two additional insights that are not at the center of Ehrenberg’s theory are necessary.

Ehrenberg’s theory faithfully models the noise part of buying processes. That is, repeat-buying theory is capable of predicting random co-purchases of consumer goods. Intentionally bought combinations of consumer goods (a six-pack of beer, spare-ribs, potatoes, and barbecue-sauce for dinner, for example) are outliers. In this sense, Ehrenberg’s theory acts as a filter to suppress noise (stochastic regularity) in buying behavior.14

The second key insight that allows an adaptation of repeat-buying theory for anonymous user-groups is the economic theory of self-selection, which today is routinely used in the design of incentive contracts. A readable example of how this theory is used in hiring and human-resource management is found in Spence.15 Geyer-Schulz et al. argue that this theory allows observation of aggregate buying processes for self-selected groups of consumers, and thus detects outliers in aggregate processes.16 This is essential for a library context because it preserves the privacy of the individual and thus addresses the problem of privacy raised in the introduction. A detailed account on this work is in progress.

Ehrenberg’s repeat-buying theory has been validated empirically for many classes of nondurable consumer products (soap, coffee, toothpaste) and is used today for consumer-panel analysis in marketing.17 An overview of stochastic purchase-incidence models can be found in Wagner and Taudes.18 This theory, with its strong independence assumptions, was used by Böhm et al. to describe the regularities of anonymous users in the context of a virtual university.19 Furthermore, Ehrenberg’s theory was tested successfully for the automatic generation of product recommendations at a business-to-business computer accessories dealer and, in this context, compared to association-rule algorithms.20 The adaptation of models from repeat-buying theory to scientific libraries is described in detail in Geyer-Schulz et al. and Geyer-Schulz, Neumann, and Thede.21

According to this theory, recommendations are all outliers with respect to the LSD of a library document and its co-used documents within a session.

Generating recommendations means identifying all co-used document pairs that appear more frequently than expected by the model (that is, all outliers that violate the independence assumptions) and presenting these to the patron as recommendations.
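The raw material for this outlier detection is the number of sessions in which two documents were inspected together. The following Python sketch shows one way to count co-inspections and to derive, for a given document, the observed frequency distribution (how many partner documents were co-inspected exactly x times). The function names and the example baskets are invented for illustration; the outlier test itself is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def co_inspection_counts(baskets):
    """Count, for each unordered document pair, the sessions containing both."""
    counts = Counter()
    for docs in baskets:
        for a, b in combinations(sorted(set(docs)), 2):
            counts[(a, b)] += 1
    return counts


def frequency_distribution(counts, document):
    """Map x to the number of partners co-inspected with `document` x times."""
    per_partner = [n for pair, n in counts.items() if document in pair]
    return Counter(per_partner)


baskets = [
    {"A", "B"},
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "B"},
    {"A", "D"},
]
counts = co_inspection_counts(baskets)
freq_a = frequency_distribution(counts, "A")
```

For document `A`, partner `B` occurs three times, `C` twice, and `D` once, so each of the frequencies 1, 2, and 3 is observed for exactly one partner. With real data, the bulk of partners falls into the low-frequency classes that the LSD model explains as noise.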

Figure 4. Logarithmic plot of the frequency distribution of infrared and raman spectroscopy by Schrader

The Schrader example shown in figure 4 demonstrates that the framework described above holds for libraries. The observed frequency distribution f(x_obs) corresponds to the ranking by decreasing number of repeated co-purchases in figure 3. Recommendations are the outliers, that is, products that have been bought together more often than expected by the stochastic model. More specifically, an LSD model with a robust mean of 1.410 and a robust parameter q = 0.479 passes a chi-square goodness-of-fit test at α = 0.01 (χ² = 8.016, which is below 13.816, the critical value at α = 0.01).
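The relation between the LSD parameter q and the mean quoted for the Schrader example can be checked with the standard formulas for the logarithmic series distribution, P(x) = -q^x / (x ln(1 - q)) for x = 1, 2, ..., with mean -q / ((1 - q) ln(1 - q)). The sketch below uses only these textbook formulas; the robust estimation procedure and the chi-square test used by the authors are not reproduced, and the function names are chosen for this example.

```python
import math


def lsd_pmf(x, q):
    """Probability that a partner document is co-inspected exactly x times."""
    return -(q ** x) / (x * math.log(1.0 - q))


def lsd_mean(q):
    """Mean of the logarithmic series distribution with parameter q."""
    return -q / ((1.0 - q) * math.log(1.0 - q))


q = 0.479                      # parameter reported for the Schrader example
mean = lsd_mean(q)             # approximately 1.410, as reported
expected_shares = [lsd_pmf(x, q) for x in range(1, 6)]
total = sum(lsd_pmf(x, q) for x in range(1, 200))  # numerically 1
```

Multiplying the expected shares by the number of co-inspected partner documents gives expected class frequencies; partners observed far more often than these classes predict are the candidates for recommendations.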

Evaluation

Table 1 summarizes the statistical results for Ehrenberg’s repeat-buying models on usage data of the twenty-five libraries hosted by the library of the Universität Karlsruhe (TH) for the period from January 1, 2001, to June 11, 2003. Of the overall quantity of 14.5 million documents, users have viewed 1,171,502 in a session together with at least one other document. From these data, 214,980 lists with recommendations have been generated, with a total number of 2,204,980 recommendations. This corresponds to a total coverage of 1.48 percent, which is small because only a small percentage of these documents were actually inspected during the observation period. The coverage with respect to those documents actually inspected in two or more sessions is 18.4 percent. Since only a small number of the documents within the catalogs are inspected by the majority of the patrons, the percentage of recommendations with respect to detailed inspections is even higher. A second, separate statistic (this time computed for the period from January 1, 2001, to June 18, 2003) shows that recommendations can be provided for 46.21 percent of the 3,083,840 visited detailed inspection pages.
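The coverage figures follow directly from the counts given in the text, as the short calculation below shows. The variable names are chosen for this example, the total of 14.5 million is the rounded value from the text, and the average list length is derived here rather than stated by the authors.

```python
total_documents = 14_500_000          # "14.5 million", rounded in the text
co_inspected_documents = 1_171_502
recommendation_lists = 214_980
recommendations = 2_204_980

# Share of all catalog documents that have a recommendation list.
total_coverage = recommendation_lists / total_documents
# Share of co-inspected documents that have a recommendation list.
inspected_coverage = recommendation_lists / co_inspected_documents
# Derived value: average number of recommendations per list.
avg_list_length = recommendations / recommendation_lists

print(f"{total_coverage:.2%}")      # prints 1.48%
print(f"{inspected_coverage:.1%}")  # prints 18.4%
print(f"{avg_list_length:.1f}")     # prints 10.3
```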

For evaluation of user acceptance of the recommendation service as a whole, a five-item Likert scale was inserted into the recommendation list (see figure 3). The user could choose between “dispensable” (1), “needs improvement” (2), “usable” (3), “good” (4), and “super” (5). During the period from February 7, 2003 to June 23, 2003, a total of 723 votes were collected. The distribution of votes is shown in figure 5, with a mean of 4.26, and a variance of 0.97. The development of the mean can be seen in figure 6. The results show that a very high percentage of users of the recommendation service are very satisfied or satisfied with the availability and quality of the service. The strong stability of the mean over time shows that no major changes in user acceptance can be expected during future observation periods.

Figure 5. Opinions of users about the recommendation service (dispensable [1] to super [5])

Figure 6. Mean value of users' opinions about the recommendation service (dispensable [1] to super [5])

Although user acceptance seems to be high, such Internet-based opinion surveys raise the problem of positive bias by self-selected users and of what constitutes a representative sample for such surveys. One way of controlling for such bias ex post facto is to test the distribution of additional known attributes of the sample in order to detect significant deviations from the distribution in the population. Due to the anonymity of the survey, standard demographic attributes cannot be checked. However, the ratio of delivered votes to recommender usage was checked to see whether it remained constant. Additional candidates to be checked would be the distribution of votes and recommendation list accesses with regard to different topics or library catalogs. With each additional attribute tested, the representativeness of the sample becomes more credible. However, this remains to be checked.

Figure 7. Daily requests for recommendation lists and users votes

Figure 8. Box plot of the ratios of smoothed daily user votes per smoothed recommendation list calls (in percent)

Figure 7 shows the distribution of the number of daily votes in the upper graph and the distribution of the daily recommendation list calls for the same period in the lower graph. First, the curves in figure 7 were smoothed using a seven-day moving average to filter the week-specific patterns. From the smoothed values, the ratio of votes to recommendation list calls was calculated for each day in percent. The mean of these ratios is 5.43 percent with a standard deviation of 1.09 percent. The box plot in figure 8 shows the distribution of these ratios. The extreme ratios are 3.19 percent and 7.87 percent. The median ratio is 5.45 percent, and the first and third quartiles are 4.53 percent and 6.22 percent. The plot shows that the ratios are nearly normally distributed with the mean near the median. Indeed, 65.77 percent of all ratios lie within one standard deviation of the mean. No significant outliers were observed. This supports the hypothesis that there is no significant change in the number of votes collected relative to the number of recommendation lists called during a day.
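The smoothing and ratio computation can be sketched as follows. The text does not say whether the seven-day window is trailing or centered; a trailing window is assumed here, and the daily counts are invented for illustration only.

```python
from statistics import mean, median, stdev


def moving_average(values, window=7):
    """Trailing moving average; yields len(values) - window + 1 points."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def vote_ratios(daily_votes, daily_list_calls, window=7):
    """Smoothed votes per smoothed recommendation-list calls, in percent."""
    votes = moving_average(daily_votes, window)
    calls = moving_average(daily_list_calls, window)
    return [100.0 * v / c for v, c in zip(votes, calls) if c > 0]


# Invented example data (nine days); weekends show lower activity.
daily_votes = [5, 6, 4, 7, 5, 2, 1, 6, 5]
daily_calls = [100, 120, 90, 130, 110, 40, 30, 110, 100]

ratios = vote_ratios(daily_votes, daily_calls)
summary = (mean(ratios), median(ratios), stdev(ratios))
```

Averaging over a full week before forming the ratio removes the weekday and weekend pattern from both series, so the remaining variation in the ratio reflects changes in voting behavior rather than in overall traffic.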

Implicit preferences can be drawn from the number of links followed from the recommendation pages. During the first five months in 2003, a total of 15,115 calls to detailed-document views were issued from the recommendation list of another document. This corresponds to an average of 3,023 per month. With a total of 16,269 recommendation lists shown per month, it can be seen that in 18.6 percent of recommendation lists, the links were actually followed.

An average of 2.6 percent of the recommendation links shown each day was actually clicked on by users (standard deviation of 0.6 percent). It could be argued that this number indicates that in 97.4 percent of the cases the recommendation service was dispensable. But there are many diverse reasons for users not to be interested in recommendations. Recommendations are useful when browsing through a topic for interesting literature. But users might instead be looking for a particular document (such as a course book) or might be in a hurry and just take the first document that they run into. This result is still slightly above the 1.8 percent reported by Lawrence et al. in a field experiment of IBM and Safeway Stores plc, a U.K. supermarket retailer.22

Implementation Guidelines

The following conditions must be satisfied in order to set up this recommender service with an existing legacy library system:

  • The legacy library system should offer access only through an online public access catalog.
  • Each inspection of a detailed document page must be recorded in a log file. Common Web servers can be configured to write a log file that records each URL access. This requires the URL to contain all information necessary to identify the viewed document.
  • Subsequent inspections from the same user client must be identifiable. This can be realized either with cookies or by personalizing the users’ URLs with unique session identifiers. When the session identifiers appear in the URL, the information can be extracted from the Web-server log file. The cookie mechanism requires the script that performs the document inspection to identify known cookies and save the information for later analysis.
  • The point in time of each inspection must be known. The exact time of the Web-page request is contained in the Web-server log.
  • The detailed-document page must be modified to include the extra image link that checks whether recommendations are available for this particular document. The link contains the name of the recommendation-list script as well as all information necessary for identification of the viewed document and all remaining parameters necessary for calling the detailed document page. This permits the recommendation page to present a link back to the detailed document page, which is important for reasons of usability.
  • The recommendation server must be accessible from the Internet as it delivers the link image as well as the recommendation list directly to the user client.
  • The legacy library Web server and the recommendation server must be connected via HTTP or FTP so that the textual log file or files described above can be transferred. Most operating systems can be configured to run programs at regular intervals, so a simple program implementing an HTTP file upload or an FTP file transfer can be used. The recommendation server must accordingly provide either a simple Web script that handles HTTP file uploads or an FTP server.
  • The recommendation server must be able to translate the document identifier to information about the document (title, author). There are several possibilities for bringing this about. The optimal choice depends on how and where this information is stored (for example, direct database connection, HTTP transmission with data coded in plain text or XML).
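
The logging conditions above (document identifier, session identifier, and time of each inspection) can be illustrated with a short Python sketch. It assumes the Web server writes the widely used Common Log Format and that the detailed-document URL looks like `/opac/detail?doc=...&sid=...`; the path and the parameter names `doc` and `sid` are hypothetical and would have to be adapted to the actual OPAC.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional
from urllib.parse import parse_qs, urlsplit

# Common Log Format: host ident user [time] "METHOD url PROTOCOL" status bytes
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
)


class Inspection(NamedTuple):
    session_id: str
    document_id: str
    timestamp: datetime


def parse_inspection(line: str, detail_path: str = "/opac/detail") -> Optional[Inspection]:
    """Return the inspection recorded in one log line, or None if it is not one.

    The URL layout (detail_path, 'doc', 'sid') is an assumption for illustration.
    """
    match = LOG_LINE.match(line)
    if match is None or match["status"] != "200":
        return None
    url = urlsplit(match["url"])
    if url.path != detail_path:
        return None
    params = parse_qs(url.query)
    if "doc" not in params or "sid" not in params:
        return None
    timestamp = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
    return Inspection(params["sid"][0], params["doc"][0], timestamp)


example = ('127.0.0.1 - - [06/Oct/2003:14:05:09 +0200] '
           '"GET /opac/detail?doc=12345&sid=abc123 HTTP/1.0" 200 5120')
print(parse_inspection(example))
```

If cookies are used instead of session identifiers in the URL, the session information is not part of the request line, and the inspection script would have to record it separately.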

On the recommendation server, a job must be started on a regular basis, such as once a week, to examine the newly transferred log files, do the preprocessing, and update the recommendations from the new information product baskets. The necessary data can be stored in simple text files, so no database applications are needed. Two different scripts implement the link image and the recommendation list. The scripts can be implemented in any language that supports communication with the Web server via the common gateway interface (CGI).
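
As a rough illustration of the preprocessing step in such a periodic job, the following Python sketch groups inspections into one basket per session and counts how often two documents were inspected together. It is deliberately simplified: the fixed threshold `min_count` is a stand-in chosen for this example and is not the stochastic purchase incidence model that the service itself uses to separate random from nonrandom co-inspections. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_baskets(inspections):
    """Group (session_id, document_id) pairs into one set of documents per session."""
    baskets = defaultdict(set)
    for session_id, document_id in inspections:
        baskets[session_id].add(document_id)
    return baskets


def count_co_inspections(baskets):
    """Count, for every document, how many baskets it shares with each other document."""
    counts = defaultdict(Counter)
    for documents in baskets.values():
        for viewed, other in permutations(documents, 2):
            counts[viewed][other] += 1
    return counts


def recommend(counts, document_id, min_count=2):
    """Rank co-inspected documents; min_count is a placeholder for the statistical test."""
    ranked = sorted(counts.get(document_id, Counter()).items(),
                    key=lambda item: (-item[1], item[0]))
    return [other for other, n in ranked if n >= min_count]


inspections = [("s1", "A"), ("s1", "B"),
               ("s2", "A"), ("s2", "B"), ("s2", "C"),
               ("s3", "A"), ("s3", "C"), ("s3", "A")]
counts = count_co_inspections(build_baskets(inspections))
print(recommend(counts, "A"))
```

Because baskets are sets, repeated inspections of the same document within one session are counted only once.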

The Web pages with the recommendation lists can either be generated on the fly at each user access or, to increase performance, be pregenerated after the new recommendations have been calculated. The choice depends on the frequency of recommendation updates, on the performance of the Web-page generation, and on the available hard-disk space.
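
The two options can share the same rendering code, as the following Python sketch suggests. The function `render_recommendation_page` could be called on the fly by a CGI script, while `pregenerate_pages` writes one static file per document after each update. The page layout, the file naming scheme, and both function names are assumptions made for this example, and document identifiers are assumed to be safe to use as file names.

```python
import html
from pathlib import Path


def render_recommendation_page(title, recommended_titles):
    """Build a minimal HTML page listing the recommended titles."""
    items = "\n".join(f"    <li>{html.escape(t)}</li>" for t in recommended_titles)
    return (
        "<html><body>\n"
        f"  <h1>Recommendations for {html.escape(title)}</h1>\n"
        "  <ul>\n"
        f"{items}\n"
        "  </ul>\n"
        "</body></html>\n"
    )


def pregenerate_pages(all_recommendations, titles, output_dir):
    """Write one static page per document; returns the number of pages written."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = 0
    for document_id, recommended_ids in all_recommendations.items():
        page = render_recommendation_page(
            titles[document_id], [titles[r] for r in recommended_ids]
        )
        (output_dir / f"{document_id}.html").write_text(page, encoding="utf-8")
        written += 1
    return written
```

Pregeneration trades disk space for response time: every document with recommendations gets a file, whether or not its page is ever requested.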

Future Research

The transfer of repeat-buying theory suggests that today’s libraries have much in common with retail chain management, at least as far as the underlying stochastic purchase models are concerned. For further research, the following topics will be addressed:

  • Higher-order associations. Ehrenberg’s model generalizes to higher-order associations that can be used to generate path-specific recommendations for individual users.
  • Fashions, trends, dynamics. By using sliding time windows over the data set, nonstationarities can be identified by simple statistical tests. This could be used to summarize purchase histories for individual documents of similar document categories in order to visualize reading trends and fashions.
  • Personalization. Combining self-selection according to experience in a scientific field allows recommendations to be generated that are tailored to the individual interest and ability profile of a patron. For example, in an academic environment such recommendations support individual tutoring of students with heterogeneous educational backgrounds.
  • Content analysis and clustering for catalog management. Significant outliers provide a new source for document clustering which may lead to an improved access by users.
  • Collection management. Repeat-buying models aggregated at the level of document categories could be used for implementing a more customer-centric collection policy.

Acknowledgment

The authors gratefully acknowledge the funding of the project “Scientific Libraries in Information Markets” by the Deutsche Forschungsgemeinschaft within the scope of the research initiative “V3D2” (DFG-SPP 1041).

References

1. Rüdiger Klatt et al., Nutzung und Potenziale der innovativen Mediennutzung im Lernalltag der Hochschulen (Dortmund, Germany: BMBF-Studie, 2001). Accessed Oct. 6, 2003, www.stefi.de.

2. Marko Balabanovic, “An Adaptive Web Page Recommendation Service,” in Proceedings of the 1st International Conference on Autonomous Agents, Feb. 1997 (Marina del Rey, Calif.: ACM Pr.), 378–85; Marko Balabanovic and Yoav Shoham, “Fab: Content-Based, Collaborative Recommendation,” Communications of the ACM 40, no. 3 (Mar. 1997): 66–72; The Stanford Digital Libraries Group, “The Stanford Digital Library Project,” Communications of the ACM 38, no. 4 (Apr. 1995): 59–60.

3. David Hicks, Klaus Tochtermann, and Andreas Kussmaul, “Augmenting Digital Catalog Functionality with Support for Customization,” in Proceedings of 3d International Conference on Asian Digital Libraries (Berlin: Springer, 2000), 155–61.

4. Robert Wilensky et al., “Reinventing Scholarly Information Dissemination and Use,” Technical Report, University of California, Berkeley, 1999. Accessed Oct. 6, 2003, http://elib.cs.berkeley.edu/pl/about.html.

5. Johan Bollen and Luis M. Rocha, “An Adaptive Systems Approach to the Implementation and Evaluation of Digital Library Recommendation Systems,” in J. Borbinha and T. Baker, eds., Proceedings of the 4th European Conference on Digital Libraries, vol. 1923 of LNCS (Heidelberg, Germany: Springer, 2000), 356–59.

6. Andreas Geyer-Schulz, Michael Hahsler, and Maximilian Jahn, “A Customer Purchase Incidence Model Applied to Recommender Services,” in R. Kohavi et al., eds., Proceedings of the WebKDD–Mining Log Data across All Customer Touchpoints, vol. 2356 of Lecture Notes in Artificial Intelligence LNAI (Berlin: ACM, Springer-Verlag, 2002), 25–47.

7. G. Salton and C. Buckley, “Improving Retrieval Performance by Relevance Feedback,” Journal of the American Society for Information Science 41, no. 4 (1990): 288–97; Ioannis Papadakis, Ioannis Andreou, and Vassileios Chrissikopoulos, “Interactive Search Results,” in M. Agosti and C. Thanos, eds., Research and Advanced Technology for Digital Libraries, vol. 2458 of LNCS (Berlin: Springer, 2002), 448–62; Ann Apps and Ross MacIntyre, “Prototyping Digital Library Technologies in zetoc,” in M. Agosti and C. Thanos, eds., Research and Advanced Technology for Digital Libraries, vol. 2458 of LNCS (Berlin: Springer, 2002), 309–23; Norbert Fuhr et al., “Daffodil: An Integrated Desktop for Supporting High-Level Search Activities in Federated Digital Libraries,” in M. Agosti and C. Thanos, eds., Research and Advanced Technology for Digital Libraries, vol. 2458 of LNCS (Berlin: Springer, 2002), 597–612.

8. Andreas Geyer-Schulz and Michael Hahsler, “Pinboards and Virtual Libraries—Analysis Patterns for Collaboration,” Technical Report 1, Institut für Informationsverarbeitung und -wirtschaft, WirtschaftsUniversität Wien, A-1090 (Wien: Institut für Informationsverarbeitung und -wirtschaft, 2001). Accessed Oct. 6, 2003, wwwai.wu-wien.ac.at/~hahsler/research/virlib_working2001/virlib.pdf.

9. Stuart Russell and Peter Norvig, “Introduction and Survey of AI,” in Artificial Intelligence: A Modern Approach—The Intelligent Agent Book (Upper Saddle River, N.J.: Prentice-Hall, 1995), 31.

10. Die Deutsche Bibliothek, MAB2: Maschinelles Austauschformat für Bibliotheken, 2d ed. (Leipzig/Frankfurt am Main: Die Deutsche Bibliothek, 1999).

11. Robert Walker Cooley, “Web Usage Mining: Discovery and Application of Interesting Patterns from Web Data” (Ph.D. diss., University of Minnesota, 2000); Pang-Ning Tan and Vipin Kumar, “Discovery of Web Robot Sessions Based on Their Navigational Patterns,” Data Mining and Knowledge Discovery 6 (2002): 9–35.

12. A. S. C. Ehrenberg, Repeat-Buying: Facts, Theory, and Applications, 2d ed. (London: Charles Griffin, 1988).

13. Ibid.

14. Andreas Geyer-Schulz et al., “Behavior-Based Recommender Systems As Value-Added Services for Scientific Libraries,” in Hamparsum Bozdogan, ed., Statistical Data Mining and Knowledge Discovery (Boca Raton, Fla.: Chapman and Hall/CRC, 2003), 433–54.

15. Michael A. Spence, Market Signaling: Information Transfer in Hiring and Related Screening Processes (Cambridge, Mass.: Harvard Univ. Pr., 1974).

16. Geyer-Schulz et al., “Behavior-Based Recommender Systems.”

17. P. Charlton and A. S. C. Ehrenberg, “Customers of the LEP,” Applied Statistics 25 (1976): 26–30; C. Chatfield, A. S. C. Ehrenberg, and G. J. Goodhardt, “Progress on a Simplified Model of Stationary Purchasing Behavior,” Journal of the Royal Statistical Society A 129 (1966): 317–67; N. Powell and J. Westwood, “Buyer-Behaviour in Management Education,” Applied Statistics 27 (1978): 69–72; H. S. Sichel, “Repeat-Buying and the Poisson-Generalised Inverse Gaussian Distributions,” Applied Statistics 31 (1982): 193–204; SRS, “The SRS Motorists Panel,” Technical Report (London: Sales Research Service, 1965); Aske Research, “The Structure of the Toothpaste Market,” Technical Report (London: Aske Research Ltd., 1975).

18. Udo Wagner and Alfred Taudes, “Stochastic Models of Consumer Behaviour,” European Journal of Operational Research 29, no. 1 (1987): 1–23.

19. Walter Böhm et al., “Repeat-Buying Theory and Its Application for Recommender Services,” in Otto Opitz and Manfred Schwaiger, eds., Exploratory Data Analysis in Empirical Research, vol. 22 of Studies in Classification, Data Analysis, and Knowledge Organization, Gesellschaft für Klassifikation e.V. [German Classification Society] (Heidelberg: Springer-Verlag, 2002), 229–39.

20. Andreas Geyer-Schulz, Michael Hahsler, and Anke Thede, “Comparing Association-Rules and Repeat-Buying-Based Recommender Systems in a B2B Environment,” in Between Data Science And Everyday Web Practice, Studies in Classification, Data Analysis, and Knowledge Organization (Heidelberg: Springer-Verlag, 2003).

21. Geyer-Schulz et al., “Behavior-Based Recommender Systems,” 433–54; Andreas Geyer-Schulz, Andreas Neumann, and Anke Thede, “Others Also Use: A Robust Recommender System for Scientific Libraries,” in T. Koch and I. T. Solvberg, eds., Research and Advanced Technology for Digital Libraries, 7th European Conference, ECDL 2003, Trondheim, Norway, 2003: Proceedings, Lecture Notes in Computer Science #2769 (Berlin: Springer Verlag, 2003), 113–25.

22. R. D. Lawrence et al., “Personalization of Supermarket Product Recommendations,” Data Mining and Knowledge Discovery 5 (2001): 11–32.


Andreas Geyer-Schulz (geyer-schulz@em.uni-karlsruhe.de), Andreas Neumann (neumann@em.uni-karlsruhe.de), and Anke Thede (thede@em.uni-karlsruhe.de) are Researchers at the Schroff Chair of Information Services and Electronic Markets, Institute for Information Engineering and Management, Department of Economics and Business Engineering, Universität Karlsruhe (TH), Germany.


| ITAL Vol. 22, No. 4|