Chapter 11: Relating content to the user – Academic and Professional Publishing

Joy van Baren

Abstract:

This chapter introduces User Experience and explains its importance to the future of academic and professional publishing. The latest insights into information-seeking behaviour of researchers are presented along with their implications for information solutions. The chapter proceeds to discuss various approaches to relate content to the user including interactivity, personalisation, text mining, and interoperability and workflow support. New challenges facing the industry related to versioning and trust are described as well as proposed initiatives to address them, such as the CrossMark service.

Key words:

user experience, user-centred design, researcher workflow, information-seeking behaviour, personalisation, text mining, information extraction, data mining, interoperability, APIs, CrossMark

Introduction: user experience in the publishing industry

The Digital Revolution has dramatically changed the way in which academic and professional audiences access and use information. In their competition for online readership, information providers try to increase the discoverability of their content, present it in an attractive format, and surround it with useful features and services to lure readers back to their sites. For the first time in the history of publishing, publishers truly need to understand the needs, motivations, expectations and behaviour of their customers and users in order to address them successfully online.

Donald Norman coined the phrase ‘User Experience’ in the mid-1990s to refer to the range of psychological and behavioural responses surrounding human–system interactions. The International Organization for Standardization (2009) defines User Experience as ‘A person’s perceptions and responses that result from the use and/or anticipated use of a product, system or service.’ A related but somewhat narrower term, usability, is defined as ‘The extent to which a system, product or service can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use.’ The invention and subsequent uptake of these terms, accelerated by the widespread adoption of the Internet, signal that various industries have realised the need to adopt an increasingly user-centred design philosophy in their product development processes.

A user-centred design process comprises three distinct stages:

 Understand: explore the users, their needs and their workflow. Research methods for understanding users include direct observation, diary studies, interviews and surveys.

 Design: translate insight from the understanding process into an information architecture, a range of functionalities and a look and feel. This can be achieved through applying interaction design, information architecture and visual design.

 Evaluate: validate design assumptions and assess whether the proposed solution successfully addresses user needs and adheres to usability standards. Evaluation methods include formal usability testing, heuristic analysis and cognitive walkthrough.

The user-centred design process is iterative in nature, meaning that the three stages are repeated to optimise different aspects of the product, or to progressively adapt the design until its User Experience is deemed optimal.

Adherence to a user-centred design process results in a User Experience that enables orientation, transparency, consistency and accessibility. Questions such as ‘where am I?’, ‘what can I do?’ and ‘what will happen next?’ should at all times be easy to answer for different types of users. As a result, users can accomplish their tasks in an efficient and effective manner, and will feel comfortable using the system. Various studies have found correlations between user experience and customer loyalty (Temkin et al., 2009; Garrett, 2006). There is also evidence pointing to savings in development spend that can be realised through ‘getting it right’ the first time around, as well as savings in training and support services realised through better usability. Pressman (1992) and Gilb (1998) investigated studies across different industries showing that every dollar invested in User Experience brings between 2 and 100 dollars in return; IBM (2011) uses the rule of thumb that every dollar invested returns 10–100 dollars.

The publishing industry has been relatively slow to embrace User Experience, perhaps because it was never at the heart of traditional print processes. In a recent article on digital publishing, Brett Sandusky (2011) stated that ‘the publishing industry has a history of creating products for a “customer” that they never speak to, speak of, see, interact with, or consider’. In their haste to migrate content from the print into the digital realm, publishers have typically not strayed far from traditional print formats. When comparing an eBook with a print book, or a PDF with a print article, one is struck by how similar they look. Not only does this approach make for a suboptimal User Experience by not taking into account the limitations of the medium, it also fails to take advantage of the possibilities to make digital content more attractive to and relevant for users.

A further reason why the publishing industry has not used user-centred design processes to their full potential is that the customer and the user are often not the same person. Publishers’ conversations focus primarily, if not exclusively, on the librarians and information managers who are responsible for the acquisition and roll-out of content and workflow solutions. Their needs and level of understanding with regard to online content and systems differ from those of the people who eventually use the content in their daily work: the researchers. Although publishers are well acquainted with researchers in the capacity of authors, they are less well informed on their needs as consumers of content.

A final reason for the somewhat unhappy marriage between publishing and User Experience is that content is not a traditional area of focus for User Experience. Design tends to concentrate on the presentation, the interface, rather than its content. In publishing, content is of course the main asset offered to the customer, and therefore needs to be part of the User Experience equation. In light of the progressive commoditisation of content, future solutions will be expected to provide added value by relating content to the end-users and enabling them to efficiently and effectively apply it in the context of their daily work.

This chapter will investigate findings from market and user research providing insight into the workflow and online behaviour of researchers. It then explores characteristics of digital content and electronic environments that can contribute to relating content to the user, as well as challenges that must be addressed in this endeavour.

Researcher workflow: insights from market and user research

Academics and professionals seek to find and consume content not for leisure but to enable the successful performance of their jobs. For knowledge workers in academia and industry, missing a relevant article may mean the difference between acceptance and rejection of an article or grant proposal, or delay in the solving of a critical problem. Researchers have developed a structural process of systematic investigation that has remained fundamentally unchanged through the centuries. Although new media have the capability to change aspects of this process, they can only do so by providing added value. Users are typically unwilling to adapt their behaviour for the sake of using a new tool, unless it allows them to perform an activity faster or better than before.

Industry research and consultancy firms have emphasised the importance of integrating with customer workflow. In an Electronic Publishing Services report, Worlock and Evans (2003) presented five case studies of information providers who managed to deliver significant end-user benefits by adapting existing solutions or developing new web-based solutions based on workflow research and ongoing discussions with their customer base. Brunelle (2006) urged publishers to ‘be the scientific user’ and to directly tie potential product features to specific and explicit user needs. To follow this advice one must of course first become intimately acquainted with the workflow and information needs of researchers.

Brunelle (2006) surveyed information use by academics and professionals, reaching 260 respondents in the scientific domain. Perhaps unsurprisingly, nearly all respondents (98 per cent) indicated that they use external information for their jobs. In the majority of cases (77 per cent) they find this information themselves, without the help of others. The physical library has clearly fallen from grace as a valued resource, with only 3 per cent mentioning it as a go-to place for information.

The total amount of time spent on information averaged a significant 11 hours, a 22 per cent increase since 2001. A notable finding is that almost as much time is spent on gathering the information (45 per cent) as on reading and applying it (55 per cent). In nearly one-third of cases the information-seeking process fails, and the scientist has nothing to show for the time invested.

In an ethnographic field study, Jones (2005) examined life scientists’ use of information resources, analysing information activity in the context of real-life research projects. He found that in spite of the abundance of information sources available through the library, life scientists typically adopted a small number of simple tools such as Google or PubMed, chosen for transparent interfaces and high expected utility.

Jones also looked at how information needs vary across the different stages of the scientific workflow, as illustrated in Figure 11.1. He found that information seeking and usage are highest during literature review, and second highest during manuscript writing. Not only the required amount of information but also the nature of the question varies between stages of the workflow. In addition to different users having distinct strategies for finding information, the same user may have different needs according to the eventual purpose of the information and the broader task they are trying to accomplish. For example, a scientist looking up the exact details of an article to use in a reference searches very differently from the same scientist conducting a broad search for new information in the field.

Figure 11.1 Composite of information activity in research life cycle: A, exhaustive literature review; B, organise, discuss, and share findings; C, initial experiments; D, additional literature review to establish context for initial findings; E, further experiments; F, write draft manuscript; G, additional literature review to establish context for writing; H, write final manuscript

In a series of studies focusing on information-seeking behaviour carried out between 2002 and 2012, the Elsevier User Centered Design team interviewed and surveyed hundreds of scientific researchers. The following recurring challenges were most frequently mentioned as hindering the effective gathering of knowledge.

Information overload

The challenge most frequently mentioned by scientific researchers is that of information overload. With a steep increase in the number of researchers and articles published it has become challenging to filter out irrelevant information. Combined with increased pressure to spend time on research management activities such as grant writing, information overload contributes to the high workload of scientific researchers.

Many different sources

Modern scientists have a great number of different content sources available to them, such as abstract and indexing databases, journal websites, web search engines, institutional repositories, library catalogues, blogs, eBooks and so on. An often heard frustration is the need to repeat the same search in multiple information sources, and more or less manually deduplicate the results.

Need for comprehensiveness

As noted earlier, the consequences of not finding relevant content in a timely manner can be severe. Many researchers describe their fear of missing the one article that could have helped win a grant, get their research published or inspire a new idea. Although the need for comprehensiveness has always been present, it is compounded by information overload and the time and effort involved in going to multiple sources.

Finding the right unit of information

Scientific information is contained in documents: articles and books. Although finding a document may in some situations be the ultimate purpose of an online search, in many cases it is not. The primary goal of the searching scientist is to find facts, answers and ideas. Some facts are buried deep in the details of a document, and some patterns may only be discovered through reading multiple documents. Because of this mismatch between information need and information granularity, a lot of time is associated with finding answers, and sometimes they are not found at all.

Workflow support

Once the right information is found, there is still the issue of applying it in the context of one’s workflow. Researchers often spend distinct timeslots on finding multiple relevant articles and then use them during subsequent work, sometimes months or even years later. Scientists indicate that the storing, managing and retrieving of this information is far from smooth. Integration between the different tools used along the scientific workflow is a pain point, and disproportionate amounts of effort can be required for simple actions to port information from one tool into the next, such as citing an article found online in a text editor, sharing it with a colleague through an email client or saving the PDF whilst retaining the appropriate metadata to efficiently retrieve it later on.

Quality and authority of information

A final problem that has increased with the growing amount of information available online is establishing the quality of the information and its source. Peer review is of course a traditional quality distinction, but most search engines do not distinguish between published and non-published content. Yet these search engines are the primary source used by many students and early career academics. How can academics verify that they are dealing with information that is reliable, accurate and current?

The implications of these findings are clear: the process of finding, consuming and utilising content is imperfect, and the available tools fail to cater to the specific information needs along the academic workflow.

The role of the publisher traditionally has been to select the highest quality content for publication without considering how it will be used. To successfully compete in the current market this role will have to expand to ensure the discovery and successful application of the knowledge communicated in the published content. In the next four sections we will look at several technologies that have the potential to contribute to addressing these critical user needs.

Interactivity, personalisation and dialogue

The Introduction noted that there are often not many differences between print articles or books and their digital equivalents. A common pitfall when transitioning from print to electronic is to merely repurpose print content, thereby enforcing the limitations of the original print container. Characteristics of electronic environments such as interactivity, personalisation and dialogue can become critical assets for creating a compelling online user experience.

Paper is a static medium that at best can present different versions of a publication at different moments in time and point to other documents through text references. The instant interaction enabled on the web opens up a new range of possibilities for preserving context, presenting multimedia and allowing users to play around with the content presented. Experimental data can not only be looked at, but can be downloaded. Movies or audio fragments can be presented and viewed in context, without leaving the page. Embedded interactive components let readers manipulate images or charts, for example by rotating them, zooming and even adding or removing data. On paper a reference list is just a list, but electronically it can be sorted according to characteristics such as publication date or number of citations, making it easier to explore and scan.

A publication never stands alone: it cites a set of previously published documents as references and is in turn cited by a number of documents published later. Whereas in the past accessing this related content required another walk to the library, it can now be instantly accessed through linking between online documents. Interlinking between different content types can help to further enhance contextualisation and interpretation of content. An example of this is linking from detailed technical publications to subject-specific encyclopaedias or dictionaries that help explain the terminology used in those publications and provide the background information required for understanding them. Significant efficiency gains can be introduced into the user workflow by providing this information in the context of the original article, at the push of a button.

A unique asset of electronic environments is that end-users can customise them according to their needs and preferences. At the most basic level the way the content is presented can be adapted. Interface colours, font sizes and page layout are increasingly subject to customisation. In some cases users can modify the location of features and information elements on the page, for example through ‘drag and drop’. They can specify which content is most relevant to them, thereby influencing what shows up, or in which order. The system should of course remember all of these preferences and automatically apply them for future viewing of content. Tools such as iGoogle (http://www.google.com/ig) and NetVibes (www.netvibes.com) enable users to set up a personal information dashboard in which they can customise content, layout and appearance according to their preferences.

In addition to the user manually providing information about their preferences, systems can personalise content and user experience based on implicit data. Online behaviour such as items viewed, search history and features utilised can provide the basis for automatic content recommendations and optimisation of the interface. The system can also look at the choices of users with similar characteristics, online behaviour and content preferences, and proactively recommend content based on this analysis; Amazon (www.amazon.com), for example, has successfully used this approach in its ‘people who bought this book also bought…’ feature.
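The ‘also bought’ style of recommendation can be illustrated with a minimal Python sketch that counts how often items appear together in users’ histories. The histories, the item identifiers and the function name `also_viewed` are hypothetical, and this is only one simple way of deriving recommendations from implicit data; it does not describe how Amazon or any other provider implements its feature.

```python
from collections import Counter

# Hypothetical viewing histories: user -> set of article identifiers.
histories = {
    "user_a": {"art1", "art2", "art3"},
    "user_b": {"art1", "art3"},
    "user_c": {"art2", "art4"},
    "user_d": {"art1", "art3", "art4"},
}


def also_viewed(item, histories, top_n=3):
    """Rank other items by how many users viewed them together with `item`."""
    counts = Counter()
    for viewed in histories.values():
        if item in viewed:
            # Every other item in the same history co-occurs with `item` once.
            counts.update(viewed - {item})
    return counts.most_common(top_n)


recommendations = also_viewed("art1", histories)
print(recommendations)
```

In this toy data set, `art3` is the strongest recommendation for readers of `art1` because three users viewed both. A production system would also have to deal with sparse data, popularity bias and privacy, none of which is addressed here.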

Scholarly publications aim to contribute to an ongoing conversation. This dialogue has traditionally taken place on paper, at conferences and in offices or labs. Although social media will never replace these interactions, they have the potential to capture at least some aspects of them. Users can share, tag, recommend or rate online content. They can review publications and provide responses through commenting features or on blogs, wikis and discussion forums. Thus, the digital article becomes a living document reflecting not only the thoughts of the author(s) but also the reactions of its readership.

Aggregators such as FriendFeed (http://friendfeed.com) provide customised feeds containing content shared by friends or colleagues, and facilitate discussion around this content. Faculty of 1000 (http://f1000.com) is a website for researchers and clinicians that provides ratings and reviews of research papers by a panel of peer-nominated experts in the field. These examples illustrate how online social or professional networks can be used as a filter for finding relevant information, and finding it fast.

Although the options described here do not fundamentally change content, they do empower users to interact with content in a way that is relevant to their unique needs and preferences, thereby offering an experience far superior to print.

Metadata and text mining

The proliferation of scholarly literature makes it more and more challenging for researchers to stay current in their field. The push for interdisciplinarity further increases pressure on the individual, who has to stay abreast of relevant developments in adjacent fields as well as their own. A simple keyword search generates thousands if not tens of thousands of hits in scientific databases or web search engines. It would be an impossible feat to scan through all of these results manually, let alone read and interpret them.

Publication metadata (e.g. title, source, authors and references) can be utilised as a framework for discovery. First, metadata are a key enabler for online search. In addition to making searches more precise by searching in specific metadata elements, metadata can be used in faceted search. In faceted search, sometimes also referred to as faceted navigation, a collection of documents is analysed by a number of discrete attributes or facets. For scholarly content, these facets often include author name, subject area, publication year, journal and affiliation. Values in each of these facets are presented in order of occurrence frequency, and provide the user with a contextual framework for understanding the scope of their search as well as intuitive options for further refining it.
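As a rough illustration of faceted search, the following Python sketch computes facet values in order of occurrence frequency and then refines a result set by one chosen value. The records, field names and the functions `facet_counts` and `refine` are invented for this example and do not represent any particular database.

```python
from collections import Counter

# Hypothetical search results, each described by a few metadata fields.
results = [
    {"title": "Paper A", "author": "Smith", "year": 2009, "journal": "J. Chem."},
    {"title": "Paper B", "author": "Jones", "year": 2010, "journal": "J. Chem."},
    {"title": "Paper C", "author": "Smith", "year": 2010, "journal": "Bio Lett."},
    {"title": "Paper D", "author": "Smith", "year": 2010, "journal": "J. Chem."},
]


def facet_counts(records, facets):
    """For each facet, list its values ordered by how often they occur."""
    return {f: Counter(r[f] for r in records).most_common() for f in facets}


def refine(records, facet, value):
    """Narrow the result set to records whose facet has the chosen value."""
    return [r for r in records if r[facet] == value]


facets = facet_counts(results, ["author", "year", "journal"])
refined = refine(results, "year", 2010)
print(facets["author"])
print([r["title"] for r in refined])
```

The counts give the user an overview of the scope of the search (here, three of the four results are by the same author), while `refine` corresponds to clicking one facet value to narrow the list.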

Metadata can also be used for interlinking content: once an interesting article has been found it becomes possible to follow a variety of trails to find related information. In most literature databases documents by the same author, documents citing the current document or other documents citing the same references are only a mouse click away. This capacity greatly promotes the discoverability of ‘hidden gems’ and helps users find multiple inroads into the abundance of information available for a given topic.

A technology that goes several steps further than the analysis of simple metadata is text mining. Text mining refers to a range of interdisciplinary methods and technologies used to extract information from a collection of free (unstructured) text documents and identify patterns in that information, leading to the discovery of new knowledge (http://www.nactem.ac.uk). Text mining employs methodologies from data mining, information retrieval, machine learning, statistics and natural language processing.

The text mining process can be characterised by three main stages: Information Retrieval, Information Extraction and Data Mining.

Information Retrieval refers to the selection of a subset of relevant documents from a larger collection, based on a user- or system-generated query. It can be based on searching through document metadata, but is increasingly based on full text search. Information Retrieval in this context is a pre-processing step to gather a relevant body of documents for further analysis, but developments in this field can also be used to improve the quality of search for end-users.
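A minimal sketch of this retrieval step, assuming a tiny in-memory collection and a deliberately naive scoring rule (the number of times query terms occur in the full text), might look as follows in Python. The documents and the function names are hypothetical; real retrieval systems use far more refined indexing and ranking.

```python
import re
from collections import Counter

# Hypothetical mini-collection: document id -> full text.
collection = {
    "doc1": "Protein kinase activity regulates cell growth.",
    "doc2": "Library budgets and journal subscriptions in 2010.",
    "doc3": "Kinase inhibitors slow tumour growth in mice.",
}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, collection):
    """Return ids of documents containing at least one query term, best match first."""
    query_terms = set(tokenize(query))
    scores = {}
    for doc_id, text in collection.items():
        term_counts = Counter(tokenize(text))
        score = sum(term_counts[t] for t in query_terms)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties are broken alphabetically by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


subset = retrieve("kinase growth", collection)
print(subset)
```

The resulting subset, not the whole collection, is what would be passed on for further analysis.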

Information Extraction transforms unstructured text into structured information by locating specific pieces of data in text documents and extracting them as entities. Simple examples of entities are people, companies, dates or locations. As a next step, relationships between extracted entities can be elicited. For example, a company can be located in a country, or an event linked to a date. Natural language processing principles such as part-of-speech tagging (classifying words in a sentence as verbs, nouns, adjectives, etc.) and word sense disambiguation (selecting the most appropriate of possible meanings) are used to help make sense of the text and structure the information extracted from it. Figure 11.2 shows an example of entities and their relationships extracted from text and placed into a structured representation called a template. Extracting entities and analysing the relationships and interactions between extracted entities enables advanced semantic applications that reveal patterns in content.

Figure 11.2 Information extraction: from free text to structured template
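The idea of filling a template from free text can be sketched in a few lines of Python. The sentence below is invented and is not the example shown in Figure 11.2; the hand-written regular expressions stand in for the trained natural language processing components a real system would use, and the slot names and the `located_in` relation are assumptions made for illustration.

```python
import re

# Invented example sentence (not the one shown in Figure 11.2).
text = "Acme Corp opened an office in Germany on 12 March 2010."

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Hand-written patterns standing in for trained entity recognisers (an assumption).
PATTERNS = {
    "company": r"\b[A-Z][a-z]+ (?:Corp|Ltd|Inc)\b",
    "location": r"(?<=\bin )[A-Z][a-z]+",
    "date": r"\b\d{1,2} (?:" + MONTHS + r") \d{4}\b",
}


def extract_template(sentence):
    """Fill a flat template with the first match of each entity pattern."""
    template = {}
    for slot, pattern in PATTERNS.items():
        match = re.search(pattern, sentence)
        template[slot] = match.group(0) if match else None
    return template


def extract_relations(template):
    """Crude rule: a company and a location in one sentence are assumed to be related."""
    if template["company"] and template["location"]:
        return [(template["company"], "located_in", template["location"])]
    return []


template = extract_template(text)
relations = extract_relations(template)
print(template)
print(relations)
```

Patterns of this kind break down quickly on real text, which is one reason why part-of-speech tagging and word sense disambiguation are needed in practice.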

Data Mining refers to the computational process of analysing patterns in large sets of data in order to discover new knowledge. Once text has been transformed into structured data, the data can be compared with those extracted from other documents to find similarities and patterns. Examples of frequently analysed entities and relationships in the scientific domain are interactions between proteins and associations between genes and diseases (Ananiadou et al., 2006).
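One very simple form of such pattern analysis is counting which extracted entities are mentioned together across documents. The Python sketch below assumes the extraction stage has already produced a set of entities per document; the entity names, the `min_support` threshold and the function name are hypothetical, and co-occurrence is only a crude proxy for a genuine association.

```python
from collections import Counter
from itertools import combinations

# Hypothetical output of the extraction stage: document id -> entities found.
extracted = {
    "doc1": {"GENE_A", "DISEASE_X"},
    "doc2": {"GENE_A", "DISEASE_X", "GENE_B"},
    "doc3": {"GENE_B", "DISEASE_Y"},
    "doc4": {"GENE_A", "DISEASE_X"},
}


def cooccurrences(extracted, min_support=2):
    """Count entity pairs mentioned in the same document; keep the frequent ones."""
    pair_counts = Counter()
    for entities in extracted.values():
        # Sorting makes each pair appear in one canonical order.
        for pair in combinations(sorted(entities), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_support}


frequent_pairs = cooccurrences(extracted)
print(frequent_pairs)
```

Here only the pair `GENE_A` and `DISEASE_X` passes the threshold, suggesting a candidate association that a researcher could then examine in the underlying documents.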

The three stages of text mining to some extent mimic the process that humans follow when searching, reading and analysing literature. The human brain is an impressive but imperfect analytical tool; it simply cannot process the vast amounts of data that computers are capable of. Relationships that are buried in the literature, hidden from the human eye, may be discovered through the application of text mining technology. Another difference is that whereas humans are subjective, computers are objective; unburdened by prior knowledge and expectations, they are less prone to selection bias. In a way, text mining allows the data to speak for themselves.

Text and data mining are frequently coupled with visualisation techniques to facilitate the discovery of patterns in information. Visualisation techniques such as tag clouds, heat maps, tree maps, (geographical) scatter plots, stream graphs and time series can all be used to expose relationships between entities.

In addition to detecting patterns, text mining can be utilised to automatically assign documents to groups without human judgment or intervention. Two approaches in this area are categorisation and clustering. Categorisation refers to the automated classification of documents into predefined categories, for example scientific subject areas or disciplines, similar to traditional manual indexing systems. Clustering relates to the grouping of documents into organic groups or clusters based on statistical, lexical and semantic analysis. In this latter case it is not only the grouped documents that are of interest, but also the derivation of the themes emerging from this bottom-up process.
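The contrast between the two approaches can be sketched in Python under simplifying assumptions: categorisation is reduced to term overlap with predefined category vocabularies, and clustering to a greedy grouping by Jaccard similarity of word sets. The documents, category terms, threshold and function names are all hypothetical; production systems rely on much richer statistical, lexical and semantic analysis.

```python
import re

# Hypothetical documents and predefined categories with indicative terms.
documents = {
    "d1": "gene expression in tumour cells",
    "d2": "tumour cells and gene therapy",
    "d3": "galaxy formation and dark matter",
    "d4": "dark matter in galaxy clusters",
}
CATEGORIES = {
    "life sciences": {"gene", "tumour", "cells"},
    "astronomy": {"galaxy", "matter", "dark"},
}


def terms(text):
    """Return the set of lower-case words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def categorise(text):
    """Top-down: assign the predefined category sharing the most terms."""
    return max(CATEGORIES, key=lambda c: len(CATEGORIES[c] & terms(text)))


def jaccard(a, b):
    """Share of terms two sets have in common."""
    return len(a & b) / len(a | b)


def cluster(documents, threshold=0.3):
    """Bottom-up: add each document to the first cluster whose seed document
    is similar enough, otherwise start a new cluster."""
    clusters = []  # each cluster is a list of ids; the first id is the seed
    for doc_id, text in documents.items():
        for members in clusters:
            if jaccard(terms(text), terms(documents[members[0]])) >= threshold:
                members.append(doc_id)
                break
        else:
            clusters.append([doc_id])
    return clusters


labels = {doc_id: categorise(text) for doc_id, text in documents.items()}
groups = cluster(documents)
print(labels)
print(groups)
```

Categorisation needs the category vocabularies up front, whereas clustering discovers the two groups without them; the themes of each cluster would then have to be derived from the terms its members share.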

Geißler (2008) has explored the diverse scenarios through which these approaches have been adopted by publishing companies through case studies involving Thomson Scientific, Elsevier, Springer Science & Business Media and LexisNexis. In addition to these applications by major publishers, it is remarkable how many examples of text mining can be found in the academic domain in the form of solutions created by scientists, for scientists. Many of these applications stem from the field of bioinformatics, which by nature deals with enormous amounts of data. Krallinger et al. (2008) maintain an excellent compendium reviewing the many available text mining applications.

Text mining is a rapidly evolving field with applications that are becoming increasingly used and valued. Increasing computational capabilities coupled with the availability of digital content hold the promise for further growth and experimentation, allowing the field to reach maturity. However, there are several difficulties that have yet to be addressed. Natural language is characterised by ambiguities, which complicates the accurate extraction of entities and relations. Scientists use many acronyms and abbreviations, which can have different meanings in different disciplines. Scientific terminology can be inconsistent or imprecise, and is subject to constant evolution. Words can have different meanings, and names many variants. Developing systems capable of resolving such ambiguities is far from trivial. They require hand-annotated training data or manually constructed semantic lexicons, and their output needs to be reviewed and revised. There is no ‘one size fits all’ text mining solution, as different communities have different information needs, and different questions may require different levels of accuracy.

In a study commissioned by the Publishing Research Consortium, Smit and Van der Graaf (2011) interviewed 29 experts and surveyed 190 representatives of scholarly publishers. The results confirm expectations of a continued upward trend in journal article mining by both publishers and third parties. Several scenarios for responding to increasing content mining demands were explored. More standardised mining-friendly content formats, a shared content mining platform and commonly agreed permission rules for content mining were viewed as the most promising common solutions, with the broadest support for the first scenario.

The challenges facing researchers discussed earlier include being overwhelmed by the amount of available information as well as difficulties arising from a mismatch between the granularity of information and the type of questions that need answering. Text mining can help end-users manage their information overload and discover information which might otherwise only be found by reading many documents – or not at all – and generate new insights by making connections between previously unrelated concepts. It therefore has the potential to accelerate the discovery process for the individual researcher as well as for science in general.

Interoperability and workflow support

We have observed that scientists use many different tools to address the information needs occurring across their workflow. These tools include different content sources, internal data repositories, reference management applications, text editing software, email clients and many more. Some of those tools work well, others work less well – but most of them do not integrate well with each other. Designing and implementing a one-stop shop serving different audiences across different scholarly fields and activities resembles the pursuit of the Holy Grail! Integration is the key; instead of reinventing the wheel, publishers need to ensure that their content and solutions can be integrated with applications widely used among researchers.

Perhaps the most effective way to relate content to users is by making it directly available to them outside of the boundaries of physical or digital containers, because it allows them to flexibly integrate it with other applications. In other words: ‘your data, my way’. Structured and open dissemination of data can be achieved through Application Programming Interfaces (APIs), which are most commonly used for websites and web-based technologies. APIs make data available to third parties so that software developers can build their own solutions on top of it. The underlying philosophy is that no one knows the needs and problems of scientists as well as scientists themselves. By utilising APIs, developers can combine content from different sources and use it in the context of existing applications through mashups, thereby creating powerful productivity-enhancing applications.
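To make the notion of a mashup more concrete, the Python sketch below merges records from two sources into a single record per article, using the DOI as the shared key. The JSON payloads stand in for responses from two content APIs; the field names, DOIs and the function `mashup` are made up for illustration, and no real publisher API is called or described.

```python
import json

# Hypothetical JSON payloads standing in for responses from two different
# content APIs; field names and DOIs are made up for illustration.
publisher_response = json.loads("""[
  {"doi": "10.0000/example.1", "title": "Paper One", "journal": "Journal A"},
  {"doi": "10.0000/example.2", "title": "Paper Two", "journal": "Journal B"}
]""")
metrics_response = json.loads("""[
  {"doi": "10.0000/EXAMPLE.1", "citations": 12},
  {"doi": "10.0000/example.3", "citations": 4}
]""")


def mashup(*sources):
    """Merge records from several sources into one record per DOI."""
    merged = {}
    for source in sources:
        for record in source:
            key = record["doi"].lower()  # DOIs are case-insensitive
            merged.setdefault(key, {}).update({**record, "doi": key})
    return merged


combined = mashup(publisher_response, metrics_response)
print(combined["10.0000/example.1"])
```

Because both sources expose structured data with a common identifier, the combined record carries the bibliographic details from one source and the citation count from the other, which is the kind of integration that is hard to achieve when content is locked inside separate containers.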

The New York Times and the UK Guardian Media Group were among the first content providers to adopt this approach. In the scientific domain, Elsevier, Springer and Public Library of Science are amongst those who have created APIs and have facilitated app contests and hackathons in which developers are invited to package and use their content in innovative solutions of their own design. The developed apps are made available through application galleries and marketplaces. This model allows publishers to more effectively partner with their academic and professional communities, and to provide a platform for collaboration.

Authority, versioning and trust

The Internet has given the term ‘authority’ a new meaning. Blog search engine Technorati (http://technorati.com), for example, provides the following definition:

‘Technorati Authority measures a site’s standing and influence in the blogosphere. Authority is calculated based on a site’s linking behavior, categorization and other associated data over a short, finite period of time. A site’s authority may rapidly rise and fall depending on what the blogosphere is discussing at the moment, and how often a site produces content being referenced by other sites.’

Similar to Google’s PageRank, this definition reflects the idea that quality and relevance can be predicted based on popularity, thereby relying on the ‘wisdom of the crowd’.
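To illustrate how link-based popularity can be turned into a score, the following is a simplified textbook form of PageRank computed by power iteration. It is a sketch only: it is not the production algorithm used by Google or Technorati, and the function name, the damping value of 0.85 and the tiny example graph are assumptions chosen for illustration.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]],
             damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Simplified PageRank: a page scores highly when well-ranked pages link to it."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of the total score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on equally to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical three-site graph: sites "a" and "b" link to "c"; "c" links to "a".
    scores = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
    for site, score in sorted(scores.items(), key=lambda item: -item[1]):
        print(site, round(score, 3))
```

In this toy graph the site with the most inbound links ranks highest, and a site nobody links to ranks lowest, which is the 'wisdom of the crowd' principle in its simplest form.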

In a way, this approach is not very different from using citations, the number of times a document is mentioned in the bibliographic references of other publications, as an indication of quality and impact. Citations have long been used as measures of relative importance of publications and constitute the building blocks of journal ranking indicators such as the Impact Factor (http://thomsonreuters.com/products_services/science/academic/impact_factor/) and more recently SJR and SNIP (Colledge et al., 2010) as well as performance metrics for individuals such as the h-Index (Hirsch, 2005) (see Chapter 10). These indicators can help researchers manage information overload by guiding their decisions on which articles, books or journals to select as primary sources from the large volume of potentially relevant literature.
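As a worked example of a citation-based indicator, the h-Index can be computed directly from a list of citation counts: a researcher has index h when h of their publications have each been cited at least h times. The sketch below follows that definition; the function name and the sample citation counts are illustrative assumptions.

```python
from typing import Iterable


def h_index(citation_counts: Iterable[int]) -> int:
    """Return the largest h such that h publications have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, citations in enumerate(ranked, start=1):
        if citations >= rank:
            h = rank
        else:
            # Counts are sorted in descending order, so no later paper can qualify.
            break
    return h


if __name__ == "__main__":
    # Hypothetical author with five papers.
    print(h_index([10, 8, 5, 4, 3]))  # prints 4
```

Four of the five papers have at least four citations each, but there are not five papers with at least five citations, so the index is 4.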

The notion of authority in academia and scholarship is traditionally associated with the process of peer review. Prior to publication, the work of an author is subjected to the scrutiny of domain experts appointed as reviewers. The aim of this quality control is to enhance scientific merit of published literature by ensuring originality, theoretical foundation, transparency of scientific methods and the validity of conclusions (see Chapter 2).

Web search engines typically do not distinguish between peer-reviewed and non-peer-reviewed content, yet they are often the first source used. Multiple versions of the same publications are often available on the web. Uncorrected manuscripts may be available on author homepages, institutional repositories or preprint servers. The final published versions reside in scientific literature databases or can be found through publisher websites. Students or early career researchers may not be aware of why this distinction matters, and of the potential implications of including references in their own work that are not part of the established, trusted literature. Even for experienced scholars it can be difficult to differentiate between the available alternatives and choose the authoritative copy as a foundation for their research.

To address this challenge and enhance transparency of online scholarly publications, CrossRef (www.crossref.org) introduced the CrossMark service. The CrossMark service is made up of a logo and associated metadata. The CrossMark logo is placed on a PDF or HTML document or abstract page in order to communicate to readers that they are looking at a publisher-maintained copy of the publication. Articles in press can carry a CrossMark, but preprints cannot. Clicking on the logo brings up metadata provided by the publisher about the current status of the article and other useful information such as the CrossRef DOI, the location of associated data, publication history, agencies that funded the research and the nature of the peer-review process used to evaluate the content.

In addition to making it easier for researchers to discern the peer-reviewed version from manuscripts and preprints, CrossMark serves as a record of updates occurring post-publication. Scientific content is subject to change, and hence publications can never be considered final. Examples of changes are errata, corrections, enhancements, updates or new editions, while in some cases publications may even be withdrawn or retracted. The CrossMark metadata include status information indicating whether the published content has undergone any changes. Thus, readers can rest assured that they are utilising information that is both accurate and current.
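The kind of status information described above could be modelled roughly as follows. This is a sketch under stated assumptions: the class names, field names, status labels and example DOI are hypothetical and do not reproduce the actual CrossMark metadata schema defined by CrossRef.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Update:
    """One post-publication change (hypothetical structure)."""
    kind: str   # e.g. "erratum", "correction", "new edition", "retraction"
    date: str   # ISO 8601 date string, e.g. "2012-03-01"


@dataclass
class StatusRecord:
    """Publisher-maintained status of one publication (hypothetical structure)."""
    doi: str
    peer_reviewed: bool
    updates: List[Update] = field(default_factory=list)

    def status(self) -> str:
        """Summarise whether the content has changed since publication."""
        kinds = {update.kind for update in self.updates}
        if "retraction" in kinds or "withdrawal" in kinds:
            return "withdrawn or retracted"
        if kinds:
            return "updated"
        return "current"


if __name__ == "__main__":
    record = StatusRecord(doi="10.0000/example.1", peer_reviewed=True)
    print(record.status())
    record.updates.append(Update(kind="erratum", date="2012-03-01"))
    print(record.status())
```

Keeping the full list of updates, rather than a single flag, reflects the point that a publication may change several times and that a retraction should take precedence over earlier corrections.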

A pilot of the CrossMark service was launched in 2011, and the full service is scheduled to go live in 2012. Publishers participating in the pilot include Elsevier, Wiley, OUP and the Royal Society. Participation involves displaying the CrossMark logo in online solutions, depositing CrossMark metadata with CrossRef, and of course keeping this information up to date.

Conclusion and outlook

Over the centuries, publishers have built up excellence in the creation and evaluation of content. In the current age, where the value of content in itself is no longer a certainty, these core competencies need to be extended. Understanding how users need to access and interact with content in their daily work and taking advantage of the capabilities of digital content as well as the environment in which it is presented are critical success factors.

Semantic technologies such as text mining can help to address the challenges of information overload and the need for comprehensive, granular information. As these technologies come of age, they will change the way in which researchers engage with content and speed up the process of scientific discovery.

The structure of the scholarly article has remained relatively unchanged. Its linear, static representation is very suitable for print, but less so for online reading. When will authors truly start writing for the web? New and different structuring approaches could be aimed towards taking full advantage of the interactivity of the web, as well as optimising content for semantic analysis.

APIs allow publishers to open up their data to third parties and individual developers to build their own applications on top of published content. This is expected to result in less product-centric and more modular approaches to product development, in which content sources, applications and platforms can be combined into custom solutions with relative ease.

Publishers will have to find ways to deal with new challenges introduced by the proliferation of information online, such as the availability of multiple versions of documents and the fading distinction between peer-reviewed and non-peer-reviewed content. Although initiatives such as CrossMark attempt to rise to this challenge, their adoption and eventual effectiveness remain to be seen.

To take full advantage of the possibilities of presenting content electronically and to rise to the challenges outlined here, publishers will have to develop new skills. They will need user-centred research capabilities to gain insight into customer needs and workflows, design skills to translate those insights into an effective User Experience, and the technology expertise to successfully implement this while utilising the capabilities of online environments. Whereas large publishers may develop or gain access to such skills with relative ease, smaller publishers may have to rely on collective platforms or partnerships with subject matter experts to take the next step in presenting their content to online audiences.

References

Ananiadou, S., Kell, D. B. Text mining and its potential applications in systems biology. Trends in Biotechnology. 2006; 24(12):571–579.

Colledge, L., De Moya-Anegón, F., Guerrero-Bote, V., López-Illescas, C., El Aisati, M., Moed, H. SJR and SNIP: two new journal metrics in Elsevier’s Scopus. Serials: The Journal for the Serials Community. 2010; 23(3):215–221.

Brunelle, B. Scientists As Information Users — Product Innovation is the Name of the Game. London: Outsell Inc.; 2006.

Garrett, J. J. Customer loyalty and the elements of user experience. Design Management Review. 2006; 17(1). http://www.dmi.org/dmi/html/interests/strategy/06171GAR35.pdf

Geißler, S. New methods to access scientific content. Information Services and Use. 2008; 28(2):141–146.

Gilb, T. Principles of Software Engineering Management. Reading, MA: Addison Wesley; 1988.

IBM. Cost Justifying Ease of Use. http://www.cim.mcgill.ca/~jer/courses/hci/ref/IBM_ease_of_use.pdf, 2007.

International Organization for Standardization (ISO). ISO FDIS 9241-210. Human-centred design process for interactive systems. Gland, Switzerland: ISO, 2009.

Hirsch, J. E. An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences USA. 2005; 102(46):16569–16572.

Jones, P. H. Information practices and cognitive artifacts in scientific research. Cognition, Technology & Work. 2005; 2005:88–100.

Krallinger, M., Hirschman, L., Valencia, A. Linking genes to literature: text mining, information extraction, and retrieval applications for biology. Genome Biology 9: S8. http://zope.bioinfo.cnio.es/bionlp_tools/, 2008.

Pressman, R. S. Software Engineering: A Practitioner’s Approach. New York: McGraw-Hill; 1992.

Sandusky, B. Portraits of an industry in flux: digital publishing and UX. UX Magazine. article no. 601. http://uxmag.com/strategy/portraits-of-an-industry-in-flux, 2011.

Smit, E., Van der Graaf, M. Journal article mining: a research study into practices, policies, plans…and promises. Publishing Research Consortium. http://www.publishingresearch.net/documents/PRCSmitJAMreport20June2011VersionofRecord.pdf, 2011.

Temkin, B. D., Chu, W., Geller, S. Customer Experience And Loyalty: A Closer Look. Impact Of Usefulness, Ease Of Use, And Enjoyability Differs Across Industries. Forrester Research, 2009.

Worlock, K., Evans, N. Integrating Content With Workflow: Learning from the pioneers. London: Electronic Publishing Services Ltd (now part of Outsell), 2011.

Sources of further information

User experience

Krug, S. Don’t Make Me Think! A Common Sense Approach to Web Usability, 2nd ed. Berkeley, CA: New Riders Publishers; 2006.

Spool, J., Perfetti, C., Brittan, D. Designing for the Scent of Information. Retrieved from http://www.uie.com/reports/scent_of_information/, 2004.

UX Magazine, http://uxmag.com

Researcher workflow and information-seeking behaviour

Anderson, C., Glassman, M., McAfee, R., Pinelli, T. An investigation of factors affecting how engineers and scientists seek information. Journal of Engineering and Technology Management. 2001; 18(2):131–155.

Hine, C. Databases as scientific instruments and their role in the ordering of scientific work. Social Studies of Science. 2006; 36(2):269–298.

Inger, S., Gardner, T. How readers navigate to scholarly content. Retrieved from http://www.sic.ox14.com/howreadersnavigatetoscholarlycontent.pdf, 2008.

Text mining

Cohen, K. B., Hunter, L. Getting started in text mining. PLoS Computational Biology. 2008; 4(1):0001–0003.

Jackson, P., Moulinier, I. Natural Language Processing for Online Applications: Text Retrieval, Extraction, and Classification, 2nd ed. Herndon, VA: John Benjamins Publishing Company; 2007.

National Centre for Text Mining, University of Manchester. http://www.nactem.ac.uk.

Rzhetsky, A., Seringhaus, M., Gerstein, M. B. Getting started in text mining: Part two. PLoS Computational Biology. 2009; 5(7).