Computer Science - Digital Libraries Publications

The number of published scholarly articles is growing exponentially. To cope with this information overload, researchers increasingly depend on niche academic search engines. Recent work has shown that the two major general web search engines, Google and Bing, have a high level of agreement in their top search results.

This article offers a personal perspective on the current state of academic publishing, and posits that the scientific community is beset with journals that contribute little valuable knowledge, overload the community's capacity for high-quality peer review, and waste resources. Open access publishing can offer solutions that benefit researchers and other information users, as well as institutions and funders, but commercial journal publishers have influenced open access policies and practices in ways that favor their economic interests over those of other stakeholders in knowledge creation and sharing. One way to free research from constraints on access is the diamond route of open access publishing, in which institutions and funders that produce new knowledge reclaim responsibility for publication via institutional journals or other open platforms.

Whereas the generation of Shannon-type information is coupled to the second law of thermodynamics, redundancy (that is, the complement of information to the maximum entropy) can be increased by further distinctions: new options can be generated discursively. The dynamics of discursive knowledge production thus infuse the historical dynamics with a cultural evolution based on expectations (as distinct from observations). We distinguish among (i) the communication of information, (ii) the sharing of meaning, and (iii) discursive knowledge.
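
A small numerical illustration of the redundancy argument, assuming the standard Shannon definitions: entropy H in bits, maximum entropy log2(N) for N options, and redundancy taken as H_max - H. The function names are illustrative. Distinguishing one more option raises the maximum entropy while the observed distribution, and hence H, stays the same, so redundancy increases:

```python
import math

def shannon_entropy(probs):
    """Shannon entropy H in bits; zero-probability options contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def redundancy(probs):
    """Absolute redundancy: maximum entropy log2(N) minus observed entropy H."""
    h_max = math.log2(len(probs))
    return h_max - shannon_entropy(probs)

two_options = [0.5, 0.5]
# Same observed distribution, but one further (not yet realized) option is distinguished.
three_options = [0.5, 0.5, 0.0]

print(round(redundancy(two_options), 3))    # 0.0
print(round(redundancy(three_options), 3))  # 0.585
```

In this toy setup the third option is available as an expectation but has not been observed, so it adds redundancy without adding information.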

The importance of dimensional analysis and dimensional homogeneity in bibliometric studies is always overlooked. In this paper, we look at this issue systematically and show that most h-type indices have the dimensions of [P], where [P] is the basic dimensional unit in bibliometrics, namely the unit publication or paper. The newly introduced Euclidean index, based on the Euclidean length of the citation vector, has the dimensions [P^{3/2}].
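
For concreteness, a sketch of the two indicators, assuming their usual definitions: the h-index is the largest h such that h papers have at least h citations each, and the Euclidean index is the square root of the sum of squared citation counts. The function names and citation counts are hypothetical, and the dimensional analysis itself is not encoded here:

```python
import math

def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    # Counts fall while ranks rise, so the condition holds only for a prefix.
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

def euclidean_index(citations):
    """Euclidean length of the citation vector: sqrt of the sum of c_i squared."""
    return math.sqrt(sum(c * c for c in citations))

cites = [10, 8, 5, 4, 3]  # hypothetical citation counts
print(h_index(cites))                    # 4
print(round(euclidean_index(cites), 2))  # 14.63
```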

A researcher may publish tens or hundreds of papers, yet these contributions to the literature are not uniformly distributed over a career. Past analyses of the trajectories of faculty productivity suggest an intuitive and canonical pattern: after being hired, productivity tends to rise rapidly to an early peak and then gradually declines. Here, we test the universality of this conventional narrative by analyzing the structures of individual faculty productivity time series, constructed from over 200,000 publications matched with hiring data for 2453 tenure-track faculty in all 205 Ph.

As more scholarly content is born digital or digitized, digital libraries are becoming increasingly vital to researchers who leverage scholarly big data for scientific discovery. Despite the abundance of scholarly products, especially in environments created by the advent of social networking services, little is known about international scholarly information needs, information-seeking behavior, or information use. This paper aims to address these gaps by conducting an in-depth analysis of researchers in the United States and Qatar, learning about their research attitudes, practices, tactics, strategies, and expectations, and examining the obstacles they face in their research.

We provide an up-to-date view of the knowledge management system ScienceWISE (SW) and address issues related to the automatic assignment of articles to research topics. So far, SW has proven to be an effective platform for managing large volumes of technical articles through ontological concept-based browsing. However, as the publication of research articles accelerates, the expressivity and richness of the SW ontology turn into a double-edged sword: a finer-grained characterization of articles is possible, but at the cost of introducing more spurious relations among them.

International collaboration in science continues to grow at a remarkable rate, but there is little agreement about the dynamics of growth and organization at the discipline level. Some suggest that disciplines differ in their collaborative tendencies, reflecting their epistemic cultures. This study examines collaborative patterns in six previously studied specialties to add new data and conduct analyses over time.

Researchers in the Digital Humanities and journalists need to monitor, collect, and analyze, on demand, fresh online content about current events such as the Ebola outbreak or the Ukraine crisis. However, existing focused crawling approaches consider only topical aspects and ignore temporal ones, and therefore cannot produce thematically coherent and fresh Web collections. Social media in particular provide a rich source of fresh content, which state-of-the-art focused crawlers do not use.

Collections of Web documents about specific topics are needed for many areas of current research. Focused crawling enables the creation of such collections on demand. Current focused crawlers require the user to manually specify starting points for the crawl (seed URLs).

The recently proposed Euclidean index offers a novel approach to measuring the citation impact of academic authors, in particular as an alternative to the h-index. We test whether the index provides new, robust information not covered by existing bibliometric indicators, and we discuss its measurement scale and the degree of distinction it offers between analytical units. We find that the Euclidean index does not outperform existing indicators on these criteria and that its main application would be solely for ranking, which is not a recommended practice.

It is no secret that the number of scholarly events and venues available to researchers has been expanding dramatically. While this expansion is certainly a boon for academia as a whole, it has become increasingly difficult for many researchers to identify events and venues related to their work. As opportunities to share scholarly work continue to expand, researchers may therefore be unable to determine which venues publish data and research most in line with their scholarly interests.

Cities are engines of the knowledge-based economy because they are the primary sites of knowledge production, which in turn shapes the rate and direction of technological change and economic growth. Patents provide a wealth of information for analysing knowledge specialization at specific places, such as technological details and information on the inventors and entities involved, including their addresses. The technology codes on each patent document indicate the specialization and scope of the technological knowledge underlying a given invention.

Web archives capture the history of the Web and are therefore an important source to study how societal developments have been reflected on the Web. However, the large size of Web archives and their temporal nature pose many challenges to researchers interested in working with these collections. In this work, we describe the challenges of working with Web archives and propose the research methodology of extracting and studying sub-collections of the archive focused on specific topics and events.

The CENDARI infrastructure is a research support platform designed to provide tools for transnational historical research, focusing on two topics: Medieval culture and World War I. It offers end users modern web-based tools that rely on a sophisticated infrastructure to collect, enrich, annotate, and search large document corpora. Supporting researchers in their daily work is a novel concern for infrastructures.

The scientific paper output of the United Nations University (UNU) was analysed bibliometrically. It was found that (i) a noticeable, continuous paper output starts in 1995, (ii) about 65% of the research papers were published as international collaborations and 18% as single-authored papers, (iv) the research papers rank above the world average according to the Pudovkin-Garfield Percentile Rank Index, and (v) the paper content indicates the wide variety of scientific topics UNU has been and is working on.

Capabilities to exchange health information are critical for accelerating discovery and its diffusion into healthcare practice. However, the same ethical and legal policies that protect privacy hinder these data exchanges, and the issues accumulate when data are moved across geographical or organizational borders. This can be seen as one reason why many health technologies and research findings remain limited to very narrow domains.

The Journal Impact Factor (JIF) has been heavily criticized for decades. This opinion piece argues that the JIF should not be demonized: it can still be used for research evaluation, provided the context and academic environment are carefully considered.

Citations are commonly held to represent scientific impact. To date, however, there is no empirical evidence in support of this postulate that is central to research assessment exercises and Science of Science studies. Here, we report on the first empirical verification of the degree to which citation numbers represent scientific impact as it is actually perceived by experts in their respective field.

In this comment, I discuss the use of statistical inference in citation analysis. In a recent paper, Williams and Bornmann argue in favor of the use of statistical inference in citation analysis. I present a critical analysis of their arguments and of similar arguments provided elsewhere in the literature.

In this short communication, we provide an overview of a relatively new source of altmetrics data that could be used for measuring societal impact in scientometrics. Altmetric, a start-up providing publication-level metrics, recently began making available data on publications that have been mentioned in policy-related documents. Using data from Altmetric, we study how many papers indexed in the Web of Science (WoS) are mentioned in policy-related documents.

Bibliometrics is successful in measuring impact, because the target is clearly defined: the publishing scientist who is still active and working. Thus, citations are a target-oriented metric which measures impact on science. In contrast, societal impact measurements based on altmetrics are as a rule intended to measure impact in a broad sense on all areas of society (e.

Research in sentiment analysis is growing at a fast pace, making it challenging to keep track of all the activity in the area. We present a computer-assisted literature review analyzing 5,163 papers from Scopus. We find that the roots of sentiment analysis lie in studies of public opinion at the start of the 20th century, but that computer-based sentiment analysis took off only once subjective texts became available on the Web.

Although altmetrics and other web-based alternative indicators are now commonplace on publishers' websites, they can be difficult for research evaluators to use because of the time or expense of obtaining the data, the need for benchmarks against which to assess their values, the high proportion of zeros in some alternative indicators, and the time taken to calculate multiple complex indicators. These problems are addressed here by (a) a field normalisation formula, the Mean Normalised Log-transformed Citation Score (MNLCS), that allows simple confidence limits to be calculated and is similar to a proposal by Lundberg, (b) field normalisation formulae for the proportion of cited articles in a set, the Equalised Mean-based Normalised Proportion Cited (EMNPC) and the Mean-based Normalised Proportion Cited (MNPC), to deal with mostly uncited data sets, (c) a sampling strategy to minimise data collection costs, and (d) free unified software to gather the raw data, implement the sampling strategy, and calculate the indicator formulae and confidence limits. The approach is demonstrated (but not fully tested) by comparing the Scopus citations, Mendeley readers and Wikipedia mentions of research funded by Wellcome, NIH, and MRC in three large fields for 2013-2016.
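
A simplified sketch of the MNLCS idea, assuming the definition in which each article's ln(1 + c) is divided by the mean ln(1 + c) of its field's world reference set and the quotients are then averaged. Confidence limits, EMNPC/MNPC, and the sampling strategy are not shown; the function name, field labels, and counts are hypothetical:

```python
import math
from statistics import mean

def mnlcs(group, world):
    """Mean Normalised Log-transformed Citation Score (simplified sketch).

    group: list of (field, citations) pairs for the articles being evaluated.
    world: dict mapping each field to the citation counts of its reference set.
    """
    # Field baseline: mean of ln(1 + c) over the field's reference set.
    # (A field whose reference articles are all uncited would give a zero baseline.)
    baseline = {f: mean(math.log1p(c) for c in cites) for f, cites in world.items()}
    # Normalise each article's ln(1 + c) by its field baseline, then average.
    return mean(math.log1p(c) / baseline[f] for f, c in group)

# Hypothetical reference sets for two fields.
world = {"field_a": [0, 1, 3, 7], "field_b": [0, 0, 1, 1]}

print(round(mnlcs([("field_a", c) for c in world["field_a"]], world), 6))  # 1.0
print(round(mnlcs([("field_a", 3), ("field_b", 1)], world), 4))           # 1.6667
```

By construction, a set that mirrors its field's reference set scores 1, so values above 1 indicate above-average (log-transformed) citation impact.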

This is a Commentary in Physics Today on the novel review process developed by the biology journal eLife, with the suggestion that it be adopted by physics journals.

This paper analyses the problem of the scientific quality of physics journals. The main assumption is that the quality of a physics journal exists only in relation to other journals. Instead of constructing new indicators of scientific quality, we identify each physics journal with the empirical distribution function of its citations.
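
A minimal sketch of representing a journal by the empirical distribution function of its citations, F(x) = share of its papers with at most x citations. The citation counts are invented, and comparing two journals by checking whether one ECDF lies below the other everywhere (first-order stochastic dominance) is an illustrative assumption, not necessarily the paper's procedure:

```python
from bisect import bisect_right

def ecdf(citations):
    """Empirical distribution function: F(x) = share of papers with at most x citations."""
    data = sorted(citations)
    n = len(data)
    return lambda x: bisect_right(data, x) / n

# Hypothetical citation counts for two journals.
journal_a = ecdf([0, 2, 5, 9, 20])
journal_b = ecdf([0, 0, 1, 3, 6])

# One possible comparison: A dominates B if F_A(x) <= F_B(x) at every x.
print(all(journal_a(x) <= journal_b(x) for x in range(0, 25)))  # True
```

Because each journal is characterized only by its distribution, any statement about its quality is a statement relative to another journal's distribution.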

Despite the increasing number of women graduating in mathematics, a systemic gender imbalance persists, reflected in a pronounced gender gap in the distribution of active researchers and professors. Especially at the level of university faculty, women mathematicians continue to be drastically underrepresented, decades after the first affirmative action measures were put into place. A solid publication record is of paramount importance for securing a permanent position.

In recent years, a number of studies have introduced methods for identifying papers with delayed recognition (so-called "sleeping beauties", SBs) or have presented individual publications as cases of SBs. Most recently, Ke et al. (2015) proposed the so-called "beauty coefficient" (denoted B) to quantify the extent to which a given paper can be considered a paper with delayed recognition.
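
A sketch of the beauty coefficient as we understand the definition in Ke et al. (2015): with c_t the citations received t years after publication and t_m the year of peak citations, B = sum over t = 0..t_m of ((c_{t_m} - c_0) / t_m * t + c_0 - c_t) / max(1, c_t). The straight line from (0, c_0) to (t_m, c_{t_m}) is the reference, and years cited below that line add to B. Taking the first year when the peak is tied is our assumption:

```python
def beauty_coefficient(citations):
    """Beauty coefficient B for a yearly citation series (sketch of Ke et al., 2015).

    citations[t] = citations received t years after publication (t = 0, 1, ...).
    """
    c = list(citations)
    # Year of peak citations; if the peak is tied, this sketch takes the first one.
    t_m = max(range(len(c)), key=lambda t: c[t])
    if t_m == 0:
        return 0.0  # peak in the publication year: no delayed recognition
    slope = (c[t_m] - c[0]) / t_m  # reference line from (0, c_0) to (t_m, c_tm)
    # Sum the gaps between the line and the actual counts, scaled by max(1, c_t).
    return sum((slope * t + c[0] - c[t]) / max(1, c[t]) for t in range(t_m + 1))

print(round(beauty_coefficient([0, 0, 0, 10]), 6))  # 10.0
print(beauty_coefficient([8, 5, 3, 1]))             # 0.0
```

A series that grows linearly to its peak, such as [0, 1, 2, 3], sits exactly on the reference line and scores 0.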

Acknowledgments are one of many conventions by which researchers publicly give recognition to individuals, organizations, and institutions that contributed in some way to the work that led to a publication. Combining data on both co-authors and acknowledged individuals, the present study analyses disciplinary differences in researchers' credit attribution practices in collaborative contexts. Our results show that the large differences traditionally observed between disciplines in terms of team size are greatly reduced when acknowledgees are taken into account.

The social network analysis of bibliometric data requires matrices to be recast in a network framework. In this paper we argue that a simple conservation rule requires this to be done only with fractional counting, so that conservation at the paper level is faithfully reproduced at higher levels of aggregation (i.e.
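
A minimal sketch of the conservation rule under fractional counting, assuming each paper carries exactly one unit of credit split equally among its authors and then summed by each author's unit. The function name and country codes are hypothetical:

```python
from collections import defaultdict

def fractional_credit(papers):
    """Fractional counting: each paper carries one unit of credit, split equally
    over its authors and then summed by each author's unit (country, institution, ...)."""
    credit = defaultdict(float)
    for author_units in papers:
        share = 1 / len(author_units)
        for unit in author_units:
            credit[unit] += share
    return dict(credit)

# Hypothetical papers, one list entry per author giving that author's country.
papers = [["NL", "NL", "US"], ["US"], ["DE", "US"]]
credit = fractional_credit(papers)
print({k: round(v, 3) for k, v in credit.items()})  # {'NL': 0.667, 'US': 1.833, 'DE': 0.5}
print(round(sum(credit.values()), 9))                # 3.0, one unit per paper
```

Under full counting the same three papers would yield 5 country credits (NL 1, US 3, DE 1), so the aggregated totals would no longer match the number of papers.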

Twitter is among the most common sources of data in social media research, mainly because of its convenient APIs for collecting tweets. However, most researchers do not have access to the expensive Firehose and Twitter Historical Archive and must rely on data collected with free APIs, whose representativeness has been questioned. In 2010, the Library of Congress announced an agreement with Twitter to provide researchers access to the whole Twitter Archive.

Assessing the influence of a scholar's work is an important task for funding organizations, academic departments, and researchers. Common methods, such as measures of citation counts, can ignore much of the nuance and multidimensionality of scholarly influence. We present an approach for generating dynamic visualizations of scholars' careers.

In their study entitled "Constructing bibliometric networks: A comparison between full and fractional counting," Perianes-Rodriguez, Waltman, & van Eck (2016; henceforth abbreviated as PWvE) provide arguments for the use of fractional counting at the network level, as distinct from the level of publications. Whereas fractional counting in the latter case divides the credit among co-authors (countries, institutions, etc.), fractional counting at the network level can normalize the relative weights of links and thereby clarify the structures in the network.
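
A sketch of fractional counting at the network level, assuming a scheme in which a paper with n distinct authors contributes 1/(n - 1) to each co-authorship link it creates, so that every author's total link strength from one paper equals 1. This weighting is our assumption for illustration; full counting gives every link weight 1:

```python
from collections import defaultdict
from itertools import combinations

def coauthorship_links(papers, fractional=True):
    """Co-authorship link strengths; fractional counting weights each pair 1/(n - 1)."""
    links = defaultdict(float)
    for authors in papers:
        unique = sorted(set(authors))
        n = len(unique)
        if n < 2:
            continue  # single-authored papers create no links
        weight = 1 / (n - 1) if fractional else 1.0
        for a, b in combinations(unique, 2):
            links[(a, b)] += weight
    return dict(links)

papers = [["A", "B", "C"], ["A", "B"]]  # hypothetical author lists
print(coauthorship_links(papers))
print(coauthorship_links(papers, fractional=False))
```

With fractional counting the large paper's three links each get 0.5, so the pair (A, B) totals 1.5 rather than 2; papers with many authors no longer dominate the link weights.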

To give users insight into the value and limits of world university rankings, a comparative analysis is conducted of five ranking systems: ARWU, Leiden, THE, QS, and U-Multirank. It links these systems with one another at the level of individual institutions and analyses the overlap in institutional coverage, geographical coverage, how indicators are calculated from raw data, the skewness of indicator distributions, and statistical correlations between indicators. Four secondary analyses are presented, investigating national academic systems and selected pairs of indicators.

Improving software citation and credit continues to be a topic of interest across and within many disciplines, with numerous efforts underway. In this Birds of a Feather (BoF) session, we started with a list of actionable ideas from last year's BoF and other similar efforts and worked alone or in small groups to begin implementing them. Work was captured in a common Google document; the session organizers will disseminate or otherwise put this information to use in or for the community in collaboration with those who contributed.

With the growing amount of published research, automatic evaluation of scholarly publications is becoming an important task. In this paper we address this problem and present a simple and transparent approach for evaluating the importance of scholarly publications. Our method has been ranked among the top performers in the WSDM Cup 2016 Challenge.

PP(top x%) is the proportion of a unit's papers (e.g. those of an institution or a group of researchers) that belong to the x% most frequently cited papers in the corresponding fields and publication years.
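
A simplified sketch of PP(top x%), assuming a paper counts as top x% when at least (100 - x)% of the papers in its field/year reference set have fewer citations. Real implementations treat ties at the threshold more carefully (e.g. with fractional assignment); the function name, field label, and counts here are hypothetical:

```python
def pp_top(unit_papers, reference, x=10.0):
    """Simplified PP(top x%) for one unit.

    unit_papers: list of (field_year, citations) pairs for the unit's papers.
    reference: dict mapping field_year to the citation counts of all papers
               in that field and publication year.
    """
    flags = []
    for field_year, c in unit_papers:
        ref = reference[field_year]
        cited_less = sum(1 for r in ref if r < c)
        # Top x% if at least (100 - x)% of the reference set is cited less often.
        flags.append(cited_less * 100 >= (100 - x) * len(ref))
    return sum(flags) / len(flags)

reference = {"bio-2015": list(range(10))}  # hypothetical: 10 papers cited 0..9 times
print(pp_top([("bio-2015", 9), ("bio-2015", 5)], reference))  # 0.5
```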

In this paper, we have identified and analyzed the emergence, structure, and dynamics of the paradigmatic research fronts that established the fundamentals of biomedical knowledge on HIV/AIDS. A search of the Web of Science (Thomson Reuters) was carried out for papers with the identifiers "HIV/AIDS", "Human Immunodeficiency Virus" and "Acquired Immunodeficiency Syndrome", and a citation network of those papers was constructed.

We present an analysis of citations accrued over time by cohorts of patents from specific technology sectors (e.g., Electrical and Electronic) granted by the

The OSCOSS project (Opening Scholarly Communication in Social Sciences), which will be outlined, aims to provide integrated support for all steps of the scholarly communication process, including collaborative writing of a scientific paper, collecting data related to existing publications, interpreting and including data in a paper, submitting the paper for peer review, reviewing the paper, publishing an article, and, finally, facilitating its consumption by readers.

This paper aims to provide a statistical model for the preferred behavior of authors submitting a paper to a scientific journal. The electronic submissions of about 600 papers to the Journal of the Serbian Chemical Society were recorded for every day from Jan. 01, 2013 till Dec.

This study is an attempt to build a contemporary linguistic corpus for the Arabic language. The resulting corpus is a text corpus of more than five million newspaper articles. It contains over one and a half billion words in total, of which about three million are unique.
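
The distinction between total words (tokens) and unique words (types) can be sketched as follows. Splitting on whitespace is a simplifying assumption; a real Arabic corpus would also need normalization and handling of clitics and diacritics. The function name and sample sentences are illustrative:

```python
from collections import Counter

def corpus_stats(documents):
    """Return (tokens, types): running words and distinct word forms."""
    counts = Counter()
    for text in documents:
        counts.update(text.split())  # naive whitespace tokenization (assumption)
    return sum(counts.values()), len(counts)

docs = ["الكتاب على الطاولة", "الكتاب جديد"]
print(corpus_stats(docs))  # (5, 4)
```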

The field of web archiving provides a unique mix of human and automated agents collaborating to preserve the web. Centuries-old theories of archival appraisal are being transplanted into the sociotechnical environment of the World Wide Web with varying degrees of success. The work of archivists and bots in contact with the material of the web presents a distinctive and understudied CSCW-shaped problem.

Using percentile shares, one can visualize and analyze the skewness in bibliometric data across disciplines and over time. The resulting figures can be intuitively interpreted and are more suitable for detailed analysis of the effects of independent and control variables on distributions than regression analysis. We show this by using percentile shares to analyze so-called "factors influencing citation impact" (FICs; e.
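
A minimal sketch of one percentile share, assuming it is computed as in the income-inequality literature: the fraction of all citations received by the top p% most-cited papers. The paper's exact procedure (e.g. handling of ties or of papers straddling the cutoff) may differ; the function name and counts are hypothetical:

```python
def top_share(citations, p):
    """Share of all citations received by the top p% most-cited papers."""
    ranked = sorted(citations, reverse=True)
    k = max(1, round(len(ranked) * p / 100))  # papers in the top p% (at least one)
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0

cites = [50, 20, 10, 5, 5, 4, 3, 2, 1, 0]  # hypothetical, 100 citations in total
print(top_share(cites, 10))  # 0.5
print(top_share(cites, 50))  # 0.9
```

Computing such shares separately by discipline or publication year gives the kind of distributional comparison described above.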

Today, the full texts of scientific articles are often stored in different locations from the datasets they use. Dataset registries aim at closer integration by making datasets citable, but authors typically refer to datasets using inconsistent abbreviations and heterogeneous metadata (e.g.

Scientific activity plays a major role in innovation for biomedicine and healthcare. For instance, fundamental research on disease pathologies and mechanisms can generate potential targets for drug therapy. This co-evolution is punctuated by papers which provide new perspectives and open new domains.

As has happened in all domains of human activity, economic issues and the growing number of people working in scientific research have altered the way scientific production is evaluated, as well as the objectives of the evaluation. Introduced in 2005 by J. E.

Finding related published articles is an important task in any science, but with the explosion of new work in the biomedical domain it has become especially challenging. Most existing methodologies use text similarity metrics to identify whether two articles are related. However, biomedical knowledge discovery is hypothesis-driven.

Various studies have attempted to assess the amount of free full text available on the web, and recent work has suggested that we are close to the 50% mark for freely available articles (Archambault et al. 2013; Bjork et al. 2010; Jamali and Nabavi 2015).

Knowledge Organization Systems (e.g. taxonomies and ontologies) continue to contribute benefits in the design of information systems by providing a shared conceptual underpinning for developers, users, and automated systems.