

Minutes from the workshop 

Session: Bibliometrics

Rüdiger Mutz: How to use bibliometric data to rank universities according to their research performance

  • Bibliometric data can provide an objective and reliable basis for measuring performance, but usually come with many problems
  • Rankings: statistical models needed to differentiate between random and real fluctuations
  • Percentile-based measures
  • Covariate-adjusted version of Leiden ranking
  • Statistical perspective on data has serious consequences for rankings
  • The choice between full and fractional counting is not trivial
  • The quality of data needs to be improved: more emphasis on data cleaning and disambiguation
  • Different publication types (proceedings, journals, books) need to be uniformly covered in bibliometric data to provide a better basis for comparing different fields and measuring interdisciplinary impact
  • More attention needed to highlighting the different meanings of citations in different fields
  • Citations should not be the only measure of quality, although using them as such is general practice when ranking universities
  • A counter-example is the Netherlands, where the evaluation of research projects incorporates the social impact of research
  • Performance of research groups and individuals could incorporate the resources available, i.e. measure productivity rather than total output.
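As a toy illustration of the percentile-based measures mentioned above, the following sketch computes a PP(top 10%)-style indicator: the share of an institution's papers among the 10% most cited in their field. All numbers and the flat field-wide citation list are made up; real rankings compute thresholds per field and publication year.

```python
def top10_threshold(field_citations):
    """Citation count needed to belong to the field's top 10% most cited."""
    ranked = sorted(field_citations, reverse=True)
    cutoff = max(0, round(len(ranked) * 0.10) - 1)
    return ranked[cutoff]

def pp_top10(institution_citations, field_citations):
    """Share of an institution's papers reaching the field's top-10% threshold."""
    threshold = top10_threshold(field_citations)
    hits = sum(1 for c in institution_citations if c >= threshold)
    return hits / len(institution_citations)

field = [0, 1, 1, 2, 3, 3, 5, 8, 13, 40]   # 10 papers: top 10% needs >= 40
print(pp_top10([40, 2, 1, 0], field))       # 1 of 4 papers -> 0.25
```

Unlike a mean-based indicator, this measure does not care whether the top paper has 40 or 4000 citations, which is why percentile indicators behave more stably under the skewed distributions typical of citation data.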


Robin Haunschild: Publications, Citations, Mentions

  • Assessment of peers and citations correlate positively
  • Citation cultures differ very much across disciplines
  • Reason for citation
    • Intellectual debt, acknowledgment of knowledge transfer
    • Influence on the manuscript
    • Reciprocal citations
    • Well-known person
    • Potential reviewer
  • Field-specific discrepancy between references in the paper and linked references in bibliometric databases
    • Humanities have many references outside bibliometric databases, which are not counted
  • Normalization of citation counts based on citing side or cited side? For humanities, the citing side may be best strategy
  • Larger coverage of scientific literature can possibly solve the issues coming from differences in citation cultures
    • Coverage in databases is particularly difficult for social sciences, due to difference in forms of scientific communication
    • Geographic differences in citation cultures
    • Different publication languages not covered homogeneously
  • Effects of publication format/practical restriction of citations by journal format
    • Values of citations differ based on number of references in the citing paper


Dirk Tunger: Bibliometrics

  • Attention is what we want to measure
  • Best case: citation indicates knowledge transfer
  • Measuring interdisciplinary impact: J-factor (normalization at the journal level)
  • Possible potential in linking altmetrics and bibliometric data
  • Institutions may actually not be interested in the attention that someone gets, but in the output that is produced given the funding provided
  • The mean should not be used with bibliometric data, as it is not an appropriate measure of "average"–and often is not defined–for the broad distributions that are commonly met
  • Altmetric (the company) is open to providing data
  • In many contexts related to measuring scientific performance we need to correct for attention, and not take it as a proxy for performance.


Stakeholder Session: Ranking

Sonja Berghoff: CHE and rankings

  • CHE ranking for German universities
  • U Multirank: project led by a consortium of CHE and Leiden CWTS
  • CHE approach to rankings
    • No enumerated ranking list, but ranking groups (e.g. "top", "average", "low")
    • Field-specific, not universal rankings
  • Multivariate ranking, no weighting and aggregation of factors
    • Publications, research grant, reputation survey as factors
  • Awareness of the risks of reputation surveys: visibility bias, geographic scope; however, high response rate (professors surveyed)
    • The survey aims to cover all professors at the participating universities


Martin Juno: University Rankings – weightings and bias

  • QS: Rankings by subject, region, aspects
  • Same focus, different methodology
  • 3500 universities ranked, of which the top 850 are shown to the public
  • Four general factors considered
    • Research, teaching, employability, internationality
  • Weighting factors
    • 40 % academic reputation (survey based)
    • 10 % employer reputation (survey based)
    • 20 % citations per faculty
    • 20 % faculty/student ratio
    • 5 % international faculty
    • 5 % international students
  • Bias in the ranking
    • Heavily survey based
    • Research-centric
    • Reliance on self-reported data (however, limited)
    • Cultural (e.g. different levels of directness)
  • Academic reputation surveys (QS academic reputation data set) provided
  • In order to interpret results, institutions need to get insights into the methodology
  • Some ranking providers are more open than others
  • Reinforcing feedback of rankings to reputation surveys is probably pronounced
  • Variability in data has to be taken into account
    • Australia case study: ranking not possible, because the variability of opinions within the institutions is higher than across them


Nees Jan van Eck: CWTS Leiden Ranking: An advanced bibliometric approach to university ranking

  • CWTS: bibliometric contract research
  • VOSviewer: visualising scientific landscapes
  • LeidenRanking
    • Exclusively based on bibliometric indicators for scientific citations and collaborations, no surveys
    • Based on Web of Science data
    • Multiple dimensions, no aggregation
    • Focus on scientific performance
  • Bibliometric approach
    • Fractional counting of co-authorship data
    • Percentile-based indicators to deal with highly skewed distributions
    • Field normalization
  • Percentile-based indicators are more stable than average-based indicators
  • Exclusion of non-core publications (about 1/6 of the total number of publications)
    • Non-English
    • Retracted publications
    • National level scientific journals, trade journals, popular magazines
    • Publications in fields with low citation density
    • Proceedings
  • Full vs. fractional counting: full counting is biased towards fields with lots of collaborations
  • Known: collaborative publications tend to be cited more often than non-collaborative publications
  • Use of appropriate bibliometric methodology is important
    • Percentile-based indicators
    • No blind use of databases, removing non-core journals
  • The weights for fractional counting are determined based on the number of different institutions involved
  • Fractional counting at the individual level not recommended
    • Some effects that are a problem at the individual level cancel out at the aggregate level
  • Extensive manual cleaning of affiliation data performed
  • Thomson Reuters, the owner of Web of Science, works with universities to introduce canonical affiliation names
    • So far, 4800 university names have been disambiguated
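The difference between full and fractional counting discussed above can be sketched in a few lines. Institution names and papers are hypothetical; the fractional weight follows the rule mentioned in the talk, one over the number of distinct institutions on a paper.

```python
def count_publications(papers, counting="fractional"):
    """papers: list of per-paper institution lists; returns institution -> score."""
    scores = {}
    for institutions in papers:
        distinct = set(institutions)
        # Full counting: each institution gets 1 per paper.
        # Fractional: credit is split over the distinct institutions.
        weight = 1.0 if counting == "full" else 1.0 / len(distinct)
        for inst in distinct:
            scores[inst] = scores.get(inst, 0.0) + weight
    return scores

papers = [["Leiden", "ETH"], ["ETH"], ["Leiden", "ETH", "MIT"]]
print(count_publications(papers, "full"))        # ETH: 3.0, Leiden: 2.0, MIT: 1.0
print(count_publications(papers, "fractional"))  # ETH: 0.5 + 1 + 1/3
```

Note that under fractional counting the scores sum to the number of papers, so fields with heavy collaboration do not inflate totals, which is the bias full counting introduces.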


Session: Social sciences

Peter van den Besselaar: Bibliometrics beyond rankings

  • Bibliometric indicators have become popular as incentives
  • Actual question is: What questions can be answered based on citation data?
  • Finding in social sciences (2008-2010): decisions do not correlate with bibliometrics
    • Many false positives, low predictive validity
  • ERC study (2014-2015): ERC Starting Grant review example
    • Important research outcomes
    • Impressive h-index
    • Many citations
    • But: publication venues not very prestigious, so rejected
    • No clear difference of citation distribution between successful and non-successful candidates
  • Real questions are at the systems level
    • How to organize the research system to maximise the performance
    • What measures/incentives should be used?
  • Interesting questions that can be addressed
    • Relation between organizational form and performance
    • Impact of funding ecology on performance
    • Talent recognition and selection
    • Identification of new fields
  • Example 1: What incentives to introduce?
    • Simonton on scientific creativity: creative people try a lot, the more they try, the higher the chance to achieve a breakthrough
    • Existing evidence: most productive authors have the largest share in most cited papers
  • Example 2: What is the role of competition
    • Share of competition-based funding negatively correlates with performance (sample of 14 countries)
    • Level of autonomy of the university negatively correlates with performance (sample of 9 countries)
    • Level of academic freedom reported positively correlates with performance (sample of 8 countries)
  • Talent selection
    • Independence, impact, productivity
    • Low overlap with the supervisor's collaboration network and research agenda


Flaminio Squazzoni: Competition, serious “gamification” and scientist misbehaviour

  • Introduction to PEERE COST Action
  • Simmel: money and prices important trigger for the emergence of rationality in western societies
  • Science: Reputation and citations play the role of money and prices
  • Book by Robert Merton: The Sociology of Science
    • Reputation is only productive if competitive spirits are constrained by strong social norms
    • Attention as a scarce resource: Signalling mechanisms like high-impact journals can help to avoid coordination problems
  • Number of retractions increases over time
  • "Rankings are natural social artefacts"
  • Independent of ranking quality, it always has consequences
  • Science becomes a "serious game"
  • "In times of scarce attention the 'rankitude' could bring people to easy, broad-tent conclusions about the value of people independently of context and situations"
  • Indicators are used both in intended and unintended ways
  • Misbehaviour of reviewers can take very creative forms: not only nonobjective rejections, but prolongation of the review process and other ways to keep control over the paper
  • While metrics become more popular, they influence the dynamics of science, but this influence can be beneficial in certain cases
    • In Italy, associations played a major role previously
  • The system of science needs to be steered towards beneficial external motivations for scientists
  • Mechanisms reinforcing the social norms that restrict the "competitive spirits" needed


Judit Bar-Ilan: Altmetrics - Alternative metrics

  • Definition of altmetrics: supplementary measures
  • Highest-ever Altmetric rank: "Experimental evidence of massive-scale emotional contagion through social networks", PNAS vol. 111 no. 24
    • Controversial paper on Facebook experiments
  • Wikipedia included in Altmetric score
  • Sources of altmetrics
    • Mendeley
    • CiteULike
    • Blogs
    • F1000Prime
    • PubPeer
    • ResearchGate
  • Readership counts (Mendeley) vs. citations
    • Medium strength positive correlation with citation counts
  • Suggestion made to Altmetric to not report an aggregate score
  • The real implications and usage of altmetrics for measurement in science are not clear yet
  • Measures incorporating altmetrics have to be very robust, as these are prone to manipulation, e.g. by means of link farms
  • Altmetrics do not measure social impact, but rather social media attention, which cannot be used as a measure of scientific advancement


Session: Computer Science 

Filippo Radicchi: Dynamical graph-based impact metrics

  • Scientific motivation: data on large social system
  • Practical motivation: research evaluation
  • Example: Italian National Scientific Qualification
    • Number of papers (divided by academic age)
    • Number of citations (divided by academic age)
    • Contemporary h-index
  • For individual fields: median values are computed
  • Network structure of citation data is often neglected in research evaluation
  • Examples of network based measures: CiteRank, Eigenfactor
  • Weighted citation network of author-author citations
    • Weighted by out-degree of a paper in the citation network
    • Physical Review database
  • SARA: Science Author Rank Algorithm (PageRank + diffusion equation)
  • Career trajectories of real scientists (Nobel laureates)
  • Validation
    • Relative scores of metrics vs. prizes
    • Table of best ranked physicists 1976/2004 vs. prizes won
  • Interesting to retrospectively compare predictions of SARA with the prizes later won by the top-ranked scientists
  • The approach of using historical data for predictions assumes that the dynamics do not change, which is not necessarily the case
  • Yearly contest idea: Predict scientific prize winners based on scholarly citation data
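SARA itself combines PageRank with a diffusion of credit over time; as a rough sketch of the underlying network-based ranking idea, here is plain PageRank on a tiny weighted author-author citation network. Edge weights and author names are made up.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """edges: {(citing, cited): weight}; returns author -> score."""
    nodes = {n for edge in edges for n in edge}
    out_weight = {n: 0.0 for n in nodes}
    for (src, _dst), w in edges.items():
        out_weight[src] += w
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        # Credit of authors who cite nobody is spread uniformly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0)
        for (src, dst), w in edges.items():
            new[dst] += damping * rank[src] * w / out_weight[src]
        for n in nodes:
            new[n] += damping * dangling / len(nodes)
        rank = new
    return rank

edges = {("A", "B"): 2.0, ("A", "C"): 1.0, ("B", "C"): 1.0, ("C", "B"): 1.0}
scores = pagerank(edges)
print(max(scores, key=scores.get))  # B: cited most heavily, outranks A and C
```

The point of such network measures, as opposed to plain citation counts, is that a citation from a highly ranked author transfers more credit than one from a peripheral author.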


Martin Rosvall: Machine learning for robust rankings

  • Zero- vs. first- vs. second-order Markov models for journal-journal citations
  • Second-order Markov models particularly useful for interdisciplinary fields
  • Good ranking has to be robust: removing journals should not significantly change the ranking of the remaining journals
    • Second-order Markov models lead to more pronounced rankings and are more robust
    • Robustness comes at the cost of additional data
    • Second-order models have higher predictive power
  • visualization tool for information space navigation
  • Zero-, first-, second-order Markov models can complement each other, thus can have higher predictive power when used together
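The difference between first- and second-order models can be sketched with transition counts over citation paths. The journal names and paths below are invented, chosen to show why memory matters for interdisciplinary hubs: a first-order model mixes all flow through a broad journal, while a second-order model remembers where the flow came from.

```python
from collections import Counter, defaultdict

def first_order(paths):
    """Transition counts journal -> next journal (no memory)."""
    counts = defaultdict(Counter)
    for path in paths:
        for a, b in zip(path, path[1:]):
            counts[a][b] += 1
    return counts

def second_order(paths):
    """Transition counts (previous, current) -> next journal (one step of memory)."""
    counts = defaultdict(Counter)
    for path in paths:
        for a, b, c in zip(path, path[1:], path[2:]):
            counts[(a, b)][c] += 1
    return counts

paths = [["Bio", "PNAS", "Bio"], ["Math", "PNAS", "Math"]]
print(first_order(paths)["PNAS"])            # flow out of PNAS looks mixed
print(second_order(paths)[("Bio", "PNAS")])  # with memory: returns to Bio
```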


Ingo Scholtes: The social dimension of citation networks

  • "Science is done by people", Heisenberg
  • Social system of scientists
  • Complex social mechanisms leading to information filtering, thus shaping the citation network
    • Social cognition mechanism
    • Attention mechanisms
    • Trust
    • Reputation
  • Conference community case study
    • Small community around specific topic, thus homogeneous citation network expected
    • However, high correlation between collaborations and citations
  • Hypothesis: an author's importance in the collaboration network is indicative of the citation success of the papers in the network
    • Dynamic collaboration network (Microsoft Academic Search data on Computer Science)
    • Two-year rolling window aggregates of the network
  • Predicting percentile-based citation success based on collaboration centrality
  • Random forest classifier based on the centrality
  • 6 times better precision than random
  • Low recall, which is good: otherwise citation success would be fully determined by one's position in the collaboration network
  • Citation networks mainly have semantic meaning, but also social component
    • Social effects can contribute to "self-fulfilling prophecy" effect
  • A productive author probably has more collaborations (thus, more chances to be central) and a higher chance of getting a paper into the top 10% by chance
  • Self-citations are included in the study; excluding them could be considered
  • The citation success is predicted for a paper, and the more central author contributes to the prediction
    • A corollary is that a young researcher's citation success depends on the PI's centrality, if the latter is on the author list
  • The model can be of interest to journal editors and to data providers for checking this social bias in their data.
  • The levels of social bias among different fields can be compared
    • For computer science (used in the study) the social bias is expected to be higher than in, e.g., physics, because conferences (personal meetings) and conference proceedings are the main types of communication.
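The two-year rolling-window collaboration networks mentioned above can be sketched as follows. Degree centrality stands in here for whatever centrality measure the study actually used, and the papers are invented.

```python
from itertools import combinations

def rolling_degree(papers, year, window=2):
    """papers: list of (year, [authors]); degree in the window (year-window, year]."""
    neighbours = {}
    for y, authors in papers:
        if year - window < y <= year:
            # Every pair of co-authors on a paper becomes a collaboration edge.
            for a, b in combinations(set(authors), 2):
                neighbours.setdefault(a, set()).add(b)
                neighbours.setdefault(b, set()).add(a)
    return {a: len(n) for a, n in neighbours.items()}

papers = [(2013, ["A", "B"]), (2014, ["A", "C"]), (2011, ["A", "D"])]
print(rolling_degree(papers, 2014))  # the old 2011 tie to D is excluded
```

Centralities computed this way per window would then serve as features for the classifier predicting percentile-based citation success.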


Stakeholder Session: Data

Evangelia Lipitakis: Quantifying scientific impact using the citation network

  • Evolution of citation analysis and ISI Thomson Reuters
  • Increasing use of citation analysis by corporate sector
  • 58 million records
  • 900 million citations
  • 3000 journal submissions per year with 10% acceptance rate
  • Team of full-time editors deciding about new journal submissions
    • 27000+ journals indexed by the WoS platform
    • Extended coverage to be added soon
  • What can and should be measured
    • productivity and impact: patents, h-index, fractional counting, etc.
    • normalization: category normalization, journal normalized citation impact, etc.
    • top performance and excellence: hot papers, top 1% or 10%, baselines, etc.
    • collaborations: international, etc.
    • InCites: Indicators Handbook
  • Request for a standard test data set with disambiguated and normalized institution names for research purposes
  • From a data scientist's perspective it is not easy to decouple the effects of sudden additions of large numbers of journals from changes in the system's dynamics; providing data on journal inclusion times can therefore be helpful
  • About 3,000 OA journals indexed
  • A work in progress showing the citation differences between open access and subscription-based journals
  • PLoS One: 83000 articles between 2004 and 2013
  • Normalization by journal-level subject categories is problematic for journals with broad topic coverage
  • The historical data for newly included journals also indexed
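The category normalization discussed above divides a paper's citation count by the average for its subject category and publication year. The baseline values below are made up for illustration; real baselines are computed from the full database.

```python
def normalized_impact(citations, category, year, baselines):
    """baselines: {(category, year): mean citations}; returns ratio to field average."""
    return citations / baselines[(category, year)]

# Hypothetical per-category, per-year citation averages:
baselines = {("Chemistry", 2012): 8.0, ("Mathematics", 2012): 2.0}

# Same raw citation count, very different field-relative impact:
print(normalized_impact(8, "Chemistry", 2012, baselines))    # 1.0 (field average)
print(normalized_impact(8, "Mathematics", 2012, baselines))  # 4.0 (well above)
```

The broad-coverage problem noted above arises because a journal like PLoS One spans many categories, so no single (category, year) baseline describes its papers well.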


Martijn Roelandse

  • Open access has positive effect on impact factor
  • Rise of container Journals: PLoS One
    • Acceptance rate 85 %
    • 2013 Impact Factor: 3.54
  • The impact factor for such journals is not meaningful due to the differences between disciplines
  • Average citations per discipline very different
  • Article-level metrics more important than impact factor of the journal
  • Research Evaluation Framework
    • Impact is defined as effect on society, academia, quality of life
  • 1:AM 2014 the first altmetrics conference
  • 2:AM 2015 expected in September 2015
  • Estimation that 80% will be open access
  • Possible emergence of two main journal categories: big container journals versus flagship journals
  • Although the impact factor for container journals is not meaningful, it influences behaviour; people from fields with low average IF jump on the PLoS bandwagon to benefit from the IF driven by the life sciences
  • Although download data possibly has potential for quantifying impact, information on total downloads not trivial to obtain, as usually the publications are hosted on multiple services with different data sharing policies.
  • Social media such as Twitter can be considered as a new fast communication channel for science, but at the same time it can and is used for "advertising" purposes


Session: Statistical Analysis of Science Networks 

Matus Medo: Temporal patterns in social and information systems

  • Generalization for the preferential attachment model for citations
    • Growing networks with fitness and aging, Phys. Rev. Lett. 107, 238701
    • Additional intrinsic quality + aging factor describing a decrease of relevance over time
    • In the model, paper popularity grows exponentially with quality
    • Thus, quality depends logarithmically on popularity
  • Modification of PageRank, to correct for temporal biases
  • Classification of users as leaders or followers in social information systems
  • Implicit assumptions of centrality measures do not seem to be justified in scientometric data
  • Altmetrics are shallow
  • We can avoid some of the problems, by building only on the community structure of science
  • Leader detection method can be applied to detect authors cultivating interdisciplinarity, facilitating knowledge flow between disciplines
  • For better evidence for the fitness model, there are attempts to treat the fitness parameter as external and estimate it outside of the model's scope
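The model with fitness and aging can be sketched as a small simulation: a new paper cites an existing paper i with probability proportional to (citations_i + 1) · fitness_i · exp(-age_i/τ). Parameter values here are illustrative, not those of the paper.

```python
import math
import random

def grow_network(n_papers, fitness, tau=5.0, seed=42):
    """Grow a citation network; paper i appears at time i and cites one older paper."""
    random.seed(seed)
    citations = [0] * n_papers
    for t in range(1, n_papers):
        # Attractiveness = (citations + 1) * intrinsic fitness * aging factor.
        weights = [(citations[i] + 1) * fitness[i] * math.exp(-(t - i) / tau)
                   for i in range(t)]
        cited = random.choices(range(t), weights=weights)[0]
        citations[cited] += 1
    return citations

fitness = [1.0] * 50
fitness[10] = 10.0            # one high-fitness ("high-quality") paper
result = grow_network(50, fitness)
print(result[10], max(result))  # the high-fitness paper tends to dominate
```

Such simulations illustrate the exponential popularity-in-quality relation noted above: modest fitness differences translate into large citation gaps, while aging caps how long any paper keeps accumulating.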


Olesya Mryglod: The downloads as a measure of attractiveness of scientific publication

  • Goodhart's law: "When a measure becomes a target, it ceases to be a good measure."
  • Multiple dimensions of scientific impact
    • Popularity, prestige, attractiveness
  • Correlation between downloads and citations
  • Download pattern for open-access and subscription based journals qualitatively similar
  • Classification of paper by burstiness and overall attractiveness
  • Deeper investigation of relation between downloads and citations


Alexander Petersen: Quantifying growth trends in science careers with applications in bibliometric analyses

  • How fast is science changing?
  • Quantifiable patterns of scientific success? Unintended consequences of bibliometric measures?
    • Budget doubling of NIH in the late 1990s resulting in a PhD bubble
    • Shifts in co-author numbers over last decades
    • Ways of rewarding scientists have to adapt to these changing trends
    • Microscopic career growth dynamics and potential pitfalls in the forecasting of careers
  • Flaws in Nature 489, 201–202 addressed
    • Aggregating across different career ages
    • Artificially large R2 value in the correlation, simply because the h-index is non-decreasing!
  • Splitting careers into stages: early, postdoc, tenure track
    • Predictive power of the h-index for each of these cohorts is low
  • Interacting networks of reputation flows
    • Author-specific factors matter for citation dynamics (reputation effects)
    • Reputation effect is strong for papers with small citation numbers
    • Impossible to get a highly-cited paper by reputation alone
    • 66 % increase in citation rate for each unit increase in reputation
  • Analyzing dynamic ego collaboration networks
    • Rapid accumulation of co-authors after publication of influential papers
    • Life partners in terms of collaborations (super-ties) have a positive effect on productivity
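To make the "h-index is non-decreasing" point from this talk concrete, here is a minimal h-index computation with invented citation counts: adding papers can only keep or raise the index, never lower it, so correlating an early-career value with a later one is mechanically inflated.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    while h < len(ranked) and ranked[h] >= h + 1:
        h += 1
    return h

early = [10, 5, 3, 1]          # h = 3
later = early + [7, 4]         # citation records can only grow
print(h_index(early), h_index(later))  # 3 4
```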


Plenary discussion

Predicting success in scientific careers: What are the clues?

  • Important to understand the focus, depending on the question:
    • is it to confidently rule out the worst cases at the cost of also ruling out some better ones (many false negatives),
    • or is it to confidently identify the very best at the cost of also taking some average candidates (many false positives)?
    • more generally, finding the right balance between the two is important and not trivial
  • There is a social component in scientific career success, which needs to be subtracted when predicting success in terms of scientific impact
  • The distribution of talent is very heterogeneous, and even more so the distributions of citations/funding – the differences have to be accounted for when comparing them or using one to predict the other
  • Definition of the successful scientist: ideally that is an influential researcher, in the sense of being a generalist, inspiring other scientists and fields
    • Quantifying such qualities is a challenge but needed


Sensitivity for manipulation:

  • It is easier to target a scalar indicator and optimize the behaviour for maximizing that indicator (manipulation)
  • However, if an indicator is not an aggregate, but comprises multiple carefully selected, independent dimensions, it is much harder to maximize it by manipulation
    • In the ideal case the only way to maximize it will be the intended behaviour, i.e. the indicator will serve its purpose
    • The higher the number of dimensions the indicator comprises, the harder it is to manipulate
  • Creative ways of manipulation
    • Think about a scientist at the saturation point, generating a project with wide visibility to boost his impact
    • Manipulation by adding additional feedback layers of attention: altmetrics is one example for such an additional layer
  • Distortion of research findings and messages in public media
  • Robust measures are especially crucial considering that only a small fraction of scientists adopting manipulative behaviour is needed to undermine the social norms of "proper" behaviour



Interdisciplinarity:

  • The question of what to measure for interdisciplinarity evaluation needs to be answered before the question of "how".
  • Going beyond measures based on mere counting, looking at the network is particularly crucial
  • Interdisciplinary impact is particularly prone to manipulation
    • One can try to sell trivial things from their discipline to other disciplines as a breakthrough
  • Research on predicting the emergence of new scientific fields is currently being conducted


Measures, indicators for decision-support

  • The choice between simple, human-readable, but inaccurate measures and more complex, more accurate measures
  • There is a risk of using indicators to shift decision responsibility from humans to machines
    • Using indicators for decision support, versus letting machines take decisions
  • Importance of conflicting goals/"frustrated" problems
    • There is no single ideal option/candidate
    • This allows differences and different preferences to remain viable
  • The importance of focus also applies here:
    • Confidence in ruling out the worst at the cost of ruling out the not that bad ones (false negatives)
    • Confidence in identifying the best at the cost of also taking not that good ones (false positives)
  • Transparency of a measure can also be in the methodology (e.g., clearly defined machine learning algorithm with multiple dimensions), rather than in having a simple numeric measure.
  • Often it is important not to show aggregate numbers, because their interpretation requires knowledge of statistics
  • Understanding and knowing what data is used behind measures is as important as knowing and understanding the method behind these measures
    • This is a relevant issue, as the same measures differ not only across different data providers, but sometimes also for the same data provider in different contexts (e.g., the measure they provide differs from the one independently calculated from their data)