Prediction in complex systems using agent-based models

Guest post by Corinna Elsenbroich & Gary Polhill

Should we ask people to stay at home during a pandemic?
Or just let the disease run its course?

The COVID-19 crisis forced governments to make difficult decisions at short notice that they then had to justify to their electorate. In many cases, these decisions were informed by computer simulations.

An advanced kind of computer simulation, known as agent-based modelling, proved particularly helpful for evaluating different options where it was used. Agent-based models represent an artificial population of human beings, each so-called ‘agent’ going about its simulated daily life and, critically, affecting, and being affected by, other agents.

So, if one agent becomes “infected”, and spends too long near another agent not yet immune, then the computer simulation can “infect” the other agent. Furthermore, agent-based models can simulate social networks, families, friends, work colleagues, and take into account which people are likely to spend too long near another to transmit infections. Agent-based models can also simulate interactions with wider social environments. If one agent not wearing a mask finds themselves in an area where all the other agents are wearing masks, the simulated agent can decide whether to put their mask on (by allowing themselves to be influenced by the social norm), or remain mask-free (because their identity outweighs the norm, or because they cannot wear a mask for medical reasons).

Each agent has their own ‘story’, and the computer can simulate how these stories intertwine to form the narrative of the artificial population’s interaction with a communicable disease and measures to prevent its spread.
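
To make this concrete, here is a minimal, self-contained sketch of the kind of model being described. It is our illustration, not any model used during the pandemic: the movement rule, the 10% transmission probability, the norm-sensitivity threshold and all other parameters are invented.

```python
# A toy agent-based model (illustrative only): agents move in a unit square,
# an unmasked infectious neighbour can transmit infection, and agents weigh
# the local mask-wearing norm against their own disposition.
import random

class Agent:
    def __init__(self):
        self.x, self.y = random.random(), random.random()
        self.infected = random.random() < 0.02   # 2% seed infections
        self.immune = False                      # recovery/immunity omitted for brevity
        self.masked = random.random() < 0.5
        self.norm_sensitivity = random.random()  # how strongly norms sway this agent

    def step(self, population):
        # small random movement, clipped to the unit square
        self.x = min(1.0, max(0.0, self.x + random.uniform(-0.02, 0.02)))
        self.y = min(1.0, max(0.0, self.y + random.uniform(-0.02, 0.02)))
        neighbours = [o for o in population if o is not self
                      and abs(o.x - self.x) < 0.05 and abs(o.y - self.y) < 0.05]
        if neighbours:
            # social norm: conform to the local majority, but only if the norm
            # outweighs this agent's identity (its norm sensitivity is high)
            if self.norm_sensitivity > 0.5:
                share_masked = sum(o.masked for o in neighbours) / len(neighbours)
                self.masked = share_masked > 0.5
            # transmission: an unmasked infectious neighbour may infect this agent
            if not self.infected and not self.immune:
                for o in neighbours:
                    if o.infected and not o.masked and random.random() < 0.1:
                        self.infected = True
                        break

population = [Agent() for _ in range(200)]
for _ in range(50):
    for agent in population:
        agent.step(population)
print(sum(a.infected for a in population), "of", len(population), "agents infected")
```

Even a toy like this exhibits the hallmark of agent-based models: the population-level infection curve emerges from individual interactions rather than being specified directly.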

The pandemic was a vivid example of the challenges of governing complex systems. Complex systems are studied by scholars in various disciplines, including mathematics, physics, economics, sociology, computer science, geography, ecology and biology. They are fundamental to life, from the cellular level to the level of international relations, and are as fascinating as they are challenging. The properties that make them ‘complex’ are the same properties that make them difficult to govern. These include:

  • They are ‘nonlinear’. Using some made-up numbers for the purposes of illustration, nonlinearity means that if a government spends £1Bn to save the first 100,000 lives, they might have to spend £5Bn to save the next 100,000, but only £500M for the 100,000 after that. Nonlinearity is challenging mathematically; a lot of ‘classical’ mathematics (including a 200-year-old algorithm now laughably rebranded as ‘machine learning’) assumes linearity. It is from nonlinearity that we get the concept of a ‘tipping point’: the difference in habitability between 1°C and 1.5°C of global warming is not the same as the difference between 1.5°C and 2°C.
  • They have ‘fat-tailed’ distributions. A mathematical law called the ‘central limit theorem’ is often used to justify assuming everything has a normal distribution, and a lot of statistics is consequently focused on working with that distribution. In complex systems, however, the law of large numbers, on which the central limit theorem depends, does not always apply. Distributions can have ‘fat tails’, meaning that the probabilities of extreme events are higher than a normal distribution would suggest (a small numerical illustration follows this list). Underestimating the probability of an extreme event is risky for a government, and potentially fatal to some of its population.
  • They are sensitive to local circumstances. Mathematicians call this ‘non-Markovian’ or ‘non-ergodic’, and again find themselves unable to rely on a large body of work that can be applied very successfully when there is no such sensitivity. The practical outcome is that a policy that works in one place may not work in another.
  • They are not at equilibrium. Even now, for some ecologists and economists, the assertion that living systems are not at equilibrium is controversial. Systems apparently remaining in similar (or cycling) states are instead described in complex systems language as exhibiting ‘homeostasis’. The important difference from equilibrium is that homeostasis requires energy, and so by definition is not equilibrium. For example, your body tries to maintain its blood temperature at the same level (around 36.5°C), but has different mechanisms to do this depending on whether the weather is hot or cold, and dry or humid. Mathematically, not being at equilibrium means that calculus becomes a less useful tool. For government, it may mean that after a perturbation, a society will not necessarily return to the way it lived before.
  • They are evolutionary. Complex systems can adapt, innovate and learn. This means that a measure that worked historically may not work now. Indeed, even the language used to describe what people do and how they differ can change. In medical circles, we no longer speak of ‘humours’ or ‘miasmas’, but of white blood cells, bacteria and viruses, and their mutations and variants.
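
To make the fat-tails point concrete, here is a small numerical illustration of our own (it is not from the editorial, and the distributions and threshold are arbitrary): an event far beyond the typical range is vanishingly rare under a normal distribution, but common under a fat-tailed one.

```python
# A toy comparison (our illustration; parameters are arbitrary): how often do
# draws exceed a threshold of 5 under a standard normal distribution versus a
# fat-tailed Pareto distribution with shape parameter 1.5?
import random

N = 200_000
normal_extremes = sum(abs(random.gauss(0.0, 1.0)) > 5 for _ in range(N))
pareto_extremes = sum(random.paretovariate(1.5) > 5 for _ in range(N))
print(f"normal: {normal_extremes} of {N} draws beyond 5 standard deviations")
print(f"pareto: {pareto_extremes} of {N} draws beyond 5")
```

Run this and the normal distribution will typically produce zero extreme draws, while the Pareto produces tens of thousands: a policy-maker who assumes normality would massively underestimate the extreme event.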

Agent-based modelling grew out of the study of complex systems, as a way of helping scientists understand them better. But that understanding has not made the community of practitioners willing to use their agent-based models to make predictions. Quite the opposite, in fact. Many practitioners, on the basis of their understanding, regard prediction in complex systems as impossible, and point instead to other important and useful applications of agent-based models.

All these challenges to classical mathematics make prediction in complex systems much harder. Even those who don’t regard prediction as impossible use guarded language like ‘rough forecasting’, or ‘anticipated outcomes’.

However, claiming that prediction is impossible does not help the policy-maker decide what to do about a pandemic, nor justify the expense and curtailment of liberties to the people. Worse, there is still a significant community of researchers quite willing to ignore complexity altogether, applying methods that rely on assumptions that are false in complex systems and presenting the results as predictions. (In some circumstances, over short time periods, these methods can work, because complex systems don’t always behave in complex ways.) Agent-based models have been argued to have an important role in helping people make decisions in complex systems.

It might be that agent-based modellers need to find ways of participating in discussions about governing complex systems, in circles where prediction is part of the narrative, while still being true to their understanding. Rather than remaining a taboo, prediction is something agent-based modellers need to face. In a special issue of the International Journal of Social Research Methodology, we have collected contributions that aim to open up a conversation about prediction with agent-based models. They reflect a diversity of opinion as varied as the backgrounds of people in the community of practitioners.

Our beleaguered global governments, wearily emerging from the pandemic, find themselves facing an escalated war in Europe, polarized societies, economic instability, persistent misinformation spread on social media, a sixth mass extinction, and ever more frequent extreme weather events. Each of these issues is complex, multidimensional and multi-scale, and any solution (including doing nothing) has uncertain, unintended, cascading consequences. If agent-based modelling can help with such challenging decision-making, then it should.

The full editorial, ‘Agent-based Modelling as a Method for Prediction for Complex Social Systems’, is freely available from the International Journal of Social Research Methodology.

Corinna Elsenbroich is Reader in Computational Modelling in Social and Public Health Science at the University of Glasgow. Follow @CElsenbroich on Twitter and read more research via ORCID.

J. Gareth Polhill (known as Gary Polhill) is a Senior Research Scientist in the Information and Computational Sciences Department at The James Hutton Institute. Follow @GaryPolhill on Twitter and read more research via ORCID.

What is Ragin’s Indirect Method of Calibration?

Guest post by Preya Bhattacharya

In the last few years, Qualitative Comparative Analysis (QCA) has become one of the most important comparative data analysis approaches in the social sciences. QCA is useful if you have a small to medium number of cases, are collecting data from multiple levels, and are trying to analyze the causal pathway through which a condition, or combination of conditions, impacts an outcome across cases and/or across time.

In my doctoral dissertation on “Can Microfinance Impact National Economic Development? A Gendered Perspective,” I applied panel data QCA as my causal analysis method, because I had a small number of case studies, and I wanted to analyze how microfinance institutions impact the economic development of women at the level of the national economy, in the context of former Yugoslavian countries (Bhattacharya 2020). But, during my data analysis phase, I came across this concept of “calibration.” At that time, I did not have any formal training in QCA, and it was a little bit difficult for me to understand and apply this concept. So, through this research note, I hope to guide future researchers in their own work on QCA, by explaining the process of calibrating an interval variable.

In my article, I first define the concept of calibration, and then use my own data to demonstrate the steps of data calibration. As described in the article, calibration is a data transformation process in which researchers transform the data they have collected into set-membership scores: 0/1 (crisp sets) or a range from 0 to 1 (fuzzy sets). This transformation helps a researcher interpret their data in the context of the cases studied, and it depends on a variety of factors. As a result, the process of calibration might differ from one variable to another, even within the same dataset.

To demonstrate this, I describe the data calibration process for an interval variable that does not have established prior theoretical cut-off points. I divide the calibration process into five main steps (a short code sketch after the list illustrates the idea):

  • First, define the set
  • Second, visualize the data distribution
  • Third, identify qualitative changes in data distribution
  • Fourth, assign qualitative categories
  • Fifth, transform these categories into set-membership scores: a range from 0 to 1 (fuzzy set) or 0/1 (crisp set)
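
To make these steps concrete, here is a minimal sketch of the indirect calibration idea, written by us for illustration rather than taken from the article: raw interval values are grouped into qualitative categories (steps three and four), and the categories are then mapped onto fuzzy set-membership scores (step five). The breakpoints, category labels and scores below are all invented.

```python
# A minimal sketch of indirect calibration (illustrative values throughout):
# interval data -> qualitative categories -> fuzzy set-membership scores.

raw_values = [2.1, 3.8, 5.0, 7.2, 8.9, 9.5]  # hypothetical interval data

def assign_category(value):
    # Steps 3-4: qualitative breakpoints identified from the data distribution
    if value < 3:
        return "fully out"
    elif value < 5:
        return "more out than in"
    elif value < 8:
        return "more in than out"
    else:
        return "fully in"

# Step 5: each qualitative category becomes a fuzzy membership score in [0, 1]
membership = {"fully out": 0.0, "more out than in": 0.33,
              "more in than out": 0.67, "fully in": 1.0}

scores = [membership[assign_category(v)] for v in raw_values]
print(list(zip(raw_values, scores)))
```

The crisp-set case is the degenerate version of the same idea, with only two categories mapped to 0 and 1.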

Finally, I discuss the issue of robustness in calibration, and how researchers can ensure that their calibrated data matches the reality of the cases they have studied. By describing these steps, I hope to help future researchers in their own process of calibrating interval variables in QCA.

I would also like to thank the 2019 Southern California QCA Workshop, organized by Dr. Charles Ragin (University of California, Irvine) and Dr. Peer Fiss (University of Southern California), for introducing me to the world of Qualitative Comparative Analysis and the set-theoretic research approach!

Read the full article here

25th Anniversary Editorial appendix on methods and data

By Iasonas Lamprianou

One of the sections of the 25th Anniversary Editorial of the International Journal of Social Research Methodology (IJSRM) presents the thematic trends in published contributions across the whole 25-year life of the journal. Investigating these trends was not an easy task, not only because of the huge number of published papers, but also because of various technical complications (for example, the papers in the first volumes were not accompanied by keywords).

Coding and charting the thematic trends of published papers proved to be a very laborious task, and the workload was shared between four researchers. The aim of this document is to help interested readers understand the nature and structure of the dataset. This could be useful to those interested in extending our own analysis, which had to be confined to the limited space of an Editorial.

Methodology 

It is important for prospective users of the dataset to understand how the published content was coded and how reliable the coding was. 

Three coders worked in parallel for three weeks, under the supervision of an experienced researcher. They coded each of the contributions published by IJSRM, not only in the first 25 volumes, but also in the ‘latest papers’ section of the journal’s web page, which includes papers that have not yet been assigned to specific volumes/issues.

The three coders and the experienced researcher developed a coding scheme, which included all the necessary variables to be coded in an Excel file. Through long online meetings, the group discussed the aims of the coding exercise, the structure and content of the coding scheme, and so on. The group coded a number of common papers to confirm that they interpreted the coding scheme in the same way. Regular online meetings and email exchanges were used to discuss issues as they emerged, to share coding difficulties, and to keep the coders in sync. To make sure that the coders did not ‘drift’ over time, they were also instructed to ‘blindly’ re-code 5%-10% of each other’s Excel files incrementally (every few days). As a result of this procedure, various issues came to the surface (e.g. many papers could not easily be categorized as Qualitative, Quantitative or Mixed, so a new category was created; more information below).

When all the coding was completed, the experienced researcher blindly re-coded 50 random papers – around 5% of the total number of papers in the database – and no major discrepancies were detected (for example, in one case, the number of views was miskeyed as ‘867’ instead of ‘861’).
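
The appendix reports these checks informally rather than as a formal agreement statistic. Purely as an illustration of how such a check could be quantified, the sketch below computes Cohen’s kappa for two hypothetical coders on the Paradigm variable; the codes are invented and this is not part of the actual procedure.

```python
# Illustrative inter-coder agreement check (hypothetical codes, not our data):
# Cohen's kappa corrects raw agreement for agreement expected by chance.
from collections import Counter

coder_a = ["Qualitative", "Quantitative", "Mixed Methods", "Qualitative", "General/Other"]
coder_b = ["Qualitative", "Quantitative", "Qualitative",   "Qualitative", "General/Other"]

n = len(coder_a)
observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
# chance agreement from each coder's marginal code frequencies
freq_a, freq_b = Counter(coder_a), Counter(coder_b)
expected = sum(freq_a[c] * freq_b[c] for c in set(coder_a) | set(coder_b)) / n**2
kappa = (observed - expected) / (1 - expected)
print(f"observed agreement {observed:.2f}, Cohen's kappa {kappa:.2f}")
```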

Overall, there is no reason to believe that there are widespread biases or errors in the data. We expect the dataset to give a fair representation of what has been published in the journal over the 25 years of its life.

Variables in dataset 

The dataset includes the following variables: 

  • Vol: Volume
  • No: Issue number
  • Title: The title of the paper (copied verbatim; no coding)
  • Abstract: The abstract of the paper (copied verbatim; no coding)
  • Keywords: The keywords of the paper (copied verbatim; no coding)
  • Paradigm: Main research paradigm. Takes four values: Qualitative, Quantitative, Mixed Methods, General/Other. The General/Other category refers to papers that cannot be described accurately by the three other codes.
  • Views: Number of views (as reported on the journal’s web page)
  • CrossRef: Number of CrossRef citations (as reported on the journal’s web page)
  • Altmetric: Altmetric count (as reported on the journal’s web page)

Data filtering 

The original dataset consisted of 1043 records, but book reviews, editorials and other small items were removed, resulting in a ‘clean’ dataset of 924 published papers (including ‘Research Notes’).

Dataset format 

The dataset is provided as an R data frame, in a file named EditorialData.Rda.

You can download the data files here.
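
Once downloaded, the file can be opened in R with load("EditorialData.Rda"). For readers working in Python, a minimal sketch is below; it assumes the third-party pyreadr package is installed (pip install pyreadr) and is our illustration, not part of the dataset documentation.

```python
# A sketch (ours, not official documentation) of reading the dataset in Python,
# assuming the third-party 'pyreadr' package is available.
import pyreadr

result = pyreadr.read_r("EditorialData.Rda")   # dict: R object name -> DataFrame
df = next(iter(result.values()))               # the editorial data frame

# Example: count papers per research paradigm (column names as listed above)
print(df["Paradigm"].value_counts())
```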

A virtual collection to celebrate 25 volumes of IJSRM 

The International Journal of Social Research Methodology is celebrating its 25th anniversary! Over that period, we have become a leading methods journal, publishing high-quality contributions from across the methodological field – both qualitative and quantitative, and including mixed, multi-media, comparative and simulation methods.

To mark the occasion, we have gathered together a series of methods discussions that have been published in the journal, and our publisher, Routledge, is making them freely available as a collection.  

Choosing which articles to include in our anniversary virtual collection was a hard task. We inevitably had to leave some important and favourite pieces aside. The collection below includes contributions that we felt represented the range of methodological articles that we publish in IJSRM: a selection of early-career prize-winning articles, influential pieces and discussions that deserve more attention for their contributions, and individual editors’ personal choices.

The methodological reach of our anniversary collection, then, ranges across survey non-response, behavioural experiments, quantitative pedagogy, the Delphi method, the problem-centred expert interview, the self-interview, narrative and computerised text analysis, qualitative methods transformations, anonymisation, triangulation of perspectives, indigenous data sovereignty, post-humanism, and researcher safety.

We hope that you enjoy our selection. You can access it at:  

https://www.tandfonline.com/journals/tsrm20/collections/TSRM_25th_Anniversary_SI

Rosalind Edwards, Jason Lamprianou, Jane Pulkingham, Malcolm Williams 

IJSRM Co-Editors