#TwitterBan and speaking into existence: Evaluating migration intentions in Nigeria

rstats
tutorial
tidyverse
rtweet
migration
Tweeting patterns in response to the Twitter ban in Nigeria partly reflect migration flows from Nigeria.
Author

Emmanuel Olamijuwon

Published

February 22, 2022

Modified

June 20, 2023

Migration Intention

Introduction 😎

About a year ago, the Federal Government of Nigeria prohibited the use of Twitter for official purposes. This followed the micro-blogging platform’s reaction to the president’s tweet, which sought to treat members of IPOB in the language they understand.

Subsequently, several national carriers in the country blocked access to the microblogging platform, meaning that users could no longer access Twitter. To circumvent this restriction, Twitter users in Nigeria had to use a VPN.

A VPN (virtual private network) is a technology that encrypts your internet traffic on unsecured networks to protect your online identity and hide your IP address. It extends a private network across a public network and enables users to send and receive data across shared or public networks as if their computing devices were directly connected to the private network.

Many VPNs allow users to connect to any of their servers in a limited number of countries, including the United Kingdom, United States, Canada, and others. This implies that Twitter users in Nigeria can access Twitter “from the UK, US or Canada”.

Code
library(tweetrmd)
tweet_embed(tweet_url("1206638885553479680", "1401081942775406597"))
Code
tweet_embed(tweet_url("24291371", "1401098169966989313"))

Soon, Nigerians started disclosing where they were tweeting from in response to an initial tweet. For many, this just meant the country of the server they were connected to. For others, it was a response in solidarity with the government’s position on the ban, and for others still, a prophecy of where they wish to be: “speaking into existence”.

Code
tweet_embed(tweet_url("4076510429", "1401107325671333891"))

Several users responded to this tweet. However, it is not clear whether this information can reliably reproduce known patterns of migration and migration intention. Multiple studies have explored the potential of digital traces as viable complements to traditional data sources.

Here, I analysed Twitter data on where Nigerians said they were tweeting from. I then compared the trends in these data with the United Nations (UN) migration stock data, which include estimates of in-and-out migration by country of origin and country of destination.

Code
    library (tidyverse)   # Data wrangling:: 
                            # dplyr::select(), filter(), mutate(), arrange(), 
                            # dplyr::count(), slice(), coalesce()
                            # stringr::str_detect(), str_to_sentence()
                            # tidyr::separate()
    library (readxl)      # Read Excel files
    library (lubridate)   # Manipulate and parse dates
    library (rtweet)      # Collect and analyse Twitter data
    library (tidytext)    # Tidy text-data analysis:: unnest_tokens(), stop_words
    library (showtext)    # Importing fonts:: showtext_auto(), font_add()
    library (ggpubr)      # Merging graphs:: ggarrange()
    library (gganimate)   # For animating images:: transition_manual(), animate()
    library (gifski)      # GIF renderer used by gganimate
Code
    showtext_auto()
    if ("BarlowCondensed-Light.ttf" %in% list.files("C:\\Windows\\Fonts")) {
      
      font_add("BarlowCondensed-Light", "BarlowCondensed-Light.ttf")
      axis_text <- "BarlowCondensed-Light"
      
    } else {
      
      font_add("ARIALN", "ARIALN.ttf")
      axis_text <- "ARIALN"
    
    }

    ## Title Text
    if ("BarlowCondensed-Medium.ttf" %in% list.files("C:\\Windows\\Fonts")) {
      
      font_add("BarlowCondensed-Medium", "BarlowCondensed-Medium.ttf")
      title_text <- "BarlowCondensed-Medium"
      
    } else {
      
      font_add("ARIALNI", "ARIALNI.ttf")
      title_text <- "ARIALNI"
    
    }

    ## Caption Text
    ## (check for the same file that font_add() loads below)
    if ("AlegreyaSans-Italic.ttf" %in% list.files("C:\\Windows\\Fonts")) {
      
      font_add("AlegreyaSans", "AlegreyaSans-Italic.ttf")
      caption_text <- "AlegreyaSans"
      
    } else {
      
      font_add("Candara", "Candara.ttf")
      caption_text <- "Candara"
    
    }

Data Source 💾

This analysis uses a corpus of tweets that include the phrase “tweeting from”. The tweets were retrieved directly from Twitter on June 7, 2021 via the Twitter API, using the script below.

Code
tweet_NG <- search_tweets(q = "\"tweeting from\"",
                          n = 50000,
                          geocode = "-2.791166,12.094754,2258mi",
                          include_rts = FALSE,
                          type = "recent", 
                          retryonratelimit = TRUE)

Due to API-related restrictions, only tweets posted within roughly the last 8 days can be retrieved via the API. As a result, tweets retrieved now would not include those analysed in this work. I have therefore saved the data on GitHub, with user IDs and screen names removed from the tweets. The data can be accessed using the script below.

Code
tweet_NG <- read.csv("https://raw.githubusercontent.com/eolamijuwon/datasets/main/MigTwitter.csv")

To compare the patterns observed in the tweets with traditional data sources, I also retrieved data on migration stock from the UN website. The data can be downloaded and imported into R directly using the download.file() function.

Code
download.file(url = "https://www.un.org/en/development/desa/population/migration/data/estimates2/data/UN_MigrantStockByOriginAndDestination_2019.xlsx",
              method = "curl",
              destfile = "UN Migration.xlsx")

migration_UN <- read_xlsx("UN Migration.xlsx",
                          skip = 15,
                          sheet = 2)

Data Wrangling - Twitter 👨‍💻

I leveraged multiple approaches in cleaning the data.

  • I restricted the data to tweets posted between June 5 and June 7, 2021 because this was the period during which most people posted about tweeting from Canada using a VPN.

  • Each tweet is a combination of multiple words. As a result, I split the tweets into trigrams (sequences of three consecutive words) and subsequently created a separate column for each word in the trigram.

  • Furthermore, I replaced words unrelated to the analysis with NA. These include words containing two or more consecutive digits, as these were unlikely to be the name of a country or a location. I also excluded stop words and other words such as https, opera, keepiton and others.

  • I also filtered the data for rows with a valid word in either the word2 or the word3 column.
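
The trigram step in the list above can be illustrated on a couple of made-up tweets. This is only a minimal sketch: the object toy_tweets and its contents are hypothetical and are not part of the analysed data.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Hypothetical tweets, made up for illustration only
toy_tweets <- tibble(
  status_id = c("t1", "t2"),
  text = c("I am tweeting from Canada today",
           "Tweeting from United Kingdom")
)

toy_trigrams <- toy_tweets %>%
  # every run of three consecutive words becomes one row (lower-cased)
  unnest_tokens(trigram, text, token = "ngrams", n = 3) %>%
  # one column per word in the trigram
  separate(trigram, c("word1", "word2", "word3"), sep = " ") %>%
  # keep only the trigrams that start with "from"
  filter(word1 == "from")

toy_trigrams
```

Each tweet leaves one row here, with the two words that follow "from" held in word2 and word3.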

Code
updt_tweet <- tweet_NG %>% 
              filter (as.Date(created_at) >= as.Date("2021-06-05") &
                        as.Date(created_at) <= as.Date("2021-06-07")) %>% 
              select (status_id, created_at, text, screen_name) %>% 
              ## split each tweet into trigrams (three consecutive words)
              unnest_tokens(trigram, text, token = "ngrams", n = 3) %>% 
              separate(trigram, c("word1", "word2", "word3"), sep = " ") %>% 
              filter (word1 == "from") %>% 
              mutate (word2 = replace(word2,
                                      which(str_detect(word2,
                                                       pattern = "[[:digit:]]+[[:digit:]]+") |
                                            str_detect(word2, 
                                                       pattern = "(.)\\1{3,}") |
                                            str_detect(word2, pattern = "\\b(.)\\b")),
                                      NA),
                      word3 = replace(word3,
                                      which(str_detect(word3,
                                                       pattern = "[[:digit:]]+[[:digit:]]+") |
                                            str_detect(word3, 
                                                       pattern = "(.)\\1{3,}") |
                                            str_detect(word3, pattern = "\\b(.)\\b")
                                            ## removes any remaining single letter words
                                            ),
                                      NA)) %>% 
              mutate (word2 = replace(word2,
                                      which(word2 %in% stop_words$word),
                                      NA),
                      word3 = replace(word3,
                                      which(word3 %in% stop_words$word),
                                      NA)) %>% 
              mutate (word2 = replace(word2,
                                      which(str_detect(word2,
                                                       pattern = "https|opera|abeg|abi|keepiton|zoo|accessing|biko|jack|lol|country|countries") |
                                            str_detect(word2, pattern = "twitterban|echoke|guy|vpn|twitter|tweeting|tundeeddnut") |
                                            str_detect(word2, pattern = "trolls_queen|thankgodforvpn|thunder|timothymutuake|sm") |
                                            str_detect(word2, pattern = "someplace|sophie|sos|sir|sire|9mobilengcare|9mobileng|3rd")),
                                      NA),
                      word3 = replace(word3,
                                      which(str_detect(word3,
                                                       pattern = "https|opera|abeg|abi|keepiton|zoo|accessing|biko|jack|lol|country|countries") |
                                            str_detect(word3, pattern = "twitterban|echoke|guy|vpn|twitter|tweeting|tundeeddnut") |
                                            str_detect(word3, pattern = "trolls_queen|thankgodforvpn|thunder|timothymutuake|sm") |
                                            str_detect(word3, pattern = "someplace|sophie|sos|sir|sire|9mobilengcare|9mobileng|3rd")),
                                      NA)) %>% 
            mutate (word3 = replace(word3,
                                    which(!is.na(word2) & 
                                          str_detect(word3, "nigeria")),
                                    NA)) %>% 
            filter (!is.na(word2) | !is.na(word3))

                      #!word2 %in% stop_words$word)
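
One thing worth keeping in mind about the exclusion patterns above is that str_detect() matches anywhere inside a word, so a short pattern such as "guy" or "abi" also flags longer words that contain it. The sketch below uses made-up words to show the difference between that behaviour and an anchored, whole-word match. It is an illustration of the regex behaviour only, not a claim about which words occur in the data, and the anchored variant is an alternative that was not used in this analysis.

```r
library(stringr)

# Made-up words for illustration only
words <- c("guy", "guyana", "abi", "arabia", "sm", "canada")

# Unanchored: TRUE whenever the pattern occurs anywhere in the word
loose <- str_detect(words, "abi|guy|sm")

# Anchored: TRUE only when the whole word equals one of the alternatives
strict <- str_detect(words, "^(abi|guy|sm)$")

data.frame(words, loose, strict)
```

With the unanchored pattern, "guyana" and "arabia" are flagged along with "guy" and "abi". The anchored pattern flags only the exact words.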

Because some country names consist of two words (such as United Kingdom), I merged word2 and word3 when both columns were non-missing. When only one of word2 and word3 held a valid word, I retained that word.

Code
tweet_GEO <-  updt_tweet %>% 
              mutate (country = ifelse((!is.na(word2) & !is.na(word3)),
                                       paste0(word2, " ", word3),
                                       coalesce (word2, word3))) %>% 
              mutate (country = str_to_sentence(country))
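
To see what the merge does, here is a minimal sketch on three made-up rows (the toy_words object is hypothetical). Note that str_to_sentence() capitalises only the first letter, which is why a two-word name comes out as "United kingdom".

```r
library(dplyr)
library(stringr)

# Made-up rows covering the three cases: both words, word3 only, word2 only
toy_words <- tibble(
  word2 = c("united", NA, "canada"),
  word3 = c("kingdom", "ghana", NA)
)

toy_geo <- toy_words %>%
  mutate(country = ifelse(!is.na(word2) & !is.na(word3),
                          paste0(word2, " ", word3),  # both present: merge
                          coalesce(word2, word3))) %>% # otherwise: keep the valid one
  mutate(country = str_to_sentence(country))

toy_geo$country
```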

During data exploration, I observed that some users reported city names (such as London) as the location they were tweeting from. As a result, I created a comprehensive list of countries and their popular cities and regrouped all locations to the national level. This was done for the United States, United Kingdom, Netherlands, Nigeria and others. I also carefully reviewed all the reported locations to identify those that may have been misspelt (such as Landon, Nethaland, etc.).

Code
clean_tweet <- tweet_GEO %>% 
               mutate (country = replace(country,
                                        which(str_detect(country,
                                                         "Alabama|Alaska|Arizona|Arkansas|California|Colorado|Connecticut|Delaware|Florida|Georgia|Hawaii|Idaho|Illinois|Indiana|Iowa|Kansas|Kentucky|Louisiana|Maine|Maryland|Massachusetts|Michigan|Minnesota|Mississippi|Missouri|Montana|Nebraska|Nevada|Hampshire|Jersey|New Mexico|York|Unitedstates|Vegas|Carolina|Dakota|Ohio|Oklahoma|San|Oregon|Pennsylvania|Rhode Island|South Carolina|South Dakota|Tennessee|Texas|Utah|Vermont|Virginia|Washington|West Virginia|Wisconsin|Wyoming|Seattle|Miami|Boston|Chicago|Nyc|Los|Las|Houston|Losangeles|Atlanta|America|Dellas|Dallas|Usa|Brooklyn|usa|carolina|mississippiTwittingfromatlanta|florida|Minneapolis|Newyork|Manhattan|Maimi")),
                                        "USA")) %>% 
              mutate (country = replace(country,
                                        which(str_detect(country,
                                                         "Amsterdam|Netherland|Nederland|Nethaland|Neitherland|Nertherland|Meppel")),
                                        "Netherlands")) %>% 
  
              mutate (country = replace(country,
                                        which(str_detect(country,
                                                         "Buckingham|London|Manchester|Custard|England|Uk|Southampton|Unitedkingdom|Landon|Glasgow|Scotland|Northampton|Birmingham|Westminster|United kingdom|Britain|london|United kingdon|United kindom")),
                                        "United Kingdom")) %>% 
  
              mutate (country = replace(country,
                                        which(str_detect(country,
                                                         "Uar|Abj|Abk|Aba|Asaba |Arewa|Ibadan|Aso|Asorock|Abeokuta|Abuja|Adamawa|Nigeria|Ikorodu|Ado|Agege|Akoka|Akure|Lagos|lagos|Egbeda|Ikeja|Akurehowfar|Akwaibomtwitter|Naija|Portharcourt|Osun|Osogbo|Ogun|Yenagoa|Ibarapa|Calabar|Ife|Edo|Ebonyi|Eastern|Fmicnigeria|Oye|Katsina|Niegria|Nigeriagov|Lokoja|Lafiaji|Sokoto|Nigerians|Oduduwa|Uyo|9ja|Shomolu|Unitedafricanrepublic|United african|Zaria|Stateofosun|Sagamu|Mowe ibafo|Ogbmosho|Ogbomosho|Ogbomoso|Oke ode|Olaiya strt|Oluyole town|Ondo|Onitsha anambra|Ota|Makurdi|Lekki|Kaduna|Kano|Jos|Ekiti|Enugu|Akwa ibom|United africa|Port harcourt|Maiduguri|United aafrican|United adamugaba|United arewa|United ede|United epe")),
                                        "Nigeria")) %>% 
  
              mutate (country = replace(country,
                                        which(str_detect(country,
                                                         "Alberta|British Columbia|Manitoba|ontario|Montreal|Ontario|Quebec|Saskatchewan|Toronto|Vancouver|Canada|canada")),
                                        "Canada")) %>% 
        
              mutate (country = replace(country,
                                        which(str_detect(country,
                                                         "Paris|paris|France")),
                                        "France")) %>% 
  
              mutate (country = replace(country,
                                        which(str_detect(country,
                                                         "Instabul|Istanbul")),
                                        "Turkey"),
                      country = replace(country,
                                        which(str_detect(country,
                                                         "Germany|germany|Frankfurt|9jagermany|Berlin|Stutggart")),
                                        "Germany"),
                      country = replace(country,
                                        which(str_detect(country,
                                                         "switzerland|Zurich|Switzerlandjust")),
                                        "Switzerland"),
                      country = replace(country,
                                        which(str_detect(country,
                                                         "Sydney|Australia|australia")),
                                        "Australia"),
                      country = replace(country,
                                        which(str_detect(country,
                                                         "Spain|Barcelona")),
                                        "Spain"),
                      country = replace(country,
                                        which(str_detect(country,
                                                         "Soweto|Southy|South africa|Johannesburg")),
                                        "South Africa")) %>% 
              mutate (country = replace(country,
                                        which(country == "United"),
                                        "USA"))
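
The same regrouping could also be written more compactly with a lookup table. The sketch below is only an alternative way to express the idea, not the code used for this analysis: the helper recode_country() and the shortened pattern list are hypothetical. As in the chunk above, patterns are applied in order, so a later pattern can overwrite an earlier match.

```r
library(stringr)

# Hypothetical, shortened lookup: names are the target countries,
# values are regex patterns of cities and misspellings
country_patterns <- c(
  "United Kingdom" = "London|Landon|Manchester",
  "Netherlands"    = "Amsterdam|Nethaland"
)

# Hypothetical helper: replace every location matching a pattern
# with the corresponding country name
recode_country <- function(x, patterns) {
  for (target in names(patterns)) {
    x[which(str_detect(x, patterns[[target]]))] <- target
  }
  x
}

recoded <- recode_country(c("London", "Nethaland", "Ghana", NA),
                          country_patterns)
recoded
```

Locations that match no pattern (here "Ghana") and missing values are left untouched.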

To create a time series of the location of tweets, I created an additional variable hour with information about the date and hour that the tweet was posted. The dataset was also re-arranged by the date posted.

Code
tweet_clean <- clean_tweet %>% 
              mutate (hour = paste0(months(as.Date(created_at)),
                                     " ", day(created_at),
                                   ": ", hour(as_datetime(created_at)), "Hrs")) %>% 
              arrange ((as_datetime(created_at)))


levels <- as.character(unique(tweet_clean$hour))                         
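
As a quick check of what the hour label looks like, here is a minimal sketch with two made-up timestamps. The month name comes from months(), so it depends on the locale of your R session.

```r
library(lubridate)

# Made-up timestamps, deliberately out of order
created_at <- c("2021-06-06 09:15:00", "2021-06-05 14:23:10")

# Sort chronologically first, as arrange() does in the chunk above
created_at <- created_at[order(as_datetime(created_at))]

hour_label <- paste0(months(as.Date(created_at)),
                     " ", day(as_datetime(created_at)),
                     ": ", hour(as_datetime(created_at)), "Hrs")

# unique() keeps the order of first appearance, so the labels stay chronological
unique(hour_label)
```

In an English locale, the first label reads "June 5: 14Hrs".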

We can create a bar graph to show the distribution using the code chunk below. First, we sort the locations by the number of tweets in descending order and then select the top 20. To position the data labels, we create another column, pos, equal to n - 200 when n > 200 and n + 100 otherwise.

Code
migr1_plot <- tweet_clean %>% 
              select(country) %>% 
              count(country, sort = TRUE) %>% 
              slice (1:20) %>% 
              mutate (pos = ifelse((n > 200),
                                 n-200,
                                 n+100)) %>% 
              ggplot() +
              geom_col(aes(y = reorder(country, n),
                           x = n),
                       fill = c("#003f5c", "#bc5090",
                                rep("#003f5c", 18))) + 
              geom_text (aes(y = country,
                             x = pos,
                             label = n), 
                        color = c(rep("#ffffff",7),
                                   rep("#000000",13)),
                        family = axis_text, 
                        fontface = "bold",
                        vjust = 0.5, hjust = 0,
                        size = 12) +
              labs (x = "Number of Tweets",
                    y = "",
                    subtitle = "Top 20 desired countries of \ndestination by Nigerians on Twitter") +
              theme_minimal (base_size = 38,
                             base_family = axis_text) + 
              theme (legend.position = "none",
                     plot.subtitle = element_text(lineheight = unit(0.4, "pt"),
                                                  size = 45),
                     panel.grid = element_line(colour = NULL),
                     panel.grid.major.y = element_line(colour = "#D2D2D2",
                                                       linetype = "dashed",
                                                       size = 0.3),
                     panel.grid.major.x = element_line(colour = "#D2D2D2",
                                                       linetype = "dashed",
                                                       size = 0.3),
                     # panel.grid.major.x = element_blank(),
                     panel.grid.minor = element_blank()) +
              scale_x_continuous(labels = scales::comma)

migr1_plot
ggsave(file="top_country_tweet.png", dpi=350, height= 7, width= 7)

The figure below shows the top 20 countries from which Twitter users in Nigeria were reportedly tweeting. It shows that the United States of America, the United Kingdom, and Canada are the most preferred international destinations among Nigerians on Twitter.

In addition, a substantial number of users reportedly tweeted from Nigeria, an act that could be interpreted as solidarity with the country’s ban on Twitter activities. South Africa and Ghana were the only other African countries among the top 20 preferred destinations.

Data Wrangling - UN Migration Stock 👨‍💻

As mentioned previously, we can compare the trends observed on Twitter with those from traditional data sources. The UN dataset includes information on the years and the total number of moves from countries, regions or other economic classifications of countries. I retained and renamed the year, the country of destination and the count of moves from Nigeria.

Code
#International migrant stock 2019
migration <- migration_UN %>%
            select (year = ...1,
                    country = ...3,
                    code = ...5,
                    Nigeria) %>% 
            filter (Nigeria != ".." &
                      Nigeria != "" & !is.na(Nigeria)) %>% 
            mutate (Nigeria = as.numeric(Nigeria))

I also retrieved the total number of migrants from Nigeria in 2019 using the code and year columns, and assigned all regional groupings of countries to a vector.

Code
migration$Nigeria[migration$code == "900" &
                    migration$year == "2019"] %>% 
  as.numeric() -> mt

mt[1]
regions <- migration$country[1:19]

Lastly, I filtered the data based on a set of conditions:

  • Excluded moves to regions such as “Europe”, “More developed regions”, “Africa” and others to focus more on moves to other countries.

  • Excluded rows that include any regional classification such as “Western Africa” or different capitalisation formats for the regions such as Europe or EUROPE.

  • Retained only the more recent (2019) counts of migration. There was no migration estimate for the year 2020 at the time of this analysis.

Subsequently, I calculated the percentage of moves from Nigeria to each of the countries and ranked these to obtain the top 20 countries of destination for Nigerians.

Code
migration_updt <- migration %>% 
                  filter (!country %in% regions) %>% 
                  filter (!str_detect(country, "Western Africa|Southern Africa|Middle Africa|Northern Africa")) %>%
                  filter (!str_detect(country, "Europe|EUROPE|Asia|ASIA|OCEANIA|NORTHERN AMERICA")) %>% 
                  filter (year == 2019) %>% 
                  mutate (perc = round(Nigeria/mt[1], 3)) %>% 
                  arrange (desc(Nigeria)) %>% 
                  slice (1:20) %>% 
                  mutate (country = replace(country,
                                            which(country == "United States of America"),
                                            "USA"))
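
The share calculation can be checked on a small made-up table. All numbers below are hypothetical and chosen only to make the arithmetic easy to follow. They are not UN estimates.

```r
library(dplyr)

# Hypothetical destinations and counts, for illustration only
toy_migration <- tibble(
  country = c("WORLD", "Africa", "Ghana", "Niger", "Canada"),
  Nigeria = c(1000, 600, 300, 200, 100)
)

toy_regions <- c("WORLD", "Africa")   # aggregate rows to drop
toy_total   <- 1000                   # plays the role of mt[1]

toy_shares <- toy_migration %>%
  filter(!country %in% toy_regions) %>%
  mutate(perc = round(Nigeria / toy_total, 3)) %>%
  arrange(desc(Nigeria))

toy_shares
```

Here Ghana receives 300 / 1000 = 0.3 of all moves, Niger 0.2 and Canada 0.1.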

We can also create a bar graph to show the distribution of the top 20 countries that Nigerians migrated to in 2019 using the code chunk below.

Code
migr2_plot <- migration_updt %>% 
              mutate (percentage = paste0((perc * 100), "%")) %>% 
              ggplot() +
              geom_col(aes(y = reorder(country, perc),
                           x = perc),
                           fill = "#003f5c") + 
              geom_text (aes(y = country,
                             x = perc + 0.01,
                             label = percentage),
                         color = "#000000",
                        family = axis_text, 
                        fontface = "bold",
                        vjust = 0.5, hjust = 0,
                        size = 12) +
              labs (x = "Share of migrants from Nigeria",
                    y = "",
                    subtitle = "Top 20 destination countries \nfor Nigerian migrants in 2019") +
              theme_minimal (base_size = 38,
                             base_family = axis_text) + 
              theme (legend.position = "none",
                     plot.subtitle = element_text(lineheight = unit(0.4, "pt"),
                                                  size = 45),
                     panel.grid = element_line(colour = NULL),
                     panel.grid.major.y = element_line(colour = "#D2D2D2",
                                                       linetype = "dashed",
                                                       size = 0.3),
                     panel.grid.major.x = element_line(colour = "#D2D2D2",
                                                       linetype = "dashed",
                                                       size = 0.3),
                     # panel.grid.major.x = element_blank(),
                     panel.grid.minor = element_blank()) +
              scale_x_continuous(labels = scales::percent,
                                 limits = c(0, 0.3))


migr2_plot

ggsave(file="top_countryTweet_UN.png", dpi=350, height= 7, width= 7)

The figure below shows the top 20 countries that Nigerians migrate to based on UN estimates. The figure shows that the United States of America, the United Kingdom, Italy, Germany and Canada are the top non-African countries of destination for Nigerian migrants.

The Republic of Niger, Benin and Ghana were the top African destinations for Nigerian migrants. Surprisingly, South Africa was not among the top 10 destination countries.

How do these compare with existing sources of data on migration 📊

For ease of comparison, I used the ggarrange() function from ggpubr 📦 to combine the two graphs into one.

Code
ggarrange(migr1_plot, migr2_plot,
          nrow = 1, widths = c(0.45, 0.55))

ggsave(file="top_countryMig.png", dpi=350, height= 6, width= 9)

As shown in the figure above, tweets could be used to model migration flows to a large extent. The top two preferred countries of destination based on the tweets match the migration patterns observed in 2019: most people want to migrate to the USA and the UK. Germany and Canada were also among the top five preferred destination countries for Nigerian migrants.

Although the two data sources were collected/estimated at different time points, the analysis also shows some inconsistencies between migration intentions and actual migration flows. For example, regional migration flows (within Africa) were not prominent in the tweets. Only a few users wanted to migrate to South Africa and Ghana, compared to the actual flow of migration from Nigeria to Cameroon, Niger, Benin, and Ghana in 2019.
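
One way to make this comparison more concrete is to line up the two rankings by country. The sketch below does this with made-up numbers, so the values and the resulting ranks are hypothetical. With the real data, the two inputs would be the country counts from the tweets and the migration_updt table.

```r
library(dplyr)

# Hypothetical top destinations from the tweets
tweet_rank <- tibble(
  country = c("USA", "Nigeria", "United Kingdom", "Canada"),
  n = c(900, 700, 650, 600)
) %>%
  mutate(rank_tweets = row_number(desc(n)))

# Hypothetical top destinations from the UN data
un_rank <- tibble(
  country = c("USA", "United Kingdom", "Niger", "Canada"),
  perc = c(0.20, 0.15, 0.10, 0.05)
) %>%
  mutate(rank_un = row_number(desc(perc)))

# Keep only the countries that appear in both rankings
shared <- inner_join(tweet_rank, un_rank, by = "country") %>%
  select(country, rank_tweets, rank_un)

shared
```

Countries that appear in only one source (Nigeria and Niger in this toy example) drop out of the joined table, which is itself a quick way to spot where the two sources disagree.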

Extra

Next, we will use the gganimate 📦 to animate and export the graph. We set the animation transition to change at every hour for which there’s data. I used the month, day and hour since these uniquely identify each time point and the visualisation focuses on just one year. We would have to include the year if the data covered more than one year.

We also set the animation parameters in animate(). These include the animation speed (duration = 60s) and the size of the animation (width = 2000, height = 1800). The start_pause and end_pause freeze-frames also make it easy to have a closer look at the final distribution of tweet locations before the loop begins afresh.

You may also need to disable your antivirus for the animation to be saved on your computer. You can also read more about other possibilities with the gganimate 📦 on the package website.
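
The animation code is not shown in this section, so here is a minimal sketch of how the steps described above might fit together. It is not the original code: the toy_counts data, the render_animation() helper, the output file name and the start_pause and end_pause values are all assumptions for illustration. Only duration, width and height follow the values mentioned in the text.

```r
library(ggplot2)
library(gganimate)

# Made-up hourly counts, for illustration only
toy_counts <- data.frame(
  hour = factor(rep(c("June 5: 10Hrs", "June 5: 11Hrs"), each = 2),
                levels = c("June 5: 10Hrs", "June 5: 11Hrs")),
  country = rep(c("Canada", "USA"), times = 2),
  n = c(3, 5, 6, 8)
)

# One frame per level of `hour`; {current_frame} prints the current level
anim_plot <- ggplot(toy_counts) +
  geom_col(aes(x = n, y = country), fill = "#003f5c") +
  labs(x = "Number of Tweets", y = "", subtitle = "{current_frame}") +
  transition_manual(hour)

# Hypothetical helper: render to a GIF and save it
render_animation <- function(plot, file = "tweet_locations.gif") {
  anim <- animate(plot,
                  duration = 60,                  # seconds, as in the text
                  width = 2000, height = 1800,    # as in the text
                  start_pause = 5, end_pause = 20, # assumed values
                  renderer = gifski_renderer())
  anim_save(file, animation = anim)
}
```

Calling render_animation(anim_plot) would render and save the GIF. Rendering at this size can take a while, so it is left as a separate step.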

Contact ✉️

If you have any suggestions for improving the tutorial or experience any difficulty with the code in the tutorial, please send me an email or reach me via Twitter: @eOlamijuwon.