Emmanuel Olamijuwon
February 22, 2022
June 20, 2023
In June 2021, the Federal Government of Nigeria suspended Twitter’s operations in the country. The suspension followed the micro-blogging platform’s removal of a tweet by the president about treating members of IPOB (the Indigenous People of Biafra) “in the language they understand”.
Subsequently, several network carriers in the country blocked access to the micro-blogging platform, so users could no longer reach Twitter directly. To circumvent this restriction, Twitter users in Nigeria had to use a VPN.
A VPN (virtual private network) is a technology that encrypts your internet traffic on unsecured networks to protect your online identity and hide your IP address. It extends a private network across a public network, enabling users to send and receive data across shared or public networks as if their computing devices were directly connected to the private network.
Many VPNs let users connect to servers in a limited number of countries, including the United Kingdom, the United States, Canada and others. This means that Twitter users in Nigeria could access Twitter “from the UK, US or Canada”.
library(tweetrmd)
tweet_embed(tweet_url("1206638885553479680", "1401081942775406597"))
Press Release: Nigerians are back to Twitter tweeting from different countries. pic.twitter.com/5jEy40Fmir
— The Pinnacle™ (@the_pinnakle) June 5, 2021
tweet_embed(tweet_url("24291371", "1401098169966989313"))
Where are you tweeting from?👀#TwitterBan#TwitterbaninNigeria
— Punch Newspapers (@MobilePunch) June 5, 2021
Soon, Nigerians started disclosing where they were tweeting from in response to this tweet. For many, the answer was simply the country of the VPN server they were connected to. For others, it was a show of solidarity with the government’s position on the ban, or a prophecy of where they wished to be: “speaking it into existence”.
tweet_embed(tweet_url("4076510429", "1401107325671333891"))
Tweeting from London into existence. https://t.co/mSiIf86xwc
— Mykee (@Mykee_9) June 5, 2021
Several users responded to this tweet. However, it is not clear whether these responses reliably reflect previously documented patterns of migration and migration intentions. Multiple studies have explored the potential of digital traces as viable complements to traditional data sources.
Here, I analysed Twitter data on where Nigerian users said they were tweeting from. I then compared the patterns in these data with the United Nations (UN) migrant stock data, which estimate the number of international migrants by country of origin and country of destination.
library (tidyverse) # Data wrangling and plotting:
# dplyr::select(), filter(), mutate(), arrange(), count(), slice(),
# dplyr::ungroup(), left_join()
# stringr::str_detect(), str_replace_all(), str_remove()
# tidyr::separate(), pivot_longer()
# ggplot2::ggplot(), geom_col(), geom_text(), ggsave()
library (readxl) # Read Excel files
library (lubridate) # Manipulate and parse dates
library (rtweet) # Collect and analyse Twitter data
library (tidytext) # Tidy text-data analysis: unnest_tokens(), stop_words
library (showtext) # Custom fonts: showtext_auto(); also loads sysfonts::font_add()
library (ggpubr) # Combine graphs: ggarrange()
library (gganimate) # Animate graphs: transition_manual(), animate()
library (gifski) # GIF renderer used by gganimate::animate()
showtext_auto()
# NB: each check below only looks in the Windows font folder

## Axis text: Barlow Condensed Light if installed, otherwise Arial Narrow
if ("BarlowCondensed-Light.ttf" %in% list.files("C:\\Windows\\Fonts")) {
  font_add("BarlowCondensed-Light", "BarlowCondensed-Light.ttf")
  axis_text <- "BarlowCondensed-Light"
} else {
  font_add("ARIALN", "ARIALN.ttf")
  axis_text <- "ARIALN"
}
## Graphics title: Barlow Condensed Medium if installed, otherwise Arial Narrow Italic
if ("BarlowCondensed-Medium.ttf" %in% list.files("C:\\Windows\\Fonts")) {
  font_add("BarlowCondensed-Medium", "BarlowCondensed-Medium.ttf")
  title_text <- "BarlowCondensed-Medium"
} else {
  font_add("ARIALNI", "ARIALNI.ttf")
  title_text <- "ARIALNI"
}
## Caption text: Alegreya Sans Medium Italic if installed, otherwise Candara
## (the file checked for and the file loaded are now the same)
if ("AlegreyaSans-MediumItalic.ttf" %in% list.files("C:\\Windows\\Fonts")) {
  font_add("AlegreyaSans", "AlegreyaSans-MediumItalic.ttf")
  caption_text <- "AlegreyaSans"
} else {
  font_add("Candara", "Candara.ttf")
  caption_text <- "Candara"
}
This analysis uses a corpus of tweets containing the phrase “tweeting from”. The tweets were retrieved directly from the Twitter API on June 7, 2021, using the script below.
tweet_NG <- search_tweets(q = "\"tweeting from\"",   # exact phrase
                          n = 50000,
                          geocode = "-2.791166,12.094754,2258mi", # latitude,longitude,radius
                          include_rts = FALSE,        # exclude retweets
                          type = "recent",
                          retryonratelimit = TRUE)    # wait and retry when rate-limited
Because of API restrictions, only tweets posted within roughly the last week can be retrieved, so running the script today would not return the tweets analysed in this work. I have therefore saved the data, with user IDs and screen names removed, on GitHub; it can be loaded using the script below.
tweet_NG <- read.csv("https://raw.githubusercontent.com/eolamijuwon/datasets/main/MigTwitter.csv")
To compare the patterns observed in the tweets with a traditional data source, I also retrieved migrant stock data from the UN website. The data can be downloaded and imported into R directly using the download.file() and read_xlsx() functions.
download.file(url = "https://www.un.org/en/development/desa/population/migration/data/estimates2/data/UN_MigrantStockByOriginAndDestination_2019.xlsx",
method = "curl",
destfile = "UN Migration.xlsx")
migration_UN <- read_xlsx("UN Migration.xlsx",
skip = 15,
sheet = 2)
I cleaned the data in several steps:
First, I restricted the data to tweets posted between June 5 and June 7, 2021, because this was the period in which most people posted about tweeting from Canada using a VPN.
Each tweet is a combination of multiple words, so I split the tweets into trigrams (sequences of three consecutive words), kept only the trigrams beginning with “from”, and created a separate column for each word in the trigram (word1, word2 and word3).
Next, I replaced words unrelated to the analysis with NA. These include words containing two or more consecutive digits (unlikely to be the name of a country or location), words with a character repeated four or more times in a row, single-letter words, stop words, and other words such as https, opera and keepiton.
Finally, I kept only rows with a valid word in either word2 or word3.
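To make these cleaning steps concrete, here is a minimal sketch on two made-up tweets (not taken from the dataset). It reuses the trigram extraction and the three regular expressions from the cleaning code; the stop-word filter and the hand-picked exclusion list are left out for brevity, and the names toy, toy_trigrams, is_junk and toy_clean are hypothetical.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(tidytext)

# Two made-up tweets (illustrative only, not from the dataset)
toy <- tibble(status_id = 1:2,
              text = c("Tweeting from United Kingdom today",
                       "Na from Lagos I dey tweet"))

# Split each tweet into lower-case trigrams, one column per word,
# and keep only the trigrams that start with "from"
toy_trigrams <- toy %>%
  unnest_tokens(trigram, text, token = "ngrams", n = 3) %>%
  separate(trigram, c("word1", "word2", "word3"), sep = " ") %>%
  filter(word1 == "from")

# Flag tokens that cannot be place names (three of the post's rules)
is_junk <- function(x) {
  str_detect(x, "[[:digit:]]+[[:digit:]]+") |  # two or more consecutive digits
    str_detect(x, "(.)\\1{3,}") |              # one character repeated 4+ times
    str_detect(x, "\\b(.)\\b")                 # single-letter words such as "i"
}

# Replace flagged tokens with NA in word2 and word3
toy_clean <- toy_trigrams %>%
  mutate(across(c(word2, word3), ~ replace(.x, which(is_junk(.x)), NA)))

toy_clean
```

Only the trigram “from lagos i” loses a word: “i” is a single letter, so its word3 becomes NA, while “from united kingdom” survives intact.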
# Tokens that are clearly not locations. NB: these are unanchored substring
# matches, so, for example, "sm" also removes any token that contains "sm".
excluded <- paste("https|opera|abeg|abi|keepiton|zoo|accessing|biko|jack|lol|country|countries",
                  "twitterban|echoke|guy|vpn|twitter|tweeting|tundeeddnut",
                  "trolls_queen|thankgodforvpn|thunder|timothymutuake|sm",
                  "someplace|sophie|sos|sir|sire|9mobilengcare|9mobileng|3rd",
                  sep = "|")

# Replace tokens that are unlikely to be a country or location with NA
drop_junk <- function(x) {
  junk <- str_detect(x, "[[:digit:]]+[[:digit:]]+") |  # two or more consecutive digits
    str_detect(x, "(.)\\1{3,}") |                      # a character repeated 4+ times
    str_detect(x, "\\b(.)\\b") |                       # single-letter words
    x %in% stop_words$word |                           # tidytext stop words
    str_detect(x, excluded)                            # hand-picked exclusions
  replace(x, which(junk), NA)
}

updt_tweet <- tweet_NG %>%
  # tweets posted between June 5 and June 7, 2021
  filter (created_at >= as.Date("2021-06-05") &
          created_at <= as.Date("2021-06-07")) %>%
  select (status_id, created_at, text, screen_name) %>%
  # split tweets into trigrams, one column per word ...
  unnest_tokens(trigram, text, token = "ngrams", n = 3) %>%
  separate(trigram, c("word1", "word2", "word3"), sep = " ") %>%
  # ... and keep only trigrams starting with "from", e.g. "from united kingdom"
  filter (word1 == "from") %>%
  mutate (across(c(word2, word3), drop_junk)) %>%
  # drop "nigeria" in word3 when word2 already holds a valid location
  mutate (word3 = replace(word3,
                          which(!is.na(word2) & str_detect(word3, "nigeria")),
                          NA)) %>%
  # keep rows with at least one valid location word
  filter (!is.na(word2) | !is.na(word3))
Because some country names have two words (such as United Kingdom), I merged word2 and word3 when both columns held a valid word, and kept whichever column held the valid word when only one of them did.
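The code that builds tweet_GEO (used in the next chunk) is not shown in the post. The sketch below is one way to implement the merging rule just described; the helper name merge_location_words is hypothetical, and converting to sentence case is an assumption inferred from spellings such as “United kingdom” and “Usa” in the recoding patterns.

```r
library(dplyr)
library(stringr)

# Hypothetical helper: merge word2 and word3 into a single `country` column
merge_location_words <- function(df) {
  df %>%
    mutate(country = case_when(
             !is.na(word2) & !is.na(word3) ~ paste(word2, word3),  # two-word names
             !is.na(word2) ~ word2,                                 # only word2 is valid
             TRUE ~ word3),                                         # only word3 is valid
           # Sentence case, e.g. "United kingdom" (assumed, see lead-in)
           country = str_to_sentence(country))
}

# Toy rows standing in for updt_tweet
toy <- tibble(word2 = c("united", "london", NA),
              word3 = c("kingdom", NA, "canada"))

merge_location_words(toy)
```

Under these assumptions, tweet_GEO <- merge_location_words(updt_tweet) would produce the country column that the next chunk recodes.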
During data exploration, I observed that some users reported city names (such as London) as the location they were tweeting from. I therefore created a comprehensive list of countries and their popular cities, and regrouped all locations to the national level. This was done for the United States, the United Kingdom, the Netherlands, Nigeria and others. I also carefully reviewed all reported locations to identify misspellings (such as Landon and Nethaland).
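The recoding relies on str_detect(), which matches a pattern anywhere in a string, so short patterns can also capture longer names that merely contain them. The toy example below (hypothetical locations, shortened pattern) shows this for “Uk”, together with one possible alternative: anchoring the alternation with ^ and $ so that only whole strings match. This is a suggested variant, not the approach used in the post.

```r
library(stringr)

# Hypothetical locations as they might look after merging word2 and word3
loc <- c("London", "Landon", "Manchester", "Ukraine", "Lagos")

# Unanchored pattern, as in the recoding code (shortened here)
unanchored <- replace(loc,
                      which(str_detect(loc, "London|Landon|Manchester|Uk")),
                      "United Kingdom")

# Anchored variant: ^(...)$ only matches when the whole string is listed
anchored <- replace(loc,
                    which(str_detect(loc, "^(London|Landon|Manchester|Uk)$")),
                    "United Kingdom")

unanchored  # "Ukraine" is recoded as well
anchored    # "Ukraine" is left unchanged
```

Anchoring trades recall for precision: it would also stop intended partial matches, such as “San” catching “San francisco”, so every variant would need to be listed explicitly.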
clean_tweet <- tweet_GEO %>%
mutate (country = replace(country,
which(str_detect(country,
"Alabama|Alaska|Arizona|Arkansas|California|Colorado|Connecticut|Delaware|Florida|Georgia|Hawaii|Idaho|Illinois|Indiana|Iowa|Kansas|Kentucky|Louisiana|Maine|Maryland|Massachusetts|Michigan|Minnesota|Mississippi|Missouri|Montana|Nebraska|Nevada|Hampshire|Jersey|New Mexico|York|Unitedstates|Vegas|Carolina|Dakota|Ohio|Oklahoma|San|Oregon|Pennsylvania|Rhode Island|South Carolina|South Dakota|Tennessee|Texas|Utah|Vermont|Virginia|Washington|West Virginia|Wisconsin|Wyoming|Seattle|Miami|Boston|Chicago|Nyc|Los|Las|Houston|Losangeles|Atlanta|America|Dellas|Dallas|Usa|Brooklyn|usa|carolina|mississippiTwittingfromatlanta|florida|Minneapolis|Newyork|Manhattan|Maimi")),
"USA")) %>%
mutate (country = replace(country,
which(str_detect(country,
"Amsterdam|Netherland|Nederland|Nethaland|Neitherland|Nertherland|Meppel")),
"Netherlands")) %>%
mutate (country = replace(country,
which(str_detect(country,
"Buckingham|London|Manchester|Custard|England|Uk|Southampton|Unitedkingdom|Landon|Glasgow|Scotland|Northampton|Birmingham|Westminster|United kingdom|Britain|london|United kingdon|United kindom")),
"United Kingdom")) %>%
mutate (country = replace(country,
which(str_detect(country,
"Uar|Abj|Abk|Aba|Asaba |Arewa|Ibadan|Aso|Asorock|Abeokuta|Abuja|Adamawa|Nigeria|Ikorodu|Ado|Agege|Akoka|Akure|Lagos|lagos|Egbeda|Ikeja|Akurehowfar|Akwaibomtwitter|Naija|Portharcourt|Osun|Osogbo|Ogun|Yenagoa|Ibarapa|Calabar|Ife|Edo|Ebonyi|Eastern|Fmicnigeria|Oye|Katsina|Niegria|Nigeriagov|Lokoja|Lafiaji|Sokoto|Nigerians|Oduduwa|Uyo|9ja|Shomolu|Unitedafricanrepublic|United african|Zaria|Stateofosun|Sagamu|Mowe ibafo|Ogbmosho|Ogbomosho|Ogbomoso|Oke ode|Olaiya strt|Oluyole town|Ondo|Onitsha anambra|Ota|Makurdi|Lekki|Kaduna|Kano|Jos|Ekiti|Enugu|Akwa ibom|United africa|Port harcourt|Maiduguri|United aafrican|United adamugaba|United arewa|United ede|United epe")),
"Nigeria")) %>%
mutate (country = replace(country,
which(str_detect(country,
"Alberta|British Columbia|Manitoba|ontario|Montreal|Ontario|Quebec|Saskatchewan|Toronto|Vancouver|Canada|canada")),
"Canada")) %>%
mutate (country = replace(country,
which(str_detect(country,
"Paris|paris|France")),
"France")) %>%
mutate (country = replace(country,
which(str_detect(country,
"Instabul|Istanbul")),
"Turkey"),
country = replace(country,
which(str_detect(country,
"Germany|germany|Frankfurt|9jagermany|Berlin|Stutggart")),
"Germany"),
country = replace(country,
which(str_detect(country,
"switzerland|Zurich|Switzerlandjust")),
"Switzerland"),
country = replace(country,
which(str_detect(country,
"Sydney|Australia|australia")),
"Australia"),
country = replace(country,
which(str_detect(country,
"Spain|Barcelona")),
"Spain"),
country = replace(country,
which(str_detect(country,
"Soweto|Southy|South africa|Johannesburg")),
"South Africa")) %>%
mutate (country = replace(country,
which(country == "United"),
"USA"))
To create a time series of tweet locations, I created an additional variable, hour, that records the date and hour at which each tweet was posted, and sorted the dataset by posting time.
tweet_clean <- clean_tweet %>%
  # label such as "June 5: 14Hrs" (month, day and hour of posting);
  # created_at must come from clean_tweet itself, not from updt_tweet
  mutate (hour = paste0(months(as.Date(created_at)),
                        " ", day(created_at),
                        ": ", hour(as_datetime(created_at)), "Hrs")) %>%
  arrange (as_datetime(created_at))
# hour labels in chronological order
levels <- as.character(unique(tweet_clean$hour))
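The sketch below applies the same label construction to three made-up timestamps, to show what the hour variable looks like and why the levels are taken after sorting by time. It parses the timestamps once with as_datetime() (as UTC) and reuses the result; month names come from months() and therefore depend on the system locale.

```r
library(lubridate)

# Made-up timestamps in the same text format as created_at
created_at <- c("2021-06-05 14:03:11", "2021-06-05 09:45:00", "2021-06-06 23:10:05")
stamp <- as_datetime(created_at)  # parsed as UTC

# Label of the form "<month> <day>: <hour>Hrs"
hour_label <- paste0(months(as.Date(created_at)),
                     " ", day(stamp),
                     ": ", hour(stamp), "Hrs")

# Unique labels in chronological order: the order the frames should follow
frame_levels <- unique(hour_label[order(stamp)])
frame_levels
```

A vector like this can serve as factor levels, e.g. factor(tweet_clean$hour, levels = levels), so that hours appear in chronological rather than alphabetical order.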
We can create a bar graph of the distribution using the code chunk below. First, we sort the locations by number of tweets in descending order and select the top 20. To position the data labels, we create another column, pos, equal to n - 200 when n > 200 (placing the label inside the bar) and n + 100 otherwise (placing it just past the end of the bar).
migr1_plot <- tweet_clean %>%
select(country) %>%
count(country, sort = TRUE) %>%
slice (1:20) %>%
mutate (pos = ifelse((n > 200),
n-200,
n+100)) %>%
ggplot() +
geom_col(aes(y = reorder(country, n),
x = n),
fill = c("#003f5c", "#bc5090",
rep("#003f5c", 18))) +
geom_text (aes(y = country,
x = pos,
label = n),
color = c(rep("#ffffff",7),
rep("#000000",13)),
family = axis_text,
fontface = "bold",
vjust = 0.5, hjust = 0,
size = 12) +
labs (x = "Number of Tweets",
y = "",
subtitle = "Top 20 desired countries of \ndestination by Nigerians on Twitter") +
theme_minimal (base_size = 38,
base_family = axis_text) +
theme (legend.position = "none",
plot.subtitle = element_text(lineheight = unit(0.4, "pt"),
size = 45),
panel.grid = element_line(colour = NULL),
panel.grid.major.y = element_line(colour = "#D2D2D2",
linetype = "dashed",
size = 0.3),
panel.grid.major.x = element_line(colour = "#D2D2D2",
linetype = "dashed",
size = 0.3),
# panel.grid.major.x = element_blank(),
panel.grid.minor = element_blank()) +
scale_x_continuous(labels = scales::comma)
migr1_plot
ggsave(file="top_country_tweet.png", dpi=350, height= 7, width= 7)
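The label colours above are hard-coded (seven white labels, then thirteen black), which keeps them consistent with the pos rule only as long as exactly the first seven bars exceed 200 tweets. A small, hypothetical alternative is to derive both the position and the colour from n, shown here on made-up counts; toy_counts and label_colour are illustrative names.

```r
library(dplyr)

# Made-up counts standing in for count(tweet_clean, country, sort = TRUE)
toy_counts <- tibble(country = c("USA", "United Kingdom", "Ghana"),
                     n = c(900L, 650L, 40L))

toy_counts <- toy_counts %>%
  mutate(pos = ifelse(n > 200, n - 200, n + 100),              # label x-position
         label_colour = ifelse(n > 200, "#ffffff", "#000000"))  # white inside bars, black outside

toy_counts
```

In the plot, the colour could then be mapped with geom_text(aes(colour = label_colour), ...) together with scale_colour_identity(), instead of passing a fixed colour vector.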
The figure below shows the top 20 countries from which Twitter users in Nigeria reportedly tweeted. The United States of America, the United Kingdom and Canada are the most preferred international destinations among Nigerians on Twitter.
In addition, a significant number of users reportedly tweeted from Nigeria, an act that could be interpreted as solidarity with the country’s ban on Twitter. South Africa and Ghana were the only other African countries in the top 20.
As mentioned previously, we can compare the trends observed on Twitter with those from a traditional data source. For each year, the UN dataset gives the number of migrants from each country of origin (columns) living in each country, region or other economic grouping of destination (rows). I retained and renamed the year, the country of destination and the number of migrants from Nigeria.
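The code that turns migration_UN into migration (with columns year, country, code and Nigeria) is not shown in the post. A minimal sketch under stated assumptions: it selects columns by position and assumes that the year, destination and country code sit in columns 1, 3 and 5, that each country of origin has its own column, and that missing values are coded as “..”. These positions are assumptions to check against names(migration_UN); the toy table, its column names and its numbers are made up.

```r
library(dplyr)

# Toy stand-in for migration_UN (placeholder column names, made-up numbers)
migration_UN_toy <- tibble(
  year_col  = c(2019, 2019, 2019),
  sort_col  = c(1, 2, 3),
  dest_col  = c("WORLD", "United States of America", "Ghana"),
  notes_col = c(NA, NA, NA),
  code_col  = c(900, 840, 288),
  Nigeria   = c("1000", "250", "..")
)

# Keep and rename the year, destination and code columns by position
migration_toy <- migration_UN_toy %>%
  select(year = 1, country = 3, code = 5, Nigeria) %>%
  mutate(code = as.character(code),
         Nigeria = suppressWarnings(as.numeric(Nigeria)))  # ".." becomes NA

migration_toy
```

Once the column positions are confirmed, the same pipeline applied to migration_UN would give a table with the columns used in the code below.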
I also retrieved the total number of migrants from Nigeria in 2019, using the row with code 900 (the world total) and the year column, and assigned the names of the regional groupings of countries (the first 19 rows) to a vector.
migration$Nigeria[migration$code == "900" &
migration$year == "2019"] %>%
as.numeric() -> mt
mt[1]
regions <- migration$country[1:19]
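The code 900 row provides the denominator for the percentages computed below: perc = (migrants from Nigeria living in a given country) / (migrants from Nigeria worldwide). The sketch reproduces this on a made-up table, together with the region filter and ranking; migration_toy, regions_toy and mt_toy are hypothetical stand-ins for migration, regions and mt.

```r
library(dplyr)

# Made-up rows standing in for `migration` (2019 only)
migration_toy <- tibble(
  year    = c(2019, 2019, 2019, 2019),
  country = c("WORLD", "United States of America", "Ghana", "Benin"),
  code    = c("900", "840", "288", "204"),
  Nigeria = c(1000, 250, 100, 50)
)
regions_toy <- c("WORLD")  # stand-in for the regional groupings

# Denominator: migrants from Nigeria worldwide (code 900) in 2019
mt_toy <- migration_toy$Nigeria[migration_toy$code == "900" &
                                  migration_toy$year == 2019]

# Share of all Nigerian migrants living in each country, largest first
shares <- migration_toy %>%
  filter(!country %in% regions_toy, year == 2019) %>%
  mutate(perc = round(Nigeria / mt_toy[1], 3)) %>%
  arrange(desc(Nigeria)) %>%
  mutate(country = replace(country,
                           which(country == "United States of America"),
                           "USA"))

shares
```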
Lastly, I filtered the data based on the following conditions:
Excluded moves to regions such as “Europe”, “More developed regions”, “Africa” and others to focus more on moves to other countries.
Excluded rows containing any other regional classification, such as “Western Africa”, including regions written in different capitalisations (for example, Europe or EUROPE).
Retained only the more recent (2019) counts of migration. There was no migration estimate for the year 2020 at the time of this analysis.
Subsequently, I calculated the percentage of moves from Nigeria to each of the countries and ranked these to obtain the top 20 countries of destination for Nigerians.
migration_updt <- migration %>%
filter (!country %in% regions) %>%
filter (!str_detect(country, "Western Africa|Southern Africa|Middle Africa|Northern Africa")) %>%
filter (!str_detect(country, "Europe|EUROPE|Asia|ASIA|OCEANIA|NORTHERN AMERICA")) %>%
filter (year == 2019) %>%
mutate (perc = round(Nigeria/mt[1], 3)) %>%
arrange (desc(Nigeria)) %>%
slice (1:20) %>%
mutate (country = replace(country,
which(country == "United States of America"),
"USA"))
We can also create a bar graph of the top 20 destination countries for Nigerian migrants in 2019 using the code chunk below.
migr2_plot <- migration_updt %>%
mutate (percentage = paste0((perc * 100), "%")) %>%
ggplot() +
geom_col(aes(y = reorder(country, perc),
x = perc),
fill = "#003f5c") +
geom_text (aes(y = country,
x = perc + 0.01,
label = percentage),
color = "#000000",
family = axis_text,
fontface = "bold",
vjust = 0.5, hjust = 0,
size = 12) +
labs (x = "Share of all migrants from Nigeria",
y = "",
subtitle = "Top 20 destination countries \nfor Nigerian migrants in 2019") +
theme_minimal (base_size = 38,
base_family = axis_text) +
theme (legend.position = "none",
plot.subtitle = element_text(lineheight = unit(0.4, "pt"),
size = 45),
panel.grid = element_line(colour = NULL),
panel.grid.major.y = element_line(colour = "#D2D2D2",
linetype = "dashed",
size = 0.3),
panel.grid.major.x = element_line(colour = "#D2D2D2",
linetype = "dashed",
size = 0.3),
# panel.grid.major.x = element_blank(),
panel.grid.minor = element_blank()) +
scale_x_continuous(labels = scales::percent,
limits = c(0, 0.3))
migr2_plot
ggsave(file="top_countryTweet_UN.png", dpi=350, height= 7, width= 7)
The figure below shows the top 20 destination countries for Nigerian migrants based on UN estimates. The United States of America, the United Kingdom, Italy, Germany and Canada are the top non-African destinations for Nigerian migrants.
The Republic of Niger, Benin and Ghana were the top African destinations for Nigerian migrants. Surprisingly, South Africa was not among the top 10 destination countries.
For ease of comparison, I used the ggarrange() function from the ggpubr 📦 to combine the two graphs into one.
As shown in the figure above, tweets could, to a large extent, mirror patterns of migration. The top two destination countries in the tweets match the migration patterns observed in 2019: most people want to migrate to the USA and the UK. Germany and Canada were also among the top five preferred destination countries for Nigerian migrants.
Although the two data sources were collected or estimated at different time points, the analysis also reveals some inconsistencies between migration intentions and actual migration. For example, regional migration (within Africa) was not prominent in the tweets: only a few users reported tweeting from South Africa or Ghana, whereas the UN estimates show substantial numbers of Nigerian migrants in Cameroon, Niger, Benin and Ghana in 2019.
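The code for the combined figure is not shown in the post. A minimal sketch with ggpubr::ggarrange(), using two toy bar charts in place of migr1_plot and migr2_plot; the side-by-side layout (ncol = 2) is an assumption, and the data are made up.

```r
library(ggplot2)
library(ggpubr)

# Toy bar chart standing in for migr1_plot
p_tweets <- ggplot(data.frame(country = c("USA", "United Kingdom", "Canada"),
                              n = c(3, 2, 1)),
                   aes(x = n, y = reorder(country, n))) +
  geom_col(fill = "#003f5c") +
  labs(x = "Number of Tweets", y = "")

# Toy bar chart standing in for migr2_plot
p_un <- ggplot(data.frame(country = c("USA", "United Kingdom", "Italy"),
                          perc = c(0.3, 0.2, 0.1)),
               aes(x = perc, y = reorder(country, perc))) +
  geom_col(fill = "#003f5c") +
  labs(x = "Share of all migrants from Nigeria", y = "")

# Place the two charts side by side in one figure
combined <- ggarrange(p_tweets, p_un, ncol = 2, nrow = 1)
```

With the post’s objects this would be ggarrange(migr1_plot, migr2_plot, ncol = 2), and the result can be saved with ggsave() like the individual plots.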
Next, we use the gganimate 📦 to visualise and export the graph. We set the animation to advance through each date and hour (the hour variable) for which there is data. The labels use the month and day without the year because all tweets were posted within a single year; the year would have to be included if the data spanned more than one year.
We also set the animation parameters in animate(). These include the length of the animation (duration = 60, i.e. 60 seconds) and its size (width = 2000, height = 1800). The start_pause and end_pause arguments add freeze-frames at the beginning and end, making it easy to take a closer look at the final distribution of reported locations before the loop begins afresh.
You may also need to disable your antivirus software for the animation to be saved on your computer. You can read more about other possibilities with the 📦 on the package website.
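The animation code itself is not shown in the post. The sketch below is one way to set it up with gganimate, assuming the bars should show running totals: transition_manual() steps through the levels of a factor (here a toy version of hour), and cumulative = TRUE keeps the tweets from earlier frames on screen. The data, the title and the pause lengths are hypothetical; duration, width and height are the values given in the text.

```r
library(ggplot2)
library(gganimate)

# Made-up tweets; the factor levels fix the chronological order of frames
toy <- data.frame(
  country = c("USA", "United Kingdom", "USA", "Canada", "USA"),
  hour = factor(c("June 5: 9Hrs", "June 5: 9Hrs", "June 5: 14Hrs",
                  "June 5: 14Hrs", "June 6: 8Hrs"),
                levels = c("June 5: 9Hrs", "June 5: 14Hrs", "June 6: 8Hrs"))
)

# One frame per hour; cumulative = TRUE keeps earlier hours' rows, so each
# bar shows the running count of tweets per country
anim <- ggplot(toy, aes(y = country)) +
  geom_bar(fill = "#003f5c") +
  labs(title = "Tweets up to {current_frame}", x = "Number of Tweets", y = "") +
  transition_manual(hour, cumulative = TRUE)

# Rendering is slow, so it only runs in an interactive session here
if (interactive()) {
  animate(anim, duration = 60, width = 2000, height = 1800,
          start_pause = 10, end_pause = 30,  # hypothetical pause lengths
          renderer = gifski_renderer("tweet_locations.gif"))
}
```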
If you have any suggestions for improving the tutorial or experience any difficulty with the code in the tutorial, please send me an email or reach me on Twitter: @eOlamijuwon.