A Twitter trend took over on August 4-6, in which people dropped the convention of not “following back” and essentially followed every person using the hashtag #NoComradesUnder1k. I decided to pull out the rtweet package to see how the trend was faring after a day:

library(rtweet)
comrade_df <- search_tweets("#NoComradesUnder1k", n = 100000, retryonratelimit = TRUE, parse = TRUE, include_rts = FALSE)

I had gained quite a few followers (250 in 5 hours vs 500 over multiple years on another account), and was wondering what that Twitter network looked like, where people were using the hashtag, and whether or not I could automate a process to give my new followers a gift of their own network / friends (e.

Continue reading

Since my undergrad days I’ve found it difficult to create a really nice street map. They always tend to look cluttered, and line widths for different street types are always a challenge… the list goes on. I found this amazing post (and site) by Christian Burkhart, which gives some great tips on graphic design and data visualization. This post is largely a walk-through of the process he uses.

if(!require("pacman")) install.packages("pacman")
pacman::p_load("osmdata", "tidyverse", "sf")

Following the tutorial, I can extract the data for my city (Montreal) using the osmdata package:
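The excerpt stops before the code, so what follows is not taken from the post or from the tutorial. It is only a rough sketch of what a typical osmdata street query looks like, assuming the place string "Montreal, Canada" and a hand-picked set of highway values. The helper name `get_streets` is made up for illustration, and running it needs an internet connection.

```r
library(osmdata)

# Hypothetical helper: fetch the larger road classes for a place as sf objects.
# The default place string and the highway values are illustrative assumptions.
get_streets <- function(place = "Montreal, Canada") {
  bbox <- getbb(place)              # bounding box lookup (needs internet)
  query <- opq(bbox = bbox)         # start an Overpass query for that box
  query <- add_osm_feature(
    query,
    key = "highway",
    value = c("motorway", "primary", "secondary", "tertiary")
  )
  osmdata_sf(query)                 # download and convert the result to sf
}

# Example call, left commented out because it downloads data:
# streets <- get_streets()
# plot(sf::st_geometry(streets$osm_lines))
```

Keeping each road class in its own query makes it easier to give each one its own line width later on.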

Continue reading

As a part of a Montreal-based meetup group that I’ve attended a few times (far fewer than I would like), I’m reproducing here a “Makeover Monday” data visualization of a UNESCO dataset found here. I’ll be using a few packages for this project:

library(pacman)
p_load("reactable", "tidyverse", "readxl", "DataExplorer")

Based on the visualization from UNESCO, it looks like some countries may be excluded from the dataset.

Continue reading

After finding myself going back to some previous projects a few times to review some very useful lines of lesser-known dplyr functions, I decided I should write them both into the eternal bottomless pit that is web-blogging.

Using mutate_at and case_when

I love this example. I found myself constantly repeating case_when() lines within a mutate() to change variables based on names, and knew there had to be a better way. I’m sure it could be neater, but until I make it more efficient, this is what I have:
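The excerpt ends before the actual code, so this is not the version from the post. As a minimal sketch of the general pattern, assuming a made-up `survey` table with columns ending in `_score`, one case_when() recoding can be applied to several columns at once by selecting them by name. Note that mutate_at() is superseded by across() in recent dplyr releases, though it still works.

```r
library(dplyr)

# Hypothetical data: the column names and scores are invented for illustration
survey <- tibble(
  id = 1:4,
  q1_score = c(1, 3, 5, 2),
  q2_score = c(4, 2, 1, 5)
)

# Apply the same case_when() recoding to every column whose name ends in "_score",
# instead of repeating it once per column inside mutate()
recoded <- survey %>%
  mutate_at(
    vars(ends_with("_score")),
    ~ case_when(
      . <= 2 ~ "low",
      . == 3 ~ "medium",
      . >= 4 ~ "high"
    )
  )

recoded
```

The `id` column is left untouched because it does not match the name selector.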

Continue reading

The other night I was reading and kept seeing some very interesting lines in the text, and I thought, “is it possible to identify the more quote-worthy sentence(s) from a text?” Realizing that this is a pretty big question, I decided to tone it down a bit and ask a more reasonable question: could I create a database of quotes by some of my favourite authors? Which led me to ask… could I create a database searching tool in a Shiny app so anyone could check out quotes of their favourite author?

Continue reading

Missing data can compromise inferences made from clinical trials, and the mechanism (or reason) behind the missingness determines whether an analytic method can correct for it at all. There are three mechanisms which can cause missing data: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR) (Jakobsen, Gluud, Wetterslev & Winkel, 2017).
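To make the three mechanisms concrete, here is a small simulation in base R. It is only a sketch under invented assumptions: the `age` covariate, the `outcome` model, and the missingness probabilities are all made up for illustration and do not come from the cited paper.

```r
set.seed(42)
n <- 1000

age <- rnorm(n, mean = 50, sd = 10)        # fully observed covariate
outcome <- 0.5 * age + rnorm(n, sd = 5)    # variable that will lose values

# MCAR: every value has the same 20% chance of being missing
mcar <- ifelse(runif(n) < 0.2, NA, outcome)

# MAR: the chance of being missing depends only on the observed covariate (age)
p_mar <- 0.4 * plogis((age - 50) / 5)
mar <- ifelse(runif(n) < p_mar, NA, outcome)

# MNAR: the chance of being missing depends on the outcome value itself
p_mnar <- 0.4 * plogis((outcome - mean(outcome)) / 5)
mnar <- ifelse(runif(n) < p_mnar, NA, outcome)

# Compare the observed means against the complete-data mean
c(
  complete = mean(outcome),
  MCAR = mean(mcar, na.rm = TRUE),
  MAR = mean(mar, na.rm = TRUE),
  MNAR = mean(mnar, na.rm = TRUE)
)
```

Under MAR the observed mean can still drift, but because the missingness is driven by `age`, which is fully observed, a method that conditions on `age` has something to work with. Under MNAR the driver is the missing value itself, which is what makes it hard to correct.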

Continue reading

Corey Pembleton

Learning and using R to solve data challenges big and small

Analyst & Consultant

Montréal