June meeting: Machine learning, data viz and prizes!


This month’s meeting will be held on Wednesday, 21st of June, at 5.00pm, in the Chrystal MacMillan Building, Seminar Room 2 (also known as G.02). Something special is planned for this meeting: we have some prizes from the awesome GitKraken team to give away! So come along to get a raffle ticket, and see what you might win at the end of the meeting. The meeting will be followed by drinks and chat in the Potting Shed. Everyone is welcome, particularly newcomers and beginners!

Our first speaker is Alastair Rushworth, data scientist at Tesco Bank, who will tell us about:

Doing machine learning in R (get slides here)

Recent developments in supervised learning have resulted in a diverse constellation of packages appearing on CRAN, each with its own syntax, hyperparameters and input requirements. Navigating the idiosyncrasies of different software packages presents a substantial challenge for the time-pressed analyst wishing to compare (or combine) several model types for a particular task. In this talk I will provide a brief overview of the machine learning landscape in R, and introduce packages such as caret and mlr, which provide high-level wrappers to specialist CRAN packages and streamline the process of working with different models.
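As a rough illustration of what such a wrapper looks like (this is not taken from the talk slides), the sketch below fits two different model types through caret's single `train()` interface. The built-in `iris` data, the choice of `rpart` and `knn`, and the 5-fold cross-validation setup are all assumptions made for the example.

```r
library(caret)

set.seed(42)

# Assumption: 5-fold cross-validation on the built-in iris data.
# Fixing the fold indices means both models are scored on identical splits.
folds <- createFolds(iris$Species, k = 5, returnTrain = TRUE)
ctrl  <- trainControl(method = "cv", index = folds)

# Same call for both models; only the `method` argument changes.
fit_tree <- train(Species ~ ., data = iris, method = "rpart", trControl = ctrl)
fit_knn  <- train(Species ~ ., data = iris, method = "knn",   trControl = ctrl)

# Collect the resampled accuracy of both models for a side-by-side comparison.
comparison <- resamples(list(tree = fit_tree, knn = fit_knn))
print(summary(comparison))
```

Because the underlying packages are hidden behind `method`, swapping in another model type should only require changing that one string, provided the package that implements it is installed.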

Our second speaker is Nevil Hopley, who teaches maths and statistics at George Watson’s College, and has been learning R since January 2017. The discussion will focus on how to visualize frequency tables:

A novice R user’s journey (download R code here)

Consider an example data frame with 70 entries of two variables, each of which can be either numeric or categorical. In this basic example we have categorical values of ‘a’, ‘b’, ‘c’ and ‘d’ and numerical values of 1, 2 and 3. Creating a frequency table from the data frame using table() gives:

	1	2	3
a	9	5	9
b	6	8	2
c	3	3	4
d	3	11	7
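The original data frame is not reproduced here, so the following is only a sketch: it rebuilds a 70-row data frame whose cross-tabulation matches the counts above and then calls `table()` on it. The column names `category` and `value` are assumptions made for illustration.

```r
# Counts copied from the frequency table above (rows a-d, columns 1-3).
counts <- matrix(
  c(9,  5, 9,
    6,  8, 2,
    3,  3, 4,
    3, 11, 7),
  nrow = 4, byrow = TRUE,
  dimnames = list(category = c("a", "b", "c", "d"),
                  value    = c("1", "2", "3"))
)

# Expand the counts into one row per observation (70 rows in total).
long <- as.data.frame(as.table(counts))
df   <- long[rep(seq_len(nrow(long)), long$Freq), c("category", "value")]
rownames(df) <- NULL

# Cross-tabulate the two variables to recover the frequency table.
freq <- table(df$category, df$value)
print(freq)
```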

Options for visualizing this information using ggplot2 include displaying the data frame as a count plot, created using `geom_count`.
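A count plot along these lines can be sketched as follows. The data frame is rebuilt from the frequency table so that the example is self-contained; the column names, axis labels and the `scale_size_area()` setting are assumptions rather than the speaker's code.

```r
library(ggplot2)

# Rebuild a 70-row data frame matching the frequency table (hypothetical names).
counts <- matrix(
  c(9, 5, 9, 6, 8, 2, 3, 3, 4, 3, 11, 7),
  nrow = 4, byrow = TRUE,
  dimnames = list(category = c("a", "b", "c", "d"),
                  value    = c("1", "2", "3"))
)
long <- as.data.frame(as.table(counts))
df   <- long[rep(seq_len(nrow(long)), long$Freq), c("category", "value")]

# geom_count() counts the observations at each (x, y) position and
# maps that count to point size.
p <- ggplot(df, aes(x = value, y = category)) +
  geom_count() +
  scale_size_area(max_size = 12) +
  labs(x = "Numeric value", y = "Category", size = "Count")

print(p)
```

Using `scale_size_area()` makes each point's area proportional to its count, which is usually easier to read than the default radius-based scaling.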

As a new variant, I shall talk through how to create what could be called a ‘breakdown plot’ or ‘cluster plot’. In addition, I shall showcase the application of this new visualisation in the context of displaying reporting data on pupils in a Secondary School.

Breakdown / cluster plot