Book Spotlight: How to Think like a Data Scientist

It is said that the most important characteristic of a data scientist is curiosity. So how do you structure a class that encourages students to be curious and ask questions of the data? When I first taught DS-320 at Luther in 2017, I had to make it up as I went. Luckily, I had some great and very patient students who were willing to go with it. My main goal was to “let the data drive the learning!” My vision for the course was to pick some data sets, do exploratory data analysis on them, generate a bunch of questions as a class, and then figure out what we needed to know in order to answer our questions. I thought this was an amazing and really fun way to structure a course. It doesn’t lend itself to a structured day-by-day syllabus, since you can’t necessarily predict everything you are going to learn when you start! But we learned a LOT in that class, and we had a lot of fun doing it.

Finding textbooks for undergraduate data science courses is really hard. There is little agreement on curriculum at the undergrad level, and there is definitely not much material with a liberal arts emphasis. So I was thrilled when the Applied Computing Series team at Google asked me to take the lead on creating a book for their AC201 course.

This book is an attempt to build some structure around the approach described above without totally killing the spontaneity of encouraging students to ask good questions of the data. You are never going to find a single data set that makes everyone happy, but if you pick several data sets, hopefully enough of them will interest enough students to keep everyone engaged. In this text (so far) we look at World Happiness data, movie reviews, the CIA World Factbook, United Nations speeches, bike rental data from Washington, DC, and shopping cart data from Instacart.

The learning objectives of the course are as follows:

  • Articulate the data science processing pipeline
  • Extract data using SQL
  • Gather data from the Internet using web APIs and screen scraping
  • Combine data from different sources
  • Clean the data
  • Handle missing data, find outliers, and fix data
  • Normalize and rescale data
  • Visualize the data
  • Translate questions to analysis and analysis to interesting stories
  • Analyze data
    • Single variable regression, logistic regression
    • Market basket analysis
    • Cohort analysis
    • Sentiment analysis, exposure to Bayes’ theorem
    • Time series
    • Geographic analysis
    • Simulations, Monte Carlo
  • Understand statistical significance and how to test for it using practical simulation techniques
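To give a flavor of what testing significance with "practical simulation techniques" can look like, here is a minimal sketch of a permutation test written with only the Python standard library. It is an illustration, not code taken from the book: the function name `permutation_p_value`, its parameters, and the sample numbers are all made up for this example.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate how often a random relabeling of the data produces a
    difference in means at least as large as the one we observed."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(group_a) - mean(group_b))

    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        # Shuffle the pooled values, then split them back into two
        # groups of the original sizes.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1

    return at_least_as_extreme / n_shuffles


# Hypothetical daily rental counts for two groups of days (invented numbers).
weekday_rentals = [120, 135, 128, 140, 131]
weekend_rentals = [95, 102, 88, 110, 99]
p = permutation_p_value(weekday_rentals, weekend_rentals, n_shuffles=2_000)
print(f"estimated p-value: {p:.3f}")
```

A small estimated p-value suggests the observed difference would be unusual if the group labels did not matter. The appeal of this style of test in a course like this one is that students can reason about it with a loop and a shuffle rather than a formula.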

You can see how the individual skills map onto different data sets and chapters by taking a look at the preface.

One of the big challenges was making the book interactive while still wanting students to install and run their own copy of a Jupyter notebook server. The approach is to have the book lead the students through some analysis while asking them to do work in the notebook and bring answers back into the textbook. For example, students use the notebook to find the busiest bike rental pickup point, and then paste the ID of that station into a multiple choice question in the text.
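As a rough illustration of the kind of notebook work involved, the busiest pickup point can be found by counting trips per starting station. The sketch below uses only the standard library and a handful of invented records; the field name `start_station_id`, the station IDs, and the helper `busiest_station` are assumptions made for this example, not the actual column names or code used in the book.

```python
from collections import Counter

# Hypothetical trip records; the real data set is much larger and its
# column names may differ.
trips = [
    {"trip_id": 1, "start_station_id": 31200},
    {"trip_id": 2, "start_station_id": 31623},
    {"trip_id": 3, "start_station_id": 31200},
    {"trip_id": 4, "start_station_id": 31101},
    {"trip_id": 5, "start_station_id": 31200},
    {"trip_id": 6, "start_station_id": 31623},
]


def busiest_station(trips, key="start_station_id"):
    """Return (station_id, pickup_count) for the most common start station."""
    counts = Counter(trip[key] for trip in trips)
    station_id, n_pickups = counts.most_common(1)[0]
    return station_id, n_pickups


station_id, n_pickups = busiest_station(trips)
print(f"Busiest pickup station: {station_id} ({n_pickups} pickups)")
```

The station ID printed by a cell like this is the value a student would carry back into the question in the text.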

Maybe at some point we’ll have a way to embed Jupyter notebook cells into a Runestone text, but that will require a LOT more computing power.

In the meantime, please take a look. The book, as is, has been classroom tested in four schools this spring, but I think there is a lot more content that could be added, and the existing content still needs some cleanup. So feel free to let us know about any issues on GitHub.

Runestone History and a Roadmap

Runestone Interactive was created in 2011 during Brad’s sabbatical. I should have been working on a new edition for two paper textbooks, but I had the worst kind of writer’s block. I just couldn’t stand the idea of a paper textbook for computer science in 2011. Textbooks should let you run the examples! Even better, textbooks should encourage you to edit the examples and play around with them. When a Google search for Python in the browser turned up the Skulpt project, I knew I was onto something.

After spending a couple of months building a turtle graphics module, I realized that nobody would write a book if they had to do a ton of JavaScript programming for every example. So I started to look around and found Sphinx and docutils. Although Markdown is probably more popular, Sphinx/docutils is so much more extensible. So I set about writing some extensions to Sphinx, and the rest is history. Now adding an example to the textbook is as easy as copying and pasting the code into the plain text document!

We first used Runestone in the classroom in 2012 for 60 students at Luther College. Since then, Runestone has grown to serve 25,000 students a day around the world at something like 800 institutions. The real surprise came when I discovered that many of them were high schools. This made me very happy!

Our library now lists 18 books! But there are probably at least another 18 that I don’t know about. The number of translations of Runestone books that I have randomly discovered is amazing. That makes me very happy also.

The tagline “democratizing textbooks for the 21st century” was inspired by a visit with Guy Kawasaki during a January Term class in which I would take 12 students to Silicon Valley to visit entrepreneurs at all kinds of companies. It is, in Guy’s terms, a mantra. It means that textbooks should be free! You should not be excluded from learning about CS because you cannot afford $200 for a textbook! If Runestone can play a role in disrupting textbook publishing, that would be awesome. I’m hoping that Runestone can serve 2 million students a day in my lifetime! It also means that textbooks should be interactive, intelligent, living documents.

In 2018, I decided to leave my dream job at Luther College to focus all of my energy on a new dream job, Runestone Interactive. I was growing increasingly frustrated that there were not enough hours in the day to teach classes, attend committee meetings, grade homework, prep lectures, and work on Runestone. This turned out to be a great leap of faith, as not long after I made the decision I was contacted by some team members in Google’s EngEDU organization who wanted to use Runestone as part of their Applied Computing Series of courses. The goals of Runestone and the goals of the AC team could not be more aligned. Runestone is also used as a platform for teaching courses by LaunchCode. I get to work with a bunch of really smart Googlers, and have time to continue to develop Runestone.

A Roadmap for the Future

The sign of a good project is that the todo list never gets shorter. Every time I cross something off the list, three new things replace it. There is no doubt that with focus, time, and energy Runestone can be way more awesome than it is today. The details and current development priorities are outlined here.

What I am most interested in is creating a sustainable community around Runestone so that it will continue after I am not caring for it every day. This means a concerted effort on funding, on growing the number of students, and on building the number of authors and developers.

All of the above has been happening organically, but we need to accelerate on all fronts. This new home page is one part of that. You will begin to see articles detailing development, as well as posts about how people are using Runestone in the classroom in real life. Please share this site with your friends and colleagues, introduce influencers to Runestone, and help us spread the word.