'Teaching Data Science' conference paper

Activity: Talk or presentation for an academic audience (oral presentation)

Description

Experience suggests that motivating contexts matter in CS education: animation, games and robotics all have a part to play in a broad and balanced computing education, but so does working with data. Data science is a growth area for programming beyond school, and few would dispute that education itself has become increasingly 'datafied' in recent years. Now that students and teachers in many schools are gaining fluency in using and teaching programming, we as educators might look beyond our present contexts and applications towards solving, or at least understanding, real-world problems that affect many people. The study of computer science should go hand in hand with a wider awareness of issues affecting our communities, society and civilisation, and incorporating data science into a middle- or high-school curriculum is one way to support such ambitions.
In the session, Miles introduces an approach to planning a data science curriculum for schools that addresses foundations (probability and statistics), applications (working with small- and large-scale data using freely available tools) and implications (the wider, personal and societal impact of data science). Miles demonstrates how some simple Python programming can help students understand key ideas in probability and statistics, and how these domains can provide a useful context for practising coding skills.
The main focus of the session is a worked example in which Miles explores publicly available weather station data to give a flavour of how a data-based project could be approached through Python coding. Miles walks through a question > acquire > clean > explore > visualise > model > communicate workflow that can be applied to many data science projects. The toolset builds on students' growing understanding of programming in Python, extending it to some of the powerful libraries now available for data science.
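The question > acquire > clean > explore > visualise > model > communicate workflow can be sketched in plain Python, using only the standard library so it runs without installing anything. This is a minimal illustration, not the session's code: the CSV values and column names (`date`, `max_temp_c`, `rainfall_mm`) are invented stand-ins for a real weather station export, and the question, the function names and the use of a simple least-squares line as the "model" are all assumptions made for the example. In a notebook, each stage would naturally become its own cell.

```python
import csv
import io
import statistics

# Question: is rainfall associated with lower daily maximum temperatures?
QUESTION = "Is rainfall associated with lower daily maximum temperatures?"

# HYPOTHETICAL sample data: made-up readings standing in for a real
# weather station export; the column names are assumptions.
RAW_CSV = """date,max_temp_c,rainfall_mm
2020-07-01,21.4,0.0
2020-07-02,19.8,3.2
2020-07-03,,1.0
2020-07-04,23.1,0.0
2020-07-05,18.2,7.6
"""


def acquire(text):
    """Acquire: parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Clean: drop rows with missing values and convert readings to floats."""
    cleaned = []
    for row in rows:
        if row["max_temp_c"] and row["rainfall_mm"]:
            cleaned.append({
                "date": row["date"],
                "max_temp_c": float(row["max_temp_c"]),
                "rainfall_mm": float(row["rainfall_mm"]),
            })
    return cleaned


def explore(rows):
    """Explore: simple summary statistics for the temperature column."""
    temps = [r["max_temp_c"] for r in rows]
    return {"n": len(temps), "mean": statistics.mean(temps),
            "min": min(temps), "max": max(temps)}


def visualise(rows):
    """Visualise: a crude text bar chart, one '#' per degree (rounded)."""
    return "\n".join(f"{r['date']} {'#' * round(r['max_temp_c'])}" for r in rows)


def model(rows):
    """Model: least-squares line predicting max temperature from rainfall."""
    xs = [r["rainfall_mm"] for r in rows]
    ys = [r["max_temp_c"] for r in rows]
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def communicate(summary, slope):
    """Communicate: turn the results into a plain-language sentence."""
    return (f"Over {summary['n']} days the mean maximum was {summary['mean']:.1f} C; "
            f"each extra mm of rain was associated with a {slope:+.2f} C change.")


if __name__ == "__main__":
    print("Question:", QUESTION)
    rows = clean(acquire(RAW_CSV))
    summary = explore(rows)
    print(summary)
    print(visualise(rows))
    slope, _ = model(rows)
    print(communicate(summary, slope))
```

With only four usable rows the fitted slope says nothing meaningful about real weather; the point is the shape of the pipeline, with each stage small enough for students to inspect on its own before moving to larger data and richer libraries.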
He demonstrates the Jupyter Notebook interface, which reflects Knuth's vision of 'literate programming' and provides a far more interactive environment than writing and running standalone scripts for trying out ideas and exploring the patterns and relationships in data. Using a cloud-based Jupyter server, participants can follow along on their own laptops, exploring the data, code and libraries themselves. The session concludes with a discussion of how we might help students understand privacy, transparency and other ethical issues related to collecting, storing and processing personal data.
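As an example of the kind of idea students might try out interactively, and of how simple Python can support understanding of probability, here is a minimal sketch (not drawn from the session itself) that computes the exact distribution of the total of two fair dice by enumerating all 36 outcomes, then estimates the same distribution by simulation. As the number of rolls grows, the largest gap between estimate and exact value typically shrinks, which is the law of large numbers in action. The function names and the fixed seed are illustrative choices.

```python
import random
from collections import Counter
from fractions import Fraction
from itertools import product


def theoretical_distribution():
    """Exact probability of each total (2..12) for two fair six-sided dice."""
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {total: Fraction(c, 36) for total, c in sorted(counts.items())}


def simulated_distribution(trials, seed=None):
    """Estimate the same probabilities by rolling two virtual dice `trials` times."""
    rng = random.Random(seed)  # a seeded generator makes runs repeatable
    counts = Counter(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(trials))
    return {total: counts[total] / trials for total in range(2, 13)}


if __name__ == "__main__":
    exact = theoretical_distribution()
    print("P(total = 7), exact:", exact[7])
    for trials in (100, 1_000, 100_000):
        estimate = simulated_distribution(trials, seed=1)
        # Largest absolute difference between simulated and exact probabilities.
        worst = max(abs(estimate[t] - float(p)) for t, p in exact.items())
        print(f"{trials:>7} rolls: P(7) ~ {estimate[7]:.3f}, largest error {worst:.3f}")
```

Keeping the enumerated answer as `Fraction` values makes it exact, so the contrast with the noisy simulated frequencies is easy for students to see and discuss.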

Estimated audience numbers (if applicable)

150
Period: 13 Jul 2020
Event title: CSTA 2020 Virtual Conference
Event type: Conference
Location: Chicago, Illinois, United States
Degree of Recognition: International