Data Handling in Python

Course Summary

This three-day course is aimed at those who want to learn how to use Python to work with data. Combined with our Introduction to Data Science course, it sets you up to follow a Python learning journey into Data Engineering, Advanced Data Analytics, Data Science, Machine Learning, and Artificial Intelligence.

During the programme you will be introduced to Python and specific development environments and packages for working with Data, with a focus on NumPy, Pandas, Matplotlib, and Seaborn.

Along the way you will see how to clean and manipulate tabular data, apply simple statistical techniques and data visualisations, and learn how to control the flow of your program in order to automate processes.

Throughout the course you will take part in activities and discussions led by one of our Data Science technical specialists, and complete technical lab activities to practise the techniques you have learnt and develop ideas for further practice.

Delegates will learn how to:

  • Benefit from the speed and functionality of the NumPy and Pandas Python packages
  • Create and control Data Visualisations using Matplotlib and Seaborn
  • Use Python with the Jupyter development environment
  • Retrieve, clean, and prepare data from multiple types of sources
  • Gain a firm grounding in Python for data work, ready to progress to further study in connecting to AI models, engineering data pipelines, and developing Data Science solutions

This course is intended for Data Analysts, Data Engineers, and Data Ops roles, as well as those training to consume AI services or to become Data Scientists who tune and develop Machine Learning and AI models on our subsequent Data Science learning pathway.

This course covers the key pre-requisites for a large range of further learning opportunities involving Python, Data, and AI.

No prior experience with Python is necessary, though it is assumed that you will be familiar with core data concepts such as simple table structures and data types – all the pre-requisites you need are covered by our Data Essentials course.

The course is delivered in collaboration with QA.

1. Introduction to Programming for Data Handling

  • Describe the pros and cons of using programming languages to work with data
  • Identify the languages most suitable for data handling
  • Explain the challenges of using programming languages versus data analysis tools

2. Introduction to Python and IDEs

  • Describe the key attributes of the Python programming language.
  • Explain the role of the Jupyter IDE for Python programming.
  • Use the Jupyter IDE to write a basic Python program.
  • Write a program which uses string, integer, float and boolean data types.
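
As a rough illustration of the last objective, a first program written in a Jupyter cell might look like the sketch below. The variable names and values are made up for this example and are not taken from the course materials.

```python
# Hypothetical sensor reading, used only to show the four basic types.
sensor_name = "sensor-a"      # str
reading_count = 4             # int
mean_reading = 21.5           # float
is_calibrated = True          # bool

# Arithmetic between an int and a float produces a float.
total = reading_count * mean_reading

# An f-string combines values of different types into one message.
message = f"{sensor_name}: {reading_count} readings, total {total}"
print(message)

# type() reports the data type of each value.
for value in (sensor_name, reading_count, mean_reading, is_calibrated):
    print(type(value).__name__)
```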

3. Data Structures, Flow Control, Functions, and Basic Types

  • Construct collections to solve data problems.
  • Utilise selection and iteration syntax to control the flow of a Python program.
  • Write reusable functions which can be used to alter data & automate repetitive tasks.
  • Use Python’s built-in open function to create, read, and edit files.
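
The objectives in this module can be sketched together in one short program. The function name `clean_readings`, the sample values, and the file name `readings.txt` are hypothetical and chosen only for illustration.

```python
import os
import tempfile


def clean_readings(raw_values):
    """Return the values that parse as numbers, converted to float."""
    cleaned = []
    for value in raw_values:              # iteration over a list
        if value.strip() == "":           # selection: skip empty entries
            continue
        try:
            cleaned.append(float(value))
        except ValueError:                # skip text that is not a number
            continue
    return cleaned


raw = ["12.5", "n/a", "7", "", "3.25"]
readings = clean_readings(raw)

# Write the cleaned values to a file in a temporary folder.
path = os.path.join(tempfile.mkdtemp(), "readings.txt")
with open(path, "w", encoding="utf-8") as f:
    for reading in readings:
        f.write(f"{reading}\n")

# Edit the file by appending one more line.
with open(path, "a", encoding="utf-8") as f:
    f.write("99.0\n")

# Read the file back into a list of floats.
with open(path, encoding="utf-8") as f:
    from_file = [float(line) for line in f]

print(from_file)
```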

4. Mathematical and Statistical Programming with NumPy

  • Describe the core features of NumPy arrays.
  • Create, index, and manipulate NumPy arrays to solve data problems.
  • Use masking and querying syntax to retrieve desired values.
  • Use vectorised ufuncs.
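
A minimal sketch of indexing, masking, and vectorised ufuncs, assuming NumPy is installed. The temperature values are invented for this example.

```python
import numpy as np

# Hypothetical temperatures in degrees Celsius.
temps_c = np.array([18.5, 21.0, 25.5, 30.0, 16.0])

# Indexing and slicing work much like Python lists.
first_two = temps_c[:2]

# A boolean mask selects only the values that meet a condition.
mask = temps_c > 20
warm = temps_c[mask]

# Arithmetic is vectorised: it applies to every element without a loop.
temps_f = temps_c * 9 / 5 + 32

# np.sqrt is a ufunc that also operates element by element.
roots = np.sqrt(np.array([1.0, 4.0, 9.0]))

mean_c = temps_c.mean()
print(warm, temps_f, roots, mean_c)
```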

5. Introduction to Pandas

  • Create, manipulate, and alter Series and DataFrames with Pandas.
  • Define and change the indices of Series & DataFrames.
  • Use Pandas’ functions and methods to change column types, compute summary statistics and aggregate data.
  • Read, manipulate, and write data from CSV, XLSX, JSON and other structured file formats.
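
The following sketch touches on each objective in this module, assuming Pandas is installed. The data is invented, and an in-memory string stands in for a CSV file so that the example is self-contained.

```python
import io

import pandas as pd

# Hypothetical sales data; io.StringIO stands in for a file on disk.
csv_text = """city,year,sales
Leeds,2023,100
Leeds,2024,120
York,2023,80
York,2024,95
"""

df = pd.read_csv(io.StringIO(csv_text))

# Change a column's type.
df["sales"] = df["sales"].astype("float64")

# Use a column as the index.
df_by_city = df.set_index("city")

# Summary statistics and aggregation.
summary = df["sales"].describe()
totals = df.groupby("city")["sales"].sum()

# Write the aggregated result back out as CSV text.
out = io.StringIO()
totals.to_csv(out)
print(out.getvalue())
```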

6. Data Cleaning with Pandas

  • Identify missing data and apply techniques to deal with it.
  • Deduplicate, transform and replace values.
  • Use Pandas string methods to manipulate text data.
  • Write regular expressions which munge text data.
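
One possible cleaning sequence is sketched below, assuming Pandas and NumPy are installed. The names, phone numbers, and scores are made up, and filling gaps with the column mean is only one of several ways to handle missing data.

```python
import numpy as np
import pandas as pd

# Hypothetical messy data with stray spaces, a duplicate row, and gaps.
df = pd.DataFrame({
    "name": ["  alice ", "BOB", "BOB", "carol"],
    "phone": ["0113 496 0000", "0113-496-0001", "0113-496-0001", None],
    "score": [10.0, np.nan, np.nan, 7.0],
})

# Identify missing data: count the gaps in each column.
missing_per_column = df.isna().sum()

# String methods: trim whitespace and standardise capitalisation.
df["name"] = df["name"].str.strip().str.title()

# Remove rows that are exact duplicates.
df = df.drop_duplicates().reset_index(drop=True)

# Fill the remaining missing scores with the column mean.
df["score"] = df["score"].fillna(df["score"].mean())

# Regular expression: remove every non-digit character from phone numbers.
df["phone"] = df["phone"].str.replace(r"\D", "", regex=True)

print(df)
```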

7. Data Manipulation with Pandas

  • Construct Pivot tables in Pandas.
  • Manipulate time series data.
  • Stream data into Pandas to handle data size problems.
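
A short sketch of the three techniques in this module, assuming Pandas is installed. The data is invented and held in a string rather than a large file; the chunk size of 2 is deliberately tiny so that the streaming step is visible.

```python
import io

import pandas as pd

# Hypothetical daily sales data.
csv_text = """date,region,units
2024-01-01,North,5
2024-01-01,South,3
2024-01-02,North,7
2024-01-08,South,4
2024-01-09,North,6
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Pivot table: one row per date, one column per region.
pivot = df.pivot_table(
    index="date", columns="region", values="units",
    aggfunc="sum", fill_value=0,
)

# Time series: total units per week (weeks ending on Sunday).
weekly = df.set_index("date")["units"].resample("W").sum()

# Streaming: read the data a few rows at a time instead of all at once.
total_units = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    total_units += chunk["units"].sum()

print(pivot)
print(weekly)
print(total_units)
```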

8. Methods for Visualising Data

  • Construct and tailor basic data visualisations using Matplotlib & Seaborn for both numeric & non-numeric data.
  • Meaningfully visualise aggregate data using Matplotlib and Seaborn.
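
A minimal sketch of visualising aggregate data, using Matplotlib only and assuming Matplotlib and Pandas are installed. The data and labels are invented. The `Agg` backend line is there so the example runs without a display; it is not normally needed inside Jupyter.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, an assumption for this sketch

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sales data.
df = pd.DataFrame({
    "region": ["North", "South", "North", "South", "North"],
    "units": [5, 3, 7, 4, 6],
})

# Aggregate first, then plot the aggregated values.
totals = df.groupby("region")["units"].sum()

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(totals.index, totals.values, color="steelblue")

# Tailor the chart with a title and axis labels.
ax.set_title("Units sold by region")
ax.set_xlabel("Region")
ax.set_ylabel("Units")
fig.tight_layout()
```

Seaborn's `barplot` function can draw a similar chart and performs the aggregation itself.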

Course Overview

3 days

Can’t find a suitable date but are interested in the course? Send in an expression of interest and we will do what we can to find a suitable opportunity.

Customised Courses

The course can be adapted from several perspectives:

  • Content and focus area
  • Extent and scope
  • Delivery approach

Working with the course leader, we ensure that the course meets your needs.

Send an expression of interest for the training