QA-HWDSHDP


HDP Analyst: Data Science

This course provides instruction on the processes and practice of data science, including machine learning and natural language processing. It covers tools and programming languages (Python, IPython, Mahout, Pig, NumPy, pandas, SciPy, and scikit-learn), the Natural Language Toolkit (NLTK), and Spark MLlib.

Target Audience
Architects, software developers, analysts and data scientists who need to apply data science and machine learning on Hadoop.

Prior knowledge

Please note: Hortonworks courses are delivered using electronic courseware. Delegates attending remotely (Virtual classes or Attend from Anywhere) must have dual monitors, or a single monitor plus a tablet device, so that labs and lab instructions can be viewed on separate screens.

Technical pre-requisites

This course is for you if you have little or no previous experience with data science, machine learning, or the field of big data, but are keen to learn. You might be a software developer keen to branch out into more analytical tools, a manager of a technical team that you need to understand, or just at the beginning of your technical career. A basic understanding of programming/scripting and experience with the Linux shell will be useful but not essential.

Students must have experience with at least one programming or scripting language, knowledge in statistics and/or mathematics, and a basic understanding of big data and Hadoop principles. Students new to Hadoop are encouraged to attend the HDP Overview: Apache Hadoop Essentials course.

Objectives:

Course Outline:

  • Recognize use cases for data science on Hadoop
  • Describe the Hadoop and YARN architecture
  • Describe supervised and unsupervised learning differences
  • Use Mahout to run a machine learning algorithm on Hadoop
  • Describe the data science life cycle
  • Use Pig to transform and prepare data on Hadoop
  • Write a Python script
  • Describe options for running Python code on a Hadoop cluster
  • Write a Pig User-Defined Function in Python
  • Use Pig streaming on Hadoop with a Python script
  • Use machine learning algorithms
  • Describe use cases for Natural Language Processing (NLP)
  • Use the Natural Language Toolkit (NLTK)
  • Describe the components of a Spark application
  • Write a Spark application in Python
  • Run machine learning algorithms using Spark MLlib
  • Take data science into production

Hands-On Content

  • Lab: Setting Up a Development Environment
  • Demo: Block Storage
  • Lab: Using HDFS Commands
  • Demo: MapReduce
  • Lab: Using Apache Mahout for Machine Learning
  • Demo: Apache Pig
  • Lab: Getting Started with Apache Pig
  • Lab: Exploring Data with Pig
  • Lab: Using the IPython Notebook
  • Demo: The NumPy Package
  • Demo: The pandas Library
  • Lab: Data Analysis with Python
  • Lab: Interpolating Data Points
  • Lab: Defining a Pig UDF in Python
  • Lab: Streaming Python with Pig
  • Demo: Classification with Scikit-Learn
  • Lab: Computing K-Nearest Neighbor
  • Lab: Generating a K-Means Clustering
  • Lab: POS Tagging Using a Decision Tree
  • Lab: Using NLTK for Natural Language Processing
  • Lab: Classifying Text using Naive Bayes
  • Lab: Using Spark Transformations and Actions
  • Lab: Using Spark MLlib
  • Lab: Creating a Spam Classifier with MLlib
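
To give a flavour of the hands-on topics, the idea behind the K-Nearest Neighbor lab can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the course's lab code: the function name `knn_predict`, the choice of Euclidean distance, and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; distance is Euclidean.
    """
    # Sort the training points by distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Take a majority vote over the labels of those neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters labelled "a" and "b".
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]
```

Calling `knn_predict(train, (1.1, 1.0))` votes among the three closest points, all of which belong to cluster "a". On real data sets a library implementation such as the one in scikit-learn would normally be used instead.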

Please note: This course is delivered by accredited Hortonworks instructors. The syllabus includes specific use cases and examples to help illustrate and reinforce the theory and the functionality of the technology being explored. Where possible, the instructor will provide further examples and answer questions relevant to an individual delegate's specific application of the technology. However, due to the complexity of the technology and the breadth of application across industries, this may not always be possible in the classroom environment.


Course ID: QA-HWDSHDP
Duration: 3 days
Price excl. VAT: 19,716 SEK



Contract discounts and promotions cannot be applied to this course.


Location and dates

Cloud Access

Attend the course from your home, workplace, or another location.

13 Mar - 15 Mar
10 Jul - 12 Jul
