Data Science with SQL Server and R
Introducing the R language, statistics, data mining, and machine learning with R, and using data science in SQL Server and the Microsoft BI stack.
R is the most popular environment and language for statistical analyses, data mining, and machine learning. A managed and scalable version of R runs in SQL Server, Power BI, and Azure ML. The main topic of the course is the R language. However, the course also shows how to use the languages and tools available in the Microsoft BI suite for data science applications, including Python, T-SQL, Power BI, Azure ML, and Excel. The labs focus on R; the demos also show equivalent code in the other languages.
Why This Course?
- Compare R vs Python.
- Get familiar with unsupervised learning methods.
- Execute matrix operations.
- Visualize associations between variables.
- Prepare data for analytical tasks.
- Get familiar with supervised learning methods.
Attendees of this course learn to program with R from scratch. Basic R code is introduced using the free R engine and the RStudio IDE. The lifecycle of a data science project is explained in detail. The attendees learn how to perform a data overview and how to handle the most tedious task in a project, data preparation. After data overview and preparation, the analytical part begins with intermediate statistics to analyze associations between pairs of variables. The course then introduces more advanced methods for examining linear dependencies.
Finally, through labs, the attendees learn how to use R code in SQL Server, Azure ML, and Power BI, and, through demos, how to use Python inside all of these tools.
Attendees should have a basic understanding of data analysis and basic familiarity with SQL Server tools.
Module 1. Introducing data science and R
- What are statistics, data mining, machine learning…
- Data science projects and their lifecycle
- Introducing R
- R tools
- R data structures
- Lab 1
Module 2. Introducing Python
- Basic syntax and objects
- Data manipulation with NumPy and pandas
- Visualizations with the matplotlib and seaborn libraries
- Data science with scikit-learn
- Lab 2: Discussion – R vs Python
Module 3. Data overview
- Datasets, cases and variables
- Types of variables
- Introductory statistics for discrete variables
- Descriptive statistics for continuous variables
- Basic graphs
- Sampling, confidence level, confidence interval
- Lab 2
Module 4. Data preparation
- Derived variables
- Missing values and outliers
- Smoothing and normalization
- Time series
- Training and test sets
- Lab 3
Module 5. Associations between two variables and their visualization
- Covariance and correlation
- Contingency tables and chi-squared test
- T-test and analysis of variance
- Bayesian inference
- Linear models
- Lab 4
Module 6. Feature selection and matrix operations
- Feature selection in linear models
- Basic matrix algebra
- Principal component analysis
- Exploratory factor analysis
- Lab 5
Module 7. Unsupervised learning
- Hierarchical clustering
- K-means clustering
- Association rules
- Lab 6
Module 8. Supervised learning
- Neural networks
- Logistic regression
- Decision and regression trees
- Random forests
- Gradient boosting trees
- K-nearest neighbors
- Lab 7
Module 9. Modern topics
- Support vector machines
- Time series
- Text mining
- Deep learning
- Reinforcement learning
- Lab 8
Module 10. R in SQL Server and MS BI
- ML Services (In-Database) structure
- Executing external scripts in SQL Server
- Storing a model and performing native predictions
- R in Azure ML and Power BI
- Lab 9