R Training – the R Academy of eoda

The R Academy of eoda is a modular course program for the R statistical language with regular events and training sessions. Our course instructors have been working with data analysis for over 10 years.

The course concept aims to train you to become an R expert. Depending on your needs and interests, you can choose from a variety of course modules. There is no strict hierarchy, and the modules can be combined freely.

Our R training courses at universities, graduate centers and companies are regularly evaluated and rated very well. A selection of our references:

 

| Course | November 2014 | December 2014 | February 2015 | March 2015 | April 2015 | May 2015 | September 2015 | October 2015 | November 2015 |
|---|---|---|---|---|---|---|---|---|---|
| Introduction to R | | | 9.2 to 10.2 | | | | 8.9 to 9.9 | | |
| Data Mining with R | | | | 17.3 to 18.3 | | | | | 9.11 to 10.11 |
| Text Mining with R | | | | | | | | | 11.11 to 12.11 |
| Multivariate Statistics | | | | | 20.4 to 21.4 | | | 7.10 to 8.10 | |
| Multivariate Statistics II | | | | | 22.4 | | | | |
| Time Series Analysis with R | 10.11 to 11.11 | | 16.2 to 17.2 | | | | 14.9 to 15.9 | | |
| Time Series Analysis with R II / Survival Analyses | 12.11 | | 18.2 | | | | | | |
| Advertising-Effectiveness Measurement with R | | | | | | | 16.9 | | |
| Graphics with R | 24.11 to 25.11 | | | | 27.4 to 28.4 | | | 12.10 to 13.10 | |
| Interactive Graphics with R | 26.11 | | | | 29.4 to 30.4 | | | 14.10 to 15.10 | |
| Big Data and Hadoop with R | | | | 11.3 | | | | 9.10 | |
| Quality Management with R | | 2.12 to 4.12 | | | | 11.5 to 13.5 | | | |
| Programming with R | | | 11.2 to 12.2 | | | | 17.9 to 18.9 | | |
| Data Management with R | | | | 16.3 | | | 10.9 | | |
| Creating Packages with R | | | 13.2 | | | | | | |
| R in Live Systems | | | | 12.3 and 13.3 | | | | | 16.11 and 17.11 |
| Reproducible Research with R | | | 19.2 | | | | 11.9 | | |

Introduction to R
  • First steps in R: structure of R, CRAN mirrors, different environments/editors for R, use of the built-in help functions, internet-based help sources
  • The basic concept and philosophy of R: programming language, object orientation in R, functions
  • Types of variables: vectors, data frames, lists, …
  • Importing data: .txt, .csv, .xls and .sav files, internet sources, …
  • Data management: assigning variable attributes, creating variables, conditional transformations, selecting/filtering cases or variables
  • Basic data analysis: first descriptive statistics, e.g. means, deviations and other parameters, simple tables and graphics
Data Mining with R

Data Mining refers to a set of methods for extracting knowledge from datasets without prior assumptions about the data structure. Statistical and mathematical techniques are applied to the data to expose inherent patterns. In general, the methods do not require a high level of measurement (they work with categorical, ordinal or metric scales) and are able to uncover complex non-linear relationships in the data. Typical applications of Data Mining methods include forecasting models, market basket analysis, target group analysis and more.

Methods which are part of the course:

  • Regression and Classification Trees
  • Random Forest
  • Artificial Neural Networks
  • Support Vector Machines
  • K-Means-Clustering
Statistical Testing Methods

This course uses hypothesis testing to investigate whether there are differences or relationships between variables and whether these are random or systematic. Depending on the data format, different testing methods are used. The course presents the main methods.

Multivariate Statistics
  • Regression Analysis
  • Factor Analysis
  • Cluster Analysis
Time Series Analysis I

  • Introduction to time series methods: foundations, seasonality, creating time series objects
  • Visualization of time series
  • Decomposition: trend, seasonal and random effects; calculation of seasonally adjusted values
  • Test methods: stationarity and autocorrelation
  • Exponential smoothing: modeling with Holt-Winters, ETS and STL
  • ARIMA models: achieving stationarity through differencing; definition of AR and MA terms; modeling
  • Forecasting: seasonal and non-seasonal models; outlier treatment
  • Introduction to event history analysis: basics, creating survival objects
  • Kaplan-Meier model: cumulative hazard curves, log-rank test
  • Cox regression: modeling, model checking, interpretation of the coefficients

Time Series Analysis II

Survival models are used to estimate the time until a particular event occurs. Possible application areas include predicting machine breakdowns or disease progression. The use of survival analysis is taught with practical examples. At the end of the course, every attendee should be able to apply the content to their own purposes. For best results, we recommend attending Time Series Analysis I first.

The following methods are part of the content:

  • Introduction to the fundamental terms of survival analysis

Episodes and censoring, survivor functions, hazard rate

  • Introduction to survival analysis in R

The survival package

  • Kaplan-Meier estimator

Basic concept, visualization, tabulation, group comparison, significance test

  • Cox proportional hazards model

Requirements and assumptions, model specification, the function coxph(), the ties argument, interpretation of the results

  • Time-varying variables & splitting of episodes

The function survSplit()

  • Cox regression

Implementation in R, comparison of models, likelihood ratio test, information criteria (BIC/AIC), estimated values
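
As a rough illustration of the methods listed above, the following sketch runs a Kaplan-Meier estimate, a log-rank test and two Cox models with the survival package. It is not taken from the course material: the `lung` example dataset that ships with the package and the choice of `sex` and `age` as covariates are assumptions made purely for demonstration.

```r
# Sketch only: uses the 'lung' example data from the survival package,
# not data or code from the course.
library(survival)

# Kaplan-Meier estimate of the survival function, grouped by sex
km <- survfit(Surv(time, status) ~ sex, data = lung)
summary(km, times = c(180, 360))  # tabulated survival probabilities

# Log-rank test for a difference between the two groups
lr <- survdiff(Surv(time, status) ~ sex, data = lung)

# Cox proportional hazards models; 'ties' sets the handling of tied event times
cox1 <- coxph(Surv(time, status) ~ sex, data = lung, ties = "efron")
cox2 <- coxph(Surv(time, status) ~ sex + age, data = lung, ties = "efron")

# Model comparison: likelihood ratio test and information criteria
anova(cox1, cox2)
AIC(cox1, cox2)
BIC(cox1, cox2)
```

Calling `summary(cox2)` prints the coefficients together with their hazard ratios, which is usually the starting point for interpreting the model.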

Graphics with R
  • An overview of R Graphics
  • Functions for producing standard plots
  • ggplot2 and lattice
Interactive Graphics with R

Interactive graphics are a flexible and efficient way to analyze data and to present analysis results. Interactive graphic applications offer queries, selections, highlighting or the modification of graphics parameters. In the R environment, there are various concepts for creating interactive graphics and applications directly from R (e.g. iplots, shiny). The course presents an overview of the creation of interactive graphics with R and provides the tools to implement interactive visualizations in R independently.

Text Mining with R

As a discipline of Data Mining, Text Mining comprises algorithm-based analysis methods that detect structures and information in texts using statistical and linguistic tools. One application is Web Mining, which can identify trends and customer requirements on websites and social media platforms. Text Mining is also used to forecast price trends and stock prices on the basis of news reports.

The course focuses on the application of the packages tm, RTextTools and openNLP and covers the following aspects:

  • Overview of Text Mining
  • Import of unstructured data, Web Scraping
  • Structuring of texts (pruning, tokenization, sentence splitting, normalization, stemming, n-grams)
  • Simple content analysis and association analysis
  • Classification of documents with different methods (Support Vector Machines, Generalized Linear Model, Maximum Entropy, Supervised Latent Dirichlet Allocation, Boosting, Bootstrap Aggregating, Random Forests, Neural Networks, Regression Tree)

 

Applied Statistics in Quality Management with R

Statistical control of incoming goods, of production and of outgoing goods generates the key figures needed to rate the quality of goods and products. Carrying out quality controls systematically requires methodological knowledge of statistics as well as the right software. The open-source statistical language R is an interesting alternative.

The course teaches the basics of R needed to manage previously processed statistical data. Before the concepts of statistical testing are applied in practice with R, they are introduced theoretically. AQL standard values according to ISO 2859 and DIN ISO 3951 are also discussed, and their mode of operation and use are presented with reference to practical applications. The application of the methods in R covers the most important functions for statistical testing and for developing quality control plans. Essential contents from the area of inferential statistics include:

  • How can the optimal size of a random sample be determined?
  • How can a decision for a specific testing method be made?
  • How can key figures be interpreted?
  • How much confidence can be placed in the result of the random sample?
  • How can the risks of suppliers and customers be balanced?
  • Which deviations are acceptable?
Programming with R I

 

  • Loops and conditionals
  • "apply" functions
  • Writing your own functions
  • The S3 class system
  • Parallelization
  • Integration of other programming languages and operating systems
Programming with R II

The combination of extensive statistics libraries and well-founded programming concepts makes R a powerful programming language for all tasks related to Data Mining, Predictive Analytics and many more.

This advanced course is designed to deepen the participants' programming knowledge. Its goal is to enable participants to program faster, more broadly and at a higher level of quality in R.

The following topics will be covered in the course:

  • Metaprogramming

  Expressions, calls, evaluation, parsing

  • Exceptions / Error Handling

  try-catch, debug, browser, traceback

  • Performance Optimization

  profiling, memory management, data.table, parallel processing (ff, foreach, plyr)

  • Class Structures in R

  class systems (S3 and S4), reference classes

  • Package Development

  file structure, documentation, testing, namespace

 

Big Data with R

Various initiatives have developed different concepts to cope with Big Data. For example, different R implementations and packages have been developed to facilitate the handling of Big Data in R. The course gives an introduction to the following aspects:

  • Connection to data sources such as databases or file systems such as Hadoop
  • Linking to cloud environments such as Windows Azure or Amazon Web Services
  • Chunking: partitioning of data into subsets
  • Parallelization of jobs for calculation
  • Overview of the concepts behind different R implementations and distributions (Revolution Analytics, Oracle R Enterprise, Renjin, …)
  • Visualization of Big Data
Hadoop with R

Data in distributed systems require different methods of analysis than non-distributed data. The principle of MapReduce is to divide a problem into small tasks, each of which can be solved on a small part of the data. A typical example for data stored in a Hadoop system is counting the words in text files. Conventional techniques work through the whole text in one pass, which can be very time-consuming. MapReduce splits the text into small blocks that are processed on individual nodes. The Reduce part then combines the results. Even complex search, comparison and analysis operations can be parallelized in this way and can therefore be calculated faster. The course teaches the development of scripts for MapReduce jobs with concrete examples.
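
The word-count example can be sketched in plain R to show the idea. This is not a Hadoop job and not course material: it runs locally, the three text blocks are made-up toy data, and the helper names `map_block` and `reduce_counts` are hypothetical.

```r
# Toy input: each element stands for one block of text held on a different node
blocks <- c("big data with r", "hadoop with r", "big data and hadoop")

# Map step: count the words within a single block
map_block <- function(block) {
  words <- strsplit(block, " ", fixed = TRUE)[[1]]
  table(words)
}

# Reduce step: merge two partial results by summing the counts per word
reduce_counts <- function(a, b) {
  all_words <- union(names(a), names(b))
  merged <- setNames(numeric(length(all_words)), all_words)
  merged[names(a)] <- merged[names(a)] + as.numeric(a)
  merged[names(b)] <- merged[names(b)] + as.numeric(b)
  merged
}

# The map calls are independent of each other, so they could run in parallel
partial_counts <- lapply(blocks, map_block)
word_counts <- Reduce(reduce_counts, partial_counts)
print(word_counts)
```

Because each block is counted independently, only the small per-block tables have to be brought together in the reduce step, never the full text.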

Reproducible Research

The analysis of statistical data generates reports with various elements such as text, data, formulas, tables and graphics. Interfaces between R and LaTeX/HTML bring these elements together in R and create a clear output that is ready for presentation. In addition, they allow R to update the reports dynamically on the basis of new data. In the approach known as Reproducible Research, the report items are updated without any manual adjustments. After completing the course, participants should be able to create customized and automated reports.

Contents of the course:

  • The user interface RStudio
  • The packages Sweave and knitr
  • Short introduction to LaTeX, Markdown and HTML
  • Formatting the R output with chunk options
  • Creating static report templates in various output formats such as PDF and HTML
  • Dynamic reports and automated adjustments

 

The combination of theoretical introductions, specific cases and practical exercises ensures successful learning.

 


 

Onsite R Training

As an alternative to the R Academy, we offer our training courses on site. In-house training can be put together individually and tailored to your needs. On request, we also offer our training in English. Please contact us for a quote.

 

 

 


© eoda 2014