The eoda R Academy
The eoda R Academy is a modular training program for R that covers the many capabilities of the free programming language in a comprehensive and practice-oriented way. The courses range from beginner level to advanced R users and from data management to the visualization of results.
Our R trainings for companies, universities and graduate centers are evaluated regularly and rated very highly. More than 1,000 satisfied participants speak for the quality of the eoda R Academy. You too can benefit from the high-quality course material and the ideal learning environment in small groups with a maximum of eight participants.
On-site in your company or university or via web conferencing – with the R Academy you can be sure to obtain the right know-how to get the maximum benefit from using R. Our on-site trainings can be assembled according to individual requirements and adjusted to your needs.
Become an R expert with the eoda R Academy.
Our Courses
Inside R
R is one of the best tools for analyzing and visualizing data, for data mining and business intelligence. R offers a level of functionality, quality and currency that was previously unmatched.
This course is an introduction to R and its basic functions and eases the entry into R with practical tips and exercises. It gives R beginners a starting point for applying R in their own usage scenarios.
 First steps into R
 Concept and philosophy of R
 Types of variables and their properties
 Importing data
 Data management
 Evaluations with R
 Generating basic graphics
The foundation of any data analysis is good data management. Most of the effort in an analysis goes into preparing the raw data.
The course Data Management with R teaches efficient methods for preparing differently structured data by means of practical examples.
The course focuses especially on working with the packages dplyr, tidyr and data.table.
Course Contents:
 data.table (memory-efficient editing and reading of large data sets)
 dplyr (easy and performant syntax for manipulating data frames)
 tidyr (transformation of data sets – from long to wide format and vice versa)
 Dealing with special types of data (editing date variables and string variables)
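As a rough illustration of the kind of task listed above, the following sketch aggregates a small data frame with dplyr and reshapes it with tidyr. The data frame `sales_long` and its columns are invented for this example and are not part of the course material; data.table is not shown.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format sales data: one row per store and month.
sales_long <- data.frame(
  store   = c("A", "A", "B", "B"),
  month   = c("Jan", "Feb", "Jan", "Feb"),
  revenue = c(100, 120, 80, 95)
)

# dplyr: total revenue per store.
revenue_by_store <- sales_long %>%
  group_by(store) %>%
  summarise(total = sum(revenue))

# tidyr: reshape from long to wide (one column per month) ...
sales_wide <- pivot_wider(sales_long, names_from = month, values_from = revenue)

# ... and back from wide to long.
sales_back <- pivot_longer(sales_wide, cols = c(Jan, Feb),
                           names_to = "month", values_to = "revenue")
```

The wide table has one row per store, which is often the more convenient layout for reporting, while the long layout is usually easier to aggregate and plot.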
The combination of extensive statistics libraries and established programming concepts makes R a powerful programming language for any task in the field of data mining, predictive analytics and more. Combining statistics and graphics functions with programming elements allows for an elegant and efficient automation of regularly recurring script parts.
Participants will learn to use programming elements by means of practical examples. The aim of this course is to enable participants to apply what they have learned to their individual usage scenarios.
 Loops and control structures / conditionals
 Vector-oriented programming
 Apply family
 Defining your own functions
 Environments and scoping
 Object-oriented programming / R class systems
 Exceptions / error handling
 Package creation
 Debugging
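Several of the topics listed above can be sketched in a few lines of base R. The function `safe_log()` and the input values are made up for illustration; the sketch only contrasts a loop, a vectorized call, an apply-family call and basic error handling.

```r
# A small user-defined function with input validation.
safe_log <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")
  log(x)
}

values <- c(1, 10, 100)

# Loop version: fill a preallocated result vector element by element.
res_loop <- numeric(length(values))
for (i in seq_along(values)) {
  res_loop[i] <- safe_log(values[i])
}

# Vectorized version: log() already works on whole vectors.
res_vec <- safe_log(values)

# Apply family: vapply() calls the function per element and checks
# that each result is a single numeric value.
res_apply <- vapply(values, safe_log, numeric(1))

# Error handling: tryCatch() replaces the error with a fallback value.
res_error <- tryCatch(
  safe_log("not a number"),
  error = function(e) NA_real_
)
```

All three variants return the same numbers; the vectorized call is usually the shortest and fastest, while the loop is the most explicit.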
With R, data sources like Excel files and databases can be transformed into appealing reports. The data is read and evaluated in R and then the contents are published as a website or PDF document via an interface between R and HTML or LaTeX.
The objects emerging from the analysis such as source code, text, data, formulas, tables and graphics can be combined elegantly within one report. Once the process is set up, R creates the dynamic reports at the touch of a button. Modified data are taken into account without the need for manual adjustments.
In this course, participants will get to know the strengths of R in automated report creation and learn how to use them. Afterwards the participants should be able to create individual and automated reports. The content is taught hands-on through a theoretical introduction, concrete cases and exercises.
Course Contents
 The user interface RStudio
 The packages Sweave and knitr
 Brief introduction to LaTeX, Markdown and HTML
 Formatting R outputs with chunk options
 Producing statistical report templates in different output formats such as PDF and HTML
 Dynamic reports and automated adjustments
Methods
Data mining means the hypothesis-free extraction of information from data. Statistical and mathematical procedures are applied to databases in order to identify existing patterns and connections. Data mining procedures usually place low requirements on the level of measurement of the data (categorical, ordinal, metric) and are able to identify complex, nonlinear connections.
In this course, you will learn how to apply data mining procedures using practical training data sets. The examples show you the central steps, such as the preparatory data management, the training of algorithms and the creation and validation of forecasts, and how to implement them directly in R. During the course you will write R scripts that can serve as templates for your own data mining applications.
Course Contents
 Introduction to data mining
 Model evaluation (forecast vs. observation, error matrix, ROC, cutoff value, AUC, sensitivity, precision, lift, risk analysis, risk chart, ensemble modeling)
 Data mining algorithms (decision and regression trees, boosting, random forest, neural networks, Naive Bayes, support vector machine; theory, parameter tuning, model and forecast creation)
 Ensemble modeling (techniques and methods, bagging of different models, bagging of an algorithm, SuperLearner package)
 Deep learning with H2O
In this course you will get to know analysis methods that will enable you to identify statistical correlations and patterns in your data. The focus is on three classic procedures of multivariate statistics: regression analysis, cluster analysis and factor analysis.
Linear regression analysis lets you model how different factors influence a target variable. Typical questions are: How does the weather affect my sales figures? Which distribution channels are the most successful?
Cluster analysis can identify hidden similarities between observations. One classic use case is customer segmentation.
Factor analysis can condense the information from different measured values. This can be relevant for anomaly detection, for example.
This course can be regarded as an application-oriented introduction to the three analysis procedures with a focus on the use of R. It addresses users who already have a basic knowledge of R and statistics.
Course Contents:
Regression analysis:
 Introduction to linear regression analysis
 Interpretation
 Model diagnostics
 Advanced modelling
 Other
Cluster analysis:
 Basic concepts of cluster analysis
 Determining similarities
 Hierarchical agglomerative clustering algorithms
 Partitioning clustering algorithms
 Interpretation and visualization
Factor analysis:
 Basic idea of factor analysis
 Process of factor analysis
 Interpretation and evaluation
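To give an impression of what two of these procedures look like in R, here is a minimal sketch using base R functions. It uses the built-in `mtcars` data set purely as a stand-in; the choice of variables and of three clusters is an assumption for illustration. Factor analysis (for example with `factanal()`) is not shown.

```r
# Regression: model fuel consumption as a function of weight and horsepower.
fit <- lm(mpg ~ wt + hp, data = mtcars)
summary(fit)   # coefficients, R-squared, significance tests

# Cluster analysis: hierarchical agglomerative clustering on scaled data.
scaled <- scale(mtcars)
hc     <- hclust(dist(scaled), method = "ward.D2")
groups <- cutree(hc, k = 3)   # cut the tree into three clusters

# Partitioning alternative: k-means with the same number of clusters.
set.seed(1)   # k-means starts from random centers
km <- kmeans(scaled, centers = 3)
```

Scaling the variables before clustering keeps variables with large units from dominating the distance calculation.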
The aim of time series analysis is to identify structures and patterns in data with a time component, to describe these patterns and to derive forecasts from this knowledge. The first step in time series analysis with R is handling date variables and extracting calendar information. This lets you discover trends or seasonality in a time series.
A special type of analysis is the so-called survival analysis. It estimates the probability of an event at a certain point in time and is used for churn analyses, predictive maintenance and similar tasks.
The aim of this course is to introduce participants to the area of time series analysis and teach them the terminology and methods. Afterwards participants should be able to perform time series analyses with R on their own and use them for their individual use case.
Course contents
 Introduction to time series methods
 Date variables in R
 Visualizing time series
 Decomposition
 Exponential smoothing
 Testing methods
 ARIMA models
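Some of the methods listed above are available directly in base R. The following sketch uses the built-in monthly series `AirPassengers` as example data; the model orders are the commonly used "airline model" settings and are an assumption, not a recommendation from the course.

```r
# AirPassengers is a monthly ts object shipped with base R.
y <- AirPassengers

# Decomposition into trend, seasonal and random components.
parts <- decompose(y, type = "multiplicative")

# Exponential smoothing (Holt-Winters with trend and seasonality).
hw <- HoltWinters(y, seasonal = "multiplicative")
hw_forecast <- predict(hw, n.ahead = 12)

# Seasonal ARIMA on the log scale.
fit <- arima(log(y), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# Forecast 12 months ahead and transform back to the original scale.
arima_forecast <- exp(predict(fit, n.ahead = 12)$pred)
```

Calling `plot(parts)` shows the estimated trend and seasonal pattern, which is often the first check before choosing a forecasting model.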
Survival models are used to estimate the time span until a particular event occurs. Possible application areas include the prediction of machine breakdowns and etiopathology. The use of survival analyses is taught using practical examples. At the end of the course, every attendee should be able to apply the content to their own purposes. To get the best results, we recommend taking Time Series Analysis I first.
The following methods are part of the content:
 Introduction to the fundamental terms of survival analysis
Episodes & censoring, survivor functions, hazard rate
 Introduction to survival analysis in R
The survival package
 Kaplan-Meier estimator
Basic concept, visualization, tabulation, group comparison, significance test
 Cox proportional hazards model
Requirements and assumptions, model configuration, the function coxph(), the ties argument, interpretation of the result
 Time-varying variables & splitting of episodes
The function survSplit()
 Cox regression
Implementation in R, comparison of models, likelihood ratio test, information criteria (BIC/AIC), estimated values
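The workflow outlined above might look roughly like this with the survival package. The sketch uses the `lung` example data set that ships with the package; the chosen covariates are an assumption for illustration only.

```r
library(survival)

# lung: example data included in the survival package.
# time = survival time in days, status = censoring indicator (1 = censored, 2 = dead).

# Kaplan-Meier estimator, grouped by sex.
km <- survfit(Surv(time, status) ~ sex, data = lung)
summary(km, times = c(180, 360))   # tabulated survival probabilities

# Group comparison with the log-rank test.
lr <- survdiff(Surv(time, status) ~ sex, data = lung)

# Cox proportional hazards models; ties = "efron" is the package default.
cox1 <- coxph(Surv(time, status) ~ sex, data = lung, ties = "efron")
cox2 <- coxph(Surv(time, status) ~ sex + age, data = lung, ties = "efron")

# Model comparison: likelihood ratio test and information criterion.
anova(cox1, cox2)
AIC(cox1, cox2)
```

`plot(km)` draws the estimated survival curves for both groups, and `summary(cox2)` reports hazard ratios with confidence intervals.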
R is a statistical programming language and therefore well suited for visualizing data. In this course you will get to know the standard graphics system of R and the underlying concepts. The second part of the course deals with the graphics package ggplot2, a popular alternative to the standard graphics system with which even complex graphics can be created quickly and easily.
Apart from basic graphics functions, the course also focuses on the customization options that can be used to change the appearance of a graphic. The aim is to enable participants to create statistical graphics and adjust them according to their individual requirements.
Course contents
Base graphics system:
 Simple one- and two-dimensional graphics
 Adjustment of graphics with individual elements
 Adjustment of the appearance
 Export of graphics
ggplot2:
 Introduction to the Grammar of Graphics
 Basics of ggplot2
 Different types of graphics with ggplot2
 Adjustment of ggplot2 graphics
 Complex graphics
Interactive graphics are a flexible and efficient way to analyze data and to present analysis results. Interactive graphic applications offer queries, selections, highlighting or the modification of graphics parameters. The R ecosystem offers various approaches for creating interactive graphics and applications directly from R. The course gives an overview of creating interactive graphics with R and provides the tools to implement interactive visualizations in R independently.
Course contents:
 ggvis: ggvis is similar in syntax to the popular graphics package ggplot2 and extends its functionality with interactive effects.
 htmlwidgets: htmlwidgets offers an R interface to popular JavaScript graphics libraries, for example leaflet for visualizing geographical data or vis.js for drawing network graphs.
As a subdiscipline of data mining, text mining comprises algorithm-based analysis procedures for identifying structures and information in texts with statistical and linguistic means. One exemplary field of application is web mining, which can be used to determine trends and customer needs on websites and social media platforms. Text mining is also used to forecast price developments or stock prices based on the news.
This course focuses on the computational-linguistic preparation and cleaning of documents. For this, the common and proven R packages for natural language processing are introduced and applied. The aim is to build a text corpus and process it so that it becomes a structure that can be analyzed with (multivariate) statistical and data mining procedures.
Course Contents
 Overview of text mining with R
 Reading unstructured data, web scraping
 String manipulation with regular expressions
 Feature enrichment (synonymy/polysemy, text correction, abbreviations, part-of-speech tagging)
 Preprocessing (tokenization/sentence splitting: splitting into units, stemming, stop word removal: reduction of useless units)
 Feature extraction (n-grams, bag-of-words model)
 Simple content analyses and association analyses
 Introduction to the classification of documents via clustering and Latent Dirichlet allocation (LDA)
 Outlook: applying supervised learning algorithms for classifying documents
R in practice
R in live systems addresses companies that want to move R from the experimental stage to live systems. There, the requirements on R scripts, self-written functions and software infrastructure are higher. Aspects such as maintainability, long-term executability, performance and robustness also become more important, especially when the code is not monitored by a data scientist every time it is executed but is embedded in automated processes.
This course covers all relevant aspects that have to be considered when introducing and running R in live systems. The target group comprises R developers who want to write production-ready code.
Assessing the advertising media used and their efficiency is still one of the major challenges in marketing. The course focuses on the analysis of web tracking data.
Statistical control of incoming goods, production and outgoing goods generates the key figures needed to rate the quality of goods and products. Processing quality controls systematically requires methodical knowledge of statistics as well as the right software. The open source statistical language R represents an interesting alternative.
The course conveys basic R knowledge that can be used to manage previously processed statistical data. Before the data are processed hands-on with R, the concepts of statistical testing are introduced theoretically. In addition, AQL standard values according to ISO 2859 and DIN ISO 3951 are discussed, and their mode of operation is presented with reference to practical applications. The application of the methods in R covers the most important functions for statistical testing and for developing quality control plans. Essential topics from inferential statistics include:
 How can the optimal size of a random sample be determined?
 How can a decision for a specific testing method be made?
 How can key figures be interpreted?
 What level of confidence does the result of the random sample provide?
 How can the risks of suppliers and customers be balanced?
 Which discrepancies are acceptable?
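The question of supplier and customer risk can be illustrated with a simple binomial calculation in base R. The sampling plan (`n`, `c_max`) and the two defect rates below are invented numbers for illustration and are not taken from the ISO 2859 or DIN ISO 3951 tables.

```r
# Hypothetical single sampling plan: inspect n items and accept the lot
# if at most c_max of them are defective.
n     <- 80
c_max <- 2

# Probability of accepting a lot with true defect rate p (binomial model).
p_accept <- function(p, n, c_max) pbinom(c_max, size = n, prob = p)

aql  <- 0.01   # assumed acceptable quality level
ltpd <- 0.08   # assumed defect rate the customer wants to see rejected

# Supplier's risk: a lot at AQL quality is rejected anyway.
producer_risk <- 1 - p_accept(aql, n, c_max)

# Customer's risk: a lot at the poor quality level is accepted anyway.
consumer_risk <- p_accept(ltpd, n, c_max)
```

Increasing `n` or lowering `c_max` shifts the balance between the two risks, which is the trade-off a quality control plan has to settle.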
The R logo is © 2016 The R Foundation.