The eoda R Academy
The eoda R Academy is the modular training program for R that deals with the various possibilities of the free programming language in a comprehensive and practice-oriented way. The training offer of the R Academy ranges from beginner courses to courses for advanced R users and from data management to the visualization of results.
Our R trainings for companies, universities and graduate centers are evaluated regularly and rated very highly. More than 1,000 satisfied participants attest to the quality of the eoda R Academy. You, too, can benefit from the high-quality course material and the ideal learning environment in small groups of no more than eight participants.
On-site in your company or university, or via web conferencing: with the R Academy you can be sure to gain the know-how you need to get the maximum benefit out of R. Our on-site trainings can be assembled according to individual requirements and adjusted to your needs.
Become an R expert with the eoda R Academy.
R is one of the best tools for analyzing and visualizing data, for data mining and for business intelligence. It offers a combination of functionality, quality and up-to-dateness that few other statistical environments can match.
This course is an introduction to R and its basic functions and will ease your entry into R with practical tips and exercises. This basic course gives R beginners a starting point for applying R in their own usage scenarios.
- First steps into R
- Concept and philosophy of R
- Types of variables and their properties
- Importing data
- Data management
- Evaluations with R
- Generating basic graphics
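The topics above can be illustrated with a minimal sketch of a first R session (all data and object names here are invented for illustration, not taken from the course material):

```r
# Variables and basic types
x <- c(4.1, 3.8, 5.2, 4.7)               # a numeric vector
group <- factor(c("a", "b", "a", "b"))   # a categorical variable (factor)

# Importing data: read.csv() is the standard entry point
# sales <- read.csv("sales.csv")         # file name is hypothetical

# Simple evaluations
mean(x)                  # arithmetic mean of the whole vector
tapply(x, group, mean)   # group-wise means

# A basic graphic
plot(x, type = "b", main = "First plot")
```

In an interactive session the last line opens a plot window; in a script it writes to the default graphics device.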
The foundation of any data analysis is good data management. The bulk of the effort in the analysis process goes into preparing the raw data.
The course Data Management with R teaches efficient methods for preparing differently structured data by means of practical examples.
The course focuses especially on working with the packages dplyr, tidyr and data.table.
- data.table (memory-efficient editing and reading of large data sets)
- dplyr (easy and performant syntax for manipulating data frames)
- tidyr (transformation of data sets – from long to wide table and vice versa)
- Dealing with special types of data (editing date variables and string variables)
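A minimal sketch of the core idioms of these packages, assuming dplyr and tidyr are installed (the data frame is invented for illustration; data.table is only hinted at in a comment because its by-reference syntax differs):

```r
library(dplyr)
library(tidyr)

raw <- data.frame(
  id    = c(1, 1, 2, 2),
  year  = c(2020, 2021, 2020, 2021),
  sales = c(10, 12, 20, 18)
)

# dplyr: readable, performant pipe syntax for manipulating data frames
per_id <- raw %>%
  group_by(id) %>%
  summarise(total = sum(sales))

# tidyr: reshape from long to wide format and back again
wide <- pivot_wider(raw, names_from = year, values_from = sales)
long <- pivot_longer(wide, cols = -id, names_to = "year", values_to = "sales")

# data.table: memory-efficient editing of large data sets by reference, e.g.
# library(data.table); dt <- as.data.table(raw)
# dt[, total := sum(sales), by = id]
```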
The combination of extensive statistics libraries and established programming concepts makes R a powerful programming language for any task in the field of data mining, predictive analytics and beyond. Combining statistics and graphics functions with programming elements allows for an elegant and efficient automation of regularly recurring script parts.
Participants will learn to use programming elements by means of practical examples. The aim of this course is to enable participants to apply what they have learned to their individual usage scenarios.
- Loops and control structures/conditionals
- Vector-oriented programming
- The apply family
- Defining your own functions
- Environments and scoping
- Object-oriented programming/R class systems
- Exceptions/error handling
- Package creation
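A short sketch of several of these constructs side by side (function and variable names are illustrative):

```r
# A user-defined function with a default argument
standardize <- function(x, center = TRUE) {
  if (center) x <- x - mean(x)
  x / sd(x)
}

v <- c(2, 4, 6, 8)

# The same computation as an explicit loop, vector-oriented, and apply-style
squares_loop <- numeric(length(v))
for (i in seq_along(v)) squares_loop[i] <- v[i]^2
squares_vec   <- v^2                          # vector-oriented: no explicit loop
squares_apply <- sapply(v, function(x) x^2)   # apply family

# Basic error/warning handling with tryCatch()
safe_log <- function(x) {
  tryCatch(log(x), warning = function(w) NA_real_)
}
```

The vectorized form is both the shortest and usually the fastest; the loop version is shown only for comparison.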
With R, data sources like Excel files and databases can be transformed into appealing reports. The data is read and evaluated in R, and the results are then published as a website or PDF document via an interface between R and HTML or LaTeX.
The objects emerging from the analysis, such as source code, text, data, formulas, tables and graphics, can be combined elegantly within one report. Once these processes are set up, R creates the dynamic reports at the touch of a button: modified data is taken into account without the need for manual adjustments.
In this course, participants will get to know the strengths of R in automated report creation and learn how to use them. Afterwards, participants should be able to create individual, automated reports. The contents are communicated practically, based on a theoretical introduction, specific cases and exercises.
- The RStudio user interface
- The packages Sweave and knitr
- Brief introduction to LaTeX, Markdown and HTML
- Formatting R output with chunk options
- Producing statistical report templates in different output formats such as PDF and HTML
- Dynamic reports and automated adjustments
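As a small sketch of the idea, assuming knitr is installed: a data frame computed in R can be turned into report-ready markup, which knitr then embeds into the final document (the data are invented for illustration):

```r
library(knitr)

results <- data.frame(
  region = c("North", "South"),
  sales  = c(120, 95)
)

# kable() renders a data frame as a Markdown (or LaTeX/HTML) table
md_table <- kable(results, format = "markdown")
cat(md_table, sep = "\n")

# In an .Rmd document the same code would sit inside a chunk whose header,
# e.g. {r sales-table, echo=FALSE}, carries the chunk options that control
# how the output is formatted.
```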
Data mining means the hypothesis-free extraction of information from data. Statistical and mathematical procedures are applied to databases in order to identify existing patterns and connections. Data mining procedures usually impose low requirements on the level of measurement of the data (categorical, ordinal, metric) and are able to identify complex, non-linear relationships.
In this course, you will learn how to apply data mining procedures by means of practical training data sets. These examples walk you through the central steps, such as the preparatory data management, the training of algorithms and the creation and validation of forecasts, and show how to implement them directly in R. During the course you will generate R scripts that can serve as templates for your own data mining applications.
- Introduction to data mining
- Model evaluation (forecast vs. observation, error matrix, ROC, cut-off value, AUC, sensitivity, precision, lift, risk analysis, risk chart, ensemble modeling)
- Data mining algorithms (decision and regression trees, boosting, random forest, neural networks, Naive Bayes, support vector machines; theory, parameter tuning, model and forecast creation)
- Ensemble modeling (techniques and methods, bagging of different models, bagging of an algorithm, SuperLearner package)
- Deep learning with H2O
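The central steps of such a workflow can be sketched with a decision tree on the built-in iris data set (rpart ships with R as a recommended package; the split size and seed are arbitrary choices for illustration):

```r
library(rpart)

set.seed(42)
idx   <- sample(nrow(iris), 100)   # simple train/test split
train <- iris[idx, ]
test  <- iris[-idx, ]

# Train the algorithm
fit <- rpart(Species ~ ., data = train)

# Create forecasts for unseen data
pred <- predict(fit, test, type = "class")

# Validate: error (confusion) matrix, forecast vs. observation
conf <- table(observed = test$Species, predicted = pred)
accuracy <- sum(diag(conf)) / sum(conf)
```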
In this course you will get to know analysis methods that will enable you to identify statistical correlations and patterns in your data. The focus is on three classic procedures of multivariate statistics: regression analysis, cluster analysis and factor analysis.
Linear regression analysis enables you to model correlations and the influence of different factors on a certain target value. How does the weather affect my sales figures? Which distribution channels are the most successful?
Cluster analysis can reveal hidden similarities between observations. A classic use case is customer segmentation.
Factor analysis can condense the information from different measured values. This can be relevant, for example, in anomaly detection.
This course can be regarded as an application-oriented introduction to the three analysis procedures with a focus on the use of R. It addresses users who already have a basic knowledge of R and statistics.
- Introduction to linear regression analysis
- Model diagnostics
- Advanced modelling
- Basic concepts of cluster analysis
- Determining similarities
- Hierarchical agglomerative clustering algorithms
- Partitioning clustering algorithms
- Interpretation and visualization
- Basic idea of the factor analysis
- Process of factor analysis
- Interpretation and evaluation
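Minimal sketches of the three procedures, using built-in functions and data sets (the factor analysis data are simulated here so that the underlying latent structure is known):

```r
# 1. Linear regression: fuel consumption as a function of weight (mtcars)
reg <- lm(mpg ~ wt, data = mtcars)
coef(reg)               # intercept and slope
summary(reg)$r.squared  # model fit

# 2. Cluster analysis: k-means on the numeric iris measurements
set.seed(1)
cl <- kmeans(iris[, 1:4], centers = 3)
table(cl$cluster)       # cluster sizes

# 3. Factor analysis: condense four correlated variables into one factor
set.seed(2)
latent <- rnorm(200)
scores <- data.frame(
  a = latent + rnorm(200, sd = 0.5),
  b = latent + rnorm(200, sd = 0.5),
  c = latent + rnorm(200, sd = 0.5),
  d = latent + rnorm(200, sd = 0.5)
)
fa <- factanal(scores, factors = 1)
fa$loadings             # all four variables load on the common factor
```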
The aim of time series analysis is to identify structures and patterns in data with a time component, to describe these patterns and to derive forecasts from this knowledge. The first part of time series analysis with R deals with the handling of date variables and the extraction of calendar information, with which you can discover trends or seasonality in a time series.
A special type of analysis is the so-called survival analysis. It calculates the probability of an event at a certain point in time and is used for churn analyses, predictive maintenance and others.
The aim of this course is to introduce participants to the area of time series analysis and teach them the terminology and methods. Afterwards participants should be able to perform time series analyses with R on their own and use them for their individual use case.
- Introduction to time series methods
- Date variables in R
- Visualizing time series
- Exponential smoothing
- Testing methods
- ARIMA models
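These steps can be sketched on the built-in AirPassengers series (the ARIMA order chosen here is the classic "airline model", picked for illustration rather than derived in the course):

```r
ap <- AirPassengers      # monthly airline passengers, 1949-1960

# Calendar information of the series
start(ap)                # first observation
frequency(ap)            # 12 observations per year

# Visualize trend and seasonality
plot(ap)

# Exponential smoothing (Holt-Winters) and a seasonal ARIMA model
hw  <- HoltWinters(ap)
fit <- arima(log(ap), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# 12-month forecast, transformed back to the original scale
fc <- predict(fit, n.ahead = 12)
exp(fc$pred)
```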
Survival models are used to estimate the time span until a specific event occurs. Possible application areas include the prognosis of machine breakdowns or disease progression. The use of survival analysis is taught on the basis of practical examples. At the end of the course, every attendee should be able to apply the content to their own use cases. To get the best results, we recommend participating in Time Series Analysis I first.
The following methods are part of the content:
- Introduction to the fundamental terms of survival analysis
  Episodes & censoring, survivor functions, hazard rate
- Introduction to survival analysis in R
  The survival package
  Basic concepts, visualization, tabulation, group comparison, significance tests
- Cox regression
  Requirements and assumptions, model specification, the function coxph(), the ties argument, interpretation of the results
  Implementation in R, model comparison, likelihood-ratio test, information criteria (BIC/AIC), estimated values
- Time-varying variables & splitting of episodes
  The function survSplit()
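A compact sketch with the survival package (a recommended package shipped with R) on its built-in lung cancer data set:

```r
library(survival)

# Episodes & censoring: Surv() combines follow-up time and event status
s <- Surv(lung$time, lung$status)

# Kaplan-Meier survivor functions by sex, plus a log-rank group comparison
km      <- survfit(s ~ sex, data = lung)
logrank <- survdiff(s ~ sex, data = lung)

# Cox regression: model the hazard as a function of age and sex
cox <- coxph(Surv(time, status) ~ age + sex, data = lung)
summary(cox)   # hazard ratios, confidence intervals, likelihood-ratio test
```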
R is a statistical programming language and therefore perfectly suited for visualizing data. In this course you will get to know the standard graphic system of R and the underlying concepts. The second part of the course deals with the graphic package ggplot2, which is a popular alternative to the standard graphic system. Even complex graphics can be created quickly and easily.
Apart from the basic graphic functions, the focus is also on the different customization options that can be used to change the appearance of a graphic. The aim is to enable participants to create statistical graphics and adjust them according to their individual requirements.
base graphic system:
- Simple one and two dimensional graphics
- Adjustment of graphics with individual elements
- Adjustment of the appearance
- Export of graphics
ggplot2:
- Introduction to the Grammar of Graphics
- Basics of ggplot2
- Different types of graphics with ggplot2
- Adjustment of ggplot2 graphics
- Complex graphics
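The contrast between the two systems can be sketched on the built-in mtcars data, assuming ggplot2 is installed (the output file name is illustrative):

```r
# Base system: build the plot step by step with separate function calls
plot(mtcars$wt, mtcars$mpg,
     pch = 19, col = "steelblue",
     xlab = "Weight", ylab = "Miles per gallon")
abline(lm(mpg ~ wt, data = mtcars), lty = 2)

# ggplot2: declare the mapping once, then add layers (Grammar of Graphics)
library(ggplot2)
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(colour = "Cylinders")

# Export the graphic
ggsave("scatter.pdf", p)
```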
Interactive graphics are a flexible and efficient way to analyze data and to present analysis results. Interactive graphic applications offer queries, selections, highlighting or the modification of graphics parameters. In the environment of R, there are various concepts that provide the possibility to create interactive graphics and applications directly out of R. The course presents an overview of the creation of interactive graphics with R and provides the tools to independently implement interactive visualizations in R.
- ggvis: ggvis resembles the popular graphics package ggplot2 with regard to syntax and extends its functionality with interactive features.
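A short ggvis sketch, assuming the ggvis package is installed: the pipe syntax mirrors ggplot2's layers, but parameters can be bound to interactive controls such as sliders (rendering the controls requires an interactive session):

```r
library(ggvis)

v <- mtcars %>%
  ggvis(~wt, ~mpg) %>%
  layer_points(size := input_slider(10, 300, label = "Point size")) %>%
  layer_smooths(span = input_slider(0.3, 1, label = "Smoothing span"))

# Printing v in an interactive session opens the plot with live controls
```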
As a discipline of data mining, text mining comprises algorithm-based analysis procedures for identifying structures and information in texts by statistical and linguistic means. One exemplary field of application is web mining, which can be used to determine trends and customer needs from websites and social media platforms. Text mining is also used to forecast price developments or stock prices based on news coverage.
This course focuses on the computational-linguistic preparation and cleaning of documents. To this end, common and proven R packages from the natural language processing environment will be introduced and applied. The aim is to prepare and edit a text corpus in such a way that it takes on a structure that can be analyzed with (multivariate) statistical and data mining procedures.
- Overview of text mining with R
- Reading unstructured data, web scraping
- String manipulation with regular expressions
- Feature enrichment (synonymy/polysemy, text correction, abbreviations, part-of-speech tagging)
- Pre-processing (tokenization/sentence splitting: splitting into units, stemming, stop word removal: reduction of useless units)
- Feature extraction (N-grams, bag-of-words model)
- Simple content analyses and association analyses
- Introduction to the classification of documents via clustering and Latent Dirichlet allocation (LDA)
- Outlook on the use of supervised learning algorithms for classifying documents
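A base-R sketch of the central pre-processing steps (the mini "corpus" and stop word list are invented; in practice dedicated packages such as tm or quanteda do this work):

```r
docs <- c("The engine failed after the test.",
          "Test results show the engine runs fine.")

stopwords <- c("the", "after", "show")

tokenize <- function(doc) {
  doc <- tolower(doc)
  doc <- gsub("[[:punct:]]", "", doc)   # cleaning via regular expressions
  tokens <- strsplit(doc, "\\s+")[[1]]  # tokenization: split into units
  tokens[!tokens %in% stopwords]        # stop word removal
}

tokens <- lapply(docs, tokenize)

# Feature extraction: bag-of-words model as a simple document-term matrix
vocab <- sort(unique(unlist(tokens)))
dtm <- t(sapply(tokens, function(tk) table(factor(tk, levels = vocab))))
```

The resulting matrix (one row per document, one column per term) is exactly the structure that the statistical and data mining procedures mentioned above operate on.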
R in practice
R in Live Systems addresses companies that want to move R from the experimental stage into live systems, where the requirements on R scripts, self-written functions and the software infrastructure are higher. Aspects such as maintainability, long-term executability, performance and robustness become more important, especially when the code is not monitored by a data scientist every time it is executed but is embedded in automated processes.
This course covers all relevant aspects that have to be considered when introducing and running R in live systems. The target group comprises R developers who want to write production-ready code.
The assessment of the advertising material used and its efficiency is still one of the major challenges in marketing. The course focuses on the analysis of information from web tracking.
Statistical control of incoming and outgoing goods in production generates the operating figures necessary to rate the quality of goods and products. Processing quality controls systematically requires methodological knowledge of statistics as well as the right software. The open-source statistical language R represents an interesting alternative here.
The course conveys the basic knowledge of R needed to manage statistically processed data. Before they are applied practically in R, the concepts of statistical testing will be introduced theoretically. Furthermore, AQL standard values according to ISO 2859 and DIN ISO 3951 will be discussed, and their modes of operation and application will be presented with reference to practical cases. The application of the methods in R covers the most important functions in the area of statistical testing and the development of quality control plans. Essential contents from the area of inferential statistics include:
- How can the optimal size of a random sample be determined?
- How can a decision for a specific testing method be made?
- How can operating figures be interpreted?
- How reliable is the result of a random sample?
- How can supplier and customer risks be balanced?
- Which discrepancies are acceptable?
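Several of these questions can be explored in base R via the operating characteristic of a sampling plan. The sketch below uses an illustrative plan, "inspect n = 50 items, accept the lot if at most 2 are defective"; the numbers are chosen for demonstration, not taken from ISO 2859:

```r
n   <- 50   # sample size
acc <- 2    # acceptance number: at most this many defectives allowed

# Operating characteristic: probability of accepting a lot
# as a function of its true defect rate p
oc <- function(p) pbinom(acc, size = n, prob = p)

oc(0.01)   # good lot (1 % defective): high acceptance probability
oc(0.10)   # bad lot (10 % defective): low acceptance probability

# Supplier's risk: rejecting a good lot; customer's risk: accepting a bad one
supplier_risk <- 1 - oc(0.01)
customer_risk <- oc(0.10)
```

Varying n and acc and inspecting the resulting risks is exactly the trade-off behind the sample-size and risk questions listed above.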