The majority of open-source code, software and packages contributed by ACEMS members is in the R language.
R is a leading open-source language and environment, used globally by researchers and data scientists for statistical computing, data analysis, graphics, domain-specific applications (such as genetics, econometrics, clinical trials and environmental monitoring) and much more, as shown on the Comprehensive R Archive Network (CRAN) topic areas page. ACEMS R Software and Service highlights for 2020 include:
Each of the lectures was scheduled in the local lunchtime timeslot. Although time zones were a factor, it was wonderful to attract the broader global mathematical and statistical community, including audiences across Australia, North and South America, Europe, Britain, Asia, India, the Middle East and New Zealand. ACEMS was incredibly impressed by the interest in the lecture series, with more than 1,200 individual logons across the nine lectures and dynamic discussion via the Q&A feature.
ACEMS members added at least ten R packages to CRAN in 2020; see Table 1 below for details. This list is not exhaustive with respect to either R or other open-source packages created by ACEMS members. For example, R code and packages are often outputs of collaborative research projects, such as the free software program "Predicting seagrass decline due to cumulative stressors", and may be shared in other repositories such as GitHub.
Table 1: Details of some new R packages created in 2020 by ACEMS members and collaborating authors

R package | Maintainer | Package title | Description | Total downloads; monthly average |
---|---|---|---|---|
distributional | Mitchell O'Hara-Wild | Vectorised Probability Distributions | The distributional package allows distributions to be used in a vectorised context. It provides vectorised distribution objects with tools for manipulating, visualising, and using probability distributions. Designed to allow model prediction outputs to return distributions rather than their parameters, allowing users to directly interact with predictive distributions in a data-oriented workflow. In addition to providing generic replacements for p/d/q/r functions, other useful statistics can be computed including means, variances, intervals, and highest density regions. | 146775; 12231 |
fable.prophet | Mitchell O'Hara-Wild | Prophet Modelling Interface for 'fable' | Allows prophet models from the 'prophet' package to be used in a tidy workflow with the modelling interface of 'fabletools'. This extends 'prophet' to provide enhanced model specification and management, performance evaluation methods, and model combination tools. | 8282; 690 |
seer | Thiyanga Talagala | Feature-Based Forecast Model Selection | A novel meta-learning framework for forecast model selection using time series features. Many applications require a large number of time series to be forecast. Providing better forecasts for these time series is important in decision and policy making. We propose a classification framework which selects forecast models based on features calculated from the time series. We call this framework FFORMS (Feature-based FORecast Model Selection). FFORMS builds a mapping that relates the features of time series to the best forecast model using a random forest. The 'seer' package implements the FFORMS algorithm. For more details, read the paper. | 7616; 635 |
airt | Sevvandi Kandanaarachchi | Evaluation of Algorithm Collections Using Item Response Theory | An evaluation framework for algorithm portfolios using Item Response Theory (IRT). We use continuous and polytomous IRT models to evaluate algorithms and introduce algorithm characteristics such as stability, effectiveness and anomalousness (Kandanaarachchi, Smith-Miles 2020) <doi:10.13140/RG.2.2.11363.09760>. | 5230; 436 |
gratis | Yanfei Kang | Generating Time Series with Diverse and Controllable Characteristics | Generates time series based on mixture autoregressive models. Kang, Y., Hyndman, R., Li, F. (2020) <doi:10.1002/sam.11461>. | 4412; 368 |
nortsTest | Asael Alonzo Matamoros | Assessing Normality of Stationary Process | Although several tests for normality in stationary processes have been proposed in the literature, consistent implementations of these tests in programming languages are limited. Four normality tests are implemented: the Lobato and Velasco, Epps, Psaradakis and Vavra, and random projections tests for stationary processes. Other diagnostics are also performed, such as unit root tests for stationarity, seasonal tests for seasonality, and ARCH effect tests for volatility. The package also offers residual diagnostics for linear time series models developed in several packages. | 3641; 303 |
tsibbletalk | Earo Wang | Interactive Graphics for Tsibble Objects | Shared tsibble data communicate easily between htmlwidgets on both the client and server sides, powered by 'crosstalk'. A shiny module is provided to visually explore periodic and aperiodic temporal patterns. | 2814; 234 |
composits | Sevvandi Kandanaarachchi | Compositional, Multivariate and Univariate Time Series Outlier Ensemble | An ensemble of time series outlier detection methods that can be used for compositional, multivariate and univariate data. It uses the four R packages 'forecast', 'tsoutliers', 'otsad' and 'anomalize' to detect time series outliers. | 2559; 213 |
DSjobtracker | Thiyanga S. Talagala | What Skills and Qualifications are Required for Data Science Related Jobs? | Dataset containing information about job listings for data science job roles. | 1514; 126 |
brolgar | Nick Tierney | BRowse Over Longitudinal Data Graphically and Analytically in R | Brolgar helps you browse over longitudinal data graphically and analytically in R, by providing tools to: efficiently explore raw longitudinal data; calculate features (summaries) for individuals; and evaluate diagnostics of statistical models. This helps you go from a messy “plate of spaghetti” plot to “interesting observations”. The tools and workflows in brolgar are designed to work with a special tidy time series data frame called a tsibble. We can define our longitudinal data in terms of a time series to gain access to some really useful tools. To do so, we need to identify three components: 1. the key variable in your data is the identifier of your individual; 2. the index variable is the time component of your data; 3. the regularity of the time interval (index). Longitudinal data typically has irregular time periods between measurements, but can have regular measurements. Together, time index and key uniquely identify an observation. | 1447; 121 |
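The monthly averages in Table 1 appear consistent with dividing each package's total downloads by 12 months. The reporting window is not stated, so the 12-month divisor is an assumption. A minimal Python check on a few rows (the helper name `monthly_average` is illustrative):

```python
# Total downloads for a few Table 1 packages (values copied from the table).
TOTALS = {
    "distributional": 146775,
    "fable.prophet": 8282,
    "seer": 7616,
    "brolgar": 1447,
}

# Assumption: the "monthly average" column is the total spread over 12 months.
MONTHS = 12


def monthly_average(total: int, months: int = MONTHS) -> int:
    """Return total downloads per month, rounded to a whole download."""
    return round(total / months)


for package, total in TOTALS.items():
    print(f"{package}: {total} total, ~{monthly_average(total)} per month")
```

Python's `round()` rounds exact halves to the nearest even number, which matters for totals such as 2814 / 12 = 234.5 (it gives 234, matching the tsibbletalk row).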
ACEMS members maintain a diverse range of R packages that remain popular among CRAN’s international users. There are currently 107 R packages created by ACEMS members and maintained on CRAN; Table 2 below lists them with their total user downloads in descending order. While total downloads are an indicator of value to end users, some packages with fewer downloads may serve niche user groups or otherwise deliver significant impact through their use.
The most popular R package by downloads is rmarkdown, with more than 24.7 million total downloads by CRAN users.
The significant number of downloads of R packages is indicative of the value and utility of these packages.
Total downloads of ACEMS members’ R packages on CRAN number:
Table 2: Downloads for all ACEMS members’ R packages maintained and updated on CRAN

R package | Maintainer | Current version | Total user downloads |
---|---|---|---|
rmarkdown | Yihui Xie | 2.7 | 24709717 |
forecast | Rob J Hyndman | 8.14 | 8757355 |
fracdiff | Martin Maechler | 1.5-1 | 5436168 |
GGally | Barret Schloerke | 2.1.1 | 2753057 |
KernSmooth | Brian Ripley | 2.23-18 | 1671269 |
DescTools | Andri Signorell | 0.99.41 | 1497503 |
expsmooth | Rob J Hyndman | 2.3 | 1374944 |
fma | Rob J Hyndman | 2.4 | 912293 |
fpp | Rob J Hyndman | 0.5 | 740974 |
imputeTS | Steffen Moritz | 3.2 | 682535 |
naniar | Nicholas Tierney | 0.6.0 | 340371 |
fpp2 | Rob J Hyndman | 2.4 | 326577 |
visdat | Nicholas Tierney | 0.5.3 | 323522 |
hts | Earo Wang | 6.0.1 | 322147 |
tsibble | Earo Wang | 1.0.1 | 309589 |
hdrcde | Rob J Hyndman | 3.4 | 217990 |
rainbow | Han Lin Shang | 3.6 | 176498 |
tsfeatures | Rob J Hyndman | 1.0.2 | 174831 |
thief | Rob J Hyndman | 0.3 | 173083 |
fabletools | Mitchell O'Hara-Wild | 0.3.1 | 170029 |
distributional | Mitchell O'Hara-Wild | 0.2.2 | 146775 |
fable | Mitchell O'Hara-Wild | 0.3.0 | 142023 |
xaringan | Yihui Xie | 0.2 | 141264 |
ggmosaic | Haley Jeppson | 0.3.3 | 131162 |
feasts | Mitchell O'Hara-Wild | 0.2.1 | 128994 |
fds | Han Lin Shang | 1.8 | 125116 |
ftsa | Han Lin Shang | 6.0 | 119576 |
Mcomp | Rob J Hyndman | 2.8 | 112929 |
demography | Rob J Hyndman | 1.22 | 81649 |
TSclust | Pablo Montero | 1.3.1 | 78194 |
bfast | Jan Verbesselt | 1.5.7 | 52654 |
tsibbledata | Mitchell O'Hara-Wild | 0.3.0 | 47950 |
LowRankQP | John T Ormerod | 1.0.4 | 47304 |
feature | Tarn Duong | 1.2.15 | 47202 |
fpp3 | Rob J Hyndman | 0.4.0 | 44904 |
season | Adrian Barnett | 0.3.12 | 43720 |
tourr | Di Cook | 0.6.0 | 41058 |
robets | Ruben Crevits | 1.4 | 39863 |
SSN | Jay Ver Hoef | 1.1.15 | 36810 |
geozoo | Barret Schloerke | 0.5.1 | 36024 |
CEoptim | Benoit Liquet | 1.2 | 35225 |
sugrrants | Earo Wang | 0.2.8 | 34381 |
bayesImageS | Matt Moores | 0.6-1 | 34332 |
nullabor | Di Cook | 0.3.9 | 34125 |
geomnet | Sam Tyner | 0.3.1 | 32010 |
vegawidget | Ian Lyttle | 0.3.2 | 31099 |
shinycustomloader | Emi Tanaka | 0.9.0 | 29336 |
vitae | Mitchell O’Hara-Wild | 0.4.2 | 29200 |
stR | Alexander Dokumentov | 0.4 | 28089 |
dma | Hana Sevcikova | 1.4-0 | 27096 |
DescribeDisplay | Di Cook | 0.2.7 | 26813 |
staplr | Priyanga Dilini Talagala | 3.1.1 | 25893 |
curvHDR | Matt Wand | 1.2-1 | 25813 |
dobson | Adrian Barnett | 0.4 | 24995 |
MissingDataGUI | Xiaoyue Cheng | 0.2-5 | 23737 |
emma | Laura Villanova | 0.1-0 | 23638 |
ggenealogy | Lindsay Rutter | 1.0.1 | 23452 |
edrGraphicalTools | Benoit Liquet | 2.2 | 23120 |
rwalkr | Earo Wang | 0.5.5 | 23111 |
binb | Dirk Eddelbuettel | 0.0.6 | 21181 |
gammSlice | Matt Wand | 2.0-2 | 20755 |
sgPLS | Benoit Liquet | 1.7 | 20159 |
eechidna | Jeremy Forbes | 1.4.1 | 20140 |
queuecomputer | Anthony Ebert | 1.1.0 | 19487 |
BSL | Ziwen An | 3.2.0 | 18893 |
ggquiver | Mitchell O’Hara-Wild | 0.2.0 | 17896 |
MergeGUI | Xiaoyue Cheng | 0.2-1 | 17602 |
MatTransMix | Xuwen Zhu | 0.1.13 | 17202 |
MBSGS | Benoit Liquet | 1.1.0 | 17188 |
smoothAPC | Alexander Dokumentov | 0.3 | 16187 |
colmozzie | Thiyanga Talagala | 1.1.1 | 16182 |
binostics | Ursula Laa | 0.1.3 | 14425 |
diffpriv | Benjamin Rubinstein | 0.4.2 | 14239 |
mozzie | Thiyanga Talagala | 0.1.0 | 14177 |
glmmEP | Matt Wand | 1.0-3.1 | 13673 |
PPforest | Natalia da Silva | 0.1.1 | 13670 |
quokar | Wenjing Wang | 0.1.0 | 13031 |
serrsBayes | Matt Moores | 0.4-1 | 12518 |
ozmaps | Michael Sumner | 0.4.0 | 11692 |
tourrGui | Di Cook | 0.4 | 11611 |
HRW | Matt Wand | 1.0-4 | 11085 |
starmie | Stuart Lee | 0.1.2 | 10678 |
spinifex | Nicholas Spyrison | 0.2.7 | 10349 |
taipan | Stephanie Kobakian | 0.1.2 | 10136 |
gravitas | Sayani Gupta | 0.1.3 | 9742 |
dobin | Sevvandi Kandanaarachchi | 1.0.2 | 9503 |
fable.prophet | Mitchell O’Hara-Wild | 0.1.0 | 8282 |
sugarbag | Stephanie Kobakian | 0.1.3 | 7879 |
seer | Thiyanga Talagala | 1.1.5 | 7616 |
stray | Priyanga Dilini Talagala | 0.1.1 | 7506 |
eventstream | Sevvandi Kandanaarachchi | 0.1.0 | 7425 |
oddstream | Priyanga Dilini Talagala | 0.5.0 | 6562 |
syn | Nicholas Tierney | 0.1.0 | 6244 |
airt | Sevvandi Kandanaarachchi | 0.2.0 | 5230 |
gratis | Yanfei Kang | 0.2.1 | 4412 |
spinebil | Ursula Laa | 0.1.0 | 4199 |
nortsTest | Asael Alonzo Matamoros | 1.0.0 | 3641 |
tsibbletalk | Earo Wang | 0.1.0 | 2814 |
composits | Sevvandi Kandanaarachchi | 0.1.0 | 2559 |
DSjobtracker | Thiyanga Talagala | 0.1.1 | 1514 |
brolgar | Nicholas Tierney | 0.1.0 | 1447 |
bayesforecast | Asael Alonzo Matamoros | 0.0.1 | 1166 |
lookout | Sevvandi Kandanaarachchi | 0.1.0 | 933 |
spotoroo | Weihao Li | 0.1.1 | 391 |
nestr | Emi Tanaka | 0.1.1 | 363 |
ferrn | H. Sherry Zhang | 0.0.1 | 342 |
MedLEA | Thiyanga Talagala | 1.0.1 | 35 |
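Claims about Table 2, such as its descending order and rmarkdown being the most-downloaded package, can be checked mechanically. The Python sketch below parses rows in the table's pipe-delimited layout; for brevity it uses only the first three rows, so its sum is not the total across all 107 packages. The helper names `parse_rows` and `is_descending` are illustrative.

```python
from __future__ import annotations

# The first three rows of Table 2 (values copied from the table).
ROWS = """\
rmarkdown | Yihui Xie | 2.7 | 24709717 |
forecast | Rob J Hyndman | 8.14 | 8757355 |
fracdiff | Martin Maechler | 1.5-1 | 5436168 |
"""


def parse_rows(text: str) -> list[tuple[str, int]]:
    """Return (package, downloads) pairs from 'pkg | maintainer | version | downloads |' lines."""
    pairs = []
    for line in text.strip().splitlines():
        # Drop the trailing pipe, split on the remaining pipes, trim each cell.
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        pairs.append((cells[0], int(cells[3])))
    return pairs


def is_descending(pairs: list[tuple[str, int]]) -> bool:
    """True if download counts never increase from one row to the next."""
    downloads = [d for _, d in pairs]
    return all(a >= b for a, b in zip(downloads, downloads[1:]))


pairs = parse_rows(ROWS)
top_package, top_downloads = max(pairs, key=lambda p: p[1])
total = sum(d for _, d in pairs)
print(top_package, top_downloads, total, is_descending(pairs))
```

Pasting in all 107 rows would give the overall download total and confirm the ordering for the full table.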
ACEMS members contribute to the understanding, adoption and use of open-source tools, including R packages, by running R events such as hackathons (where participants create and apply R tools), speaking at conferences, delivering training workshops and producing guides.
Below is an overview of these types of R Services provided by ACEMS members in 2020.
In February 2020, before the COVID-19 lockdowns, ACEMS Monash hosted an R hackathon for academia and industry. It was attended by Emily Dodwell of ACEMS Partner Organisation AT&T, who was visiting ACEMS nodes from AT&T New York. Emily is an R enthusiast committed to promoting gender diversity in the community, an organiser of R-Ladies New York City, and a member of R Forwards, the R Foundation task force on women and other under-represented groups.
ACEMS Monash R hackathon participants, including ACEMS Partner AT&T’s Emily Dodwell pictured with ACEMS’ Nick Tierney (an R package author), and a promotion for Emily’s presentation at the R Conference New York featuring collaborative work with ACEMS’ Di Cook.
The following presentations/workshops were delivered by ACEMS members/partners in 2020:
Watch the presentations on R Markdown, or on the use of open-source data and tools for understanding bushfire ignition.
Rob Hyndman has featured in various podcasts covering open source software topics, including:
ACEMS members have been actively educating others to harness open-source tools. Key resources developed further in 2020 include:
Above: free online forecasting textbook referencing R tools, and the Cancer Atlas guide