Enhancements and bug fixes
The following improvements are part of RapidMiner Radoop 7.5.
Enhancements
- Hive-on-Spark container re-use is now leveraged to improve the performance of processes that contain many preprocessing operators; a substantial decrease in process runtime is expected, especially when a process has many operators and the input data is not too large; the feature is enabled by default, can be disabled by unchecking Enable Hive on Spark container reuse, and can be tuned through the global Radoop preferences
- New parallelized Loop (Radoop) operator
- New parallelized Loop Attributes (Radoop) operator
- SparkRM can now perform bootstrapping, which can be used to train multiple models in parallel on sampled input data sets
- SparkRM parameters related to resource allocation now have more descriptive names and descriptions (cluster resources %, executor memory %, force resource calculation)
- Improved resource allocation heuristics in SparkRM and other Spark-based operators now take only usable nodes into account
- Added a purge option to the Drop Table, Copy Table, Rename Table and Read CSV operators and to the Drop object(s) action in Hadoop Data View, so that tables can be dropped even if the Trash folder is in a different HDFS encryption zone
- SparkRM can now use a specified value instead of missing values when resolving schema conflicts between the outputs of different partitions (handle missing attributes)
- Hive Script now allows scripts that do not expose output
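The handle missing attributes behavior listed above for SparkRM can be pictured with a small sketch. This is not Radoop's implementation; it only illustrates the general idea of merging partition outputs that have different attributes and filling the gaps with a chosen value. The function name `unify_partition_outputs` and the `fill_value` parameter are hypothetical, and rows are modeled as plain Python dictionaries.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def unify_partition_outputs(partitions: List[List[Row]],
                            fill_value: Any = None) -> List[Row]:
    """Merge rows from several partitions into one table with a common schema.

    Hypothetical helper: attributes missing from a row are set to fill_value.
    """
    # Collect the union of attribute names, preserving first-seen order.
    schema: List[str] = []
    for rows in partitions:
        for row in rows:
            for name in row:
                if name not in schema:
                    schema.append(name)

    # Rebuild every row against the unified schema, filling the gaps.
    return [
        {name: row.get(name, fill_value) for name in schema}
        for rows in partitions
        for row in rows
    ]


# Two partitions whose outputs disagree on the available attributes.
part_a = [{"id": 1, "age": 34}]
part_b = [{"id": 2, "city": "Budapest"}]
merged = unify_partition_outputs([part_a, part_b], fill_value=0)
```

With `fill_value=0`, every row in `merged` carries the attributes `id`, `age` and `city`, and the attributes a partition did not produce are set to `0` rather than left missing.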
Bug fixes
- BUGFIX: Fixed NullPointerException on RapidMiner Server with old connections containing accesswhitelist
- BUGFIX: Fixed Enable impersonation on server checkbox on Radoop Connection Properties dialog; its state can now be changed independently of Enable Security
- BUGFIX: Fixed Job kill connection test on clusters with Hive-on-Spark container re-use enabled
- BUGFIX: Fixed error message in import (Check your import settings and try again)
- BUGFIX: Fixed error messages appearing in logs during Studio close
- BUGFIX: Fixed rare error message handling issue of SparkRM
- BUGFIX: Fixed rare timeout issue with Hadoop APIs; the timeout now respects the Connection timeout setting
- Further bugfixes from 7.4.1 release