How to handle large datasets with Weka is a question that crops up frequently on the Weka mailing list and forums. This post is the first of three that outlines what's available, in terms of distributed processing functionality, in several new packages for Weka 3.7. This series of posts is continued in part 2 and part 3.

The first new package is called distributedWekaBase. It provides base "map" and "reduce" tasks that are not tied to any specific distributed platform. The second, called distributedWekaHadoop, provides Hadoop-specific wrappers and jobs for these base tasks. In the future there could be other wrappers - one based on the Spark platform would be cool.

Base map and reduce tasks

distributedWekaBase version 1.0 provides tasks for:

Determining a unified ARFF header from separate data chunks in CSV format. This is particularly important because, as Weka users know, Weka is quite particular about metadata - especially when it comes to nominal attributes. At the same time this task computes some handy summary statistics (that are stored as additional "meta attributes" in the header), such as count, sum, sum squared, min, max, num missing, mean, standard deviation and frequency counts for nominal values.
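To make the header task more concrete, here is a minimal sketch of how per-chunk summaries might be computed in a "map" step and combined in a "reduce" step. It is not the distributedWekaBase implementation: the names `ColumnSummary`, `map_chunk` and `reduce_summaries` are hypothetical, and it assumes that chunks carry no header row, that column names and types are supplied by the caller, and that an empty field or `?` marks a missing value. The point it illustrates is that count, sum, sum squared, min, max, num missing and nominal frequency counts can all be merged across chunks, and that mean and standard deviation can be derived from the merged values.

```python
import csv
import io
import math
from collections import Counter
from dataclasses import dataclass, field

MISSING = {"", "?"}  # assumption: tokens treated as missing values


@dataclass
class ColumnSummary:
    """Mergeable statistics for one column (hypothetical helper)."""
    count: int = 0
    total: float = 0.0
    sum_sq: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf
    num_missing: int = 0
    freqs: Counter = field(default_factory=Counter)  # nominal columns only

    def add(self, raw: str, numeric: bool) -> None:
        if raw in MISSING:
            self.num_missing += 1
            return
        self.count += 1
        if numeric:
            x = float(raw)
            self.total += x
            self.sum_sq += x * x
            self.minimum = min(self.minimum, x)
            self.maximum = max(self.maximum, x)
        else:
            self.freqs[raw] += 1

    def merge(self, other: "ColumnSummary") -> "ColumnSummary":
        # Every stored field combines by addition, min or max.
        return ColumnSummary(
            self.count + other.count,
            self.total + other.total,
            self.sum_sq + other.sum_sq,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
            self.num_missing + other.num_missing,
            self.freqs + other.freqs,
        )

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else math.nan

    @property
    def std_dev(self) -> float:
        # Sample standard deviation derived from sum and sum squared.
        if self.count < 2:
            return math.nan
        var = (self.sum_sq - self.total ** 2 / self.count) / (self.count - 1)
        return math.sqrt(max(var, 0.0))


def map_chunk(chunk_text, names, numeric):
    """'Map' step: summarise one CSV chunk that has no header row."""
    summaries = {name: ColumnSummary() for name in names}
    for row in csv.reader(io.StringIO(chunk_text)):
        if not row:
            continue  # skip blank lines
        for name, raw in zip(names, row):
            summaries[name].add(raw.strip(), name in numeric)
    return summaries


def reduce_summaries(partials):
    """'Reduce' step: merge the per-chunk summaries column by column."""
    merged = {}
    for partial in partials:
        for name, summary in partial.items():
            merged[name] = merged[name].merge(summary) if name in merged else summary
    return merged


# Two chunks that each see only part of the nominal values.
names, numeric = ["x", "colour"], {"x"}
chunk_a = "1.0,red\n2.0,blue\n"
chunk_b = "?,green\n4.0,red\n"
merged = reduce_summaries([map_chunk(c, names, numeric) for c in (chunk_a, chunk_b)])
unified_labels = sorted(merged["colour"].freqs)  # label set for a unified header
```

Neither chunk on its own sees every value of `colour`, which is why the nominal labels have to be unioned across chunks before a single header can describe all of the data.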