# stream-dm

**Repository Path**: mirrors/stream-dm

## Basic Information

- **Project Name**: stream-dm
- **Description**: streamDM is open-source software from Huawei Noah's Ark Lab for mining big data with Spark Streaming
- **Primary Language**: Scala
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 8
- **Forks**: 0
- **Created**: 2017-04-03
- **Last Updated**: 2025-09-06

## Categories & Tags

**Categories**: big-data

**Tags**: None

## README

# streamDM for Spark Streaming

streamDM is a new open-source software library for mining big data streams using [Spark Streaming](https://spark.apache.org/streaming/), started at [Huawei Noah's Ark Lab](http://www.noahlab.com.hk/). streamDM is licensed under the Apache Software License v2.0.

## Big Data Stream Learning

Big data stream learning is more challenging than batch or offline learning, because the data distribution may change over the lifetime of the stream. Moreover, each example arriving in a stream can be processed only once or must be summarized with a small memory footprint, so the learning algorithms must be very efficient.

### Spark Streaming

[Spark Streaming](https://spark.apache.org/streaming/) is an extension of the core [Spark](https://spark.apache.org) API that enables stream processing from a variety of sources. Spark is an extensible, programmable framework for massively distributed processing of datasets, which it represents as Resilient Distributed Datasets (RDDs). Spark Streaming receives input data streams and divides the data into batches, which the Spark engine then processes to generate results. Each stream is represented as a discretized stream (DStream), which is internally a sequence of RDDs.
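The single-pass, micro-batch model described above can be illustrated in plain Scala without a Spark dependency. The sketch below is hypothetical and is not streamDM's API: the names `MicroBatchPerceptron`, `Example`, `Perceptron` and `train` are invented for illustration. It simulates a stream as an `Iterator`, groups it into fixed-size batches (standing in for the RDDs of a DStream), and updates a perceptron once per example, so every example is seen exactly once and only the weight vector is kept in memory.

```scala
// Hypothetical sketch, NOT streamDM's API: a perceptron trained in a single
// pass over a stream that arrives in micro-batches, loosely mimicking how
// Spark Streaming splits a DStream into a sequence of batches (RDDs).
object MicroBatchPerceptron {

  // A labeled example: a feature vector and a label in {-1, +1}.
  final case class Example(features: Array[Double], label: Int)

  // The model state is only the weights and the bias, so memory use is
  // constant regardless of how many examples have streamed past.
  final class Perceptron(numFeatures: Int, learningRate: Double) {
    private val weights = Array.fill(numFeatures)(0.0)
    private var bias = 0.0

    private def score(x: Array[Double]): Double =
      bias + weights.indices.map(i => weights(i) * x(i)).sum

    def predict(x: Array[Double]): Int = if (score(x) >= 0.0) 1 else -1

    // Classic perceptron rule: adjust the weights only on a mistake.
    def update(e: Example): Unit =
      if (predict(e.features) != e.label) {
        for (i <- weights.indices) weights(i) += learningRate * e.label * e.features(i)
        bias += learningRate * e.label
      }
  }

  // Consume the stream batch by batch, in arrival order; each example is
  // used for one update and then discarded. Returns the number of batches.
  def train(stream: Iterator[Example], batchSize: Int, model: Perceptron): Int = {
    var batches = 0
    stream.grouped(batchSize).foreach { batch =>
      batch.foreach(model.update)
      batches += 1
    }
    batches
  }

  def main(args: Array[String]): Unit = {
    val rng = new scala.util.Random(42)
    // Synthetic, linearly separable stream: the label is the sign of x0 + x1 - 1.
    def gen(): Example = {
      val x = Array(rng.nextDouble(), rng.nextDouble())
      Example(x, if (x(0) + x(1) - 1.0 >= 0.0) 1 else -1)
    }
    val model = new Perceptron(numFeatures = 2, learningRate = 0.1)
    val batches = train(Iterator.fill(10000)(gen()), batchSize = 100, model)
    val holdout = Seq.fill(1000)(gen())
    val accuracy = holdout.count(e => model.predict(e.features) == e.label).toDouble / holdout.size
    println(s"batches=$batches holdout accuracy=$accuracy")
  }
}
```

Nothing here is distributed: the per-batch loop only mirrors the batch boundaries that Spark Streaming would create, while the incremental update shows why a stream learner needs no access to past examples.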
### Included Methods

In the current release, streamDM v0.2, we have implemented:

* [SGD Learner](http://huawei-noah.github.io/streamDM/docs/SGD.html) and [Perceptron](http://huawei-noah.github.io/streamDM/docs/SGD.html#perceptron)
* [Naive Bayes](http://huawei-noah.github.io/streamDM/docs/NB.html)
* [CluStream](http://huawei-noah.github.io/streamDM/docs/CluStream.html)
* [Hoeffding Decision Trees](http://huawei-noah.github.io/streamDM/docs/HDT.html)
* [Bagging](http://huawei-noah.github.io/streamDM/docs/Bagging.html)
* [Stream KM++](http://huawei-noah.github.io/streamDM/docs/StreamKM.html)

We have also implemented the following [data generators](http://huawei-noah.github.io/streamDM/docs/generators.html):

* HyperplaneGenerator
* RandomTreeGenerator
* RandomRBFGenerator
* RandomRBFEventsGenerator

In addition, we have implemented [SampleDataWriter](http://huawei-noah.github.io/streamDM/docs/SampleDataWriter.html), which calls the data generators to create sample data for simulation or testing.

In the next release of streamDM, we are going to add:

* Classification: Random Forests
* Multi-label: Hoeffding Tree ML, Random Forests ML
* Frequent Itemset Miner: IncMine

For future work, we are considering:

* Regression: Hoeffding Regression Tree, Bagging, Random Forests
* Clustering: ClusTree, DenStream
* Frequent Itemset Miner: IncSecMine

## Going Further

For a quick introduction to running streamDM, refer to the [Getting Started](http://huawei-noah.github.io/streamDM/docs/GettingStarted.html) document. The streamDM [Programming Guide](http://huawei-noah.github.io/streamDM/docs/Programming.html) presents a detailed view of streamDM. The full API documentation can be consulted [here](http://huawei-noah.github.io/streamDM/api/index.html).

## Environment

* Spark 2.3.2
* Scala 2.11
* SBT 0.13
* Java 8+

## Mailing lists

### User support and questions

streamdm-user@googlegroups.com

### Development discussions

streamdm-dev@googlegroups.com