Research Information Center, Hubei Collaborative Innovation Center for Urban and Rural Community Social Management


SupR: Multithreaded and Distributed R for Big Data Analysis

Updated: 2017-06-22

an R-style front end that keeps the existing R syntax and internal basic data structures,

a Java-like multithreading model, which would be the key to the success of big data analysis,

a Spark-like cluster computing environment, and

a built-in Simple Distributed File System, which, to some extent, provides a cluster-wide namespace.

Fundamental Work (deprecated/under construction)

SupR Threading: Basic Functions

new.thread(expr, env=parent.frame(), stacksize=NULL, start=TRUE)
# creates a Java-like thread (lightweight process) object, which serves
# as a task runner in parallel and distributed computing

start.thread(thread)
# must be called to run a thread that was created by new.thread with
# start = FALSE

current.thread()
# returns a character object as the current thread's name or id

interrupt(thread, expr=NULL)
# provides a simple way to interact with running threads

cancel.thread(thread)
# cancels the thread named 'thread'

thread.info(...)
# reports information on threads.
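
The deferred-start behavior described for new.thread and start.thread can be illustrated with a small sketch. It is written in Python rather than SupR, because SupR's threading functions are not available in a stock R installation; the helper `new_thread` and the variable names are hypothetical and only mirror the signatures listed above.

```python
import threading


def new_thread(task, start=True):
    """Hypothetical analogue of new.thread(expr, start=...)."""
    worker = threading.Thread(target=task)
    if start:
        worker.start()
    return worker


results = []


def task():
    # Record the name of the thread running the task,
    # in the spirit of current.thread().
    results.append(threading.current_thread().name)


worker = new_thread(task, start=False)  # created, but not running yet
started_before = worker.is_alive()      # nothing runs until start() is called
worker.start()                          # plays the role of start.thread(thread)
worker.join()                           # block until the task has finished
```

The sketch only shows the create-then-start lifecycle; interrupt and cancel.thread have no direct counterpart in Python's `threading` module and are left out.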

SupR Concurrency: Some Basic Functions

sync.eval(x, expr, env=parent.frame())
# with the object x synchronized, evaluates the expression expr
# in the environment env

wait(m, timeout=0L)
# causes the current thread to wait until another thread invokes
# the notify() function for the R object m

notify(m, all=FALSE)
# wakes up a single thread or all threads (specified with all=TRUE)
# waiting on the m's monitor

set.synchronized(fun, value=TRUE)
# makes the function fun synchronized (value=TRUE)

is.synchronized(obj)
# tests for object synchronization state
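
The monitor pattern behind sync.eval, wait, and notify follows the Java model, which can be sketched in Python with `threading.Condition`. This is an analogy under stated assumptions, not SupR code: the `monitor` object stands in for the R object m, and the names `state`, `producer`, and `consumer` are made up for the illustration.

```python
import threading

monitor = threading.Condition()  # plays the role of the R object m
state = {"ready": False, "value": None}
received = []


def consumer():
    with monitor:                  # comparable to evaluating inside sync.eval(m, ...)
        while not state["ready"]:  # re-check the condition after every wake-up
            monitor.wait()         # comparable to wait(m)
        received.append(state["value"])


def producer():
    with monitor:
        state["value"] = 42
        state["ready"] = True
        monitor.notify()           # comparable to notify(m); notify_all() ~ all=TRUE


c = threading.Thread(target=consumer)
p = threading.Thread(target=producer)
c.start()
p.start()
c.join()
p.join()
```

Guarding the wait with a loop over an explicit flag keeps the consumer correct even if the producer runs first, since a notification sent before anyone is waiting is simply lost.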

SupR Cluster: Four Basic Functions

start.master(port, url=NULL)

start.worker(master, url=NULL)
# runs on each node machine. It also starts and monitors an executor that
# runs on the same node machine to launch threads to run assigned tasks.

start.driver(master)
# starts a driver session.

start.cluster(master, workers, ...)
# launches a cluster with a specified master machine and worker
# node machines etc.

SupR Distributed Data and File System

distribute(x, ...)
# a generic method that distributes/parallelizes object x

iterator(has.next, get.next, envir)
# creates an iterator that provides a practical way of accessing distributed subsets.
# Relevant functions include as.iterator(x), has.next(iter),
# get.next(iter), and as.list(iter) or, more exactly, as.list.iterator(iter).
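
The idea of building an iterator from a has.next/get.next pair can be sketched as follows. The example is in Python and runs in a single process; `ClosureIterator` and `chunk_iterator` are hypothetical names, and each chunk merely stands in for one distributed subset.

```python
class ClosureIterator:
    """Hypothetical analogue of iterator(has.next, get.next, envir)."""

    def __init__(self, has_next, get_next):
        self._has_next = has_next
        self._get_next = get_next

    def __iter__(self):
        return self

    def __next__(self):
        if not self._has_next():
            raise StopIteration
        return self._get_next()


def chunk_iterator(data, size):
    """Iterate over consecutive chunks of `data`, one chunk per call."""
    position = {"i": 0}  # mutable state shared by both closures, like `envir`

    def has_next():
        return position["i"] < len(data)

    def get_next():
        start = position["i"]
        position["i"] = start + size
        return data[start:start + size]

    return ClosureIterator(has_next, get_next)


# Draining the iterator into a list is comparable to as.list(iter).
chunks = list(chunk_iterator([1, 2, 3, 4, 5], size=2))
```

Keeping the cursor in shared state rather than in the iterator object mirrors the role the `envir` argument appears to play in the signature above; that reading is an assumption.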

SupR: a Set of MapReduce-type Functions

filter(x, fun, init = NULL, env = parent.frame())
# returns components of data 'x' for which fun returns TRUE

foreach(x, fun, init = NULL, env = parent.frame())
# applies fun to each component of data 'x'. It doesn't collect the returned
# values of fun.

map(x, fun, init = NULL, env = parent.frame())
# applies fun to each component of data 'x'.

reduce(x, fun, init = NULL, env = parent.frame(), by.key = FALSE)
# implements the reduce method of the map-reduce algorithm.

map.reduce(x, map = NULL, reduce = NULL, init = NULL, env = parent.frame(), 
    by.key = FALSE, modifier = "map")
# implements the simple but powerful map-reduce algorithm.
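
A single-process sketch of the map-reduce pattern may help make the by.key option concrete. It is written in Python, not SupR; the function `map_reduce` is hypothetical, and the interpretation of by.key (mapped values are key/value pairs, and reduction happens separately per key) is an assumption rather than documented SupR behavior.

```python
from functools import reduce


def map_reduce(data, map_fun, reduce_fun, by_key=False):
    """Hypothetical local analogue of map.reduce(x, map, reduce, by.key)."""
    mapped = [map_fun(item) for item in data]
    if not by_key:
        # Fold all mapped values into a single result.
        return reduce(reduce_fun, mapped)
    # Assumed by.key semantics: group (key, value) pairs, then reduce per key.
    groups = {}
    for key, value in mapped:
        groups.setdefault(key, []).append(value)
    return {key: reduce(reduce_fun, values) for key, values in groups.items()}


words = ["a", "b", "a", "c", "b", "a"]
counts = map_reduce(words, lambda w: (w, 1), lambda x, y: x + y, by_key=True)
total = map_reduce([1, 2, 3, 4], lambda v: v * v, lambda x, y: x + y)
```

Here `counts` holds one sum per distinct word and `total` is the sum of squares. In a cluster setting the map and per-key reduce steps would run on the workers, which this local sketch does not attempt to model.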

SupR: a set of Iterative MapReduce-type Functions
```
UNDER DEVELOPMENT
```

SupR: Miscellaneous Functions

implicit(class, fun=NULL)
# marks objects of class 'class' as functions.
# For example, ...

