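OO field guide

This chapter is a field guide for recognising and working with R's objects in the wild. R has three object oriented systems (plus the base types), which can be a bit intimidating. The goal of this guide is not to make you an expert in all four systems, but to help you identify which system you're working with and to use it effectively.

Central to any object-oriented system are the concepts of class and method. A class defines the behaviour of objects by describing their attributes and their relationship to other classes. A method is a function associated with objects of a particular class. The class is also used when selecting methods: a function picks the appropriate method depending on the class of its input. Classes are usually organised in a hierarchy: if a method does not exist for a child, then the parent's method is used instead; the child inherits behaviour from the parent.

R's three OO systems differ mainly in how methods are defined:

  • S3 implements a style of OO programming called generic-function OO. This is different from most programming languages, like Java, C++, and C#, which implement message-passing OO. With message-passing, messages (methods) are sent to objects and the object determines which function to call. Typically, this object has a special appearance in the method call, usually appearing before the name of the method/message: e.g., canvas.drawRect("blue"). S3 is different. While computations are still carried out via methods, a special type of function called a generic function decides which method to call, e.g., drawRect(canvas, "blue"). S3 is a very casual system. It has no formal definition of classes.

  • S4 works similarly to S3, but is more formal. There are two major differences to S3. S4 has formal class definitions, which describe the representation and inheritance for each class, and has special helper functions for defining generics and methods. S4 also has multiple dispatch, which means that generic functions can pick methods based on the class of any number of arguments, not just one.

  • Reference classes, called RC for short, are quite different from S3 and S4. RC implements message-passing OO, so methods belong to classes, not functions. $ is used to separate objects and methods, so method calls look like canvas$drawRect("blue"). RC objects are also mutable: they don't use R's usual copy-on-modify semantics, but are modified in place. This makes them harder to reason about, but allows them to solve problems that are difficult to solve with S3 or S4.

There's also one other system that's not quite OO, but it's important to mention here:

  • base types, which are mostly manipulated using C code. They're important to know about because they provide the building blocks for the other OO systems.

The following sections describe each system in turn, starting with base types. You'll learn how to recognise the OO system that an object belongs to, how method dispatch works, and how to create new objects, classes, generics, and methods for that system. The chapter concludes with a few remarks on when to use each system.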

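Prerequisites

You'll need the pryr package, install.packages("pryr"), to access useful functions for examining objects.

Quiz

Think you know this material already? If you can answer the following questions quickly, you can safely skip this chapter. Find the answers at the end of the chapter.

  1. How do you tell what OO system (base, S3, S4, or RC) an object is associated with?

  2. How do you determine the base type (like integer or list) of an object?

  3. What is a generic function?

  4. What are the main differences between S3 and S4? What are the main differences between S4 and RC?

Outline

  • Base types teaches you about R's base object system. Only R-core can add new classes to this system, but it's important to know about because it underpins the three other systems.

  • S3 shows you the basics of the S3 object system. It's the simplest and most commonly used OO system.

  • S4 discusses the more formal and rigorous S4 system.

  • RC teaches you about R's newest OO system: reference classes, or RC for short.

  • Picking a system advises on which OO system to use if you're starting a new project.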

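Base types

Underlying every R object is a C structure (or struct) that describes how that object is stored in memory. The struct includes the contents of the object, the information needed for memory management, and a type. This is the base type of an R object. Base types are not really an object system because only the R core team can create new types. As a result, new base types are added very rarely: the most recent change, in 2011, added two exotic types that you never see in R, but are useful for diagnosing memory problems (NEWSXP and FREESXP). Prior to that, the last type added was a special base type for S4 objects (S4SXP) in 2005.

The Data structures chapter explains the most common base types (atomic vectors and lists), but base types also encompass functions, environments, and other more exotic objects like names, calls, and promises that you'll learn about later in the book. You can determine an object's base type with typeof(). Unfortunately the names of base types are not used consistently throughout R, and the type and the corresponding "is" function may use different names: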

# The type of a function is "closure"
f <- function() {}
typeof(f)
is.function(f)

# The type of a primitive function is "builtin"
typeof(sum)
is.primitive(sum)

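You may have heard of mode() and storage.mode(). I recommend ignoring these functions because they're just aliases of the names returned by typeof(), and exist solely for S compatibility. Read their source code if you want to understand exactly what they do.

Functions that behave differently for different base types are almost always written in C, where dispatch occurs using switch statements (e.g., switch(TYPEOF(x))). Even if you never write C code, it's important to understand base types because everything else is built on top of them: S3 objects can be built on top of any base type, S4 objects use a special base type, and RC objects are a combination of S4 and environments (another base type). To see if an object is a pure base type, i.e., it doesn't also have S3, S4, or RC behaviour, check that is.object(x) returns FALSE.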
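As a small illustration of the last point, the sketch below compares typeof() and is.object() for a few objects. The variable names are only illustrative.

```r
# A pure base type: is.object() is FALSE
v <- 1:3
is.object(v)        # FALSE
typeof(v)           # "integer"

# An S3 object built on a list: is.object() is TRUE, while typeof()
# still reports the underlying base type
d <- data.frame(a = 1)
is.object(d)        # TRUE
typeof(d)           # "list"

# Functions are base types too
is.object(sum)      # FALSE
typeof(sum)         # "builtin"
```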

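S3

S3 is R's simplest OO system. It is the only OO system used in the base and stats packages, and it's the most commonly used system in CRAN packages. S3 is informal and ad hoc, but it has a certain elegance in its minimalism.

Recognising objects, generic functions, and methods

Most objects that you encounter are S3 objects. But unfortunately there's no simple way to test if an object is an S3 object in base R. The closest you can come is is.object(x) & !isS4(x), i.e., it's an object, but not S4. An easier way is to use pryr::otype():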

library(pryr)

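# factor() is explicit here: since R 4.0.0, data.frame() no longer
# converts character columns to factors by default
df <- data.frame(x = 1:10, y = factor(letters[1:10]))
otype(df)    # A data frame is an S3 class
otype(df$x)  # A numeric vector isn't: it's a base type
otype(df$y)  # A factor is an S3 class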
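If pryr is not available, the base R test mentioned above can be wrapped in a small helper. This is only a sketch: is_s3 is a hypothetical name, and && is used instead of & because the result is a single TRUE or FALSE.

```r
# Hypothetical helper: TRUE for objects that have a class attribute
# but are not S4 (RC objects are S4, so they are excluded too)
is_s3 <- function(x) is.object(x) && !isS4(x)

is_s3(data.frame(x = 1))   # TRUE
is_s3(factor("a"))         # TRUE
is_s3(1:10)                # FALSE
```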

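In S3, methods belong to functions, called generic functions, or generics for short. S3 methods do not belong to objects or classes. This is different from most other programming languages, but is a legitimate OO style.

To determine if a function is an S3 generic, you can inspect its source code for a call to UseMethod(): that's the function that figures out the correct method to call, the process of method dispatch. Similar to otype(), pryr also provides ftype(), which describes the object system, if any, associated with a function: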

mean
ftype(mean)

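Some S3 generics, like [, sum(), and cbind(), don't call UseMethod() because they are implemented in C. Instead, they call the C functions DispatchGroup() or DispatchOrEval(). Functions that do method dispatch in C code are called internal generics and are documented in ?"internal generic". ftype() knows about these special cases too.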
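The source-code check described above can be approximated in base R. The helper below is a hypothetical, deliberately crude sketch: it only searches the deparsed body for the text "UseMethod", so it cannot detect internal generics.

```r
# Hypothetical helper: does the function body mention UseMethod()?
calls_use_method <- function(f) {
  is.function(f) && !is.primitive(f) &&
    any(grepl("UseMethod", deparse(body(f)), fixed = TRUE))
}

calls_use_method(mean)   # TRUE: mean() is an ordinary S3 generic
calls_use_method(sum)    # FALSE: sum() is primitive and dispatches in C
is.primitive(sum)        # TRUE
```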

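Given a class, the job of an S3 generic is to call the right S3 method. You can recognise S3 methods by their names, which look like generic.class(). For example, the Date method for the mean() generic is called mean.Date(), and the factor method for print() is called print.factor().

This is the reason that most modern style guides discourage the use of . in function names: it makes them look like S3 methods. For example, is t.test() the t method for test objects? Similarly, the use of . in class names can also be confusing: is print.data.frame() the print() method for data.frames, or the print.data() method for frames? pryr::ftype() knows about these exceptions, so you can use it to figure out if a function is an S3 method or generic:

ftype(t.data.frame) # data frame method for t()
ftype(t.test)       # generic function for t tests

You can see all the methods that belong to a generic with methods():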

methods("mean")
methods("t.test")

(Apart from methods defined in the base package, most S3 methods will not be visible: use getS3method() to read their source code.)
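For instance, the default method of t.test() lives inside the stats namespace rather than on the search path. A minimal sketch of retrieving it with getS3method(), which takes the generic name and the class name:

```r
# Retrieve the "default" method of the t.test() generic
m <- getS3method("t.test", "default")
is.function(m)              # TRUE
names(formals(m))[1:2]      # "x" "y"

# Printing m would show its full source code
```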

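You can also list all generics that have a method for a given class:

methods(class = "ts")

There's no way to list all S3 classes, as you'll learn in the following section.

Defining classes and creating objects

S3 is a simple and ad hoc system; it has no formal definition of a class. To make an object an instance of a class, you just set its class attribute. You can do that during creation with structure(), or after the fact with class<-():

# Create and assign class in one step
foo <- structure(list(), class = "foo")

# Create, then set class
foo <- list()
class(foo) <- "foo"

S3 objects are usually built on top of lists, or atomic vectors with attributes. You can also turn functions into S3 objects. Other base types are either rarely seen in R, or have unusual semantics such that attributes don't work well.

You can determine the class of any object using class(x), and see if an object inherits from a specific class using inherits(x, "classname").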

class(foo)  
inherits(foo, "foo")

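The class of an S3 object can be a vector, which describes behaviour from most to least specific. For example, the class of the object returned by glm() is c("glm", "lm"), indicating that generalised linear models inherit behaviour from linear models. Class names are usually lower case, and you should avoid using . in them. For multi-word class names, use either underscores (my_class) or CamelCase (MyClass).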
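A short sketch of inspecting such a class vector, using a model fitted to the built-in mtcars data (the model formula is arbitrary):

```r
mod <- glm(mpg ~ wt, data = mtcars)
class(mod)                      # "glm" "lm"
inherits(mod, "lm")             # TRUE
inherits(mod, "data.frame")     # FALSE

# which = TRUE reports the position of each name in class(mod), 0 if absent
inherits(mod, c("lm", "glm", "nls"), which = TRUE)   # 2 1 0
```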

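Most S3 classes provide a constructor function:

foo <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")
  structure(list(x), class = "foo")
}

You should use a constructor like this whenever one is available. It ensures that you're creating the class with the correct components. Constructor functions usually have the same name as the class.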
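To show what the check in a constructor buys you, here is a sketch with a hypothetical temperature class. The class and field names are invented for illustration.

```r
# Hypothetical class "temperature": the constructor validates its input
temperature <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")
  structure(list(celsius = x), class = "temperature")
}

t1 <- temperature(21.5)
class(t1)       # "temperature"
t1$celsius      # 21.5

# Invalid input is rejected before any object is created
res <- tryCatch(temperature("warm"), error = function(e) conditionMessage(e))
res             # "x must be numeric"
```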

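Apart from developer-supplied constructor functions, S3 has no checks for correctness. This means you can change the class of existing objects:

# Create a linear model
mod <- lm(log(mpg) ~ log(disp), data = mtcars)
class(mod)
print(mod)

# Turn it into a data frame (?!)
class(mod) <- "data.frame"
# But unsurprisingly this doesn't work very well
print(mod)
# However, the data is still there
mod$coefficients

If you've used other OO languages, this might make you feel queasy. But surprisingly, this flexibility causes few problems: while you can change the class of an object, you never should. R doesn't protect you from yourself: you can easily shoot yourself in the foot. As long as you don't aim the gun at your foot and pull the trigger, you won't have a problem.

Creating new methods and generics

To add a new generic, create a function that calls UseMethod(). UseMethod() takes two arguments: the name of the generic function, and the argument to use for method dispatch. If you omit the second argument it will dispatch on the first argument to the function. There's no need to pass any of the arguments of the generic to UseMethod() and you shouldn't do so. UseMethod() uses black magic to find them out for itself.

f <- function(x) UseMethod("f")

A generic isn't useful without some methods. To add a method, you just create a regular function with the correct (generic.class) name:

f.a <- function(x) "Class a"

a <- structure(list(), class = "a")
class(a)
f(a)

Adding a method to an existing generic works in the same way:

mean.a <- function(x, ...) "a"
mean(a)

As you can see, there's no check to make sure that the method returns a class compatible with the generic. It's up to you to make sure that your method doesn't violate the expectations of existing code.

Method dispatch

S3 method dispatch is relatively simple. UseMethod() creates a vector of function names, like paste0("generic", ".", c(class(x), "default")), and looks for each in turn. The "default" class makes it possible to set up a fall-back method for otherwise unknown classes.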

f <- function(x) UseMethod("f")
f.a <- function(x) "Class a"
f.default <- function(x) "Unknown class"

f(structure(list(), class = "a"))
# No method for b class, so uses method for a class
f(structure(list(), class = c("b", "a")))
# No method for c class, so falls back to default
f(structure(list(), class = "c"))
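The lookup order can be made visible with a small helper. This is a simplified sketch: candidates is a hypothetical function, it assumes x has an explicit class attribute, and it only considers methods visible from the current environment (real dispatch also consults registered methods).

```r
# Hypothetical helper: the method names tried for a generic on x
candidates <- function(generic, x) {
  paste0(generic, ".", c(class(x), "default"))
}

x <- structure(list(), class = c("b", "a"))
candidates("f", x)     # "f.b" "f.a" "f.default"

# The first candidate that exists as a function is the one that gets called
f.a <- function(x) "Class a"
f.default <- function(x) "Unknown class"
Find(function(n) exists(n, mode = "function"), candidates("f", x))   # "f.a"
```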

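Group generic methods add a little more complexity. Group generics make it possible to implement methods for multiple generics with one function. The four group generics and the functions they include are:

  • "Math": abs, sign, sqrt, floor, cos, sin, log, exp, ...
  • "Ops": +, -, *, /, ^, %%, %/%, &, |, !, ==, !=, <, <=, >=, >
  • "Summary": all, any, sum, prod, min, max, range
  • "Complex": Arg, Conj, Im, Mod, Re

Group generics are a relatively advanced technique and are beyond the scope of this chapter, but you can find out more about them in ?groupGeneric. The most important thing to take away from this is to recognise that Math, Ops, Summary, and Complex aren't real functions, but instead represent groups of functions. Inside a group generic method, a special variable .Generic holds the name of the generic function that was actually called.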
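As a minimal sketch of the idea, the following defines one Math method for a hypothetical money class. The class, its amount field, and the choice to re-wrap the result are all assumptions made for illustration.

```r
# Hypothetical class "money" wrapping a numeric amount
money <- function(x) structure(list(amount = x), class = "money")

# One method covers abs(), floor(), sqrt(), ... via the Math group
Math.money <- function(x, ...) {
  f <- get(.Generic, mode = "function")   # .Generic holds the name, e.g. "abs"
  money(f(x$amount, ...))
}

m <- money(c(1.5, -2.5))
abs(m)$amount      # 1.5 2.5
floor(m)$amount    # 1 -3
```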

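If you have complex class hierarchies it's sometimes useful to call the "parent" method. It's a little bit tricky to define exactly what that means, but it's basically the method that would have been called if the current method did not exist. Again, this is an advanced technique: you can read about it in ?NextMethod.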
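A minimal sketch of calling the parent method with NextMethod(), using hypothetical child and parent classes:

```r
describe <- function(x) UseMethod("describe")
describe.parent <- function(x) "parent behaviour"
describe.child <- function(x) {
  # NextMethod() continues dispatch with the next class in class(x)
  paste("child, then", NextMethod())
}

obj <- structure(list(), class = c("child", "parent"))
describe(obj)      # "child, then parent behaviour"
```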

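Because methods are normal R functions, you can call them directly:

c <- structure(list(), class = "c")
# Call the correct method:
f.default(c)
# Force R to call the wrong method:
f.a(c)

However, this is just as dangerous as changing the class of an object, so you shouldn't do it. Please don't point the loaded gun at your foot! The only reason to call the method directly is that sometimes you can get considerable performance improvements by skipping method dispatch. See the Performance chapter for details.

You can also call an S3 generic with a non-S3 object. Non-internal S3 generics will dispatch on the implicit class of base types. (Internal generics don't do that for performance reasons.) The rules to determine the implicit class of a base type are somewhat complex, but are shown in the function below: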

iclass <- function(x) {
  if (is.object(x)) {
    stop("x is not a primitive type", call. = FALSE)
  }

  c(
    if (is.matrix(x)) "matrix",
    if (is.array(x) && !is.matrix(x)) "array",
    if (is.double(x)) "double",
    if (is.integer(x)) "integer",
    mode(x)
  )
}
iclass(matrix(1:5))
iclass(array(1.5))
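To see implicit-class dispatch in action, the sketch below defines a hypothetical generic kind() with methods named after implicit classes and calls it on plain base objects.

```r
kind <- function(x) UseMethod("kind")
kind.integer  <- function(x) "integer method"
kind.numeric  <- function(x) "numeric method"
kind.function <- function(x) "function method"
kind.default  <- function(x) "default method"

kind(1L)     # "integer method"
kind(1.5)    # "numeric method": there is no kind.double, so "numeric" is tried next
kind(sum)    # "function method"
kind("a")    # "default method"
```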

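Exercises

  1. Read the source code for t() and t.test() and confirm that t.test() is an S3 generic and not an S3 method. What happens if you create an object with class test and call t() with it?

  2. What classes have a method for the Math group generic in base R? Read the source code. How do the methods work?

  3. R has two classes for representing date time data, POSIXct and POSIXlt, which both inherit from POSIXt. Which generics have different behaviours for the two classes? Which generics share the same behaviour?

  4. Which base generic has the greatest number of defined methods?

  5. UseMethod() calls methods in a special way. Predict what the following code will return, then run it and read the help for UseMethod() to figure out what's going on. Write down the rules in the simplest form possible.

     y <- 1
     g <- function(x) {
       y <- 2
       UseMethod("g")
     }
     g.numeric <- function(x) y
     g(10)

     h <- function(x) {
       x <- 10
       UseMethod("h")
     }
     h.character <- function(x) paste("char", x)
     h.numeric <- function(x) paste("num", x)

     h("a")

  6. Internal generics don't dispatch on the implicit class of base types. Carefully read ?"internal generic" to determine why the length of f and g is different in the example below. What function helps distinguish between the behaviour of f and g?

     f <- function() 1
     g <- function() 2
     class(g) <- "function"

     class(f)
     class(g)

     length.function <- function(x) "function"
     length(f)
     length(g)
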

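S4

S4 works in a similar way to S3, but it adds formality and rigour. Methods still belong to functions, not classes, but:

  • Classes have formal definitions which describe their fields and inheritance structures (parent classes).

  • Method dispatch can be based on multiple arguments to a generic function, not just one.

  • There is a special operator, @, for extracting slots (aka fields) from an S4 object.

All S4 related code is stored in the methods package. This package is always available when you're running R interactively, but may not be available when running R in batch mode. For this reason, it's a good idea to include an explicit library(methods) whenever you're using S4.

S4 is a rich and complex system. There's no way to explain it fully in a few pages. Here I'll focus on the key ideas underlying S4 so you can use existing S4 objects effectively.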

Recognising objects, generic functions, and methods

Recognising S4 objects, generics, and methods is easy. You can identify an S4 object because str() describes it as a "formal" class, isS4() returns TRUE, and pryr::otype() returns "S4". S4 generics and methods are also easy to identify because they are S4 objects with well defined classes.

There aren't any S4 classes in the commonly used base packages (stats, graphics, utils, datasets, and base), so we'll start by creating an S4 object from the built-in stats4 package, which provides some S4 classes and methods associated with maximum likelihood estimation:

library(stats4)

# From example(mle)
y <- c(26, 17, 13, 12, 20, 5, 9, 8, 5, 4, 8)
nLL <- function(lambda) - sum(dpois(y, lambda, log = TRUE))
fit <- mle(nLL, start = list(lambda = 5), nobs = length(y))

# An S4 object
isS4(fit)
otype(fit)

# An S4 generic
isS4(nobs)
ftype(nobs)

# Retrieve an S4 method, described later
mle_nobs <- method_from_call(nobs(fit))
isS4(mle_nobs)
ftype(mle_nobs)

Use is() with one argument to list all classes that an object inherits from. Use is() with two arguments to test if an object inherits from a specific class.

is(fit)
is(fit, "mle")

You can get a list of all S4 generics with getGenerics(), and a list of all S4 classes with getClasses(). This list includes shim classes for S3 classes and base types. You can list all S4 methods with showMethods(), optionally restricting selection either by generic or by class (or both). It's also a good idea to supply where = search() to restrict the search to methods available in the global environment.
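A few of these queries, sketched against the stats4 package used above. The expected values assume a standard R installation with stats4 attached.

```r
library(methods)
library(stats4)

isGeneric("nobs")                          # TRUE once stats4 is attached
existsMethod("nobs", "mle")                # TRUE: a method defined for "mle"
"mle" %in% getClasses("package:stats4")    # TRUE
isVirtualClass("mle")                      # FALSE: "mle" can be instantiated

# Print the methods currently defined for the nobs() generic
showMethods("nobs")
```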

Defining classes and creating objects

In S3, you can turn any object into an object of a particular class just by setting the class attribute. S4 is much stricter: you must define the representation of a class with setClass(), and create a new object with new(). You can find the documentation for a class with a special syntax: class?className, e.g., class?mle.

An S4 class has three key properties:

  • A name: an alpha-numeric class identifier. By convention, S4 class names use UpperCamelCase.

  • A named list of slots (fields), which defines slot names and permitted classes. For example, a person class might be represented by a character name and a numeric age: list(name = "character", age = "numeric").

  • A string giving the class it inherits from, or, in S4 terminology, that it contains. You can provide multiple classes for multiple inheritance, but this is an advanced technique which adds much complexity.

    In slots and contains you can use S4 classes, S3 classes registered with setOldClass(), or the implicit class of a base type. In slots you can also use the special class ANY which does not restrict the input.

S4 classes have other optional properties like a validity method that tests if an object is valid, and a prototype object that defines default slot values. See ?setClass for more details.

The following example creates a Person class with fields name and age, and an Employee class that inherits from Person. The Employee class inherits the slots and methods from the Person, and adds an additional slot, boss. To create objects we call new() with the name of the class, and name-value pairs of slot values.

setClass("Person",
  slots = list(name = "character", age = "numeric"))
setClass("Employee",
  slots = list(boss = "Person"),
  contains = "Person")

alice <- new("Person", name = "Alice", age = 40)
john <- new("Employee", name = "John", age = 20, boss = alice)

Most S4 classes also come with a constructor function with the same name as the class: if that exists, use it instead of calling new() directly.

To access slots of an S4 object use @ or slot():

alice@age
slot(john, "boss")

(@ is equivalent to $, and slot() to [[.)

If an S4 object contains (inherits from) an S3 class or a base type, it will have a special .Data slot which contains the underlying base type or S3 object:

setClass("RangedNumeric",
  contains = "numeric",
  slots = list(min = "numeric", max = "numeric"))
rn <- new("RangedNumeric", 1:10, min = 1, max = 10)
rn@min
rn@.Data

Since R is an interactive programming language, it's possible to create new classes or redefine existing classes at any time. This can be a problem when you're interactively experimenting with S4. If you modify a class, make sure you also recreate any objects of that class, otherwise you'll end up with invalid objects.
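The optional validity method and prototype mentioned earlier in this section can be sketched as follows. The Interval class and its slots are hypothetical.

```r
library(methods)

# Hypothetical class: a closed interval with default bounds 0 and 1
setClass("Interval",
  slots = list(lo = "numeric", hi = "numeric"),
  prototype = list(lo = 0, hi = 1),
  validity = function(object) {
    if (length(object@lo) != 1 || length(object@hi) != 1) {
      return("lo and hi must each have length 1")
    }
    if (object@lo > object@hi) return("lo must not exceed hi")
    TRUE
  }
)

iv <- new("Interval")     # slots take their prototype values
iv@hi                     # 1

# new() runs the validity method when slot values are supplied
bad <- tryCatch(new("Interval", lo = 2, hi = 1), error = function(e) "rejected")
bad                       # "rejected"

# Direct slot assignment skips the check; validObject() re-runs it on demand
iv@lo <- 5
ok <- tryCatch(validObject(iv), error = function(e) FALSE)
ok                        # FALSE
```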

Creating new methods and generics

S4 provides special functions for creating new generics and methods. setGeneric() creates a new generic or converts an existing function into a generic. setMethod() takes the name of the generic, the classes the method should be associated with, and a function that implements the method. For example, we could take union(), which usually just works on vectors, and make it work with data frames:

setGeneric("union")
setMethod("union",
  c(x = "data.frame", y = "data.frame"),
  function(x, y) {
    unique(rbind(x, y))
  }
)

If you create a new generic from scratch, you need to supply a function that calls standardGeneric():

setGeneric("myGeneric", function(x) {
  standardGeneric("myGeneric")
})

standardGeneric() is the S4 equivalent to UseMethod().

Method dispatch

If an S4 generic dispatches on a single class with a single parent, then S4 method dispatch is the same as S3 dispatch. The main difference is how you set up default values: S4 uses the special class ANY to match any class and "missing" to match a missing argument. Like S3, S4 also has group generics, documented in ?S4groupGeneric, and a way to call the "parent" method, callNextMethod().

Method dispatch becomes considerably more complicated if you dispatch on multiple arguments, or if your classes use multiple inheritance. The rules are described in ?Methods, but they are complicated and it's difficult to predict which method will be called. For this reason, I strongly recommend avoiding multiple inheritance and multiple dispatch unless absolutely necessary.
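For illustration only, here is a sketch of a hypothetical generic, combine(), that dispatches on two arguments and uses the special classes ANY and "missing". The signatures are kept deliberately unambiguous.

```r
library(methods)

# Hypothetical generic dispatching on two arguments
setGeneric("combine", function(x, y) standardGeneric("combine"))

setMethod("combine", signature("numeric", "numeric"),
  function(x, y) x + y)
setMethod("combine", signature("character", "ANY"),
  function(x, y) paste(x, y))
setMethod("combine", signature("numeric", "missing"),
  function(x, y) x)

combine(1, 2)      # 3
combine("a", 1)    # "a 1"
combine(5)         # 5: y is missing, so the "missing" method is chosen
```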

Finally, there are two functions that find which method gets called given the specification of a generic call:

# From methods: takes generic name and class names
selectMethod("nobs", list("mle"))

# From pryr: takes an unevaluated function call
method_from_call(nobs(fit))

Exercises

  1. Which S4 generic has the most methods defined for it? Which S4 class has the most methods associated with it?

  2. What happens if you define a new S4 class that doesn't "contain" an existing class? (Hint: read about virtual classes in ?Classes.)

  3. What happens if you pass an S4 object to an S3 generic? What happens if you pass an S3 object to an S4 generic? (Hint: read ?setOldClass for the second case.)

RC

Reference classes (or RC for short) are the newest OO system in R. They were introduced in version 2.12. They are fundamentally different to S3 and S4 because:

  • RC methods belong to objects, not functions

  • RC objects are mutable: the usual R copy-on-modify semantics do not apply

These properties make RC objects behave more like objects do in most other programming languages, e.g., Python, Ruby, Java, and C#. Reference classes are implemented using R code: they are a special S4 class that wraps around an environment.

Defining classes and creating objects

Since there aren't any reference classes provided by the base R packages, we'll start by creating one. RC classes are best used for describing stateful objects, objects that change over time, so we'll create a simple class to model a bank account.

Creating a new RC class is similar to creating a new S4 class, but you use setRefClass() instead of setClass(). The first, and only required argument, is an alphanumeric name. While you can use new() to create new RC objects, it's good style to use the object returned by setRefClass() to generate new objects. (You can also do that with S4 classes, but it's less common.)

Account <- setRefClass("Account")
Account$new()

setRefClass() also accepts a list of name-class pairs that define class fields (equivalent to S4 slots). Additional named arguments passed to new() will set initial values of the fields. You can get and set field values with $:

Account <- setRefClass("Account",
  fields = list(balance = "numeric"))

a <- Account$new(balance = 100)
a$balance
a$balance <- 200
a$balance

Instead of supplying a class name for the field, you can provide a single argument function which will act as an accessor method. This allows you to add custom behaviour when getting or setting a field. See ?setRefClass for more details.
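A sketch of such an accessor function, under the assumption that it is called with no argument when the field is read and with the new value when it is assigned. The Thermo class and its fields are hypothetical.

```r
library(methods)

# Hypothetical class: celsius is stored, fahrenheit is computed on access
Thermo <- setRefClass("Thermo",
  fields = list(
    celsius = "numeric",
    fahrenheit = function(value) {
      if (missing(value)) {
        celsius * 9 / 5 + 32                 # called as a getter
      } else {
        celsius <<- (value - 32) * 5 / 9     # called as a setter
      }
    }
  )
)

th <- Thermo$new(celsius = 100)
th$fahrenheit          # 212
th$fahrenheit <- 32
th$celsius             # 0
```

?setRefClass notes that accessor functions blur the idea of fields as plain data, so they are best kept for cases like this where a derived value must stay in sync.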

Note that RC objects are mutable, i.e., they have reference semantics, and are not copied-on-modify:

b <- a
b$balance
a$balance <- 0
b$balance

For this reason, RC objects come with a copy() method that allows you to make a copy of the object:

c <- a$copy()
c$balance
a$balance <- 100
c$balance

An object is not very useful without some behaviour defined by methods. RC methods are associated with a class and can modify its fields in place. In the following example, note that you access the value of fields with their name, and modify them with <<-. You'll learn more about <<- in Environments.

Account <- setRefClass("Account",
  fields = list(balance = "numeric"),
  methods = list(
    withdraw = function(x) {
      balance <<- balance - x
    },
    deposit = function(x) {
      balance <<- balance + x
    }
  )
)

You call an RC method in the same way as you access a field:

a <- Account$new(balance = 100)
a$deposit(100)
a$balance

The final important argument to setRefClass() is contains. This is the name of the parent RC class to inherit behaviour from. The following example creates a new type of bank account that returns an error preventing the balance from going below 0.

NoOverdraft <- setRefClass("NoOverdraft",
  contains = "Account",
  methods = list(
    withdraw = function(x) {
      if (balance < x) stop("Not enough money")
      balance <<- balance - x
    }
  )
)
accountJohn <- NoOverdraft$new(balance = 100)
accountJohn$deposit(50)
accountJohn$balance
accountJohn$withdraw(200)

All reference classes eventually inherit from envRefClass. It provides useful methods like copy() (shown above), callSuper() (to call the parent method), field() (to get the value of a field given its name), export() (equivalent to as()), and show() (overridden to control printing). See the inheritance section in ?setRefClass for more details.

Recognising objects and methods

You can recognise RC objects because they are S4 objects (isS4(x)) that inherit from "refClass" (is(x, "refClass")). pryr::otype() will return "RC". RC methods are also S4 objects, with class refMethodDef.

Method dispatch

Method dispatch is very simple in RC because methods are associated with classes, not functions. When you call x$f(), R will look for a method f in the class of x, then in its parent, then its parent's parent, and so on. From within a method, you can call the parent method directly with callSuper(...).

Exercises

  1. Use a field function to prevent the account balance from being directly manipulated. (Hint: create a "hidden" .balance field, and read the help for the fields argument in setRefClass().)

  2. I claimed that there aren't any RC classes in base R, but that was a bit of a simplification. Use getClasses() and find which classes extend() from envRefClass. What are the classes used for? (Hint: recall how to look up the documentation for a class.)

Picking a system

Three OO systems is a lot for one language, but for most R programming, S3 suffices. In R you usually create fairly simple objects and methods for pre-existing generic functions like print(), summary(), and plot(). S3 is well suited to this task, and the majority of OO code that I have written in R is S3. S3 is a little quirky, but it gets the job done with a minimum of code.

# Not run: code for counting the generics and classes defined by Matrix
packageVersion("Matrix")

library(Matrix)
gs <- getGenerics("package:Matrix")
sum(gs@package == "Matrix")

length(getClasses("package:Matrix", FALSE))

If you are creating more complicated systems of interrelated objects, S4 may be more appropriate. A good example is the Matrix package by Douglas Bates and Martin Maechler. It is designed to efficiently store and compute with many different types of sparse matrices. As of version 1.1.3, it defines 102 classes and 20 generic functions. The package is well written and well commented, and the accompanying vignette (vignette("Intro2Matrix", package = "Matrix")) gives a good overview of the structure of the package. S4 is also used extensively by Bioconductor packages, which need to model complicated interrelationships between biological objects. Bioconductor provides many good resources for learning S4. If you've mastered S3, S4 is relatively easy to pick up; the ideas are all the same, it is just more formal, more strict, and more verbose.

If you've programmed in a mainstream OO language, RC will seem very natural. But because RC objects can introduce side effects through mutable state, they are harder to understand. For example, when you call f(a, b) in R you can usually assume that a and b will not be modified. But if a and b are RC objects, they might be modified in place. Generally, when using RC objects you want to minimise side effects as much as possible, and use them only where mutable state is absolutely required. The majority of functions should still be "functional", and free of side effects. This makes code easier to reason about and easier for other R programmers to understand.

Quiz answers

  1. To determine the OO system of an object, you use a process of elimination. If !is.object(x), it's a base object. If !isS4(x), it's S3. If !is(x, "refClass"), it's S4; otherwise it's RC.

  2. Use typeof() to determine the base type of an object.

  3. A generic function calls specific methods depending on the class of its inputs. In S3 and S4 object systems, methods belong to generic functions, not classes like in other programming languages.

  4. S4 is more formal than S3, and supports multiple inheritance and multiple dispatch. RC objects have reference semantics, and methods belong to classes, not functions.
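The process of elimination in answer 1 can be sketched in base R. oo_system() is a hypothetical stand-in for pryr::otype(), and the Point and Counter classes exist only to provide test objects.

```r
library(methods)

# Hypothetical helper mirroring answer 1
oo_system <- function(x) {
  if (!is.object(x)) return("base")
  if (!isS4(x)) return("S3")
  if (!is(x, "refClass")) return("S4")
  "RC"
}

setClass("Point", slots = list(x = "numeric"))
Counter <- setRefClass("Counter", fields = list(n = "numeric"))

oo_system(1:10)                   # "base"
oo_system(data.frame(a = 1))      # "S3"
oo_system(new("Point", x = 1))    # "S4"
oo_system(Counter$new(n = 0))     # "RC"
```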
