I’ve been working with sparse matrices in R not too long ago (these created utilizing `Matrix::Matrix`

with the choice `sparse=TRUE`

) and located it tough to trace down documentation about what the slots within the matrix object are. This publish describes the slots in a category `dgCMatrix`

object.

(Click on right here for full documentation of the `Matrix`

package deal (and it’s a lot–like, 215 pages lots).)

**Background**

It seems that there’s some documentation on `dgCMatrix`

objects throughout the `Matrix`

package deal. One can entry it utilizing the next code:

library(Matrix) ?`dgCMatrix-class`

In accordance with the documentation, the `dgCMatrix`

class

…is a category of sparse numeric matrices within the compressed, sparse, column-oriented format. On this implementation the non-zero parts within the columns are sorted into rising row order.

`dgCMatrix`

is the “customary” class for sparse numeric matrices within the`Matrix`

package deal.

**An instance**

We’ll use a small matrix as a working instance on this publish:

library(Matrix) M <- Matrix(c(0, 0, 0, 2, 6, 0, -1, 5, 0, 4, 3, 0, 0, 0, 5, 0), byrow = TRUE, nrow = 4, sparse = TRUE) rownames(M) <- paste0("r", 1:4) colnames(M) <- paste0("c", 1:4) M # Four x Four sparse Matrix of sophistication "dgCMatrix" # c1 c2 c3 c4 # r1 . . . 2 # r2 6 . -1 5 # r3 . Four 3 . # r4 . . 5 .

Working `str`

on `x`

tells us that the `dgCMatrix`

object has 6 slots. (To be taught extra about slots and S4 objects, see this part from Hadley Wickham’s *Superior R*.)

str(M) # Formal class 'dgCMatrix' [package "Matrix"] with 6 slots # [email protected] i : int [1:7] 1 2 1 2 Three Zero 1 # [email protected] p : int [1:5] Zero 1 2 5 7 # [email protected] Dim : int [1:2] Four 4 # [email protected] Dimnames:Listing of two # .. ..$ : chr [1:4] "r1" "r2" "r3" "r4" # .. ..$ : chr [1:4] "c1" "c2" "c3" "c4" # [email protected] x : num [1:7] 6 4 -1 Three 5 2 5 # [email protected] components : record()

`x`

, `i`

and `p`

If a matrix `M`

has `nn`

non-zero entries, then its `x`

slot is a vector of size `nn`

containing all of the non-zero values within the matrix. The non-zero parts in column 1 are listed first (ranging from the highest and ending on the backside), adopted by column 2, Three and so forth.

M # Four x Four sparse Matrix of sophistication "dgCMatrix" # c1 c2 c3 c4 # r1 . . . 2 # r2 6 . -1 5 # r3 . Four 3 . # r4 . . 5 . [email protected] # [1] 6 4 -1 Three 5 2 5 as.numeric(M)[as.numeric(M) != 0] # [1] 6 4 -1 Three 5 2 5

The `i`

slot is a vector of size `nn`

. The `ok`

th factor of `[email protected]`

is the row index of the `ok`

th non-zero factor (as listed in `[email protected]`

). * One huge factor to notice right here is that the primary row has index ZERO, in contrast to R’s indexing conference.* In our instance, the primary non-zero entry, 6, is within the second row, i.e. row index 1, so the primary entry of

`[email protected]`

is 1.M # Four x Four sparse Matrix of sophistication "dgCMatrix" # c1 c2 c3 c4 # r1 . . . 2 # r2 6 . -1 5 # r3 . Four 3 . # r4 . . 5 . [email protected] # [1] 1 2 1 2 Three Zero 1

If the matrix has `nvars`

columns, then the `p`

slot is a vector of size `nvars + 1`

. * If we index the columns such that the primary column has index ZERO,* then

`[email protected][1] = 0`

, and `[email protected][j+2] - [email protected][j+1]`

offers us the variety of non-zero parts in column `j`

.In our instance, when `j = 2`

, `[email protected][2+2] - [email protected][2+1] = 5 - 2 = 3`

, so there are Three non-zero parts in column index 2 (i.e. the third column).

M # Four x Four sparse Matrix of sophistication "dgCMatrix" # c1 c2 c3 c4 # r1 . . . 2 # r2 6 . -1 5 # r3 . Four 3 . # r4 . . 5 . [email protected] # [1] Zero 1 2 5 7

With the `x`

, `i`

and `p`

slots, one can reconstruct the entries of the matrix.

`Dim`

and `Dimnames`

These two slots are pretty apparent. `Dim`

is a vector of size 2, with the primary and second entries denoting the variety of rows and columns the matrix has respectively. `Dimnames`

is a listing of size 2: the primary factor being a vector of row names (if current) and the second being a vector of column names (if current).

`components`

This slot might be probably the most uncommon of the lot, and its documentation was a bit tough to trace down. From the CRAN documentation, it seems like `components`

is

… [an] Object of sophistication “record” – a listing of factorizations of the matrix. Observe that that is sometimes empty, i.e.,

`record()`

, initially and is up to dateevery time a matrix factorization isautomagically

computed.

My understanding is that if we carry out any matrix factorizations or decompositions on a `dgCMatrix`

object, it shops the factorization underneath `components`

in order that if requested for the factorization once more, it will possibly return the cached worth as an alternative of recomputing the factorization. Right here is an instance:

[email protected] # record() Mlu <- lu(M) # carry out triangular decomposition str([email protected]) # Listing of 1 # $ LU:Formal class 'sparseLU' [package "Matrix"] with 5 slots # .. [email protected] L :Formal class 'dtCMatrix' [package "Matrix"] with 7 slots # .. .. .. [email protected] i : int [1:4] Zero 1 2 3 # .. .. .. [email protected] p : int [1:5] Zero 1 2 Three 4 # .. .. .. [email protected] Dim : int [1:2] Four 4 # .. .. .. [email protected] Dimnames:Listing of two # .. .. .. .. ..$ : chr [1:4] "r2" "r3" "r4" "r1" # .. .. .. .. ..$ : NULL # .. .. .. [email protected] x : num [1:4] 1 1 1 1 # .. .. .. [email protected] uplo : chr "U" # .. .. .. [email protected] diag : chr "N" # .. [email protected] U :Formal class 'dtCMatrix' [package "Matrix"] with 7 slots # .. .. .. [email protected] i : int [1:7] Zero 1 Zero 1 2 Zero 3 # .. .. .. [email protected] p : int [1:5] Zero 1 2 5 7 # .. .. .. [email protected] Dim : int [1:2] Four 4 # .. .. .. [email protected] Dimnames:Listing of two # .. .. .. .. ..$ : NULL # .. .. .. .. ..$ : chr [1:4] "c1" "c2" "c3" "c4" # .. .. .. [email protected] x : num [1:7] 6 4 -1 Three 5 5 2 # .. .. .. [email protected] uplo : chr "U" # .. .. .. [email protected] diag : chr "N" # .. [email protected] p : int [1:4] 1 2 Three 0 # .. [email protected] q : int [1:4] Zero 1 2 3 # .. [email protected] Dim: int [1:2] Four 4

Right here is an instance which exhibits that the decomposition is barely carried out as soon as:

set.seed(1) M <- runif(9e6) M[sample.int(9e6, size = 8e6)] <- 0 M <- Matrix(M, nrow = 3e3, sparse = TRUE) system.time(lu(M)) # person system elapsed # 13.527 0.161 13.701 system.time(lu(M)) # person system elapsed # Zero Zero 0