What’s a dgCMatrix object product of? (sparse matrix format in R)

I’ve been working with sparse matrices in R not too long ago (these created utilizing Matrix::Matrix with the choice sparse=TRUE) and located it tough to trace down documentation about what the slots within the matrix object are. This publish describes the slots in a category dgCMatrix object.

(Click on right here for full documentation of the Matrix package deal (and it’s a lot–like, 215 pages lots).)

Background

It seems that there’s some documentation on dgCMatrix objects throughout the Matrix package deal. One can entry it utilizing the next code:

library(Matrix)
?`dgCMatrix-class`

In accordance with the documentation, the dgCMatrix class

…is a category of sparse numeric matrices within the compressed, sparse, column-oriented format. On this implementation the non-zero parts within the columns are sorted into rising row order. dgCMatrix is the “customary” class for sparse numeric matrices within the Matrix package deal.

An instance

We’ll use a small matrix as a working instance on this publish:

library(Matrix)
M <- Matrix(c(0, 0, 0, 2, 6, 0, -1, 5, 0, 4, 3, 0, 0, 0, 5, 0), byrow = TRUE, nrow = 4, sparse = TRUE)
rownames(M) <- paste0("r", 1:4)
colnames(M) <- paste0("c", 1:4)
M
# Four x Four sparse Matrix of sophistication "dgCMatrix"
# c1 c2 c3 c4
# r1 . . . 2
# r2 6 . -1 5
# r3 . Four 3 .
# r4 . . 5 .

Working str on x tells us that the dgCMatrix object has 6 slots. (To be taught extra about slots and S4 objects, see this part from Hadley Wickham’s Superior R.)

str(M)
# Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
# [email protected] i : int [1:7] 1 2 1 2 Three Zero 1
# [email protected] p : int [1:5] Zero 1 2 5 7
# [email protected] Dim : int [1:2] Four 4
# [email protected] Dimnames:Listing of two
# .. ..$ : chr [1:4] "r1" "r2" "r3" "r4"
# .. ..$ : chr [1:4] "c1" "c2" "c3" "c4"
# [email protected] x : num [1:7] 6 4 -1 Three 5 2 5
# [email protected] components : record()

x, i and p

If a matrix M has nn non-zero entries, then its x slot is a vector of size nn containing all of the non-zero values within the matrix. The non-zero parts in column 1 are listed first (ranging from the highest and ending on the backside), adopted by column 2, Three and so forth.

M
# Four x Four sparse Matrix of sophistication "dgCMatrix"
# c1 c2 c3 c4
# r1 . . . 2
# r2 6 . -1 5
# r3 . Four 3 .
# r4 . . 5 . [email protected]
# [1] 6 4 -1 Three 5 2 5 as.numeric(M)[as.numeric(M) != 0]
# [1] 6 4 -1 Three 5 2 5

The i slot is a vector of size nn. The okth factor of [email protected] is the row index of the okth non-zero factor (as listed in [email protected]). One huge factor to notice right here is that the primary row has index ZERO, in contrast to R’s indexing conference. In our instance, the primary non-zero entry, 6, is within the second row, i.e. row index 1, so the primary entry of [email protected] is 1.

M
# Four x Four sparse Matrix of sophistication "dgCMatrix"
# c1 c2 c3 c4
# r1 . . . 2
# r2 6 . -1 5
# r3 . Four 3 .
# r4 . . 5 . [email protected]
# [1] 1 2 1 2 Three Zero 1

If the matrix has nvars columns, then the p slot is a vector of size nvars + 1. If we index the columns such that the primary column has index ZERO, then [email protected][1] = 0, and [email protected][j+2] - [email protected][j+1] offers us the variety of non-zero parts in column j.

In our instance, when j = 2, [email protected][2+2] - [email protected][2+1] = 5 - 2 = 3, so there are Three non-zero parts in column index 2 (i.e. the third column).

M
# Four x Four sparse Matrix of sophistication "dgCMatrix"
# c1 c2 c3 c4
# r1 . . . 2
# r2 6 . -1 5
# r3 . Four 3 .
# r4 . . 5 . [email protected]
# [1] Zero 1 2 5 7

With the x, i and p slots, one can reconstruct the entries of the matrix.

Dim and Dimnames

These two slots are pretty apparent. Dim is a vector of size 2, with the primary and second entries denoting the variety of rows and columns the matrix has respectively. Dimnames is a listing of size 2: the primary factor being a vector of row names (if current) and the second being a vector of column names (if current).

components

This slot might be probably the most uncommon of the lot, and its documentation was a bit tough to trace down. From the CRAN documentation, it seems like components is

… [an] Object of sophistication “record” – a listing of factorizations of the matrix. Observe that that is sometimes empty, i.e., record(), initially and is up to date automagically every time a matrix factorization is
computed.

My understanding is that if we carry out any matrix factorizations or decompositions on a dgCMatrix object, it shops the factorization underneath components in order that if requested for the factorization once more, it will possibly return the cached worth as an alternative of recomputing the factorization. Right here is an instance:

[email protected]
# record() Mlu <- lu(M) # carry out triangular decomposition
str([email protected])
# Listing of 1
# $ LU:Formal class 'sparseLU' [package "Matrix"] with 5 slots
# .. [email protected] L :Formal class 'dtCMatrix' [package "Matrix"] with 7 slots
# .. .. .. [email protected] i : int [1:4] Zero 1 2 3
# .. .. .. [email protected] p : int [1:5] Zero 1 2 Three 4
# .. .. .. [email protected] Dim : int [1:2] Four 4
# .. .. .. [email protected] Dimnames:Listing of two
# .. .. .. .. ..$ : chr [1:4] "r2" "r3" "r4" "r1"
# .. .. .. .. ..$ : NULL
# .. .. .. [email protected] x : num [1:4] 1 1 1 1
# .. .. .. [email protected] uplo : chr "U"
# .. .. .. [email protected] diag : chr "N"
# .. [email protected] U :Formal class 'dtCMatrix' [package "Matrix"] with 7 slots
# .. .. .. [email protected] i : int [1:7] Zero 1 Zero 1 2 Zero 3
# .. .. .. [email protected] p : int [1:5] Zero 1 2 5 7
# .. .. .. [email protected] Dim : int [1:2] Four 4
# .. .. .. [email protected] Dimnames:Listing of two
# .. .. .. .. ..$ : NULL
# .. .. .. .. ..$ : chr [1:4] "c1" "c2" "c3" "c4"
# .. .. .. [email protected] x : num [1:7] 6 4 -1 Three 5 5 2
# .. .. .. [email protected] uplo : chr "U"
# .. .. .. [email protected] diag : chr "N"
# .. [email protected] p : int [1:4] 1 2 Three 0
# .. [email protected] q : int [1:4] Zero 1 2 3
# .. [email protected] Dim: int [1:2] Four 4

Right here is an instance which exhibits that the decomposition is barely carried out as soon as:

set.seed(1)
M <- runif(9e6)
M[sample.int(9e6, size = 8e6)] <- 0
M <- Matrix(M, nrow = 3e3, sparse = TRUE) system.time(lu(M))
# person system elapsed
# 13.527 0.161 13.701 system.time(lu(M))
# person system elapsed
# Zero Zero 0



Should you obtained this far, why not subscribe for updates from the positioning? Select your taste: e-mail, twitter, RSS, or fb

Leave a Reply

Your email address will not be published. Required fields are marked *