Translated by AI
The State of Japanese Morphological Analysis in R (February 2025)
About this article
RMeCab::docDF() is a convenient function that allows for various ways of counting words in a text within a single call. On the other hand, it can be a tricky function to handle; if used carelessly, processing can take an unusually long time, or the entire R session might crash depending on MeCab's behavior.
In this article, I will first explain why docDF() takes an unusually long time when used carelessly, and then show how such situations can be avoided by setting minFreq to a sufficiently large value. Furthermore, I will explain that docDF() was not designed for use cases where you want to experiment with thresholds for filtering the extracted words; in such scenarios, other morphological analysis tools such as gibasa are a better choice in terms of analysis speed.
The problem of return values being too large
To begin with, let's look at why RMeCab::docDF() takes an unusually long time when used carelessly. I will load the livedoor news corpus as an example of a Japanese corpus that is likely to take a considerable amount of time when processed naively with docDF().
This corpus can be easily loaded using the ldccr package as follows:
dat <-
ldccr::read_ldnws(exdir = tempdir(), include_title = FALSE) |>
dplyr::mutate(
doc_id = ldccr::sqids(),
title = stringi::stri_trans_nfkc(title),
text = stringi::stri_trans_nfkc(body)
) |>
dplyr::select(doc_id, category, title, text)
#> Parsing dokujo-tsushin...
#> Parsing it-life-hack...
#> Parsing kaden-channel...
#> Parsing livedoor-homme...
#> Parsing movie-enter...
#> Parsing peachy...
#> Parsing smax...
#> Parsing sports-watch...
#> Parsing topic-news...
#> Done.
dat
#> # A tibble: 7,367 × 4
#> doc_id category title text
#> <chr> <fct> <chr> <chr>
#> 1 se8fMjYzCtd dokujo-tsushin 友人代表のスピーチ、独女はどうこなしている? " もう…
#> 2 qT3fn1M59E0 dokujo-tsushin ネットで断ち切れない元カレとの縁 " 携帯…
#> 3 GKUXVcgT98l dokujo-tsushin 相次ぐ芸能人の“すっぴん”披露 その時、独女の心境は?…… " 「男…
#> 4 W40lYr546YF dokujo-tsushin ムダな抵抗!? 加齢の現実 " ヒッ…
#> 5 fk9LpWmkdq5 dokujo-tsushin 税金を払うのは私たちなんですけど! " 6月…
#> 6 IPvFoBR26YM dokujo-tsushin 読んでみる?描いてみる?大人の女性を癒す絵本の魅力…… " 書店…
#> 7 V0XqBmtTUuG dokujo-tsushin 大人になっても解決しない「お昼休み」という問題…… " 昨年…
#> 8 RneKxNiST74 dokujo-tsushin 結婚しても働くのはなぜ? 既婚女性のつぶやき " 「彼…
#> 9 dv4RWw390v5 dokujo-tsushin お肌に優しいから安心 紫外線が気になる独女の夏の対策とは?…… " これ…
#> 10 5bQL9ZfJoA5 dokujo-tsushin 初回デートで婚カツ女子がゲンメツする行為って?…… " 合コ…
#> # ℹ 7,357 more rows
This corpus is originally intended for testing a 9-category document classification task and contains a total of 7,367 blog articles. The text volume for most articles is around 1,000 characters.
library(ggplot2)
dat |>
dplyr::mutate(n_char = stringr::str_count(text)) |>
ggplot(aes(x = n_char, color = category)) +
geom_density() +
scale_x_log10() +
theme_light()

The memory size (as an R object) of the data frame loaded from this corpus is about 25MB. Of course, actually reading every one of these articles would take considerable effort, so in that sense it is by no means a small amount of text. However, it fits easily in memory, so it is hardly "large-scale."
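As a quick check (a sketch using lobstr, which this article also relies on later), you can measure the footprint of the loaded data frame yourself:
```r
# Measure the in-memory size of the corpus data frame loaded above.
# lobstr::obj_size() counts shared components only once, so for tibbles
# it gives a more realistic figure than utils::object.size().
lobstr::obj_size(dat)
#> about 25 MB, per the figure quoted above
```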
In fact, for a collection of texts of this size, simply performing morphological analysis can be done in a few seconds using gibasa in my local environment.
microbenchmark::microbenchmark(
gibasa = {
toks <- gibasa::tokenize(dat, text, doc_id)
},
times = 1
)
#> Unit: seconds
#> expr min lq mean median uq max neval
#> gibasa 4.762919 4.762919 4.762919 4.762919 4.762919 4.762919 1
toks
#> # A tibble: 4,767,480 × 7
#> doc_id category title sentence_id token_id token feature
#> <fct> <fct> <chr> <int> <int> <chr> <chr>
#> 1 se8fMjYzCtd dokujo-tsushin 友人代表のスピーチ、独女はど… 1 1 もうすぐ… 副詞,一般,…
#> 2 se8fMjYzCtd dokujo-tsushin 友人代表のスピーチ、独女はど… 1 2 ジューン… 名詞,固有名…
#> 3 se8fMjYzCtd dokujo-tsushin 友人代表のスピーチ、独女はど… 1 3 ・ 記号,一般,…
#> 4 se8fMjYzCtd dokujo-tsushin 友人代表のスピーチ、独女はど… 1 4 ブライド… 名詞,一般,…
#> 5 se8fMjYzCtd dokujo-tsushin 友人代表のスピーチ、独女はど… 1 5 と 助詞,格助詞…
#> 6 se8fMjYzCtd dokujo-tsushin 友人代表のスピーチ、独女はど… 1 6 呼ば 動詞,自立,…
#> 7 se8fMjYzCtd dokujo-tsushin 友人代表のスピーチ、独女はど… 1 7 れる 動詞,接尾,…
#> 8 se8fMjYzCtd dokujo-tsushin 友人代表のスピーチ、独女はど… 1 8 6 名詞,数,*…
#> 9 se8fMjYzCtd dokujo-tsushin 友人代表のスピーチ、独女はど… 1 9 月 名詞,一般,…
#> 10 se8fMjYzCtd dokujo-tsushin 友人代表のスピーチ、独女はど… 1 10 。 記号,句点,…
#> # ℹ 4,767,470 more rows
Analyzed with the IPA dictionary, the corpus yields 4,767,480 tokens in total. Let's count them and build a document-term matrix (DTM).
dtm <- toks |>
dplyr::count(doc_id, token) |>
tidytext::cast_sparse(doc_id, token, n)
dim(dtm)
#> [1] 7367 74318
lobstr::obj_sizes(dtm)
#> 28.84 MB
The resulting matrix is 7,367 documents by 74,318 terms. When held as a sparse matrix object, the number of numeric values stored as data is the same as in long-format count data. The size of this document-term matrix in memory is about 28MB, which is also quite small.
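That claim can be verified directly (a sketch assuming the `toks` and `dtm` objects created above): a `dgCMatrix` stores one value per nonzero cell in its `x` slot, which should match the number of rows in the long-format counts, i.e., the number of distinct (doc_id, token) pairs.
```r
# Number of values physically stored in the sparse matrix:
n_nonzero <- length(dtm@x)
# Number of rows in the long-format count data (one per doc-token pair):
n_pairs <- nrow(dplyr::count(toks, doc_id, token))
n_nonzero == n_pairs
#> TRUE
```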
On the other hand, holding a 7,000 document × 70,000 term matrix as a standard dense object would by itself require 3.92GB (7,000 × 70,000 doubles at 8 bytes each).
lobstr::obj_sizes(numeric(7000 * 70000))
#> 3.92 GB
gc()
#> used (Mb) gc trigger (Mb) max used (Mb)
#> Ncells 2520124 134.6 7433249 397.0 9291561 496.3
#> Vcells 38322278 292.4 508863983 3882.4 528345638 4031.0
Incidentally, in R, trying to hold the same number of elements as a data frame generally consumes even more memory than a matrix. A data frame of approximately 7,000 rows by 70,000 columns might fit in memory, but it's not something you'd want to create if you can help it.
lobstr::obj_sizes(
matrix(0, nrow = 100, ncol = 1000),
# This way is not ideal
as.data.frame(matrix(0, nrow = 100, ncol = 1000)),
# Since R data frames are column-oriented, keeping the long dimension in rows (i.e., fewer columns) reduces per-column overhead
as.data.frame(t(matrix(0, nrow = 100, ncol = 1000)))
)
#> * 800.22 kB
#> * 920.61 kB
#> * 806.72 kB
gc()
#> used (Mb) gc trigger (Mb) max used (Mb)
#> Ncells 2520173 134.6 7433249 397.0 9291561 496.3
#> Vcells 38322403 292.4 407091187 3105.9 528345638 4031.0
RMeCab::docDF() attempts to return a term-document matrix as a data frame. If used carelessly with this data, it would try to create a data frame of a scale like 70,000 × 7,000 (which should be at least about 3.92GB!). This is the main reason why docDF() becomes abnormally slow when used naively. When this happens while using docDF(), you might see an error message after a long wait, such as "cannot allocate vector of size xx GB."
docDF is fast if used correctly
So, does this mean RMeCab::docDF() is useless? Not at all; in fact, it is quite fast as a function for counting words. However, because it takes time to convert the results of word counting on the C++ side back into an R data frame, it tends to stop working properly as the number of extracted terms increases. Therefore, if you set minFreq to a sufficiently large value relative to the amount of text being analyzed, it can usually be run quickly.
For example, let's say we prepare a nice wrapper like the following, which formats and returns the results of docDF() in a tidy way.
docdf_rmecab <- function(dat,
text_field, docid_field,
minFreq = floor(sqrt(nrow(dat))) * 2,
count_genkei = FALSE) {
text_field <- rlang::enquo(text_field)
docid_field <- rlang::enquo(docid_field)
# if docid is a factor, preserve ordering
col_names <- rlang::as_name(docid_field)
if (is.factor(dat[[col_names]])) {
col_u <- levels(dat[[col_names]])
} else {
col_u <- unique(dat[[col_names]])
}
pos_text <- tidyselect::eval_select(text_field, dat)
rmecab_res <-
RMeCab::docDF(dat,
column = pos_text, minFreq = minFreq,
type = 1, Genkei = as.numeric(!count_genkei), weight = "tf*idf"
)
tidyr::pivot_longer(
rmecab_res,
cols = starts_with("Row"),
names_to = "doc_id",
values_to = "tf_idf",
names_transform = \(.x) {
stringr::str_remove(.x, "Row")
},
values_transform = list(tf_idf = \(.x) {
ifelse(.x == 0, NA_real_, .x)
}),
values_drop_na = TRUE
) |>
dplyr::arrange(as.integer(doc_id)) |>
dplyr::mutate(
doc_id = as.integer(doc_id),
doc_id = factor(doc_id, labels = col_u[unique(doc_id)]),
token = TERM,
POS1 = dplyr::if_else(POS1 == "*", NA_character_, POS1),
POS2 = dplyr::if_else(POS2 == "*", NA_character_, POS2)
) |>
dplyr::distinct(doc_id, token, POS1, POS2, tf_idf)
}
docdf_rmecab(dat[1:5, ], text, doc_id) |>
dplyr::filter(token %in% c("独", "女"))
#> number of extracted terms = 168
#> now making a data frame. wait a while!
#> # A tibble: 8 × 5
#> doc_id token POS1 POS2 tf_idf
#> <fct> <chr> <chr> <chr> <dbl>
#> 1 se8fMjYzCtd 女 名詞 一般 5.29
#> 2 se8fMjYzCtd 独 名詞 固有名詞 5.29
#> 3 qT3fn1M59E0 女 名詞 一般 2.64
#> 4 qT3fn1M59E0 独 名詞 固有名詞 2.64
#> 5 GKUXVcgT98l 女 名詞 一般 5.29
#> 6 GKUXVcgT98l 独 名詞 固有名詞 5.29
#> 7 fk9LpWmkdq5 女 名詞 一般 2.64
#> 8 fk9LpWmkdq5 独 名詞 固有名詞 2.64
Rewriting this with gibasa gives something like the following (because words are counted with dplyr::add_count(), the return value here preserves the original order of the words within each document).
docdf_gibasa <- function(dat,
text_field, docid_field,
minFreq = floor(sqrt(nrow(dat))) * 2) {
text_field <- rlang::enquo(text_field)
docid_field <- rlang::enquo(docid_field)
gibasa::tokenize(dat, !!text_field, !!docid_field) |>
gibasa::prettify(col_select = c("POS1", "POS2")) |>
dplyr::mutate(TERM = paste(token, POS1, POS2, sep = "/")) |>
dplyr::add_count(doc_id, TERM) |>
# minFreq is the threshold for the document frequency where the TERM appears
dplyr::filter(sum(n > 0) >= minFreq, .by = TERM) |>
# Since bind_tf_idf2 will throw an error if an entire document is removed, drop doc_ids that do not appear
dplyr::mutate(doc_id = forcats::fct_drop(doc_id)) |>
gibasa::bind_tf_idf2(TERM, doc_id, norm = FALSE) |>
dplyr::mutate(tf_idf = n * idf) |>
dplyr::distinct(doc_id, token, POS1, POS2, tf_idf)
}
docdf_gibasa(dat[1:5, ], text, doc_id) |>
dplyr::filter(token %in% c("独", "女"))
#> # A tibble: 8 × 5
#> doc_id token POS1 POS2 tf_idf
#> <fct> <chr> <chr> <chr> <dbl>
#> 1 se8fMjYzCtd 独 名詞 固有名詞 5.29
#> 2 se8fMjYzCtd 女 名詞 一般 5.29
#> 3 qT3fn1M59E0 独 名詞 固有名詞 2.64
#> 4 qT3fn1M59E0 女 名詞 一般 2.64
#> 5 GKUXVcgT98l 独 名詞 固有名詞 5.29
#> 6 GKUXVcgT98l 女 名詞 一般 5.29
#> 7 fk9LpWmkdq5 独 名詞 固有名詞 2.64
#> 8 fk9LpWmkdq5 女 名詞 一般 2.64
Let's compare these functions by passing the entire dat to them.
microbenchmark::microbenchmark(
gibasa = docdf_gibasa(dat, text, doc_id) |>
dplyr::distinct(doc_id, tf_idf) |>
dplyr::arrange(doc_id, tf_idf),
rmecab = docdf_rmecab(dat, text, doc_id) |>
dplyr::distinct(doc_id, tf_idf) |>
dplyr::arrange(doc_id, tf_idf),
times = 1,
check = "equal"
)
#> number of extracted terms = 2631
#> now making a data frame. wait a while!
#>
#> Unit: seconds
#> expr min lq mean median uq max neval
#> gibasa 28.23764 28.23764 28.23764 28.23764 28.23764 28.23764 1
#> rmecab 43.76040 43.76040 43.76040 43.76040 43.76040 43.76040 1
Using gibasa still appears to be faster here, but that is because floor(sqrt(nrow(dat))) * 2 is too small a threshold for word extraction. If minFreq is kept aggressive relative to the number of documents, so that docDF()'s return value ends up with only around 1,000 document columns, RMeCab can come out faster.
microbenchmark::microbenchmark(
gibasa = docdf_gibasa(dat[1:1000, ], text, doc_id) |>
dplyr::distinct(doc_id, tf_idf) |>
dplyr::arrange(doc_id, tf_idf),
rmecab = docdf_rmecab(dat[1:1000, ], text, doc_id) |>
dplyr::distinct(doc_id, tf_idf) |>
dplyr::arrange(doc_id, tf_idf),
times = 1,
check = "equal"
)
#> number of extracted terms = 1224
#> now making a data frame. wait a while!
#>
#> Unit: seconds
#> expr min lq mean median uq max neval
#> gibasa 4.440494 4.440494 4.440494 4.440494 4.440494 4.440494 1
#> rmecab 3.735932 3.735932 3.735932 3.735932 3.735932 3.735932 1
Separating out the steps that require trial and error
However, when using RMeCab::docDF(), our goal isn't necessarily to make docDF() work well. Ultimately, we want to count words. It defeats the purpose if we concentrate so much on making docDF() run that we set minFreq too high and fail to extract the words we should be analyzing.
Filtering extracted words based on document frequency, as docDF()'s minFreq does, is a common practice, but it's often difficult to decide on a specific threshold until after the words have been extracted. Therefore, if you try to experiment with document frequency thresholds using docDF(), you would have to run docDF() multiple times while changing the minFreq value. Each time, the entire corpus would be morphologically analyzed, words counted, and the results converted into a data frame over dozens of seconds. Furthermore, if minFreq is too small, the return value might not even fit in memory.
docDF() is probably not well suited to this kind of trial and error. In scenarios requiring such experimentation, it is usually less stressful to be able to execute each step in short spans. For processes like those you would achieve with docDF(), it is therefore more convenient to implement the step of segmenting the text and counting words separately from the step of filtering words by document frequency.
In other words, it makes sense to keep this:
toks <-
gibasa::tokenize(dat, text, doc_id) |>
gibasa::prettify(col_select = c("POS1", "POS2")) |>
dplyr::mutate(token = paste(token, POS1, POS2, sep = "/")) |>
dplyr::count(doc_id, token)
Separate from this:
minFreq <- 100 ## Try various values here for experimentation
dtm <- toks |>
dplyr::filter(sum(n > 0) >= minFreq, .by = token) |>
dplyr::mutate(doc_id = forcats::fct_drop(doc_id)) |>
gibasa::bind_tf_idf2(token, doc_id) |>
dplyr::mutate(tf_idf = n * idf) |>
tidytext::cast_sparse(doc_id, token, tf_idf)
str(dtm)
#> Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
#> ..@ i : int [1:1420472] 0 9 12 15 20 26 27 28 31 38 ...
#> ..@ p : int [1:3026] 0 1353 1596 7800 13783 15080 15420 15558 17638 17899 ...
#> ..@ Dim : int [1:2] 7367 3025
#> ..@ Dimnames:List of 2
#> .. ..$ : chr [1:7367] "se8fMjYzCtd" "qT3fn1M59E0" "GKUXVcgT98l" "W40lYr546YF" ...
#> .. ..$ : chr [1:3025] "!」/名詞/サ変接続" "!』/名詞/サ変接続" "(/名詞/サ変接続" ")/名詞/サ変接続" ...
#> ..@ x : num [1:1420472] 6.89 3.44 3.44 3.44 6.89 ...
#> ..@ factors : list()
RMeCabC() is not that fast either
If you're taking this approach, since all you need is segmentation, you might think of using RMeCab::RMeCabC(). In fact, code like lapply(dat$text, \(x) unlist(RMeCab::RMeCabC(x))) is very frequently seen. However, this is not particularly fast for segmentation.
microbenchmark::microbenchmark(
gibasa = gibasa::tokenize(dat$text) |>
gibasa::prettify(col_select = c("POS1", "Original")) |>
dplyr::mutate(token = dplyr::if_else(is.na(Original), token, Original)) |>
gibasa::as_tokens() |>
unname(),
rmecab = lapply(dat$text, \(x) unlist(RMeCab::RMeCabC(x, mypref = 1))),
times = 1,
check = "equal"
)
#> Unit: seconds
#> expr min lq mean median uq max neval
#> gibasa 10.90253 10.90253 10.90253 10.90253 10.90253 10.90253 1
#> rmecab 15.49842 15.49842 15.49842 15.49842 15.49842 15.49842 1
Even though the gibasa code goes through more steps, the code using RMeCabC() is slower. To be clear, this is not because the RMeCab implementation is bad; it is mostly because making one call per element of a long vector with lapply() is slow. If you don't need part-of-speech information at all, you can do something like the following, but it is still slow.
microbenchmark::microbenchmark(
gibasa = {
toks <- gibasa::tokenize(dat$text, mode = "wakati")
unname(toks)
},
rmecab = lapply(dat$text, \(x) unlist(RMeCab::RMeCabC(x), use.names = FALSE)),
times = 1,
check = "equal"
)
#> Unit: seconds
#> expr min lq mean median uq max neval
#> gibasa 3.751153 3.751153 3.751153 3.751153 3.751153 3.751153 1
#> rmecab 14.697065 14.697065 14.697065 14.697065 14.697065 14.697065 1
str(toks[1:2])
#> List of 2
#> $ 1: chr [1:796] "もうすぐ" "ジューン" "・" "ブライド" ...
#> $ 2: chr [1:738] "携帯" "電話" "が" "普及" ...
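The overhead behind these numbers is not specific to MeCab: making one R-level call per element of a long vector is inherently slower than a single vectorized call. A toy illustration of this point (unrelated to RMeCab itself, using string lengths instead of tokenization):
```r
# Hypothetical micro-benchmark: one vectorized call vs. one R call per element.
x <- rep("日本語の形態素解析", 10000L)
microbenchmark::microbenchmark(
  vectorized  = nchar(x),                      # one call over the whole vector
  per_element = vapply(x, nchar, integer(1L)), # one R-level call per element
  times = 10
)
```
The per-element version pays R's function-call overhead ten thousand times; the same effect dominates when RMeCabC() is called once per document.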
Incidentally, RMeCab also has a function called RMeCab::RMeCabDF() that does much the same as RMeCabC() but takes a data frame as an argument. However, it is really just a wrapper that calls RMeCabC() |> unlist() in a for loop over the specified data frame column, so what happens under the hood is the same as in the processing above.
With RMeCabDF(), you can write it as follows, but the processing time will be almost the same as in the previous examples.
df_gibasa <- function(dat, text_field, docid_field) {
text_field <- rlang::enquo(text_field)
docid_field <- rlang::enquo(docid_field)
gibasa::tokenize(dat, !!text_field, !!docid_field) |>
gibasa::prettify(col_select = "POS1") |>
dplyr::select(doc_id, token, POS1)
}
df_rmecab <- function(dat, text_field, docid_field) {
text_field <- rlang::enquo(text_field)
docid_field <- rlang::enquo(docid_field)
# if docid is a factor, preserve ordering
col_names <- rlang::as_name(docid_field)
if (is.factor(dat[[col_names]])) {
col_u <- levels(dat[[col_names]])
} else {
col_u <- unique(dat[[col_names]])
}
pos_text <- tidyselect::eval_select(text_field, dat)
docid_field <- dat[[col_names]]
RMeCab::RMeCabDF(dat, pos_text) |>
rlang::as_function(~ {
sizes <- lengths(.)
ret <- unlist(.)
dplyr::tibble(
doc_id = factor(rep(docid_field, sizes), levels = col_u),
token = unname(ret),
POS1 = names(ret)
)
})()
}
microbenchmark::microbenchmark(
gibasa = df_gibasa(dat, text, doc_id),
rmecab = df_rmecab(dat, text, doc_id),
times = 1,
check = "equal"
)
#> Unit: seconds
#> expr min lq mean median uq max neval
#> gibasa 9.395185 9.395185 9.395185 9.395185 9.395185 9.395185 1
#> rmecab 16.606452 16.606452 16.606452 16.606452 16.606452 16.606452 1
Summary
You should use gibasa
Based on the above, in practical scenarios where RMeCab::docDF() feels slow, I think it is better to use gibasa instead of RMeCab. For detailed usage of gibasa, please refer to the following:
- An Alternative Rcpp Wrapper of MeCab • gibasa
- Preprocessing for Japanese Text Mining with R and MeCab
Note that, while I have not compared them explicitly here, several other R packages are available for Japanese segmentation and tokenization. With the exception of RcppJagger, which I will introduce below, all of them are significantly slower than RMeCab when viewed purely as a means of segmentation. For example, spacyr can likely be made to work with some effort, but if you are going to drive spaCy from R via reticulate, you might as well use Python from the start, so I don't particularly recommend it.
If you're still not satisfied with gibasa
Since gibasa uses RcppParallel as the backend for multi-threaded processing on the C++ side, there is a risk of running into mysterious bugs on hardware that is incompatible with oneTBB. You should be able to switch the backend with something like Sys.setenv(RCPP_PARALLEL_BACKEND = "tinythread"), but I have not verified whether that actually makes it usable.
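If you do want to try the tinythread backend (untested here, as noted above), the environment variable presumably needs to be set before RcppParallel initializes its thread pool, for example at the top of a script or in .Renviron:
```r
# Assumption: this must run before gibasa (and thus RcppParallel)
# starts any parallel work in the session.
Sys.setenv(RCPP_PARALLEL_BACKEND = "tinythread")
library(gibasa)
```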
Recently, I have also been developing an R package called vibrrt, which wraps vibrato, a morphological analyzer implemented in Rust. Since it is single-threaded, it feels safer in that respect, but I don't expect it to ever become as fast as gibasa.
If you are seeking the ultimate processing speed, the R package RcppJagger is probably the fastest for Japanese morphological analysis. Jagger is a C++ implementation of a morphological analyzer proposed in the following paper, and it is considered the fastest Japanese morphological analyzer at present.
Naoki Yoshinaga. "Back to Patterns: Efficient Japanese Morphological Analysis with Feature-Sequence Trie." In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023), Toronto, Canada, July 2023.
However, RcppJagger (or rather, Jagger itself) is not very easy to handle; in particular, preparing dictionaries in a license-compliant manner is likely to be very difficult, so realistically I feel it can only be used for research purposes.
Session information
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.4.2 (2024-10-31)
#> os Ubuntu 24.04.2 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate ja_JP.UTF-8
#> ctype ja_JP.UTF-8
#> tz Asia/Tokyo
#> date 2025-02-18
#> pandoc 3.1.3 @ /usr/bin/ (via rmarkdown)
#> quarto 1.6.40 @ /opt/quarto/bin/quarto
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> bit 4.5.0.1 2024-12-03 [1] CRAN (R 4.4.2)
#> bit64 4.6.0-1 2025-01-16 [1] CRAN (R 4.4.2)
#> cachem 1.1.0 2024-05-16 [1] CRAN (R 4.4.1)
#> cli 3.6.4 2025-02-13 [1] CRAN (R 4.4.2)
#> colorspace 2.1-1 2024-07-26 [1] CRAN (R 4.4.2)
#> crayon 1.5.3 2024-06-20 [1] CRAN (R 4.4.1)
#> digest 0.6.37 2024-08-19 [1] CRAN (R 4.4.1)
#> dplyr 1.1.4 2023-11-17 [1] CRAN (R 4.4.1)
#> evaluate 1.0.3 2025-01-10 [1] CRAN (R 4.4.2)
#> farver 2.1.2 2024-05-13 [1] CRAN (R 4.4.2)
#> fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.4.1)
#> forcats 1.0.0 2023-01-29 [1] CRAN (R 4.4.2)
#> generics 0.1.3 2022-07-05 [1] CRAN (R 4.4.1)
#> ggplot2 * 3.5.1 2024-04-23 [1] CRAN (R 4.4.2)
#> gibasa 1.1.2 2025-02-16 [1] https://paithiov909.r-universe.dev (R 4.4.2)
#> glue 1.8.0 2024-09-30 [1] CRAN (R 4.4.1)
#> gtable 0.3.6 2024-10-25 [1] CRAN (R 4.4.2)
#> hms 1.1.3 2023-03-21 [1] CRAN (R 4.4.1)
#> htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.4.1)
#> janeaustenr 1.0.0 2022-08-26 [1] CRAN (R 4.4.2)
#> jsonlite 1.8.9 2024-09-20 [1] CRAN (R 4.4.1)
#> knitr 1.49 2024-11-08 [1] CRAN (R 4.4.2)
#> labeling 0.4.3 2023-08-29 [1] CRAN (R 4.4.2)
#> lattice 0.22-5 2023-10-24 [4] CRAN (R 4.3.1)
#> ldccr 2025.02.02 2025-02-02 [1] local
#> lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.4.1)
#> lobstr 1.1.2 2022-06-22 [1] CRAN (R 4.4.1)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.4.1)
#> Matrix 1.7-2 2025-01-23 [4] CRAN (R 4.4.2)
#> memoise 2.0.1 2021-11-26 [1] CRAN (R 4.4.1)
#> microbenchmark 1.5.0 2024-09-04 [1] CRAN (R 4.4.1)
#> munsell 0.5.1 2024-04-01 [1] CRAN (R 4.4.2)
#> pillar 1.10.1 2025-01-07 [1] CRAN (R 4.4.2)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.4.1)
#> prettyunits 1.2.0 2023-09-24 [1] CRAN (R 4.4.1)
#> purrr 1.0.4 2025-02-05 [1] CRAN (R 4.4.2)
#> R.cache 0.16.0 2022-07-21 [1] CRAN (R 4.4.1)
#> R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.4.1)
#> R.oo 1.27.0 2024-11-01 [1] CRAN (R 4.4.1)
#> R.utils 2.12.3 2023-11-18 [1] CRAN (R 4.4.1)
#> R6 2.6.1 2025-02-15 [1] CRAN (R 4.4.2)
#> ragg 1.3.3 2024-09-11 [1] CRAN (R 4.4.1)
#> Rcpp 1.0.14 2025-01-12 [1] CRAN (R 4.4.2)
#> RcppParallel 5.1.10 2025-01-24 [1] CRAN (R 4.4.2)
#> readr 2.1.5 2024-01-10 [1] CRAN (R 4.4.1)
#> rlang 1.1.5 2025-01-17 [1] CRAN (R 4.4.2)
#> rmarkdown 2.29 2024-11-04 [1] CRAN (R 4.4.2)
#> RMeCab 1.14 2025-02-17 [1] Github (IshidaMotohiro/RMeCab@e65a5ee)
#> scales 1.3.0 2023-11-28 [1] CRAN (R 4.4.2)
#> sessioninfo 1.2.3 2025-02-05 [1] CRAN (R 4.4.2)
#> SnowballC 0.7.1 2023-04-25 [1] CRAN (R 4.4.2)
#> stringi 1.8.4 2024-05-06 [1] CRAN (R 4.4.1)
#> stringr 1.5.1 2023-11-14 [1] CRAN (R 4.4.1)
#> styler 1.10.3 2024-04-07 [1] CRAN (R 4.4.1)
#> systemfonts 1.2.1 2025-01-20 [1] CRAN (R 4.4.2)
#> textshaping 1.0.0 2025-01-20 [1] CRAN (R 4.4.2)
#> tibble 3.2.1 2023-03-20 [1] CRAN (R 4.4.1)
#> tidyr 1.3.1 2024-01-24 [1] CRAN (R 4.4.1)
#> tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.4.1)
#> tidytext 0.4.2 2024-04-10 [1] CRAN (R 4.4.2)
#> tokenizers 0.3.0 2022-12-22 [1] CRAN (R 4.4.2)
#> tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.4.1)
#> utf8 1.2.4 2023-10-22 [1] CRAN (R 4.4.1)
#> vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.4.1)
#> vroom 1.6.5 2023-12-05 [1] CRAN (R 4.4.1)
#> withr 3.0.2 2024-10-28 [1] CRAN (R 4.4.1)
#> xfun 0.50 2025-01-07 [1] CRAN (R 4.4.2)
#> yaml 2.3.10 2024-07-26 [1] CRAN (R 4.4.1)
#>
#> [1] /home/paithiov909/R/x86_64-pc-linux-gnu-library/4.4
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library
#> * ── Packages attached to the search path.
#>
#> ──────────────────────────────────────────────────────────────────────────────