I have a big dataset (the following small one is just an example). I can split the data frame, and then I want to write each piece to a separate text file named after the level used for the split.
mydata <- data.frame(var1 = rep(c("k", "l", "c"), each = 5), var2 = rnorm(5),
                     var3 = rnorm(5))
mydata
var1 var2 var3
1 k 0.5406022 0.3654706
2 k -0.6356879 -0.9160001
3 k 0.2946240 -0.1072241
4 k -0.2609121 0.1036626
5 k 0.6206579 0.6111655
6 l 0.5406022 0.3654706
7 l -0.6356879 -0.9160001
8 l 0.2946240 -0.1072241
9 l -0.2609121 0.1036626
10 l 0.6206579 0.6111655
11 c 0.5406022 0.3654706
12 c -0.6356879 -0.9160001
13 c 0.2946240 -0.1072241
14 c -0.2609121 0.1036626
15 c 0.6206579 0.6111655
Now split
> spt1 <- split(mydata, mydata$var1)
> spt1
$c
var1 var2 var3
11 c 0.5406022 0.3654706
12 c -0.6356879 -0.9160001
13 c 0.2946240 -0.1072241
14 c -0.2609121 0.1036626
15 c 0.6206579 0.6111655
$k
var1 var2 var3
1 k 0.5406022 0.3654706
2 k -0.6356879 -0.9160001
3 k 0.2946240 -0.1072241
4 k -0.2609121 0.1036626
5 k 0.6206579 0.6111655
$l
var1 var2 var3
6 l 0.5406022 0.3654706
7 l -0.6356879 -0.9160001
8 l 0.2946240 -0.1072241
9 l -0.2609121 0.1036626
10 l 0.6206579 0.6111655
I want to use write.table to write files named outputc, outputk, and outputl. That is, each file name is the common prefix "output" followed by the level of the grouping variable.
write.table (spt1)
Using lapply over the names of spt1 will allow us to access the dataframes in spt1 and the name that we can use in paste to create our files.
lapply(names(spt1), function(x){write.table(spt1[[x]], file = paste("output", x, sep = ""))})
You could add a common extension in the paste if you want.
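As a minimal sketch of that suggestion, the version below adds an extension and wraps the call in invisible() so that the list of NULLs returned by lapply is not printed. The .txt extension, the out_dir variable, and the row.names and quote settings are illustrative assumptions, not part of the original answer.

```r
# Rebuild the example data and split it by the grouping variable
mydata <- data.frame(var1 = rep(c("k", "l", "c"), each = 5),
                     var2 = rnorm(5), var3 = rnorm(5))
spt1 <- split(mydata, mydata$var1)

# Hypothetical output directory; replace with your own path
out_dir <- tempdir()

# Write one file per group: outputc.txt, outputk.txt, outputl.txt
invisible(lapply(names(spt1), function(x) {
  write.table(spt1[[x]],
              file = file.path(out_dir, paste0("output", x, ".txt")),
              row.names = FALSE,  # assumption: drop the row-number column
              quote = FALSE)      # assumption: no quotes around strings
}))
```

Each file can be read back with read.table(..., header = TRUE) to check its contents.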
You can also use a really fast data.table solution. In this case, there is no need to split the dataframe into a list.
library(data.table) # v1.9.7 (devel version)
setDT(mydata) # convert your dataframe into a data.table
# save files
mydata[, fwrite(.SD, paste0("output", var1,".csv")), by = var1]
In case you want to keep var1 in the output, you can do this:
mydata[, fwrite(copy(.SD)[, var1 := var1], paste0("output", var1, ".csv")), by = var1]
ps. Note that this answer uses fwrite, which is still in the development version of data.table; see the data.table installation instructions for how to get it. You could simply use write.csv or write.table. However, you probably want a fast solution if you are dealing with a big dataset, and fwrite is certainly one of the fastest alternatives.
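Another way to keep var1 in each file is sketched below. It assumes an installed data.table version that provides both fwrite and the split method for data.tables with its keep.by argument (both were development-only features when this answer was written). The out_dir variable is a hypothetical output location.

```r
library(data.table)

# Same example data as in the question, converted to a data.table
mydata <- data.frame(var1 = rep(c("k", "l", "c"), each = 5),
                     var2 = rnorm(5), var3 = rnorm(5))
setDT(mydata)

# keep.by = TRUE keeps the grouping column in every piece
pieces <- split(mydata, by = "var1", keep.by = TRUE)

# Hypothetical output directory; replace with your own path
out_dir <- tempdir()

# Write one CSV per group: outputk.csv, outputl.csv, outputc.csv
for (g in names(pieces)) {
  fwrite(pieces[[g]], file.path(out_dir, paste0("output", g, ".csv")))
}
```

This does build the intermediate list that the by-group approach avoids, so on a very large dataset it may use more memory.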
... var1 in the output, but .SD does not contain it. You could try c(.BY, .SD) (not sure if that works) or use the new split.data.table method (currently in the devel version: https://github.com/Rdatatable/data.table/issues/1389). - Frank 2016-05-19 16:07