Sorry, but this is just crashing too often to keep trying with it. I
can't get a really reproducible example, so I'll just explain the
sorts of circumstances that seem to make it happen:
1) It always seems to need a really large dataset. For instance,
mine was about 2.5 million rows and 20 columns.
2) My dataset has a factor that is unique to every row as a key, so a
factor with 2.5 million levels (I don't know if that matters, but I'm
throwing it out there).
3) Crashes seem to happen most when trying to make a new column, and
also, bizarrely, when trying to use ggplot. A lot of crashes happen
when I try to plot subsets of the data with ggplot.
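The circumstances above can be turned into a rough stress test to share with the list. This is a minimal sketch, assuming a current data.table release; the column names (key_id, x, y, w), the value ranges and the reduced row count are hypothetical stand-ins for the real dataset, and whether it reproduces anything on a given setup is not known:

```r
library(data.table)

# Hypothetical test data shaped like the report: a factor key with one
# level per row plus integer columns. n is far below the ~2.5 million
# rows mentioned; raise it to match the real data.
set.seed(1)
n <- 1e5L
DT <- data.table(
  key_id = factor(sprintf("id%07d", seq_len(n))),  # unique level per row
  x = sample(1:20, n, replace = TRUE),
  y = sample(1:100, n, replace = TRUE)
)

# Operations that reportedly preceded the crashes:
DT[, w := x * y]   # add a new column by reference
d <- DT[x < 10]    # take a subset
str(d)             # inspect it; the factor keeps all n levels here
```

Keeping the row count as a single variable makes it easy to scale the same script up towards the size at which crashes were seen.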
As one more piece of data, I tried taking a subset of my data.table
and running str() on that subset:
d <- DT[x<10]
str(d)
and got this error:
*** caught segfault ***
address (nil), cause 'unknown'
Traceback:
1: encodeString(lev.att, na.encode = FALSE, quote = "\"")
2: str.default(object[[i]], nest.lev = nest.lev + 1, indent.str = paste(indent.str, ".."), nchar.max = nchar.max, max.level = max.level, vec.len = vec.len, digits.d = digits.d, give.attr = give.attr, give.head = give.head, give.length = give.length, width = width, envir = envir, list.len = list.len)
3: str(object[[i]], nest.lev = nest.lev + 1, indent.str = paste(indent.str, ".."), nchar.max = nchar.max, max.level = max.level, vec.len = vec.len, digits.d = digits.d, give.attr = give.attr, give.head = give.head, give.length = give.length, width = width, envir = envir, list.len = list.len)
4: str.default(d, give.length = FALSE)
5: NextMethod("str", give.length = FALSE, ...)
6: str.data.frame(d)
7: str(d)
Once again, it is hard to reproduce.
At this point I have to get some real work done, so I'm reverting
to 1.7.1 until someone comes up with a new fix or something for me to try.
Just posting things as I find them. I run my script (and it gets
through with no complaints), but then I try to modify it slightly more:
DT[, w := x*y]
where x and y are both integer columns of DT (and w doesn't previously
exist), and I get:
'translateCharUTF8' must be called on a CHARSXP
'getCharCE' must be called on a CHARSXP
The problem is I can't get this to reproduce with simpler code, so I
just have to tell you what I see when I see it.
Post by Chris Neff
On the current latest SVN build, with debugging enabled as listed
below, I get the following when trying to even print the contents of a
data.table:
'getCharCE' must be called on a CHARSXP
Never saw this error without debugging. I tried printing a few times
in a row, got this same error, and then like the 4th time it
segfaulted.
Having a hard time reproducing that, but at least it is something?
Post by Matthew Dowle
One thought ... how about turning on debugging? That way, when it crashes,
at least you can report the file and line number. Btw, I've installed
2.12.0 on 64bit in case that reproduced it, but it still works fine
for me, as do 32bit 2.12.0 and both 32 and 64bit 2.14.0. So we're
left with you debugging at your end, but it should be fairly easy ...
sudo MAKEFLAGS='CFLAGS=-O0\ -g\ -Wall\ -pedantic' R CMD INSTALL data.table_1.7.7.tar.gz
R -d gdb
run
Do the stuff that crashes it. Does it report a C file and line number?
Just to rule out possible svn / R CMD build strangeness, please also use
the data.table_1.7.7.tar.gz that's on CRAN. CRAN still hasn't run checks
for 1.7.7, so I'm on tenterhooks for that.
Just to come back to this: it still crashes at seemingly random times.
I'm reverting to an earlier version (1.7.1) to see if that fixes my
problem.
It's an internal build of R, so I can't upgrade until they do. I think
it's unlikely we'll see 2.14 any time soon.
On 15 December 2011 10:50, Steve Lianoglou
Post by Steve Lianoglou
Hi,
Out of curiosity, is it impossible for you to upgrade R to the latest version?
-steve
Post by Chris Neff
I always use svn up. I'll reboot and reinstall just to make sure. As
for reproducing it, it still doesn't seem to crash in any consistent
place, but I'll give it a stronger try with a test data set.
All 480 tests in test.data.table() completed ok in 7.395sec
R version 2.12.1 (2010-12-16)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8
[5] LC_MONETARY=C LC_MESSAGES=en_US.utf8 LC_PAPER=en_US.utf8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets grid methods base
other attached packages:
[1] hexbin_1.26.0 lattice_0.19-33 RColorBrewer_1.0-5 data.table_1.7.8 ggplot2_0.8.9 reshape_0.8.4
[6] plyr_1.6
Post by Matthew Dowle
And you did an 'svn up' (or equivalent)? Grabbing the daily tar.gz snapshot
from R-Forge won't include the fix yet. So svn up, then R CMD build, then
R CMD INSTALL, right? (Just checking the quick basics first.)
Post by Matthew Dowle
Please send the result of test.data.table() and sessionInfo(), and confirm
it's a clean install after a reboot, to make sure no old .so is still
knocking around somehow. Is it definitely installed to the right library?
If it's crashing a lot, then it should be reproducible?
Still waiting for CRAN check results for 1.7.7 in old-rel. If it's not
fixed there either, that will help to know....
Latest SVN version, no alloccol set, still crashing a lot. I don't
use [<- or $<-; the only times I modify a data.table are with := or
by doing DT=merge(DT,blah).
Any more info I can provide?
Post by Matthew Dowle
Great, fingers and toes crossed. If you could unset the alloccol option
just to be sure, that would be great. You're our best hope of confirming
it's fixed, since it was biting you several times an hour. If you use
[<- or $<- syntax, then R will copy via *tmp*, and at that point the *tmp*
data.table is similar to a data.table loaded from disk in that it isn't
over-allocated anymore, I realised. Also, a copy() will lose
over-allocation until the next column addition. That 'should' all be
fine now in both <=2.13.2 and >=2.14.0, although the bug was something
simpler.
1.7.7 is on CRAN now and has been built for Windows, so if the CRAN check
results tick over from "ERROR" to "OK" later today (for both Windows and
Mac old-rel), and you're OK too, then it's fixed.
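The over-allocation described above can be inspected directly from R. This is a minimal sketch, assuming a current data.table release in which truelength() is exported; its behaviour in the 1.7.x versions discussed in this thread may differ, and add_col is a hypothetical helper used only for illustration:

```r
library(data.table)

DT <- data.table(x = 1:3, y = 4:6)

# length() is the number of columns in use; truelength() is the number of
# column-pointer slots allocated. Spare slots (truelength > length) are what
# let := add a column in place instead of copying the whole table.
length(DT)
truelength(DT)

DT[, z := x * y]   # the new column goes into a spare slot, by reference
length(DT)         # now 3

# Because := works by reference, a column added inside a function is
# visible to the caller without reassigning DT.
add_col <- function(d) d[, w := 1L]   # hypothetical helper
add_col(DT)
"w" %in% names(DT)
```

A table that has lost its spare slots (the *tmp* copy or on-disk case mentioned above) is exactly where this in-place addition has to fall back on reallocation.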
Post by Chris Neff
I've updated to the latest SVN version, and I'll be sure to let you
know if it still crashes (however I do have the alloccol option set to
1000, so I shouldn't be bumping into reallocation very often). Thanks
for finding the bug so fast!
Post by Matthew Dowle
Hm. Sounds like it could be a different problem then, if it was in R
2.14. There have been quite a few fixes since 1.7.4, so if you can
reproduce with 1.7.7 that would be great. Or, we've sometimes seen that,
just after a package upgrade, a clean re-install can fix things.
Perhaps the .so was in use by another R process or a zombie, or
something: R seems to report data.table v1.7.4 (say), but it hasn't
fully installed it properly and is still (perhaps partially) at 1.7.3.
So quit all R sessions (reboot to clear zombies too, perhaps) and try
reinstalling using R CMD INSTALL. Next time it happens, I mean. You can
also run test.data.table() to check the install.
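A quick way to check which R and data.table versions a session is actually using, and which library data.table was loaded from, is sketched below. It uses only base R functions; the 2.14.0 threshold comes from this thread, and the check does not replace running test.data.table():

```r
# Report the versions actually in use and where data.table was loaded from,
# to help spot a stale or partially installed copy.
library(data.table)

r_ver  <- getRversion()
dt_ver <- packageVersion("data.table")
dt_dir <- find.package("data.table")

cat("R:", format(r_ver), "\n")
cat("data.table:", format(dt_ver), "from", dt_dir, "\n")

# Per this thread, R < 2.14.0 does not initialise truelength, which is the
# situation the over-allocation workarounds were aimed at.
if (r_ver < "2.14.0") {
  message("R is older than 2.14.0: truelength is not initialised by R itself")
}
```

If the reported version or path differs from what was just installed, that points to the stale-install scenario described above rather than a new bug.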
Post by Timothée Carayol
Hi --
About 10 days ago I was having many unreproducible bugs with R 2.14,
data.table 1.7.4 and Ubuntu 64-bit. Data was getting corrupted, and
then R crashed. I had to go back to data.frame for the bits of code
affected. I was doing a lot of rather unsafe manipulations with row
names, rbind and cbind.
I didn't file a report or mention it, as it was occurring seemingly
at random, and I was doing operations which aren't really what
data.table was made for (tons of little manipulations on small data);
still, I guess I should now point out that 2.14 didn't fix everything
for me. I do not know whether the bugs persist in post-1.7.4 versions.
t
On Wed, Dec 14, 2011 at 5:31 PM, Matthew Dowle
Post by Matthew Dowle
Maybe, worth a try. Are you loading any data.table objects from disk?
Post by Chris Neff
64 bit 2.12.1 linux.
Is there an option I can set in my session in order to work around the
truelength issue? I don't care if I lose some of the over-allocation
niceties if it stops things from crashing. Looking at the truelength
options(datatable.alloc=quote(1000))
stop this? I never have more than about 50 columns at a time.
Post by Matthew Dowle
You're R < 2.14.0, right? I'm really struggling in R < 2.14.0 to make
over-allocation work, because R only started to initialize truelength
to 0 in R 2.14.0+. Before that it's uninitialized (random). Trouble is,
my attempts in R < 2.14.0 to work around that work fine for me on Linux
32bit when I test in R 2.13.2, and I even test in 2.12.0 too. I test on
64bit too, but just 2.14.0. CRAN is also showing errors on 2.13.2
(old-rel) for both Mac and Windows.
So, this is a pre-2.14.0 (only) problem that I'll continue to try and
fix.
Are you 64bit pre-2.14.0? Which OS? If you are 64bit Linux, then it adds
weight to me installing pre-2.14.0 on my 64bit instance in an effort to
reproduce.
Post by Chris Neff
This will be a crappy help request because I can't seem to reproduce
it, but the past few days I've been getting a lot of segfaults. The
only common thing between every crash is that it happens when I do
DT[, z := x]
where z was not a column that existed in DT before, and x is either an
existing column of DT or a separate variable; it doesn't matter. Beyond
that, I can't reproduce a set of steps that gets R to crash. This is
with the latest SVN version.
Is there more information I can provide to help track this down?
_______________________________________________
datatable-help mailing list
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/datatable-help
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact