stack space check fails in threaded application - tcl8.5.12
a***@dunsmoor.com
2012-08-28 22:25:03 UTC
I just upgraded from Tcl 8.4.x to Tcl 8.5.12, and some code that used to work now fails occasionally.

Background:

* I'm embedding Tcl in a threaded application.
* Tcl is compiled without thread support, using gcc 4.1.
* It runs on a Linux Red Hat 4 system with 8 CPUs.
* I only call Tcl while holding a global lock, so Tcl is only "active" in one thread at a time but can be called from any thread.
* I set up a Tcl proc that is called via Tcl_EvalObjv().

In the past this worked fine. Now some of my calls return an error similar to the one described in bug #1815573.

Has anyone else dealt with this recently?

I'm going to recompile with -DTCL_NO_STACK_CHECK and see if that works around the problem.

Thanks in advance.

Ahran
a***@dunsmoor.com
2012-08-29 02:20:24 UTC
I found a post from a year ago that references the wiki: http://wiki.tcl.tk/1339

It seems my use model is in violation of the "rules". However, I can't see why there should be a rule that unthreaded Tcl cares which thread it is called from. Why would unthreaded Tcl have any thread-specific data?
Donal K. Fellows
2012-08-29 09:45:49 UTC
Post by a***@dunsmoor.com
> It seems my use model is in violation of the "rules". I can't see
> why there should be a rule that unthreaded tcl care about which
> thread it is called from, however. Why would unthreaded Tcl have any
> thread specific data?
The issue is how the stack space check is implemented. It works by
taking the address of a variable on the stack very close to the base
frame of the stack, and computing the difference between that and the
address of a variable in the current stack frame. The difference gives
a crude (and non-portable!) measure of how large the C stack has grown,
provided both addresses are taken within the same thread. (The main
portability problem is that the stack growth direction is not defined;
different platforms work differently.) This can be compared with the
system limit on the stack size (also non-portable) to guess whether
there is enough space left in the stack to evaluate some code without a
nasty C stack smash.
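
The measurement described above can be sketched roughly as follows. This is only an illustration of the address-difference idea, not Tcl's actual code: the names record_stack_base, stack_used, stack_has_room and usage_at_depth are hypothetical, and the limit and margin are supplied by the caller rather than queried from the system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference point: address of a local near the base frame. */
static uintptr_t stack_base;

/* Record the address of a local variable from an outer frame. */
void record_stack_base(const void *marker) {
    assert(marker != NULL);
    stack_base = (uintptr_t)marker;
}

/* Rough number of bytes between the reference point and the current
 * frame. The absolute difference is used because the direction of
 * stack growth is platform-dependent. */
size_t stack_used(void) {
    volatile char here;
    uintptr_t now = (uintptr_t)(const volatile void *)&here;
    return (size_t)(now > stack_base ? now - stack_base : stack_base - now);
}

/* Guess whether at least `margin` bytes remain below `limit`.
 * Both values are assumptions supplied by the caller. */
int stack_has_room(size_t limit, size_t margin) {
    size_t used = stack_used();
    return used < limit && limit - used > margin;
}

/* Recurse `depth` times with some padding per frame and return the
 * usage measured in the deepest frame. */
size_t usage_at_depth(int depth) {
    volatile char pad[256];
    size_t used;
    pad[0] = (char)depth;
    used = (depth > 0) ? usage_at_depth(depth - 1) : stack_used();
    /* The volatile read after the call adds zero; it only keeps this
     * frame from being optimised away as a tail call. */
    return used + (size_t)(pad[0] & 0);
}
```

If record_stack_base() is given the address of a local in main(), usage_at_depth(100) should report more than usage_at_depth(1) on typical platforms; the exact numbers depend on the compiler and ABI.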

But this doesn't work when the code is evaluated in multiple threads, as
their stacks can be in completely different parts of memory. The
difference ends up being completely bogus (and can even have the wrong
sign). Unsurprisingly, this confuses the stack space checking code and
you get the results you observed.
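
To see why the difference becomes meaningless across threads, here is a small sketch assuming POSIX threads. The function names are hypothetical. It compares the address of a local variable in the calling thread with one in a freshly created worker thread.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Store the address of one of this frame's locals into *out. */
static void *report_local_address(void *out) {
    volatile char here;
    *(uintptr_t *)out = (uintptr_t)(const volatile void *)&here;
    return NULL;
}

static size_t distance(uintptr_t a, uintptr_t b) {
    return (size_t)(a > b ? a - b : b - a);
}

/* Distance between two locals taken at the same depth in one thread. */
size_t same_thread_distance(void) {
    uintptr_t first = 0, second = 0;
    report_local_address(&first);
    report_local_address(&second);
    return distance(first, second);
}

/* Distance between a local in the caller and a local in a worker
 * thread. Returns 0 if the worker could not be created. */
size_t cross_thread_distance(void) {
    uintptr_t mine = 0, theirs = 0;
    pthread_t worker;
    report_local_address(&mine);
    if (pthread_create(&worker, NULL, report_local_address, &theirs) != 0) {
        return 0;
    }
    pthread_join(worker, NULL);
    return distance(mine, theirs);
}
```

On a typical Linux system the cross-thread value is far larger than any plausible stack depth, because each thread's stack lives in its own memory region. A check that treats that number as "stack used" will conclude the stack is exhausted even though it is not.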

I can suggest two workarounds.

1) Define TCL_NO_STACK_CHECK when building Tcl (requires a recompile
of Tcl after editing the Makefile, but that should be easy) to
disable the stack checking code completely. If you do this, be
aware that you can easily smash the real C stack with a recursive
program (especially if not running in the main thread, as worker
threads often have significantly smaller stacks) and get a real
crash (and core dump).

2) Switch to Tcl 8.6 (still in beta, alas) which doesn't do stack
checking *at all* because it has a different script evaluation
strategy that doesn't use much stack space at all. (It's really a
"stackless" Tcl, and that enables some very cool features like
tailcalls and coroutines.)

Donal.
