Bug 33751 - Classpath fails to build on Solaris 2.9 (SPARC)
Summary: Classpath fails to build on Solaris 2.9 (SPARC)
Alias: None
Product: classpath
Classification: Unclassified
Component: classpath (show other bugs)
Version: unspecified
: P3 normal
Target Milestone: ---
Assignee: Andrew John Hughes
Depends on: 33967
  Show dependency treegraph
Reported: 2007-10-12 15:26 UTC by Andrew John Hughes
Modified: 2011-07-19 17:11 UTC (History)
4 users (show)

See Also:
Host: sparc-sun-solaris2.9
Target: sparc-sun-solaris2.9
Build: sparc-sun-solaris2.9
Known to work:
Known to fail:
Last reconfirmed: 2008-02-27 23:04:24

Patch from JikesRVM (289 bytes, patch)
2007-10-16 14:18 UTC, Andrew John Hughes
Details | Diff
Testcase for reading files in a directory (209 bytes, text/x-java)
2008-02-22 02:49 UTC, Andrew John Hughes
Patch to remove readdir_r (committed) (897 bytes, patch)
2008-02-22 03:06 UTC, Andrew John Hughes
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Andrew John Hughes 2007-10-12 15:26:30 UTC
/share/nlp/projects/cashew/solaris/bin/gcc -DHAVE_CONFIG_H -I. -I/share/nlp/projects/cashew/sources/classpath/native/jni/native-lib -I../../../include -I/share/nlp/projects/cashew/sources/classpath/include -I/share/nlp/projects/cashew/sources/classpath/native/jni/classpath -I/share/nlp/projects/cashew/sources/classpath/native/jni/native-lib -I/share/nlp/projects/cashew/solaris/lib/gcc/sparc-sun-solaris2.8/4.0.2/include/libffi -I/share/nlp/projects/cashew/solaris/include -W -Wall -Wmissing-declarations -Wwrite-strings -Wmissing-prototypes -Wno-long-long -Wstrict-prototypes -pedantic -Werror -g -O2 -MT cpio.lo -MD -MP -MF .deps/cpio.Tpo -c /share/nlp/projects/cashew/sources/classpath/native/jni/native-lib/cpio.c  -fPIC -DPIC -o .libs/cpio.o
cc1: warnings being treated as errors
/share/nlp/projects/cashew/sources/classpath/native/jni/native-lib/cpio.c: In function 'cpio_readDir':
/share/nlp/projects/cashew/sources/classpath/native/jni/native-lib/cpio.c:529: warning: implicit declaration of function 'readdir_r'
Comment 1 Ian Rogers 2007-10-16 08:33:19 UTC
There is a patch against the Jikes RVM that allows it to build with Solaris by using the slightly different declaration of readdir_r that Solaris has. The Jikes RVM patch is here:
and was contributed by Georgios Gousios.
Comment 2 Andrew John Hughes 2007-10-16 14:18:05 UTC
Created attachment 14360 [details]
Patch from JikesRVM
Comment 3 Dalibor Topic 2008-01-13 19:14:01 UTC
From looking at the readdir spec at http://www.opengroup.org/onlinepubs/009695399/functions/readdir.html I'd say that the current code is broken in several ways:

* does not set errno to 0 prior to calling readdir. violates 'Applications wishing to check for error situations should set errno to 0 before calling readdir(). If errno is set to non-zero on return, an error occurred.'

* mistreats end of directory by setting errno. violates 'Upon successful return, the pointer returned at *result shall have the same value as the argument entry. Upon reaching the end of the directory stream, this pointer shall have the value NULL.'

* allows buffer overflows via a directory path longer than FILENAME_MAX, as on systems without a path length limit such paths are possible. I'd suggest that the d_name is strdup'ed and that value returned, rather than copying it around into an array of a fixed size, due to this footnote in the ISO C 9989:1999 draft at  http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf, and due to the POSIXy NAME_MAX being -1 for implementations not having a limit on path length:

'255) If the implementation imposes no practical limit on the length of file name strings, the value of FILENAME_MAX should instead be the recommended size of an array intended to hold a file name string. Of course, file name string contents are subject to other system-specific constraints; therefore all possible strings of length FILENAME_MAX cannot be expected to be opened successfully.'

* doesn't use return value or readdir_r to determine if readdir_r succeeded, uses a check if dBuf is NULL instead. While that's a valid strategy for readdir, it's invalid for readdir_r, since dBuf can be set to NULL upon reaching the end of the directory stream after a successful readdir_r call.

* Use of readdir_r seems unwarranted to me, as readdir is threadsafe as long as it used from a single thread, which can be ensured by making the only user of cpio_readDir, the method list in the VMFile interface synchronized. That would have the added benefit of fixing any readdir_r related portability issues, and avoiding potential race issues outlined in http://womble.decadentplace.org.uk/readdir_r-advisory.html , which specifically lists gcj as vulnerable (though it uses a somewhat different implementation).

In closing, I'd suggest rewriting the list method's native code to simply use glib's GDir interface to open a directory, read the entries and close it, and to make the method synchronized to make it thread safe.
Comment 4 Dalibor Topic 2008-01-14 18:16:02 UTC
Casey, since you hacked in the fixes for darwin, would just using readdir on darwin (and making list synchornized) work, too?
Comment 5 Casey Marshall 2008-01-14 19:26:40 UTC
Yeah, just using readdir should work on Darwin, too. The errno stuff is not broken, the function just needs to return nonzero when the end of directory or an error is reached.

It would be nice to make this function work with readdir_r, given that otherwise you'll need to synchronize on a static object.

I'm not sure what you're saying about overflows. Will readdir_r overflow even a struct dirent that has a d_name of FILENAME_MAX + 1 bytes? The code is using strncpy, and should use FILENAME_MAX - 1 to ensure that the buffer ends with a null, but otherwise won't overflow anything. The exploit you linked to seems to just say "don't use pathconf". I didn't know that some platforms define d_name to be 1 byte, so that does need to be fixed.

Do we have a dependency on glib now? For all platforms? If not, then no, we shouldn't use glib.
Comment 6 Dalibor Topic 2008-01-14 20:03:15 UTC
We don't have an explicit dependency on glib, just an implicit one via the gtk peers.

Basically, beside the strncpy not 0 terminating filename if the name is at least FILENAME characters long, causing the follow up strlen to potentially crash, I think the interface of the cpio_readDir function is not as useful as it should be.

What cpio_readDir should do is to return a const char pointer representing a file name, given a directory handle, and to return NULL when it's run out of entries, or an error occurred, i.e. pretty much the plan old readdir interface. 

(Conveniently enough, glib provides exactly that in a way that works across posixy platforms by using readdir, and suitable wrapper functions for win32. Moreover, glib already implements filtering out '.' and '..', so we wouldn't need to have that code in the list method, never mind all the setup code to deal with copying strings around into a vector, etc. So I'd like to see classpath move towards glib eventually instead of the cpio layer approach with a lot of manual workarounds for lack of simple and useful facilities in POSIX, but that's a discussion for the mailing list.)

Comment 7 Casey Marshall 2008-01-14 21:47:44 UTC
I personally prefer API styles where the caller passes in storage for the value he's getting, instead of returning pointers to static data. It would be nice if we can avoid global locks on that function for systems that support it.

If glib can be used for more than replacing readdir, if someone is willing to port Classpath's native code to use it, AND if the license is compatible (isn't glib LGPL? Does that pose a problem? I can use Classpath as GPL+exception if I don't use the GTK peers. Can I use Classpath with only that license if Classpath depends on LGPL code?), then it's OK to switch to it.
Comment 8 Dalibor Topic 2008-01-15 15:01:03 UTC
Added Mark for the GPL question. 

I don't see why the licensing situation would be different if we used glib in the core from the way we are using it now (or the various other LGPLd libraries we link to (alsa, for example)), but I'd like an answer from the maintainer. :)
Comment 9 Casey Marshall 2008-01-15 23:43:54 UTC
The point is that LGPL code (alsa, gtk) is optional, and if you build Classpath with one of those optional components, the license *for the combination as a whole* is necessarily GPL+exception and LGPL. If you explicitly require an LGPL package, you don't have that option. My understanding is that the LGPL doesn't permit relicensing unless the license is the GPL itself.

This is feeling a little bikeshed at the moment, and I don't generally have anything against using a portable runtime library for the native code. Using glib sounds like it has issues, and the target for the classpath native code was notionally POSIXy platforms.
Comment 10 Andrew John Hughes 2008-02-22 02:49:19 UTC
Created attachment 15204 [details]
Testcase for reading files in a directory
Comment 11 Andrew John Hughes 2008-02-22 03:06:42 UTC
Created attachment 15205 [details]
Patch to remove readdir_r (committed)
Comment 12 Andrew John Hughes 2008-02-22 03:08:13 UTC
CVSROOT:        /sources/classpath
Module name:    classpath
Changes by:     Andrew John Hughes <gnu_andrew> 08/02/22 03:06:05

Modified files:
       .              : ChangeLog configure.ac
       native/jni/native-lib: cpio.c
       vm/reference/java/io: VMFile.java

Log message:
       2008-02-22  Andrew John Hughes  <gnu_andrew@member.fsf.org>

               PR classpath/33751:
               * configure.ac:
               Don't check for readdir_r.
               * native/jni/native-lib/cpio.c:
               (cpio_readDir): Remove use of readdir_r, zero errno
               before starting and always leave a \0 at the end after
               * vm/reference/java/io/VMFile.java:
               (list(String)): Make synchronized.

Comment 13 Andrew John Hughes 2008-02-27 23:04:24 UTC
The next failure is:

/share/nlp/projects/cashew/sources/classpath/native/jni/java-lang/gnu_java_lang_management_VMOperatingSystemMXBeanImpl.c: In function 'Java_gnu_java_lang_management_VMOperatingSystemMXBeanImpl_getSystemLoadAverage':
/share/nlp/projects/cashew/sources/classpath/native/jni/java-lang/gnu_java_lang_management_VMOperatingSystemMXBeanImpl.c:54: warning: implicit declaration of function 'getloadavg'
make[3]: *** [gnu_java_lang_management_VMOperatingSystemMXBeanImpl.lo] Error 1

getloadavg is available on Solaris, we just don't pull in the right header.
Comment 14 Andrew John Hughes 2008-02-27 23:43:13 UTC
The previous error is fixed by bringing in <sys/loadavg.h> on Solaris instead of <stdlib.h>.  The next error is:

/share/nlp/projects/cashew/sources/classpath/native/jni/java-net/gnu_java_net_VMPlainSocketImpl.c: In function 'Java_gnu_java_net_VMPlainSocketImpl_setMulticastInterface6':
/share/nlp/projects/cashew/sources/classpath/native/jni/java-net/gnu_java_net_VMPlainSocketImpl.c:415: warning: unused parameter 'ifname'
/share/nlp/projects/cashew/sources/classpath/native/jni/java-net/gnu_java_net_VMPlainSocketImpl.c: In function 'Java_gnu_java_net_VMPlainSocketImpl_joinGroup6':
/share/nlp/projects/cashew/sources/classpath/native/jni/java-net/gnu_java_net_VMPlainSocketImpl.c:735: warning: unused parameter 'ifname'
/share/nlp/projects/cashew/sources/classpath/native/jni/java-net/gnu_java_net_VMPlainSocketImpl.c: In function 'Java_gnu_java_net_VMPlainSocketImpl_leaveGroup6':
/share/nlp/projects/cashew/sources/classpath/native/jni/java-net/gnu_java_net_VMPlainSocketImpl.c:836: warning: unused parameter 'ifname'
Comment 15 Rainer Orth 2011-07-18 17:56:42 UTC
I build libgcj on Solaris 8/9/10/11, both SPARC and x86 all the time, so this
seems to be fixed on at least mainline, the 4.5 and 4.6 branches.
Comment 16 Andrew John Hughes 2011-07-18 23:30:20 UTC
Do you build with or without Werror?
Comment 17 ro@CeBiTec.Uni-Bielefeld.DE 2011-07-19 10:54:35 UTC
> --- Comment #16 from Andrew John Hughes <gnu_andrew at member dot fsf.org> 2011-07-18 23:30:20 UTC ---
> Do you build with or without Werror?

Just using the defaults, i.e. without -Werror.  libjava/configure.ac has

# -Werror causes unavoidable problems in code using alsa.
ac_configure_args="$ac_configure_args --disable-Werror"

Comment 18 Andrew John Hughes 2011-07-19 17:11:26 UTC
This is a bug against GNU Classpath, not libjava.  However, it does seem to now be disabled by default there too, though I guess it wasn't when this bug was filed.

It would be nice to check the behaviour with Werror if possible.  I no longer have access to the SPARC machine this was filed against.