Discussion:
Made a start with CHICKEN 5 proposal
Peter Bex
2014-08-23 15:35:15 UTC
Hello hackers!

I've made a start on the wiki, at what we'd like CHICKEN 5 to be about.
Please, do not make this into another "pony page", only add things that
we really need to look at which require a rework in core which may be
backwards-incompatible.

http://wiki.call-cc.org/chicken-5-roadmap

I already fear I may have gone a bit overboard with adding too many
things I'd really like to see myself :)

I'd especially appreciate feedback on the core library names and
the things to kill from core. I will be expanding this page over the
next few days/weeks.

Cheers,
Peter
--
http://www.more-magic.net
Oleg Kolosov
2014-08-23 19:35:26 UTC
Post by Peter Bex
Hello hackers!
I've made a start on the wiki, at what we'd like CHICKEN 5 to be about.
Please, do not make this into another "pony page", only add things that
we really need to look at which require a rework in core which may be
backwards-incompatible.
http://wiki.call-cc.org/chicken-5-roadmap
I already fear I may have gone a bit overboard with adding too many
things I'd really like to see myself :)
I'd especially appreciate feedback on the core library names and
the things to kill from core. I will be expanding this page over the
next few days/weeks.
Cheers,
Peter
Thanks for the write up, it looks really promising.

While at it, I would like to propose changing the build system to CMake with
the following considerations:

* Build files (CMakeLists) are cleaner and smaller (IMHO) than the
current Makefiles.
* Much faster build times, especially for larger projects (parallel
compilation and separate build directories are fully supported).
* Supports testing natively - with dependency based rebuilding and
rerunning by regex match etc. - this can come in really handy during
development. Tests are also portable - no more ugly bash hacks.
* Native Windows support - works fine with a few small core patches
(alas due to egg incompatibilities I've not tried on any serious projects).
* Native MacOSX - I've not tested it much myself, but got reports that
it's working.
* Can figure out many things about host system automatically - this
can be leveraged to generate larger chicken.h and simplify files module
for example. Also we can use its knowledge to generate library files
and native executables to simplify command line build tools (csc et al.).
* Cross-compiling works.
* Can generate project files for popular IDEs.

With some help from the compiler (namely extracting module dependencies)
it can be even better. Current implementation is quite hackish in this
regard.

By dropping some backward compatibility requirements it can be
implemented cleaner, so CHICKEN 5 branch looks like a good place for that.

------------

It would be great to clean up chicken.h and runtime.c somewhat;
currently they look quite messy to me. Let's move some ifdefed code
around or break them into a few separate files. Maybe drop support for some
rare platforms - I know, the CHICKEN compatibility list looks impressive,
but I doubt that all of it actually works, due to bitrot. The support can
be added back later to the smaller and modular core. Also let's move all
inline C from .scm to separate .c files - this will make indexing and
searching easier and the mess more apparent.

Is swig stuff in the core really used?

I would also like to see an extended C interface, for introspecting
Scheme objects for example - no concrete proposals yet, still investigating.

------------

I'm working on the process-stuff - basically wrapping posix_spawn, it
also looks quite compatible with the Windows version. But I need to make
this really fail-safe which turned out to be harder than I imagined.
--
Regards, Oleg
Peter Bex
2014-08-23 20:25:32 UTC
Post by Oleg Kolosov
Thanks for the write up, it looks really promising.
While at it, I would like to propose changing the build system to CMake with
I'd be willing to take a close look at it, if you can send a patchset.
I don't promise anything, since as you know, we've used CMake before with
disastrous results. We can't have a build system that's dependent on the
knowledge of one person (I know, our current system doesn't completely
fit that profile either).
Post by Oleg Kolosov
* Build files (CMakeLists) are cleaner and smaller (IMHO) than the
current Makefiles.
No doubt. The current system is an ugly hack.
Post by Oleg Kolosov
* Much faster build times, especially for larger projects (parallel
compilation and separate build directories are fully supported).
(How exactly) is this guaranteed to work correctly, dependency-wise?
Post by Oleg Kolosov
* Supports testing natively - with dependency based rebuilding and
rerunning by regex match etc. - this can come in really handy during
development. Tests are also portable - no more ugly bash hacks.
Can you go into detail about that?
Post by Oleg Kolosov
* Native Windows support - works fine with a few small core patches
(alas due to egg incompatibilities I've not tried on any serious projects).
Great, how does this work?
Post by Oleg Kolosov
* Native MacOSX - I've not tested it much myself, but got reports that
it's working.
What do you call "native"? I'd say our current build is about as native
as it gets.
Post by Oleg Kolosov
* Can figure out many things about host system automatically - this
can be leveraged to generate larger chicken.h and simplify files module
for example. Also we can use its knowledge to generate library files
and native executables to simplify command line build tools (csc et al.).
* Cross-compiling works.
It would be great if you could supply some instructions (in the README?)
on how to do this, so people can test it.
Post by Oleg Kolosov
* Can generate project files for popular IDEs.
With some help from the compiler (namely extracting module dependencies)
it can be even better. Current implementation is quite hackish in this
regard.
I think we talked about that, it can't be done in a fully generalised way.
Post by Oleg Kolosov
It would be great to clean up chicken.h and runtime.c somewhat;
currently they look quite messy to me. Let's move some ifdefed code
around or break them into a few separate files. Maybe drop support for some
rare platforms - I know, the CHICKEN compatibility list looks impressive,
but I doubt that all of it actually works, due to bitrot.
The supported platform list is _fully_ supported. We tested the
platforms before we made the 4.9.0 release. What platforms in particular
would you propose we drop?
Post by Oleg Kolosov
The support can
be added back later to the smaller and modular core. Also let's move all
inline C from .scm to separate .c files - this will make indexing and
searching easier and the mess more apparent.
I'd like that, it would also make emacs (paredit) work better. However,
I don't like the idea to move it into separate C files. This would result
in way too many files (I think what we have now is pretty much the limit of
what's acceptable, already).
Post by Oleg Kolosov
Is swig stuff in the core really used?
No clue. AFAIK we don't really support it anymore since CHICKEN 3, but
I could be totally wrong.
Post by Oleg Kolosov
I would also like to see an extended C interface, for introspecting
Scheme objects for example - no concrete proposals yet, still investigating.
The support we have is comprehensive; core relies on it after all, and
it needs to be able to distinguish everything. It's just not very
well-documented which may be your main problem.
Post by Oleg Kolosov
------------
I'm working on the process-stuff - basically wrapping posix_spawn, it
also looks quite compatible with the Windows version. But I need to make
this really fail-safe which turned out to be harder than I imagined.
I don't know whether that's supported on all platforms we support, but
you can give it a try!

Cheers,
Peter
--
http://www.more-magic.net
Felix Winkelmann
2014-09-03 21:23:22 UTC
Post by Peter Bex
Post by Oleg Kolosov
Is swig stuff in the core really used?
No clue. AFAIK we don't really support it anymore since CHICKEN 3, but
I could be totally wrong.
This should be dropped. I would be surprised if the SWIG module for
CHICKEN still works, and using "bind" is in the end much simpler.


felix
Ivan Raikov
2014-08-24 03:10:49 UTC
I think these are lofty goals, but it is way too much work for a single
release.
Perhaps modularising the compiler and refactoring the core modules should
be the goals for 5.0 release,
and points 1.3-1.8 would be done as 5.x releases leading up to 6.0.

As for library names, I favor fully spelled out names instead of
abbreviations, i.e. chicken.fixnum, chicken.flonum, etc.
Also, shouldn't it be CHICKEN.fixnum now that we are using the FORTRAN
convention? ;-)

-Ivan
Post by Peter Bex
Hello hackers!
I've made a start on the wiki, at what we'd like CHICKEN 5 to be about.
Please, do not make this into another "pony page", only add things that
we really need to look at which require a rework in core which may be
backwards-incompatible.
http://wiki.call-cc.org/chicken-5-roadmap
I already fear I may have gone a bit overboard with adding too many
things I'd really like to see myself :)
I'd especially appreciate feedback on the core library names and
the things to kill from core. I will be expanding this page over the
next few days/weeks.
Cheers,
Peter
--
http://www.more-magic.net
_______________________________________________
Chicken-hackers mailing list
https://lists.nongnu.org/mailman/listinfo/chicken-hackers
Peter Bex
2014-08-24 06:24:31 UTC
Post by Ivan Raikov
I think these are lofty goals, but it is way too much work for a single
release.
Perhaps modularising the compiler and refactoring the core modules should
be the goals for 5.0 release,
Those are the major breaking changes. I agree we should focus on those.
Post by Ivan Raikov
and points 1.3-1.8 would be done as 5.x releases leading up to 6.0.
We should at least give it some thought, so we can communicate which
things will be deprecated, otherwise people start porting code to the
new version using constructs we want to drop.

I think refactoring the ports system isn't much work, but it would
possibly be a breaking change as well. That's why I wanted to include
it here.
Post by Ivan Raikov
As for library names, I favor fully spelled out names instead of
abbreviations, i.e. chicken.fixnum, chicken.flonum, etc.
Noted. I don't care much which way it goes, so if nobody argues
strongly for the abbreviations, we'll use the fully spelled out forms.
Post by Ivan Raikov
Also, shouldn't it be CHICKEN.fixnum now that we are using the FORTRAN
convention? ;-)
hehe, clever :)

Cheers,
Peter
--
http://www.more-magic.net
John Cowan
2014-08-24 07:50:29 UTC
Post by Peter Bex
Post by Ivan Raikov
As for library names, I favor fully spelled out names instead of
abbreviations, i.e. chicken.fixnum, chicken.flonum, etc.
Noted. I don't care much which way it goes, so if nobody argues
strongly for the abbreviations, we'll use the fully spelled out forms.
I like the full names better. What I don't like, and don't understand
the reason for, is the dots. Why does the R7RS egg map (scheme base)
into scheme.base instead of scheme-base? Using the hyphen bidirectionally
makes the existing eggs visible to R7RS in a very nice way:

Chicken               R7RS
sdl                   (sdl)
sdl-base              (sdl base)
x11-colors            (x11 colors)
9p                    (9p)
http-client           (http client)
parley                (parley)
awful                 (awful)
awful-path-matchers   (awful path matchers)
srfi-19               (srfi 19)
srfi-25               (srfi 25)

etc. etc. Admittedly, some are not so great-looking, like (persistent
hash map), but this would be an incentive to rename it to (map hash
persistent), known on the Chicken side as map-hash-persistent.

Can we fix this in the R7RS egg before it's locked in forever? Or
alternatively, can someone explain why using dot is necessary or even a
good idea? Thanks.
--
John Cowan http://www.ccil.org/~cowan ***@ccil.org
On the Semantic Web, it's too hard to prove you're not a dog.
--Bill de hOra
Alex Shinn
2014-08-24 23:27:17 UTC
Post by John Cowan
Post by Peter Bex
Post by Ivan Raikov
As for library names, I favor fully spelled out names instead of
abbreviations, i.e. chicken.fixnum, chicken.flonum, etc.
Noted. I don't care much which way it goes, so if nobody argues
strongly for the abbreviations, we'll use the fully spelled out forms.
I like the full names better. What I don't like, and don't understand
the reason for, is the dots. Why does the R7RS egg map (scheme base)
into scheme.base instead of scheme-base?
Specifically because scheme.base doesn't conflict with
the existing eggs.

Post by John Cowan
Using the hyphen bidirectionally
Chicken               R7RS
sdl                   (sdl)
sdl-base              (sdl base)
x11-colors            (x11 colors)
9p                    (9p)
http-client           (http client)
parley                (parley)
awful                 (awful)
awful-path-matchers   (awful path matchers)
You're inventing namespaces here that didn't exist before. These
names were all created using a flat namespace, so one would
expect them to map to (sdl-base), etc. In these examples it may
work to assume "-" indicates nesting, but that fails for many other
examples, such as

F-operator
define-record-and-printer
static-modules
strictly-pretty
sql-de-lite
etc.

Post by John Cowan
srfi-19               (srfi 19)
srfi-25               (srfi 25)
SRFI's are special anyway, but if we wanted we could provide both
names for them.

The other concern was the ambiguity of using the most
common intra-namespace separator as the namespace
separator. If you have the likely module names (html-parser)
and (html parser), they would map to the same Chicken name
using your proposal.
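
The collision can be seen in a minimal sketch. The helper names
part->string and library-name->symbol are made up for illustration and
are not part of the r7rs egg; the code only assumes standard Scheme
string and symbol procedures.

```scheme
;; Hypothetical helpers, not part of the r7rs egg: flatten an R7RS
;; library name such as (html parser) into one symbol using SEPARATOR.
(define (part->string part)
  (if (number? part)
      (number->string part)
      (symbol->string part)))

(define (library-name->symbol name separator)
  (let loop ((parts (cdr name))
             (acc (part->string (car name))))
    (if (null? parts)
        (string->symbol acc)
        (loop (cdr parts)
              (string-append acc separator (part->string (car parts)))))))

;; With "-" as the separator, two distinct library names collide:
(display (library-name->symbol '(html parser) "-")) (newline) ; html-parser
(display (library-name->symbol '(html-parser) "-")) (newline) ; html-parser

;; With "." as the separator they stay distinct:
(display (library-name->symbol '(html parser) ".")) (newline) ; html.parser
(display (library-name->symbol '(html-parser) ".")) (newline) ; html-parser
```

The dotted mapping remains unambiguous as long as existing egg names
do not themselves contain dots.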
--
Alex
John Cowan
2014-08-25 13:08:41 UTC
Post by Alex Shinn
F-operator
define-record-and-printer
static-modules
strictly-pretty
sql-de-lite
Okay, I'm convinced. Dot it is.
Post by Alex Shinn
SRFI's are special anyway, but if we wanted we could provide both
names for them.
Indeed, it turns out that the r7rs egg special-cases them so that
(srfi n) becomes srfi-n, not srfi.n. Fair enough.
Post by Alex Shinn
If you have the likely module names (html-parser) and (html parser),
they would map to the same Chicken name using your proposal.
Concedo.
--
John Cowan http://www.ccil.org/~cowan ***@ccil.org
The man that wanders far from the walking tree
--first line of a non-existent poem by me
John Cowan
2014-08-24 08:26:58 UTC
Post by Peter Bex
I'd especially appreciate feedback on the core library names and
the things to kill from core. I will be expanding this page over the
next few days/weeks.
I've added lots of comments. Feel free to merge them in or strike them out.
--
John Cowan http://www.ccil.org/~cowan ***@ccil.org
With techies, I've generally found
If your arguments lose the first round
Make it rhyme, make it scan / Then you generally can
Make the same stupid point seem profound! --Jonathan Robie
Peter Bex
2014-08-24 09:33:59 UTC
Post by John Cowan
Post by Peter Bex
I'd especially appreciate feedback on the core library names and
the things to kill from core. I will be expanding this page over the
next few days/weeks.
I've added lots of comments. Feel free to merge them in or strike them out.
Thank you for the feedback. I've added my replies inline. I've added
fluid-let to the list of things to be removed. It's dangerously unsafe,
and it is completely unnecessary to have it in core. There are a handful
of uses of it in core that are potentially disastrous, so we should
rewrite those bits.

Cheers,
Peter
--
http://www.more-magic.net
Peter Bex
2014-08-24 09:55:56 UTC
Post by Peter Bex
Thank you for the feedback. I've added my replies inline. I've added
fluid-let to the list of things to be removed. It's dangerously unsafe,
and it is completely unnecessary to have it in core. There are a handful
of uses of it in core that are potentially disastrous, so we should
rewrite those bits.
It's not as bad as I initially thought; the current-*-port procedures are
handled as special cases, so it's actually safe to update them with fluid-let.
This could be simplified by using parameters, though.

The compiler uses it in a few places to temporarily override the value
of a global variable. This is fine, but if we ever decide to turn the
compiler into a library, it wouldn't be thread-safe. So this might
better be rewritten.
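
As a rough illustration of the difference, here is a sketch written for
the csi toplevel, where fluid-let, make-parameter and parameterize are
available by default. All names (optimization-level, with-level/fluid
and so on) are invented for this example and do not come from the
compiler sources.

```scheme
;; Style 1: a global variable temporarily rebound with fluid-let.
;; The variable itself is assigned, so any other thread running during
;; the body would observe the temporary value too.
(define optimization-level 0)
(define (describe-level) (list 'level optimization-level))

(define (with-level/fluid n thunk)
  (fluid-let ((optimization-level n))
    (thunk)))

;; Style 2: a parameter rebound with parameterize for the dynamic
;; extent of the body. CHICKEN keeps parameter values per thread, which
;; is what would matter if the compiler were used as a library.
(define current-optimization-level (make-parameter 0))
(define (describe-level*) (list 'level (current-optimization-level)))

(define (with-level/param n thunk)
  (parameterize ((current-optimization-level n))
    (thunk)))

(display (with-level/fluid 2 describe-level))  (newline) ; (level 2)
(display (with-level/param 3 describe-level*)) (newline) ; (level 3)
```

Both styles restore the old value when the body exits; they differ in
who else can see the temporary value while it is in effect.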

Cheers,
Peter
--
http://www.more-magic.net
John Cowan
2014-08-24 16:21:24 UTC
Post by Peter Bex
It's not as bad as I initially thought; the current-*-port procedures are
handled as special cases, so it's actually safe to update them with fluid-let.
This could be simplified by using parameters, though.
Actually, current-*-port are already parameters (or at any rate conform
to the parameter API), so we can just replace fluid-let of them with
parameterize directly.
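
A minimal sketch of that replacement, assuming (as stated above) that
current-output-port accepts the parameter protocol. It is written for
the csi toplevel, and call-with-captured-output is a made-up name.

```scheme
;; Redirect output for the dynamic extent of THUNK by parameterizing
;; current-output-port, then return whatever was written as a string.
(define (call-with-captured-output thunk)
  (let ((port (open-output-string)))
    (parameterize ((current-output-port port))
      (thunk))
    (get-output-string port)))

(define captured
  (call-with-captured-output
   (lambda () (display "hello"))))

;; The original port is back in effect here.
(write captured) (newline) ; "hello"
```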
--
John Cowan http://www.ccil.org/~cowan ***@ccil.org
Original line from The Warrior's Apprentice by Lois McMaster Bujold:
"Only on Barrayar would pulling a loaded needler start a stampede toward one."
English-to-Russian-to-English mangling thereof: "Only on Barrayar you risk to
lose support instead of finding it when you threat with the charged weapon."
John Cowan
2014-08-24 16:31:56 UTC
Post by Peter Bex
Thank you for the feedback. I've added my replies inline.
My only response is about blobs vs. u8vectors: I am arguing that
there is no reason why these should be disjoint types in future.
--
John Cowan http://www.ccil.org/~cowan ***@ccil.org
We call nothing profound that is not wittily expressed.
--Northrop Frye (improved)
Felix Winkelmann
2014-09-03 21:34:53 UTC
From: Peter Bex <***@xs4all.nl>
Subject: Re: [Chicken-hackers] Made a start with CHICKEN 5 proposal
Date: Sun, 24 Aug 2014 11:33:59 +0200
Post by Peter Bex
Post by John Cowan
Post by Peter Bex
I'd especially appreciate feedback on the core library names and
the things to kill from core. I will be expanding this page over the
next few days/weeks.
I've added lots of comments. Feel free to merge them in or strike them out.
Thank you for the feedback. I've added my replies inline. I've added
fluid-let to the list of things to be removed. It's dangerously unsafe,
and it is completely unnecessary to have it in core. There are a handful
of uses of it in core that are potentially disastrous, so we should
rewrite those bits.
Peter, it's just a binding construct and very useful in certain
situations (like binding lexically visible variables in nested
procedures).


felix
Oleg Kolosov
2014-08-25 06:31:44 UTC
Post by Peter Bex
Hello hackers!
I've made a start on the wiki, at what we'd like CHICKEN 5 to be about.
Please, do not make this into another "pony page", only add things that
we really need to look at which require a rework in core which may be
backwards-incompatible.
http://wiki.call-cc.org/chicken-5-roadmap
I already fear I may have gone a bit overboard with adding too many
things I'd really like to see myself :)
I'd especially appreciate feedback on the core library names and
the things to kill from core. I will be expanding this page over the
next few days/weeks.
Cheers,
Peter
I have a few questions:

Is there a way to get rid of the ##sys# prefix everywhere? It might be
a matter of preference, but it makes the sources harder to read.

Regarding continuations, I think that call/cc is an advanced feature
people brag about when advocating Scheme, but it is not so useful in practice.
So a "better API" is just TL;DR for most. Doesn't the new API make life
easier for implementors too? It would also benefit greatly from direct support
from a performance standpoint, especially considering that the
handle-exceptions form essentially combines capture and graft in an
ad-hoc manner. So maybe we could go the other way and define call/cc in
terms of capture and return, just as in Feeley's paper, and discourage its
use in favour of the simpler API.

Speaking of exceptions, I find the whole of SRFI-12 quite useless - it's just
too clunky to use. And looking at newer proposals, I find it
disappointing that the Scheme community, given its rich computer science
heritage, hasn't produced anything better than condition properties. I
still want to research the topic a bit more and study other
implementations; maybe someone could help me with some links or examples
of interesting solutions to the problem?

Regarding egg versioning, let's make CHICKEN itself a dependency too,
with the default (if not declared) to <= 4. This way the egg authors can
clearly declare which releases they support.
--
Regards, Oleg
Peter Bex
2014-08-25 11:30:31 UTC
Post by Oleg Kolosov
Is there a way to get rid of the ##sys# prefix everywhere? It might be
a matter of preference, but it makes the sources harder to read.
It was originally used as a "guaranteed" separate namespace. R5RS
doesn't allow # to appear in an identifier and has no way of "escaping"
them, so portable programs cannot ever clash with the core names.
When CHICKEN grew a module system, this was kept around for backwards
compatibility but also because "library" is by default loaded at
toplevel. This means that a set! or define on any name exported by
library would override the given internal definition.

I think if we finally split up core into multiple modules (and make them
into "proper" modules, using the module form), this may be less of a
problem. However, we must take care that the modules loaded by default
at toplevel do not expose any non-R5RS names. Otherwise, we're back at
square one. (toplevel scripts that USE or IMPORT some CHICKEN names
are expected to know what they're importing, so if they override one of
the builtin names, we may assume that's probably intentional)

So, I'm in favor of this idea, but we have to be *very* careful to not
break existing and future R5RS/non-modular applications.
Post by Oleg Kolosov
Regarding continuations, I think that call/cc is an advanced feature
people brag about when advocating Scheme, but it is not so useful in practice.
Heresy! :)
Post by Oleg Kolosov
So a "better API" is just TL;DR for most. Doesn't the new API make life
easier for implementors too? It would also benefit greatly from direct support
from a performance standpoint.
I think this would require a complete overhaul of the compiler (the CPS
internal representation probably needs to be changed), which I think is
unnecessary, and whether it gives any benefits seems to be questionable
at best. Besides, today's consensus seems to be that call/cc is too
general/problematic and delimited continuations are the way to go. I
really dislike going with whatever idea is fashionable at the moment,
that would keep us busy forever. Any changes we do in this should
provide real, tangible, immediate benefits.

In short, it's too much work for no benefit, and we already have enough
on our plate for CHICKEN 5 and perhaps even CHICKEN 6.
Post by Oleg Kolosov
Speaking of exceptions, I find the whole of SRFI-12 quite useless - it's just
too clunky to use.
I really like the SRFI-12 system. It's simple. The API for constructing
conditions is a little clunky, but I think that can be easily improved
with a simpler way to construct them (I've been planning on doing this
for over a year now, so I guess I should post a patch to do that soon).
Post by Oleg Kolosov
And looking at newer proposals, I find it
disappointing that the Scheme community, given its rich computer science
heritage, hasn't produced anything better than condition properties.
What's wrong with condition properties? One alternative to that is the
R6RS exception system, which is overly complex and requires inheritance.
It's like they had Java envy when writing it:
http://www.r6rs.org/final/html/r6rs-lib/r6rs-lib-Z-H-8.html#node_idx_382
Post by Oleg Kolosov
I still want to research the topic a bit more and study other
implementations; maybe someone could help me with some links or examples
of interesting solutions to the problem?
I don't know of any.
Post by Oleg Kolosov
Regarding egg versioning, let's make CHICKEN itself a dependency too,
with the default (if not declared) to <= 4. This way the egg authors can
clearly declare which releases they support.
I'd have to think about that. Perhaps this could be another way to
simplify fetching of eggs: henrietta-cache could just fetch all the eggs
that have ever existed. Then only henrietta itself would need to be
changed. However, the disadvantage is that this makes it harder to
easily list which eggs are available for any given (major) CHICKEN
version. By simply separating the master egg list by version it is
trivial to check this.

Cheers,
Peter
--
http://www.more-magic.net
Oleg Kolosov
2014-08-26 06:29:23 UTC
Post by Peter Bex
Post by Oleg Kolosov
Is there a way to get rid of the ##sys# prefix everywhere? It might be
a matter of preference, but it makes the sources harder to read.
It was originally used as a "guaranteed" separate namespace. R5RS
doesn't allow # to appear in an identifier and has no way of "escaping"
them, so portable programs cannot ever clash with the core names.
When CHICKEN grew a module system, this was kept around for backwards
compatibility but also because "library" is by default loaded at
toplevel. This means that a set! or define on any name exported by
library would override the given internal definition.
I suspected that it is related to name clashes, now I see how exactly,
thanks for the detailed explanation!
Post by Peter Bex
Post by Oleg Kolosov
So a "better API" is just TL;DR for most. Doesn't the new API make life
easier for implementors too? It would also benefit greatly from direct support
from a performance standpoint.
I think this would require a complete overhaul of the compiler ...
No-no. No need to change anything internally. I was thinking about the
documentation issue. Sorry, I wasn't clear enough. Do not sweep those
under the rug too far, but expand the documentation and try to put the
API to good use (or let the users try), considering how small the
CHICKEN implementation is. I'm fine with the eggification as long as the
benefit of direct support by the core is not lost. For example, how
exactly does it interact with the dynamic environment (winds) and threads?
There is a lot of material on the topic, but it is usually heavy on computer
science. After some fiddling with the primitives described in Feeley's
paper, it looks like, with some explanation, they should be a lot easier to
understand for people coming from mainstream languages than the
single ultimate control flow operator - call/cc.
Post by Peter Bex
Post by Oleg Kolosov
Speaking of exceptions, I find the whole of SRFI-12 quite useless - it's just
too clunky to use.
I really like the SRFI-12 system. It's simple. The API for constructing
conditions is a little clunky, but I think that can be easily improved
with a simpler way to construct them (I've been planning on doing this
for over a year now, so I guess I should post a patch to do that soon).
What's wrong with condition properties? One alternative to that is the
R6RS exception system, which is overly complex and requires inheritance.
http://www.r6rs.org/final/html/r6rs-lib/r6rs-lib-Z-H-8.html#node_idx_382
Please consider something like:

(with-exception-handler
  (match-lambda
    ('some-error (print "error"))
    ((and (? (cut memq 'file <>)) err) (print "file " err))
    ((and ($ some-type) (= some-type-first err)) (print err)))
  (lambda ()
    (signal 'some-error) ; primitive error
    (signal '(exn file whatever)) ; "composite" error
    (signal (make-some-type '(exn net i/o whatever) more information))))

Sorry for my lame use of match, there should be a better way, but it gives
the idea: if the user can already throw whatever he pleases, why bother with
conditions at all? For interoperability the standard could provide a
list of recommended symbols (the things to match) and the intended usage
patterns, like CHICKEN does: (exn arity), (exn type), (exn syntax) etc.
Better yet - provide a specialized match operator for decomposing the
implementation-specific error type and let the user spell out only
symbols, like:

(match '(exn i/o net)
  ((or (! file) (! net)) (print "open error")))

No need for predicates, accessors, composite constructors, etc. and
condition-case (guard) starts to look redundant.

Condition properties (like location, arguments, etc.) look useful at
first sight, but I could not think of any use for them besides printing.
I think the implementation could find a way to provide this information
without burdening the user.
--
Regards, Oleg
Peter Bex
2014-08-26 07:06:20 UTC
[explanation about ##sys# prefix]
Post by Oleg Kolosov
I suspected that it is related to name clashes, now I see how exactly,
thanks for the detailed explanation!
You're welcome. We agree that removing the prefix is a good goal, but
it might not be 100% achievable.
Post by Oleg Kolosov
Post by Peter Bex
Post by Oleg Kolosov
So a "better API" is just TL;DR for most. Doesn't the new API make life
easier for implementors too? It would also benefit greatly from direct support
from a performance standpoint.
I think this would require a complete overhaul of the compiler ...
No-no. No need to change anything internally. I was thinking about the
documentation issue.
Ah, I didn't understand that. If we make it part of an egg it gets its
own dedicated wiki page. That could have as long an explanation as you
want.
Post by Oleg Kolosov
Sorry for my lame use of match, there should be a better way, but it gives
the idea: if the user can already throw whatever he pleases, why bother with
conditions at all? For interoperability the standard could provide a
list of recommended symbols (the things to match) and the intended usage
patterns, like CHICKEN does: (exn arity), (exn type), (exn syntax) etc.
Better yet - provide a specialized match operator for decomposing the
implementation-specific error type and let the user spell out only
symbols, like:
(match '(exn i/o net)
  ((or (! file) (! net)) (print "open error")))
No need for predicates, accessors, composite constructors, etc. and
condition-case (guard) starts to look redundant.
Your matchable example assumes the code raising the condition is written
by the same people who wrote the code which is handling it (or at least
they read the docs). If conditions belong to a particular type, generic
code can catch them and at least inspect the components it knows about.
You're right that the ability to raise arbitrary objects might cause
trouble for code dealing with conditions, and that's a bit of a mistake.
Post by Oleg Kolosov
Condition properties (like location, arguments, etc.) look useful at
first sight, but I could not think of any use for them besides printing.
They're extremely useful when you go beyond the usual simple errors.
For example, the http-client egg provides the HTTP status code in a
condition property. The postgresql egg provides the entire information
you can obtain from the server in condition properties, like error class,
error status, primary and detail message, "hint" for the user, position
of the cursor where the error was triggered, source file and line where
the error was reported, table and column name etc.

Most of the stuff you'll find in a condition object is information for
the user, to aid debugging, so of course it's mostly useful for printing!
However, there are situations where you'd like to log more information
or statistics about errors, and then it's useful to have machine-readable
properties. If CHICKEN ever gets a debugger, it would be able to do
more with the conditions, like jumping to the source location stored in
the exn component.
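
As a sketch of what machine-readable properties buy you, the following
uses the SRFI-12 procedures under their CHICKEN 5 module name (on
CHICKEN 4 they are available without the import). The `http` kind and
its `status` property are a hypothetical layout chosen for this
example; the real http-client egg may use different names.

```scheme
(import (chicken condition)) ; CHICKEN 5 module name, an assumption here

;; Hypothetical layout: an `exn` component for the human-readable
;; message plus an `http` component carrying a numeric `status`.
(define (make-http-condition status message)
  (make-composite-condition
   (make-property-condition 'exn 'message message)
   (make-property-condition 'http 'status status)))

;; Run THUNK. Return the HTTP status if it signals an (exn http)
;; condition, or #f if it returns normally.
(define (failed-status thunk)
  (condition-case (begin (thunk) #f)
    (e (exn http) (get-condition-property e 'http 'status))))

(define status
  (failed-status
   (lambda ()
     (signal (make-http-condition 503 "Service Unavailable")))))

;; The caller can decide on the number instead of parsing message text.
(define retry? (and status (>= status 500)))
```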

I really like the fact that the Scheme tradition is about arbitrarily
composable conditions: you can combine an i/o error with a file error,
giving (exn i/o file), but you can do the same if there's for example
a hardware error with an attached printer, giving (exn i/o printer).
If the i/o type contained specific details (which it currently doesn't),
both types would carry the same properties, so a condition-case of
(exn i/o) would be able to catch both exceptions and display the common
properties, regardless of the other types it has.
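
A small sketch of this composition, again assuming the CHICKEN 5
module name for the SRFI-12 procedures. make-io-condition and classify
are names invented for the example, and the `file` and `printer` kinds
carry no properties here.

```scheme
(import (chicken condition)) ; CHICKEN 5 module name, an assumption here

;; Build a composite condition of kinds (exn i/o KIND).
(define (make-io-condition kind message)
  (make-composite-condition
   (make-property-condition 'exn 'message message)
   (make-property-condition 'i/o)
   (make-property-condition kind)))

;; One (exn i/o) clause catches both file and printer errors, because a
;; clause matches when the condition has all of the listed kinds.
(define (classify thunk)
  (condition-case (thunk)
    (e (exn i/o)
       (list 'i/o-error (get-condition-property e 'exn 'message)))
    (e () 'something-else)))

(define file-result
  (classify (lambda () (signal (make-io-condition 'file "disk full")))))
(define printer-result
  (classify (lambda () (signal (make-io-condition 'printer "out of paper")))))
```

Here file-result and printer-result both come from the same clause,
each carrying the message stored in its own exn component.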

This is strictly more powerful than a typical one-dimensional class
hierarchy you'll find in most OO languages, where an exception class
must inherit from one (and only one) other exception. The example I gave
would be possible if you have (exn) -> (exn i/o) -> (exn i/o print),
however, this precludes non-i/o type of print errors like
(exn queue print) or (exn permission print), where the "print" component
carries the same properties as the print component from (exn i/o print).
R6RS, in typical feature-piling fashion, allows both a composite list of
conditions, which also belong to a hierarchy, because why not?

The biggest problem with the SRFI-12 conditions is that the properties
are a matter of convention: there is no guarantee that the "exn"
condition really contains any of the usual properties (and in fact, most
or all of them are optional).
Post by Oleg Kolosov
I think the implementation could find a way to provide this information
without burdening the user.
Don't hand-wave this away. A replacement API should be well thought-out,
and deal with all the use cases of the current system.

Cheers,
Peter
--
http://www.more-magic.net
John Cowan
2014-08-26 15:21:21 UTC
Post by Peter Bex
R6RS, in typical feature-piling fashion, allows both a composite list of
conditions, which also belong to a hierarchy, because why not?
And it's a horrible mess, because deriving from the record type &condition
is a magic back door.
--
John Cowan http://www.ccil.org/~cowan ***@ccil.org
Si hoc legere scis, nimium eruditionis habes.
Oleg Kolosov
2014-08-26 21:50:13 UTC
Post by Peter Bex
Ah, I didn't understand that. If we make it part of an egg it gets its
own dedicated wiki page. That could have as long an explanation as you
want.
This could be even better. Looks like I've made a lot of noise for
nothing. Will study the implementation harder. Maybe I will be able to
write something there.
Post by Peter Bex
... patterns, like CHICKEN do: (exn arity), (exn type), (exn syntax) etc.
Better yet - provide specialized match operator for decomposing
implementation specific error type and let the user spell out only
(match '(exn i/o net)
((or (! file) (! net)) (print "open error"))
No need for predicates, accessors, composite constructors, etc. and
condition-case (guard) starting to look redundant.
Your matchable example assumes the code raising the condition is written
by the same people who wrote the code which is handling it (or at least
they read the docs). If conditions belong to a particular type, generic
code can catch them and at least inspect the components it knows about.
You're right that the ability to raise arbitrary objects might cause
trouble for code dealing with conditions, and that's a bit of a mistake.
It is a good point. But this use case is the one that I'm trying to
improve here.
Post by Peter Bex
Condition properties (like location, arguments, etc.) look useful at
first sight, but I could not think of any use for them besides printing.
They're extremely useful when you go beyond the usual simple errors.
For example, ... [ practical examples of composite conditions skipped ]
I agree, all of this clearly shows that the condition system could be
put to good use, looks like we have complex cases pretty much covered.

But what if I'm writing something like a shell script or quickly
prototyping? For example consider an embedded system where the script
can have some resources opened which can not be cleared by the OS (real
world non embedded example: loop mount file -> check some files -> copy
them -> unmount -- all of this should be a transaction with graceful
rollback). A crash is not an option, so I still need to handle all
errors carefully and want to distinguish between syntax (arity, type,
whatever), library (file not found, socket already in use, etc.) and my
own errors. Because the exception handler receives a single argument
which can be a condition, this pretty much forces me to use conditions
too. And given many things that can go wrong, this quickly goes out of
hand without special macros or helper procedures - handling code could
be larger than application logic.

The essential part is the "type" of the error that happened, ideally a
list of symbols. It is equally important for the caller and the handler.
The symbols are words that form vocabulary. It is easier to agree on
than data structures. There is a rich set of procedures for dealing
with lists. They are simple to understand, and the benefit of
composition is not lost: just cons and throw. There could be other
interesting usages like map/filter and re-throw, etc. Simple syntax.
This could be more than enough for many programs.
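
A rough sketch of what "cons and throw" could look like, purely to illustrate the idea. The procedure names and the symbols are hypothetical, and this raises plain lists instead of condition objects, so it is not how SRFI-12 conditions are meant to be used. The import assumes the CHICKEN 5 module name `(chicken condition)`; on CHICKEN 4 it should not be needed.

```scheme
;; Assumed CHICKEN 5 module names; on CHICKEN 4 drop this import line.
(import scheme (chicken base) (chicken condition))

;; Innermost layer: raise a plain list of symbols describing the failure.
(define (mount-image)
  (abort '(i/o mount)))

;; Middle layer: add its own word in front and re-raise.
;; Anything that is not a list is passed on unchanged.
(define (run-transaction)
  (handle-exceptions kinds
      (abort (if (pair? kinds) (cons 'transaction kinds) kinds))
    (mount-image)))

;; Outer layer: ordinary list procedures are enough to dispatch.
(define (describe thunk)
  (handle-exceptions kinds
      (cond ((not (pair? kinds)) 'not-a-symbol-list)
            ((memq 'mount kinds) 'mount-failed)
            ((memq 'i/o kinds)   'i/o-failed)
            (else                'unknown))
    (thunk)))

(print (describe run-transaction))
```

The handler has to check `pair?` first because core procedures still raise real condition objects, which is exactly the mixing of representations that makes this approach awkward next to existing code.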

So the condition system is really flexible and powerful, especially for
larger programs, but bulky and restrictive for smaller. And the
standardization of properties will make the situation even worse, IMO.

Take Java as an extreme example: it is almost impossible to write
robust code without IDE assistance, considering the multitude of error
conditions, so it is quite common to declare everything as a runtime
error (which is not checked by the compiler) when prototyping and then
forget to go back and handle everything properly once the initial
implementation is complete.

Given that we have been using CHICKEN Scheme in production for almost 2
years now, as the project grows, I am starting to see these kinds of
problems, hence the argument that error handling should be as
unobtrusive as possible. This is anecdotal but still a valid point IMO.
Post by Peter Bex
The biggest problem with the SRFI-12 conditions is that the properties
are a matter of convention: there is no guarantee that the "exn"
condition really contains any of the usual properties (and in fact, most
or all of them are optional).
I view the provision of supplementary error information as a separate
issue. It can be extremely useful, as in the http-client and postgresql
examples, but also very implementation-specific, as in the debugger
example. It depends on the discipline of the module implementor to
provide it: not everyone will bother, and sometimes it does not make
sense, but just the possibility of having it makes life harder IMO.
Post by Peter Bex
I think the implementation could find a way to provide this information
without burdening the user.
Don't hand-wave this away. A replacement API should be well thought-out,
and deal with all the use cases of the current system.
This is a complex problem. I have no experience in developing a
full-blown programming language implementation and need more time to
think about it.

Or maybe I'm missing something obvious and making an elephant out of a
fly, sorry for the wall of text then.
--
Regards, Oleg
Peter Bex
2014-08-27 07:00:53 UTC
Post by Oleg Kolosov
Post by Peter Bex
Ah, I didn't understand that. If we make it part of an egg it gets its
own dedicated wiki page. That could have as long an explanation as you
want.
This could be even better. Looks like I've made a lot of noise for
nothing. Will study the implementation harder. Maybe I will be able to
write something there.
Thanks!
Post by Oleg Kolosov
Post by Peter Bex
Post by Oleg Kolosov
Condition properties (like location, arguments, etc.) look useful at
first sight, but I could not think of any use for them besides printing.
They're extremely useful when you go beyond the usual simple errors.
For example, ... [ practical examples of composite conditions skipped ]
I agree, all of this clearly shows that the condition system could be
put to good use, looks like we have complex cases pretty much covered.
But what if I'm writing something like a shell script or quickly
prototyping? For example consider an embedded system where the script
can have some resources opened which can not be cleared by the OS (real
world non embedded example: loop mount file -> check some files -> copy
them -> unmount -- all of this should be a transaction with graceful
rollback). A crash is not an option, so I still need to handle all
errors carefully and want to distinguish between syntax (arity, type,
whatever), library (file not found, socket already in use, etc.) and my
own errors. Because the exception handler receives a single argument
which can be a condition, this pretty much forces me to use conditions
too. And given many things that can go wrong, this quickly goes out of
hand without special macros or helper procedures - handling code could
be larger than application logic.
It sounds like you're looking for condition-case. I think this takes
care of the handling of various different kinds of exceptions in a very
elegant way, and it's actually my favorite part of the whole condition
system in CHICKEN (though, strictly speaking, not part of SRFI-12):
http://wiki.call-cc.org/man/4/Exceptions#additional-api
Post by Oleg Kolosov
The essential part is the "type" of the error that happened, ideally a
list of symbols. It is equally important for the caller and the handler.
The symbols are words that form vocabulary. It is easier to agree on
than data structures.
That's exactly what condition-case leverages in order to offer its
convenience.
Post by Oleg Kolosov
So the condition system is really flexible and powerful, especially for
larger programs, but bulky and restrictive for smaller. And the
standardization of properties will make the situation even worse, IMO.
How can standardization make it worse?
Post by Oleg Kolosov
Given that we have been using CHICKEN Scheme in production for almost 2
years now, as the project grows, I am starting to see these kinds of
problems, hence the argument that error handling should be as
unobtrusive as possible. This is anecdotal but still a valid point IMO.
Most definitely! CHICKEN is promoted as a "*practical*, portable Scheme
system", and experiences in practice should drive our design and progress.
If something turns out to be unwieldy, it has to change.
Post by Oleg Kolosov
I view the provision of supplementary error information as a separate
issue. It can be extremely useful, as in the http-client and postgresql
examples, but also very implementation-specific, as in the debugger
example. It depends on the discipline of the module implementor to
provide it: not everyone will bother, and sometimes it does not make
sense, but just the possibility of having it makes life harder IMO.
I disagree. Any really *mature* library should raise fully decorated
conditions that make different situations easily distinguishable, so
that whatever goes wrong, a program will always be able to either
recover from it, or provide detailed information to the user as to what
went wrong and why.

This should not get in the way when writing quick scripts, of course!
However, with condition-case I find it easy to only handle the type of
exceptions that I'm interested in and ignore others (which will cause
them to bubble up the environment of handlers, to the next one that
decides to handle it, or bomb out with an error message).
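
A minimal sketch of that pattern, assuming the CHICKEN 5 module name `(chicken condition)` (on CHICKEN 4 the import should not be needed). The procedure name `try-file-op` and the file path are hypothetical.

```scheme
;; Assumed CHICKEN 5 module names; on CHICKEN 4 drop this import line.
(import scheme (chicken base) (chicken condition))

;; Handle only file errors; every other condition passes through untouched.
(define (try-file-op thunk)
  (condition-case (thunk)
    ((exn i/o file) 'file-problem)))

;; Opening a missing file raises an (exn i/o file) condition, which is
;; caught by the clause above.
(print (try-file-op
        (lambda () (open-input-file "/no/such/dir/no-such-file"))))

;; A plain (error ...) only has the exn kind, so try-file-op does not
;; match it and the condition reaches the next handler further out.
(print (condition-case
           (try-file-op (lambda () (error "something else went wrong")))
         ((exn) 'handled-further-out)))
```

The quick script only spells out the one kind of failure it cares about; everything else keeps travelling outwards without any extra code.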
Post by Oleg Kolosov
Post by Peter Bex
Don't hand-wave this away. A replacement API should be well thought-out,
and deal with all the use cases of the current system.
This is a complex problem. I have no experience in developing a
full-blown programming language implementation and need more time to
think about it.
Or maybe I'm missing something obvious and making an elephant out of a
fly, sorry for the wall of text then.
I really get the feeling you've missed out on the existence of
condition-case :)

Cheers,
Peter
--
http://www.more-magic.net
Oleg Kolosov
2014-08-28 05:41:01 UTC
Post by Peter Bex
But what if I'm writing something like a shell script ...
It sounds like you're looking for condition-case. I think this takes
care of the handling of various different kinds of exceptions in a very
elegant way, and it's actually my favorite part of the whole condition
http://wiki.call-cc.org/man/4/Exceptions#additional-api
Yes, looks like the condition-case is the way to go for the mentioned
use cases. I've never seen it in real code though. Not sure why people
avoid it.
Post by Peter Bex
The essential part is the "type" of the error that happened, ideally a
list of symbols. It is equally important for the caller and the handler.
The symbols are words that form vocabulary. It is easier to agree on
than data structures.
That's exactly what condition-case leverages in order to offer its
convenience.
I was trying to prove the point that having the simplest data model will
pay off in the end, because I find the conditions (and records) very
awkward to use in Scheme. This can be improved with macros and special
forms of course, but isn't LISP all about manipulating lists of
things?

Let's stop this. I feel guilty about taking the discussion too far from
its intended subject.

Could you please look at an adjacent thread by Mario about the behaviour
of set! on unbound variables?
--
Regards, Oleg
Peter Bex
2014-08-28 06:44:43 UTC
Post by Oleg Kolosov
Post by Peter Bex
It sounds like you're looking for condition-case. I think this takes
care of the handling of various different kinds of exceptions in a very
elegant way, and it's actually my favorite part of the whole condition
http://wiki.call-cc.org/man/4/Exceptions#additional-api
Yes, looks like the condition-case is the way to go for the mentioned
use cases. I've never seen it in real code though. Not sure why people
avoid it.
I've seen it used in several eggs, and I always use it whenever I want
to catch exceptions of a specific type. Core doesn't use it because it
doesn't really catch any specific exceptions, it just generates them :)
Post by Oleg Kolosov
Post by Peter Bex
Post by Oleg Kolosov
The essential part is the "type" of the error that happened, ideally a
list of symbols. It is equally important for the caller and the handler.
The symbols are words that form vocabulary. It is easier to agree on
than data structures.
That's exactly what condition-case leverages in order to offer its
convenience.
I was trying to prove the point that having the simplest data model will
pay off in the end, because I find the conditions (and records) very
awkward to use in Scheme. This can be improved with macros and special
forms of course, but isn't LISP all about manipulating lists of
things?
I sort-of understand what you mean.
Post by Oleg Kolosov
Let's stop this. I feel guilty about taking the discussion too far from
its intended subject.
OK, end of discussion :)
Post by Oleg Kolosov
Could you please look at an adjacent thread by Mario about the behaviour
of set! on unbound variables?
We've talked about this on IRC a bit. I think it is desirable to fix it,
but this may be very difficult. Christian said he'd dig into it, and he
also came to the conclusion that further refactorings are necessary
before it is achievable. In particular, the removal of
##sys#alias-global-hook (ticket #1131), which has been a bit of a pet
project of mine (the compiler modularisation is the first step towards
that, strange as that may sound - this allows us to remove a small piece
of functionality in ##sys#alias-global-hook, which would allow us to
also fix #1077). So, we're working on it :)

Cheers,
Peter
--
http://www.more-magic.net
John Cowan
2014-08-26 14:45:52 UTC
Post by Oleg Kolosov
Sorry my lame use of match, there should be a better way, but it gives
the idea: if user already can throw whatever he pleases, why bother with
conditions at all?
Because they are a convenient way to pack up miscellaneous information
about the failure situation, that's all. If you prefer SRFI-9 or SRFI-99
records, you can use them too.

For standardization purposes, I am mostly going with having predicates
and accessors, simply because they can hide almost any kind of condition
system under the covers. One of the barriers to R6RS adoption was the
imposition of a highly specific condition system incompatible with what
the implementation already provided.
--
John Cowan http://www.ccil.org/~cowan ***@ccil.org
SAXParserFactory [is] a hideous, evil monstrosity of a class that should
be hung, shot, beheaded, drawn and quartered, burned at the stake,
buried in unconsecrated ground, dug up, cremated, and the ashes tossed
in the Tiber while the complete cast of Wicked sings "Ding dong, the
witch is dead." --Elliotte Rusty Harold on xml-dev
John Cowan
2014-08-25 12:46:25 UTC
Post by Oleg Kolosov
Regarding continuations I think that call/cc is an advanced feature
people brag about when advocating Scheme but not so useful in practise.
Well, I've been using it a lot in writing my set/bag package for SRFI 113,
albeit in a very stereotyped way:

(call/cc
  (lambda (return)
    (hash-table-for-each
      (lambda (key value)
        ...
        (if (whatever) (return))
        ...))))

In short, where in a C-like language I would use "break". When iterating
down lists I can use named-let, but when using the internal iterator
of hash tables, there is no protocol for stopping the iteration,
so it's necessary to escape directly.
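
The same escape pattern in a small runnable form. This sketch substitutes `for-each` over a list for the hash-table iterator so that it needs nothing beyond standard Scheme; the name `find-first` is hypothetical.

```scheme
;; Return the first element of items that satisfies pred, or #f if none does.
;; for-each has no protocol for stopping early, so the continuation
;; captured by call/cc is used as a "break".
(define (find-first pred items)
  (call-with-current-continuation
   (lambda (return)
     (for-each (lambda (x)
                 (if (pred x) (return x)))
               items)
     #f)))

(display (find-first even? '(1 3 4 5 6)))
(newline)
```

Calling `return` abandons the rest of the iteration, so 5 and 6 are never visited in the example above.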
Post by Oleg Kolosov
Speaking of exceptions, I find the whole of SRFI-12 quite useless - it's
just too clunky to use. And looking at newer proposals, I find it
disappointing that the Scheme community, given its rich computer science
heritage, hasn't produced anything better than condition properties.
IMO condition objects are not very Schemely. It would be much better
to allow `raise` to accept multiple arguments and deliver multiple
values to the exception handler. Then the messiness of condition objects
and their incredibly verbose accessors (`hash-table-key-not-found-key',
anybody?) go away.
--
John Cowan http://www.ccil.org/~cowan ***@ccil.org
Mark Twain on Cecil Rhodes: I admire him, I freely admit it,
and when his time comes I shall buy a piece of the rope for a keepsake.
Mario Domenech Goulart
2014-08-26 17:10:53 UTC
Hi,
Post by Peter Bex
I've made a start on the wiki, at what we'd like CHICKEN 5 to be about.
Please, do not make this into another "pony page", only add things that
we really need to look at which require a rework in core which may be
backwards-incompatible.
http://wiki.call-cc.org/chicken-5-roadmap
I already fear I may have gone a bit overboard with adding too many
things I'd really like to see myself :)
I'd especially appreciate feedback on the core library names and
the things to kill from core. I will be expanding this page over the
next few days/weeks.
Thanks for writing that up, Peter.

I remember there is another breaking change that maybe we can fix in
CHICKEN 5: the behavior of set! on unbound variables.

I personally don't like that behavior. It can lead to some subtle and
hard-to-find bugs (especially for bad typists like me).

I'd expect the compiler to detect when set! is applied to an unbound
variable and raise an error (or at least a warning).

Should we consider this for CHICKEN 5?

Best wishes.
Mario
--
http://parenteses.org/mario
John Cowan
2014-08-26 17:30:45 UTC
Post by Mario Domenech Goulart
I remember there is another breaking change that maybe we can fix in
CHICKEN 5: the behavior of set! on unbound variables.
R7RS takes an intermediate position between R5RS (allowed) and R6RS
(forbidden): it's allowed in the REPL, but forbidden in a module.
Post by Mario Domenech Goulart
I personally don't like that behavior. It can lead to some subtle and
hard-to-find bugs (especially for bad typists like me).
+1
--
John Cowan http://www.ccil.org/~cowan ***@ccil.org
Linguistics is arguably the most hotly contested property in the academic
realm. It is soaked with the blood of poets, theologians, philosophers,
philologists, psychologists, biologists and neurologists, along with
whatever blood can be got out of grammarians. - Russ Rymer
Arthur Maciel
2014-08-26 17:44:59 UTC
Post by John Cowan
Post by Mario Domenech Goulart
I remember there is another breaking change that maybe we can fix in
CHICKEN 5: the behavior of set! on unbound variables.
R7RS takes an intermediate position between R5RS (allowed) and R6RS
(forbidden): it's allowed in the REPL, but forbidden in a module.
Post by Mario Domenech Goulart
I personally don't like that behavior. It can lead to some subtle and
hard-to-find bugs (especially for bad typists like me).
+1
+1
Oleg Kolosov
2014-08-26 18:15:27 UTC
Post by Mario Domenech Goulart
I'd expect the compiler to detect when set! is applied to an unbound
variable and raise an error (or at least a warning).
I second that. I can easily remember a few recent bugs in our Scheme
project related to this behaviour.

On the other hand, looking at the way CHICKEN does code expansion, this
might be really hard to implement. I hope I'm wrong.
--
Regards, Oleg
Oleg Kolosov
2014-08-29 18:50:31 UTC
Post by Peter Bex
I've made a start on the wiki, at what we'd like CHICKEN 5 to be about.
I've remembered one more thing: why not stick the terminating '\0' at
the end of all strings in internal representation? This looks pretty
harmless but could make some common FFI uses a breeze.
--
Regards, Oleg
Peter Bex
2014-08-29 19:01:06 UTC
Post by Oleg Kolosov
Post by Peter Bex
I've made a start on the wiki, at what we'd like CHICKEN 5 to be about.
I've remembered one more thing: why not stick the terminating '\0' at
the end of all strings in internal representation? This looks pretty
harmless but could make some common FFI uses a breeze.
We should only do that when the \0 is rejected up front inside strings.
Right now, \0 is allowed in a string and if you pass it to a C function,
it is detected and an exception is raised. Doing it with the current
system wouldn't buy us anything, and would just make potential misuse
more attractive, because a user would be tempted to just pass the
string's internal buffer directly to the C API "for performance".
This would then open up a can of worms containing plenty of potential
vulnerabilities.

Cheers,
Peter
--
http://www.more-magic.net
Arthur Maciel
2014-08-29 19:11:21 UTC
Post by Peter Bex
Post by Oleg Kolosov
Post by Peter Bex
I've made a start on the wiki, at what we'd like CHICKEN 5 to be about.
I've remembered one more thing: why not stick the terminating '\0' at
the end of all strings in internal representation? This looks pretty
harmless but could make some common FFI uses a breeze.
We should only do that when the \0 is rejected up front inside strings.
Right now, \0 is allowed in a string and if you pass it to a C function,
it is detected and an exception is raised. Doing it with the current
system wouldn't buy us anything, and would just make potential misuse
more attractive, because a user would be tempted to just pass the
string's internal buffer directly to the C API "for performance".
This would then open up a can of worms containing plenty of potential
vulnerabilities.
Cheers,
Peter
--
http://www.more-magic.net
Peter, I remember you wrote about this in 2012, right?

http://www.more-magic.net/posts/lessons-learned-from-nul-byte-bugs.html
Peter Bex
2014-08-29 19:14:52 UTC
Post by Arthur Maciel
Peter, I remember you wrote about this in 2012, right?
http://www.more-magic.net/posts/lessons-learned-from-nul-byte-bugs.html
Correct, I think this is an important safety feature of a high-level
language. In fact, I was the one to originally propose adding the
detection of embedded NUL bytes to CHICKEN in the first place :)

Cheers,
Peter
--
http://www.more-magic.net
Oleg Kolosov
2014-08-29 21:04:25 UTC
Post by Peter Bex
Post by Oleg Kolosov
Post by Peter Bex
I've made a start on the wiki, at what we'd like CHICKEN 5 to be about.
I've remembered one more thing: why not stick the terminating '\0' at
the end of all strings in internal representation? This looks pretty
harmless but could make some common FFI uses a breeze.
We should only do that when the \0 is rejected up front inside strings.
Right now, \0 is allowed in a string and if you pass it to a C function,
it is detected and an exception is raised. Doing it with the current
system wouldn't buy us anything, and would just make potential misuse
more attractive, because a user would be tempted to just pass the
string's internal buffer directly to the C API "for performance".
This would then open up a can of worms containing plenty of potential
vulnerabilities.
I didn't know about automatic embedded null checks for c-string - it's a
nifty feature! Will try to dig it up in the sources.

Because c-string-list is only valid as a return type, I've ended up
copying the contents of a scheme-object (actually passed as a list of
strings) manually in C to construct the argv argument for my "improved"
process-run replacement. It also looks like CHICKEN's own
process-execute does exactly the same. I'm not sure if the automatic
null check happens in this case.

So, following the ideas from your blog post (thanks Arthur!), I guess we
should fix c-string-list handling to allow it as an argument, to let
CHICKEN do all the checks.

I still don't see how it prevents adding the terminating null. It may
actually increase safety because users will pass raw scheme objects to C
anyway "for performance". Yes, I'm guilty, doing some embedded work as
you know. Without nulls, virtually nothing prevents pointers from running
far, far away from their intended destination in case of simple mistakes.
--
Regards, Oleg
Oleg Kolosov
2014-09-09 15:39:09 UTC
Post by Oleg Kolosov
Post by Peter Bex
I've made a start on the wiki, at what we'd like CHICKEN 5 to be about.
I've remembered one more thing: why not stick the terminating '\0' at
the end of all strings in internal representation? This looks pretty
harmless but could make some common FFI uses a breeze.
Don't, please.
a) This would make "some forms of abuse" "safe anyway". But internal
\0 would have to be handled for correct programs anyway. Eventually one
ends up rewriting mostly working code at that point.
b) I don't recall the full story. But we had precisely this problem
once with RScheme (which does have this terminating \0). In the end
Donovan Kolbly (creator of RScheme) commented on the topic: "I was
young."
_______________________________________________
Chicken-hackers mailing list
https://lists.nongnu.org/mailman/listinfo/chicken-hackers
This might be entirely a point-of-view issue, but I don't see the problem
in allowing some abuse to gain performance, with the assumption that the
user knows what he is doing. In our commercial system we have a few
applications where CHICKEN's string handling was found to be a bottleneck
(indexing a large number of database entries, filtering lists of thousands
of file names, etc.). An equivalent pure C version is a few times (sometimes
an order of magnitude) faster with the same algorithms. Even doing the heavy
lifting in C and just pushing data back to CHICKEN hurts performance a
lot (our GUI is written in Scheme).

So, all the convenience and safety of a high-level language is meaningless
because we ended up not using it for those applications.
--
Regards, Oleg Kolosov
Jörg F. Wittenberger
2014-09-10 10:39:15 UTC
Post by Oleg Kolosov
Post by Oleg Kolosov
Post by Peter Bex
I've made a start on the wiki, at what we'd like CHICKEN 5 to be about.
I've remembered one more thing: why not stick the terminating '\0' at
the end of all strings in internal representation? This looks pretty
harmless but could make some common FFI uses a breeze.
Don't, please.
a) This would make "some forms of abuse" "safe anyway". But internal
\0 would have to be handled for correct programs anyway. Eventually one
ends up rewriting mostly working code at that point.
b) I don't recall the full story. But we had precisely this problem
once with RScheme (which does have this terminating \0). In the end
Donovan Kolbly (creator of RScheme) commented on the topic: "I was
young."
This might be entirely a point-of-view issue,
Probably it is.
Post by Oleg Kolosov
but I don't see the problem
in allowing some abuse to gain performance, with the assumption that the
user knows what he is doing.
In the case I cited the problem was that the assumption was at some
point built deep into the system. When we wanted to get rid of the
terminating \0 for some reason, we could not with reasonable effort.
That's seen as bad here.
Post by Oleg Kolosov
In our commercial system we have a few
applications where CHICKEN's string handling was found to be a bottleneck
(indexing a large number of database entries, filtering lists of thousands
of file names, etc.). An equivalent pure C version is a few times (sometimes
an order of magnitude) faster with the same algorithms. Even doing the heavy
lifting in C and just pushing data back to CHICKEN hurts performance a
lot (our GUI is written in Scheme).
I'm avoiding CHICKEN's string handling as well, if I can.

Felix Winkelmann
2014-09-03 22:44:54 UTC
Post by Peter Bex
I've made a start on the wiki, at what we'd like CHICKEN 5 to be about.
Please, do not make this into another "pony page", only add things that
we really need to look at which require a rework in core which may be
backwards-incompatible.
Hello, Peter!

I generally agree with most proposed changes on this list (with the
exception of the idea to drop "fluid-let", of course.) But it must be
clear to you that you already created a "pony" page. It is impossible
to do all of that, so before we go crazy with ideas, we should perhaps
get back to what we want to achieve with CHICKEN 5.

As I understand it, the idea is to decruftify (that is drop or eggify
library code), give proper names and modularize. This is all related
in one way or the other and looks like it can be done with the little
resources we have, especially considering that it will take ages until
only a reasonable subset of the existing eggs compiles and runs
properly in the new system.

* Designing a decent POSIX API is a hard task. I have not seen any
reasonably good API wrapper for that yet - they are either too
low-level (Basis, OCaml, etc.), or too high-level.

* Changing the string representation is much harder than you think
(quoting John: "If Chibi can do it, so can we" completely ignores
the fact that writing a string-representation implementation from
scratch is something vastly different than modifying an existing
one, one that is much older and much more widely used from
foreign/native code.)

* Numeric tower support: this is also hard, and will have a
considerable performance impact, needs changes in the compiler, in
all the icky C glue code and particularly in foreign code - which
means things will break all over the place in user code.

* Port-refactoring: again - basically a good idea, but tricky to
design, and may have a large performance impact, and the refactoring
will be work-intensive (all the direct peeking and poking in port
records needs to be localized and changed). This change should also
ideally be considered to be done in tandem with changing the string
representation.

* chicken-install/setup-files: a major and very important project on
its own. I started thinking about this some time ago, but didn't get
anywhere. Something very simple needs to be found that covers most
use cases, but this is something that needs input by many people
that have experience with the egg system and applications written
in CHICKEN. Perhaps we should plan to think about this the next time
some CHICKEN-hackers meet?

I _do_ think all the proposed changes make sense more or less, but
it's unrealistic to think that we achieve anything more than one or
two of the big parts.

A few more notes:

* I think John's idea of putting all the little SRFIs in a few (or a
single) module is better than splitting everything up into
modules. Having modules for each and everything looks nice on paper
but quickly gets old when you have to modify your module imports
every time you use a common but nonstandard language construct. I
understand that some people like this kind of bureaucracy, but
what's wrong with making things easier for the user?

* Please use long, explicit library names, it's easier to remember
("there are many ways to abbreviate something, but only one way not
to" - I forgot who said this, John will tell me, I'm sure.) And I
would also suggest to avoid using "srfi-XXX" as a module name, and
to use something meaningful (yes, I know that in the past I was
largely responsible for that mistake in numerous situations.) That
would also allow adding our own extensions.

* I can't resist to add a pony on my own: I fear that integrating the
R7RS syntax-rules cleanly and transparently inside an egg will be
tricky. What about changing syntax-rules to have R7RS semantics in
general? I'm not sure if I understand the differences well enough,
perhaps someone (Peter?) can comment on this.

So, in short: forget about unicode, the full numeric tower,
chicken-install, port-refactoring and everything but modularization,
the internal structure (and size!) and the necessary issues of doing a
major release (e.g. the question of how to integrate that with
henrietta.)

The major problem is that re-modularization will be the biggest
barrier in migrating user code. Once that is done we have a groundwork
for the really tricky things, and for smaller API changes that are
easier to detect via the module system.


felix
Oleg Kolosov
2014-09-04 05:29:22 UTC
Permalink
Post by Felix Winkelmann
The major problem is that re-modularization will be the biggest
barrier in migrating user code. Once that is done we have a groundwork
for the really tricky things, and for smaller API changes that are
easier to detect via the module system.
Hello Felix!

If you are still here, perhaps you can answer my questions.

How hard would it be to add to chicken an option to output a list of
module dependencies: files which it will finally try to dload
considering all renaming, functors, multiple definitions or whatever
corner cases? Something like:

chicken -emit-depends depends.out -analyze-only module.scm

A build system can use this information to build files in the proper
order; this will also fix parallel build issues.

I've implemented a very crude parser which just extracts the arguments from
use, import, etc. declarations - it's not a very generic solution but
helps immensely in our project which has 50+ interdependent modules and
people constantly move things around.
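
A crude scanner of the kind described above might look roughly like the following. This is only a sketch in Python, not the actual parser from that project and not anything CHICKEN provides: the name `scan_dependencies`, the set of recognised forms and the s-expression output shape are all assumptions made for illustration. It only handles flat lists of plain identifiers, and it ignores comments, renaming, functors and nested import specifiers.

```python
import re

# Forms whose arguments are treated as dependencies (an assumed, incomplete set).
FORMS = ("use", "import", "require-extension", "require-library")

# Matches e.g. "(use foo bar)", but only flat lists of plain identifiers.
PATTERN = re.compile(
    r"\(\s*(" + "|".join(re.escape(f) for f in FORMS) + r")\s+([^()]*)\)"
)

def scan_dependencies(source):
    """Return the sorted, de-duplicated names found in flat use/import forms."""
    names = set()
    for match in PATTERN.finditer(source):
        names.update(match.group(2).split())
    return sorted(names)

def to_sexp(module, deps):
    """Render the result as one s-expression (hypothetical output format)."""
    return "(depends {} ({}))".format(module, " ".join(deps))
```

A build tool could then read one such s-expression per module and order the compilation steps accordingly; telling installed extensions apart from application modules would still need separate information.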

I think this feature will help to ease building of CHICKEN itself as the
number of modules grows.

Also a related problem: why does CHICKEN search for extensions in the
repository only, but for import libraries also in the supplied -include-path's?
This behaviour does not make sense for projects with many modules
installed at once. Isn't it better to always search user supplied paths
first, and built-in last? Also with an option to disable built-in.
Similar to how C compilers handle <> includes. This will make crude
scripting in CHICKEN's own test suite unnecessary, among the other things.
--
Regards, Oleg
Felix Winkelmann
2014-09-04 19:14:25 UTC
Permalink
Post by Oleg Kolosov
How hard would it be to add to chicken an option to output a list of
module dependencies: files which it will finally try to dload
considering all renaming, functors, multiple definitions or whatever
corner cases? Something like:
chicken -emit-depends depends.out -analyze-only module.scm
You can try the "-debug M" option, which lists files referenced via
"require" and its variants, including "uses" declarations. This does
currently not handle included files, but that should be easy to add.

Perhaps you can give it a try and suggest a more usable output format.
Or should it even output makefile-rules?
Post by Oleg Kolosov
Also a related problem: why does CHICKEN search for extensions in the
repository only, but for import libraries also in the supplied -include-path's?
This behaviour does not make sense for projects with many modules
installed at once. Isn't it better to always search user supplied paths
first, and built-in last? Also with an option to disable built-in.
Similar to how C compilers handle <> includes. This will make crude
scripting in CHICKEN's own test suite unnecessary, among the other things.
Yes, this probably needs some clean up as well. I have to check the
code and will try to figure something out.


felix
Oleg Kolosov
2014-09-08 18:28:50 UTC
Permalink
Post by Felix Winkelmann
Post by Oleg Kolosov
How hard would it be to add to chicken an option to output a list of
module dependencies: ...
You can try the "-debug M" option, which lists files referenced via
"require" and its variants, including "uses" declarations. This does
currently not handle included files, but that should be easy to add.
This option looks interesting, it even seems to differentiate between
built-in and installed modules, but AFAICT it requires import libraries
to be already in place to output something. So, it can not be used to
gather compilation dependencies.

What I really need is a way to tell, for a given name, whether it's an
installed extension module (egg or built-in) or something unknown (which
can then be assumed to come from the application currently being compiled).

Changing the include-path behaviour to allow additional modules to be
found in directories separate from EGG_DIR would make life much
easier for multi-module applications, especially taking into account
cross-compile setups.
Post by Felix Winkelmann
Perhaps you can give it a try and suggest a more usable output format.
Or should it even output makefile-rules?
Please don't. Simple s-exps are fine and the easiest to handle.
--
Regards, Oleg
Felix Winkelmann
2014-09-08 20:24:23 UTC
Permalink
Post by Oleg Kolosov
This option looks interesting, it even seems to differentiate between
built-in and installed modules, but AFAICT it requires import libraries
to be already in place to output something. So, it can not be used to
gather compilation dependencies.
What I really need is a way to tell, for a given name, whether it's an
installed extension module (egg or built-in) or something unknown (which
can then be assumed to come from the application currently being compiled).
Changing the include-path behaviour to allow additional modules to be
found in directories separate from EGG_DIR would make life much
easier for multi-module applications, especially taking into account
cross-compile setups.
I see. Hm, difficult. If the import-libs are missing the compile will
abort, catching this would require a special compilation
mode. However, detecting whether an import lib is installed can be
done via "extension-information".


felix
Peter Bex
2014-09-08 20:57:02 UTC
Permalink
Post by Felix Winkelmann
Hello, Peter!
I generally agree with most proposed changes on this list (with the
exception of the idea to drop "fluid-let", of course.) But it must be
clear to you that you already created a "pony" page. It is impossible
to do all of that, so before we go crazy with ideas, we should perhaps
get back to what we want to achieve with CHICKEN 5.
Yeah, I kind of had that "pony" impression while writing it up :)
Post by Felix Winkelmann
As I understand it, the idea is to decruftify (that is drop or eggify
library code), give proper names and modularize.
That's the main reason we're breaking back-compat, I think.
Post by Felix Winkelmann
This is all related
in one way or the other and looks like it can be done with the little
resources we have, especially considering that it will take ages until
only a reasonable subset of the existing eggs compiles and runs
properly in the new system.
* Designing a decent POSIX API is a hard task. I have not seen any
reasonably good API wrapper for that yet - they are either too
lowlevel (Basis, Ocaml, etc.), or too highlevel.
For now a modest refactoring would be enough.

[begin of short brain dump about the POSIX situation]

Putting things like, for example, "directory" in some other unit would
make more sense to me, because there's nothing inherently POSIXy in
reading the contents of a directory. (though the _implementation_
happens to rely on the C POSIX API, of course), and I think it belongs
with make-pathname and friends (ie, a "paths" or "files" module).

Ideally, there wouldn't be much left of the "posix" unit except some
deeply POSIXy things like fork, signal, fcntl, environment vars etc.
Probably this means the really high-level things move elsewhere.
In time, we might even move the POSIX unit out of core into an egg
and keep only truly "portable" (or essential) things in core. I'm
not sure what will happen to POSIX in the future, but I think its
hegemony will end sooner rather than later. The landscape is shifting
so quickly with these mobile devices (think Windows Phone, Firefox OS
but also the crippled POSIX support on iOS and Android), OS research
is slowly picking up again and the Linux crowd seems to be taking an
increasingly aggressive stance against "backwards compatibility" (think
Wayland, systemd etc).

So, I'm not against any POSIX support, but relying too much on it in
core itself is probably a mistake in the (very) long run.

[end of braindump]
Post by Felix Winkelmann
* Changing the string representation is much harder than you think
(quoting John: "If Chibi can do it, so can we" completely ignores
the fact that writing a string-representation implementation from
scratch is something vastly different than modifying an existing
one, one that is much older and much more widely used from
foreign/native code.)
Agreed. Recall that my suggestion was simply to "bless" UTF-8 as the
canonical internal representation (which is the case, de facto, anyway)
and *maybe* adding some detection code to reject invalid sequences rather
than just continuing with bogus data. Possibly making the default
string ops the ones from the UTF-8 egg. Anything beyond that is
overkill and I would definitely not support changing the encoding in
this effort.
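
To make the "reject invalid sequences" idea concrete, here is a minimal sketch in Python. It leans on Python's own strict UTF-8 decoder as a stand-in for such detection code; the helper name `is_valid_utf8` is made up for this example, and nothing here reflects how CHICKEN core or the UTF-8 egg would actually implement the check.

```python
def is_valid_utf8(data):
    """Return True if the byte string is well-formed UTF-8."""
    try:
        # Strict decoding raises on truncated, overlong or otherwise
        # malformed sequences instead of passing bogus data along.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True
```

The point is only the behaviour at the boundary: malformed input is refused up front rather than silently carried further into the program.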

Of course if someone sent in a patch, that might change my mind...
but that's just wishful "pony" thinking ;)
Post by Felix Winkelmann
* Numeric tower support: this is also hard, and will have a
considerable performance impact, needs changes in the compiler, in
all the icky C glue code and particularly in foreign code - which
means things will break all over the place in user code.
There is strong support from the community to do this, and I'm willing
to put in the required effort. I feel very strongly about adding at
least bignum support to core. I don't care as much about ratnums and
I don't care at all about compnums, but it may be simpler to add them;
the code to support them too is relatively straightforward.

Not having bignums in core causes too much headache:
- When dealing with foreign procedures returning full-width 64-bit
integers, as those simply cannot be fully represented by flonums.
- Having bignums be external to the core causes a lot of headaches when
one generates them and passes them to some library. For instance,
storing very large numbers in a database is perfectly sane and
generally possible with the DECIMAL type, but this requires all the
database eggs to pull in the numbers egg, which they currently don't.
In short, the numbers egg is "contagious".
- There are several hard to fix bugs that become trivial once bignums
are supported: #1096, #1000, #1139, #823. There have been other
such problems.
- Also, it confuses the newbies :)
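
The first point in the list can be checked with a few lines of Python, whose `float` is an IEEE 754 double like a flonum. This is just an illustration of the precision limit; the helper name `roundtrips_through_flonum` is invented for the example.

```python
def roundtrips_through_flonum(n):
    """True if integer n survives conversion to a double and back unchanged."""
    return int(float(n)) == n

INT64_MAX = 2**63 - 1    # largest signed 64-bit integer
UINT64_MAX = 2**64 - 1   # largest unsigned 64-bit integer

# A double has a 53-bit significand, so 2**53 is still exact,
# but 2**53 + 1 is already rounded to a neighbouring value.
small_ok = roundtrips_through_flonum(2**53)
first_loss = roundtrips_through_flonum(2**53 + 1)
```

Neither `INT64_MAX` nor `UINT64_MAX` round-trips, so a foreign procedure returning such a value cannot hand it back exactly as a flonum.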

If I don't make it before all the other things have been taken care of,
feel free to release CHICKEN 5 without it.
Post by Felix Winkelmann
* Port-refactoring: again - basically a good idea, but tricky to
design, and may have a large performance impact, and the refactoring
will be work-intensive (all the direct peeking and poking in port
records needs to be localized and changed). This change should also
ideally be considered to be done in tandem with changing the string
representation.
Here too, a modest change would be enough. Just using a proper
struct/record type would make later refactorings easier. The best
part is that the performance impact of adding an offset to the write
buffer is a positive one. But if we won't be able to make this work,
I won't be too sad, I promise ;)

We don't have to make a perfect design, just one that scales better
with future changes. I was thinking to make the constructors accept
keyword arguments, so that we can later add things (like position
setting etc) without breaking existing programs.
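
The keyword-argument idea can be sketched as follows. This is Python rather than Scheme, and `Port`, `make_input_port` and the field names are hypothetical; it only shows why keyword-only constructors leave room for later additions.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Port:
    read_char: Callable[[], str]
    close: Optional[Callable[[], None]] = None
    # Imagine this field being added in a later release: callers written
    # before it existed keep working, because nothing is passed by position.
    set_position: Optional[Callable[[int], None]] = None

def make_input_port(*, read_char, close=None, set_position=None):
    """Hypothetical constructor that accepts keyword arguments only."""
    return Port(read_char=read_char, close=close, set_position=set_position)
```

An old call such as `make_input_port(read_char=f)` stays valid after `set_position` is introduced, which is the property being asked for.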
Post by Felix Winkelmann
* chicken-install/setup-files: a major and very important project on
its own. I started thinking about this some time ago, but didn't get
anywhere. Something very simple needs to be found that covers most
use cases, but this is something that needs input by many people
that have experience with the egg system and applications written
in CHICKEN. Perhaps we should plan to think about this the next time
some CHICKEN-hackers meet?
Sounds like a good plan. I also think this one may be too difficult and
too much work to do it for CHICKEN 5.0 unless lots of people chip in.
Post by Felix Winkelmann
I _do_ think all the proposed changes make sense more or less, but
it's unrealistic to think that we achieve anything more than one or
two of the big parts.
Agreed. I'll put in some extra effort this week to get the numbers
egg in good shape for importing it into core, and maybe try to get
started on a core patch.
Post by Felix Winkelmann
* I think John's idea of putting all the little SRFIs in a few (or a
single) module is better than splitting everything up into
modules. Having modules for each and everything looks nice on paper
but quickly gets old when you have to modify your module imports
every time you use a common but nonstandard language construct. I
understand that some people like this kind of bureaucracy, but
what's wrong with making things easier for the user?
Yeah, I said much the same at the start of the section about SRFIs.
However, I think it *does* make it easier for the user to _also_ offer
the SRFI libraries separately. There's already a hacky workaround for
require-extension's builtin-features in eval.scm so that you can say,
for example, (require-extension (srfi 2)), so I think it makes sense to
also provide "full" library declarations, to make it simpler to use and
write portable R7RS programs.

Note that this does not mean this needs to be the only library to export
said SRFI procedures!
Post by Felix Winkelmann
* Please use long, explicit library names, it's easier to remember
("there are many ways to abbreviate something, but only one way not
to" - I forgot who said this, John will tell me, I'm sure.) And I
would also suggest to avoid using "srfi-XXX" as a module name, and
to use something meaningful (yes, I know that in the past I was
largely responsible for that mistake in numerous situations.) That
would also allow adding our own extensions.
For portability, I prefer at least also allowing the srfi numbers.
But yes, long names are good. However, there will be so few SRFIs
that will still be left as part of core that it makes very little
sense to rename the existing SRFIs, except when grouping several
constructs together.
Post by Felix Winkelmann
* I can't resist to add a pony on my own: I fear that integrating the
R7RS syntax-rules cleanly and transparently inside an egg will be
tricky. What about changing syntax-rules to have R7RS semantics in
general? I'm not sure if I understand the differences well enough,
perhaps someone (Peter?) can comment on this.
I think we already did the important bits (ellipsis identifiers and
tail patterns - ie, SRFI-46). There are two more changes, AFAIK:

- The "new" syntax-rules foolishly changed the underscore to act as
a wildcard symbol, making it - strictly speaking - incompatible with
R5RS. I don't think it's a good idea to support this in core.
- For no good reason, R7RS syntax-rules allows not only renaming
ellipsis identifiers, but also quoting them (which I think is
a bit ugly). I *think* this is entirely backwards compatible,
so we could add that to core.

This is easily put in the R7RS egg, though. Remember, any use of
syntax-rules simply expands into one big ER macro transformer, and it
is a completely self-contained file which may be taken and copied into
the R7RS egg, and tweaked there to support these two cases. But it could
be simpler to do as a simple preprocessor which generates a "core"
syntax-rules expansion.
Post by Felix Winkelmann
So, in short: forget about unicode, the full numeric tower,
chicken-install, port-refactoring and everything but modularization,
the internal structure (and size!) and the necessary issues of doing a
major release (e.g. the question of how to integrate that with
henrietta.)
I think we can do minor things that make it possible to change things
in a backwards-compatible way later. These are important to postpone
the need to "break the world" a third time for as long as possible.

I'd really like to hear other people's ideas about what would be the
best way to integrate the changes with Henrietta. Personally, I think
the easiest way is to simply deploy a second copy of henrietta which
reads from a different cache, populated by a second henrietta-cache cron
job which reads from a different master list.
Post by Felix Winkelmann
The major problem is that re-modularization will be the biggest
barrier in migrating user code. Once that is done we have a groundwork
for the really tricky things, and for smaller API changes that are
easier to detect via the module system.
Agreed. How will we attack the problem of bootstrapping? We will make
some breaking changes which might mean CHICKEN 4 may be unable to
bootstrap the CHICKEN-5-in-progress at some point. Now that we're on
a separate branch we can't really release snapshots in the 4.9.x series.
Maybe fall back to a simple date, or git hash versioning scheme for the
time being? We don't need to make them public "official" releases of
course. I just don't know how well our infrastructure will cope with
a different naming strategy. Should we do this by hand?

Cheers,
Peter
--
http://www.more-magic.net
Felix Winkelmann
2014-09-09 10:05:09 UTC
Permalink
Post by Peter Bex
Post by Felix Winkelmann
* Designing a decent POSIX API is a hard task. I have not seen any
reasonably good API wrapper for that yet - they are either too
lowlevel (Basis, Ocaml, etc.), or too highlevel.
For now a modest refactoring would be enough.
[begin of short brain dump about the POSIX situation]
Putting things like, for example, "directory" in some other unit would
make more sense to me, because there's nothing inherently POSIXy in
reading the contents of a directory. (though the _implementation_
happens to rely on the C POSIX API, of course), and I think it belongs
with make-pathname and friends (ie, a "paths" or "files" module).
Ideally, there wouldn't be much left of the "posix" unit except some
deeply POSIXy things like fork, signal, fcntl, environment vars etc.
Probably this means the really high-level things move elsewhere.
In time, we might even move the POSIX unit out of core into an egg
and keep only truly "portable" (or essential) things in core. I'm
not sure what will happen to POSIX in the future, but I think its
hegemony will end sooner rather than later. The landscape is shifting
so quickly with these mobile devices (think Windows Phone, Firefox OS
but also the crippled POSIX support on iOS and Android), OS research
is slowly picking up again and the Linux crowd seems to be taking an
increasingly aggressive stance against "backwards compatibility" (think
Wayland, systemd etc).
Quite true.
Post by Peter Bex
Post by Felix Winkelmann
* Changing the string representation is much harder than you think
(quoting John: "If Chibi can do it, so can we" completely ignores
the fact that writing a string-representation implementation from
scratch is something vastly different than modifying an existing
one, one that is much older and much more widely used from
foreign/native code.)
Agreed. Recall that my suggestion was simply to "bless" UTF-8 as the
canonical internal representation (which is the case, de facto, anyway)
and *maybe* adding some detection code to reject invalid sequences rather
than just continuing with bogus data. Possibly making the default
string ops the ones from the UTF-8 egg. Anything beyond that is
overkill and I would definitely not support changing the encoding in
this effort.
I basically agree, but please note that UTF8-aware string-mutation
would have to involve "become!", which is very inefficient.
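
The underlying difficulty is that replacing one character in a UTF-8 buffer can change the buffer's byte length, so the old storage cannot always be reused in place. A small Python sketch of this (the helper `replace_char` is hypothetical and deliberately naive, and it says nothing about how CHICKEN itself would do it):

```python
def replace_char(buf, index, new_char):
    """Replace the character at code-point position `index` in a UTF-8 buffer.

    Returns a new bytes object; its length may differ from the original.
    """
    chars = list(buf.decode("utf-8"))
    chars[index] = new_char
    return "".join(chars).encode("utf-8")

old = "cafe".encode("utf-8")       # 4 bytes
new = replace_char(old, 3, "é")    # 5 bytes: "é" takes two bytes in UTF-8
```

Because `new` is longer than `old`, a mutating operation would need a fresh buffer that then takes over the identity of the old string, which is presumably where "become!" comes in.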
Post by Peter Bex
Post by Felix Winkelmann
* Numeric tower support: this is also hard, and will have a
considerable performance impact, needs changes in the compiler, in
all the icky C glue code and particularly in foreign code - which
means things will break all over the place in user code.
There is strong support from the community to do this, and I'm willing
to put in the required effort. I feel very strongly about adding at
least bignum support to core. I don't care as much about ratnums and
I don't care at all about compnums, but it may be simpler to add them;
the code to support them too is relatively straightforward.
Again, you are basically right, but currently we can distinguish
between numeric types by testing a single bit. Any additional numeric
type will have a performance cost. We also will need a C API to access
bignums, and it's not clear to me how to handle bignums being passed
to foreign functions (simply throw an error?) Many ugly issues are
hiding in the details.
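
The "single bit" test can be illustrated with a toy tagging scheme. The sketch below is Python and purely illustrative: the names and the exact layout (lowest bit set means small integer, value stored in the remaining bits) are assumptions for the example, not a description of the actual C macros.

```python
FIXNUM_BIT = 1  # assumed tag: lowest bit set marks an immediate small integer

def make_fixnum(n):
    """Encode integer n as a tagged word."""
    return (n << 1) | FIXNUM_BIT

def is_fixnum(word):
    """One bit test decides between 'small integer' and 'anything else'."""
    return (word & FIXNUM_BIT) != 0

def fixnum_value(word):
    """Decode a tagged word back to the integer it carries."""
    return word >> 1
```

With only two numeric types, "not a fixnum" already implies the other one. Every additional type means that after the bit test fails, the code has to inspect the object further before it knows which arithmetic routine to call.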
Post by Peter Bex
Post by Felix Winkelmann
* Port-refactoring: again - basically a good idea, but tricky to
design, and may have a large performance impact, and the refactoring
will be work-intensive (all the direct peeking and poking in port
records needs to be localized and changed). This change should also
ideally be considered to be done in tandem with changing the string
representation.
Here too, a modest change would be enough. Just using a proper
struct/record type would make later refactorings easier. The best
part is that the performance impact of adding an offset to the write
buffer is a positive one. But if we won't be able to make this work,
I won't be too sad, I promise ;)
Ok, that sounds reasonable.
Post by Peter Bex
Post by Felix Winkelmann
* I think John's idea of putting all the little SRFIs in a few (or a
single) module is better than splitting everything up into
modules. Having modules for each and everything looks nice on paper
but quickly gets old when you have to modify your module imports
every time you use a common but nonstandard language construct. I
understand that some people like this kind of bureaucracy, but
what's wrong with making things easier for the user?
Yeah, I said much the same at the start of the section about SRFIs.
However, I think it *does* make it easier for the user to _also_ offer
the SRFI libraries separately. There's already a hacky workaround for
require-extension's builtin-features in eval.scm so that you can say,
for example, (require-extension (srfi 2)), so I think it makes sense to
also provide "full" library declarations, to make it simpler to use and
write portable R7RS programs.
Note that this does not mean this needs to be the only library to export
said SRFI procedures!
Ah, I see. So you mean that we provide multiple modules, then?
Post by Peter Bex
Post by Felix Winkelmann
* Please use long, explicit library names, it's easier to remember
("there are many ways to abbreviate something, but only one way not
to" - I forgot who said this, John will tell me, I'm sure.) And I
would also suggest to avoid using "srfi-XXX" as a module name, and
to use something meaningful (yes, I know that in the past I was
largely responsible for that mistake in numerous situations.) That
would also allow adding our own extensions.
For portability, I prefer at least also allowing the srfi numbers.
But yes, long names are good. However, there will be so few SRFIs
that will still be left as part of core that it makes very little
sense to rename the existing SRFIs, except when grouping several
constructs together.
Ok.
Post by Peter Bex
Post by Felix Winkelmann
* I can't resist to add a pony on my own: I fear that integrating the
R7RS syntax-rules cleanly and transparently inside an egg will be
tricky. What about changing syntax-rules to have R7RS semantics in
general? I'm not sure if I understand the differences well enough,
perhaps someone (Peter?) can comment on this.
I think we already did the important bits (ellipsis identifiers and
tail patterns - ie, SRFI-46). There are two more changes, AFAIK:
- The "new" syntax-rules foolishly changed the underscore to act as
a wildcard symbol, making it - strictly speaking - incompatible with
R5RS. I don't think it's a good idea to support this in core.
- For no good reason, R7RS syntax-rules allows not only renaming
ellipsis identifiers, but also quoting them (which I think is
a bit ugly). I *think* this is entirely backwards compatible,
so we could add that to core.
This is easily put in the R7RS egg, though. Remember, any use of
syntax-rules simply expands into one big ER macro transformer, and it
is a completely self-contained file which may be taken and copied into
the R7RS egg, and tweaked there to support these two cases. But it could
be simpler to do as a simple preprocessor which generates a "core"
syntax-rules expansion.
IIRC, full R7RS-compatibility requires "(import-for-syntax (r7rs))" or
something like this. I was wondering about that, since it would be
quite a barrier for portable code to have to take care of this. Or
can we simply make this implicit in the "define-library" macro?

Or is the incompatibility small enough to be ignored?
Post by Peter Bex
I'd really like to hear other people's ideas about what would be the
best way to integrate the changes with Henrietta. Personally, I think
the easiest way is to simply deploy a second copy of henrietta which
reads from a different cache, populated by a second henrietta-cache cron
job which reads from a different master list.
Perhaps also add an entry to egg's "release-info" file for which major
version(s) this release applies?
Post by Peter Bex
Agreed. How will we attack the problem of bootstrapping? We will make
some breaking changes which might mean CHICKEN 4 may be unable to
bootstrap the CHICKEN-5-in-progress at some point. Now that we're on
a separate branch we can't really release snapshots in the 4.9.x series.
Maybe fall back to a simple date, or git hash versioning scheme for the
time being? We don't need to make them public "official" releases of
course. I just don't know how well our infrastructure will cope with
a different naming strategy. Should we do this by hand?
We can at least tag commits in the repository that are known to work
and we should try to avoid doing such changes as much as possible. If
refactoring and cleanup is the major issue for CHICKEN 5, then I see
no problem here (yet). When it comes to things like FFI-barriers and
internal representation, then we can worry about this. But we have to
see how we get along, it's hard to tell in advance into what issues we
run. Any "major" change should have a verified bootstrap build, from a
known state, completely from sources, ideally from a 4.9
"chicken-boot".


felix
Alex Shinn
2014-09-09 12:20:28 UTC
Permalink
On Tue, Sep 9, 2014 at 7:05 PM, Felix Winkelmann <
Post by Felix Winkelmann
IIRC, full R7RS-compatibility requires "(import-for-syntax (r7rs))" or
something like this. I was wondering about that, since it would be
quite a barrier for portable code to have to take care of this. Or
can we simply make this implicit in the "define-library" macro?
There's no import-for-syntax in R7RS small. In fact,
there's nothing related to phasing at all - since only
syntax-rules is supported, there are trivially no phasing
issues.

The only potentially controversial decision wrt macros
is that we did not provide the equivalent of Chicken's

(export (syntax: macro-id ids-expanded-to ...))

instead requiring the macro expander simply detect
such indirect references. It's not all that commonly
needed, so if it's difficult for Chicken to do this it can
be considered a low-priority todo. Authors of libraries
which want to use this idiom can always cond-expand
to provide the additional exports explicitly for Chicken.
--
Alex
John Cowan
2014-09-09 15:06:57 UTC
Permalink
Post by Felix Winkelmann
I basically agree, but please note that UTF8-aware string-mutation
would have to involve "become!", which is very inefficient.
True, but it is also very rare in core: only 8 uses, most of which
could be easily removed. (I exclude of course the
SRFI-13 and SRFI-14 libraries themselves.)
Post by Felix Winkelmann
Again, you are basically right, but currently we can distinguish
between numeric types by testing a single bit. Any additional numeric
type will have a performance cost. We also will need a C API to access
bignums, and it's not clear to me how to handle bignums being passed
to foreign functions (simply throw an error?) Many ugly issues are
hiding in the details.
Note my previous post on bignums vs. biggernums. If there is also a
way to pass arbitrary exact integers as strings, that should do it.
Post by Felix Winkelmann
Post by Felix Winkelmann
* Please use long, explicit library names, it's easier to remember
("there are many ways to abbreviate something, but only one way not
to" - I forgot who said this, John will tell me, I'm sure.)
It's the .sig of David B. Lamkins, but whether he invented it, he doesn't say.
--
John Cowan http://www.ccil.org/~cowan ***@ccil.org
Ambassador Trentino: I've said enough. I'm a man of few words.
Rufus T. Firefly: I'm a man of one word: scram! --Duck Soup
Felix Winkelmann
2014-09-09 20:17:02 UTC
Permalink
Post by John Cowan
Post by Felix Winkelmann
* Please use long, explicit library names, it's easier to remember
("there are many ways to abbreviate something, but only one way not
to" - I forgot who said this, John will tell me, I'm sure.)
It's the .sig of David B. Lamkins, but whether he invented it, he doesn't say.
Thanks for clearing that up. I knew that your encyclopedic brain can
be relied on. I darkly recall having read it in CLTL2 by Guy Steele,
but wasn't sure where he got it from.


felix
John Cowan
2014-09-09 20:25:23 UTC
Permalink
Post by Felix Winkelmann
Thanks for clearing that up. I knew that your encyclopedic brain can
be relied on. I darkly recall having read it in CLTL2 by Guy Steele,
but wasn't sure where he got it from.
In this case it's Dr. Google who can be relied upon.
--
John Cowan http://www.ccil.org/~cowan ***@ccil.org
Work hard / play hard, ***@ccil.org
die young / rot quickly.
Alex Shinn
2014-09-09 11:02:38 UTC
Permalink
Post by Peter Bex
- The "new" syntax-rules foolishly changed the underscore to act as
a wildcard symbol, making it - strictly speaking - incompatible with
R5RS. I don't think it's a good idea to support this in core.
I think this has been discussed here before, but it's R6RS
which is incompatible and breaks many macros here. R7RS
"fixed" this for basically all realistic cases by requiring that
using _ as a literal overrides the wildcard behavior.

Post by Peter Bex
- For no good reason, R7RS syntax-rules allows not only renaming
ellipsis identifiers, but also quoting them (which I think is
a bit ugly). I *think* this is entirely backwards compatible,
so we could add that to core.
The ellipsis renaming (and the tail patterns, which Chicken already
supports) are just Taylor Campbell's SRFI 46. It makes many
macros more readable than (... <x>) escaping. The latter, however,
is very widely supported. Probably providing both forms of
escape was overkill, but these are easy to implement and entirely
backwards compatible.
--
Alex
Mario Domenech Goulart
2014-09-09 14:22:23 UTC
Permalink
Hi gentlemen,
Post by Peter Bex
Post by Felix Winkelmann
So, in short: forget about unicode, the full numeric tower,
chicken-install, port-refactoring and everything but modularization,
the internal structure (and size!) and the necessary issues of doing a
major release (e.g. the question of how to integrate that with
henrietta.)
I think we can do minor things that make it possible to change things
in a backwards-compatible way later. These are important to postpone
the need to "break the world" a third time for as long as possible.
I'd really like to hear other people's ideas about what would be the
best way to integrate the changes with Henrietta. Personally, I think
the easiest way is to simply deploy a second copy of henrietta which
reads from a different cache, populated by a second henrietta-cache cron
job which reads from a different master list.
I think that approach sounds good. We'd have /release/5/egg-locations
in svn and a new henrietta-cache instance would use it to populate a
separate cache. We'd need a CHICKEN 5-specific henrietta providing a
URI for CHICKEN 5 eggs (chicken-install 5 would use that URI).

What is not clear to me is how to manage the egg repositories to work
with both CHICKEN 4 and 5. We'll certainly have to maintain egg code
for CHICKEN 4 until CHICKEN 5 really catches on. During this period, how
will egg authors manage their egg repositories? How will egg authors
specify "this egg version is for CHICKEN N" in the source repository (by
branching, I suppose)? And how do we manage versions?

Will we be able to get along with changes by simply cond-expand'ing
against chicken-5?

Assuming egg authors won't create new repositories for their eggs for
CHICKEN 5, the henrietta-cache instance for CHICKEN 5 would still cache
versions that are only compatible with CHICKEN 4, although they will be
only meant to be used by CHICKEN 5. I think that's not a really serious
issue, but I thought it should be mentioned, just in case somebody tries
"chicken-install5 <egg>:<version-for-chicken-4>". We can work around
that by providing .release-info files specific to CHICKEN 5 (e.g., in a
chicken-5 branch), containing the versions compatible with CHICKEN 5
only. Maybe there are better options?

Best wishes.
Mario
--
http://parenteses.org/mario
Felix Winkelmann
2014-09-09 14:29:21 UTC
Permalink
Post by Mario Domenech Goulart
Assuming egg authors won't create new repositories for their eggs for
CHICKEN 5, the henrietta-cache instance for CHICKEN 5 would still cache
versions that are only compatible with CHICKEN 4, although they will be
only meant to be used by CHICKEN 5. I think that's not a really serious
issue, but I thought it should be mentioned, just in case somebody tries
"chicken-install5 <egg>:<version-for-chicken-4>". We can work around
that by providing .release-info files specific to CHICKEN 5 (e.g., in a
chicken-5 branch), containing the versions compatible with CHICKEN 5
only. Maybe there are better options?
Well, the problem is eggs that are not in our central repository. There
must be a way to mark for which versions an egg is intended. That's why
I propose another entry in the .release-info files.


felix
Mario Domenech Goulart
2014-09-09 14:49:43 UTC
Permalink
Post by Felix Winkelmann
Post by Mario Domenech Goulart
Assuming egg authors won't create new repositories for their eggs for
CHICKEN 5, the henrietta-cache instance for CHICKEN 5 would still cache
versions that are only compatible with CHICKEN 4, although they will be
only meant to be used by CHICKEN 5. I think that's not a really serious
issue, but I thought it should be mentioned, just in case somebody tries
"chicken-install5 <egg>:<version-for-chicken-4>". We can work around
that by providing .release-info files specific to CHICKEN 5 (e.g., in a
chicken-5 branch), containing the versions compatible with CHICKEN 5
only. Maybe there are better options?
Well, the problem is eggs that are not in our central repository.
I think the versioning thing also affects the eggs in svn too. For
example, suppose right now you have egg foo at version 1.0. When
CHICKEN 5 is released, foo will be ported to CHICKEN 5. What version
will it be in CHICKEN 4 and 5?

After we have the egg for both CHICKEN 4 and 5, how do we manage new
versions?

I'm quite sure we have a solution for this problem, which is basically
the same as we had in the CHICKEN 3->4 transition. Unfortunately, since
I was the last person I know to switch to CHICKEN 4, I didn't actually
face that problem with my eggs. Since nobody else was using CHICKEN 3,
I just froze egg versions for that CHICKEN version and continued to
update eggs for CHICKEN 4 only.
Post by Felix Winkelmann
There must be a way to mark for which versions an egg is
intended. That's why I propose another entry in the .release-info
files.
Another alternative would be using another .release-info file. That
approach wouldn't require changes in henrietta-cache.

For example, we'd have

https://raw.github.com/mario-goulart/awful/master/awful.release-info

for CHICKEN 4, and

https://raw.github.com/mario-goulart/awful/chicken-5/awful.release-info

or

https://raw.github.com/mario-goulart/awful/master/awful.release-info.5

for CHICKEN 5. Of course, other variations are possible.

It'd be just a matter of pointing to a CHICKEN 5-specific
.release-info.
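
As a rough sketch of the last variant (the ".release-info.5" suffix), the
lookup could be as simple as the following. The function name
release_info_url and the rule "append .N for anything newer than CHICKEN 4"
are assumptions made for illustration, not something henrietta-cache does
today.

```python
def release_info_url(base_url: str, egg: str, chicken_major: int) -> str:
    """Return the release-info URL to fetch for a given CHICKEN major version.

    Hypothetical rule: CHICKEN 4 keeps the existing file name, and later
    major versions use the same name with ".<major>" appended.
    """
    name = f"{egg}.release-info"
    if chicken_major > 4:
        name = f"{name}.{chicken_major}"
    return f"{base_url.rstrip('/')}/{name}"


base = "https://raw.github.com/mario-goulart/awful/master"
print(release_info_url(base, "awful", 4))
print(release_info_url(base, "awful", 5))
```

With this rule the existing CHICKEN 4 URL stays untouched, so nothing
changes for the current master list.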

Best wishes.
Mario
--
http://parenteses.org/mario
Felix Winkelmann
2014-09-09 20:08:53 UTC
Permalink
Post by Mario Domenech Goulart
Post by Felix Winkelmann
Well, the problem is eggs that are not in our central repository.
I think the versioning thing also affects the eggs in svn too. For
example, suppose right now you have egg foo at version 1.0. When
CHICKEN 5 is released, foo will be ported to CHICKEN 5. What version
will it be in CHICKEN 4 and 5?
But we already have branches in the egg repository, and we can take
care of that ourselves. But we shouldn't put too much burden on those that
provide eggs in their own repositories, by forcing them to maintain
multiple branches.
Post by Mario Domenech Goulart
https://raw.github.com/mario-goulart/awful/master/awful.release-info
for CHICKEN 4, and
https://raw.github.com/mario-goulart/awful/chicken-5/awful.release-info
or
https://raw.github.com/mario-goulart/awful/master/awful.release-info.5
for CHICKEN 5. Of course, other variations are possible.
It'd be just a matter of pointing to a CHICKEN 5-specific
.release-info.
That might be possible, yes.


felix
Peter Bex
2014-09-10 09:11:38 UTC
Permalink
Post by Mario Domenech Goulart
I think the versioning thing also affects the eggs in svn too. For
example, suppose right now you have egg foo at version 1.0. When
CHICKEN 5 is released, foo will be ported to CHICKEN 5. What version
will it be in CHICKEN 4 and 5?
I wrote up a little about that in the "roadmap" document. I think the
simplest is to just bump the major version of an egg when porting it.
Then, bugfix egg releases can still be made for CHICKEN 4 by bumping only
the minor version.
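
A minimal sketch of that numbering rule, using the "foo at version 1.0"
example from the question. It assumes plain two-component "MAJOR.MINOR"
version strings, and the helper name bump is made up for illustration.

```python
def bump(version: str, part: str) -> str:
    """Bump a "MAJOR.MINOR" egg version string.

    part="major": used when porting the egg to CHICKEN 5 (minor resets to 0).
    part="minor": used for a bugfix release on the CHICKEN 4 line.
    """
    major, minor = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0"
    if part == "minor":
        return f"{major}.{minor + 1}"
    raise ValueError(f"unknown part: {part!r}")


print(bump("1.0", "major"))  # the CHICKEN 5 port of foo
print(bump("1.0", "minor"))  # a CHICKEN 4 bugfix release of foo
```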
Post by Mario Domenech Goulart
I'm quite sure we have a solution for this problem, which is basically
the same as we had in the CHICKEN 3->4 transition. Unfortunately, since
I was the last person I know to switch to CHICKEN 4, I didn't actually
face that problem with my eggs. Since nobody else was using CHICKEN 3,
I just froze egg versions for that CHICKEN version and continued to
update eggs for CHICKEN 4 only.
I think most authors won't have enough energy to maintain two
simultaneous egg release branches anyway, aside from important bugfixes.
I suspect most authors will either switch immediately and ignore
CHICKEN 4 (except for important bugfixes, of course), or keep using
CHICKEN 4 for a while longer like you did, and then switch later.

Only for those who seriously want to maintain eggs for a very long time
(making large changes and making major releases on both branches) would
this pose a real problem.
Post by Mario Domenech Goulart
Another alternative would be using another .release-info file. That
approach wouldn't require changes in henrietta-cache.
For example, we'd have
https://raw.github.com/mario-goulart/awful/master/awful.release-info
for CHICKEN 4, and
https://raw.github.com/mario-goulart/awful/chicken-5/awful.release-info
or
https://raw.github.com/mario-goulart/awful/master/awful.release-info.5
for CHICKEN 5. Of course, other variations are possible.
I also mentioned this in the roadmap. It's also possible to move the
chicken 4 stuff to another branch, but that would require some
coordination to update the master egg list. The biggest problem with
creating a branch for CHICKEN 5 is that the master branch would not be
where development takes place, which is very unusual.

So far I prefer the awful.release-info.5 alternative. The existing "old"
release-info file would continue to be correct (and can even be updated
for newer CHICKEN 4 releases, which can be tags in another branch), and
development continues in master. The only confusing bit is that when
development continues on a CHICKEN 4 branch, the CHICKEN 4 release-info
file in the _master_ branch needs to be updated (unless we update the
master egg list, of course). This has a big potential for mistakes.

Alternatively, one could initialize a new repo for the CHICKEN 5 version,
but that's even uglier, I think.
Post by Mario Domenech Goulart
It'd be just a matter of pointing to a CHICKEN 5-specific
.release-info.
Yeah, it can be really simple :)

Cheers,
Peter
--
http://www.more-magic.net
John Cowan
2014-09-09 14:58:45 UTC
Permalink
Post by Peter Bex
Post by Felix Winkelmann
* Designing a decent POSIX API is a hard task. I have not seen any
reasonably good API wrapper for that yet - they are either too
lowlevel (Basis, Ocaml, etc.), or too highlevel.
Part of the problem is that Posix is huge. There are 1191 system interfaces
in 82 modules, in the 2013 version. Some of these are language-specific,
like setjmp/longjmp, but most are not. Any Posix binding (other than ones
that involve invoking arbitrary C code directly) has to be selective, and
I have never seen any principles set forth for selecting them.
Post by Peter Bex
Putting things like, for example, "directory" in some other unit would
make more sense to me, because there's nothing inherently POSIXy in
reading the contents of a directory.
+1
Post by Peter Bex
I'm not sure what will happen to POSIX in the future, but I think
its hegemony will end sooner rather than later. The landscape is
shifting so quickly with these mobile devices (think Windows Phone,
Firefox OS but also the crippled POSIX support on iOS and Android),
I think "crippled" is an exaggeration, at least for Android.
There is no Sys V IPC, but that's always been a marginal
feature for most users. Likewise, there is no pthread_cancel,
but arguably canceling a thread from outside the thread
is a design mistake anyway, except perhaps for a debugger. See
<http://docs.oracle.com/javase/1.5.0/docs/guide/misc/threadPrimitiveDeprecation.html>
for why Java doesn't have it.
Post by Peter Bex
the Linux crowd seems to be taking an increasingly aggressive stance
against "backwards compatibility" (think Wayland, systemd etc).
But these don't affect POSIX system interface conformance. Linux is
almost 100% conformant and I don't see it ceasing to be so.
Post by Peter Bex
Agreed. Recall that my suggestion was simply to "bless" UTF-8 as the
canonical internal representation (which is the case, de facto, anyway)
That is what I meant also, and perhaps adding the Chibi string-cursor API
for people who need more efficiency.
Post by Peter Bex
and *maybe* adding some detection code to reject invalid sequences
rather than just continuing with bogus data.
That doesn't concern me too much.
Post by Peter Bex
Possibly making the default string ops the ones from the UTF-8 egg.
+1 to that.
Post by Peter Bex
Anything beyond that is overkill [...]
+1
Post by Peter Bex
There is strong support from the community to do this, and I'm willing
to put in the required effort. I feel very strongly about adding at
least bignum support to core. I don't care as much about ratnums and
I don't care at all about compnums, but it may be simpler to add them;
the code to support them too is relatively straightforward.
+1 to bignums, +0 to the rest.
Post by Peter Bex
- When dealing with foreign procedures returning full-width 64-bit
integers, as those simply cannot be fully represented by flonums.
Right. Bignums that fit in 64 bits should be properly marshaled and
unmarshaled by the core. Biggernums that don't should provoke errors
at the FFI interface, in the same way (and for the same reasons)
that strings containing NUL do.
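
To make the rule concrete, here is a small sketch of the check at the
boundary. It is not CHICKEN's FFI code: the function names are invented,
only the signed 64-bit case is shown, and Python's exact integers stand in
for bignums.

```python
import struct

INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def marshal_int64(n: int) -> bytes:
    """Pack an exact integer as a signed 64-bit value, or raise OverflowError."""
    if not INT64_MIN <= n <= INT64_MAX:
        raise OverflowError(f"{n} does not fit in a signed 64-bit integer")
    return struct.pack("<q", n)


def unmarshal_int64(data: bytes) -> int:
    """Recover the exact integer from its 8-byte little-endian form."""
    return struct.unpack("<q", data)[0]


def marshal_c_string(s: str) -> bytes:
    """Encode a string for C as UTF-8 plus a terminating NUL.

    Embedded NULs are rejected, for the same reason oversized integers are:
    the value cannot be represented faithfully on the other side.
    """
    if "\0" in s:
        raise ValueError("string contains NUL and cannot be passed as a C string")
    return s.encode("utf-8") + b"\0"


# A full-width value survives the round trip exactly ...
assert unmarshal_int64(marshal_int64(INT64_MAX)) == INT64_MAX
# ... whereas going through a flonum (a double) silently changes it.
assert int(float(INT64_MAX)) != INT64_MAX
```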
Post by Peter Bex
- Having bignums be external to the core causes a lot of headaches when
one generates them and passes them to some library. For instance,
storing very large numbers in a database is perfectly sane and
generally possible with the DECIMAL type, but this requires all the
database eggs to pull in the numbers egg, which they currently don't.
In short, the numbers egg is "contagious".
Long ago I proposed to Felix an implementation of the numbers egg using
run-time hooks rather than modular renaming of procedures, such that +
would always go through a hook which would invoke either the fixnum-flonum
definition in core, or else the definition in any egg that had loaded
itself into this hook (with the understanding that such eggs don't compose).

I don't think this would play well with types, though.
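
For illustration only, the hook idea looks roughly like this. None of the
names below exist in CHICKEN; the fixnum range is an assumed 63-bit one,
and Python's exact integers play the part of the bignum egg.

```python
FIXNUM_MAX = 2 ** 62 - 1   # assumed fixnum range, for illustration
FIXNUM_MIN = -(2 ** 62)


def core_add(a, b):
    """Stand-in for core's fixnum/flonum '+'."""
    result = a + b
    if isinstance(result, int) and not FIXNUM_MIN <= result <= FIXNUM_MAX:
        return float(result)  # overflow falls back to an inexact flonum
    return result


def bignum_add(a, b):
    """Stand-in for the definition an egg would provide: always exact."""
    return a + b


_add_hook = core_add  # the single hook slot; core's definition by default


def install_add_hook(proc):
    """Replace the hook. Whoever installs last wins, so eggs don't compose."""
    global _add_hook
    _add_hook = proc


def add(a, b):
    """What '+' would become: every call goes through the hook."""
    return _add_hook(a, b)


print(type(add(FIXNUM_MAX, 1)).__name__)  # core definition: float
install_add_hook(bignum_add)
print(type(add(FIXNUM_MAX, 1)).__name__)  # egg definition: int
```

Note that add pays for one extra indirect call on every use, whichever
definition is installed.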
--
John Cowan http://www.ccil.org/~cowan ***@ccil.org
Income tax, if I may be pardoned for saying so, is a tax on income.
--Lord Macnaghten (1901)
Felix Winkelmann
2014-09-09 20:14:57 UTC
Permalink
Post by John Cowan
Long ago I proposed to Felix an implementation of the numbers egg using
run-time hooks rather than modular renaming of procedures, such that +
would always go through a hook which would invoke either the fixnum-flonum
definition in core, or else the definition in any egg that had loaded
itself into this hook (with the understanding that such eggs don't compose).
That is basically the right approach, but will be terribly inefficient,
and I mean _really_ inefficient. Any attempt to get fast code must
somehow compile to primitive arithmetic instructions, integer or
float. We do "reasonably" well right now, but still need better
performance (i.e. unboxed flonum operations). If a tight loop with an
integer counter needs an extra out-of-line (or even CPS)
procedure-call, we lose heavily.


felix