1 Replies Last post: Feb 15, 2008 6:05 PM by Duke

Dealing with unexpected char encodings

Feb 15, 2008 12:26 PM

rshiplett (MVP), 36 posts since Oct 17, 2007

In my computer hobby (flight simulators) I pull down a lot of zipped files, which means folks send me their thumbs.db files and all their dot-files.

For simulators that run on Mac and Linux as well as Windows, the uninvited files just mean routine cleanup, or dealing with protests from Win32 EXPLORER whenever you move a directory (really common when updating aircraft, scenery, cockpits etc.)

In CURL the unexpected usually comes to you as an SCURL file which, when you go to save it under Windows, suddenly expects you to choose a character encoding for its UNICODE content. You try Windows Latin-1 but it is rejected. So you can use UTF-8 and ignore this or you can ask why. I ask why.
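One way to ask "why" outside the IDE is to test each character against the encoding you wanted. The sketch below is Python, not Curl, and it assumes that Python's `cp1252` codec is a fair stand-in for what the Curl IDE calls windows-latin-1; the function name `unencodable_chars` is hypothetical.

```python
def unencodable_chars(text, encoding="cp1252"):
    """Return (index, char, code point) for each character the codec rejects.

    Assumption: "cp1252" approximates Curl's "windows-latin-1".
    """
    problems = []
    for index, char in enumerate(text):
        try:
            char.encode(encoding)
        except UnicodeEncodeError:
            problems.append((index, char, f"U+{ord(char):04X}"))
    return problems


if __name__ == "__main__":
    # A zero-width space (U+200B) hiding after a string literal.
    sample = 'x = "ok"\u200b y'
    for index, char, codepoint in unencodable_chars(sample):
        print(f"offset {index}: {codepoint} cannot be saved as cp1252")
```

If the list comes back empty, the encoding itself is probably not the reason the save was rejected.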

At the moment the only way I know to get an answer is CTRL-A CTRL-C and then to paste the contents of that SCURL into a DOS editor and save as DOS 7-bit ASCII.

Then I delete the contents of the SCURL with a CTRL-X and SAVE with CTRL-S then paste the 'clean' text into the editor and now I can save without being forced to choose a decoder. Which means that now I can choose my encoding.

When we create a new SCURL in the IDE, we are being offered a char-encoding and that is great. Offered is not forced. Mine comes up as

{ curl-file-attributes character-encoding = "windows-latin-1" }

So what will you find if you browse a suspect file? Often nothing. The offending codes may be non-printing. They may have been left over as 'blanks' at the end of a revised but untrimmed string comment.
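Since browsing by eye often shows nothing, a small scan can list what is hiding. This is a minimal Python sketch, not a Curl facility: it flags control and format characters (Unicode categories Cc and Cf, tabs excepted) and trailing blanks, with line and column. The name `suspicious_chars` and the choice of categories are assumptions made for illustration.

```python
import unicodedata


def suspicious_chars(source):
    """Return (line, column, description) for hidden characters and trailing blanks."""
    findings = []
    # Split on "\n" only, so other control characters stay inside their line.
    for line_no, line in enumerate(source.split("\n"), start=1):
        line = line.rstrip("\r")  # tolerate Windows line endings
        for col, char in enumerate(line, start=1):
            # Cc = control characters, Cf = invisible format characters.
            if char != "\t" and unicodedata.category(char) in ("Cc", "Cf"):
                findings.append((line_no, col, f"U+{ord(char):04X}"))
        # Trailing spaces, tabs, or non-breaking spaces left on the line.
        stripped = line.rstrip(" \t\u00a0")
        if len(stripped) < len(line):
            findings.append((line_no, len(stripped) + 1, "trailing blanks"))
    return findings


if __name__ == "__main__":
    text = "a\x1fb\n{let x = 1}  \nclean"
    for finding in suspicious_chars(text):
        print(finding)
```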

Often they are evil. Sometimes they are silly. I rank as 'evil' those magic delimiters which were considered 'non-printing' characters and so used as delimiters of data fields.

Had they been declared constants, they could at least have been in a commented file explaining the need for the bizarre encoding being forced on you.

The 'silly' are double quotes which ain't. If you are lucky they appear as strong, thicker, almost bold quotes. They are opening and closing quotes which your font is displaying as pairs of ticks instead of left-handed and right-handed twin quotes.
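The quotes in question are most likely the typographic pairs U+201C/U+201D (and their single cousins U+2018/U+2019). A hedged Python sketch for locating and straightening them follows; the table `QUOTE_MAP` and both function names are hypothetical, and the table only covers the common cases.

```python
# Typographic quotes mapped to their 7-bit ASCII equivalents (not exhaustive).
QUOTE_MAP = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u201e": '"',  # double low-9 quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
}


def find_typographic_quotes(text):
    """Return (offset, code point) for every non-ASCII quote in text."""
    return [(i, f"U+{ord(c):04X}") for i, c in enumerate(text) if c in QUOTE_MAP]


def straighten_quotes(text):
    """Replace typographic quotes with plain ASCII quotes."""
    return text.translate(str.maketrans(QUOTE_MAP))


if __name__ == "__main__":
    line = "{text \u201chello\u201d}"
    print(find_typographic_quotes(line))
    print(straighten_quotes(line))
```

Straightening is only safe when the quotes were meant as delimiters; inside displayed text the curly form may be intentional.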

Ruby is worse. It has $` and $' as descendants of the regexp $~ and Ruby strings are sequences of 8-bit chunks (or they once were).

To see the advantage of CURL, you might need to look at what is required to move a language which was not UNICODE from the start, to being UNICODE.

Recently the MERCURY project (http://www.cs.mu.oz.au/research/mercury , a typed Prolog rather like commercial PDC Visual Prolog) retracted any impression of being UNICODE. The group maintaining SWI-PROLOG with Jan W. at http://www.swi-prolog.org/ has recently gone through a major effort, as perusal of their mail list would reveal. REBOL3 is now in alpha as it makes the major move to UNICODE (I follow the 'frontline' blog of Carl Sassenrath at Rebol.net).

What I wonder is if an alternate CURL editor could benefit from not 'display as HEX' but 'display-as-7-bit' ? Any non-printing character which is not '\u0000' would display as some suspect color (absinthe wormwood-green comes to mind ... no, those are my comments ) and quotes which are not ascii quotes would display in hot-pink. I also would like an option to trim all lines after terminal }
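A rough idea of what 'display-as-7-bit' could mean, sketched in Python rather than as a Curl editor extension: printable ASCII passes through untouched and everything else is shown as a visible escape. Escapes stand in for the colors here, and both function names are hypothetical.

```python
def display_as_7bit(line):
    """Render a line so that anything outside printable ASCII becomes visible."""
    out = []
    for char in line:
        if " " <= char <= "~":
            out.append(char)  # printable 7-bit ASCII stays as-is
        else:
            # Tabs, control codes, curly quotes, etc. become \uXXXX escapes
            # (code points above U+FFFF simply get more hex digits).
            out.append(f"\\u{ord(char):04X}")
    return "".join(out)


def trim_after_closing_brace(line):
    """Drop trailing blanks, but only on lines whose last visible character is }."""
    stripped = line.rstrip(" \t\u00a0")
    return stripped if stripped.endswith("}") else line


if __name__ == "__main__":
    print(display_as_7bit("{text \u201chi\u201d}\u00a0"))
    print(repr(trim_after_closing_brace("{value 1}   ")))
```

An editor version would presumably map the two classes (non-printing versus non-ASCII quotes) to two highlight colors instead of escapes.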

Or are those features already there? And where is the "Start Page" and how do you edit it?

And "Add File Resource" should create the SCURL file if you type a name in the dialog that is not there in the selected directory. And our file dialogs should show something useful on the title-bar ... such as 'SAVE AS' on drive J:\curl\projects\this-project and if it is a package, the name ... such as RANT-UI

And split panes should show which project contains that file or which package or which directory ... and so I need to get to work on an opensource editor alternative and will you join me? I'll have another cuppa decaf ... The CURL docs give some details on editor extension facilities.

In the meantime, is our best option to go to Edit | Preferences and set the editor to a smarter font family than 'mono-spaced' ? Suggestions?

One tip: the help for surge-do.exe reveals how to bring up a file in the IDE editor from the command line ;-)

Duke (Curl), 179 posts since Oct 17, 2007
1. Re: Dealing with unexpected char encodings Feb 15, 2008 6:05 PM

One thing that might bite you sometime is that I see a note in the Editor docs that says

Note: The Source Editor converts TAB characters and non-breaking SPACE characters into SPACE characters.
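To see in advance what that conversion would touch, a file can be checked before it is opened in the editor. This Python sketch is an outside approximation only: the note does not say whether a TAB becomes one SPACE or several, so the one-for-one replacement in `simulate_editor_conversion` is an assumption, and both function names are hypothetical.

```python
def editor_would_change(text):
    """Count the characters the editor note says will be converted."""
    return {"tabs": text.count("\t"), "nbsp": text.count("\u00a0")}


def simulate_editor_conversion(text):
    """Approximate the conversion (ASSUMPTION: one SPACE per converted character)."""
    return text.replace("\t", " ").replace("\u00a0", " ")


if __name__ == "__main__":
    sample = "{let\tx\u00a0= 1}"
    print(editor_would_change(sample))
    print(simulate_editor_conversion(sample))
```

This matters most where a TAB or non-breaking SPACE sits inside a string literal, since the saved file would then differ from the data you started with.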