As part of my efforts to learn more about functional programming in
general and Haskell in particular, I chose a small project tangentially
related to my web programming work at Enterity: a URL encoder and
decoder. If anyone out there knows of a built-in for converting from
Int to Hex and back again, I'd be much obliged. I didn't see anything
while skimming the standard libraries via Zvon.org.
module Main where

import Data.Char (chr, ord, toLower)
import Data.List (elemIndex)
import System.Environment (getArgs)

main :: IO ()
main = do
    args <- getArgs
    case args of
        ("encode" : rest)    -> putStrLn (encode (concat rest))
        ("decode" : str : _) -> putStrLn (decode str)
        _                    -> putStrLn "usage: encode TEXT... | decode TEXT"

-- --------------------------------------
encode :: String -> String
encode = concatMap encodeChar

-- A '%' must be followed by two hex digits; a truncated escape at the
-- end of the input is passed through untouched instead of crashing.
decode :: String -> String
decode ""                 = ""
decode ('%' : a : b : cs) = fromHex [a, b] : decode cs
decode (c : cs)           = c : decode cs

-- --------------------------------------
encodeChar :: Char -> String
encodeChar c
    | isLegalChar c = [c]
    | otherwise     = '%' : toHex (ord c)

isLegalChar :: Char -> Bool
isLegalChar c = c `elem` (['A' .. 'Z'] ++ ['a' .. 'z'] ++ ['0' .. '9'] ++ "-_.!~*'()")

-- --------------------------------------
-- Always emits at least two digits, so '\n' becomes "0a" and decode can
-- rely on exactly two characters after each '%'. Code points above 255
-- still produce longer output that decode cannot round-trip.
toHex :: Int -> String
toHex x = replicate (2 - length ds) '0' ++ ds
  where
    ds = digits x
    digits 0 = ""
    digits n = digits (n `div` 16) ++ [toChar (n `mod` 16)]

fromHex :: String -> Char
fromHex [a, z] = chr (fromChar a * 16 + fromChar z)
fromHex _      = error "fromHex: expected exactly two hex digits"

-- --------------------------------------
hexDigits :: String
hexDigits = ['0' .. '9'] ++ ['a' .. 'f']

toChar :: Int -> Char
toChar x = hexDigits !! x

-- Accepts upper- and lower-case digits; anything else counts as 0.
fromChar :: Char -> Int
fromChar x = case elemIndex (toLower x) hexDigits of
    Nothing -> 0
    Just y  -> y
Some time ago I encountered the concept of Literate Programming.
I was intrigued by the idea but never really explored taking up the
practice. I don’t really remember why, though it probably had
something to do with a lack of tuits and no obvious support for it in
my main language, Perl. I also was not really at a point in my
programming career where it would have occurred to me to implement
tools for Literate Perl.
I was reacquainted with it more recently when I started to learn Haskell, which has native support
for literate programs. Haskell is a language that has some intrinsic
appeal for me that I can’t quite put a finger on, but again, between not
having yet grasped the Haskell mindset and not having time for a new
project, I let that slip by too (I suspect the former will not occur
until I have time for the latter).
More recently I’ve been on a
DRY (Don’t Repeat Yourself) kick in my day-to-day programming. Of
course I’d always practiced DRY to some degree, but lately I’ve been
getting more insistent with myself that I really keep on top of
duplicate or near-duplicate code.
This probably came to a head when I read Steve Yegge’s rant about code-base size. The code base for my main project
is large enough that I don’t visit all its layers, nooks and crannies
on a regular basis. The main problem of code maintenance, as many
programmers will attest, is memory loss. Our own,
that is. Remembering why you wrote code the way you did when you
haven’t seen it in a week, much less months, is a major hurdle when
returning to it. The two solutions I see to reduce the effects of
memory loss are solid documentation and reducing the amount of code
that needs remembering in the first place. Naturally, I chose to look
at the latter.
Dispensing with Boilerplate

I imagine that any programmer who takes DRY seriously will at some point
take a hard look at “boilerplate code”. More incantation than
instruction, boilerplate code tends to have a very low signal-to-noise
ratio; most of it is there only to make the compiler happy.

This being Perl, most of the boilerplate code was of my own making. So I hacked
and refactored and experimented with shorter ways of expressing the
same intent (another story for another time) and made some good headway
towards weeding out the verbosity of my code, without sacrificing (and
in some cases enhancing) clarity.
As all good things must be
taken to extremes, eventually I started eyeing some of the standard
Perl incantations and wondered about doing away with them.
First, library files (modules) in Perl must end with a statement that
evaluates to true, or they will fail to load. I am not clear on where
this legacy comes from, but the convention is to append a “1;” at the
end of the file to keep the compiler happy. A whole whopping 2 characters, but it wasn’t adding
any meaning to my code, and I aimed to remove it.
Second, best practices recommend coding under “strict” mode and, at least during
active development, under “warnings” mode as well. This makes the Perl
interpreter as intolerant as possible of your errant ways, and I take
these best practices to heart. In fact, the forthcoming 6th version of
Perl makes “strict” the default state, so rather than pronouncing
strict mode at the top of every file, you instead only declare when you
intend to be naughty by explicitly shutting off strictures. I (and
many others) am impatient for Perl6’s arrival, and I decided I wanted
this little bit of code reduction here and now.
A Preprocessor

These two annoyances got me thinking about writing a preprocessor. I didn’t want to give up
strictures, and I needed the compiler to accept my modules, but I also
did not feel that adding these incantations to each of my files was
doing me any meaningful good. The “1;” for the module was legacy
nonsense, and the strict and warnings pragmas were policies I’d prefer to
set once across my whole project rather than for each file. So I
figured I could add these in programmatically.
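
As a rough illustration of that idea (a sketch only, not the preprocessor's actual code), the step could be as small as the function below. The name `add_boilerplate` and the `module` option are made up for this example, and the sketch assumes the source has no shebang line that must stay first.

```perl
use strict;
use warnings;

# Hypothetical helper: supply the boilerplate that the files on disk omit.
# Neither the function name nor the "module" option comes from the real tool.
sub add_boilerplate {
    my ( $source, %opt ) = @_;

    # The pragmas are set once here instead of at the top of every file.
    my $out = "use strict;\nuse warnings;\n" . $source;

    # Make sure the appended statement starts on its own line.
    $out .= "\n" unless $out =~ /\n\z/;

    # Modules must end on a true value; scripts do not need it.
    $out .= "1;\n" if $opt{module};

    return $out;
}

# Usage: a module body written without any of the incantations.
print add_boilerplate(
    "package Greeter;\nsub hello { return 'hi' }\n",
    module => 1,
);
```

Keeping the policy in one function means a project-wide change, such as turning warnings off for release builds, touches one place rather than every file.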
At about this point I ran across the literate programming idea again. I figured that as
long as I was already running my code through a preprocessor, I may as
well have it convert my code from a literate style at the same time. I
can’t really recall what brought it up, but by some
route I found myself reading an older article by Mark-Jason Dominus, “Pod is not Literate Programming”, about how POD lacks certain key features of a literate programming system. He pointed to noweb
as a language-agnostic tool for Literate Programming, but for some
reason that did not appeal to me. I thought, well, this is Perl!
There must be something on CPAN.
And
indeed there were several attempts on CPAN at implementing Literate
Perl, but none addressed one of MJD’s core issues with Pod, which was
that it doesn’t let you rearrange your code from human-reading order to
compiler-reading order.
The closest I found to what I wanted was Audrey Tang’s Filter::LiterateComments. She chose a style very much in keeping with Haskell’s notation, which is unsurprising given her background with the language. But as a source code filter, it never had a chance to do code reordering, and on its own it would not be able to create typeset documentation.
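
To make the reordering idea concrete, here is a minimal sketch of noweb-style anchor expansion. It is an illustration under assumptions, not code from any of the modules mentioned: `expand_anchors` and the chunk names are hypothetical, and the pattern is naive enough to mistake an expression like `1 << 2 >> 1` for an anchor.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every <<chunk name>> anchor with the body of
# the named chunk, recursively, so chunks can be written in reading order
# and emitted in the order the compiler needs.
sub expand_anchors {
    my ( $text, $chunks, $seen ) = @_;
    $seen ||= {};

    $text =~ s{<<\s*(.+?)\s*>>}{
        my $name = $1;
        die "Unknown chunk '$name'\n" unless exists $chunks->{$name};
        die "Chunk '$name' includes itself\n" if $seen->{$name};
        expand_anchors( $chunks->{$name}, $chunks, { %$seen, $name => 1 } );
    }ge;

    return $text;
}

# Usage: the top-level chunk names its parts; the parts can be defined
# (and explained) anywhere else in the document.
my %chunks = (
    'main'  => "sub run {\n<<setup>>\n<<do the work>>\n}\n",
    'setup' => "    my \$count = 1;",
    'do the work' => "    return \$count + 1;",
);
print expand_anchors( $chunks{main}, \%chunks );
```

The `$seen` hash tracks the chain of chunks currently being expanded, so a chunk that includes itself, directly or indirectly, is reported instead of recursing forever.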
Pretty Documentation

Now that I was committed (in both senses of the word) to writing a
preprocessor, I wanted to one-up the typesetting question by
prettifying the code as well as the explanation in the documentation.
I knew there had been some work done with PPI to do syntax highlighting, and I had already decided to rely on Perltidy
to enforce code formatting conventions. Further, I recognized that
there were several Perl operators written in ASCII (such as -> and
=>) that were really just stand-ins for untypeable symbols. I
figured that in the interest of legibility I ought to replace them with
their intended entities in the documentation.

So the plan involved extracting each code block in the source code, running it
through perltidy, then PPI::HTML, and then doing a search-and-replace
on operators that were stand-ins for more legible symbols. That would
turn something like this:
sub dump_code {
    my ( $parser, $extension ) = @_;
    my $source;
    <<replace anchors with code snippets >>
    $source .= "use strict;\n" if $parser->{_use_strict};
    $source .= "use warnings;\n" if $parser->{_use_warnings};
    $source .= join(
        '',
        map( { $parser->{_code}{$_} }
            grep { defined $parser->{_code}{$_} }
                @{ $parser->{_code_sections} } )
    );
    <<source code amendments based on file type>>
    return tidy( $source );
}
into this:
sub dump_code {
    my ( $parser, $extension ) = @_;
    my $source;
    $source ⋅= "use strict;\n" if $parser→{_use_strict};
    $source ⋅= "use warnings;\n" if $parser→{_use_warnings};
    $source ⋅= join(
        '',
        map( { $parser→{_code}{$_} }
            grep { defined $parser→{_code}{$_} }
                @{ $parser→{_code_sections} } )
    );
    return tidy($source);
}
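
The final search-and-replace step might look roughly like the following. This is a simplified sketch, not the preprocessor's actual code: `prettify_operators` and `%SYMBOL_FOR` are made-up names, only the two substitutions visible in the example above are handled, and quoted strings are skipped with a plain regex rather than a real Perl parser.

```perl
use strict;
use warnings;

# Hypothetical mapping from ASCII operators to their typeset stand-ins.
my %SYMBOL_FOR = (
    '->' => "\x{2192}",     # right arrow
    '.=' => "\x{22C5}=",    # dot operator followed by '='
);

sub prettify_operators {
    my ($code) = @_;

    # Splitting on a captured pattern keeps the delimiters, so the
    # odd-numbered chunks are the quoted strings and the even-numbered
    # chunks are everything else.
    my @chunks = split /("(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*')/, $code;

    for my $i ( 0 .. $#chunks ) {
        next if $i % 2;    # leave string literals exactly as written
        $chunks[$i] =~ s/(->|\.=)/$SYMBOL_FOR{$1}/g;
    }

    return join '', @chunks;
}

# Usage: set the output encoding before printing the typeset symbols.
binmode( STDOUT, ':encoding(UTF-8)' );
print prettify_operators('$source .= "a -> b\n" if $parser->{_use_strict};'), "\n";
```

Leaving string literals alone matters because a `->` inside a quoted string is data, and rewriting it would make the documentation disagree with the code it describes.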
Bootstrapping

Since this is a tool for literate programming, naturally I wanted to be able to write the
library itself as literate Perl. The dilemma, of course, is that the
literate format can’t be executed until it’s translated into regular
Perl. What I found was that I had to first develop the code in a
non-literate style and get it to a point where it could process literate
Perl. At that point I could run the "compiled" non-literate
version on the new literate copy of the module in order to get a
runnable version of the newest code.
Where To Go From Here
The system I have written works well, but there are still a few things I need to do before I can release it to CPAN:
- It cannot at the moment create a linked TOC for the headings in the documentation.
- The preprocessor does not spit out a pure POD version of the docs, which is
what CPAN wants for module documentation. Indeed, the literate
documentation isn't really even the sort of documentation CPAN users
are looking for. It may mean putting the overview documentation in a
separate pod file, or marking a certain portion of the literate
document as appropriate for extraction for use on CPAN.
- I actually need to write a bit more of the documentation. While the Pod::Literate module and the literate preprocessing script
are written in the literate style, there are still some longer code
sections that I have not augmented with appropriate documentation.
- Similarly, I have not written any tests. Running it on its own source code has
been a pretty good test in itself, but I know I will get frowns from
the community if I ship without a good test suite. From the tests-first
segment, I'm sure I'm already getting those frowns.

Then there are a few things I'd like to do (or have done) that aren't critical.
I'd really like a literate-aware version of perltidy that I could use
to clean up the literate source. Right now it wouldn't know what to do
with it. Similarly, I'd like to investigate what it would take to make Perl::Critic compatible with the literate style files.