Changes to latex2html 96.3 compared to latex2html 96.1:

latex2html 96.3 changes a lot of things relative to previous versions.
The main disadvantages:

  * The way do_cmd_ and do_env_ subroutines are called and the way
    arguments are retrieved have changed. This means that old .perl
    files no longer work and need to be rewritten.

  * Perl 5.001 or newer is required.

  * Many new bugs have probably been introduced.

However, in general, the advantages of 96.3 should greatly outweigh the
disadvantages. Most of the changes make latex2html behave more like
TeX/LaTeX. As a result, many things now work correctly, and things that
used to require awkward workarounds are now straightforward. The HTML
generated is also much closer to the standard and more likely to
validate.

Generally, latex2html now works directly on the raw LaTeX code, with no
preprocessing of any kind. Texexpand is no longer needed, and since the
file is no longer split, no DBM routines are required (these caused
problems for some users). The code is processed in sequential order
instead of innermost environment first. Definitions and \newcommand
declarations are processed when they occur, so real scoping is possible
(i.e., a definition is local to its scope). Mixing verbatim environments
with \input commands is trivial. Since no preprocessing occurs, the
original code is always available for image generation.

Here is a more detailed list of changes (it may not cover all of them):

  * Major change in the way do_cmd_ subroutines are used. Subroutines
    are called with two parameters. The first one contains all LaTeX
    code following the command (as before). However, programmers are
    strongly advised not to use

        local($text) = @_;

    or similar to get at the text. Instead, this parameter should always
    be accessed by reference, i.e. as $_[0], for two reasons:

    1. local($text) = @_; makes another copy of the (possibly long)
       text following the command, wasting both time and memory.
       Accessing $_[0], in contrast, is a call by reference and requires
       no copying.

    2. If the subroutine needs to access an argument, the argument has
       to be removed from the following text. This, however, is not
       possible if a copy of the text is used.

    The second parameter contains the exact piece of LaTeX code that
    triggered the call to the subroutine. For example, do_cmd_bgroup may
    be called both by `{' and by `\bgroup'.

    The return value of do_cmd_ subroutines is also different. It used
    to be the "value" of the LaTeX command expressed as HTML markup plus
    the following text, which meant yet another copy of the (possibly
    large) LaTeX code. Now the return value contains only the HTML
    markup produced by the command, nothing more.

  * Arguments are now *always* retrieved with get_next_argument or
    get_next_optional_argument, which *must* be called with $_[0] as
    parameter so that the argument can be removed from the following
    text. These subroutines are now brace aware: a closing bracket
    enclosed in braces does not end an optional argument, e.g.

        \section[The {]} bracket]{A section on closing brackets}

  * Environments are now processed just like any other commands, i.e.
    there are now do_cmd_begin and do_cmd_end. Consequently, each
    environment now needs two subroutines instead of one, called
    do_env_begin_* and do_env_end_*. Grouping is handled automatically
    by do_cmd_begin and do_cmd_end.

  * Real grouping and local declarations and definitions. If you define
    a macro, its scope is limited to the local group. This is done via
    do_cmd_bgroup and do_cmd_egroup and an execute stack: every time a
    declaration or definition is executed, a piece of Perl code that
    reverses it is pushed onto the execute stack.
    At the end of the group, the code on the stack is executed, so that
    the same declarations and definitions as before the group are in
    effect again.

  * All grouping, as well as all definitions and declarations, is now
    copied to the images.tex file. That is, every {, every }, every \bf,
    every \def, etc. is copied to images.tex, so that definitions and
    declarations have the same effect on images as they do in the
    original LaTeX code.

  * The preamble text is copied to images.tex unmodified.

  * Support for ALT text in images (e.g. the LaTeX code for math
    equations, more useful ALT text for footnotes).

  * Support for trial mode: certain math equations can be translated
    without the use of images. In trial mode, a translation is
    attempted; if it succeeds, the translation is used, otherwise the
    equation is converted to an image.

  * No more use of DBM (one of the major sources of problems under
    Linux).

  * No more preprocessing. All translation is done on the raw LaTeX
    code.

  * No more use of texexpand. This should solve all the problems that
    previous versions had with \input (e.g. \input inside figures).

  * Support for catcodes. This makes supporting the alltt environment
    or german.sty trivial. You can also write:

        \catcode`\%=12 This: % is no longer a comment.

  * Comments are now translated into HTML comments. For example,

        This: % is a comment

    is translated to

        This: <!-- is a comment -->

    Note that the way arguments are retrieved has a side effect on
    comment parsing:

        \mbox{\catcode`\%=12 This: % is still a comment.}

    That is, even though \catcode declares % to be a regular character,
    it is still treated as a comment character, because the whole
    argument of \mbox is read before the \catcode assignment takes
    effect. LaTeX behaves the same way and also sees this as a comment.
    In other words, latex2html's comment parsing is now closer to that
    of LaTeX.

  * Number of system calls reduced to a minimum, for portability
    reasons: system calls generally do not port well to the Mac, PC, or
    VMS. The only remaining non-portable calls should now reside in the
    image generation routines.
    The hope is that, as development on latex2html progresses, other
    platforms can use the updated versions unmodified.

  * Font declarations now finally work correctly; the routines have been
    completely rewritten. latex2html now keeps track of the current font
    specification. Whenever something changes (e.g. size, boldness,
    italics), a subroutine is called that compares the current and the
    desired font specifications and determines the proper HTML code for
    the switch. The same subroutine is called at the end of the scope to
    return to the old specification.

  * The HTML produced is much closer to standard HTML. For example, font
    commands no longer extend beyond the end of the paragraph (markup
    such as "<B>Par 1<P>Par 2</B>" is illegal).

  * Full support for TeX-style \def.

  * texdefs.perl is now part of the main latex2html script.

  * Better conversion of accents (two-character workarounds are used if
    an accented character is not available).

  * Many more comments in the Perl code.

  * Much better support for counters (now done in HTML).

  * Much better support for lists (\usecounter is supported).

  * Behaviour of \label closer to the LaTeX behaviour: depending on
    whether \ref or \pageref is used, it can now point to either the
    current section or the exact location of the \label.

  * Image generation now uses information extracted from the log file.
    While the images.tex file is written, \lthtmltypeout commands are
    interspersed that write information to the log file, such as the
    current page numbers, the names and sizes of the boxes containing
    the image-generating code, current counter values, and image
    parameters from the \htmlimg command. This makes the process more
    robust for images that extend over more than one page and has other
    advantages (e.g. textogif might use the information).

  * Better support for links to sections and images (i.e. if a \label is
    part of an image, the image is embedded inside an anchor instead of
    an "empty" anchor being added before the image).

  * Different platforms use different directory delimiters in their
    pathnames, and different path delimiters to separate the entries of
    path lists. For example, under Unix the directory delimiter is '/'
    and the path delimiter is ':', whereas on a Mac the directory
    delimiter is ':' and the path delimiter is ','. To improve
    portability, two new variables, $dd and $pd, have been introduced.
    They should always be used when files are referenced by their full
    path name.

  * The elements of the %section_info and %toc_section_info hashes are
    now proper lists, making post_process more efficient.
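The do_cmd_ calling convention described in the list above relies on a Perl feature: @_ aliases the caller's arguments. The following minimal sketch shows a command subroutine that consumes its argument from $_[0] and returns only its HTML. The names get_next_argument_sketch and do_cmd_emph_sketch are hypothetical stand-ins, not latex2html's actual routines.

```perl
use strict;
use warnings;

# Hypothetical stand-in for get_next_argument. It strips a leading
# {...} group from the string passed as $_[0] and returns its contents.
# Because @_ aliases the caller's variables, the removal is visible to
# the caller and the text is never copied. (This sketch does not handle
# nested braces.)
sub get_next_argument_sketch {
    return $_[0] =~ s/^\s*\{([^{}]*)\}// ? $1 : '';
}

# Sketch of a do_cmd_ subroutine under the 96.3 convention:
#   $_[0]  the LaTeX text following the command, accessed by reference
#   $_[1]  the exact LaTeX code that triggered the call (unused here)
# Only the HTML produced by the command is returned; the remaining text
# stays in the caller's variable.
sub do_cmd_emph_sketch {
    my $arg = get_next_argument_sketch($_[0]);
    return "<EM>$arg</EM>";
}

my $text = '{important} and the rest';
my $html = do_cmd_emph_sketch($text, '\emph');
print "$html\n";    # prints "<EM>important</EM>"
print "$text\n";    # prints " and the rest"
```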
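Brace-aware parsing of optional arguments, as in the \section example in the list, can be sketched as a scan that counts brace depth. The helper get_optional_argument_sketch below is a hypothetical illustration. It assumes only that a ']' inside {...} must not end the argument, and it is not the real get_next_optional_argument.

```perl
use strict;
use warnings;

# Hypothetical, brace-aware stand-in for get_next_optional_argument.
# Removes a leading [...] from $_[0] (modifying the caller's string)
# and returns its contents. A ']' only closes the argument when it is
# not inside a {...} group. Returns undef if there is no optional
# argument, leaving the text unchanged.
sub get_optional_argument_sketch {
    return undef unless $_[0] =~ /^\s*\[/;
    my $start = $+[0];                  # offset just past the '['
    my $depth = 0;                      # current {...} nesting depth
    for my $i ($start .. length($_[0]) - 1) {
        my $c = substr($_[0], $i, 1);
        if    ($c eq '{') { $depth++ }
        elsif ($c eq '}') { $depth-- if $depth > 0 }
        elsif ($c eq ']' && $depth == 0) {
            my $arg = substr($_[0], $start, $i - $start);
            substr($_[0], 0, $i + 1) = '';    # drop "[...]" from the text
            return $arg;
        }
    }
    return undef;                       # no matching ']' found
}

my $text = '[The {]} bracket]{A section on closing brackets}';
my $opt  = get_optional_argument_sketch($text);
print "$opt\n";     # prints "The {]} bracket"
print "$text\n";    # prints "{A section on closing brackets}"
```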
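Splitting each environment into do_env_begin_* and do_env_end_* subroutines can be pictured as name-based dispatch from do_cmd_begin and do_cmd_end. In this sketch, do_cmd_begin_sketch, do_cmd_end_sketch, the quote handlers, and the @groups array are all hypothetical. The array stands in for the automatic grouping that the real subroutines perform.

```perl
use strict;
use warnings;

my @groups;    # hypothetical stand-in for the group handling

# Hypothetical handlers following the do_env_begin_*/do_env_end_* scheme.
sub do_env_begin_quote { return '<BLOCKQUOTE>' }
sub do_env_end_quote   { return '</BLOCKQUOTE>' }

# Sketch of do_cmd_begin: take {name} off the front of $_[0], open a
# group, and dispatch to do_env_begin_<name> if such a sub exists.
sub do_cmd_begin_sketch {
    return '' unless $_[0] =~ s/^\s*\{(\w+)\}//;
    my $env = $1;
    push @groups, $env;
    my $handler = main->can("do_env_begin_$env");
    return $handler ? $handler->($_[0]) : '';
}

# Sketch of do_cmd_end: the mirror image, closing the group.
sub do_cmd_end_sketch {
    return '' unless $_[0] =~ s/^\s*\{(\w+)\}//;
    my $env = $1;
    pop @groups;
    my $handler = main->can("do_env_end_$env");
    return $handler ? $handler->($_[0]) : '';
}

my $text = '{quote}Quoted text';
print do_cmd_begin_sketch($text), "\n";   # prints "<BLOCKQUOTE>"
print "$text\n";                          # prints "Quoted text"
```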
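The execute-stack mechanism for local definitions can be illustrated with a small model. This sketch assumes a simple macro table. Where the changelog describes pushing pieces of Perl code, the sketch pushes closures instead. The names bgroup, egroup, and define_macro are hypothetical, not latex2html's subroutines.

```perl
use strict;
use warnings;

my %macros;           # current macro definitions (name => body)
my @execute_stack;    # undo actions and group markers

# Hypothetical equivalents of do_cmd_bgroup/do_cmd_egroup. A plain
# string marks the start of each group; undo actions are code refs.
sub bgroup { push @execute_stack, 'GROUP' }

sub egroup {
    while (@execute_stack) {
        my $entry = pop @execute_stack;
        last unless ref $entry;     # reached this group's marker
        $entry->();                 # undo one definition
    }
}

# Define a macro and push code that restores its previous state.
sub define_macro {
    my ($name, $body) = @_;
    if (exists $macros{$name}) {
        my $old = $macros{$name};
        push @execute_stack, sub { $macros{$name} = $old };
    } else {
        push @execute_stack, sub { delete $macros{$name} };
    }
    $macros{$name} = $body;
}

define_macro('foo', 'outer');
bgroup();
define_macro('foo', 'inner');
define_macro('bar', 'local only');
print "$macros{foo}\n";    # prints "inner"
egroup();
print "$macros{foo}\n";    # prints "outer"
my $status = exists $macros{bar} ? 'defined' : 'undefined';
print "bar is $status\n";  # prints "bar is undefined"
```

Because each group only unwinds back to its own marker, nested groups restore definitions level by level, as TeX does.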
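Trial mode, as described in the list, means trying the HTML translation first and falling back to an image if the translation fails. The following minimal sketch uses Perl's eval for this. It assumes a hypothetical translator, try_translate_math, that handles only a tiny subset of math and dies on anything else. The image step is represented by a placeholder string.

```perl
use strict;
use warnings;

# Hypothetical translator for a tiny subset of math: letters, digits,
# spaces, '+', '-', '=' and single-character superscripts like x^2.
# It dies on anything else, signalling that the trial failed.
sub try_translate_math {
    my ($tex) = @_;
    die "unsupported construct\n" if $tex =~ /[^A-Za-z0-9 +\-=^]/;
    die "dangling superscript\n"  if $tex =~ /\^(?![A-Za-z0-9])/;
    (my $html = $tex) =~ s{\^([A-Za-z0-9])}{<SUP>$1</SUP>}g;
    return $html;
}

# Trial mode: keep the HTML translation if it succeeds; otherwise fall
# back to image generation, represented here by a placeholder string.
sub translate_math_trial {
    my ($tex) = @_;
    my $html = eval { try_translate_math($tex) };
    return defined $html ? $html : "[image for: $tex]";
}

print translate_math_trial('x^2 + y^2 = z^2'), "\n";
# prints "x<SUP>2</SUP> + y<SUP>2</SUP> = z<SUP>2</SUP>"
print translate_math_trial('\frac{a}{b}'), "\n";
# prints "[image for: \frac{a}{b}]"
```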
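The interaction between catcodes and comment translation can be sketched for the single character %. This is a toy model that assumes a one-entry catcode table and at most one comment per line. The function translate_comments_sketch is hypothetical. The exact HTML comment format it produces is an assumption, not latex2html's verified output.

```perl
use strict;
use warnings;

# Toy catcode table tracking only '%': in TeX, catcode 14 marks a
# comment character and catcode 12 an ordinary ("other") character.
my %catcode = ('%' => 14);

# Hypothetical sketch: turn the comment on one line into an HTML comment
# while '%' has catcode 14. An escaped \% is never treated as a comment.
sub translate_comments_sketch {
    my ($line) = @_;
    return $line unless $catcode{'%'} == 14;
    # Text before the first unescaped '%', then the comment itself.
    if ($line =~ /^((?:[^\\%]|\\.)*)%(.*)$/s) {
        return "$1<!--$2 -->";
    }
    return $line;
}

print translate_comments_sketch('This: % is a comment'), "\n";
# prints "This: <!-- is a comment -->"

$catcode{'%'} = 12;    # the effect of \catcode`\%=12
print translate_comments_sketch('This: % is no longer a comment.'), "\n";
# prints "This: % is no longer a comment."
```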
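The font-switching routine in the list compares the current and desired specifications and emits the HTML needed to move between them. The sketch below shows one way to do this while keeping elements properly nested. It assumes a fixed nesting order for a few HTML font elements (B, I, TT) and does not model size changes. The name font_switch_sketch is hypothetical.

```perl
use strict;
use warnings;

# Assumed fixed nesting order for the HTML font elements in this sketch.
my @ORDER = qw(B I TT);

# Hypothetical sketch: return the HTML that switches from font spec $cur
# to font spec $new (hash refs such as { B => 1, I => 1 }). Elements are
# opened in @ORDER, so after the longest common prefix of open elements,
# the rest are closed innermost-first and the new ones opened.
sub font_switch_sketch {
    my ($cur, $new) = @_;
    my @open = grep { $cur->{$_} } @ORDER;
    my @want = grep { $new->{$_} } @ORDER;
    my $common = 0;
    $common++ while $common < @open && $common < @want
                    && $open[$common] eq $want[$common];
    my $html = join '', map { "</$_>" } reverse @open[$common .. $#open];
    $html   .= join '', map { "<$_>" } @want[$common .. $#want];
    return $html;
}

my $roman       = {};
my $bold        = { B => 1 };
my $bold_italic = { B => 1, I => 1 };
print font_switch_sketch($roman, $bold_italic), "\n";   # prints "<B><I>"
print font_switch_sketch($bold_italic, $bold), "\n";    # prints "</I>"
print font_switch_sketch($bold, $roman), "\n";          # prints "</B>"
```

Calling the same routine with the saved and current specifications at the end of a scope returns to the old font. This matches the changelog's description of reusing one subroutine for both directions.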
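The following sketch shows how $dd and $pd might be used when building or splitting path names. It handles only the two platforms named in the changelog and assumes that $^O is 'MacOS' under classic MacPerl. The helpers full_path and split_path_list are hypothetical.

```perl
use strict;
use warnings;

# Directory delimiter ($dd) and path-list delimiter ($pd). Only Unix and
# the Mac are distinguished here; $^O eq 'MacOS' is assumed to identify
# classic MacPerl.
our ($dd, $pd) = $^O eq 'MacOS' ? (':', ',') : ('/', ':');

# Hypothetical helper: join path components with $dd.
sub full_path { return join $dd, @_ }

# Hypothetical helper: split a path list (e.g. a search path) on $pd.
sub split_path_list {
    my ($list) = @_;
    return split /\Q$pd\E/, $list;
}

print full_path('', 'usr', 'local', 'lib', 'latex2html'), "\n";
# on Unix: /usr/local/lib/latex2html
print join("\n", split_path_list("/usr/bin${pd}/usr/local/bin")), "\n";
```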