CHANGES

Nov-09-2006-Beta

Use metadata extraction names for info.
Allow for blank metadata attribute values.
Fix anonymous class in call detection.
New "index" element to mark pairs of brackets.
Fix regression bug with encoding conversion not happening in strings or char literals.
Expose expression mode.
No xml declaration and no namespace declaration for srcml2src.
Finish no namespace declaration for src2srcml.
Expose info and longinfo options for srcml2src.
Allow compressed output of XML for srcml2src also.
Change compressed option flag to "--compress".
Improved error message for out of range unit error in srcml2src.
Markup parentheses in expressions as operators.
Rework internal handling of URI's.
Fix bug with fully qualified class names for extends and implements.
Convert variable declarations to also allow markup of anonymous structs.
Markup function header incrementally.
Add post-parsing grammar rule for cleanup.
Make variable with class/struct/union definition as type a declaration in type.
Start of markup of structs in typedefs.
Fixed typedef ending error.
Added option to turn off XML declaration.
Specifying prefix for http://www.sdml.info/srcML/srcerr automatically turns on debugging same as --debug option.
Don't include cpp namespace declaration for languages other than C or C++.
Fix missing markup of exception handling for Java.
Output prefixes on info.
Change srcml2src namespace prefix flag to "prefix" instead of "namespace".
Fix incorrect markup of anonymous Java classes.
Check for non-unique namespace prefixes.
Add option to src2srcml to allow for specification of namespace prefix.
Add option to extract prefix of a namespace from srcml2src.
Change --info option in srcml2src to use new multiple output options.
Allow for multiple output (in order) of attributes.
SIGINT handling for unit count.
Fix help formatting error.
Change cpp markup to text instead of textonly.
Generate internal tokens for indexes.
Change hidden --extended option to public --literal option.
Move --xml-encoding to help section dealing with output srcML settings.
Short flag for --extract-all in srcml2src has been changed from -e to -a.
New status 16 for prematurely terminated input lists.
For multiple input files, the first SIGINT signal lets the current file finish. Multiple SIGINT signals allow the default behavior.
SIGUSR1 now toggles verbose setting.
New call check mechanism.
Allow specification of attributes on outer unit of compound srcML document, e.g., directory, filename, version.
Markup macro arguments.
Markup macro name.
Allow for access specifier on classes, e.g., private class A {}.

Apr-12-2006-Beta

Default (in all cases) is now ISO-8859-1 for input format. Changes default in libxml2 version.
Fix wrong return code on expand in srcml2src.
Add options --cpp_markup_if0 and --cpp_textonly_if0 for controlling whether cpp #if 0 sections are parsed and marked up. --cpp_textonly_if0 is default.
Change option --cpp_nomark_else to --cpp_textonly_else and add --cpp_markup_else for controlling whether cpp #else sections are parsed and marked.
Fix new cppmode and missing #if.
Add option --cpp-nomark-else for controlling whether cpp #else sections are parsed and marked.
Fix skipping of #if 0 blocks regression caused by handling of #else sections.
Use partial cppmode entity handling.
Fix for no newline before EOF on preprocessor line.
Start of handling of partial entities in #if #else #end sections.
Fix detection of #ifdef and #ifndef for storing cppmode.
A statement following a preprocessor else was getting ignored. Since we were skipping all parsing after an else (until the endif), this was not a problem. However, now that we are moving to allowing parsing of the else part, this was a problem. Solved by using a separate stack to keep track of else and endif.
Remove fix for multiple namespaces for operator methods and implicit casting because of speed problems until new handling can be implemented.
Fix for literals as arguments in template instantiation.
Start of new handling for operator method names.
Allow multiple namespaces for operator methods for implicit casting.
Fix bug with operator methods for implicit casting.
Removes empty type element.
Detect input and output files when they are the same file.
Detect incompatible options with encoding.
Change short flag of new option and improve help messages.
Add xml encoding to verbose output.
Add skip encoding option to srcml2src.
Add skip encoding option to src2srcml.
Fixed problem with encodings handled by iconv and not by libxml2 (directly) in srcml2src.
Fixed problem with encodings handled by iconv and not by libxml2 (directly). Also made encoding changes more efficient in src2srcml.
Attribute info for srcml2src now has output of nothing for missing attribute, and output of empty line for blank value of attribute.
Added hidden long info option.
Check for invalid combination of xml output and source encoding on srcml2src.
Output encoding with new option "--info" in srcml2src.
With multiple input files and in verbose mode, output name of file (as in file list mode).
Change output of errors in srcMLTranslator to std::cerr.
Put in (temporary) fix for encoding problem with srcml2src --xml mode.
Removed unused MarkerToken.
Add comment handling to srcml2src extraction.
Copy non-standard attributes on nested unit in srcml2src.
Add preliminary option "--info" to srcml2src.

Jan-30-2006-Beta

Better error detection for unit numbers greater than the number of units.
Remove special handling for ANTLR bug in 2.7.5.
Fix bug with else bound to wrong if.
Fix bug with misidentification of no parameter macro as call when followed by end of file.
Fix bug with misidentification of macro as call when followed by end of file.
Fix bug with processing #if 0 block correctly in guessing mode.
Fix bug with processing include directive correctly in guessing mode.
Return status codes for src2srcml and srcml2src.
More intelligent handling of unit attributes for units inside of compound nested documents.
Output error when translating entire compound document (without extracting or specifying nested unit).
Fix misidentification of call as macro with starting '&' operator.
Put in full handling of Java interfaces.
Fix regression error for Java packages.
Output source encoding in src2srcml when in verbose mode.
Write verbose output to standard error.
Make sure that C++ is specified as the language for C++ (instead of CPP).
Document hidden srcml2src option for nested unit count in help.
Fix function pointer declarations with no '*'.
Allow for comment character of '#' in file list.
Allow for blank lines in file list.
Input file option is made a default nested output.
Added whether libxml2 enabled to version.
Form feeds are now stored using a new empty XML element.
Default xml encoding for srcml2src unit extraction is that of the root unit.
Allow for libxml and non-libxml builds of src2srcml. Currently, srcml2src is only available with a libxml build.
Get default text encoding from locale.
Handle non-existing input files correctly.
Allow for combined short options in src2srcml.
Fixed then after condition problem with while nested in if.
Add option to srcml2src to get encoding.
Remove append handling. New options make it unnecessary.
New attribute, version.
Validate encodings in src2srcml before further processing.
Added ability to use embedded values in parameters.
Added verbose flag (changing version flag).
Changed srcml2src to use libxml2.
Improve command line options for srcml2src and make the handling more consistent between the programs.
Changed src2srcml to use libxml2.
Converted boolean parameters to one option parameter.
Added gzip compression option to output of src2srcml.
Move output options out of srcMLOutput.
Move selection of options to main program.
Cleanup of file handling in src2srcml program.
Changed standalone attribute in xml declaration to "yes".
Added command line options to process multiple input files.

Aug-29-2005-Beta

Output of XML declaration. This allows for proper encoding type to be given and clears up some problems with special characters, especially in comments.
Fix markup of throws in Java.
Marks as error list of more than one input file in both src2srcml and srcml2src. Prevents overwriting of second parameter.
Remove use of wstring in srcml2src. Doesn't work in Visual Studio builds. In addition, it is not used in src2srcml.
Make builds easier in Visual Studio.
Fix regression problem with character '#' used in text.
Eliminate empty expression element in empty expression block.
Fix use of calls in expressions in throw lists.
Fix problem with use of cpp directive name as identifier names (for constructors and others).
Changed use of bootstrap src2srcml to fix ANTLR multi-line comment generation problem to using simple perl command.
Change names of error modes to srcmlerr:parse and srcmlerr:mode to distinguish between different errors.
Change extra mode detection to issue srcML error element.
Fix bug with blocks inside of parentheses in expressions.
Fix regression with new macro detection by explicitly ending guessing mode stack.
A macro statement (a macro with a terminating semicolon) now does not include the semicolon.
Moved mode flags out of enum due to problems with long long type __int64 in MS Visual Studio.
Cleanup of names and addition of missing license for process pointer table in output.
Moved special srcML lexer code to testing source directory.
Fix srcml2src problem with new non-ignored end-of-line character.
Line comments no longer include the end-of-line character inside of the element.
Put back in special extended mode for marking literals. More options will follow.
Remove initial special newline whitespace after the start element of unit. NOTE: This will cause breakage with older versions of src2srcml and srcml2src.
Improved BUILD information about ANTLR problem.
Move definition of namespace URI's to a single include file.
Create separate directory for testing code.
Fix for template problems in g++ 4.0.
Fix for strings in preprocessor lines that don't end.
Fix for preprocessor directives at end of file with no newline.
Fix for initialized parameter mistaken for declaration.
Fix for name and type markup confusion problem with overloaded operator method definition.
Partial fix for initialized parameters error reappearing.
Improved error handling for incomplete macro structures.
Changed order of directory and filename attributes.
Add the ability to not issue the XML declaration.
Change language attribute on unit element so that it is always inserted. This also changes the order of the attributes on this element.
Now handles old K&R C parameter declarations.
Changed output processing dispatch.
Put in check limiting escaping of '&' only with non-valid UNICODE characters.
Allow grouped short options (those without parameters).
Update copyright year to 2005.
Improved option handling. Options can now be specified in any order. Unrecognized options are properly handled.
New option to mark translation errors, -g. When marked, proper namespace is declared.
Changed to cantlr (self-contained antlr executable) instead of using Java directly with antlr.
Removed compatibility mode. Due to ease of translation no one seems to be storing srcML, just generating it when needed. Can be replaced by XML transformation if needed.
Fixed clean problems of object file generated by Makefile.
Change generated version file to version.cpp. This allows for default compilation as C++.

Dec-14-2004-Beta

Output of XML declaration. This allows for proper encoding type to be given and clears up some problems with special characters, especially in comments.
Will allow user selection via a parameter of encoding type in the future.
Large speed increase from previous versions. Timing test increased from ~8,000 lines/second to ~11,000 lines/second. Resulted from tuning of output stage, change in macro/call detection, and general cleanup of token handling.
Preprocessor lines are now handled entirely out of normal processing, just like white space and comments. This should not affect any existing translation, but will increase robustness in the face of preprocessor statements, e.g., a return type for a function inside of an #if #else #endif.
New append output mode (-a). The output is nested (correctly) into the output file. This allows for repeated src2srcml translations to be combined into a single srcML file. Work remains on the extraction of single srcML file into multiple source code files using srcml2src.
Added new parameters for selecting the contents of the directory and filename attributes in the unit tag.
Both src2srcml and srcml2src now properly check for existence of input files and output an error message when they are missing.
Add capability to extract nested units in srcml2src.
Allow for macro followed by block (detect macro correctly).
Fixed nested endif problem with else.
Temporarily turned off handling of macro calls in function types because of speed penalty.
Fixed problem with methods that contained both const and throw.
Marks wide literal strings, e.g., L"abc", correctly.

Nov-09-2004-Beta

New error handling mode. Text that may cause a translation error is preserved and put into a special error element of name srcml:error. Translation then continues. This prevents the translator from crashing, preserves the problem text, and maintains well-formed XML. No namespace for this element is declared, so the XML is invalid. We foresee changes to how this is handled in the future.
Allows void parameter in destructors (e.g., g++).
CPP directive else section and if 0 sections are not currently marked due to potential to form non-well-formed sections. This will be changed later to well-formed sections whenever possible.
In C mode default public access on a struct is not marked. It is marked in C++ mode.
Fixed declarations that look like the start of a function pointer declaration, e.g., a b(*c);
Fixed markup of function-pointer declarations with initialization.
Allow #else without preceding #if.
Allow macro call as part of function type.
Stop escaping entity references (in text), e.g., "#" is left the same.

May-02-2004-Beta

Major speed improvements. This version translates at ~7500 lines/second (3 GHz Pentium 4, Linux version, single file) in C++ mode, and over 8000 lines/second in C mode. The Linux version not optimized for speed is at ~6000 lines/second. This is about 2.5 times the speed of the last version: the Apr-26-2004-Beta translated at ~3000 lines/second. That was a 50% improvement from the Apr-19-2004-Beta version (~2000 lines/second). This compares to the ~100 lines/second that the alpha version does on the same file.
Changes made to the markup are described below.
Addition of a compatibility flag (-c) allows the translator to output the old srcML. Default on the new translator is the new markup. The compatibility mode is not noticeably different in speed.
The test suite and dtd have been updated for the new srcML. If you find any changes running in compatibility mode (over the previous version) let us know.

Changed to new srcML:

Complex names are now marked, e.g., a::b replaces a::b, a[] replaces a[]
Names are now marked up in types, e.g., int replaces int
Tag name changes include:
- "using" replaces "using_directive"
- "parameter_list" replaces "formal_params"
- "argument_list" replaces "actual_params"
- "argument" replaces "param" in "argument_list"
Template arguments and parameters are now marked similarly to function arguments and parameters. e.g.,