CHANGES
Nov-09-2006-Beta
Use metadata extraction names for info.
Allow for blank metadata attribute values.
Fix anonymous class in call detection.
New "index" element to mark pairs of brackets.
Fix regression bug with encoding conversion not happening in strings or char literals.
Expose expression mode.
No xml declaration and no namespace declaration for srcml2src.
Finish no namespace declaration for src2srcml.
Expose info and longinfo options for srcml2src.
Allow compressed output of XML for srcml2src also.
Change compressed option flag to "--compress".
Improved error message for out of range unit error in srcml2src.
Markup parentheses in expressions as operators.
Rework internal handling of URIs.
Fix bug with fully qualified class names for extends and implements.
Convert variable declarations to also allow markup of anonymous structs.
Markup function header incrementally.
Add post-parsing grammar rule for cleanup.
Make variable with class/struct/union definition as type a declaration in type.
Start of markup of structs in typedefs.
Fixed typedef ending error.
Added option to turn off XML declaration.
Specifying a prefix for http://www.sdml.info/srcML/srcerr automatically turns on debugging,
the same as the --debug option.
Don't include cpp namespace declaration for languages other than C or C++.
Fix missing markup of exception handling for Java.
Output prefixes on info.
Change srcml2src namespace prefix flag to "prefix" instead of "namespace".
Fix incorrect markup of anonymous Java classes.
Check for non-unique namespace prefixes.
Add option to src2srcml to allow for specification of namespace prefix.
Add option to extract prefix of a namespace from srcml2src.
Change --info option in srcml2src to use new multiple output options.
Allow for multiple output (in order) of attributes.
SIGINT handling for unit count.
Fix help formatting error.
Change cpp markup to text instead of textonly.
Generate internal tokens for indexes.
Change hidden --extended option to public --literal option.
Move --xml-encoding to help section dealing with output srcML settings.
Short flag for --extract-all in srcml2src has been changed from -e to -a.
New status 16 for prematurely terminated input lists.
For multiple input files, the first SIGINT signal lets the current file finish.
Multiple SIGINT signals allow the default behavior.
SIGUSR1 now toggles verbose setting.
New call check mechanism.
Allow specification of attributes on outer unit of compound srcML document, e.g.,
directory, filename, version.
Markup macro arguments.
Markup macro name.
Allow for access specifier on classes, e.g., private class A {}.
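The SIGINT and SIGUSR1 entries in this release describe a two-stage interrupt: the first SIGINT lets the current file
finish, a further SIGINT gets the default behavior, and SIGUSR1 toggles the verbose setting. The following is a minimal
Python sketch of that pattern only. It is not the src2srcml code, and the names InterruptState, translate_all, and
translate_one are hypothetical.

```python
import signal


class InterruptState:
    """Holds the flags that the signal handlers change (hypothetical helper)."""

    def __init__(self, verbose=False):
        self.stop_requested = False
        self.verbose = verbose

    def on_sigint(self, signum=None, frame=None):
        # First SIGINT: record the request, then restore the default
        # disposition so that a second SIGINT terminates the process.
        self.stop_requested = True
        signal.signal(signal.SIGINT, signal.SIG_DFL)

    def on_sigusr1(self, signum=None, frame=None):
        # SIGUSR1 flips the verbose setting.
        self.verbose = not self.verbose

    def install(self):
        # Must be called from the main thread.
        signal.signal(signal.SIGINT, self.on_sigint)
        if hasattr(signal, "SIGUSR1"):  # not available on every platform
            signal.signal(signal.SIGUSR1, self.on_sigusr1)


def translate_all(filenames, translate_one, state):
    """Translate files in order; a stop request takes effect between files."""
    done = []
    for name in filenames:
        if state.stop_requested:
            break
        translate_one(name)  # the current file always runs to completion
        done.append(name)
    return done
```

A caller would create one InterruptState, call install() once at startup, and pass the same object to translate_all.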
Apr-12-2006-Beta
Default (in all cases) is now ISO-8859-1 for input format. Changes default
in libxml2 version.
Fix wrong return code on expand in srcml2src.
Add options --cpp_markup_if0 and --cpp_textonly_if0 for controlling whether
cpp #if 0 sections are parsed and marked up. --cpp_textonly_if0 is the default.
Change option --cpp_nomark_else to --cpp_textonly_else and add --cpp_markup_else
for controlling whether cpp #else sections are parsed and marked.
Fix new cppmode and missing #if.
Add option --cpp-nomark-else for controlling whether cpp #else sections are
parsed and marked.
Fix skipping of #if 0 blocks regression caused by handling of #else sections.
Use partial cppmode entity handling.
Fix for no newline before EOF on preprocessor line.
Start of handling of partial entities in #if #else #end sections.
Fix detection of #ifdef and #ifndef for storing cppmode.
A statement following a preprocessor else was getting ignored. Since we were
skipping all parsing after an else (until the endif) this was not a problem.
However, now that we are moving to allowing parsing of the else part, this
was a problem. Solved by using a separate stack to keep track of else and endif.
Remove fix for multiple namespaces for operator methods and implicit casting
because of speed problems until new handling can be implemented.
Fix for literals as arguments in template instantiation.
Start of new handling for operator method names.
Allow multiple namespaces for operator methods for implicit casting.
Fix bug with operator methods for implicit casting. Removes
empty type element.
Detect input and output files when they are the same file.
Detect incompatible options with encoding.
Change short flag of new option and improve help messages.
Add xml encoding to verbose output.
Add skip encoding option to srcml2src.
Add skip encoding option to src2srcml.
Fixed problem with encodings handled by iconv and not by
libxml2 (directly) in srcml2src.
Fixed problem with encodings handled by iconv and not by
libxml2 (directly). Also made encoding changes more efficient in src2srcml.
Attribute info for srcml2src now has output of nothing for missing attribute,
and output of empty line for blank value of attribute.
Added hidden long info option.
Check for invalid combination of xml output and source encoding on srcml2src.
Output encoding with new option "--info" in srcml2src.
With multiple input files and in verbose mode, output the name of each file (as in file list mode).
Change output of errors in srcMLTranslator to std::cerr.
Put (temporary) fix for encoding problem with srcml2src --xml mode.
Removed unused MarkerToken.
Add comment handling to srcml2src extraction.
Copy non-standard attributes on nested unit in srcml2src.
Add preliminary option "--info" to srcml2src.
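The attribute info entry in this release distinguishes a missing attribute (no output at all) from an attribute with a
blank value (an empty line). A small Python sketch of that rule follows, assuming the unit attributes are available as
a dictionary. The function name attribute_info_lines and the attribute names in the usage note are illustrative only.

```python
def attribute_info_lines(attributes, requested):
    """Return one output line per requested attribute that is present.

    A missing attribute contributes nothing; a present attribute with a
    blank value contributes an empty line.
    """
    lines = []
    for name in requested:
        if name in attributes:
            lines.append(attributes[name])
    return lines
```

For example, with attributes {"dir": "", "filename": "a.c"} and a request for dir, version, and filename, the result is
an empty line followed by a.c, and nothing is written for the missing version attribute.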
Jan-30-2006-Beta
Better error detection for unit numbers greater than the number of units.
Remove special handling for ANTLR bug in 2.7.5.
Fix bug with else bound to wrong if.
Fix bug with misidentification of no parameter macro as call when followed by end of file.
Fix bug with misidentification of macro as call when followed by end of file.
Fix bug with processing #if 0 block correctly in guessing mode.
Fix bug with processing include directive correctly in guessing mode.
Return status codes for src2srcml and srcml2src.
More intelligent handling of unit attributes for units inside of compound nested documents.
Output error when translating entire compound document (without extracting or specifying nested unit).
Fix misidentification of call as macro with starting '&' operator.
Put in full handling of Java interfaces.
Fix regression error for Java packages.
Output source encoding in src2srcml when in verbose mode.
Write verbose output to standard error.
Make sure that C++ is specified as the language for C++ (instead of CPP).
Document hidden srcml2src option for nested unit count in help.
Fix function pointer declarations with no '*'.
Allow for comment character of '#' in file list.
Allow for blank lines in file list.
Input file option is made a default nested output.
Added whether libxml2 enabled to version.
Form feeds are now stored using a new empty XML element.
Default xml encoding for srcml2src unit extraction is that of the root unit.
Allow for libxml and non-libxml builds of src2srcml. Currently, srcml2src is only available with a
libxml build.
Get default text encoding from locale.
Handle non-existing input files correctly.
Allow for combined short options in src2srcml.
Fixed then after condition problem with while nested in if.
Add option to srcml2src to get encoding.
Remove append handling. New options make it unnecessary.
New attribute, version.
Validate encodings in src2srcml before further processing.
Added ability to use embedded values in parameters.
Added verbose flag (changing version flag).
Changed srcml2src to use libxml2.
Improve command line options for srcml2src and make the handling more consistent between the programs.
Changed src2srcml to use libxml2.
Converted boolean parameters to one option parameter.
Added gzip compression option to output of src2srcml.
Move output options out of srcMLOutput.
Move selection of options to main program.
Cleanup of file handling in src2srcml program.
Changed standalone attribute in xml declaration to "yes".
Added command line options to process multiple input files.
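The file list entries in this release allow blank lines and a '#' comment character. A minimal Python sketch of such a
reader follows. It assumes that a comment is a line whose first non-blank character is '#', which this file does not
state, and the function name read_file_list is hypothetical.

```python
def read_file_list(lines):
    """Return the filenames in a file list, skipping blanks and comments."""
    names = []
    for line in lines:
        entry = line.strip()
        # Assumption: only whole-line comments are recognized.
        if not entry or entry.startswith("#"):
            continue
        names.append(entry)
    return names
```

The function accepts any iterable of lines, so an open file object can be passed directly.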
Aug-29-2005-Beta
Output of XML declaration. This allows for proper encoding
type to be given and clears up some problems with special
characters, especially in comments.
Fix markup of throws in Java.
Mark a list of more than one input file as an error in both src2srcml and srcml2src. Prevents overwriting
of the second parameter.
Remove use of wstring in srcml2src. Doesn't work in Visual Studio builds. In addition,
it is not used in src2srcml.
Make builds easier in Visual Studio.
Fix regression problem with character '#' used in text.
Eliminate empty expression element in empty expression block.
Fix use of calls in expressions in throw lists.
Fix problem with use of cpp directive names as identifier names (for constructors and others).
Replaced the bootstrap src2srcml used to work around the ANTLR multi-line comment generation problem with a simple
perl command.
Change names of error modes to srcmlerr:parse and srcmlerr:mode to distinguish between different errors.
Change extra mode detection to issue srcML error element.
Fix bug with blocks inside of parentheses in expressions.
Fix regression with new macro detection by explicitly ending guessing mode stack.
A macro statement (a macro with a terminating semicolon) now does not include the semicolon.
Moved mode flags out of enum due to problems with long long type __int64 in MS Visual Studio.
Cleanup of names and addition of missing license for process pointer table in output.
Moved special srcML lexer code to testing source directory.
Fix srcml2src problem with new non-ignored end-of-line character.
Line comments no longer include the end-of-line character inside of the element.
Put back in special extended mode for marking literals. More options will follow.
Remove initial special newline whitespace after the start element of unit.
NOTE: This will cause breakage with older versions of src2srcml and srcml2src.
Improved BUILD information about ANTLR problem.
Move definition of namespace URIs to a single include file.
Create separate directory for testing code.
Fix for template problems in g++ 4.0.
Fix for strings in preprocessor lines that don't end.
Fix for preprocessor directives at end of file with no newline.
Fix for initialized parameter mistaken for declaration.
Fix for name and type markup confusion problem with overloaded operator method definition.
Partial fix for initialized parameters error reappearing.
Improved error handling for incomplete macro structures.
Changed order of directory and filename attributes.
Add the ability to not issue the XML declaration.
Change language attribute on unit element so that it is always inserted.
This also changes the order of the attributes on this element.
Now handles old K&R C parameter declarations.
Changed output processing dispatch.
Put in check limiting escaping of '&' only with non-valid UNICODE characters.
Allow grouped short options (those without parameters).
Update copyright year to 2005.
Improved option handling. Options can now be specified in any order. Unrecognized options are properly handled.
New option to mark translation errors, -g. When marked, proper namespace is declared.
Changed to cantlr (self-contained antlr executable) instead of using Java directly with antlr.
Removed compatibility mode. Due to ease of translation no one seems to be storing srcML,
just generating it when needed. Can be replaced by XML transformation if needed.
Fixed clean problems of object file generated by Makefile.
Change generated version file to version.cpp. This allows for default compilation as C++.
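The entry in this release on grouped short options (those without parameters) can be illustrated with a short Python
sketch that expands a group such as -vz into separate flags before normal option handling. The flag letters in the
usage note are placeholders rather than documented src2srcml options, and the function name expand_grouped_flags is
hypothetical.

```python
def expand_grouped_flags(args):
    """Split grouped single-letter flags, e.g. '-vz' becomes '-v', '-z'.

    Long options ('--name'), a lone '-' (standard input), and plain
    arguments are passed through unchanged. Only flags that take no
    parameter may be grouped, so no argument values are consumed here.
    """
    expanded = []
    for arg in args:
        if len(arg) > 2 and arg.startswith("-") and not arg.startswith("--"):
            expanded.extend("-" + letter for letter in arg[1:])
        else:
            expanded.append(arg)
    return expanded
```

For example, the arguments -vz, --debug, -, and in.c expand to -v, -z, --debug, -, and in.c.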
Dec-14-2004-Beta
Output of XML declaration. This allows for proper encoding type to be given and clears up some problems with
special characters, especially in comments. Will allow user selection via a parameter of encoding type in the
future.
Large speed increase from previous versions. Timing test increased from ~8,000 lines/second to
~11,000 lines/second. Resulted from tuning of output stage, change in macro/call detection, and general
cleanup of token handling.
Preprocessor lines are now handled entirely out of normal processing, just like white space and comments. This
should not affect any existing translation, but will increase robustness in the face of preprocessor statements.
E.g., a return type for a function inside of an #if #else #endif.
New append output mode (-a). The output is nested (correctly) into the output file. This allows for repeated
src2srcml translations to be combined into a single srcML file. Work remains on the extraction of single srcML
file into multiple source code files using srcml2src.
Added new parameters for selecting the contents of the directory and filename attributes in the unit tag.
Both src2srcml and srcml2src now properly check for existence of input files and output an error message when they
are missing.
Add capability to extract nested units in srcml2src.
Allow for macro followed by block (detect macro correctly).
Fixed nested endif problem with else.
Temporarily turned off handling of macro calls in function types because of speed penalty.
Fixed problem with methods that contained both const and throw.
Marks wide literal strings, e.g., L"abc", correctly.
Nov-09-2004-Beta
New error handling mode. Text that may cause a translation error is preserved and put into a special error
element of name srcml:error. Translation then continues. This prevents the translator from crashing, preserves
the problem text, and maintains well-formed XML. No namespace for this element is declared, so the XML is
invalid. We foresee changes to how this is handled in the future.
Allows void parameter in destructors (e.g., g++).
CPP directive else section and if 0 sections are not currently marked due to potential to form non well formed
sections. This will be changed later to well-formed sections whenever possible.
In C mode default public access on a struct is not marked. It is marked in C++ mode.
Fixed declarations that look like the start of a function pointer declaration, e.g., a b(*c);
Fixed markup of function-pointer declarations with initialization.
Allow #else without preceding #if.
Allow macro call as part of function type.
Stop escaping entity references (in text), e.g., "#" is left the same.
May-02-2004-Beta
Major speed improvements. This version translates at ~7500 lines/second (3 GHz Pentium 4,
Linux version, single file) in C++ mode, and over 8000 lines/second in C mode. The Linux version that is not
optimized for speed runs at ~6000 lines/second.
This is 2.5 times the speed of the last version (a 150% improvement). The Apr-26-2004-Beta translated at
~3000 lines/second. That was a 50% improvement over the Apr-19-2004-Beta version (~2000 lines/second).
This compares to the ~100 lines/second that the alpha version does on the same file.
Changes made to the markup are described below. Addition of a compatibility flag (-c) allows
the translator to output the old srcML. Default on the new translator is the new markup. The compatibility
mode is not noticeably different in speed.
The test suite and dtd have been updated for the new srcML. If you find any changes running in compatibility
mode (over the previous version) let us know.
Changed to new srcML:
Complex names are now marked,
e.g., a::b replaces a::b,
a[] replaces a[]
Names are now marked up in types,
e.g., int replaces int
Tag name changes include:
- "using" replaces "using_directive"
- "parameter_list" replaces "formal_params"
- "argument_list" replaces "actual_params"
- "argument" replaces "param" in "argument_list"
Template arguments and parameters are now marked similarly to function
arguments and parameters.
e.g., template <int a>
Throw specifiers in function headers are now marked with a "throw" element.
New elements include:
- "decl" for declarations (separate from decl_stmt)
- "member_list" for constructor member initialization lists
Apr-26-2004-Beta
Changed language flag to "C++".
Speed improvements both in build options and in code itself. Approximately 50% faster in Linux executable.
Apr-19-2004-Beta
Fix problems with command line options and cleanup of -h and --help options.
Addition of version information.
Main with no return value is now identified correctly.
Major source file reorganization.
Apr-15-2004-Beta
Remove unused macro identifier name code in lexer. Speedup of 30%.
Apr-14-2004-Beta
Fix default standard input with language flag.
Split lexer into C++ language specific.
Moved to ANTLR 2.7.3.
Move to detecting possible macro names in lexer.
Fix types in typedef more complex than enum.
Fix misidentification of "const" before enum as macro.
Fix error with namespaced or class prefixed function pointer names in typedef.
Allow macro call before block.
Fixed throw in destructor declaration/definition.
Command line parameters for language type (C or CPP) and help options. Allow
"-" for standard input.
Removal of language type based on file extension.
Apr-10-2004-Beta
Bunch of fixes based on test suite.
Prepare for literal tags.
Mar-29-2004-Beta
Fixed most usages of calls in array indexing.
Fixed use of preprocessor directive names as identifiers.
Fixed false declaration statements (*a b).
Fixed escaped characters in comments.
Fixed constructor parameter list problems.
Regression fix of '\''
Fixed multiline block comment at end of preprocessor define.
Fixed incomplete detection of namespaced type names.
Fixed initialized parameters in constructors.
Mar-04-2004-Beta
Fixed multiple parameter calls in switch case expressions.
Allow types in arguments (for sizeof).
New MS Windows version.
Fixed use of parentheses in variable initializations.
Fixed multiple digit character constants.
Fixed multiple hex characters in string.
Fixed end of line comments and preprocessor statements.
Mar-03-2004-Beta
Behaviour of command line and stdin and stdout restored for MS Windows version.
Fixed use of commas in expressions nested in other statements.
Template arguments are now attached to a name allowing for proper detection of templated
function calls.
Fixed numbers of the form "1.".
Fixed expressions of the form "*a = ..." being translated as declarations.
Allow for single macros with no terminating symbol.
Fixed calls in case expressions.
Fixed comment at end of cpp define.
Fixed "&=" operator.
Formfeeds are now treated as whitespace.
Fixed expression calls in array variable declarations.
Fixed "*" at the start of expressions.
Markup macro calls when they have embedded statements.
Allow greater variety of asm contents. Still not marking up contents,
just entire statement.
Allow cpp keyword "error" as name.
Fix of cpp define.
Fix do statements with embedded statements besides blocks.
Handle distinction between "*=" and "*" in expression.
Fix struct name in type (broken from previous fix).
Fix multiple statements in for initialization separated by commas.
Fix multiple statements in for increment separated by commas.
Fix comma problem in function call with more than one parameter
embedded inside array index.
Fix struct with name and definition in typedef.
Feb-03-2004-Beta
Filenames for input and output are now allowed on the command line.
Fixes:
Class, struct and union are all handled in a similar way.
Added tags for class, struct and union declarations.
Default private section of classes and default public section of structs and unions are now
marked correctly with an attribute. They also now start at the very beginning of
the block. Any whitespace before an access section tag (public, private, or
protected) is marked with the default section.
Structs embedded in the types of variable declarations now allow nested parentheses.
All line ending characters for UNIX, DOS and Mac are matched and converted to
a single '\n'. Expect that this will be converted to the proper line ending
on the platform on which the translator is built.
Structs with a name "struct A" now allowed in types.
Fixed problem with ternary operator (a ? b : c) in blocks.
Keywords in filenames inside of angle brackets are now detected properly.
Fixed problem where character '`' in comments causes crash.
Fixed problem with empty statement causing crash.
Fixed problem with call format of variable declaration initialization.
Partial fix of macro parameter list with extra parentheses.
Allow comments to end lines for preprocessor constructs.
Redid string and character literal detection.
Fix of parameter detection problem.
Fix of template parameters broken in last release.
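The line-ending entry in this release converts UNIX, DOS, and Mac line endings to a single '\n'. The translator does
this while matching input, which this file does not show. As a rough illustration of the same normalization on a whole
string, here is a minimal Python sketch with the hypothetical name normalize_line_endings.

```python
def normalize_line_endings(text):
    """Convert DOS (CR LF) and old Mac (CR) line endings to UNIX (LF)."""
    # Replace the two-character DOS ending first; otherwise each CR LF
    # pair would turn into two newlines.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

The order of the two replacements is the only subtle point: handling a lone '\r' first would double every DOS line
ending.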
Jan-29-2004-Beta
Enums embedded in typedefs are now handled properly.
All text is now XML escaped, including literal strings.
Fix function call problem.
Fix do statement markup problem.
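The entry in this release on XML escaping of all text means that source characters with special meaning in XML must
not reach the output unescaped. A minimal Python sketch using the standard library's xml.sax.saxutils.escape follows.
It shows the general technique rather than the translator's own output code, and escape_source_text is a hypothetical
name.

```python
from xml.sax.saxutils import escape


def escape_source_text(text):
    """Escape '&', '<' and '>' so that source text is valid XML character data."""
    return escape(text)
```

Double quotes are left alone by default, which is sufficient for element content such as literal strings and comments.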
Jan-28-2004-Beta
The character "@" is now allowed in block comments (caused crash).
Structs embedded in variable declarations are partially fixed (full fix when
tags are turned on for names in types).
XML characters are escaped in comments.
Line comments now end in the same place for both Windows and UNIX.
Remove unused output and lexeme counting code and grammars.
Jan-20-2004-Beta
Release of Beta version