src2srcml

Section: User Commands (1)
Updated: Sun Nov 12 19:47:07 EST 2006

NAME

src2srcml - translates source code into the XML source-code representation srcML  

SYNOPSIS

src2srcml [-hVenizcgv] [-l language] [-d directory] [-f filename] [-s version] [-x encoding] [-t encoding] [input-source-code-file]... [output-srcML-file]

DESCRIPTION

The program src2srcml translates source-code files into the XML source-code representation srcML. The srcML format allows XML addressing, querying, and transformation of source code. All text from the original source-code file is preserved, including whitespace, comments, and preprocessor statements. No preprocessing of the source code is done. The translator can be applied to individual source-code files or to code fragments.

The translation is fast and uses a stream-parsing approach where parsing is done top-down and elements are issued as soon as they are detected.

Some options are only available in the libxml2-enabled version.

OPTIONS

-h, --help
Output the help and exit.
-V, --version
Output the version of src2srcml, including whether libxml2 is enabled, then exit.
-e, --expression
Translates a single, standalone expression.
-n, --nested
Stores all input source files in one compound srcML document. This is the default in --input-file mode.
-i, --input-file
Treats the input file as a list of source files. Each file is translated separately, and the results are stored together in a single compound srcML document. The list has one filename per line, starting at the beginning of the line. Blank lines are ignored. A line that begins with the character '#' is treated as a comment and ignored.
-x, --encoding=encoding
Sets the XML encoding of the output srcML file to encoding. The default is UTF-8. Conversion from the source encoding to the XML encoding is only performed in the libxml2-enabled version. The encodings available in the libxml2-enabled version can be listed with the command iconv -l.
-t, --src-encoding=encoding
Sets the input encoding of the source-code file to encoding. The default is ISO-8859-1. The value is not stored in the srcML document; it is used only for any necessary translation of source-code text in the libxml2-enabled version.
--xmlns=URI
Sets the URI for the default namespace. Default is xmlns="http://www.sdml.info/srcML/src".
--xmlns:PREFIX=URI
Sets the namespace prefix PREFIX for the namespace URI. Defaults are xmlns:cpp="http://www.sdml.info/srcML/cpp" and xmlns:srcerr="http://www.sdml.info/srcML/srcerr".
--no-xml-declaration
No output of the default XML declaration. Useful when the output of the translator is to be placed inside another XML document.
--no-namespace-decl
No output of namespace declarations. Useful when the output of the translator is to be placed inside another XML document.
-z, --compress
Output is in compressed gzip format. Only available in the libxml2-enabled version.
-c, --interactive
By default, output is buffered for speed. With this option, output is issued as soon as it is parsed, which suits interactive applications.
-g, --debug
When translation errors occur, src2srcml preserves all text but may issue incorrect markup. In debug mode, the text with the translation error is marked with a special set of tags with the prefix srcerr from the namespace http://www.sdml.info/srcML/srcerr.
-v, --verbose
Verbose output to standard error. Especially useful with the --input-file option. The signal SIGUSR1 can be used to toggle this option.
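
The list format read by --input-file can be checked before a long run. The following is a minimal sketch and not part of src2srcml: the helper name effective_sources and the file names are hypothetical, and the helper only mirrors the rules stated above, assuming that a blank line means an empty line.

```shell
# effective_sources LISTFILE
# Hypothetical helper (not part of src2srcml): prints the entries of LISTFILE
# that remain after dropping empty lines and lines that begin with '#'.
effective_sources() {
    grep -v -e '^$' -e '^#' "$1"
}

# Build a sample list file: one filename per line, starting in column one.
list=$(mktemp)
cat > "$list" <<'EOF'
# core sources
src/main.cpp

src/util.cpp
EOF

# Show which files the list names.
effective_sources "$list"

# Assumed invocation, following the SYNOPSIS (input list, then output file):
#   src2srcml --input-file "$list" project.xml
```

For the sample list, the helper prints src/main.cpp and src/util.cpp; the comment line and the empty line are skipped.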
 

METADATA OPTIONS

This set of options allows control over various metadata stored in the srcML document.
-l, --language=language
The programming language of the source-code file. Allowable values are C, C++, Java, or AspectJ. The language affects parsing, the allowed markup, and what is considered a keyword. The value is also stored as an attribute of the root element unit. The default is C++.
-d, --directory=directory
The unit element includes an optional, descriptive attribute, directory. The value is typically obtained from the path of the input filename. This option allows you to specify a different directory for standard input or where the directory is not contained in the path of the input filename. For compound srcML documents this option sets the attribute on the root element.
-f, --filename=filename
The unit element includes an optional, descriptive attribute, filename. The value is typically obtained from the input filename. This option allows you to specify a different filename for standard input or where a different filename is wanted. For compound srcML documents this option sets the attribute on the root element.
-s, --src-version=version
The unit element includes the version, an optional, purely-descriptive attribute. Sets the value of the attribute version to version. The value is a string with no interpretation by the srcML tools. For compound srcML documents this option sets the attribute on the root element.
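
When the source arrives on standard input, there is no path from which these attributes can be derived. The following is a minimal sketch of a wrapper that supplies them explicitly. The function name translate_stdin is hypothetical, splitting the path with dirname and basename is an assumption about how the attributes should be filled, and the use of - for standard input follows the USAGE section.

```shell
# translate_stdin PATH VERSION OUTPUT
# Hypothetical wrapper: feeds PATH to src2srcml on standard input and passes
# the directory, filename, and version metadata as explicit options.
translate_stdin() {
    path=$1
    version=$2
    output=$3
    src2srcml --directory="$(dirname "$path")" \
              --filename="$(basename "$path")" \
              --src-version="$version" \
              - "$output" < "$path"
}

# Example call (requires src2srcml and an existing src/main.cpp):
#   translate_stdin src/main.cpp 1 main.cpp.xml
```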
 

MARKUP EXTENSIONS

Extensions to the srcML markup are available.
--literal
Additional markup of literal values using the element literal with the prefix "lit" in the namespace "http://www.sdml.info/srcML/literal". Can also be specified by declaring a prefix for this namespace using the xmlns option, e.g., --xmlns:lit="http://www.sdml.info/srcML/literal".
--operator
Additional markup of operators using the element operator with the prefix "op" in the namespace "http://www.sdml.info/srcML/operator". Can also be specified by declaring a prefix for this namespace using the xmlns option, e.g., --xmlns:op="http://www.sdml.info/srcML/operator".
 

CPP MARKUP OPTIONS

This set of options allows control over how preprocessing regions are handled, i.e., whether parsing and markup occur. In all cases the text is preserved.
--cpp_markup_else
Place markup in #else and #elif regions. Default.
--cpp_text_else
Only place text in #else and #elif regions leaving out markup.
--cpp_markup_if0
Place markup in #if 0 regions.
--cpp_text_if0
Only place text in #if 0 regions leaving out markup. Default.
 

SIGNAL PROCESSING

The following signals may be used to control src2srcml:
SIGUSR1
Toggles verbose option. Useful with multiple input files as in the --input-file option.
SIGINT
Completes the translation (and output) of the current file when multiple input files are being processed. The input file currently being translated is allowed to complete, the compound document is closed, and then the program stops. More than one SIGINT causes the default behavior.

This special SIGINT handling only occurs with multiple input files in compound srcML documents.
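
The signals can be sent with the standard kill utility. The following is a minimal sketch; the function names toggle_verbose and stop_after_current_file are hypothetical, and the process ID is assumed to come from the shell that started src2srcml.

```shell
# toggle_verbose PID
# Hypothetical helper: sends SIGUSR1 to switch verbose output on or off.
toggle_verbose() {
    kill -s USR1 "$1"
}

# stop_after_current_file PID
# Hypothetical helper: sends a single SIGINT so that the file being
# translated can finish and the compound document can be closed.
stop_after_current_file() {
    kill -s INT "$1"
}

# Example session (requires src2srcml; files.txt is a hypothetical list file):
#   src2srcml --input-file files.txt project.xml &
#   pid=$!
#   toggle_verbose "$pid"
#   stop_after_current_file "$pid"
```

Sending SIGINT only once matters here, since a repeated SIGINT falls back to the default behavior.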

 

USAGE

To translate the C++ source-code file main.cpp into the srcML file main.cpp.xml:

src2srcml main.cpp main.cpp.xml

To translate a C source-code file main.c into the srcML file main.c.xml:

src2srcml --language=C main.c main.c.xml

To translate a Java source-code file main.java into the srcML file main.java.xml:

src2srcml --language=Java main.java main.java.xml

To specify the directory, filename, and version for an input file from standard input:

src2srcml --directory=src --filename=main.cpp --src-version=1 - main.cpp.xml

To translate a source-code file in ISO-8859-1 encoding into a srcML file with UTF-8 encoding:

src2srcml --src-encoding=ISO-8859-1 --encoding=UTF-8 main.cpp main.cpp.xml  

RETURN STATUS

0: Normal

1: Error

2: Problem with input file

3: Unknown option

4: Unknown encoding

5: Libxml2-only feature (not libxml2 enabled)

6: Invalid language

7: Language option specified, but value missing

8: Filename option specified, but value missing

9: Directory option specified, but value missing

10: Version option specified, but value missing

11: Text encoding option specified, but value missing

12: XML encoding option specified, but value missing

15: Invalid combination of options

16: Incomplete output due to termination  
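
Scripts that call src2srcml can turn these codes into messages. The following is a minimal sketch; the function name describe_status is hypothetical, and the messages are copied from the list above. Codes that are not listed are reported as undocumented.

```shell
# describe_status CODE
# Hypothetical helper: prints the documented meaning of a src2srcml
# return status.
describe_status() {
    case $1 in
        0)  echo "Normal" ;;
        1)  echo "Error" ;;
        2)  echo "Problem with input file" ;;
        3)  echo "Unknown option" ;;
        4)  echo "Unknown encoding" ;;
        5)  echo "Libxml2-only feature (not libxml2 enabled)" ;;
        6)  echo "Invalid language" ;;
        7)  echo "Language option specified, but value missing" ;;
        8)  echo "Filename option specified, but value missing" ;;
        9)  echo "Directory option specified, but value missing" ;;
        10) echo "Version option specified, but value missing" ;;
        11) echo "Text encoding option specified, but value missing" ;;
        12) echo "XML encoding option specified, but value missing" ;;
        15) echo "Invalid combination of options" ;;
        16) echo "Incomplete output due to termination" ;;
        *)  echo "Undocumented status $1" ;;
    esac
}

# Example use (requires src2srcml):
#   src2srcml main.cpp main.cpp.xml
#   describe_status $?
```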

CAVEATS

Translation is performed based on local information with no symbol table. For languages that are not context-free, e.g., C and C++, and for code that uses macros, this may lead to incorrect markup.

Line endings are normalized in XML formats including srcML.  

BUGS

Java language mode does not support all Java 1.5 language elements.

Beyond UTF-8, UTF-16, and ISO-8859-1, libxml2 supports many additional encodings through iconv. However, a BOM (Byte Order Mark) immediately before the XML declaration may not be processed correctly by srcml2src and by other libxml2-based tools (e.g., xmllint). Use the LE or BE version of the encoding, e.g., UTF-32BE or UTF-32LE, instead.
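
To check whether an output file begins with a BOM, its first bytes can be inspected. The following is a minimal sketch and not part of the srcML tools: the function name bom_kind is hypothetical, and it recognizes only the standard UTF-8, UTF-16, and UTF-32 byte order marks.

```shell
# bom_kind FILE
# Hypothetical helper: reports which byte order mark, if any, FILE starts with.
bom_kind() {
    # First four bytes as a lowercase hex string, e.g. "efbbbf3c".
    sig=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    case $sig in
        efbbbf*)  echo "UTF-8" ;;
        fffe0000) echo "UTF-32LE" ;;   # must be tested before UTF-16LE
        0000feff) echo "UTF-32BE" ;;
        fffe*)    echo "UTF-16LE" ;;
        feff*)    echo "UTF-16BE" ;;
        *)        echo "none" ;;
    esac
}

# Example use (requires src2srcml), with an explicit byte order as suggested above:
#   src2srcml --encoding=UTF-32BE main.cpp main.cpp.xml
#   bom_kind main.cpp.xml
```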

SEE ALSO

srcml2src(1)  

AUTHOR

Written by Michael L. Collard and Huzefa Kagdi.


 

