src2srcml
Section: User Commands (1)
Updated: Sun Nov 12 19:47:07 EST 2006
NAME
src2srcml - translates source code into the XML source-code representation srcML
SYNOPSIS
src2srcml [-hVenizcgv] [-l language] [-d directory] [-f filename] [-s version] [-x encoding] [-t encoding] [input-source-code-file]... [output-srcML-file]
DESCRIPTION
The program src2srcml translates source-code
files into the XML source-code representation srcML. The srcML
format allows for XML addressing, querying, and transformation of
source code. All text from the original source-code file is
preserved including white-space, comments and preprocessor
statements. No preprocessing of the source code is done.
src2srcml can be applied to complete source-code files as well as
to code fragments.
The translation is fast and uses a stream-parsing approach where
parsing is done top-down and elements are issued as soon as they
are detected.
Some options are only available in the libxml2-enabled version.
OPTIONS
- -h, --help
-
Output the help and exit.
- -V, --version
-
Output the version of src2srcml, including whether libxml2 is enabled, then exit.
- -e, --expression
-
Translates a single, standalone expression.
- -n, --nested
-
Stores all input source files into one compound srcML
document. Default with --input-file mode.
- -i, --input-file
-
Treats the input file as a list of source files. Each file
is separately translated and collectively stored into a
single compound srcML document. The list has a single
filename on each line starting at the beginning of the line.
Blank lines are ignored. A line that starts with the
character '#' is treated as a comment and is also
ignored.
- -x, --encoding=encoding
-
Sets the XML encoding of the output srcML file to
encoding. The default is UTF-8. Conversion from the
source encoding to the XML encoding is only performed in the
libxml2-enabled version. The encodings available in the
libxml2-enabled version can be listed with the command
iconv -l.
- -t, --src-encoding=encoding
-
Sets the input encoding of the source-code file to
encoding. The default is ISO-8859-1.
The value is not stored in the srcML document. It is used only for any necessary translation of the source-code text in the libxml2-enabled version.
- --xmlns=URI
-
Sets the URI for the default namespace. Default is xmlns="http://www.sdml.info/srcML/src".
- --xmlns:PREFIX=URI
-
Sets the namespace prefix PREFIX for the namespace URI. Defaults are xmlns:cpp="http://www.sdml.info/srcML/cpp"
and xmlns:srcerr="http://www.sdml.info/srcML/srcerr".
- --no-xml-declaration
-
No output of the default XML declaration. Useful when the output of the translator is to be placed inside
another XML document.
- --no-namespace-decl
-
No output of namespace declarations. Useful when the output of the translator is to be placed inside
another XML document.
- -z, --compress
-
Output is in compressed gzip format. Only available in the
libxml2-enabled version.
- -c, --interactive
-
By default, output is buffered for speed. With this option,
intended for interactive applications, output is issued as soon
as it is parsed.
- -g, --debug
-
When translation errors occur src2srcml preserves all text,
but may issue incorrect markup. In debug mode the text with
the translation error is marked with a special set of tags
with the prefix srcerr from the namespace
http://www.sdml.info/srcML/srcerr.
- -v, --verbose
-
Verbose output to standard error. Especially useful with the --input-file option. The signal SIGUSR1 can
be used to toggle this option.
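The list format read by the --input-file option, as described above, can be sketched in a few lines of Python. This is only an illustration of the stated rules, not the parser that src2srcml uses. The function name read_file_list is hypothetical, and stripping trailing whitespace from a filename is an assumption that the description does not confirm.

```python
def read_file_list(text):
    """Return the filenames named in an --input-file style list."""
    filenames = []
    for line in text.splitlines():
        # Blank lines are ignored.
        if not line.strip():
            continue
        # A '#' in the first column marks a comment line.
        if line.startswith("#"):
            continue
        # Assumption: trailing whitespace is not part of the filename.
        filenames.append(line.rstrip())
    return filenames


example = "# core sources\nmain.cpp\n\nutil.cpp\n"
print(read_file_list(example))  # ['main.cpp', 'util.cpp']
```

A list such as the one in example would be passed to src2srcml as the input file together with the --input-file option.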
METADATA OPTIONS
This set of options allows control over various metadata stored in the srcML document.
- -l, --language=language
-
The programming language of the source-code file. Allowable
values are C, C++, Java, or AspectJ. The language affects parsing,
the allowed markup, and what is considered a keyword.
The value is also stored as an attribute of
the root element unit.
The default is C++.
- -d, --directory=directory
-
The unit element includes an optional, descriptive
attribute, directory. The value is typically obtained from
the path of the input filename. This option allows you to
specify a different directory for standard input or where
the directory is not contained in the path of the input
filename.
For compound srcML documents this option sets
the attribute on the root element.
- -f, --filename=filename
-
The unit element includes an optional, descriptive
attribute, filename. The value is typically obtained from
the input filename. This option allows you to specify a
different filename for standard input or where a different
filename is wanted.
For compound srcML documents this option sets
the attribute on the root element.
- -s, --src-version=version
-
The unit element includes an optional, purely descriptive
attribute, version. This option sets the value of that
attribute to version.
The value is a string with no interpretation by the srcML
tools.
For compound srcML documents this option sets
the attribute on the root element.
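When src2srcml is driven from a script, the metadata options above can be assembled programmatically. The following is a minimal sketch that uses only the long option names documented in this section. The helper name metadata_args is hypothetical, and checking the language value in the script is a convenience added here, not something src2srcml requires of its callers.

```python
# Language values listed under --language.
ALLOWED_LANGUAGES = ("C", "C++", "Java", "AspectJ")


def metadata_args(language="C++", directory=None, filename=None, version=None):
    """Build the metadata options for a src2srcml command line."""
    if language not in ALLOWED_LANGUAGES:
        raise ValueError("unsupported language: %r" % (language,))
    args = ["--language=" + language]
    if directory is not None:
        args.append("--directory=" + directory)
    if filename is not None:
        args.append("--filename=" + filename)
    if version is not None:
        # The version value is an uninterpreted string.
        args.append("--src-version=" + str(version))
    return args


# Command line for source code read from standard input ("-").
command = ["src2srcml"] + metadata_args(
    directory="src", filename="main.cpp", version="1"
) + ["-", "main.cpp.xml"]
print(command)
```

Building the argument list this way keeps the descriptive attributes in one place when the source text arrives on standard input and no path is available to derive them from.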
MARKUP EXTENSIONS
Extensions to the srcML markup are available.
- --literal
-
Additional markup of literal values using the element literal with the prefix "lit" in the namespace "http://www.sdml.info/srcML/literal".
Can also be specified by declaring a prefix for this namespace using the xmlns option, e.g.,
--xmlns:lit="http://www.sdml.info/srcML/literal"
- --operator
-
Additional markup of operators using the element operator with the prefix "op" in the namespace "http://www.sdml.info/srcML/operator".
Can also be specified by declaring a prefix for this namespace using the xmlns option, e.g.,
--xmlns:op="http://www.sdml.info/srcML/operator"
CPP MARKUP OPTIONS
This set of options allows control over how preprocessing regions are handled,
i.e., whether parsing and markup occur. In all cases the text is preserved.
- --cpp_markup_else
-
Place markup in #else and #elif regions. Default.
- --cpp_text_else
-
Only place text in #else and #elif regions leaving out markup.
- --cpp_markup_if0
-
Place markup in #if 0 regions.
- --cpp_text_if0
-
Only place text in #if 0 regions leaving out markup.
Default.
SIGNAL PROCESSING
The following signals may be used to control src2srcml:
- SIGUSR1
-
Toggles verbose option. Useful with multiple input files as in the --input-file option.
- SIGINT
-
Completes the translation (and output) of the current file when there are multiple input files.
The input file currently being translated is allowed to complete, the compound document is
closed, and then the program stops. More than one SIGINT causes the default behavior.
This special SIGINT handling only occurs with multiple input files in compound srcML documents.
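The signals above can be sent from a controlling script as well as from the shell. The sketch below assumes a POSIX system and a process object with a send_signal method, such as a subprocess.Popen instance running src2srcml. The function names toggle_verbose and request_stop are hypothetical.

```python
import signal


def toggle_verbose(process):
    """Send SIGUSR1, which toggles the verbose option."""
    process.send_signal(signal.SIGUSR1)


def request_stop(process):
    """Send a single SIGINT.

    With multiple input files in a compound document, the current file
    is allowed to finish before the program stops.
    """
    process.send_signal(signal.SIGINT)
```

A caller would typically start the translation with subprocess.Popen and the --input-file option, then call toggle_verbose while a long list of files is being processed.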
USAGE
To translate the C++ source-code file main.cpp into the srcML file
main.cpp.xml:
src2srcml main.cpp main.cpp.xml
To translate a C source-code file main.c into the srcML file main.c.xml:
src2srcml --language=C main.c main.c.xml
To translate a Java source-code file main.java into the srcML file main.java.xml:
src2srcml --language=Java main.java main.java.xml
To specify the directory, filename, and version for an input file from standard input:
src2srcml --directory=src --filename=main.cpp --src-version=1 - main.cpp.xml
To translate a source-code file in ISO-8859-1 encoding into a srcML file with UTF-8 encoding:
src2srcml --src-encoding=ISO-8859-1 --encoding=UTF-8 main.cpp main.cpp.xml
RETURN STATUS
0: Normal
1: Error
2: Problem with input file
3: Unknown option
4: Unknown encoding
5: Libxml2-only feature (not libxml2 enabled)
6: Invalid language
7: Language option specified, but value missing
8: Filename option specified, but value missing
9: Directory option specified, but value missing
10: Version option specified, but value missing
11: Text encoding option specified, but value missing
12: XML encoding option specified, but value missing
15: Invalid combination of options
16: Incomplete output due to termination
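A calling script can turn these status codes into readable messages. The following sketch copies the table above into a dictionary. The names RETURN_STATUS, describe_status, and translate are hypothetical, and translate assumes that a src2srcml executable is on the search path.

```python
import subprocess

# Status codes as listed in RETURN STATUS (13 and 14 are not listed there).
RETURN_STATUS = {
    0: "Normal",
    1: "Error",
    2: "Problem with input file",
    3: "Unknown option",
    4: "Unknown encoding",
    5: "Libxml2-only feature (not libxml2 enabled)",
    6: "Invalid language",
    7: "Language option specified, but value missing",
    8: "Filename option specified, but value missing",
    9: "Directory option specified, but value missing",
    10: "Version option specified, but value missing",
    11: "Text encoding option specified, but value missing",
    12: "XML encoding option specified, but value missing",
    15: "Invalid combination of options",
    16: "Incomplete output due to termination",
}


def describe_status(code):
    """Return the documented meaning of a src2srcml exit status."""
    return RETURN_STATUS.get(code, "Unlisted status %d" % code)


def translate(source, output, extra_args=()):
    """Run src2srcml and return (status code, description)."""
    result = subprocess.run(["src2srcml", *extra_args, source, output])
    return result.returncode, describe_status(result.returncode)


print(describe_status(6))   # Invalid language
print(describe_status(13))  # Unlisted status 13
```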
CAVEATS
Translation is performed based on local information with no symbol
table. For languages that are not context-free, i.e., C and C++, and
for code that uses macros, this may lead to incorrect markup.
Line endings are normalized in XML formats including srcML.
BUGS
Java language mode does not support all Java 1.5 language elements.
Libxml2 directly supports many encodings beyond UTF-8, UTF-16, and ISO-8859-1 through iconv.
However, the BOM (Byte Order Mark) immediately before the XML declaration may not be processed
correctly by srcml2src and by other libxml2-based tools (e.g., xmllint).
Use the LE or BE version of the encoding, e.g., UTF-32BE, UTF-32LE, instead.
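To check whether a file is affected by the BOM problem described above, its first bytes can be compared with the known byte order marks. This is a sketch using only the Python standard library. The function name starts_with_bom is hypothetical, and the encoded XML declaration merely stands in for the start of a srcML file.

```python
import codecs

# Byte order marks for the UTF-8, UTF-16, and UTF-32 encodings.
BOMS = (
    codecs.BOM_UTF32_LE,
    codecs.BOM_UTF32_BE,
    codecs.BOM_UTF8,
    codecs.BOM_UTF16_LE,
    codecs.BOM_UTF16_BE,
)


def starts_with_bom(data):
    """Return True if the byte string begins with a byte order mark."""
    return data.startswith(BOMS)


declaration = '<?xml version="1.0" encoding="UTF-32BE"?>'

# Python's generic "utf-32" codec writes a BOM; "utf-32-be" does not.
print(starts_with_bom(declaration.encode("utf-32")))     # True
print(starts_with_bom(declaration.encode("utf-32-be")))  # False
```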
SEE ALSO
srcml2src(1)
AUTHOR
Written by Michael L. Collard and Huzefa Kagdi