Home | Trees | Indices | Help |
|
---|
|
The module provides the methods for inspecting docx files.
Author: Vili Auvinen, Juho Tammela
__package__ =
|
|
Gets a style element from styles.xml by its style id. The style id links a paragraph that uses a style in document.xml to the corresponding style in styles.xml. The style id can vary with the language of the Word version that wrote the document.
Note: XML example:
    <w:p>
      <w:pPr>
        <w:pStyle w:val="Otsikko1"/>   (This is the style id.)
      </w:pPr>
      <r> ... </r>
    </w:p>
|
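The lookup by style id can be sketched as follows. This is a minimal illustration using xml.dom.minidom, not the module's own code; the function name get_style_element_by_id and the sample styles.xml content are assumptions made for the example.

```python
from xml.dom import minidom

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical, heavily shortened styles.xml content.
STYLES_XML = f"""<w:styles xmlns:w="{W_NS}">
  <w:style w:type="paragraph" w:styleId="Otsikko1">
    <w:name w:val="heading 1"/>
  </w:style>
  <w:style w:type="paragraph" w:styleId="Normaali">
    <w:name w:val="Normal"/>
  </w:style>
</w:styles>"""


def get_style_element_by_id(styles_dom, style_id):
    """Return the first <w:style> whose w:styleId matches, or None."""
    for style in styles_dom.getElementsByTagName("w:style"):
        if style.getAttribute("w:styleId") == style_id:
            return style
    return None


styles_dom = minidom.parseString(STYLES_XML)
style = get_style_element_by_id(styles_dom, "Otsikko1")
missing = get_style_element_by_id(styles_dom, "NoSuchStyle")
```

The id passed in would normally come from the w:val attribute of a paragraph's <w:pStyle> element.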
Gets a style element from styles.xml by its style name. The style name is defined in styles.xml. A style always has the same name, regardless of the language of the Word version that wrote the document.
Note: XML example:
    <w:style w:type="paragraph" w:styleId="Otsikko1">   (This is the style id.)
      <w:name w:val="heading 1"/>   (Here is the style name.)
      ...
    </w:style>
|
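To illustrate why the name is the language-independent key, the sketch below looks up "heading 1" in two hypothetical styles.xml fragments that differ only in their style id. The helper names and the English id "Heading1" are assumptions for the example, not taken from the module.

```python
from xml.dom import minidom

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def make_styles(style_id):
    """Build a tiny styles.xml DOM with one style (illustrative only)."""
    return minidom.parseString(
        f'<w:styles xmlns:w="{W_NS}">'
        f'<w:style w:type="paragraph" w:styleId="{style_id}">'
        '<w:name w:val="heading 1"/>'
        '</w:style></w:styles>'
    )


def get_style_element_by_name(styles_dom, style_name):
    """Return the first <w:style> whose <w:name w:val=...> matches, or None."""
    for style in styles_dom.getElementsByTagName("w:style"):
        for name in style.getElementsByTagName("w:name"):
            if name.getAttribute("w:val") == style_name:
                return style
    return None


finnish = get_style_element_by_name(make_styles("Otsikko1"), "heading 1")
english = get_style_element_by_name(make_styles("Heading1"), "heading 1")
finnish_id = finnish.getAttribute("w:styleId")
english_id = english.getAttribute("w:styleId")
```

The same name resolves in both documents, while the ids differ.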
Get the style id of the based-on style for a given style from styles.xml.
|
Get a style name of the style element.
|
Get the name of a style with a given style id.
|
Get the id of a style with a given style name from styles.xml.
|
Gets the style id of a paragraph element.
|
Gets a theme font from theme1.xml.
Note: XML example:
    <a:fontScheme name="Office">
      <a:majorFont>
        <a:latin typeface="Cambria"/>
        <a:ea typeface=""/>
        <a:cs typeface=""/>
      </a:majorFont>
      <a:minorFont>
        <a:latin typeface="Calibri"/>
        <a:ea typeface=""/>
        <a:cs typeface=""/>
      </a:minorFont>
    </a:fontScheme>
See Also: _getCompleteStyleDefinitions.
|
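A minimal sketch of reading a theme font from a font scheme like the one in the note. The function name get_theme_font and its "majorFont"/"minorFont" parameter are assumptions for illustration; only the Latin typeface is read.

```python
from xml.dom import minidom

A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"

# Shortened theme1.xml content based on the example above.
THEME_XML = f"""<a:theme xmlns:a="{A_NS}">
  <a:themeElements>
    <a:fontScheme name="Office">
      <a:majorFont><a:latin typeface="Cambria"/></a:majorFont>
      <a:minorFont><a:latin typeface="Calibri"/></a:minorFont>
    </a:fontScheme>
  </a:themeElements>
</a:theme>"""


def get_theme_font(theme_dom, which):
    """Return the Latin typeface of 'majorFont' or 'minorFont', or None."""
    fonts = theme_dom.getElementsByTagName("a:" + which)
    if not fonts:
        return None
    latin = fonts[0].getElementsByTagName("a:latin")
    if not latin:
        return None
    return latin[0].getAttribute("typeface")


theme_dom = minidom.parseString(THEME_XML)
major = get_theme_font(theme_dom, "majorFont")
minor = get_theme_font(theme_dom, "minorFont")
```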
Return style definitions of a given element. First checks if the element has any children and uses recursion if some are found. Next checks if the element has attributes.
If the element tag name is a key in the dict and the element doesn't have any attributes or children, stores value '1' in the dict.
|
Gets all definitions of a style from document dictionary. Converts twips to centimeters.
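The twips-to-centimeters conversion follows from the unit definitions: a twip is 1/20 of a point, so there are 1440 twips per inch, and an inch is 2.54 cm. A small sketch (the function name is hypothetical):

```python
TWIPS_PER_INCH = 1440  # 20 twips per point, 72 points per inch
CM_PER_INCH = 2.54


def twips_to_cm(twips):
    """Convert a twip value (int or numeric string from XML) to centimeters."""
    return int(twips) / TWIPS_PER_INCH * CM_PER_INCH


# An indentation of w:left="567" is almost exactly one centimeter.
left_indent_cm = twips_to_cm("567")
```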
|
Returns the style definition of the given style from styles.xml and theme1.xml. Recursion is used because a style can be based on another style. In addition, the base style gets style definitions from the document defaults. Finally, some style definitions are not found in the XML files at all; these use a default value that must be assumed.
Note: XML example:
    <w:style w:type="paragraph" w:default="1" w:styleId="Normaali">
      <w:name w:val="Normal"/>
      <w:qFormat/>
      <w:rsid w:val="006B493C"/>
      <w:pPr>
        <w:spacing w:before="140" w:after="220" w:line="360" w:lineRule="auto"/>
        <w:ind w:left="567"/>
        <w:jc w:val="both"/>
      </w:pPr>
      <w:rPr>
        <w:rFonts w:ascii="Georgia" w:hAnsi="Georgia"/>
        <w:lang w:val="fi-FI"/>
      </w:rPr>
    </w:style>
See Also: _getStyleElementById and _getStyleElementByName for difference.
|
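The layering described above (assumed defaults, then document defaults, then the base style, then the style itself) can be sketched with a recursive dictionary merge. Everything named here is an assumption for illustration: the function names, the "tag attribute" key layout and the contents of ASSUMED_DEFAULTS are not taken from the module, and theme fonts are left out.

```python
from xml.dom import minidom

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

STYLES_XML = f"""<w:styles xmlns:w="{W_NS}">
  <w:docDefaults>
    <w:rPrDefault><w:rPr><w:sz w:val="22"/></w:rPr></w:rPrDefault>
  </w:docDefaults>
  <w:style w:type="paragraph" w:default="1" w:styleId="Normaali">
    <w:name w:val="Normal"/>
    <w:pPr><w:jc w:val="both"/></w:pPr>
    <w:rPr><w:rFonts w:ascii="Georgia" w:hAnsi="Georgia"/></w:rPr>
  </w:style>
  <w:style w:type="paragraph" w:styleId="Otsikko1">
    <w:name w:val="heading 1"/>
    <w:basedOn w:val="Normaali"/>
    <w:pPr><w:jc w:val="left"/></w:pPr>
  </w:style>
</w:styles>"""

# Hypothetical values for settings that appear nowhere in the XML.
ASSUMED_DEFAULTS = {"w:jc w:val": "left"}


def _own_definitions(container):
    """Collect attributes of the elements inside <w:pPr> and <w:rPr>."""
    defs = {}
    for props_tag in ("w:pPr", "w:rPr"):
        for props in container.getElementsByTagName(props_tag):
            for child in props.childNodes:
                if child.nodeType == child.ELEMENT_NODE:
                    for name, value in child.attributes.items():
                        defs[f"{child.tagName} {name}"] = value
    return defs


def resolve_style(styles_dom, style_id):
    """Merge definitions along the basedOn chain; later layers win."""
    style = next((s for s in styles_dom.getElementsByTagName("w:style")
                  if s.getAttribute("w:styleId") == style_id), None)
    if style is None:
        raise KeyError(style_id)
    based_on = style.getElementsByTagName("w:basedOn")
    if based_on:
        defs = resolve_style(styles_dom, based_on[0].getAttribute("w:val"))
    else:
        # Root of the chain: start from assumed and document defaults.
        defs = dict(ASSUMED_DEFAULTS)
        for doc_defaults in styles_dom.getElementsByTagName("w:docDefaults"):
            defs.update(_own_definitions(doc_defaults))
    defs.update(_own_definitions(style))
    return defs


resolved = resolve_style(minidom.parseString(STYLES_XML), "Otsikko1")
```

In this example the heading inherits the font from Normaali and the font size from the document defaults, and overrides the justification itself. A production version would also need to guard against circular basedOn references.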
Gets the text content of the first element with a certain tag in the given DOM tree.
|
Gets the first child of an element with the given tag name. Returns the first child element of the given parent element whose tag name is elementTagName.
|
Gets the children of an element with the given tag name. Returns the child elements of the given parent element whose tag name is elementTagName.
|
Goes through header or footer references and checks if there is any content in them. Checks if there are headers or footers in the front page by looking for <w:t> tags. Even if there are references to headers or footers, they might be empty.
|
Checks whether a section has automatic page numbering and gets the numbering format. First goes through the section element and checks that the numbering starts at 1. Then gets the numbering format definition of the section and returns it if it is defined. If no numbering format is found in the section properties, it defaults to 'Standard'. If the numbering format is standard, checks the header and footer references for other numbering format definitions. The numbering format in a header or footer reference is sometimes in a <w:instrText> element, inside content such as PAGE \* MERGEFORMAT.
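The section-properties part of this check can be sketched as below, assuming the format is read from the w:fmt attribute and the start value from the w:start attribute of <w:pgNumType>. The function names are hypothetical, and the fallback that inspects <w:instrText> in headers and footers is not shown.

```python
from xml.dom import minidom

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Two hypothetical section property elements.
SECTIONS_XML = f"""<w:body xmlns:w="{W_NS}">
  <w:sectPr><w:pgNumType w:fmt="lowerRoman" w:start="1"/></w:sectPr>
  <w:sectPr><w:pgNumType w:start="1"/></w:sectPr>
</w:body>"""


def numbering_starts_at_one(sect_pr):
    """True if the section's <w:pgNumType> has w:start="1"."""
    for pg_num_type in sect_pr.getElementsByTagName("w:pgNumType"):
        return pg_num_type.getAttribute("w:start") == "1"
    return False


def get_page_number_format(sect_pr):
    """Return the w:fmt value, or 'Standard' when none is defined."""
    for pg_num_type in sect_pr.getElementsByTagName("w:pgNumType"):
        if pg_num_type.hasAttribute("w:fmt"):
            return pg_num_type.getAttribute("w:fmt")
    return "Standard"


sections = minidom.parseString(SECTIONS_XML).getElementsByTagName("w:sectPr")
formats = [get_page_number_format(s) for s in sections]
starts = [numbering_starts_at_one(s) for s in sections]
```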
|
Looks for text inside a header or footer and sees if the last modifier's name is in there. Problem: sometimes we want to check that there is no name in the header or the footer. If a name is found but it's different from the last modifier's name, result is False, even though a name is in a header/footer. For now just tries to check that either the name of the last modifier or just some name was found.
|
Gets the number format of the given section page number type.
|
Checks that the headers and footers of a document are made correctly. Assumes that the document has three sections: the cover section, the table of contents section and the text section.
See Also: checkSections, which must pass before this method can be run.
|
Gets the paragraph elements of the wanted section. Page-breaking section break elements change the section; continuous section break elements do not. The first list of section elements is the cover section, the second is the table of contents section, and the third is the text section. The document has to have at least three sections.
|
Gets all the w:sectPr elements of a document or optionally the w:sectPr elements of a specific section. w:sectPr elements are stored in a two dimensional list. Continuous section breaks are appended to current outer list index. The page breaking section raises the outer list index.
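The two-dimensional grouping can be sketched as follows. The sketch assumes that a section is continuous when its <w:sectPr> contains <w:type w:val="continuous"/> and treats every other <w:sectPr> as page-breaking; the function name is hypothetical.

```python
from xml.dom import minidom

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Three hypothetical section property elements: page-breaking,
# continuous, page-breaking.
DOCUMENT_XML = f"""<w:body xmlns:w="{W_NS}">
  <w:sectPr><w:pgSz w:w="11906" w:h="16838"/></w:sectPr>
  <w:sectPr><w:type w:val="continuous"/></w:sectPr>
  <w:sectPr><w:type w:val="nextPage"/></w:sectPr>
</w:body>"""


def group_section_properties(document_dom):
    """Group <w:sectPr> elements; continuous ones join the current group."""
    groups = []
    for sect_pr in document_dom.getElementsByTagName("w:sectPr"):
        types = sect_pr.getElementsByTagName("w:type")
        continuous = bool(types) and types[0].getAttribute("w:val") == "continuous"
        if continuous and groups:
            groups[-1].append(sect_pr)   # same outer index
        else:
            groups.append([sect_pr])     # raise the outer index
    return groups


groups = group_section_properties(minidom.parseString(DOCUMENT_XML))
group_sizes = [len(g) for g in groups]
```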
|
Goes through two lists of paragraph elements checking if the same paragraph is in both lists.
|
Goes through the section elements in the document checking that the sections are done properly. There must be at least three sections in the document. The cover page and the table of the contents cannot be in the same section. Also checks that the Microsoft Office Word setting "Different first page" is off.
|
Goes through all section properties to see that they have coherent property values. If a property value is the same in all section elements, the value is stored in pageProperties. If a value differs between the sections, it is considered wrong and the page property is set to False. For example, if two section elements have different top page margins, the property is set to False.
|
Gets the page margin sizes of the document.
|
Gets the document page sizes.
|
Gets the text content of <w:t>-elements from the given (paragraph) element.
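A minimal sketch of collecting the text of a paragraph, assuming the text is simply the concatenation of all <w:t> descendants in document order. The function name is hypothetical.

```python
from xml.dom import minidom

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

PARAGRAPH_XML = f"""<w:p xmlns:w="{W_NS}">
  <w:r><w:t xml:space="preserve">Hello </w:t></w:r>
  <w:r><w:t>world</w:t></w:r>
</w:p>"""


def get_paragraph_text(element):
    """Concatenate the text nodes of every <w:t> under the element."""
    parts = []
    for text_element in element.getElementsByTagName("w:t"):
        for node in text_element.childNodes:
            if node.nodeType == node.TEXT_NODE:
                parts.append(node.data)
    return "".join(parts)


paragraph = minidom.parseString(PARAGRAPH_XML).documentElement
text = get_paragraph_text(paragraph)
```

Whitespace between the runs is not part of any <w:t> element, so it does not end up in the result.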
|
Checks if all of the headings created in the document are listed in the table of contents.
|
Check if table of contents is done correctly. It has to have a page break before (and after) it.
See Also: checkTocContent -- calls the method if there's a table of contents to be found.
Note: XML example:
    <w:p w:rsidR="004A16ED" w:rsidRDefault="004A16ED" w:rsidP="006158B0">
      <w:pPr>
        <w:pStyle w:val="Otsikko"/>
      </w:pPr>
      <w:r w:rsidRPr="006158B0">
        <w:lastRenderedPageBreak/>
        <w:t>SISALLYSLUETTELO</w:t>
      </w:r>
    </w:p>
    <w:p w:rsidR="002274FC" w:rsidRDefault="00FA6E61">
      <w:pPr>
        <w:pStyle w:val="Sisluet1"/>
|
Checks if the front page is done correctly.
|
Returns the value of Target attribute of a Relationship element with the given id in a given rels file. The value of Target attribute can be for example a relative path to local XML files or images. It can also be a hyperlink.
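The Target lookup can be sketched like this. The function name and the sample .rels content are assumptions for illustration; the element and attribute names (Relationship, Id, Target) are those of the Open Packaging Conventions relationships part.

```python
from xml.dom import minidom

# Hypothetical content of a word/_rels/document.xml.rels file.
RELS_XML = """<Relationships
    xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1"
    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
    Target="media/image1.png"/>
  <Relationship Id="rId2"
    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"
    Target="http://example.com/" TargetMode="External"/>
</Relationships>"""


def get_relationship_target(rels_dom, relationship_id):
    """Return the Target of the Relationship with the given Id, or None."""
    for relationship in rels_dom.getElementsByTagName("Relationship"):
        if relationship.getAttribute("Id") == relationship_id:
            return relationship.getAttribute("Target")
    return None


rels_dom = minidom.parseString(RELS_XML)
image_target = get_relationship_target(rels_dom, "rId1")
link_target = get_relationship_target(rels_dom, "rId2")
```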
|
Returns the parent <w:p>-element of a given element if there is one.
|
Check if there is an image in the document.
|
Gets the image paths or the file names of the images used in the document.
|
Checks if the next paragraph after a picture paragraph uses the caption style. Also checks that the caption contains an automatic field. Goes through all picture paragraphs.
|
Checks that text paragraphs are using styles and that no manual style definitions are made. Goes through all paragraph-elements in a document looking for <w:pStyle>-elements. Gets the style definitions to see if there are manual changes.
Note: Exception: An automatically generated table of contents can contain "manual" style definitions. The <w:sectPr> elements within paragraph elements are also skipped. |
Checks if there is an endnote or a footnote in the document. Looks for w:endnoteReference and w:footnoteReference elements.
|
Goes through images' captions looking for a reference. Then checks if the caption is referenced somewhere.
|
Gets an element by an attribute value.
|
Checks that a style is used in the document.
|
Checks the headings in the document. Goes through the heading styles used in the document checking that they use a multilevel numbering, the numbering is done correctly using styles and that the numbering is connected to other heading styles. Gets all the heading styles used in the document. Searches for the heading's numbering definition reference in styles.xml. Next searches the associated numbering definition in numbering.xml. Next searches the correct numbering level definition associated to the heading. Checks that the numbering is multilevel and done correctly using the heading styles.
Note: XML example:
styles.xml:
    <w:style w:type="paragraph" w:styleId="Heading2">   - Heading 2 style definition
      <w:name w:val="heading 2"/>
      <w:pPr>
        <w:numPr>
          <w:ilvl w:val="1"/>   - Numbering Level Reference
          <w:numId w:val="1"/>   - Numbering Definition Instance Reference
        </w:numPr>
        <w:outlineLvl w:val="1"/>
      </w:pPr>
    </w:style>
numbering.xml:
    <w:abstractNum w:abstractNumId="0">   - Abstract Numbering Definition
      <w:multiLevelType w:val="multilevel"/>   - Abstract Numbering Definition Type
      <w:lvl w:ilvl="0"> - </w:lvl>   - Numbering Level Definition
      <w:lvl w:ilvl="1">   - Numbering Level Definition
        <w:start w:val="1"/>   - Starting Value
        <w:numFmt w:val="decimal"/>   - Numbering Format
        <w:pStyle w:val="Heading2"/>   - Paragraph Style's Associated Numbering Level
        <w:lvlText w:val="%1.%2"/>   - Numbering Level Text
        <w:lvlJc w:val="left"/>   - Justification
        <w:pPr>   - Numbering Level Associated Paragraph Properties
          <w:ind w:left="576" w:hanging="576"/>
        </w:pPr>
      </w:lvl>
    </w:abstractNum>
    <w:num w:numId="1">   - Numbering Definition Instance
      <w:abstractNumId w:val="0"/>   - Abstract Numbering Definition Reference
    </w:num>
|
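The chain of references in the example (style, numbering instance, abstract numbering definition, level) can be followed with a sketch like the one below. The function and helper names are hypothetical, and the check is reduced to two conditions: the abstract definition is multilevel, and the referenced level points back to the heading style.

```python
from xml.dom import minidom

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

STYLES_XML = f"""<w:styles xmlns:w="{W_NS}">
  <w:style w:type="paragraph" w:styleId="Heading2">
    <w:name w:val="heading 2"/>
    <w:pPr>
      <w:numPr><w:ilvl w:val="1"/><w:numId w:val="1"/></w:numPr>
    </w:pPr>
  </w:style>
  <w:style w:type="paragraph" w:styleId="Plain">
    <w:name w:val="plain"/>
  </w:style>
</w:styles>"""

NUMBERING_XML = f"""<w:numbering xmlns:w="{W_NS}">
  <w:abstractNum w:abstractNumId="0">
    <w:multiLevelType w:val="multilevel"/>
    <w:lvl w:ilvl="0"><w:pStyle w:val="Heading1"/></w:lvl>
    <w:lvl w:ilvl="1"><w:pStyle w:val="Heading2"/></w:lvl>
  </w:abstractNum>
  <w:num w:numId="1"><w:abstractNumId w:val="0"/></w:num>
</w:numbering>"""


def _first_val(parent, tag_name):
    """w:val of the first descendant with the given tag, or None."""
    found = parent.getElementsByTagName(tag_name)
    return found[0].getAttribute("w:val") if found else None


def _find(parent, tag_name, attribute, value):
    """First descendant with the given tag and attribute value, or None."""
    for element in parent.getElementsByTagName(tag_name):
        if element.getAttribute(attribute) == value:
            return element
    return None


def heading_uses_multilevel_numbering(styles_dom, numbering_dom, style_id):
    style = _find(styles_dom, "w:style", "w:styleId", style_id)
    if style is None or not style.getElementsByTagName("w:numPr"):
        return False
    num_pr = style.getElementsByTagName("w:numPr")[0]
    ilvl = _first_val(num_pr, "w:ilvl") or "0"
    num = _find(numbering_dom, "w:num", "w:numId", _first_val(num_pr, "w:numId"))
    if num is None:
        return False
    abstract = _find(numbering_dom, "w:abstractNum", "w:abstractNumId",
                     _first_val(num, "w:abstractNumId"))
    if abstract is None:
        return False
    if _first_val(abstract, "w:multiLevelType") != "multilevel":
        return False
    level = _find(abstract, "w:lvl", "w:ilvl", ilvl)
    return level is not None and _first_val(level, "w:pStyle") == style_id


styles_dom = minidom.parseString(STYLES_XML)
numbering_dom = minidom.parseString(NUMBERING_XML)
heading_ok = heading_uses_multilevel_numbering(styles_dom, numbering_dom, "Heading2")
plain_ok = heading_uses_multilevel_numbering(styles_dom, numbering_dom, "Plain")
```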
Return all paragraph elements that use a style name with a sequential numbering. Gets all paragraphs that use styles with stylenames for example heading 1, heading 2, etc or index 1, index 2, etc.
|
Checks that the document has an automatically made index.
|
Checks that the document has an index that is not empty, and that the index entries are referenced somewhere in the document. First gets all the index styles' definitions from styles.xml and finds the paragraphs using those styles in document.xml. Checks that there is a field code element indicating that the index is generated automatically. Collects the content of the index and checks that it isn't empty. Finds references to the index entries and matches them to the index content.
Note: XML example:
Index example:
    <w:p w:rsidR="002F2A09" w:rsidRDefault="00CA51D5">
      <w:r> <w:fldChar w:fldCharType="begin"/> </w:r>
      <w:r> <w:instrText xml:space="preserve"> INDEX \c "2" \z "1035" </w:instrText> </w:r>
      <w:r> <w:fldChar w:fldCharType="separate"/> </w:r>
    </w:p>
    <w:p w:rsidR="002F2A09" w:rsidRDefault="002F2A09">
      <w:pPr>
        <w:pStyle w:val="Index1"/>
        <w:tabs> <w:tab w:val="right" w:leader="dot" w:pos="3950"/> </w:tabs>
      </w:pPr>
      <w:r> <w:t>Index entry level 1</w:t> </w:r>
    </w:p>
Reference example:
    <w:r w:rsidR="00B27B47"> <w:instrText xml:space="preserve"> XE "</w:instrText> </w:r>
    <w:r w:rsidR="00B27B47" w:rsidRPr="00B27B47"> <w:instrText>Level 1 entry</w:instrText> </w:r>
    <w:r w:rsidR="00B27B47" w:rsidRPr="00B27B47"> <w:instrText>:</w:instrText> </w:r>
    <w:r w:rsidR="00B27B47" w:rsidRPr="0011587C"> <w:instrText>Level 2 entry</w:instrText> </w:r>
|
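The first two steps, detecting the INDEX field code and collecting the index content, can be sketched as below. The function names are hypothetical, and matching paragraphs by a style id that starts with "Index" is a simplification: the text above looks the index styles up in styles.xml, since style ids depend on the language of Word. Matching the XE references is not shown.

```python
from xml.dom import minidom

DOCUMENT_XML = r"""<w:body
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:p>
    <w:r><w:fldChar w:fldCharType="begin"/></w:r>
    <w:r><w:instrText xml:space="preserve"> INDEX \c "2" \z "1035" </w:instrText></w:r>
    <w:r><w:fldChar w:fldCharType="separate"/></w:r>
  </w:p>
  <w:p>
    <w:pPr><w:pStyle w:val="Index1"/></w:pPr>
    <w:r><w:t>Index entry level 1</w:t></w:r>
  </w:p>
</w:body>"""


def _text(element):
    """Concatenate the direct text nodes of an element."""
    return "".join(node.data for node in element.childNodes
                   if node.nodeType == node.TEXT_NODE)


def has_index_field(document_dom):
    """True if some <w:instrText> starts with the INDEX field code."""
    for instr in document_dom.getElementsByTagName("w:instrText"):
        if _text(instr).split()[:1] == ["INDEX"]:
            return True
    return False


def get_index_entries(document_dom, style_prefix="Index"):
    """Text of every paragraph whose style id starts with style_prefix."""
    entries = []
    for paragraph in document_dom.getElementsByTagName("w:p"):
        styles = paragraph.getElementsByTagName("w:pStyle")
        if styles and styles[0].getAttribute("w:val").startswith(style_prefix):
            entries.append("".join(
                _text(t) for t in paragraph.getElementsByTagName("w:t")))
    return entries


document_dom = minidom.parseString(DOCUMENT_XML)
index_is_generated = has_index_field(document_dom)
index_entries = get_index_entries(document_dom)
```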
Checks double whitespaces in the document.
|
Checks the *-character in the document.
|
Checks if a string is found in the text content of the document (in the w:t elements). If the string is found, returns how many occurrences were found in a paragraph.
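A sketch of such a search, used here to find double whitespace. The function name and the return shape (a dict from paragraph index to occurrence count) are assumptions for illustration. The text of all <w:t> elements in a paragraph is joined first, so a match that spans two runs is still found.

```python
from xml.dom import minidom

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

DOCUMENT_XML = f"""<w:body xmlns:w="{W_NS}">
  <w:p><w:r><w:t>Two  double  spaces</w:t></w:r></w:p>
  <w:p><w:r><w:t>Nothing wrong here</w:t></w:r></w:p>
  <w:p>
    <w:r><w:t xml:space="preserve">Split </w:t></w:r>
    <w:r><w:t xml:space="preserve"> across runs</w:t></w:r>
  </w:p>
</w:body>"""


def count_string_in_paragraphs(document_dom, needle):
    """Map paragraph index -> number of non-overlapping matches (if any)."""
    counts = {}
    paragraphs = document_dom.getElementsByTagName("w:p")
    for index, paragraph in enumerate(paragraphs):
        text = "".join(
            node.data
            for text_element in paragraph.getElementsByTagName("w:t")
            for node in text_element.childNodes
            if node.nodeType == node.TEXT_NODE
        )
        found = text.count(needle)
        if found:
            counts[index] = found
    return counts


document_dom = minidom.parseString(DOCUMENT_XML)
double_spaces = count_string_in_paragraphs(document_dom, "  ")
asterisks = count_string_in_paragraphs(document_dom, "*")
```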
|
Checks if the tabulator is used in the document.
Note: Exceptions:
|
Checks if a paragraph is empty.
Note: Exceptions: A picture in the document produces an empty paragraph. An empty table cell produces an empty paragraph. A table produces an empty paragraph right after the table. Objects and graphics produce an empty paragraph. ... |
Finds all empty paragraphs in the document.
Note: Exceptions: A picture in the document produces an empty paragraph. An empty table cell produces an empty paragraph. A table produces an empty paragraph right after the table. ...? |
Goes through all paragraph elements in the document looking for paragraphs that use some list style.
|
Checks that the document has a chart copied from a spreadsheet document. The chart must be pasted as a link. |
Checks that the document has a table copied from a spreadsheet document. For now checks that the table is pasted as a link. |
Checks that the document contains a chart pasted from PowerPoint as a vector graphics picture or as an object. Doesn't really know if the picture or object is actually from PowerPoint! |
Generated by Epydoc 3.0.1 on Tue Jun 21 17:30:07 2011 | http://epydoc.sourceforge.net |