LinkStructor Configuration Utility

  • You must complete either the Web Page Address, Upload File, or Paste Document field for the initial extraction of link data.
  • To quickly configure LinkStructor as a link checker or site mapper, check the appropriate box in the Configuration Tools section.
  • To submit settings, click here, or click Submit at bottom of page.
  • To make the LinkStructor index return to the left frame, click here.


Source Document—complete one of the following
Web Page Address
ED SR SM

Enter the web address of the document you'd like processed, or the address of an HTML List [1] or Custom List [2].
Individual    HTML List   Custom List
Upload File
ED SR SM

If you would like LinkStructor to retrieve the document from your computer, select its content-type and enter its path below. 
Select content-type: ASCII (plain text)    binary
Paste Document
ED SR SM

The document to be parsed may be pasted or typed into the box below.
1. An HTML List contains HTML links whose targets indicate web pages to process. Other content in the HTML list is ignored. This could be used as a shortcut to entering one URL at a time with the Individual setting, but results for web pages indicated by the HTML list are returned in a single HTML document, with double lines separating the results.
2. A Custom List contains a list of addresses, one per line, indicating web pages to process. The keyword use may precede each address (on the same line). Any address preceded by show and placed after a use line will be substituted for the source document address in the output. Show text containing anything other than a single address is included in addition to the source document address. A maximum of two show addresses per use address is allowed. Text following a pound sign (#) may be used for comments and is not included in the output.
 
As with an HTML List, a Custom List could be used as a shortcut to entering one URL at a time with the Individual setting, and results for web pages indicated by the custom list get returned in a single HTML document, with double lines separating the results.
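
The Custom List rules in note 2 can be sketched as a small parser. This is only an illustration, not LinkStructor's own code: the function name parse_custom_list, the returned structure, and the sample addresses are hypothetical, and the sketch assumes that every pound sign starts a comment (so an address containing a # fragment would be cut short).

```python
def parse_custom_list(text):
    """Parse a Custom List into (use_address, [show_texts]) pairs (hypothetical sketch)."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # text after a pound sign is a comment
        if not line:
            continue
        keyword, _, rest = line.partition(" ")
        if keyword.lower() == "show":
            if not entries:
                raise ValueError("show line appears before any use line")
            shows = entries[-1][1]
            if len(shows) >= 2:
                raise ValueError("more than two show lines for one use address")
            shows.append(rest.strip())
        elif keyword.lower() == "use":
            entries.append((rest.strip(), []))
        else:
            entries.append((line, []))  # bare address: the use keyword is optional
    return entries


sample = """\
# pages to check
use http://example.com/a.html
show http://example.com/public/a.html
http://example.com/b.html   # no keyword needed
"""
print(parse_custom_list(sample))
```

Each pair holds the address to process and the show text that would replace or accompany it in the output.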

 
Configuration Tools
Link Checker
ED SR SM

 Configure as a single-page link checker.
 Configure as a link checker for an entire website.
Site Map
ED SR SM

 Configure to create a site map for a single page.
 Configure to create a site map for entire website.
Upload Configuration File
ED SR SM

If you would like LinkStructor to retrieve a configuration file from your computer, select its content-type, enter its path, and click Upload. The file will appear in the Configuration Text box.
Select content-type: ASCII (plain text)    binary
 
Download Configuration File
ED SR SM

If you would like to save a configuration file to your computer, right-click the button below, choose Save Target As, and enter a filename.
right click here
Configuration Format
ED SR SM

Configuration information that appears in the Configuration Text box can be distributed to appropriate color-coded fields, and vice versa. This format change will occur immediately after checking one of the following boxes. 
 Text-box to distributed
 Distributed to text-box
Configuration Text
ED SR SM

Configuration information may be pasted or typed into the box below.

 
Global Conformity
Prepend
ED SR SM

Enter any HTML or text that you would like included before the opening anchor tag of each link.
Append
ED SR SM

Enter any HTML or text that you would like included after the closing anchor tag of each link.
Add Attribute
ED SR SM

If you would like any attributes added to the anchor tag of each link, enter them here.
Extracted Data Width
ED SR SM

Enter the maximum number of characters of extracted data to display per line. LinkStructor will shorten lines exceeding this length by adding break tags in place of spaces, before or after punctuation, or in the middle of words. The last of these produces a status report entry. If left blank or set to 0, break tags won't be added. This option doesn't add break tags before the first or after the last line of a link's extracted data, so a paragraph or break tag should be assigned to Prepend to prevent the text of two different links from sharing the same line.
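
As a rough illustration of the width limit, the sketch below replaces spaces with break tags where it can and falls back to a mid-word break otherwise, counting those forced breaks as candidates for status report entries. The function name limit_width is hypothetical, breaking before or after punctuation is left out for brevity, and LinkStructor's actual rules may differ.

```python
def limit_width(text, width):
    """Return (html, forced), where forced counts mid-word breaks (hypothetical sketch)."""
    if not width:  # blank or 0: no break tags are added
        return text, 0
    lines, forced = [], 0
    while len(text) > width:
        cut = text.rfind(" ", 0, width + 1)  # last space that keeps the line within width
        if cut > 0:
            lines.append(text[:cut])
            text = text[cut + 1:]  # the space itself is replaced by the break tag
        else:
            lines.append(text[:width])  # no usable space: break in the middle of a word
            text = text[width:]
            forced += 1
    lines.append(text)
    return "<br>".join(lines), forced


print(limit_width("alpha beta gamma", 10))
print(limit_width("abcdefghij", 4))
```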
Absolute Targets
ED SR SM

Type yes if you would like the base address added to relative links. Leave blank or type no to keep link targets as they were found.
Local Targets
ED SR SM

 Prevent the extraction of any link (or link data belonging to a link) that has a non-local target (see Define Local, below).
Define Local
ED SR SM

Options such as Local Targets and Spider Locally determine whether a link target [1] is local by comparing its authority component [2] with that of the address used to access the page containing the link (the access address). Select a variation of this process.
 
 Specified
    An address must have the same authority component as the access address, which includes the presence or absence of www., to be considered local.
 
 Ignore www
    A www. that immediately follows the first double slash of the access address or link target to which it's compared will be ignored when determining locality.
 
 Confirm
    The document at the access address will be compared with the document at the alternate form of the access address (with the www. added or removed). If both documents are identical, a www. that immediately follows the first double slash of the access address or link target to which it's compared will be ignored when determining locality.
1. Relative targets are expanded to their absolute form before determining locality and are returned to their original form afterwards.
2. The authority component is the portion of the address between the double slash and the next slash, question mark or end of the address.
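
The Specified and Ignore www comparisons can be sketched with Python's standard urllib.parse module. The function names and the mode strings are hypothetical, lowercasing the authority is an added assumption, and the Confirm variation is omitted because it requires retrieving and comparing two documents.

```python
from urllib.parse import urljoin, urlsplit


def authority(url, ignore_www=False):
    """Return the authority component, optionally without a leading www."""
    host = urlsplit(url).netloc.lower()  # lowercasing is an assumption of this sketch
    if ignore_www and host.startswith("www."):
        host = host[4:]
    return host


def is_local(target, access_address, mode="specified"):
    """Compare a link target with the access address (hypothetical sketch)."""
    absolute = urljoin(access_address, target)  # expand relative targets first
    ignore = mode == "ignore_www"
    return authority(absolute, ignore) == authority(access_address, ignore)


print(is_local("page.html", "http://www.example.com/dir/"))
print(is_local("http://example.com/x", "http://www.example.com/"))
print(is_local("http://example.com/x", "http://www.example.com/", "ignore_www"))
```

Because only a www. at the very start of the authority is stripped, an address such as http://examplewww.com is never treated as a www variant of another host.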

 
Implicit Modifications
Exclude Generic Text
ED SR SM

Type yes if you want the title of the target document to be prepended to generic and nondescript link text (such as click, more or a web address) in the extracted data field, and for the link to be omitted if there is no title available. [1] Otherwise, type no or leave blank.
Exclude Navigation
ED SR SM

Type yes to eliminate links containing link text such as top and home page. [1] Otherwise, type no or leave blank.
Exclude Contact Text
ED SR SM

Type yes to eliminate links containing link text such as contact and feedback, which may or may not open your e-mail client (doesn't check for mailto: within target). [1] Otherwise, type no or leave blank.
Move Contact
ED SR SM

Type top or bottom to specify the placement of links such as contact and feedback. Type no or leave blank to position these links as they appear in the original document.
1. Only extracted link text is checked. No extracted data or site map entry will be omitted based on the text of a site map entry, which is normally derived from a web page's title. Links omitted from extracted data won't be included in the site map and their target addresses won't be followed in search of more links.

 
Explicit Modifications
Notate Duplicates
ED SR SM

Type yes if you would like numbers to be appended and target page titles (if not all identical) to be prepended to each link whose link text is identical to that of another link extracted from the same page. Otherwise, type no or leave blank.
Exclude Duplicates
ED SR SM

Type yes to omit links that have the same target and link text as another link extracted from the same page. [1] Otherwise, type no or leave blank. Also see the Exclude option of Map www.
Exclude Mailto
ED SR SM

Type yes to eliminate links that open your e-mail client (checks for mailto: within target). [1] Otherwise, type no or leave blank.
Move Mailto
ED SR SM

Type top or bottom to specify the placement of links, within each "extracted from" section, that open your e-mail client. Type no or leave blank to position these links in the order they appear in the original document.
Slash Directory
ED SR SM

Type yes to append a slash to each link target that points to a directory, when a trailing slash isn't already present (speeds loading of target page). Extensionless files will be detected and not slashed. Otherwise, type no or leave blank.
Slash Base
ED SR SM

Type yes to append a slash to each link target that points to a base directory (domain name), when a trailing slash isn't already present. Otherwise, type no or leave blank. Targets that specify a home page's file, such as domain.com/index.html, will not be slashed. This feature adds consistency to link targets, but does not speed loading of the target page.
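
A minimal sketch of the Slash Base behavior, assuming absolute targets and using Python's standard urllib.parse module. The function name slash_base is hypothetical. Slash Directory is not sketched here because telling a directory from an extensionless file would need information that the address alone doesn't provide.

```python
from urllib.parse import urlsplit, urlunsplit


def slash_base(target):
    """Append a slash when the target names only a base directory (hypothetical sketch)."""
    parts = urlsplit(target)
    if parts.netloc and parts.path == "":
        parts = parts._replace(path="/")  # an empty path means only the domain was given
    return urlunsplit(parts)


print(slash_base("http://domain.com"))
print(slash_base("http://domain.com/index.html"))
```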
Conform www
ED SR SM

Choose whether a www. in a link target should be removed when present, or added when not present. [2] Only applies to local links and when Define Local is not set to Specified.
 
All Fields: Keep as found    Add if not present    Remove if present
Site Map: Add if not present    Remove if present
Map www
ED SR SM

The following options affect how or whether links that are identical except for the www. are included in the site map. [2] Only applicable to local links (the only links a site map would contain) and if Conform www is set to no and Define Local is not set to Specified.
 
 Exclude [3]
    Local links with targets that are identical except for the www. would be considered identical and only one would be included in the site map. The main URL portion (before the "#") of an intradocument link might differ from the URL it's listed under [4] by the presence or absence of a www. Links with directories that are identical except for the www. would be grouped together, as though their directories are identical.
 
 Same Section
    Local links whose targets are identical except for the www. would be grouped in the same section of the site map.
 
 Separate Sections
    Treats a www. as a directory. Since site map links are grouped by their directories, www and non-www links would be separated.
1. Only extracted link text is checked. No extracted data or site map entry will be omitted based on the link text of a site map entry, which is normally derived from a web page's title. Links omitted from extracted data won't be included in the site map and their target addresses won't be followed in search of more links.
2. Applies only to a www. at the beginning of the authority component (as in http://www.), so this option wouldn't apply to targets such as http://examplewww.com.
3. Exclude Duplicates is related, but it affects local and non-local links in the Extracted Data section as well as the site map, it requires the link text as well as the targets to be identical, and it does not ignore a www.
4. Intradocument links in the site map are rendered in a small font, under the non-intradocument (no # or fragment identifier) version. The non-intradocument version is added to the site map even if LinkStructor didn't find it during link extraction.
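
The Conform www choices, together with the restriction in note 2, can be sketched as follows. The function name conform_www and the mode strings keep, add and remove are hypothetical stand-ins for the form's three choices. The sketch assumes absolute targets without user information in the authority component, and it leaves the locality check to the caller.

```python
from urllib.parse import urlsplit, urlunsplit


def conform_www(target, mode):
    """Add or remove a leading www. in the authority component (hypothetical sketch)."""
    parts = urlsplit(target)
    host = parts.netloc
    if not host or mode == "keep":
        return target  # relative targets and "keep as found" are left untouched
    has_www = host.lower().startswith("www.")
    if mode == "add" and not has_www:
        host = "www." + host
    elif mode == "remove" and has_www:
        host = host[4:]
    return urlunsplit(parts._replace(netloc=host))


print(conform_www("http://example.com/a", "add"))
print(conform_www("http://www.example.com/a", "remove"))
print(conform_www("http://examplewww.com/", "remove"))
```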

 
Spider Properties
Spider Locally
ED SR SM

Check box to indicate that only extracted links with local targets (see Define Local) should be used as pointers to other documents to process. Only applicable when there might be more than one cycle of extractions (see Exit Conditions).
 
 Spider local links only
Spider Rate
ED SR SM

Enter the number of half-seconds that LinkStructor should wait between retrieving webpages. The default value is 1, which makes LinkStructor wait for 1/2 second. A value of 1 or greater is recommended to keep LinkStructor server-friendly, but you may set the value to 0 or leave the box empty for the fastest possible execution. [1]
 
 Half-seconds of sleep between page loads.
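
To make the half-second unit concrete, the sketch below converts a Spider Rate value to seconds and waits between sequential page loads. The names spider_delay_seconds and fetch_all, and the fetch callback, are hypothetical. Pages are retrieved one at a time, and a blank value is treated as 0, as described above.

```python
import time


def spider_delay_seconds(spider_rate):
    """Convert the Spider Rate field (half-seconds, possibly blank) to seconds."""
    if spider_rate in ("", None):
        return 0.0  # an empty box means no waiting
    return int(spider_rate) * 0.5


def fetch_all(urls, fetch, spider_rate=1, sleep=time.sleep):
    """Retrieve pages sequentially, sleeping between loads (hypothetical sketch)."""
    results = []
    for i, url in enumerate(urls):
        if i:  # no wait is needed before the first page
            sleep(spider_delay_seconds(spider_rate))
        results.append(fetch(url))
    return results


print(spider_delay_seconds(1))
print(spider_delay_seconds(4))
```

Passing the sleep function as a parameter is just a convenience for trying the sketch without real delays.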
Exit Conditions
ED SR SM

Set numerical time and quantity limits for the operations below.
 
 
Cycles
Description
 
Items
Description
 
Kbytes
Description
 
Seconds (total)
Description
 
Seconds (page load)
Enter the number of seconds to wait for web pages to load before timing out. General guidelines: For web pages that are large or hosted on slow or busy servers, this value should be increased. If many web pages cannot be loaded when spidering an otherwise fast web site, decrease this value.
1. LinkStructor does not support multiple threads, so no more than one webpage at a time will be retrieved by a single execution of LinkStructor.

 
Output Options
Webpage Format
ED SR SM

Select which copies of which data fields [1] you would like displayed as webpages when viewed at your chosen destination (see destination options, below). The source code would still be available through your browser's "view-source" function.
 
Copy 1:  Site Map    Status Report    Extracted Data
Copy 2:  Site Map    Status Report    Extracted Data
Source Format
ED SR SM

Select which copies of which data fields [1] you would like displayed as HTML source code when viewed at your chosen destination (see destination options, below). If necessary, the actual source code will be altered so tags and other HTML markup are displayed as such, rather than rendered as a webpage.
 
Copy 1:  Site Map    Status Report    Extracted Data
Copy 2:  Site Map    Status Report    Extracted Data
Destination
ED SR SM

For each data field [1] copy that you would like to receive, select a "screen" or "email" destination. A separate block-form status report covering all processed webpages is the default form.
 
Site Map (copy 1):  Screen    Email    Neither
Site Map (copy 2):  Screen    Email    Neither
 
Status Report (copy 1):  Screen    Email    Neither
Status Report (copy 2):  Screen    Email    Neither
 
Extracted Data (copy 1):  Screen    Email    Neither
Extracted Data (copy 2):  Screen    Email    Neither
Combine Fields
ED SR SM

When there are two or more data fields [1] with the same copy number and destination, they can be combined so that they appear on the same web page, in the same file, or in the same email, depending on their destination. For each copy number, select which fields to combine. Checking only one box in either row has no effect.
 
Copy 1:  Site Map    Status Report    Extracted Data
Copy 2:  Site Map    Status Report    Extracted Data
Status Report Content
ED SR SM

Under Construction
1. The three data fields are the Site Map (local links grouped by directory), Status Report (various information about other fields' data), and Extracted Data (data extracted from webpages in list form).

   

Copyright 2002-2003 RecoServe