<HTML>
<!-- $Id$ -->
<HEAD>
<TITLE>Newsgroup Interchange within FidoNet.</TITLE>
</HEAD>
<!-- Background white, links blue (unvisited), navy (visited), red (active) -->
<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#000080" ALINK="#FF0000" >
<PRE>
Document: FSC-0059
Version:  001
Date:     08-Mar-1992

Newsgroup Interchange within FidoNet

Jack Decker
1:154/8@fidonet

A proposed standard for the interchange of USENET News messages among FidoNet nodes.

Status of this document:

This FSC suggests a proposed protocol for the FidoNet(r) community, and requests discussion and suggestions for improvements. Distribution of this document is unlimited.

Fido and FidoNet are registered marks of Tom Jennings and Fido Software.

Introduction:

This document defines the standard format for the interchange of USENET news messages among FidoNet nodes. It incorporates by reference the document RFC-1036, "Standard for Interchange of USENET Messages" by M. Horton of AT&T Bell Laboratories and R. Adams of the Center for Seismic Studies. A copy of RFC-1036 should be included in the distribution archive of this standard.

However, RFC-1036 is NOT applicable in its entirety to FidoNet. Therefore, unless specifically referenced elsewhere in this document, only section 2 of RFC-1036 should be considered part of this standard. Section 3, which deals with "control messages", may be implemented in FidoNet on an optional basis, and if processing of control messages is included in a FidoNet implementation, it should be done in accordance with section 3 of RFC-1036 to the extent possible. Section 4 of RFC-1036 is *NOT* applicable to FidoNet (except for section 4.3, which will be discussed later) and therefore is NOT included as part of this standard.
Section 5 of RFC-1036 is a treatise on the News Propagation Algorithm used within UseNet, and should be studied even though it is not directly applicable to FidoNet, in particular because it contains a discussion on the prevention of loops (what we in FidoNet commonly refer to as "dupe loops").

Please note that FidoNet implementations neither recognize nor support what is referred to as the "old format" or the "A format" in section 2 of RFC-1036.

The goal of this document is to define a standard for the interchange of news messages between FidoNet nodes in a format that will also be acceptable to UseNet hosts. In order to simplify the creation of software that conforms to this standard, we do not intend to support every news format that has ever existed in UseNet. The standard described in RFC-1036 is used by the majority of UseNet hosts, and therefore it is the standard that will be adopted in this document.

This standard will contain three sections: General theory of newsgroup transmission, Format and protocols of batched newsgroups, and the translation of newsgroup messages to and from FidoNet message format.

1. General theory of newsgroup transmission:

Prior to the introduction of the DoveMail program, the usual method of gating a UseNet newsgroup into FidoNet was to convert it to FidoNet echomail, and then send it to "downstream" nodes in echomail format. This method is still used at the majority of gateway systems at this writing. Unfortunately, no conversion process is perfect, and some useful control information is usually lost in the conversion. In addition, most FidoNet echomail processors don't handle long messages (which are fairly common in newsgroups) well at all, and many gateway systems either try to split these messages into multiple parts (a somewhat awkward process) or discard them entirely.
Because the duplicate message detection algorithms used in many FidoNet echomail processors incorrectly identify some of the parts of a split message as duplicates, parts of long messages often get "lost" when transmitted as echomail. Also, UseNet allows a message to be posted to multiple newsgroups, and when such messages are converted to echomail, it may be necessary to create multiple copies of the message (one for each echomail area that it would be placed in), thus increasing the transmission time for such messages.

Even normal-length newsgroup messages may be falsely discarded as duplicates by some "downstream" echomail processors. The reason this is a particular problem in newsgroups converted to echomail is that some echomail processors use a checksum of parts of FidoNet message headers to determine if messages are duplicates. Since all newsgroup messages are assumed to be addressed to "All", and since some gateway software uses the date and time that the message was converted to echomail rather than the original date and time from the message, it's quite possible that the remainder of the message header contains information that is similar enough to information in another message's header to cause it to be discarded as a duplicate message. This happens far more frequently with converted newsgroup messages than with messages originally entered as echomail.

Finally, when a BBS user enters a reply to a news message that has been converted to echomail, in many cases the information is simply not available in the original message to generate a proper "References:" line in the reply, as required by RFC-1036. If the original message contained a "Followup-To:" line, which requires that replies be posted to a different newsgroup than the one in which the original message was entered, this line may not be transmitted in the message as converted to echomail.
And even if this information is available, no echomail processor currently available will modify the reply message as required (to add the "References:" line where necessary, or to move the message to a different area if it is a reply to a message that contained a "Followup-To:" line).

Under this proposed standard, none of the UseNet message header information is lost in transmission between nodes, and reply messages can be generated that conform to UseNet specifications. If a message is posted to multiple newsgroups, it is only transmitted once (instead of multiple times as it might be if converted to echomail). Also, long messages are not truncated or changed in transmission between nodes, and finally, there is no chance that a message will be improperly discarded as a duplicate.

The main thing to remember is that under this standard, news messages are never converted to echomail. Echomail is an irrelevant concept in this context, since we are not passing echomail between nodes. Instead, newsgroups are transmitted in the native format specified by RFC-1036, and tossed directly from batched newsgroup packets to the FidoNet message format (e.g. the *.msg format) if necessary. Keep in mind that most FidoNet BBS software uses the same general format not only for echomail messages, but also for netmail and local message areas, so it is not necessary to transmit messages between nodes in echomail format if another format is more suitable for the type of message being transmitted.

2. Format and protocols of batched newsgroups:

When newsgroup messages are transmitted between systems, the individual messages must conform to the specifications of section 2 of RFC-1036, and section 3 of this document. Where section 3 of this document defines a more restrictive standard than RFC-1036, this document shall take precedence.
When transmitting news messages between FidoNet nodes, they must be sent in a batched newsgroup file (as described in section 4.3 of RFC-1036) unless some other format is agreed upon in advance. The transmission of unbatched news messages, or the use of any batching method other than that described in section 4.3 of RFC-1036 shall be considered non-standard. Please note that RFC-1036 section 4.3 refers to this batching process as combining several messages into "one large message", but we will refer to this "one large message" as a "batched newsgroup file", or a "UseNet format mail packet" rather than as a "large message", since FidoNet systems do not normally handle large "messages".

When messages pass through a FidoNet system on their way to other nodes, the header lines in the message may be modified to conform with the standards given here. However, the text (body) of a message should NEVER be altered (one exception: Carriage Returns MAY be converted to Line Feeds in order to conform to this standard, but this is neither required nor expected of software).

The standard format for sending a batched newsgroup file to other FidoNet nodes is as follows:

First, as will be noted in section 3 of this document, individual lines of the batched newsgroup file must be terminated with Line Feeds only, and the file must NOT contain Carriage Return characters (ASCII 13).

Batched newsgroup files shall be transmitted between FidoNet nodes as files named using the filename ????????.PKU, where each of the eight characters of the root name can be any hexadecimal digit (0 - 9 or A - F). The .PKU extension (which stands for "PacKet - Usenet format") is the news equivalent of the .PKT file used to transmit FidoNet format netmail and echomail between nodes.

Batched newsgroup files with the filespec ????????.PKU may be archived into a standard mail archive file (bearing the extension *.MO?, *.TU?, *.WE? ... *.SU?).
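
As an illustration only (not part of this standard), the naming rules above could be checked by receiving software roughly as follows. This is a minimal sketch in Python; the function name classify() and the returned labels are hypothetical, the root name of *.PKT and mail archive files is deliberately not checked, and the final "?" of a mail archive extension is assumed to be a digit 0 - 9.

```python
import re

# Root name of a batched newsgroup file: exactly eight hex digits, extension .PKU.
PKU_RE = re.compile(r"^[0-9A-F]{8}\.PKU$")
# Assumption: only the extension of a FidoNet packet is checked here.
PKT_RE = re.compile(r"^[^.]+\.PKT$")
# Assumption: the "?" in *.MO?, *.TU? ... *.SU? is a digit 0 - 9.
ARCHIVE_RE = re.compile(r"^[^.]+\.(MO|TU|WE|TH|FR|SA|SU)[0-9]$")


def classify(filename):
    """Return 'news', 'mail', 'archive' or 'unknown' for a received file name."""
    name = filename.upper()  # DOS file names are not case sensitive
    if PKU_RE.match(name):
        return "news"
    if PKT_RE.match(name):
        return "mail"
    if ARCHIVE_RE.match(name):
        return "archive"
    return "unknown"
```

A file reported as "news" would be handed to the newsgroup processor, and one reported as "mail" to the usual netmail/echomail processor.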
It is assumed that the receiver of batched newsgroup files will take any necessary steps to make sure that both *.PKU and *.PKT files are extracted from incoming mail archive files before the mail archive files are deleted. In certain cases, this may mean that an external unarchive shell may have to be used, instead of allowing the echomail processor to call the unarchiver (typical external unarchive shell programs at this writing are GUS, POLYXARC, and SPAZ).

A batched newsgroup file awaiting transmission may be stored in a FidoNet system's "outbound" area in uncompressed form, prior to being archived for transmission or sent in uncompressed form. It is suggested that when a system uses the .OUT extension to indicate an uncompressed netmail or echomail packet, the .UUT extension be used to indicate an uncompressed batched newsgroup packet. It is expected that a .UUT file in a system's "outbound" area will be treated in much the same way as an .OUT file, except it will be renamed to a file with an extension of .PKU (rather than .PKT) before being archived into the mail archive. This implies that the root name of the .UUT file will contain the net number and node number of the destination system, expressed as four hexadecimal digits each for net and node numbers, in the same manner as the root name for a FidoNet .OUT file is constructed.

The root filename of the *.PKU file should be an eight digit hexadecimal number, with leading zeroes used if necessary, in order to make an eight character root filename. It is suggested that this hexadecimal number be based on time of year, with 00000000.PKU generated at exactly midnight on January 1 and FFFFFFFF.PKU generated at just a moment before midnight on December 31. However, it is permissible to use the same algorithm that is used to generate the root filename for *.PKT files.

The normal sequence for transmission of messages between FidoNet nodes might then be described as follows:

a. 
Messages created on the originating system are placed into a batched newsgroup file conforming to the specifications of RFC-1036 section 4.3. When this batched newsgroup file is destined for another FidoNet node, it will have a filename of the format:

   [4 hex digit net number][4 hex digit node number].UUT

This file will then be placed in the outbound mail area for packing.

b. A mail packing program will examine the outbound mail area and, upon finding the .UUT file, will rename it to a file with an extension of .PKU, and then shell to a compression program in order to place the *.PKU file into a new or existing mail archive file for the destination node. Mail archive files bear extension names consisting of the first two letters of a day of the week (in the English language) plus a numeric character in the range 0 - 9 (for example, .MO5 or .TH7). The method of compression for the mail archive is as agreed upon between the originating and destination nodes. No "standard" method of compression for the mail archive is specified in this document.

NOTE: If the compression program fails for any reason (such as running out of disk space), the mail packing program MUST rename the .PKU file back to the original *.UUT filename before exiting. Since batched newsgroup files do not contain a header that indicates the destination node, there would be no way to determine the proper destination node if the file were not renamed back to the original filename.

c. The mail archive is transmitted in the usual manner by a FidoNet compatible mailer, or such other means as may be agreed upon in advance by the sysops of the originating and destination nodes.

d. At the destination system, the individual files are extracted from the mail archive. *.PKT files are processed in the usual manner to extract any netmail or echomail messages, while *.PKU files are processed by software designed to handle batched newsgroup files.
In this context, such files could be "handled" by re-processing the messages and batching them to be sent on to one or more additional node(s), or by tossing the messages to the local message base, or both.

Please note that this standard does not anticipate that batched newsgroup files will be converted to FidoNet echomail at any point along the way. It is realized that this may indeed happen, but such conversions should be considered as something to be avoided if at all possible due to the problems discussed in section 1 of this document.

3. Translation of newsgroup messages to and from FidoNet message format:

NOTE: Where applicable, the standards defined in this section for messages shall apply not only to locally created messages, but also to all messages sent to "downstream" FidoNet nodes.

In this context, "FidoNet message format" means that format in which messages commonly reside on a FidoNet BBS. At this writing, there are three formats commonly used for message storage on FidoNet systems, but other formats may be in use as well. The three most common formats are the "*.msg" format as used by the original Fido program (and a host of programs since), also commonly referred to as the "single message per file format"; the "Hudson" format, used by QuickBBS, Remote Access, and some other products; and the "Squish" format used by the Maximus BBS and the "Squish" echomail processor.

Because there are so many message formats, some other programs have taken the approach of trying to convert UseNet news into echomail, creating *.PKT files which can theoretically be processed by any FidoNet system. However, since the *.PKT files are processed by the echomail processor, all the limitations and pitfalls associated with converting newsgroup messages to echomail come into play.

The preferred way of handling incoming messages would be to have the BBS (or message reader/editor) software directly read batched newsgroup files.
In this way, the files would not have to be "processed" per se. As new batched newsgroup files arrived on a system, they could simply be concatenated to the existing message base, and then a utility could be run that would build an index to the message base, in a manner somewhat similar to the way "flat file" message bases are currently implemented on some BBS's. Of course, you'd need to occasionally run a utility to delete old messages in order to keep the message base from growing too large, and new messages entered on the system would have to be exported from the system in a separate batched newsgroup file. However, at this writing no FidoNet-compatible BBS or message editor is capable of directly reading a batched newsgroup file.

The second most preferable method is to convert news messages directly to the message format used by that system. At this writing the DoveMail software includes utilities (NewsToss and NewsScan) that can convert batched newsgroup files to and from messages in the *.msg (single message per file) format. It should be possible to convert batched newsgroup files to and from other FidoNet message formats as well.

The method in which messages are stored on a BBS, and the method in which it is determined which new (locally-entered) messages need to be exported from the system will necessarily be implementation-specific. One method that can be used with *.msg type message bases is to maintain a "high water mark" in 1.msg, similar to the "high water mark" used for echomail messages, and additionally to mark messages received from other nodes as "sent" when they arrive, and locally-entered messages as "sent" when they have been exported, and to never re-send a message marked as "sent".

When tossing incoming messages, duplicate messages can be detected by comparing the contents of the "Message-ID:" line with those of previously received messages.
This may slow processing considerably, however, and would require storage of a history file of "previously seen" messages. Another method is to look in the "Path" line and see if we are already listed in the path; if so, the message is a duplicate and should be deleted. This method is faster and does not require maintenance of a history file, but will not guard against duplicate messages arriving from one's feed that have not passed through the system twice (for example, a message that arrived from two different paths). Fortunately, UseNet folks seem to understand the need for proper topology, so those types of dupes are relatively rare.

FidoNet sysops taking UseNet feeds must understand that it is IMPERATIVE that a feed of any one newsgroup be obtained from only ONE source, especially if they are then passing that newsgroup to any "downstream" nodes. This absolutely does NOT imply that geographic restrictions on newsgroup distribution are necessary or desirable!

Additional comments on preventing "loops" can be found in section 5 of RFC-1036, in the discussion of the News Propagation Algorithm. Please note that only two methods of loop prevention are included in this standard:

1) The history mechanism. Each host keeps track of all messages it has seen (by their Message-ID) and whenever a message comes in that it has already seen, the incoming message is discarded immediately.

2) Not sending a message to a system listed in the "Path" line of the header, or to the system that originated the message (which, in practice, should be listed in the Path line).

No other methods of dupe loop prevention are acceptable. In particular, checksums of portions of the message header or message itself are NOT permitted to be used for loop prevention, except perhaps as a method to quickly identify POTENTIAL duplicate messages before doing a full string comparison with the Message-ID data in the history file.
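
The two permitted checks described above might be sketched as follows. This is an illustration under stated assumptions, not a required implementation: the function names are hypothetical, the history is held in an in-memory Python set (a real system would keep it in a file), system names in the Path line are assumed to be separated only by "!", and system names are compared without regard to case.

```python
def parse_path(path_value):
    """Split the value of a Path line into system names (assumes "!" only)."""
    return [name for name in path_value.strip().split("!") if name]


def in_path(path_value, system_name):
    """True if system_name already appears in the Path line."""
    wanted = system_name.lower()
    return any(name.lower() == wanted for name in parse_path(path_value))


def accept(message_id, path_value, my_name, history):
    """Decide whether an incoming message should be kept.

    Method 1: discard if the Message-ID is already in the history.
    Path check: discard if our own name is already listed in the Path.
    A kept message has its Message-ID added to the history.
    """
    if message_id in history:
        return False
    if in_path(path_value, my_name):
        return False
    history.add(message_id)
    return True


def should_send_to(path_value, neighbor_name):
    """Method 2: do not send a message to a system listed in its Path."""
    return not in_path(path_value, neighbor_name)
```

Note that the Message-ID is compared as a full string; no checksum is involved at any point.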
In no case should a checksum be used as the SOLE method of determining whether a message is a duplicate.

When newsgroup messages are created for transmission to other systems, or when received messages are transmitted to other systems, the individual messages must conform to the specifications of section 2 of RFC-1036. However, in order to simplify programming of software designed to handle such messages, the following modifications to the standard are proposed for use within FidoNet. Please note that these are slightly more restrictive than the standard permitted by RFC-1036:

a. The "old format" or "A format" described in section 2 of RFC-1036 is NOT supported in FidoNet. Only the format detailed in RFC-1036 (sometimes referred to as the "B" News format) is supported. The vast majority of UseNet sites currently use the "B" News format.

b. The UseNet standard permits the use of "white space" to separate certain items in the message header, with "white space" defined as blanks or tabs. It also states that "the Internet convention of continuation header lines (beginning with a blank or tab) is allowed." However, it should NOT be ASSUMED that "continuation header lines" will be used in any message. It is suggested that when creating newsgroup messages for transmission to other systems, the use of tab characters be avoided in header lines, and that "continuation header lines" NOT be used, even if this means that a header line will be considerably longer than the length of a screen line. Software that creates FidoNet-format messages (for display to BBS callers) from batched newsgroup files (that is, newsgroup message tossers) should break up such extra-long header lines, using a single space character ONLY (NOT a tab!) at the start of "continuation header lines."
Since batched newsgroup files received from a UseNet site may contain "continuation header lines" and/or tabs as "white space" in header lines, it is necessary to be able to decode such header lines properly, but it is strongly suggested that FidoNet software not CREATE messages with tabs or "continuation header lines" for transmission through the network.

c. All lines in news messages, including header lines, shall be terminated with a LINE FEED (ASCII 10 decimal) ONLY. Under NO circumstances shall a CARRIAGE RETURN (ASCII 13 decimal) appear in news messages transmitted through FidoNet (if a Carriage Return is found in an in-transit message it MAY be changed to a Line Feed, this being the sole exception to the rule about not changing the body of a message, but the expectation is that no Carriage Returns will appear in a news message). Also, spaces appearing at the end of lines (just prior to the Line Feed character) are strongly discouraged since they convey no useful information. Finally, there should be only a single line feed at the end of each message (blank lines following the last line of a message are not allowed, again because they convey no useful information). Please note that the use of the Line Feed as a line terminator is fairly standard throughout UseNet, and when a news message is converted to a FidoNet format message it is a simple matter to replace Line Feeds with Carriage Returns so that the message will display properly.

d. When constructing or adding to "Path" lines, RFC-1036 (section 2.1.6) states that "The names may be separated by any punctuation character or characters (except '.' which is considered part of the hostname)." However, in actual practice, only the "!" (exclamation point or "bang" character) is commonly used to separate names. Therefore, the "!" character will be considered the "standard" separator for system names in Path lines in messages generated in FidoNet.
Also, RFC-1036 states that "Normally, the rightmost name will be the name of the originating system. However, it is also permissible to include an extra entry on the right, which is the name of the sender. This is for upward compatibility with older systems." In actual practice, it appears that most Path lines originating in UseNet have a user name as the rightmost entry. Therefore, when a Path line is created for a message originating in FidoNet, it is suggested that the following format be used (assuming a message entered by user John Smith at node 1:123/456):

   Path: f456.n123.z1.fidonet.org!john.smith

When a user name is placed in the path, all spaces in the user name must be replaced with periods, and all uppercase characters in the name should be converted to lowercase. It is permissible to use an alias in place of a user's real name if the originating system runs software that will recognize that alias in incoming netmail messages, and remap such messages to the proper user if necessary. Also, note the restrictions on prohibited characters in the user name as specified in RFC-1036 section 2.1.1. Although section 2.1.1 deals with the "From" line, common sense would indicate that these same restrictions on prohibited characters should apply if the user name is placed in the Path line (with the obvious exception of the use of the period to replace spaces in the user name, which is required).

e. Header lines defined as "optional" may be more or less optional depending on the keyword. For example, the "Reply-To" and "Followup-To" lines should be automatically honored, if at all possible, when reply messages are created, and the "References" line, even though listed as an "optional" line, is "required for all follow-up messages" (replies).
On the other hand, lines such as "Control" and "Distribution" may have little meaning to FidoNet nodes (in particular, "Distribution" is meant to control distribution of a message along hierarchical lines, but since FidoNet topology has little relation to UseNet hierarchies, it is probably best to just ignore "Distribution" lines on in-transit messages).

Additional specifications for messages, including required and optional header lines, are detailed in section 2 of RFC-1036.

When a newsgroup is moderated, it is the responsibility of the sysop of each participating BBS to prevent users from entering messages in that area (unless the message exporting software is capable of sending any locally-entered messages to the conference moderator via MAIL). However, if a software newsgroup processor is written that both imports (tosses) messages to a FidoNet-format message base, and exports locally entered messages, and if the software does not have a way to send replies to the moderator via mail, then some mechanism must be provided to prevent the export of messages from a moderated area, so that in the unlikely event that there is no easy way to prevent users from posting messages in the moderated area, such messages will still not be sent out.

Since this standard does not deal with the transport of UseNet MAIL within FidoNet, the method for transmission of replies in moderated newsgroups is undefined by this document. However, software authors are encouraged to provide some mechanism for private mail replies to newsgroup messages, in both moderated and unmoderated areas. Note that if a moderated newsgroup is carried on a system, it is the responsibility of the sysop to provide mail access to users so that replies can be (manually) sent to the conference moderator, especially if replies in the newsgroup area cannot be automatically routed to the conference moderator.

One point that needs to be emphasized is that there is NO message length limit on UseNet messages.
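
Returning briefly to item d above: the Path line suggested there for a locally entered message could be built along these lines. This is only a sketch with hypothetical function names. It applies the two stated rules (spaces become periods, uppercase becomes lowercase) and does NOT attempt to remove the characters prohibited by RFC-1036 section 2.1.1, which real software would also have to handle.

```python
def fidonet_host(zone, net, node):
    """Host name for a FidoNet address, e.g. 1:123/456 -> f456.n123.z1.fidonet.org."""
    return "f{}.n{}.z{}.fidonet.org".format(node, net, zone)


def path_user(user_name):
    """User name as placed in the Path line: lowercase, spaces replaced by periods."""
    return user_name.strip().lower().replace(" ", ".")


def make_path_line(zone, net, node, user_name):
    """Build the Path line for a message that originates on this node."""
    return "Path: {}!{}".format(fidonet_host(zone, net, node), path_user(user_name))
```

For user John Smith at node 1:123/456, make_path_line(1, 123, 456, "John Smith") gives the line shown in item d.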
If a FidoNet node passes newsgroup messages to, or on behalf of other FidoNet nodes, it is NOT permissible to discard or truncate messages that exceed a preset length limit.

Note that in a batched newsgroup file, each message is preceded by a header of the form "#! rnews <length in bytes>". Since the message text length is never changed in processing, it is possible to determine the length of a message after processing by reading in all the header lines, calculating the combined length of the header lines prior to making changes in the header (e.g. the Path line), then calculating the combined length of the header lines after making changes. The difference between the original and the new length of the header lines can then be applied to the value given in the "#! rnews" line to determine the new message length, which is then used in the "#! rnews" header of the modified message.

Also, the number of bytes given in the "#! rnews" line, MINUS the length of the message header lines, is the length of the body of the message. Once this length is known, the body of the message can be copied from the input file to the output file(s) in "chunks" small enough to fit in memory, until the end of the message is reached.

The following comments are implementation suggestions applicable to current FidoNet-compatible BBS systems, though not necessarily to software that may be written in the future:

It should be noted that when a BBS user enters a reply message, most FidoNet BBS software will "link" the reply message to the original by placing the message number of the original message in the message header (this is almost always the case if messages are stored in the "*.msg" format, in which case the number of the message being replied to is found at bytes 185-186 in the message header). If the appropriate header lines have been stored in the text of the original message, it is possible to construct a reply message that meets all RFC-1036 specifications.
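
The "#! rnews" length arithmetic and chunked body copy described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names and the 4096-byte chunk size are arbitrary choices, each message is assumed to have its header separated from its body by a blank line, and the only header change made is placing the local system name at the front of the Path line.

```python
import io

MARK = b"#! rnews "


def read_header(stream, limit):
    """Read header lines, up to and including the blank separator line."""
    lines, used = [], 0
    while used < limit:
        line = stream.readline()
        if not line:
            break
        lines.append(line)
        used += len(line)
        if line == b"\n":
            break
    return lines


def forward_batch(src, dst, my_name, chunk_size=4096):
    """Copy a batched newsgroup file, adding my_name to every Path line."""
    while True:
        marker = src.readline()
        if not marker:
            break  # end of batch
        if not marker.startswith(MARK):
            raise ValueError("expected a '#! rnews' line")
        old_total = int(marker[len(MARK):].strip())
        old_lines = read_header(src, old_total)
        old_header_len = sum(len(line) for line in old_lines)
        new_lines = [
            b"Path: " + my_name + b"!" + line[5:].lstrip(b" \t")
            if line[:5].lower() == b"path:" else line
            for line in old_lines
        ]
        new_header_len = sum(len(line) for line in new_lines)
        # New count = old count + (new header length - old header length).
        new_total = old_total + (new_header_len - old_header_len)
        # Body length = old count - old header length; the body is untouched.
        body_left = old_total - old_header_len
        dst.write(MARK + str(new_total).encode("ascii") + b"\n")
        dst.writelines(new_lines)
        while body_left > 0:
            chunk = src.read(min(chunk_size, body_left))
            if not chunk:
                raise ValueError("batch file is truncated")
            dst.write(chunk)
            body_left -= len(chunk)


if __name__ == "__main__":
    message = b"Path: eagle!jerry\nSubject: test\n\nBody text.\n"
    batch = MARK + str(len(message)).encode("ascii") + b"\n" + message
    out = io.BytesIO()
    forward_batch(io.BytesIO(batch), out, b"f8.n154.z1.fidonet.org")
    print(out.getvalue().decode("ascii"))
```

Because the copy is driven by byte counts rather than by scanning the text, a body line that happens to begin with "#! rnews" does no harm, and a message of any length can be passed along without being held in memory.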
For example, a "References" line can be constructed from the "Message-ID" line (and the "References" line, if any) of the original message. Similarly, if the original message contains a "Followup-To:" line, the reply can be posted to the newsgroup(s) specified in that line. This may not work as expected if a message renumbering program or similar program messes with the message base before the reply message is exported, so it is highly recommended that locally-entered newsgroup messages be exported as soon as practicable after they are entered.

Since the user of a BBS may reply to a message entered by another user of the same BBS, it is recommended that when a message is exported, any UseNet format header lines created for the exported message also be written back to the original message if possible. This will permit reply linking to remain intact even if two or more users of the same BBS participate in the same message thread.

If a message is received that specifies more than one newsgroup in the "Newsgroups" header line, and corresponding message areas are available on the local system, one copy of the message should be placed in each such area. For example, if the message is posted to four different newsgroups, and two of those groups are carried on the local BBS, then a copy of the message should be placed in the message base for each of those groups. If users of a BBS are allowed to post a message to multiple newsgroups, then any message thus posted should be copied to the message bases of any of the other areas that are also carried on that system (and that the message was posted to) at the time the message is exported.

Corrections and Additions to this document:

Proposed corrections and additions to this document should be submitted to Jack Decker at 1:154/8, or jack.decker@f8.n154.z1.fidonet.org
</PRE>
<HR>
<PRE>
Network Working Group                                          M. Horton
Request for Comments: 1036                        AT&T Bell Laboratories
Obsoletes: RFC-850                                              R. Adams
                                              Center for Seismic Studies
                                                           December 1987

              Standard for Interchange of USENET Messages

STATUS OF THIS MEMO

This document defines the standard format for the interchange of network News messages among USENET hosts. It updates and replaces RFC-850, reflecting version B2.11 of the News program. This memo is distributed as an RFC to make this information easily accessible to the Internet community. It does not specify an Internet standard. Distribution of this memo is unlimited.

1. Introduction

This document defines the standard format for the interchange of network News messages among USENET hosts. It describes the format for messages themselves and gives partial standards for transmission of news. The news transmission is not entirely standardized in order to give a good deal of flexibility to the hosts to choose transmission hardware and software, to batch news, and so on.

There are five sections to this document. Section two defines the format. Section three defines the valid control messages. Section four specifies some valid transmission methods. Section five describes the overall news propagation algorithm.

2. Message Format

The primary consideration in choosing a message format is that it fit in with existing tools as well as possible. Existing tools include implementations of both mail and news. (The notesfiles system from the University of Illinois is considered a news implementation.) A standard format for mail messages has existed for many years on the Internet, and this format meets most of the needs of USENET. Since the Internet format is extensible, extensions to meet the additional needs of USENET are easily made within the Internet standard. Therefore, the rule is adopted that all USENET news messages must be formatted as valid Internet mail messages, according to the Internet standard RFC-822.
   The USENET News standard is more restrictive than the Internet
   standard, placing additional requirements on each message and
   forbidding use of certain Internet features. However, it should
   always be possible to use a tool expecting an Internet message to
   process a news message. In any situation where this standard
   conflicts with the Internet standard, RFC-822 should be considered
   correct and this standard in error.

   Here is an example USENET message to illustrate the fields.

      From: jerry@eagle.ATT.COM (Jerry Schwarz)
      Path: cbosgd!mhuxj!mhuxt!eagle!jerry
      Newsgroups: news.announce
      Subject: Usenet Etiquette -- Please Read
      Message-ID: <642@eagle.ATT.COM>
      Date: Fri, 19 Nov 82 16:14:55 GMT
      Followup-To: news.misc
      Expires: Sat, 1 Jan 83 00:00:00 -0500
      Organization: AT&T Bell Laboratories, Murray Hill

      The body of the message comes here, after a blank line.

   Here is an example of a message in the old format (before the
   existence of this standard). It is recommended that implementations
   also accept messages in this format to ease upward conversion.

      From: cbosgd!mhuxj!mhuxt!eagle!jerry (Jerry Schwarz)
      Newsgroups: news.misc
      Title: Usenet Etiquette -- Please Read
      Article-I.D.: eagle.642
      Posted: Fri Nov 19 16:14:55 1982
      Received: Fri Nov 19 16:59:30 1982
      Expires: Mon Jan 1 00:00:00 1990

      The body of the message comes here, after a blank line.

   Some news systems transmit news in the A format, which looks like
   this:

      Aeagle.642
      news.misc
      cbosgd!mhuxj!mhuxt!eagle!jerry
      Fri Nov 19 16:14:55 1982
      Usenet Etiquette - Please Read
      The body of the message comes here, with no blank line.

   A standard USENET message consists of several header lines, followed
   by a blank line, followed by the body of the message. Each header
   line consists of a keyword, a colon, a blank, and some additional
   information.
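The layout just described (header lines, a blank line, then the body) can be illustrated with a short sketch. This is only an illustration, not part of either standard: the function name parse_article is hypothetical, and continuation header lines are deliberately not handled.

```python
def parse_article(text: str) -> tuple[dict[str, str], str]:
    """Split a news article into a header dict and a body string.

    Hypothetical helper for illustration; continuation lines are ignored.
    """
    # The first blank line separates the headers from the body.
    head, _, body = text.partition("\n\n")
    headers = {}
    for line in head.split("\n"):
        # Each header line is a keyword, a colon, a blank, and a value.
        keyword, sep, value = line.partition(": ")
        if not sep:
            raise ValueError("malformed header line: " + repr(line))
        headers[keyword] = value
    return headers, body


sample = (
    "From: jerry@eagle.ATT.COM (Jerry Schwarz)\n"
    "Path: cbosgd!mhuxj!mhuxt!eagle!jerry\n"
    "Newsgroups: news.announce\n"
    "Subject: Usenet Etiquette -- Please Read\n"
    "Message-ID: <642@eagle.ATT.COM>\n"
    "Date: Fri, 19 Nov 82 16:14:55 GMT\n"
    "\n"
    "The body of the message comes here, after a blank line.\n"
)
headers, body = parse_article(sample)
print(headers["Newsgroups"])
```

A dict keeps the sketch short; a real implementation would also need to preserve header order and pass unrecognized headers through unchanged.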
This is a subset of the Internet standard, simplified to allow simpler software to handle it. The "From" line may optionally include a full name, in the format above, or use the Internet angle bracket syntax. To keep the implementations simple, other formats (for example, with part of the machine address after the close parenthesis) are not allowed. The Internet convention of continuation header lines (beginning with a blank or tab) is allowed. Certain headers are required, and certain other headers are optional. Any unrecognized headers are allowed, and will be passed through unchanged. The required header lines are "From", "Date", "Newsgroups", "Subject", "Message-ID", and "Path". The optional header lines are "Followup-To", "Expires", "Reply-To", "Sender", "References", "Control", "Distribution", "Keywords", "Summary", "Approved", "Lines", "Xref", and "Organization". Each of these header lines will be described below. 2.1. Required Header lines 2.1.1. From The "From" line contains the electronic mailing address of the person who sent the message, in the Internet syntax. It may optionally also contain the full name of the person, in parentheses, after the electronic address. The electronic address is the same as the entity responsible for originating the message, unless the "Sender" header is present, in which case the "From" header might not be verified. Note that in all host and domain names, upper and lower case are considered the same, thus "mark@cbosgd.ATT.COM", "mark@cbosgd.att.com", and "mark@CBosgD.ATt.COm" are all equivalent. User names may or may not be case sensitive, for example, "Billy@cbosgd.ATT.COM" might be different from "BillY@cbosgd.ATT.COM". Programs should avoid changing the case of electronic addresses when forwarding news or mail. RFC-822 specifies that all text in parentheses is to be interpreted as a comment. It is common in Internet mail to place the full name of the user in a comment at the end of the "From" line. 
   This standard specifies a more rigid syntax. The full name is not
   considered a comment, but an optional part of the header line.
   Either the full name is omitted, or it appears in parentheses after
   the electronic address of the person posting the message, or it
   appears before an electronic address which is enclosed in angle
   brackets. Thus, the three permissible forms are:

      From: mark@cbosgd.ATT.COM
      From: mark@cbosgd.ATT.COM (Mark Horton)
      From: Mark Horton <mark@cbosgd.ATT.COM>

   Full names may contain any printing ASCII characters from space
   through tilde, except that they may not contain "(" (left
   parenthesis), ")" (right parenthesis), "<" (left angle bracket), or
   ">" (right angle bracket). Additional restrictions may be placed on
   full names by the mail standard, in particular, the characters ","
   (comma), ":" (colon), "@" (at), "!" (bang), "/" (slash), "="
   (equal), and ";" (semicolon) are inadvisable in full names.

2.1.2. Date

   The "Date" line (formerly "Posted") is the date that the message was
   originally posted to the network. Its format must be acceptable
   both in RFC-822 and to the getdate(3) routine that is provided with
   the Usenet software. This date remains unchanged as the message is
   propagated throughout the network. One format that is acceptable to
   both is:

      Wdy, DD Mon YY HH:MM:SS TIMEZONE

   Several examples of valid dates appear in the sample message above.
   Note in particular that ctime(3) format:

      Wdy Mon DD HH:MM:SS YYYY

   is not acceptable because it is not a valid RFC-822 date. However,
   since older software still generates this format, news
   implementations are encouraged to accept this format and translate
   it into an acceptable format.

   There is no hope of having a complete list of timezones. Universal
   Time (GMT), the North American timezones (PST, PDT, MST, MDT, CST,
   CDT, EST, EDT) and the +/-hhmm offset specified in RFC-822 should be
   supported.
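As a rough illustration of accepting both date styles, the sketch below uses Python's standard library. The function name to_gmt is hypothetical, and treating a ctime(3) style date as GMT is an assumption made here, since that format carries no zone information.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

CTIME_FORMAT = "%a %b %d %H:%M:%S %Y"   # e.g. Fri Nov 19 16:14:55 1982


def to_gmt(text: str) -> datetime:
    """Parse a Date value and return an aware datetime in GMT (UTC)."""
    try:
        # Older ctime(3) style. It has no zone, so GMT is ASSUMED here.
        parsed = datetime.strptime(text, CTIME_FORMAT)
    except ValueError:
        # RFC-822 style, including zone names and +/-hhmm offsets.
        parsed = parsedate_to_datetime(text)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


print(to_gmt("Fri, 19 Nov 82 16:14:55 GMT").isoformat())
print(to_gmt("Fri Nov 19 16:14:55 1982").isoformat())
```

The two-digit year is resolved by the library's own windowing rule, which is a property of email.utils rather than of this standard.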
   It is recommended that times in message headers be transmitted in
   GMT and displayed in the local time zone.

2.1.3. Newsgroups

   The "Newsgroups" line specifies the newsgroup or newsgroups in which
   the message belongs. Multiple newsgroups may be specified,
   separated by a comma. Newsgroups specified must all be the names of
   existing newsgroups, as no new newsgroups will be created by simply
   posting to them.

   Wildcards (e.g., the word "all") are never allowed in a
   "Newsgroups" line. For example, a newsgroup comp.all is illegal,
   although a newsgroup rec.sport.football is permitted.

   If a message is received with a "Newsgroups" line listing some valid
   newsgroups and some invalid newsgroups, a host should not remove
   invalid newsgroups from the list. Instead, the invalid newsgroups
   should be ignored. For example, suppose host A subscribes to the
   classes btl.all and comp.all, and exchanges news messages with host
   B, which subscribes to comp.all but not btl.all. Suppose A receives
   a message with Newsgroups: comp.unix,btl.general. This message is
   passed on to B because B receives comp.unix, but B does not receive
   btl.general. A must leave the "Newsgroups" line unchanged. If it
   were to remove btl.general, the edited header could eventually
   re-enter the btl.all class, resulting in a message that is not shown
   to users subscribing to btl.general. Also, follow-ups from outside
   btl.all would not be shown to such users.

2.1.4. Subject

   The "Subject" line (formerly "Title") tells what the message is
   about. It should be suggestive enough of the contents of the
   message to enable a reader to make a decision whether to read the
   message based on the subject alone. If the message is submitted in
   response to another message (e.g., is a follow-up) the default
   subject should begin with the four characters "Re:", and the
   "References" line is required. For follow-ups, the use of the
   "Summary" line is encouraged.

2.1.5.
Message-ID

   The "Message-ID" line gives the message a unique identifier. The
   Message-ID may not be reused during the lifetime of any previous
   message with the same Message-ID. (It is recommended that no
   Message-ID be reused for at least two years.) Message-ID's have the
   syntax:

      <string not containing blank or ">">

   In order to conform to RFC-822, the Message-ID must have the format:

      <unique@full_domain_name>

   where full_domain_name is the full name of the host at which the
   message entered the network, including a domain that host is in, and
   unique is any string of printing ASCII characters, not including "<"
   (left angle bracket), ">" (right angle bracket), or "@" (at sign).

   For example, the unique part could be an integer representing a
   sequence number for messages submitted to the network, or a short
   string derived from the date and time the message was created. For
   example, a valid Message-ID for a message submitted from host ucbvax
   in domain "Berkeley.EDU" would be "<4123@ucbvax.Berkeley.EDU>".
   Programmers are urged not to make assumptions about the content of
   Message-ID fields from other hosts, but to treat them as unknown
   character strings. It is not safe, for example, to assume that a
   Message-ID will be under 14 characters, that it is unique in the
   first 14 characters, nor that it does not contain a "/".

   The angle brackets are considered part of the Message-ID. Thus, in
   references to the Message-ID, such as the ihave/sendme and cancel
   control messages, the angle brackets are included. White space
   characters (e.g., blank and tab) are not allowed in a Message-ID.
   Slashes ("/") are strongly discouraged. All characters between the
   angle brackets must be printing ASCII characters.

2.1.6. Path

   This line shows the path the message took to reach the current
   system. When a system forwards the message, it should add its own
   name to the list of systems in the "Path" line.
The names may be separated by any punctuation character or characters (except "." which is considered part of the hostname). Thus, the following are valid entries: cbosgd!mhuxj!mhuxt cbosgd, mhuxj, mhuxt @cbosgd.ATT.COM,@mhuxj.ATT.COM,@mhuxt.ATT.COM teklabs, zehntel, sri-unix@cca!decvax (The latter path indicates a message that passed through decvax, cca, sri-unix, zehntel, and teklabs, in that order.) Additional names should be added from the left. For example, the most recently added name in the fourth example was teklabs. Letters, digits, periods and hyphens are considered part of host names; other punctuation, including blanks, are considered separators. Normally, the rightmost name will be the name of the originating system. However, it is also permissible to include an extra entry on the right, which is the name of the sender. This is for upward compatibility with older systems. The "Path" line is not used for replies, and should not be taken as a mailing address. It is intended to show the route the message traveled to reach the local host. There are several uses for this information. One is to monitor USENET routing for performance Horton & Adams [Page 6] RFC 1036 Standard for USENET Messages December 1987 reasons. Another is to establish a path to reach new hosts. Perhaps the most important use is to cut down on redundant USENET traffic by failing to forward a message to a host that is known to have already received it. In particular, when host A sends a message to host B, the "Path" line includes A, so that host B will not immediately send the message back to host A. The name each host uses to identify itself should be the same as the name by which its neighbors know it, in order to make this optimization possible. A host adds its own name to the front of a path when it receives a message from another host. 
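The Path conventions above can be sketched in a few lines. The helper names are hypothetical, and the choice of "!" as the separator when prepending is only one of the permitted forms. Host names are compared without regard to case, as section 2.1.1 says they should be.

```python
import re

# Letters, digits, periods and hyphens belong to a host name;
# everything else, including blanks, is a separator.
HOST_NAME = re.compile(r"[A-Za-z0-9.-]+")


def path_entries(path: str) -> list[str]:
    """Return the names on a Path line, most recently added first."""
    return HOST_NAME.findall(path)


def add_host(path: str, host: str) -> str:
    """Prepend the receiving host's own name to the front of the path."""
    return host + "!" + path


def already_seen(path: str, host: str) -> bool:
    """True if host appears on the path, so forwarding can be skipped."""
    return host.lower() in (name.lower() for name in path_entries(path))


print(path_entries("teklabs, zehntel, sri-unix@cca!decvax"))
print(add_host("A!X!Y!Z", "B"))
```

Because the rightmost entry may be a user name rather than a host, already_seen can give a false match when a user and a host share a name; a careful implementation would account for that.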
Thus, if a message with path "A!X!Y!Z" is passed from host A to host B, B will add its own name to the path when it receives the message from A, e.g., "B!A!X!Y!Z". If B then passes the message on to C, the message sent to C will contain the path "B!A!X!Y!Z", and when C receives it, C will change it to "C!B!A!X!Y!Z". Special upward compatibility note: Since the "From", "Sender", and "Reply-To" lines are in Internet format, and since many USENET hosts do not yet have mailers capable of understanding Internet format, it would break the reply capability to completely sever the connection between the "Path" header and the reply function. It is recognized that the path is not always a valid reply string in older implementations, and no requirement to fix this problem is placed on implementations. However, the existing convention of placing the host name and an "!" at the front of the path, and of starting the path with the host name, an "!", and the user name, should be maintained when possible. 2.2. Optional Headers 2.2.1. Reply-To This line has the same format as "From". If present, mailed replies to the author should be sent to the name given here. Otherwise, replies are mailed to the name on the "From" line. (This does not prevent additional copies from being sent to recipients named by the replier, or on "To" or "Cc" lines.) The full name may be optionally given, in parentheses, as in the "From" line. 2.2.2. Sender This field is present only if the submitter manually enters a "From" line. It is intended to record the entity responsible for submitting the message to the network. It should be verified by the software at the submitting host. 
Horton & Adams [Page 7] RFC 1036 Standard for USENET Messages December 1987 For example, if John Smith is visiting CCA and wishes to post a message to the network, using friend Sarah Jones' account, the message might read: From: smith@ucbvax.Berkeley.EDU (John Smith) Sender: jones@cca.COM (Sarah Jones) If a gateway program enters a mail message into the network at host unix.SRI.COM, the lines might read: From: John.Doe@A.CS.CMU.EDU Sender: network@unix.SRI.COM The primary purpose of this field is to be able to track down messages to determine how they were entered into the network. The full name may be optionally given, in parentheses, as in the "From" line. 2.2.3. Followup-To This line has the same format as "Newsgroups". If present, follow- up messages are to be posted to the newsgroup or newsgroups listed here. If this line is not present, follow-ups are posted to the newsgroup or newsgroups listed in the "Newsgroups" line. If the keyword poster is present, follow-up messages are not permitted. The message should be mailed to the submitter of the message via mail. 2.2.4. Expires This line, if present, is in a legal USENET date format. It specifies a suggested expiration date for the message. If not present, the local default expiration date is used. This field is intended to be used to clean up messages with a limited usefulness, or to keep important messages around for longer than usual. For example, a message announcing an upcoming seminar could have an expiration date the day after the seminar, since the message is not useful after the seminar is over. Since local hosts have local policies for expiration of news (depending on available disk space, for instance), users are discouraged from providing expiration dates for messages unless there is a natural expiration date associated with the topic. System software should almost never provide a default "Expires" line. Leave it out and allow local policies to be used unless there is a good reason not to. 
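The follow-up conventions can be combined into one small sketch: the Followup-To rule of section 2.2.3, the "Re:" rule for subjects, and the construction of a "References" line from the original "Message-ID" and any existing "References" line. The function name followup_headers and the use of a "To" key for the mailed case are assumptions made for illustration only.

```python
def followup_headers(original: dict[str, str]) -> dict[str, str]:
    """Build default headers for a follow-up to `original` (a sketch)."""
    subject = original["Subject"]
    if subject[:3] not in ("Re:", "re:"):
        subject = "Re: " + subject

    # Old References line (if any), a blank, then the original Message-ID.
    references = original["Message-ID"]
    if "References" in original:
        references = original["References"] + " " + references

    reply = {"Subject": subject, "References": references}

    # Followup-To overrides Newsgroups when it is present.
    target = original.get("Followup-To") or original["Newsgroups"]
    groups = [g.strip() for g in target.split(",")]
    if "poster" in groups:
        # Follow-ups are not posted; the reply goes to the author by mail.
        # The "To" key is a hypothetical marker for that case.
        reply["To"] = original.get("Reply-To") or original["From"]
    else:
        reply["Newsgroups"] = ",".join(groups)
    return reply


example = {
    "From": "jerry@eagle.ATT.COM (Jerry Schwarz)",
    "Newsgroups": "news.announce",
    "Subject": "Usenet Etiquette -- Please Read",
    "Message-ID": "<642@eagle.ATT.COM>",
    "Followup-To": "news.misc",
}
print(followup_headers(example))
```

On a BBS, writing these generated header lines back to the stored message is what keeps reply linking intact when another local user answers the same message.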
Horton & Adams [Page 8] RFC 1036 Standard for USENET Messages December 1987 2.2.5. References This field lists the Message-ID's of any messages prompting the submission of this message. It is required for all follow-up messages, and forbidden when a new subject is raised. Implementations should provide a follow-up command, which allows a user to post a follow-up message. This command should generate a "Subject" line which is the same as the original message, except that if the original subject does not begin with "Re:" or "re:", the four characters "Re:" are inserted before the subject. If there is no "References" line on the original header, the "References" line should contain the Message-ID of the original message (including the angle brackets). If the original message does have a "References" line, the follow-up message should have a "References" line containing the text of the original "References" line, a blank, and the Message-ID of the original message. The purpose of the "References" header is to allow messages to be grouped into conversations by the user interface program. This allows conversations within a newsgroup to be kept together, and potentially users might shut off entire conversations without unsubscribing to a newsgroup. User interfaces need not make use of this header, but all automatically generated follow-ups should generate the "References" line for the benefit of systems that do use it, and manually generated follow-ups (e.g., typed in well after the original message has been printed by the machine) should be encouraged to include them as well. It is permissible to not include the entire previous "References" line if it is too long. An attempt should be made to include a reasonable number of backwards references. 2.2.6. Control If a message contains a "Control" line, the message is a control message. Control messages are used for communication among USENET host machines, not to be read by users. 
   Control messages are distributed by the same newsgroup mechanism as
   ordinary messages. The body of the "Control" header line is the
   message to the host.

   For upward compatibility, messages that match the newsgroup pattern
   "all.all.ctl" should also be interpreted as control messages. If no
   "Control" header is present on such messages, the subject is used as
   the control message. However, messages on newsgroups matching this
   pattern do not conform to this standard.

   Also for upward compatibility, if the first 4 characters of the
   "Subject:" line are "cmsg", the rest of the "Subject:" line should
   be interpreted as a control message.

2.2.7. Distribution

   This line is used to alter the distribution scope of the message.
   It is a comma separated list similar to the "Newsgroups" line. User
   subscriptions are still controlled by "Newsgroups", but the message
   is sent to all systems subscribing to the newsgroups on the
   "Distribution" line in addition to the "Newsgroups" line. For the
   message to be transmitted, the receiving site must normally receive
   one of the specified newsgroups AND must receive one of the
   specified distributions. Thus, a message concerning a car for sale
   in New Jersey might have headers including:

      Newsgroups: rec.auto,misc.forsale
      Distribution: nj,ny

   so that it would only go to persons subscribing to rec.auto or
   misc.forsale within New Jersey or New York. The intent of this
   header is to restrict the distribution of a newsgroup further, not
   to increase it. A local newsgroup, such as nj.crazy-eddie, will
   probably not be propagated by hosts outside New Jersey that do not
   show such a newsgroup as valid. A follow-up message should default
   to the same "Distribution" line as the original message, but the
   user can change it to a more limited one, or escalate the
   distribution if it was originally restricted and a more widely
   distributed reply is appropriate.

2.2.8.
Organization The text of this line is a short phrase describing the organization to which the sender belongs, or to which the machine belongs. The intent of this line is to help identify the person posting the message, since host names are often cryptic enough to make it hard to recognize the organization by the electronic address. 2.2.9. Keywords A few well-selected keywords identifying the message should be on this line. This is used as an aid in determining if this message is interesting to the reader. 2.2.10. Summary This line should contain a brief summary of the message. It is usually used as part of a follow-up to another message. Again, it Horton & Adams [Page 10] RFC 1036 Standard for USENET Messages December 1987 is very useful to the reader in determining whether to read the message. 2.2.11. Approved This line is required for any message posted to a moderated newsgroup. It should be added by the moderator and consist of his mail address. It is also required with certain control messages. 2.2.12. Lines This contains a count of the number of lines in the body of the message. 2.2.13. Xref This line contains the name of the host (with domains omitted) and a white space separated list of colon-separated pairs of newsgroup names and message numbers. These are the newsgroups listed in the "Newsgroups" line and the corresponding message numbers from the spool directory. This is only of value to the local system, so it should not be transmitted. 
   For example, in:

      Path: seismo!lll-crg!lll-lcc!pyramid!decwrl!reid
      From: reid@decwrl.DEC.COM (Brian Reid)
      Newsgroups: news.lists,news.groups
      Subject: USENET READERSHIP SUMMARY REPORT FOR SEP 86
      Message-ID: <5658@decwrl.DEC.COM>
      Date: 1 Oct 86 11:26:15 GMT
      Organization: DEC Western Research Laboratory
      Lines: 441
      Approved: reid@decwrl.UUCP
      Xref: seismo news.lists:461 news.groups:6378

   the "Xref" line shows that the message is message number 461 in the
   newsgroup news.lists, and message number 6378 in the newsgroup
   news.groups, on host seismo. This information may be used by
   certain user interfaces.

3. Control Messages

   This section lists the control messages currently defined. The body
   of the "Control" header line is the control message. Messages are a
   sequence of zero or more words, separated by white space (blanks or
   tabs). The first word is the name of the control message, remaining
   words are parameters to the message. The remainder of the header
   and the body of the message are also potential parameters; for
   example, the "From" line might suggest an address to which a
   response is to be mailed.

   Implementors and administrators may choose to allow control messages
   to be carried out automatically, or to queue them for manual
   processing. However, manually processed messages should be dealt
   with promptly.

   Failed control messages should NOT be mailed to the originator of
   the message, but to the local "usenet" account.

3.1. Cancel

      cancel <Message-ID>

   If a message with the given Message-ID is present on the local
   system, the message is cancelled. This mechanism allows a user to
   cancel a message after the message has been distributed over the
   network.

   If the system is unable to cancel the message as requested, it
   should not forward the cancellation request to its neighbor systems.

   Only the author of the message or the local news administrator is
   allowed to send this message.
The verified sender of a message is the "Sender" line, or if no "Sender" line is present, the "From" line. The verified sender of the cancel message must be the same as either the "Sender" or "From" field of the original message. A verified sender in the cancel message is allowed to match an unverified "From" in the original message. 3.2. Ihave/Sendme ihave <Message-ID list> [<remotesys>] sendme <Message-ID list> [<remotesys>] This message is part of the ihave/sendme protocol, which allows one host (say A) to tell another host (B) that a particular message has been received on A. Suppose that host A receives message "<1234@ucbvax.Berkeley.edu>", and wishes to transmit the message to host B. A sends the control message "ihave <1234@ucbvax.Berkeley.edu> A" to host B (by posting it to newsgroup to.B). B responds with the control message "sendme <1234@ucbvax.Berkeley.edu> B" (on newsgroup to.A), if it has not already received the message. Upon receiving Horton & Adams [Page 12] RFC 1036 Standard for USENET Messages December 1987 the sendme message, A sends the message to B. This protocol can be used to cut down on redundant traffic between hosts. It is optional and should be used only if the particular situation makes it worthwhile. Frequently, the outcome is that, since most original messages are short, and since there is a high overhead to start sending a new message with UUCP, it costs as much to send the ihave as it would cost to send the message itself. One possible solution to this overhead problem is to batch requests. Several Message-ID's may be announced or requested in one message. If no Message-ID's are listed in the control message, the body of the message should be scanned for Message-ID's, one per line. 3.3. Newgroup newgroup <groupname> [moderated] This control message creates a new newsgroup with the given name. Since no messages may be posted or forwarded until a newsgroup is created, this message is required before a newsgroup can be used. 
The body of the message is expected to be a short paragraph describing the intended use of the newsgroup. If the second argument is present and it is the keyword moderated, the group should be created moderated instead of the default of unmoderated. The newgroup message should be ignored unless there is an "Approved" line in the same message header. 3.4. Rmgroup rmgroup <groupname> This message removes a newsgroup with the given name. Since the newsgroup is removed from every host on the network, this command should be used carefully by a responsible administrator. The rmgroup message should be ignored unless there is an "Approved:" line in the same message header. Horton & Adams [Page 13] RFC 1036 Standard for USENET Messages December 1987 3.5. Sendsys sendsys (no arguments) The sys file, listing all neighbors and the newsgroups to be sent to each neighbor, will be mailed to the author of the control message ("Reply-To", if present, otherwise "From"). This information is considered public information, and it is a requirement of membership in USENET that this information be provided on request, either automatically in response to this control message, or manually, by mailing the requested information to the author of the message. This information is used to keep the map of USENET up to date, and to determine where netnews is sent. The format of the file mailed back to the author should be the same as that of the sys file. This format has one line per neighboring host (plus one line for the local host), containing four colon separated fields. The first field has the host name of the neighbor, the second field has a newsgroup pattern describing the newsgroups sent to the neighbor. The third and fourth fields are not defined by this standard. The sys file is not the same as the UUCP L.sys file. 
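The sys file layout described above (colon separated fields, host name first, newsgroup patterns second) could be read with a sketch such as the following. The names SysEntry and parse_sys_line are hypothetical, and the third and fourth fields are kept uninterpreted because this standard does not define them.

```python
from typing import NamedTuple


class SysEntry(NamedTuple):
    host: str                 # first field: name of the neighbor
    patterns: list[str]       # second field: newsgroup patterns sent to it
    extra: tuple[str, ...]    # remaining fields, not defined by the standard


def parse_sys_line(line: str) -> SysEntry:
    """Parse one line of a sys file into its fields (a sketch)."""
    fields = line.strip().split(":")
    host = fields[0]
    raw_patterns = fields[1] if len(fields) > 1 else ""
    patterns = [p.strip() for p in raw_patterns.split(",") if p.strip()]
    return SysEntry(host, patterns, tuple(fields[2:]))


entry = parse_sys_line("ucbvax:world,comp,to.ucbvax:L:")
print(entry.host, entry.patterns)
```

The sketch assumes one entry per physical line; entries that have been wrapped across lines would need to be rejoined first.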
A sample response is: From: cbosgd!mark (Mark Horton) Date: Sun, 27 Mar 83 20:39:37 -0500 Subject: response to your sendsys request To: mark@cbosgd.ATT.COM Responding-System: cbosgd.ATT.COM cbosgd:osg,cb,btl,bell,world,comp,sci,rec,talk,misc,news,soc,to, test ucbvax:world,comp,to.ucbvax:L: cbosg:world,comp,bell,btl,cb,osg,to.cbosg:F:/usr/spool/outnews /cbosg cbosgb:osg,to.cbosgb:F:/usr/spool/outnews/cbosgb sescent:world,comp,bell,btl,cb,to.sescent:F:/usr/spool/outnews /sescent npois:world,comp,bell,btl,ug,to.npois:F:/usr/spool/outnews/npois mhuxi:world,comp,bell,btl,ug,to.mhuxi:F:/usr/spool/outnews/mhuxi 3.6. Version version (no arguments) The name and version of the software running on the local system is to be mailed back to the author of the message ("Reply-to" if present, otherwise "From"). 3.7. Checkgroups Horton & Adams [Page 14] RFC 1036 Standard for USENET Messages December 1987 The message body is a list of "official" newsgroups and their description, one group per line. They are compared against the list of active newsgroups on the current host. The names of any obsolete or new newsgroups are mailed to the user "usenet" and descriptions of the new newsgroups are added to the help file used when posting news. 4. Transmission Methods USENET is not a physical network, but rather a logical network resting on top of several existing physical networks. These networks include, but are not limited to, UUCP, the Internet, an Ethernet, the BLICN network, an NSC Hyperchannel, and a BERKNET. What is important is that two neighboring systems on USENET have some method to get a new message, in the format listed here, from one system to the other, and once on the receiving system, processed by the netnews software on that system. (On UNIX systems, this usually means the rnews program being run with the message on the standard input. 
<1>) It is not a requirement that USENET hosts have mail systems capable of understanding the Internet mail syntax, but it is strongly recommended. Since "From", "Reply-To", and "Sender" lines use the Internet syntax, replies will be difficult or impossible without an Internet mailer. A host without an Internet mailer can attempt to use the "Path" header line for replies, but this field is not guaranteed to be a working path for replies. In any event, any host generating or forwarding news messages must have an Internet address that allows them to receive mail from hosts with Internet mailers, and they must include their Internet address on their From line. 4.1. Remote Execution Some networks permit direct remote command execution. On these networks, news may be forwarded by spooling the rnews command with the message on the standard input. For example, if the remote system is called remote, news would be sent over a UUCP link with the command: uux - remote!rnews and on a Berknet: net -mremote rnews Horton & Adams [Page 15] RFC 1036 Standard for USENET Messages December 1987 It is important that the message be sent via a reliable mechanism, normally involving the possibility of spooling, rather than direct real-time remote execution. This is because, if the remote system is down, a direct execution command will fail, and the message will never be delivered. If the message is spooled, it will eventually be delivered when both systems are up. 4.2. Transfer by Mail On some systems, direct remote spooled execution is not possible. However, most systems support electronic mail, and a news message can be sent as mail. One approach is to send a mail message which is identical to the news message: the mail headers are the news headers, and the mail body is the news body. By convention, this mail is sent to the user newsmail on the remote machine. 
One problem with this method is that it may not be possible to convince the mail system that the "From" line of the message is valid, since the mail message was generated by a program on a system different from the source of the news message. Another problem is that error messages caused by the mail transmission would be sent to the originator of the news message, who has no control over news transmission between two cooperating hosts and does not know whom to contact. Transmission error messages should be directed to a responsible contact person on the sending machine. A solution to this problem is to encapsulate the news message into a mail message, such that the entire message (headers and body) are part of the body of the mail message. The convention here is that such mail is sent to user rnews on the remote system. A mail message body is generated by prepending the letter N to each line of the news message, and then attaching whatever mail headers are convenient to generate. The N's are attached to prevent any special lines in the news message from interfering with mail transmission, and to prevent any extra lines inserted by the mailer (headers, blank lines, etc.) from becoming part of the news message. A program on the receiving machine receives mail to rnews, extracting the message itself and invoking the rnews program. An example in this format might look like this: Horton & Adams [Page 16] RFC 1036 Standard for USENET Messages December 1987 Date: Mon, 3 Jan 83 08:33:47 MST From: news@cbosgd.ATT.COM Subject: network news message To: rnews@npois.ATT.COM NPath: cbosgd!mhuxj!harpo!utah-cs!sask!derek NFrom: derek@sask.UUCP (Derek Andrew) NNewsgroups: misc.test NSubject: necessary test NMessage-ID: <176@sask.UUCP> NDate: Mon, 3 Jan 83 00:59:15 MST N NThis really is a test. If anyone out there more than 6 Nhops away would kindly confirm this note I would Nappreciate it. We suspect that our news postings Nare not getting out into the world. 
      N

Using mail solves the spooling problem, since mail must always be
spooled if the destination host is down. However, it adds more
overhead to the transmission process (to encapsulate and extract the
message) and makes it harder for software to give different
priorities to news and mail.

4.3. Batching

Since news messages are usually short, and since a large number of
messages are often sent between two hosts in a day, it may make
sense to batch news messages. Several messages can be combined into
one large message, using conventions agreed upon in advance by the
two hosts. One such batching scheme is described here; its use is
highly recommended.

News messages are combined into a script, separated by a header of
the form:

      #! rnews 1234

where 1234 is the length of the message in bytes. Each such line is
followed by a message containing the given number of bytes. (The
newline at the end of each line of the message is counted as one
byte, for purposes of this count, even if it is stored as
<CARRIAGE RETURN><LINE FEED>.) For example, a batch of messages
might look like this:

      #! rnews 239
      From: jerry@eagle.ATT.COM (Jerry Schwarz)
      Path: cbosgd!mhuxj!mhuxt!eagle!jerry
      Newsgroups: news.announce
      Subject: Usenet Etiquette -- Please Read
      Message-ID: <642@eagle.ATT.COM>
      Date: Fri, 19 Nov 82 16:14:55 EST
      Approved: mark@cbosgd.ATT.COM

      Here is an important message about USENET Etiquette.
      #! rnews 234
      From: jerry@eagle.ATT.COM (Jerry Schwarz)
      Path: cbosgd!mhuxj!mhuxt!eagle!jerry
      Newsgroups: news.announce
      Subject: Notes on Etiquette message
      Message-ID: <643@eagle.ATT.COM>
      Date: Fri, 19 Nov 82 17:24:12 EST
      Approved: mark@cbosgd.ATT.COM

      There was something I forgot to mention in the last message.

Batched news is recognized because the first character in the
message is #. The message is then passed to the unbatcher for
interpretation.
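
As a rough illustration of the batching scheme described above (it
is not part of RFC 1036), the following Python sketch builds a batch
from a list of messages and splits it apart again. The names
make_batch and split_batch are hypothetical, and the sketch assumes
that every message already uses a single LF as its line ending, so
the byte count is simply the length of the message.

```python
def make_batch(messages):
    """Join messages (bytes) into one batch of '#! rnews N' sections."""
    # N is the message length in bytes; LF line endings are assumed,
    # so each newline counts as exactly one byte.
    return b"".join(b"#! rnews %d\n" % len(m) + m for m in messages)


def split_batch(batch):
    """Split a batch produced as above back into a list of messages."""
    messages = []
    pos = 0
    while pos != len(batch):
        # Each section starts with a header line: "#! rnews N".
        eol = batch.index(b"\n", pos)
        fields = batch[pos:eol].split()
        if len(fields) != 3 or fields[:2] != [b"#!", b"rnews"]:
            raise ValueError("not an rnews batch header")
        count = int(fields[2])
        # The next N bytes are the message, whatever they contain.
        start = eol + 1
        body = batch[start:start + count]
        if len(body) != count:
            raise ValueError("batch is truncated")
        messages.append(body)
        pos = start + count
    return messages
```

Because the sketch splits on the byte count rather than searching
for the next header, a line inside a message body that happens to
begin with "#! rnews" is not mistaken for a section header.
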
The second argument (in this example rnews) determines which
batching scheme is being used. Cooperating hosts may use whatever
scheme is appropriate for them.

5. The News Propagation Algorithm

This section describes the overall scheme of USENET and the
algorithm followed by hosts in propagating news to the entire
logical network. Since all hosts are affected by incorrectly
formatted messages and by propagation errors, it is important for
the method to be standardized.

USENET is a directed graph. Each node in the graph is a host
computer, and each arc in the graph is a transmission path from one
host to another host. Each arc is labeled with a newsgroup pattern,
specifying which newsgroup classes are forwarded along that link.
Most arcs are bidirectional, that is, if host A sends a class of
newsgroups to host B, then host B usually sends the same class of
newsgroups to host A. This bidirectionality is not, however,
required.

USENET is made up of many subnetworks. Each subnet has a name, such
as comp or btl. Each subnet is a connected graph, that is, a path
exists from every node to every other node in the subnet. In
addition, the entire graph is (theoretically) connected. (In
practice, some political considerations have caused some hosts to be
unable to post messages reaching the rest of the network.)

A message is posted on one machine to a list of newsgroups. That
machine accepts it locally, then forwards it to all its neighbors
that are interested in at least one of the newsgroups of the
message. (Site A deems host B to be "interested" in a newsgroup if
the newsgroup matches the pattern on the arc from A to B. This
pattern is stored in a file on the A machine.) The hosts receiving
the incoming message examine it to make sure they really want the
message, accept it locally, and then in turn forward the message to
all their interested neighbors.
This process continues until the entire network has seen the
message.

An important part of the algorithm is the prevention of loops. The
above process would cause a message to loop along a cycle forever.
In particular, when host A sends a message to host B, host B will
send it back to host A, which will send it to host B, and so on. One
solution to this is the history mechanism. Each host keeps track of
all messages it has seen (by their Message-ID) and whenever a
message comes in that it has already seen, the incoming message is
discarded immediately. This solution is sufficient to prevent loops,
but additional optimizations can be made to avoid sending messages
to hosts that will simply throw them away.

One optimization is that a message should never be sent to a machine
listed in the "Path" line of the header. When a machine name is in
the "Path" line, the message is known to have passed through the
machine. Another optimization is that, if the message originated on
host A, then host A has already seen the message.

Thus, if a message is posted to newsgroup misc.misc, it will match
the pattern misc.all (where all is a metasymbol that matches any
string), and will be forwarded to all hosts that subscribe to
misc.all (as determined by what their neighbors send them). These
hosts make up the misc subnetwork. A message posted to btl.general
will reach all hosts receiving btl.all, but will not reach hosts
that do not get btl.all. In effect, the message reaches the btl
subnetwork. A message posted to newsgroups misc.misc,btl.general
will reach all hosts subscribing to either of the two classes.

Notes

<1> UNIX is a registered trademark of AT&T.
</PRE>
<A HREF="index.htm"><IMG SRC="../images/b_arrow.gif" ALT="Back" Border="0">Go Back</A>
</BODY>
</HTML>