Docs: erpmanip

Introduction

Erpmanip performs arbitrary linear operations on event-related potentials (ERPs) in standard ERPSS data files. It uses an input command file, termed an erpmanip command file (ecf), to specify various operations on the data in an input file, the results of which are placed in an output file. Any linear combination of channels and/or bins can be generated using erpmanip (E1), including scaling by the number of sums.

Erpmanip can be used to calculate the following:

Erpmanip is easy to use, and the command files are standard text files; hence they can be easily generated using any editor.

Using Erpmanip

The actual use of erpmanip is simple; most of the work lies in the generation of the command file. Erpmanip is invoked thusly:

where commandfile is the name of a file containing data manipulation commands (see Command Files, below), inputfile is the name of the ERPSS data file containing the data to be processed, and outputfile is the name of a new file (to be created) which will contain the results of the operations. If commandfile is an isolated minus sign (-), the commands are read from the standard input. Erpmanip will not overwrite an already existing outputfile, thus providing some protection against inadvertent loss of data.

Currently, the following options are implemented:

Basic Operation of Erpmanip

Since an understanding of the internal workings of erpmanip can assist in writing correct command files as well as understanding and diagnosing problems, here follows a description of the basic operation of erpmanip.

The command files used by erpmanip can have two basic types of specifications: channel processing and bin processing. The channel processing specifications define each channel of the output file as a linear combination of input channels, while the bin processing specifications define each output bin as a linear combination of the input bins. Since the channel processing is performed on each input bin before it is used in any bin processing, the actual output data can be considered a composition of the channel processing with the bin processing. Frequently, one wishes to keep the same channels, or the same bins, in the output file as in the input file. This can be accomplished by a channel processing or bin processing section that maps each input channel to the corresponding output channel, or each input bin to the corresponding output bin. However, this is tedious, and erpmanip (E1) supplies either of these default mappings when the channel or bin processing specifications are omitted from the command file. In this case, the number of channels (bins) in the output file will be the same as in the input file. Note, however, that at least one of the channel or bin processing specifications must appear in the command file; otherwise there would be no point in performing the processing, since the input file would simply be copied to the output file.
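
The composition of channel processing with bin processing can be pictured with a small Python sketch. This is not erpmanip's implementation; the function names and the nested-list data layout (bins of channels of sample points) are assumptions made purely for illustration.

```python
# Illustrative sketch only; not erpmanip source code.
# Assumed layout: data[bin][channel] is a list of sample points.

def combine(rows, terms):
    """Return the scaled sum of equal-length rows: sum(scale * rows[index])."""
    npoints = len(rows[0])
    out = [0.0] * npoints
    for scale, index in terms:
        for t in range(npoints):
            out[t] += scale * rows[index][t]
    return out

def channel_process(bin_data, channel_spec):
    """Build every output channel of one bin from its input channels."""
    return [combine(bin_data, terms) for terms in channel_spec]

def bin_process(data, channel_spec, bin_spec):
    """Apply channel processing to each input bin, then combine the bins."""
    mixed = [channel_process(b, channel_spec) for b in data]
    nchans = len(channel_spec)
    output = []
    for terms in bin_spec:
        out_bin = []
        for c in range(nchans):
            rows = [mixed[b][c] for b in range(len(mixed))]
            out_bin.append(combine(rows, terms))
        output.append(out_bin)
    return output

# Two input bins, two channels, two sample points each (made-up values).
data = [
    [[1.0, 2.0], [10.0, 20.0]],   # input bin 0
    [[3.0, 5.0], [30.0, 50.0]],   # input bin 1
]
channel_spec = [[(1.0, 0), (-1.0, 1)]]   # output channel 0 = channel 0 - channel 1
bin_spec = [[(1.0, 1), (-1.0, 0)]]       # output bin 0 = bin 1 - bin 0
result = bin_process(data, channel_spec, bin_spec)
print(result)   # [[[-18.0, -27.0]]]
```

Omitting either specification corresponds to supplying an identity mapping for it in this picture.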

Upon invocation, erpmanip reads the command file and encodes it into an internal, easily executed form. During this process, syntactic validity as well as a few processing constraints are checked. If any errors are detected, a message describing the problem and the offending line in the command file is printed on the standard error. Note that erpmanip "bails out" at the point where an error is detected; hence, only one error will be reported when erpmanip is executed, even if a command file has multiple errors.

Then, erpmanip opens the input file and extracts various parameters from the headers therein, including the number of channels and bins. The applicability of the command file to these particular data is assessed and default channel or bin processing specifications are set up, if needed.

Next, erpmanip checks to see if the output file already exists. If it does, a message to that effect is issued, and erpmanip exits. Otherwise, the output file is created and pre-written with zeros. The pre-writing is performed to ensure that the final data can be written after what can be a lengthy processing period. Finally, the processing of the data begins, following the specifications contained in the command file. If any errors arise during the data processing phase of erpmanip (e.g. data overflow), a message containing a description of the problem and the associated line in the command file is printed, and the output file is removed. If all goes well, erpmanip exits normally with the results of the processing contained in the output file.
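
The output-file handling described above (refuse to overwrite, pre-write zeros, remove the file on error) might look roughly like this in Python. It is a sketch under stated assumptions, not the actual erpmanip code: the function write_output, its parameters, and the compute callable are hypothetical names introduced here.

```python
import os

def write_output(path, nbytes, compute):
    """Create path (never overwriting), pre-fill it with zeros, then fill it.

    compute is a hypothetical callable that returns the final bytes.
    """
    # Mode "xb" raises FileExistsError if the file already exists.
    with open(path, "xb") as f:
        f.write(bytes(nbytes))          # reserve the space up front
    try:
        result = compute()              # the (possibly lengthy) processing
        with open(path, "r+b") as f:
            f.write(result)
    except Exception:
        os.remove(path)                 # do not leave a partial output file
        raise
```

Reserving the space first means that a full disk is discovered before the lengthy processing rather than after it.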

Command Files

General Format

The operation of erpmanip is controlled by specifications in an erpmanip command file, or ecf. Command files are standard text files that can be generated using any text editor, and are interpreted as explained here.

Lines in the command file whose first "non-white" character is a "#" are considered comments, and are ignored, as are any blank lines in the file. Each line that is not a comment or blank is partitioned into tokens or words, which are separated from each other by spaces, tabs, or newlines (white space). If a token is encountered that begins with a "#", that token as well as all subsequent ones on that line are treated as comments, and are ignored. Thus, it is possible to include comments on lines containing actual processing specifications, if desired.
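
As a rough illustration of these rules, here is a minimal Python tokenizer. It is a sketch, not erpmanip's lexical analyzer. In particular, it assumes that white space inside double quotes does not split a token and that the quotes stay part of the token; how erpmanip itself treats a "#" inside quotes is not specified here.

```python
def tokenize(text):
    """Split command-file text into tokens, dropping comments and blank lines."""
    tokens = []
    for line in text.splitlines():
        token = ""
        in_quotes = False
        line_tokens = []
        for ch in line:
            if ch == '"':
                in_quotes = not in_quotes   # assumption: quotes protect white space
                token += ch
            elif ch in " \t" and not in_quotes:
                if token:
                    line_tokens.append(token)
                    token = ""
            else:
                token += ch
        if token:
            line_tokens.append(token)
        for tok in line_tokens:
            if tok.startswith("#"):
                break                       # rest of the line is a comment
            tokens.append(tok)
    return tokens

example = (
    'Bin 7 :    .5*15  -.5*31   # Note how\n'
    '\tbindesc="Lumped difference wave"   # order is not\n'
    '\n'
    '# a full-line comment\n'
)
print(tokenize(example))
```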

The lexical analysis and parsing of command files is not strictly line-oriented for erpmanip, but rather is dependent on the sequence of tokens. Although this flexibility is useful at times, it is recommended that one employ a style which produces easily read (by humans) command files. Tokens can be keywords (sometimes with associated argument information), integer numbers denoting channels or bin numbers, or processing terms. Keywords are identified by matching an initial substring of characters. A copy of the token is converted to lower case prior to identification, so that the case of the keywords is immaterial. For example, the tokens bin, Bin, and BIN would all be classified as the keyword bin. Beware, however, as binstuff, binary, and binomial would also be recognized as the keyword bin. In some cases, more than one initial substring can be used for the same token, e.g. cp and channel_processing are both recognized as keywords that introduce a set of channel processing specifications. The unique initial substrings that are used to identify each keyword are noted in the descriptions below.
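
Initial-substring matching can be sketched as follows. This is an illustration rather than erpmanip's actual table: the list of prefixes, their ordering, and the canonical names returned are assumptions. Note that since several keywords share a prefix (bindesc and bin_processing both begin with bin, for example), a matcher of this kind has to try the longer, more specific prefixes first.

```python
# Ordered so that longer, more specific prefixes are tried first.
# The ordering and the canonical names are assumptions of this sketch.
KEYWORD_PREFIXES = [
    ("channel_processing", "cp"),
    ("bin_processing", "bp"),
    ("chandesc", "chandesc"),
    ("bindesc", "bindesc"),
    ("condesc", "condesc"),
    ("cptype", "cptype"),
    ("type", "type"),
    ("chan", "channel"),
    ("bin", "bin"),
    ("cp", "cp"),
    ("bp", "bp"),
]

def classify(token):
    """Return the canonical keyword for token, or None if it is not a keyword."""
    lowered = token.lower()          # case of keywords is immaterial
    for prefix, keyword in KEYWORD_PREFIXES:
        if lowered.startswith(prefix):
            return keyword
    return None

for tok in ["bin", "Bin", "BIN", "binary", "Channel_Processing", "15"]:
    print(tok, "->", classify(tok))
```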

A command file is logically divided into major sections, each of which specifies either channel processing or bin processing to be performed when the command file is supplied to erpmanip. Because there can be more than one channel processing section, a command file can have more than two major sections. Omitting a section invokes the default channel or bin processing, so a command file may consist of a single channel processing or bin processing section. However, at least one of these must be present (default channel and bin processing together just copy the file; cp is better suited for that!).

The order of appearance of the channel and bin processing specifications in the command file is not important, as erpmanip (E1) encodes the entire command file prior to actually processing the data. However, it is customary to place any channel processing specifications before the bin processing specifications, as this is, in some sense, the order in which they are used during processing. Erpmanip recognizes the beginning of a channel processing specification by encountering a token that begins with either "cp" or "channel_processing", while bin processing is introduced by tokens beginning with "bp" or "bin_processing". Note that the long forms must include the underscore to link what would normally be interpreted as separate tokens. Each specification, either channel or bin processing, continues until another channel processing section, bin processing section, or end of file is encountered. The syntax of each of these types of specifications follows.

Channel Processing Specifications

A channel processing section of an erpmanip command file consists of specifications for calculating each channel of the output file. Each such specification is introduced by a token having "channel" or "chan" as an initial substring, followed by an integer which designates the number of the output channel. A colon (:) must follow the number of the output channel, to indicate to erpmanip that subsequent tokens are processing operations that should be performed to calculate that output channel. The specifications for that output channel end when either another "channel", "cp", "bp", or end of file is encountered.

The processing operations that can be invoked to calculate an output channel are limited to changing the channel description, and to summing scaled input channels together as a linear combination. Each term in the linear combination consists of an input channel number with an optional prepended minus sign and/or floating point scaling constant. Rather than give the formal grammatical description of a term, here are a few examples of valid terms for a channel processing specification:


	11     -6     .25*7    -.333*4
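
To make the shape of a term concrete, here is a small Python sketch that turns each of the example terms into a (scale, channel number) pair. The function parse_term is hypothetical; it accepts the forms shown here and does not attempt to enforce any further constraints erpmanip may place on terms.

```python
def parse_term(token):
    """Return (scale, number) for a term such as '11', '-6', '.25*7', '-.333*4'."""
    if "*" in token:
        # scaling constant, then '*', then the channel (or bin) number
        scale_text, number_text = token.split("*", 1)
        scale = float(scale_text)
    elif token.startswith("-"):
        # a bare minus sign means a scale of -1
        scale, number_text = -1.0, token[1:]
    else:
        scale, number_text = 1.0, token
    return scale, int(number_text)

for term in ["11", "-6", ".25*7", "-.333*4"]:
    print(term, "->", parse_term(term))
```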

Some constraints on the form of a term are:

As an example, here is a valid specification that directs erpmanip to calculate output channel 3 as the difference between input channels 7 and 9, perhaps to form a "bipolar" derivation:


	#
	channel 3 :     chandesc="Fz-Cz"
                	7   -9
	#
	#       This could also be written
	#
	#               1.0*7  -1.0*9

When processing is performed, it occurs in the order that the various terms appear in the command file; but since the operations are restricted to linear combinations, this fact is not really relevant for channel processing specifications.

One can have more than one channel processing specification in a command file, in which case they must be distinguished by assigning a name to each. This is done by including a type=cpname between the channel processing token (i.e. cp or channel_processing) and the first set of channel specifications. The name cpname can be any character string; remember that if it contains white space it should be enclosed in double quotes. This name can be used later to select a specific set of channel processing operations during bin processing. Since all bins in the output data file are constrained to have the same number of channels as well as the same order of channels, each channel processing specification must define the same number of output channels. The ability to use more than one channel processing specification was motivated by the desire to create ipsilateral/contralateral channels; however, it can be employed for other purposes, if they are reasonable. When more than one channel processing section is present, the mapping of input channels to output channels can cease to be a linear one. More on this later. For now, let's look at a full-blown channel processing section that is designed to calculate an averaged mastoid reference:


	#
	#       If we have recorded a number of sites
	#       using the right mastoid as reference
	#       and have also recorded the left mastoid
	#       as channel 7 of the input data, we can
	#       calculate a new set of referential data
	#       that employ the average of the right
	#       and left mastoid as the reference voltage.
	#       The derivation is:
	#
	#           We have for the relevant scalp sites,
	#                Fz - RM
	#                Cz - RM
	#                .
	#                .
	#                .
	#                LM - RM
	#
	#           We want
	#                Fz -.5*(RM+LM)
	#                Cz -.5*(RM+LM)
	#                etc.
	#
	#           Note
	#                Fz -.5*(RM+LM)
	#           =    Fz -.5*RM -.5*LM
	#           =    Fz -RM +.5*RM -.5*LM
	#           =    Fz -RM -.5*(LM - RM)
	#
	#           Hence:
	
	Channel_Processing     type=avgref
	
        	Channel 0 :    chandesc="F3-am"
                       		0  -.5*7
        	Channel 1 :    chandesc="FZ-am"
                       		1  -.5*7
        	Channel 2 :    chandesc="F4-am"
                       		2  -.5*7
        	Channel 3 :    chandesc="P3-am"
                       		3  -.5*7
        	Channel 4 :    chandesc="PZ-am"
                       		4  -.5*7
        	Channel 5 :    chandesc="P4-am"
                       		5  -.5*7
        	Channel 6 :    chandesc="HEOG"
                       		6
	#
	#       Note that we didn't bother to re-reference the horizontal
	#	electro-oculogram
	#
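
The derivation in the comments can be checked numerically with a few lines of Python. The voltages below are made-up values chosen only to exercise the arithmetic.

```python
# Hypothetical voltages (arbitrary units) at three sites for one time point.
fz, rm, lm = 7.0, 2.0, 4.0

recorded_fz = fz - rm          # a channel as recorded: Fz - RM
recorded_lm = lm - rm          # input channel 7:       LM - RM

# What the command file computes: channel - .5 * (channel 7)
rereferenced = recorded_fz - 0.5 * recorded_lm

# What we wanted: Fz relative to the mean of the two mastoids
wanted = fz - 0.5 * (rm + lm)

print(rereferenced, wanted)    # 4.0 4.0
```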
	

A few syntactic notes:

Bin Processing Specifications

The bin processing section (there can be only one) is recognized by the initial substring "bp" or "bin_processing". The specifications in a bin processing section describe each output bin as a linear combination of input bins. The bin processing specifications are very similar to the channel processing specifications, except that the numbers in the terms represent input bin numbers, and the specification of each output bin is introduced with the word bin instead of channel. There are a few other twists, however, as described here.

Following the introduction of the bin processing should be a sequence of "bin" statements, recognized by the initial substring bin. Each of these bin statements must be followed by the number of the output bin that is being defined, and a colon. The output bins must be described in sequence, starting with number 0, to ensure that there are no undefined bins in a data file.
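
The sequencing rule amounts to a simple check, sketched here in Python. The function name and the error wording are hypothetical and are not erpmanip's own.

```python
def check_bin_sequence(bin_numbers):
    """Raise ValueError unless the output bins are numbered 0, 1, 2, ... in order."""
    for expected, actual in enumerate(bin_numbers):
        if actual != expected:
            raise ValueError(
                f"expected output bin {expected}, found {actual}"
            )

check_bin_sequence([0, 1, 2, 3])      # passes silently
```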

Each bin specification consists of one or more bin processing terms which define the output data as a scaled sum of data from certain input bins. Optionally, one may set the bin and/or condition description, or select the channel processing. There must be at least one data operation defined for each output bin, and the data operations are performed in the order they appear in the bin statement. This is important only when one does alter the channel processing "on the fly". The data processing terms are similar to those used in the channel processing section, except the scaling constants now apply to all the (newly generated) output channels for the specified input bin. In addition, there is a syntax for specifying a scaling based on the number of sums in the corresponding input bin (actually, scaling by the "number of sums" is a misnomer; erpmanip actually scales by the fraction of the total sums, which is really what is intended anyway). Instead of a verbose description of the syntax, let's look at a few output bin specifications:


	#
	#       A standard output bin forming a difference wave:
	#
	Bin 6 :    bindesc="Hi Tone difference wave"
           	   15  -31
	#
	#       To set the condition description:
	#       (the bin description is inherited
	#       from the input file...........)
	#
	Bin 6 :    condesc="Difference waves"
           	   15  -31
	#
	#       To demonstrate scaling:
	#
	Bin 7 :    .5*15  -.5*31                      # Note how
           	   bindesc="Lumped difference wave"   # order is not
           	   .5*32  -.5*16                      # important
	#
	#       To scale by the number of sums:
	#
	Bin 12 :   ^18 ^34                            # division is
           	   bindesc="Lumped False Alarms"      # implicit
	

Note that when one scales by the number of sums, the total sums from all terms involving such scaling is found, and then the scaling fraction is set to the sums in the bin being processed divided by the total sums. This obviates an explicit division operation, and simplifies everything. When one employs such scaling, it is not possible to scale by a predetermined scalar as well.
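
The arithmetic of scaling by sums can be sketched in Python as follows. This is an illustration of the rule just described, not erpmanip's code; the function name and the sum counts are made up.

```python
def sum_scaled_average(bins):
    """bins: list of (nsums, data) pairs; data is a list of sample points.

    Each bin is weighted by its share of the total sums.
    """
    total = sum(nsums for nsums, _ in bins)
    npoints = len(bins[0][1])
    out = [0.0] * npoints
    for nsums, data in bins:
        fraction = nsums / total        # the implicit division
        for t in range(npoints):
            out[t] += fraction * data[t]
    return out

# '^18 ^34' with hypothetical counts: bin 18 has 30 sums, bin 34 has 10,
# so the scaling fractions are .75 and .25.
lumped = sum_scaled_average([(30, [4.0, 8.0]), (10, [0.0, 4.0])])
print(lumped)   # [3.0, 7.0]
```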

Examples

The best way to gain a feel for writing erpmanip command files (ecfs) is to emulate working examples. The following ecfs are known to work with compatible input data, and each demonstrates a different type of task that can be solved using erpmanip.

Default Channel Processing

This first example demonstrates the overall form of an ecf, and the most common types of operations one may wish to perform. It also shows how the lack of a channel processing section invokes default channel processing.


	#
	#       A short example to calculate various difference
	#       waves that exploits default channel processing,
	#       so that all channels in the input file will be
	#	processed for the output file
	#
	#       Assume input bins:
	#               0 -> cals
	#               1 -> Low standard tones, attend low
	#               2 -> Hi standard tones, attend low
	#               3 -> Low target tones, attend low
	#               4 -> Hi target tones, attend low
	#               5 -> Low standard tones, attend hi
	#               6 -> Hi standard tones, attend hi
	#               7 -> Low target tones, attend hi
	#               8 -> Hi target tones, attend hi
	#

	Bin_Processing
        	#
        	#   These first four output bins are pretty simple
        	#
        	Bin 0 : bindesc="Low tone standard difference"
                	1 -5
        	Bin 1 : bindesc="Hi tone standard difference"
                	6 -2
        	Bin 2 : bindesc="Low tone target difference"
                	3 -7
        	Bin 3 : bindesc="Hi tone target difference"
                	8 -4
        	#
        	#   We can average across hi and low eliciting
        	#   stimuli by employing scaling, thusly:
        	#
        	Bin 4 : bindesc="Lumped hi and low standard difference"
                	.5*1 -.5*5 +.5*6 -.5*2
        	Bin 5 : bindesc="Lumped hi and low target difference"
                	.5*3 -.5*7 +.5*8 -.5*4

Multiple Channel Processing

Here is how one can select from more than one set of channel processing specifications "on the fly" during bin processing.


	#
	#       An extract from an experiment of the type
	#       performed by Steve Luck, using bilateral
	#       visual stimuli, with asymmetric visual field
	#       processing in different bins. The object is
	#	to average across contralateral scalp sites
	#	for different output bins. This demonstrates
	#       multiple channel processing sections, and
	#       their application. Assume input file channels are:
	#
	#       0 -> F3         1 -> Fz         2 -> F4
	#       3 -> C3         4 -> Cz         5 -> C4
	#       6 -> P3         7 -> Pz         8 -> P4
	#       9 -> O1         10 -> Oz        11 -> O2
	#       12 -> HEOG
	#
	#       with the input bins described:
	#               0 -> cals
	#               1 -> Left field color pop-out
	#               2 -> Right field color pop-out
	#               3 -> Left field shape pop-out
	#               4 -> Right field shape pop-out
	#       
	#       First, two types of channel processing are
	#       defined, one for left and one for right
	#       visual field stimuli.
	#
	Channel_processing      type="LVF"
        	channel 0 :     0       chandesc="F3/4 IP"
        	channel 1 :     1       chandesc="Fz"
        	channel 2 :     2       chandesc="F3/4 CN"
        	channel 3 :     3       chandesc="C3/4 IP"
        	channel 4 :     4       chandesc="Cz"
        	channel 5 :     5       chandesc="C3/4 CN"
        	channel 6 :     6       chandesc="P3/4 IP"
        	channel 7 :     7       chandesc="Pz"
        	channel 8 :     8       chandesc="P3/4 CN"
        	channel 9 :     9       chandesc="O1/2 IP"
        	channel 10 :    10      chandesc="Oz"
        	channel 11 :    11      chandesc="O1/2 CN"
        	channel 12 :    12      chandesc="HEOG IP"
	#
	#       Now one for the right visual field which
	#       exchanges ipsilateral and contralateral
	#       scalp sites ... note that a string argument
	#       (e.g. RVF) need not be enclosed in double
	#       quotes if it does not contain spaces or tabs.
	#       Since channel description operations are
	#       only performed when the corresponding
	#       processing is invoked, the safe route is
	#       to duplicate the channel descriptions:
	#
	Channel_processing      type=RVF
        	channel 0 :     2       chandesc="F3/4 IP"
        	channel 1 :     1       chandesc="Fz"
        	channel 2 :     0       chandesc="F3/4 CN"
        	channel 3 :     5       chandesc="C3/4 IP"
        	channel 4 :     4       chandesc="Cz"
        	channel 5 :     3       chandesc="C3/4 CN"
        	channel 6 :     8       chandesc="P3/4 IP"
        	channel 7 :     7       chandesc="Pz"
        	channel 8 :     6       chandesc="P3/4 CN"
        	channel 9 :     11      chandesc="O1/2 IP"
        	channel 10 :    10      chandesc="Oz"
        	channel 11 :    9       chandesc="O1/2 CN"
        	channel 12 :    -12     chandesc="HEOG IP"
	#
	#       Now, we'll lump over eliciting field for two
	#       output bins, for brevity of pedagogy.
	#
	Bin_Processing
        	Bin 0 : bindesc="Color Pop-out"
                	cptype=LVF  .5*1    cptype=RVF  .5*2
	#
	#               Note how the different types of
	#               channel processing can be selected
	#               during bin operations. Now the
	#               shape pop-outs...
	#
        	Bin 1 : bindesc="Shape Pop-out"
                	cptype=LVF  .5*3    cptype=RVF  .5*4
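
What selecting the channel processing "on the fly" does to the data can be sketched in Python with a reduced three-channel montage. The names CHANNEL_MAPS and lump are hypothetical, the voltages are made up, and the sketch handles only channel maps that copy one input channel to each output channel.

```python
# Channel maps for a reduced three-channel montage (F3, Fz, F4).
# Each entry gives the input channel copied to that output channel,
# so output channel 0 is ipsilateral and output channel 2 contralateral.
CHANNEL_MAPS = {
    "LVF": [0, 1, 2],   # left field: ipsilateral site is F3
    "RVF": [2, 1, 0],   # right field: ipsilateral site is F4
}

def lump(terms):
    """terms: list of (cptype, scale, bin_data); bin_data[channel] is one value."""
    nchans = len(terms[0][2])
    out = [0.0] * nchans
    for cptype, scale, bin_data in terms:
        mapping = CHANNEL_MAPS[cptype]      # chosen separately for each term
        for c in range(nchans):
            out[c] += scale * bin_data[mapping[c]]
    return out

left_field = [1.0, 5.0, 3.0]    # hypothetical F3, Fz, F4 for a left-field bin
right_field = [4.0, 6.0, 2.0]   # hypothetical F3, Fz, F4 for a right-field bin
lumped = lump([("LVF", 0.5, left_field), ("RVF", 0.5, right_field)])
print(lumped)   # [1.5, 5.5, 3.5]
```

Because the channel map differs from term to term, no single channel mapping applied to all input bins would produce the same output.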

Current Source Densities

To show how one might employ erpmanip to calculate estimates of current flowing into or out of the scalp through the skull, this example is offered. It also shows how the lack of a bin processing section results in default bin processing.


	#
	#       In this example, we wish to calculate two-
	#       dimensional transverse-skull current source
	#       densities, given a set of recorded sites laid
	#       out as follows:
	#
	#                      0       1       2       3
	#
	#              4       5       6       7       8      9
	#
	#                    10       11      12      13
	#
	#       We can calculate a five-point approximation
	#       for input sites 5, 6, 7, and 8 where a positive
	#       value implies outward positive current as follows:
	#
	Channel_processing
        	Chan 0 :        chandesc=OLa
                	4*5  -1*0  -1*4  -1*10 -1*6
	#
	#
	#       The exact coefficients need to be determined
	#       according to the estimate and mapping one is
	#       using, and using the appropriate ones can result
	#       in a standardized value for current flow.
	#
        	Chan 1 :        chandesc=OLb
                	4*6  -1*1  -1*5  -1*11 -1*7
        	Chan 2 :        chandesc=ORb
                	4*7  -1*2  -1*6  -1*12 -1*8
        	Chan 3 :        chandesc=ORa
                	4*8  -1*3  -1*7  -1*13 -1*9
	#
	#       If all we wanted to do was create a file
	#       with the same bins but calculate these
	#       four channels of CSD estimates, this file is
	#       sufficient; erpmanip will supply a default
	#       copying of input to output bins.
	#
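
The five-point approximation used in this command file is just four times the central site minus its four neighbors. A Python sketch with made-up voltages follows; the function name is hypothetical and the unit coefficients are those of the example above, not standardized values.

```python
def five_point_csd(center, neighbors):
    """4 * center minus the sum of the four surrounding sites."""
    return 4.0 * center - sum(neighbors)

# Hypothetical voltages at the 14 sites (index = input channel number):
# a small peak at site 5 that spreads to site 6, zero elsewhere.
v = [0.0] * 14
v[5] = 2.0
v[6] = 1.0

# Output channel 0 of the example: 4*5  -1*0  -1*4  -1*10 -1*6
csd0 = five_point_csd(v[5], [v[0], v[4], v[10], v[6]])
# Output channel 1 of the example: 4*6  -1*1  -1*5  -1*11 -1*7
csd1 = five_point_csd(v[6], [v[1], v[5], v[11], v[7]])
print(csd0, csd1)   # 7.0 2.0
```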

Scaling by Sums

The decision whether to scale by the number of sums or not should really be based on the model of the underlying signal being estimated by the signal-averaging process. Here's what a command file might look like.


	#
	#       Here is an (admittedly contrived) example of
	#       scaling by the number of sums that assumes that the
	#       underlying signal is identical in the bins that
	#       will be averaged together.
	#
	#       We'll employ default channel processing......
	#
	
	Bin_Processing
        	Bin 0 : bindesc="Type A targets"
                	^1 ^4
        	Bin 1 : bindesc="Type B targets"
                	^2 ^5
        	Bin 2 : bindesc="Type C targets"
                	^3 ^6
        	Bin 3 : bindesc="All standards"
                	^7  ^8  ^9  ^10

Data Compatibility

As one might expect, not all data files can be used with all erpmanip command files. A number of compatibility checks are performed prior to commencing calculations, and encompass the following constraints:

Sums and Accounting

The question of the equivalent number of sums that should be associated with an output bin can be difficult to answer for certain operations, while it is straightforward for others. For example, scaling by the number of sums (fraction of sums) clearly requires that the total sums be associated with the data. On the other hand, when one forms a difference wave, it is not so clear what the associated number of sums should be. The convention that is adopted by erpmanip is to always add together the sums for each input bin that is employed to calculate a particular output bin, regardless of whether the input bin was added, subtracted, or scaled by a fraction in the calculation. This approach, at least, gives some idea of the amount of data involved in the calculation, but does not always represent the appropriate "equivalent N" for standard statistical parameters.

Similarly, the counts of trials rejected as data errors or for artifacts are summed, regardless of the type of processing in which they participated.
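
The accounting convention can be sketched in Python as follows. The function name and the counts are hypothetical. How erpmanip counts an input bin that appears in more than one term of the same output bin is not stated here, so this sketch simply counts it once per term.

```python
def output_counts(terms, sums, rejected):
    """Return (total sums, total rejected) for one output bin.

    terms is a list of (scale, input_bin) pairs; the scale and its sign
    are ignored on purpose, following the convention described above.
    """
    used = [bin_number for _scale, bin_number in terms]
    return (sum(sums[b] for b in used), sum(rejected[b] for b in used))

# The difference wave '15  -31' with made-up counts for the two input bins.
sums = {15: 40, 31: 38}
rejected = {15: 3, 31: 5}
counts = output_counts([(1.0, 15), (-1.0, 31)], sums, rejected)
print(counts)   # (78, 8)
```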

Errors

Errors can arise from a wide variety of sources. In all cases, a series of messages is printed on the standard error, one for each nested routine which could not proceed normally. If possible, the associated line in the command file is included in the messages. All errors are fatal, which maintains the integrity of data files as well as simplifies error processing. In most cases a solution to the cause of the problem will be apparent. For example, attempting to create an already existing output file can be avoided by removing the existing data file or supplying a new name for the output file in the invocation of erpmanip.

GOOD LUCK.


ERPMANIP
Tutorial Introduction and Reference Manual
Jonathan C. Hansen


© 2005 UCSD ERP Lab