Class ExtractBarcodesProgram

    • Field Detail

      • DISTANCE_MODE

        @Argument(doc="The distance metric used to compare barcode reads against the provided barcodes when finding the best and second-best assignments.")
        public DistanceMetric DISTANCE_MODE
      • MAX_MISMATCHES

        @Argument(doc="Maximum mismatches for a barcode to be considered a match.")
        public int MAX_MISMATCHES
      • MIN_MISMATCH_DELTA

        @Argument(doc="Minimum difference between the number of mismatches in the best and second-best barcodes for a barcode to be considered a match.")
        public int MIN_MISMATCH_DELTA
      • MAX_NO_CALLS

        @Argument(doc="Maximum allowable number of no-calls in a barcode read before it is considered unmatchable.")
        public int MAX_NO_CALLS
      • MINIMUM_BASE_QUALITY

        @Argument(shortName="Q",
                  doc="Minimum base quality. Any barcode bases falling below this quality will be considered a mismatch even if the bases match.")
        public int MINIMUM_BASE_QUALITY
      • MINIMUM_QUALITY

        @Argument(doc="The minimum quality (after transforming 0s to 1s) expected from reads. If qualities are lower than this value, an error is thrown. The default of 2 is what Illumina's spec describes as the minimum, but in practice lower values have been observed.")
        public int MINIMUM_QUALITY
      • LANE

        @Argument(doc="Lane number. This can be specified multiple times. Reads with the same index in multiple lanes will be added to the same output file.",
                  shortName="L")
        public List<Integer> LANE
      • READ_STRUCTURE

        @Argument(doc="A description of the logical structure of clusters in an Illumina run, i.e., a description of the structure IlluminaBasecallsToSam assumes the data to be in. It should consist of integer/character pairs describing the number of cycles and the type of those cycles (B for sample barcode, M for molecular barcode, T for template, and S for skip). For example, if the input data consists of 80-base clusters and we provide a read structure of \"28T8M8B8S28T\", then the sequence may be split up into four reads:\n* read one with 28 cycles (bases) of template\n* read two with 8 cycles (bases) of molecular barcode (e.g., a unique molecular barcode)\n* read three with 8 cycles (bases) of sample barcode\n* 8 cycles (bases) skipped\n* read four with 28 cycles (bases) of template\nThe skipped cycles would NOT be included in an output SAM/BAM file or in read groups therein.",
                  shortName="RS")
        public String READ_STRUCTURE
      • COMPRESS_OUTPUTS

        @Argument(shortName="GZIP",
                  doc="Compress output FASTQ files using gzip and append a .gz extension to the file names.")
        public boolean COMPRESS_OUTPUTS
      • BASECALLS_DIR

        @Argument(doc="The Illumina basecalls directory.",
                  shortName="B")
        public File BASECALLS_DIR
      • METRICS_FILE

        @Argument(doc="Per-barcode and per-lane metrics written to this file.",
                  shortName="M",
                  optional=true)
        public File METRICS_FILE
      • INPUT_PARAMS_FILE

        @Argument(doc="The input file that defines parameters for the program. This is the BARCODE_FILE for `ExtractIlluminaBarcodes`, or the MULTIPLEX_PARAMS or LIBRARY_PARAMS file for `IlluminaBasecallsToFastq` or `IlluminaBasecallsToSam`.",
                  optional=true)
        public File INPUT_PARAMS_FILE
      • BARCODE_COLUMN

        public static final String BARCODE_COLUMN
        Column header for the first barcode sequence (preferred).
        See Also:
        Constant Field Values
      • BARCODE_NAME_COLUMN

        public static final String BARCODE_NAME_COLUMN
        Column header for the barcode name.
        See Also:
        Constant Field Values
      • LIBRARY_NAME_COLUMN

        public static final String LIBRARY_NAME_COLUMN
        Column header for the library name.
        See Also:
        Constant Field Values
      • BARCODE_PREFIXES

        public static final Set<String> BARCODE_PREFIXES
      • inputReadStructure

        protected ReadStructure inputReadStructure
        The read structure of the actual Illumina run, i.e., the read structure of the input data.
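The matching rules implied by MAX_MISMATCHES, MIN_MISMATCH_DELTA, MAX_NO_CALLS and MINIMUM_BASE_QUALITY can be illustrated with a small standalone sketch. It is not Picard's implementation: the class BarcodeMatchSketch and its methods are hypothetical, it assumes a plain per-position mismatch count (one possible DISTANCE_MODE), and it assumes that no-calls ('N' or '.') are capped by MAX_NO_CALLS and skipped when counting mismatches. Picard's exact handling may differ.

```java
import java.util.List;

/** Hypothetical sketch of best/second-best barcode matching; not Picard's code. */
final class BarcodeMatchSketch {

    /** Assumption: 'N', 'n' and '.' are treated as no-calls. */
    static boolean isNoCall(char base) {
        return base == 'N' || base == 'n' || base == '.';
    }

    static int countNoCalls(String read) {
        int n = 0;
        for (int i = 0; i < read.length(); i++) {
            if (isNoCall(read.charAt(i))) n++;
        }
        return n;
    }

    /**
     * Counts positions where the read disagrees with the expected barcode, or agrees
     * but has a base quality below minBaseQuality (the MINIMUM_BASE_QUALITY rule).
     * No-calls are skipped here; they are limited separately by MAX_NO_CALLS.
     */
    static int countMismatches(String expected, String read, int[] quals, int minBaseQuality) {
        int mismatches = 0;
        for (int i = 0; i < expected.length(); i++) {
            char base = read.charAt(i);
            if (isNoCall(base)) continue;
            if (base != expected.charAt(i) || quals[i] < minBaseQuality) mismatches++;
        }
        return mismatches;
    }

    /** Returns the matched barcode, or null if the read is unmatchable or ambiguous. */
    static String match(List<String> barcodes, String read, int[] quals, int maxMismatches,
                        int minMismatchDelta, int maxNoCalls, int minBaseQuality) {
        if (countNoCalls(read) > maxNoCalls) return null; // MAX_NO_CALLS
        String best = null;
        int bestMismatches = Integer.MAX_VALUE;
        int secondMismatches = Integer.MAX_VALUE;
        for (String barcode : barcodes) {
            int mm = countMismatches(barcode, read, quals, minBaseQuality);
            if (mm < bestMismatches) {
                secondMismatches = bestMismatches;
                bestMismatches = mm;
                best = barcode;
            } else if (mm < secondMismatches) {
                secondMismatches = mm;
            }
        }
        if (best == null || bestMismatches > maxMismatches) return null; // MAX_MISMATCHES
        // MIN_MISMATCH_DELTA: the runner-up must be clearly worse than the best hit.
        if (secondMismatches - bestMismatches < minMismatchDelta) return null;
        return best;
    }
}
```

With barcodes ACGTACGT and TTTTGGGG, the read ACGTACGA has 1 and 6 mismatches respectively, so it matches ACGTACGT when MAX_MISMATCHES is at least 1 and MIN_MISMATCH_DELTA is at most 5.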
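The READ_STRUCTURE string format described above can be illustrated with a minimal parser. This is a hypothetical sketch, not the ReadStructure class referenced by inputReadStructure: the names ReadStructureSketch, Segment, parse, totalCycles and outputReads are illustrative, and it accepts only the four segment types listed in the READ_STRUCTURE description (T, B, M, S).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical read-structure parser; not the Picard/htsjdk ReadStructure class. */
final class ReadStructureSketch {

    /** One "<cycles><type>" pair, e.g. 28T. */
    static final class Segment {
        final int cycles;
        final char type; // T = template, B = sample barcode, M = molecular barcode, S = skip

        Segment(int cycles, char type) {
            this.cycles = cycles;
            this.type = type;
        }
    }

    private static final Pattern SEGMENT = Pattern.compile("(\\d+)([TBMS])");

    /** Splits e.g. "28T8M8B8S28T" into segments; rejects anything that is not a clean run of pairs. */
    static List<Segment> parse(String readStructure) {
        List<Segment> segments = new ArrayList<>();
        Matcher m = SEGMENT.matcher(readStructure);
        int expectedStart = 0;
        while (m.find()) {
            if (m.start() != expectedStart) { // stray characters between pairs
                throw new IllegalArgumentException("Malformed read structure: " + readStructure);
            }
            int cycles = Integer.parseInt(m.group(1));
            if (cycles <= 0) {
                throw new IllegalArgumentException("Segment length must be positive: " + m.group());
            }
            segments.add(new Segment(cycles, m.group(2).charAt(0)));
            expectedStart = m.end();
        }
        if (segments.isEmpty() || expectedStart != readStructure.length()) {
            throw new IllegalArgumentException("Malformed read structure: " + readStructure);
        }
        return segments;
    }

    /** Total cycles described; this should equal the cluster length of the run. */
    static int totalCycles(List<Segment> segments) {
        int total = 0;
        for (Segment s : segments) total += s.cycles;
        return total;
    }

    /** Segments that become output reads; skipped (S) cycles are dropped. */
    static List<Segment> outputReads(List<Segment> segments) {
        List<Segment> reads = new ArrayList<>();
        for (Segment s : segments) {
            if (s.type != 'S') reads.add(s);
        }
        return reads;
    }
}
```

For "28T8M8B8S28T" this yields five segments totalling 80 cycles, four of which are output reads, matching the worked example in the READ_STRUCTURE description.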
    • Constructor Detail

      • ExtractBarcodesProgram

        public ExtractBarcodesProgram()
    • Method Detail

      • customCommandLineValidation

        protected String[] customCommandLineValidation()
        Parses all barcodes from the input files and validates that all barcodes are unique and of the same length.
        Overrides:
        customCommandLineValidation in class CommandLineProgram
        Returns:
        null if the command line is valid. If the command line is invalid, returns an array of error messages to be written to the appropriate place.
      • collectErrorMessages

        protected String[] collectErrorMessages(List<String> messages,
                                                String[] superErrors)
      • outputMetrics

        protected void outputMetrics()
      • parseInputFile

        protected static htsjdk.samtools.util.Tuple<Map<String,BarcodeMetric>,List<String>> parseInputFile(File inputFile,
                                                                                                                       ReadStructure readStructure)
        Parses any one of the following types of files: the ExtractIlluminaBarcodes BARCODE_FILE, the IlluminaBasecallsToFastq MULTIPLEX_PARAMS file, or the IlluminaBasecallsToSam LIBRARY_PARAMS file. This validates the file format and populates a Map of barcodes to metrics.
        Parameters:
        inputFile - The input file that is being parsed
        readStructure - The read structure for the reads of the run
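The validation described for customCommandLineValidation and parseInputFile (barcodes read from a tab-separated parameter file, required to be unique and of the same length, with null signalling a valid command line) can be sketched as follows. Everything here is hypothetical: the class BarcodeFileSketch and its methods are illustrative, the barcode column name is passed in because the actual BARCODE_COLUMN value is only listed on the Constant Field Values page, and mergeErrors is an assumption about how a helper such as collectErrorMessages might combine new messages with errors from the superclass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of barcode-file parsing and validation; not Picard's code. */
final class BarcodeFileSketch {

    /**
     * Reads one column from tab-separated lines whose first line is a header.
     * The column name is a parameter because the real BARCODE_COLUMN value is not assumed.
     */
    static List<String> readBarcodeColumn(List<String> lines, String barcodeColumn) {
        if (lines.isEmpty()) throw new IllegalArgumentException("Input file is empty");
        int col = Arrays.asList(lines.get(0).split("\t", -1)).indexOf(barcodeColumn);
        if (col < 0) throw new IllegalArgumentException("Missing column: " + barcodeColumn);
        List<String> barcodes = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.trim().isEmpty()) continue; // tolerate blank lines
            String[] fields = line.split("\t", -1);
            if (col >= fields.length) {
                throw new IllegalArgumentException("Line " + (i + 1) + " has too few columns");
            }
            barcodes.add(fields[col].trim().toUpperCase(Locale.ROOT));
        }
        return barcodes;
    }

    /** Returns one message per barcode that has the wrong length or was already seen. */
    static List<String> validate(List<String> barcodes, int expectedLength) {
        List<String> messages = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String barcode : barcodes) {
            if (barcode.length() != expectedLength) {
                messages.add("Barcode " + barcode + " has length " + barcode.length()
                        + ", expected " + expectedLength);
            }
            if (!seen.add(barcode)) {
                messages.add("Barcode " + barcode + " is not unique");
            }
        }
        return messages;
    }

    /** Merges superclass errors with new messages; null means the command line is valid. */
    static String[] mergeErrors(List<String> messages, String[] superErrors) {
        List<String> all = new ArrayList<>();
        if (superErrors != null) all.addAll(Arrays.asList(superErrors));
        all.addAll(messages);
        return all.isEmpty() ? null : all.toArray(new String[0]);
    }
}
```

In a real program the lines could come from java.nio.file.Files.readAllLines(path), and expectedLength would be derived from the sample-barcode (B) segments of the read structure.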