I think the important thing is that it is the -largest- contig and is associated with a whopping 1,407,705 sequences, which in turn is around 5% of the 26,108,482 remaining sequences, as mentioned by the mathematician. The fact that it has a 98.5% match with a human RNA sequence strongly suggests that this alleged "Cov 2" virus is just mislabelled human RNA.
It is NOT the largest contig.
I think that depends on whether you're looking at the original report from the authors of the alleged discovery of Cov 2, or what the mathematician was able to find while trying to reproduce their results.
I don't think you even know what a contig is at this point.
I think the following definitions of reads, sequences and contigs give a good description of contigs as well as the other 2 terms:
**
My understanding of those three words is as follows:
sequence is a generic name describing the order of biological letters (DNA/RNA or amino acids). Both contigs and reads are DNA/RNA or aa sequences.
reads are just shorthand for sequenced reads. Usually sequenced reads refer to the digital information obtained from the sequencing machine (for example Illumina MiSeq) and stored in the fastq file with quality scores per base. Reads are usually short. However, "short" changes rapidly. Right now MiSeq produces reads anywhere between 50-150 base pairs (bp) long. From a single run (it really depends on the run) you can get millions of reads, where each read will be a set size, e.g. 100 bp long. All reads are stored in a single fastq file per replicate, where all reads in that file are usually of uniform size, e.g. all 5 million reads are 100 bp long.
As a bioinformatician your first job is to identify where those reads come from. Depending on the experimental goal and on what sort of sequencing you were doing, e.g. DNA-seq or RNA-seq, you may or may not encounter contigs.
contigs are simply reads that have been assembled together. For example, if you are doing de novo transcriptomics, you would:
purify your transcript from a tissue and send it off for sequencing
get your fastq files with sequenced reads, that are all short reads (e.g 100 bp)
assemble those 100bp reads into a longer contig that hopefully will resemble your individual transcript
**
Source:
https://biology.stackexchange.com/q...equence-reads-and-contigs-of-genetic-material
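
To make the "reads assembled into a contig" step from the quoted definition concrete, here is a toy sketch in Python. It is only an illustration under simplifying assumptions: the reads are made up, there are no sequencing errors or reverse complements, and the greedy suffix/prefix merging shown here is a textbook simplification, not the method used by any particular assembler or in any report discussed in this thread. The function names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no such overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap.

    Stops when no two sequences overlap by at least `min_len` bases and
    returns whatever is left: the toy "contigs".
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        # Join the two sequences, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


# Three made-up 7 bp "reads" that overlap end to end.
reads = ["ATGGCGT", "GCGTACG", "TACGTTA"]
print(greedy_assemble(reads))  # ['ATGGCGTACGTTA']
```

The point of the sketch is only that a contig is a longer sequence stitched together from overlapping reads, so which contig comes out "largest" depends on the reads that go in and on the assembly settings.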