Library: Quick Start
Basic usage
1. Load TCRs into a data frame
Examples of files you may want to load:
- 10X: filtered_contig_annotations.csv
- Adaptive: Sample_TCRB.tsv
- IMGT: Output from MiXCR or other tools
[1]:
import tcrconvert
import pandas as pd
tcr_file = tcrconvert.get_example_path('tenx.csv')
tcrs = pd.read_csv(tcr_file)[['barcode', 'v_gene', 'j_gene', 'cdr3']]
tcrs
[1]:
| | barcode | v_gene | j_gene | cdr3 |
|---|---|---|---|---|
| 0 | AAACCTGAGACCACGA-1 | TRAV29/DV5 | TRAJ12 | CAVMDSSYKLIF |
| 1 | AAACCTGAGACCACGA-1 | TRBV20/OR9-2 | TRBJ2-1 | CASSGLAGGYNEQFF |
| 2 | AAACCTGAGGCTCTTA-1 | TRDV2 | TRDJ3 | CASSGVAGGTDTQYF |
| 3 | AAACCTGAGGCTCTTA-1 | TRGV9 | TRGJ1 | CAVKDSNYQLIW |
2. Convert
[2]:
new_tcrs = tcrconvert.convert_gene(tcrs, frm='tenx', to='adaptive')
new_tcrs
WARNING - Adaptive only captures VDJ genes; C genes will be NA.
INFO - Converting from 10X. Using *01 as allele for all genes.
[2]:
| | barcode | v_gene | j_gene | cdr3 |
|---|---|---|---|---|
| 0 | AAACCTGAGACCACGA-1 | TCRAV29-01*01 | TCRAJ12-01*01 | CAVMDSSYKLIF |
| 1 | AAACCTGAGACCACGA-1 | TCRBV20-or09_02*01 | TCRBJ02-01*01 | CASSGLAGGYNEQFF |
| 2 | AAACCTGAGGCTCTTA-1 | TCRDV02-01*01 | TCRDJ03-01*01 | CASSGVAGGTDTQYF |
| 3 | AAACCTGAGGCTCTTA-1 | TCRGV09-01*01 | TCRGJ01-01*01 | CAVKDSNYQLIW |
Tip: Suppress INFO-level messages by setting verbose=False. Warnings and errors will still appear.
Tip: If your Adaptive data lacks x_resolved/xMaxResolved columns, create them yourself by combining the x_gene/xGeneName and x_allele/xGeneAllele columns. See the FAQs.
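The column combination described in the tip above can be sketched with pandas string concatenation. This is a minimal sketch on toy data, not the recipe from the FAQs: it assumes the gene and allele columns hold strings such as TCRBV20-01 and 01, and that the resolved name has the form gene*allele, as in the converted Adaptive output shown earlier. The data frame name and all values are hypothetical.

```python
import pandas as pd

# Toy Adaptive-style data (hypothetical values) lacking *_resolved columns.
adaptive = pd.DataFrame({
    'v_gene': ['TCRBV20-01', 'TCRBV06-04'],
    'v_allele': ['01', '02'],
    'j_gene': ['TCRBJ02-01', 'TCRBJ02-03'],
    'j_allele': ['01', '01'],
})

# Assumption: a resolved name is "<gene>*<allele>".
for segment in ('v', 'j'):
    adaptive[f'{segment}_resolved'] = (
        adaptive[f'{segment}_gene'] + '*' + adaptive[f'{segment}_allele']
    )

print(adaptive[['v_resolved', 'j_resolved']])
```

In this sketch, a row with a missing gene or allele ends up with a missing resolved value, so check how your export encodes alleles before relying on it.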
AIRR data
Supply the standard AIRR gene column names to frm_cols:
new_airr = tcrconvert.convert_gene(
    airr,
    frm='imgt',
    to='adaptive',
    frm_cols=['v_call', 'd_call', 'j_call', 'c_call'],
)
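The call above expects an `airr` data frame that already exists. One way to build it, sketched here under assumptions, is to read a tab-separated AIRR rearrangement file with pandas and confirm the gene columns are present before converting. The inline text stands in for a real file, and its values and the `AIRR_GENE_COLS` name are hypothetical.

```python
import io

import pandas as pd

# Standard AIRR gene-call column names, as passed to frm_cols above.
AIRR_GENE_COLS = ['v_call', 'd_call', 'j_call', 'c_call']

# Tiny stand-in for an AIRR rearrangement TSV (hypothetical values).
airr_text = (
    "sequence_id\tv_call\td_call\tj_call\tc_call\tjunction_aa\n"
    "seq1\tTRBV6-1*01\tTRBD2*01\tTRBJ2-1*01\tTRBC2*01\tCASSGLAGGYNEQFF\n"
    "seq2\tTRAV1-2*01\t\tTRAJ12*01\tTRAC*01\tCAVMDSSYKLIF\n"
)

# For a real file, pass its path instead of the StringIO object.
airr = pd.read_csv(io.StringIO(airr_text), sep='\t')

# Report any gene columns the file does not provide.
missing = [col for col in AIRR_GENE_COLS if col not in airr.columns]
print('Missing gene columns:', missing)
```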
Custom column names
By default, TCRconvert assumes these column names based on the input nomenclature (frm):
- frm='imgt': ['v_gene', 'd_gene', 'j_gene', 'c_gene']
- frm='tenx': ['v_gene', 'd_gene', 'j_gene', 'c_gene']
- frm='adaptive': ['v_resolved', 'd_resolved', 'j_resolved']
- frm='adaptivev2': ['vMaxResolved', 'dMaxResolved', 'jMaxResolved']
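The defaults listed above amount to a lookup from nomenclature to column names. The sketch below is illustrative only and is not TCRconvert's internal code: `DEFAULT_COLS` and `pick_gene_columns` are hypothetical names. It also assumes that default columns absent from the data are skipped, which is consistent with the first example, where only v_gene and j_gene were present.

```python
# Hypothetical lookup mirroring the documented defaults.
DEFAULT_COLS = {
    'imgt': ['v_gene', 'd_gene', 'j_gene', 'c_gene'],
    'tenx': ['v_gene', 'd_gene', 'j_gene', 'c_gene'],
    'adaptive': ['v_resolved', 'd_resolved', 'j_resolved'],
    'adaptivev2': ['vMaxResolved', 'dMaxResolved', 'jMaxResolved'],
}


def pick_gene_columns(frm, available, frm_cols=None):
    """Return the gene columns that would be looked at for this input."""
    # An explicit frm_cols list takes the place of the defaults.
    wanted = frm_cols if frm_cols is not None else DEFAULT_COLS[frm]
    # Assumption: columns missing from the data are skipped.
    return [col for col in wanted if col in available]


columns = ['barcode', 'v_gene', 'j_gene', 'cdr3']
print(pick_gene_columns('tenx', columns))
```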
You can override these columns using frm_cols:
1. Load 10X data with custom column names
[3]:
custom_file = tcrconvert.get_example_path('customcols.csv')
custom = pd.read_csv(custom_file)
custom
[3]:
| | myVgene | myDgene | myJgene | myCgene | myCDR3 | antigen |
|---|---|---|---|---|---|---|
| 0 | TRAV1-2 | TRBD1 | TRAJ12 | TRAC | CAVMDSSYKLIF | Flu |
| 1 | TRBV6-1 | TRBD2 | TRBJ2-1 | TRBC2 | CASSGLAGGYNEQFF | Flu |
| 2 | TRBV6-4 | TRBD2 | TRBJ2-3 | TRBC2 | CASSGVAGGTDTQYF | CMV |
| 3 | TRAV1-2 | TRBD1 | TRAJ33 | TRAC | CAVKDSNYQLIW | CMV |
| 4 | TRBV2 | TRBD1 | TRBJ1-2 | TRBC1 | CASNQGLNYGYTF | CMV |
2. Specify names using frm_cols and convert to IMGT
[4]:
custom_new = tcrconvert.convert_gene(
    custom,
    frm='tenx',
    to='imgt',
    verbose=False,
    frm_cols=['myVgene', 'myDgene', 'myJgene', 'myCgene'],
)
custom_new
[4]:
| | myVgene | myDgene | myJgene | myCgene | myCDR3 | antigen |
|---|---|---|---|---|---|---|
| 0 | TRAV1-2*01 | TRBD1*01 | TRAJ12*01 | TRAC*01 | CAVMDSSYKLIF | Flu |
| 1 | TRBV6-1*01 | TRBD2*01 | TRBJ2-1*01 | TRBC2*01 | CASSGLAGGYNEQFF | Flu |
| 2 | TRBV6-4*01 | TRBD2*01 | TRBJ2-3*01 | TRBC2*01 | CASSGVAGGTDTQYF | CMV |
| 3 | TRAV1-2*01 | TRBD1*01 | TRAJ33*01 | TRAC*01 | CAVKDSNYQLIW | CMV |
| 4 | TRBV2*01 | TRBD1*01 | TRBJ1-2*01 | TRBC1*01 | CASNQGLNYGYTF | CMV |
Rhesus or mouse data
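Pass species='rhesus' or species='mouse' to convert_gene. Genes that are not in the IMGT reference for that species are replaced with NA, as the warning below shows.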
[5]:
new_tcrs = tcrconvert.convert_gene(
    tcrs, frm='tenx', to='imgt', verbose=False, species='rhesus'
) # or 'mouse'
new_tcrs
WARNING - These genes are not in IMGT for this species and will be replaced with NA:
['TRAV29/DV5', 'TRBV20/OR9-2', 'TRGJ1']
[5]:
| | barcode | v_gene | j_gene | cdr3 |
|---|---|---|---|---|
| 0 | AAACCTGAGACCACGA-1 | <NA> | TRAJ12*01 | CAVMDSSYKLIF |
| 1 | AAACCTGAGACCACGA-1 | <NA> | TRBJ2-1*01 | CASSGLAGGYNEQFF |
| 2 | AAACCTGAGGCTCTTA-1 | TRDV2*01 | TRDJ3*01 | CASSGVAGGTDTQYF |
| 3 | AAACCTGAGGCTCTTA-1 | TRGV9*01 | <NA> | CAVKDSNYQLIW |
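After a conversion like the one above, you may want to see which rows lost a gene to NA. The following is a minimal pandas sketch that rebuilds the output table by hand so it runs on its own. The `converted`, `unconverted`, and `complete` names are hypothetical; in practice you would apply the same steps to the data frame returned by convert_gene.

```python
import pandas as pd

# Hand-built copy of the rhesus output table shown above.
converted = pd.DataFrame({
    'barcode': ['AAACCTGAGACCACGA-1', 'AAACCTGAGACCACGA-1',
                'AAACCTGAGGCTCTTA-1', 'AAACCTGAGGCTCTTA-1'],
    'v_gene': [pd.NA, pd.NA, 'TRDV2*01', 'TRGV9*01'],
    'j_gene': ['TRAJ12*01', 'TRBJ2-1*01', 'TRDJ3*01', pd.NA],
    'cdr3': ['CAVMDSSYKLIF', 'CASSGLAGGYNEQFF',
             'CASSGVAGGTDTQYF', 'CAVKDSNYQLIW'],
})

gene_cols = ['v_gene', 'j_gene']

# True for rows where at least one gene column is NA.
na_mask = converted[gene_cols].isna().any(axis=1)

unconverted = converted[na_mask]               # rows to inspect
complete = converted.dropna(subset=gene_cols)  # rows with every gene present

print(f'{na_mask.sum()} of {len(converted)} rows have at least one NA gene')
```

Keeping the incomplete rows in a separate frame, instead of dropping them outright, makes it easier to decide whether the missing genes matter for your analysis.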