First, you need to download the GeneMANIA JAR file. If you already installed the plugin through Cytoscape, you can find it in one of the following places, where Cytoscape Version and GeneMANIA-Version stand for the installed version numbers:
Mac/Linux: ~/.cytoscape/Cytoscape Version/plugins/GeneMANIA-Version/
Windows: My Documents\.cytoscape\Cytoscape Version\plugins\GeneMANIA-Version\
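If you are scripting around the tools, a small helper can search those folders for the JAR. This is a sketch with several assumptions: the JAR sits directly inside the GeneMANIA-Version folder, both version folders can be matched with a wildcard, and the extra Documents location covers newer Windows installs. The helper returns every matching .jar, so check the list if the plugin folder also holds dependency JARs.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def find_genemania_jars(base_dirs: Optional[Iterable[Path]] = None) -> List[Path]:
    """Return JAR files found in GeneMANIA plugin folders.

    Assumes the layout <base>/<Cytoscape version>/plugins/GeneMANIA-<version>/*.jar.
    """
    if base_dirs is None:
        home = Path.home()
        base_dirs = [
            home / ".cytoscape",                   # Mac/Linux
            home / "My Documents" / ".cytoscape",  # Windows
            home / "Documents" / ".cytoscape",     # Windows (assumed alternative)
        ]
    jars: List[Path] = []
    for base in base_dirs:
        base = Path(base)
        if base.is_dir():
            # The first "*" matches the Cytoscape version folder,
            # the second matches the plugin version suffix.
            jars.extend(sorted(base.glob("*/plugins/GeneMANIA-*/*.jar")))
    return jars


if __name__ == "__main__":
    for jar in find_genemania_jars():
        print(jar)
```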
Tool | Description |
---|---|
Gene Sanitizer | Prints out the mappings between the given gene list and GeneMANIA's preferred identifiers. |
Id Importer | Creates a new data set from a set of identifiers and aliases. The identifiers correspond to node labels. |
Query Runner | Runs one or more predictions and writes the results to disk. Each prediction needs to be provided in the form of a query file. One prediction report is generated for each query file. |
Cross Validator | Performs k-fold cross validation on the prediction algorithm for a given set of pre-classified genes. Cross Validator reports on the following evaluation measures: area under the ROC curve (AUC-ROC), area under the precision-recall curve (AUC-PR), and precision at fixed recall. |
Network Assessor | Assesses the value of a set of networks by performing k-fold cross validation with a baseline network set and with the networks to assess. The percentage error of each validation measure is computed for each query in the validation set and reported. |
Network Importer | Imports network/profile data from a file into a GeneMANIA data set. |
Validation Set Maker | Produces sets of genes based on Gene Ontology (GO) annotations for use in cross validation. One gene set is created for each GO category in the ontology. Annotations are propagated up the ontology, so a gene annotated with a specific GO category is also included in the gene set of every parent (ancestor) category. |
Prints out the mappings between the given gene list and GeneMANIA's preferred identifiers. This tool is useful for checking which of your genes are recognized by GeneMANIA. The output is a tab-delimited text file containing one mapping per line. Each line starts with GeneMANIA's preferred identifier, followed by the identifier you supplied. If your identifier isn't recognized, the first field is empty.
Name | Description |
---|---|
--data directory | Path to a GeneMANIA data set (e.g. /Users/username/genemania_plugin/gmdata-2010-07-28). |
--organism name | The name or taxonomy id of an organism whose genes should be considered. |
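Because the output is plain tab-delimited text, it is easy to post-process. The following Python sketch separates recognized identifiers from unrecognized ones. It assumes exactly the two leading fields described above and ignores any further columns. The function name read_sanitizer_output is hypothetical.

```python
from typing import Dict, List, Tuple


def read_sanitizer_output(path: str) -> Tuple[Dict[str, str], List[str]]:
    """Split Gene Sanitizer output into recognized and unrecognized identifiers.

    Assumes each line is "<preferred id><TAB><your id>", with the preferred id
    left empty when GeneMANIA does not recognize your id.
    """
    recognized: Dict[str, str] = {}
    unrecognized: List[str] = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            preferred, given = fields[0].strip(), fields[1].strip()
            if preferred:
                recognized[given] = preferred  # your id -> GeneMANIA's preferred id
            else:
                unrecognized.append(given)
    return recognized, unrecognized


if __name__ == "__main__":
    import sys

    mapped, missing = read_sanitizer_output(sys.argv[1])
    print(f"{len(mapped)} recognized, {len(missing)} not recognized")
    for gene in missing:
        print(gene)
```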
Creates a new data set from a set of identifiers and aliases. The identifiers correspond to node labels. Although the resulting data set is generally treated like an organism, where the given ids denote its genome, it does not have to be an organism. The identifiers can be anything, as long as they're unique within the data set.
Name | Description |
---|---|
--data directory | Path to a GeneMANIA data set (e.g. /Users/username/genemania_plugin/gmdata-2010-07-28). |
--filename file-name | The path to a file that contains a complete set of identifiers that will serve as the basis of a new data set. Each line in the file should follow the format primary-id ( \t alias-1 ... ), that is, a primary identifier optionally followed by one or more tab-separated aliases. |
--name entity-name | The name of the resulting entity (e.g. organism). |
--alias entity-name | Optional. An alias for the resulting entity (e.g. shorter, informal name). |
--taxid number | Optional. The taxonomy id of the resulting entity, if applicable. |
--description description | Optional. A description of the resulting entity. |
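You can generate an input file for --filename from any mapping of identifiers to aliases. This sketch assumes UTF-8 text with Unix line endings. It rejects identifiers that contain tabs or newlines, since either would break the line format. Using a dictionary keeps the primary identifiers unique, as the data set requires. write_id_file is a hypothetical helper, not part of GeneMANIA.

```python
from typing import Dict, List, Sequence


def write_id_file(path: str, identifiers: Dict[str, Sequence[str]]) -> None:
    """Write an Id Importer input file: primary id, then optional tab-separated aliases."""
    rows: List[str] = []
    for primary, aliases in identifiers.items():
        fields = [primary, *aliases]
        # Validate everything first so a bad entry never leaves a half-written file.
        if not primary or any("\t" in f or "\n" in f for f in fields):
            raise ValueError(f"invalid identifier or alias in entry {primary!r}")
        rows.append("\t".join(fields))
    with open(path, "w", encoding="utf-8", newline="\n") as out:
        for row in rows:
            out.write(row + "\n")


if __name__ == "__main__":
    # Hypothetical identifiers; node2 has no aliases.
    write_id_file("my_entities.txt", {"node1": ["n1", "alpha"], "node2": []})
```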
Runs one or more predictions and writes the results to disk. Each prediction needs to be provided in the form of a query file. One prediction report is generated for each query file.
Name | Description |
---|---|
--data directory | Path to a GeneMANIA data set (e.g. /Users/username/genemania_plugin/gmdata-2010-07-28). |
--in input-format | Optional. The format of the query files, which can be one of: |
--out output-format | Optional. The format of the output files, which can be one of: |
--scoring-method method | Optional. The method used to compute the gene scores, which can be one of: |
--ids id-types | Optional. A comma-separated list of identifier types, in descending order of preference, which may be one or more of the following: |
--results directory | Optional. Path to where the prediction result files will be created (one per input query file). Defaults to the current working directory. |
--threads number | Optional. The maximum number of parallel predictions. Ideally this should be set to the number of processing cores. Defaults to 1. |
--verbose | Optional. Makes QueryRunner print more details about what's happening. |
--list-networks organism-name | Optional. Lists the available networks for the given organism. If you run the tool from a shell, you may need to put quotes around the organism name. |
--list-genes organism-name | Optional. Lists the genes that are recognized for the given organism. If you run the tool from a shell, you may need to put quotes around the organism name. Each line in the output contains a gene and all its synonyms, if any. |
Performs k-fold cross validation on the prediction algorithm for a given set of pre-classified genes. Cross Validator reports on the following evaluation measures: area under the ROC curve (AUC-ROC), area under the precision-recall curve (AUC-PR), and precision at fixed recall.
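For reference, the three measures can be computed from a ranked list of gene scores as below. This is a textbook formulation, not GeneMANIA's own code. Labels mark positive (True) and negative (False) genes. AUC-ROC is computed as the probability that a positive gene outscores a negative one. Average precision stands in as one common estimate of AUC-PR, and the tool may compute AUC-PR differently.

```python
from typing import List, Sequence, Tuple


def _ranked(scores: Sequence[float], labels: Sequence[bool]) -> List[Tuple[float, bool]]:
    """Pair scores with labels, highest score first (ties keep input order)."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    return sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)


def auc_roc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Probability that a random positive outscores a random negative (ties count half)."""
    ranked = _ranked(scores, labels)
    pos = [s for s, y in ranked if y]
    neg = [s for s, y in ranked if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Mean precision at the rank of each positive gene (one AUC-PR estimator)."""
    ranked = _ranked(scores, labels)
    total_pos = sum(1 for _, y in ranked if y)
    if total_pos == 0:
        raise ValueError("need at least one positive gene")
    hits, total = 0, 0.0
    for rank, (_, is_pos) in enumerate(ranked, start=1):
        if is_pos:
            hits += 1
            total += hits / rank
    return total / total_pos


def precision_at_recall(scores: Sequence[float], labels: Sequence[bool], recall: float) -> float:
    """Precision at the first rank where the target recall is reached."""
    if not 0.0 < recall <= 1.0:
        raise ValueError("recall must be in (0, 1]")
    ranked = _ranked(scores, labels)
    total_pos = sum(1 for _, y in ranked if y)
    if total_pos == 0:
        raise ValueError("need at least one positive gene")
    hits = 0
    for rank, (_, is_pos) in enumerate(ranked, start=1):
        if is_pos:
            hits += 1
        if hits / total_pos >= recall:
            return hits / rank
    raise AssertionError("unreachable: a recall of at most 1 is always reached")
```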
Name | Description |
---|---|
--data directory | Path to a GeneMANIA data set (e.g. /Users/username/genemania_plugin/gmdata-2010-07-28). |
--organism name | The name or taxonomy id of an organism whose genes should be considered. |
--query file-name | Perform validation against the gene sets listed in the given file. The file must use the query file format described below. |
--networks network-list | A comma-separated list of network types and/or network names. To get a full listing of network names, use the option --list-networks with Query Runner. |
--exclude-networks network-list | Optional. A comma-separated list of network types and/or network names to exclude from the --networks list. |
--folds number | Optional. The number of folds to use during cross validation. Defaults to 5. |
--min number | Optional. The minimum number of positive genes for a query. Queries with fewer positive genes are skipped. Defaults to 10. |
--max number | Optional. The maximum number of positive genes for a query. Queries with more positive genes are skipped. Defaults to 300. |
--use-go-cache | Optional. Perform validation against bundled Gene Ontology gene sets. In this case, the query file should contain one GO id per line (e.g. GO:0005786). These gene sets have been pre-filtered so that the smallest has 10 genes and the largest has 300. |
--outfile file-name | Optional. The file where the validation results should be saved. If not provided, the results are sent to standard output (usually the console). |
--auto-negatives | Optional. Forces all non-positive genes to be labeled as negative examples during prediction. Otherwise, negative examples must be explicitly listed in the query file. |
--method weighting-method | Optional. The weighting method to use when combining the individual networks. Defaults to automatic. |
--seed number | Optional. The seed for the pseudo-random number generator used to shuffle each gene set during validation. Setting the seed to a constant value makes the validation results reproducible. If omitted, a pseudo-random seed is used. |
--threads number | Optional. The maximum number of parallel predictions. Ideally this should be set to the number of processing cores. Defaults to 1. |
--verbose | Optional. Makes CrossValidator print more details about what's happening. |
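The --folds, --min, --max, and --seed options can be illustrated with a sketch of a generic k-fold procedure. Gene sets outside the size limits are dropped. Each remaining set is shuffled with a seeded generator and split into folds. Each fold is then held out once while the remaining genes act as the query. This illustration rests on assumptions (round-robin fold assignment, Python's random module) and is not the tool's implementation. It also stops short of running predictions.

```python
import random
from typing import Dict, Iterator, List, Optional, Sequence, Tuple


def filter_queries(gene_sets: Dict[str, Sequence[str]],
                   min_genes: int = 10, max_genes: int = 300) -> Dict[str, Sequence[str]]:
    """Drop gene sets with fewer than min_genes or more than max_genes positives."""
    return {name: genes for name, genes in gene_sets.items()
            if min_genes <= len(genes) <= max_genes}


def k_fold_splits(genes: Sequence[str], folds: int = 5,
                  seed: Optional[int] = None) -> List[List[str]]:
    """Shuffle the positive genes and deal them round-robin into `folds` groups.

    A fixed seed makes the shuffle (and hence the folds) reproducible;
    seed=None seeds the generator from system entropy.
    """
    if folds < 2:
        raise ValueError("need at least 2 folds")
    if len(genes) < folds:
        raise ValueError("need at least one gene per fold")
    shuffled = list(genes)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::folds] for i in range(folds)]


def train_test_pairs(splits: List[List[str]]) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (query genes, held-out genes); each fold is held out exactly once."""
    for i, held_out in enumerate(splits):
        query = [g for j, fold in enumerate(splits) if j != i for g in fold]
        yield query, held_out


if __name__ == "__main__":
    toy_sets = {"toy_set": [f"gene{i}" for i in range(12)]}  # hypothetical gene set
    for name, genes in filter_queries(toy_sets, 10, 300).items():
        for query, held_out in train_test_pairs(k_fold_splits(genes, seed=42)):
            print(name, len(query), "query genes,", len(held_out), "held out")
```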
Multiple gene sets may be used during cross validation. Each gene set should be on its own line using the format below:
...where GENE_SET_ID is the name of your gene set, gene_symbol is a positive gene example, and neg_gene_symbol is a negative gene example (i.e. definitely not a member of the gene set).
If --use-go-cache is also specified, the query file should contain one GO id per line (e.g. GO:0005786).
If your query file lists only positive examples, use the option --auto-negatives to automatically label all other genes as negative examples for each gene set.
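A query file for --use-go-cache is simply one GO id per line, so a few lines of Python can write it. This sketch checks each entry against the standard GO id pattern (GO: followed by seven digits) before writing anything. write_go_query_file is a hypothetical helper.

```python
import re
from typing import Iterable

# GO identifiers are "GO:" followed by exactly seven digits.
GO_ID = re.compile(r"GO:[0-9]{7}")


def write_go_query_file(path: str, go_ids: Iterable[str]) -> int:
    """Write one GO id per line (the --use-go-cache query format); return the count."""
    cleaned = [go_id.strip() for go_id in go_ids]
    bad = [go_id for go_id in cleaned if not GO_ID.fullmatch(go_id)]
    if bad:
        raise ValueError(f"not GO ids: {bad}")
    with open(path, "w", encoding="utf-8") as out:
        out.writelines(go_id + "\n" for go_id in cleaned)
    return len(cleaned)


if __name__ == "__main__":
    write_go_query_file("go_query.txt", ["GO:0005786"])
```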
Assesses the value of a set of networks by performing k-fold cross validation with a baseline network set and with the networks to assess. The percentage error of each validation measure is computed for each query in the validation set and reported.
Name | Description |
---|---|
--data directory | Path to a GeneMANIA data set (e.g. /Users/username/genemania_plugin/gmdata-2010-07-28). |
--organism name | The name or taxonomy id of an organism whose genes should be considered. |
--query file-name | Perform validation against the gene sets listed in the given file. The file must use the same format as the Cross Validator query file. |
--baseline network-list | A comma-separated list of network types and/or network names to use as a baseline for comparison. To get a full listing of network names, use the option --list-networks with Query Runner. |
--exclude-baseline network-list | Optional. A comma-separated list of network types and/or network names to exclude from the --baseline list. |
--networks network-list | A comma-separated list of network types and/or network names representing the networks to assess. To get a full listing of network names, use the option --list-networks with Query Runner. |
--exclude-networks network-list | Optional. A comma-separated list of network types and/or network names to exclude from the --networks list. |
--folds number | Optional. The number of folds to use during cross validation. Defaults to 5. |
--min number | Optional. The minimum number of positive genes for a query. Queries with fewer positive genes are skipped. Defaults to 10. |
--max number | Optional. The maximum number of positive genes for a query. Queries with more positive genes are skipped. Defaults to 300. |
--use-go-cache | Optional. Perform validation against bundled Gene Ontology gene sets. In this case, the query file should contain one GO id per line (e.g. GO:0005786). These gene sets have been pre-filtered so that the smallest has 10 genes and the largest has 300. |
--outfile file-name | Optional. The file where the validation results should be saved. If not provided, the results are sent to standard output (usually the console). |
--auto-negatives | Optional. Forces all non-positive genes to be labeled as negative examples during prediction. Otherwise, negative examples must be explicitly listed in the query file. |
--method weighting-method | Optional. The weighting method to use when combining the individual networks. Defaults to automatic. |
--seed number | Optional. The seed for the pseudo-random number generator used to shuffle each gene set during validation. Setting the seed to a constant value makes the validation results reproducible. If omitted, a pseudo-random seed is used. |
--threads number | Optional. The maximum number of parallel predictions. Ideally this should be set to the number of processing cores. Defaults to 1. |
--verbose | Optional. Makes NetworkAssessor print more details about what's happening. |
Network Assessor uses the same query file format as Cross Validator.
Imports network/profile data from a file into a GeneMANIA data set.
Name | Description |
---|---|
--data directory | Path to a GeneMANIA data set (e.g. /Users/username/genemania_plugin/gmdata-2010-07-28). |
--organism name | The name or taxonomy id of an organism whose genes should be considered. |
--filename path | Path to a file containing either interaction or profile data. Supported types of data include: |
--name network-name | The name of the new network. |
--description description | Optional. A description of the new network. |
--group network-type | Optional. The network group to which the new network will be added. If this group does not exist, it will be created. Defaults to other. |
--group-description description | Optional. A short description for a network group being created. Only applicable when the group specified by --group does not already exist. |
--color RRGGBB | Optional. The colour of the network group being created. Only applicable when the group specified by --group does not already exist. Defaults to 000000 (i.e. black). |
--verbose | Optional. Makes NetworkImporter print more details about what's happening. |
Produces sets of genes based on Gene Ontology (GO) annotations for use in cross validation. One gene set is created for each GO category in the ontology. Annotations are propagated up the ontology, so a gene annotated with a specific GO category is also included in the gene set of every parent (ancestor) category.
Name | Description |
---|---|
--data directory | Path to a GeneMANIA data set (e.g. /Users/username/genemania_plugin/gmdata-2010-07-28). |
--organism name | The name or taxonomy id of an organism whose genes should be considered. |
--query filename | The file where the resulting validation set should be saved. |
--db JDBC-connection-string | Optional. A JDBC connection string for a GO MySQL database. No other database backends are currently supported. Defaults to EBI's MySQL instance (i.e. jdbc:mysql://mysql.ebi.ac.uk:4085/go_latest?user=go_select&password=amigo). |
--branch GO-branch | Optional. One of bp, mf, cc, or all, which selects GO categories from the biological process, molecular function, cellular component, or all branches, respectively. Defaults to all. |
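The upward propagation of annotations can be sketched as follows. The parents mapping (each GO category to its direct parent categories) and the toy category names in the test are assumptions for illustration. The tool obtains its GO data from the database given by --db and may differ in details, such as which relationship types it follows.

```python
from collections import defaultdict
from typing import Dict, Set


def all_ancestors(term: str, parents: Dict[str, Set[str]],
                  cache: Dict[str, Set[str]]) -> Set[str]:
    """Collect every ancestor of `term` by walking parent links (memoized)."""
    if term in cache:
        return cache[term]
    found: Set[str] = set()
    stack = list(parents.get(term, ()))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(parents.get(parent, ()))
    cache[term] = found
    return found


def build_gene_sets(annotations: Dict[str, Set[str]],
                    parents: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each GO category to its genes, adding every gene to all ancestor categories.

    annotations: gene -> GO categories it is directly annotated with
    parents: GO category -> its direct parent categories
    """
    gene_sets: Dict[str, Set[str]] = defaultdict(set)
    cache: Dict[str, Set[str]] = {}
    for gene, terms in annotations.items():
        for term in terms:
            gene_sets[term].add(gene)
            for ancestor in all_ancestors(term, parents, cache):
                gene_sets[ancestor].add(gene)
    return dict(gene_sets)
```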
Name | Taxonomy Id |
---|---|
A. thaliana | 3702 |
C. elegans | 6239 |
D. melanogaster | 7227 |
H. sapiens | 9606 |
M. musculus | 10090 |
S. cerevisiae | 4932 |
Networks may be specified by type or by name. To get a full listing of network names, use the option --list-networks with Query Runner.
Type | Description |
---|---|
coexp | Co-expression |
coloc | Co-localization |
gi | Genetic interactions |
path | Pathway interactions |
pi | Physical interactions |
predict | Predicted |
spd | Shared protein domains |
other | Networks that don't belong to any of the above types. |
all | Shorthand for specifying all available networks. |
preferred | Shorthand for coexp,pi,gi. Typically used for cross validation. |
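The expansion of a network list can be sketched in Python, assuming a catalogue that maps each network name to its type code (for example, assembled by hand from what --list-networks reports). Some behaviour here is assumed rather than documented: a token matching a network name takes precedence over a type code, and duplicates are dropped. The network names in the test are hypothetical.

```python
from typing import Dict, List

TYPES = {"coexp", "coloc", "gi", "path", "pi", "predict", "spd", "other"}
PREFERRED = {"coexp", "pi", "gi"}


def _expand(spec: str, catalogue: Dict[str, str]) -> List[str]:
    """Turn a comma-separated list of names, types, 'all' or 'preferred' into names."""
    names: List[str] = []
    for token in (t.strip() for t in spec.split(",")):
        if not token:
            continue
        if token == "all":
            names.extend(catalogue)
        elif token == "preferred":
            names.extend(n for n, t in catalogue.items() if t in PREFERRED)
        elif token in catalogue:      # an individual network name
            names.append(token)
        elif token in TYPES:          # every network of that type
            names.extend(n for n, t in catalogue.items() if t == token)
        else:
            raise ValueError(f"unknown network or type: {token!r}")
    return names


def resolve_networks(networks: str, catalogue: Dict[str, str],
                     exclude: str = "") -> List[str]:
    """Apply a --networks list, then remove an --exclude-networks list."""
    excluded = set(_expand(exclude, catalogue))
    result: List[str] = []
    for name in _expand(networks, catalogue):
        if name not in excluded and name not in result:
            result.append(name)  # keep first-seen order, no duplicates
    return result
```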
Method | Description |
---|---|
automatic | Default. The networks are weighted such that the query genes interact as much as possible. Note: This option corresponds to the query gene-based combining method on the web site. If you want the same behaviour as the website's automatic combining method, then omit any combining method options. |
average | All networks are weighted equally. |
average_category | Networks are weighted such that each type of network has the same overall weight. |
For Organisms With GO Annotations: | |
bp | Networks are weighted in an attempt to reproduce Gene Ontology Biological Process co-annotation patterns. |
mf | Networks are weighted in an attempt to reproduce Gene Ontology Molecular Function co-annotation patterns. |
cc | Networks are weighted in an attempt to reproduce Gene Ontology Cellular Component co-annotation patterns. |
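The two fixed weighting schemes, average and average_category, are simple enough to state as code. This sketch follows the descriptions in the table. Normalizing the weights to sum to 1 is a choice made here for readability, and the tool's internal scaling may differ.

```python
from collections import Counter
from typing import Dict


def average_weights(network_types: Dict[str, str]) -> Dict[str, float]:
    """'average': every network receives the same weight."""
    if not network_types:
        return {}
    share = 1.0 / len(network_types)
    return {name: share for name in network_types}


def average_category_weights(network_types: Dict[str, str]) -> Dict[str, float]:
    """'average_category': each network type gets the same total weight,
    split equally among the networks of that type."""
    if not network_types:
        return {}
    per_type = Counter(network_types.values())  # type -> number of networks
    return {name: 1.0 / (len(per_type) * per_type[t])
            for name, t in network_types.items()}
```

With two co-expression networks and one physical interaction network, average gives each network 1/3. By contrast, average_category gives each co-expression network 1/4 and the physical interaction network 1/2.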