#]
#] *********************
#] "$d_Refs"'Mathematics/8_[NN, genetics] [nomenclature, math].txt'
# www.BillHowell.ca 10Feb2024 initial
# view in text editor, using constant-width font (eg courier), tabWidth = 3

#48************************************************48
#24************************24
# Table of Contents, generate with :
# $ grep "^#]" "$d_Refs"'Mathematics/8_[NN, genetics] [nomenclature, math].txt' | sed "s/^#\]/ /"
#
 *********************
 "$d_Refs"'Mathematics/8_[NN, genetics] [nomenclature, math].txt'
 ??Feb2024
 ??Feb2024
 ??Feb2024
 +-----+
 10Feb2024 IJCNN2024 review paper stuff
 Nomenclature
 Gene examples (oldies but goodies)
 global independent feature extractor
 KmerEmbedding:
 Deep Neural Nets (DNN)
 k-mer encoding
 one-hot extraction

#24************************24
# Setup, ToDos,

#08********08
#] ??Feb2024

#08********08
#] ??Feb2024

#08********08
#] ??Feb2024

#08********08
#] +-----+
#] 10Feb2024 IJCNN2024 review paper stuff
"$d_web"'Neural nets/Paper reviews/ [journal, conference] paper review- math only.txt'

#] Nomenclature
lncRNAs   long non-coding RNA
          endogenous single-stranded polynucleotides with a sequence length >=200 nucleotides that do not encode proteins
ncRNA     non-coding RNA
LPIGLAM   LPI prediction based on [global, local] features of lncRNA and protein
LPI       lncRNA-protein interactions

#] Gene examples (oldies but goodies)
p53 & H19 interplay has major roles in tumorigenesis and metastasis?
   H19   tumorigenesis, but also crucial to embryonic development
         one of the first discovered lncRNAs
   p53   tumor suppressor, represses the H19 gene
   are mutually counter-regulated :
      |-> p53 represses the H19 gene
      |-> H19-derived miR-675 inhibits p53 and p53-dependent protein expression
HOTAIR   HOX Transcript Antisense Intergenic RNA
   |-> PRC2 H3K27-methylation
   |-> LSD1 H3K4-demethylation [7]

#] global independent feature extractor
#] KmerEmbedding:
We use k-mer features to encode lncRNA and protein to capture the global characteristics of sequences.
The k-mer features transform variable-length sequences into fixed-length feature vectors. For lncRNA sequences we calculate the corresponding nucleotide frequencies (A, U, G, C) to fully extract features. Then, we take combinations of k = 1, 2, 3, and 4 to obtain a 340-dimensional feature vector. For proteins, based on dipole moment and side chain volume, we divide the 20 amino acids into 7 groups: {Ala, Gly, Val}, {Ile, Leu, Phe, Pro}, {Thr, Met, Tyr, Ser}, {His, Asn, Trp, Gln}, {Arg, Lys}, {Glu, Asp}, {Cys} [27]. We take combinations of k = 1, 2, and 3 to calculate the frequency of protein sequences, resulting in a 399-dimensional feature vector.

#] Deep Neural Nets (DNN)
As k-mer features already contain higher-level information, we utilize a simple deep neural network to extract features, employing LeakyReLU to prevent the vanishing gradient problem and dropout to address overfitting issues.
   (11) L_global = DNN_1(L_kmer)
   (12) P_global = DNN_2(P_kmer)
where DNN_1 and DNN_2 are deep neural networks constructed by stacking multiple layers in the arrangement of dropout layer, fully connected layer, and LeakyReLU activation function.

#08********08
#] k-mer encoding
10Feb2024 https://en.wikipedia.org/wiki/K-mer
In bioinformatics, k-mers are substrings of length k contained within a biological sequence. Primarily used within the context of computational genomics and sequence analysis, in which k-mers are composed of nucleotides (i.e. A, T, G, and C), k-mers are capitalized upon to assemble DNA sequences,[1] improve heterologous gene expression,[2][3] identify species in metagenomic samples,[4] and create attenuated vaccines.[5] Usually, the term k-mer refers to all of a sequence's subsequences of length k, such that the sequence AGAT would have four monomers (A, G, A, and T), three 2-mers (AG, GA, AT), two 3-mers (AGA and GAT) and one 4-mer (AGAT).
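
A minimal Python sketch of the k-mer counting described above, and of how the 340- and 399-dimensional vectors in the KmerEmbedding notes could arise: 4 + 16 + 64 + 256 = 340 for k = 1..4 over {A, U, G, C}, and 7 + 49 + 343 = 399 for k = 1..3 over the 7 amino-acid groups. This is not the paper's implementation. The function names, the T -> U conversion, the normalisation by the number of k-mers, and the ordering of the features are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

RNA_ALPHABET = "AUGC"

# The 7 amino-acid groups quoted in the notes, written with one-letter codes.
AA_GROUPS = ["AGV", "ILFP", "TMYS", "HNWQ", "RK", "ED", "C"]
# Map each amino acid to a single-character group label "0".."6".
AA_TO_GROUP = {aa: str(i) for i, grp in enumerate(AA_GROUPS) for aa in grp}
GROUP_ALPHABET = "0123456"


def kmers(seq, k):
    """Return all overlapping substrings of length k (there are L - k + 1)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_frequencies(seq, alphabet, ks):
    """Concatenate k-mer frequencies over every possible k-mer, for each k in ks.

    Assumption: counts are divided by the number of k-mers in the sequence.
    """
    features = []
    for k in ks:
        counts = Counter(kmers(seq, k))
        total = max(len(seq) - k + 1, 1)  # avoid dividing by zero on short input
        for combo in product(alphabet, repeat=k):
            features.append(counts["".join(combo)] / total)
    return features


def lncrna_features(seq):
    """Hypothetical helper: 4 + 16 + 64 + 256 = 340 features."""
    rna = seq.upper().replace("T", "U")  # assumption: accept DNA-style input
    return kmer_frequencies(rna, RNA_ALPHABET, (1, 2, 3, 4))


def protein_features(seq):
    """Hypothetical helper: 7 + 49 + 343 = 399 features.

    Raises KeyError on residues outside the 20 standard amino acids.
    """
    reduced = "".join(AA_TO_GROUP[aa] for aa in seq.upper())
    return kmer_frequencies(reduced, GROUP_ALPHABET, (1, 2, 3))


if __name__ == "__main__":
    print(kmers("AGAT", 2))                          # ['AG', 'GA', 'AT']
    print(len(lncrna_features("AUGGCUAGCUAGGA")))    # 340
    print(len(protein_features("MKVLAAGICR")))       # 399
```

The vector length depends only on the alphabet size and the choice of k, not on the sequence length, which is what makes the encoding fixed-length for variable-length sequences.
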
More generally, a sequence of length L will have L − k + 1 k-mers and n^k total possible k-mers, where n is the number of possible monomers (e.g. four in the case of DNA).

#08********08
#] one-hot extraction
10Feb2024 https://en.wikipedia.org/wiki/One-hot
In digital circuits and machine learning, a one-hot is a group of bits among which the legal combinations of values are only those with a single high (1) bit and all the others low (0).[1] A similar implementation in which all bits are '1' except one '0' is sometimes called one-cold.[2] In statistics, dummy variables represent a similar technique for representing categorical data.

# enddoc