Patent Issued for Methods to compress, encrypt and retrieve genomic alignment data (USPTO 11393559): Sophia Genetics S.A. - Insurance News | InsuranceNewsNet

August 4, 2022 Newswires

Patent Issued for Methods to compress, encrypt and retrieve genomic alignment data (USPTO 11393559): Sophia Genetics S.A.

Insurance Daily News

2022 AUG 04 (NewsRx) -- By a News Reporter-Staff News Editor at Insurance Daily News -- A patent by the inventors Ayday, Erman (Renens, CH), Garcia, Jesus (Saint Sulpice, CH), Huang, Zhicong (Saint Sulpice, CH), Hubaux, Jean-Pierre (Saint-Sulpice, CH), Lin, Huang (Saint Sulpice, CH), Molyneaux, Adam (Blonay, CH), filed on March 8, 2017, was published online on July 19, 2022, according to news reporting originating from Alexandria, Virginia, by NewsRx correspondents.

Patent number 11393559 is assigned to Sophia Genetics S.A. (Saint Sulpice, Switzerland).

The following quote was obtained by the news editors from the background information supplied by the inventors: “Next-Generation Sequencing Data Processing

“Next-generation sequencing (NGS) or massively parallel sequencing (MPS) technologies have significantly decreased the cost of DNA sequencing in the past decade. NGS has broad application in biology and has dramatically changed research and diagnosis methodologies. Advances in high-throughput sequencing technologies are spurring the production of a huge amount of genomic data. For example, the 1000 Genomes Project generated more data in its first six months than the NCBI Genbank database had accumulated in 21 years of existence. Since 2007, when the first high-throughput sequencing technology was released to the market, the growth rate of genomic data has outpaced Moore’s law, more than doubling each year (http://www.genome.gov/sequencingcosts/). For example, the HiSeq X Ten System, released by Illumina in 2014, can deliver over 18,000 human genomes per year, at the price of $1000 per genome. Big data researchers estimate the current worldwide sequencing capacity to exceed 35 petabases per year. Furthermore, it is currently estimated that for every 3 billion bases of human genome sequence, 30-fold more data (about 100 gigabases) must be collected because of errors in sequencing and alignment. Even now, more than 100 petabytes of storage are already used by the 20 largest institutions; this corresponds to more than 1 million dollars of storage maintenance cost if we consider the Amazon cloud storage pricing (https://aws.amazon.com/s3/pricing/). This number continues to grow and 2-40 exabytes of storage capacity will be needed by 2025 for the human genomes. Hundreds of thousands of human genomes will be sequenced in the coming years, which necessitates more efficient compression approaches to genomic data storage.

“Moreover, next generation sequencing data are more and more used as a tool in medical practice such as routine diagnosis, where security and privacy come as a major concern. The main threats to genomic data are (i) the disclosure of an individual’s genetic characteristics due to the leakage of his/her genomic data and (ii) the identification of an individual from his/her own genome sequence. For example, as part of a clinical trial, the genetic information of a patient, once leaked, could be linked to the disease under study (or to other diseases), which can have serious consequences such as denial of access to life insurance or to employment for the individual participant. There is therefore a need for more secure genomic data management methods that address the privacy threat models that are specific to the genomic data processing systems and workflows.

“Next Generation Sequencing Data Formats and Workflows

“Next generation sequencers typically output a series of short reads, sequences of a few hundred nucleotides with associated quality score estimates, in data files such as FASTQ files. This raw sequencing data is further analyzed in the bioinformatics pipeline by aligning the raw short reads to a reference genome, and identifying the specific variants as the differences relative to the reference genome.

“In general, geneticists prefer storing aligned, raw genomic data of the patients, in addition to their variant calls (which include each nucleotide on the DNA sequence once, and hence are much more compact). Sequence alignment/map files such as the human-readable SAM files and their more compact, machine-readable binary version, BAM files, are the de facto standards used for DNA alignment data produced by next-generation DNA sequencers (http://samtools.github.io/hts-specs/SAMv1.pdf). There are hundreds of millions of short sequencing reads (each including between 100 and 400 nucleotides) in the SAM file of a patient. Each nucleotide is present in several short reads in order to have statistically high coverage of each patient’s DNA.
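As an editorial illustration (not part of the patent text), the sketch below splits one SAM alignment line into the eleven mandatory tab-separated fields defined by the SAM specification. The record type name, the function name, and the example read are assumptions made for this sketch; optional tag fields are ignored.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """The eleven mandatory SAM fields, in specification order."""
    qname: str   # read name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description
    rnext: str   # reference name of the mate/next read
    pnext: int   # position of the mate/next read
    tlen: int    # observed template length
    seq: str     # read bases
    qual: str    # per-base quality string


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line; optional TAG:TYPE:VALUE fields are dropped."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]),
                     f[5], f[6], int(f[7]), int(f[8]), f[9], f[10])


# Made-up example read, eight bases long, aligned at position 100 of chr1.
example = "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII"
rec = parse_sam_line(example)
```

Each read carries its own position and quality string, which is why a file holding hundreds of millions of overlapping reads repeats much of the same positional information.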

“Genomic Data Compression

“There are different approaches to dealing with the compression of genomic data. Before high-throughput technologies were introduced, there existed algorithms designed for compressing genomic sequences of relatively small size (e.g., tens of megabases), for instance BioCompress (in Grumbach, S. & Tahi, F. Compression of DNA sequences, in Data Compression Conference, 1993. DCC ’93. 340-350), GenCompress (in Chen, X., Kwong, S. & Li, M. A Compression algorithm for DNA sequences and its applications in genome comparison. in Proceedings of the Fourth Annual International Conference on Computational Molecular Biology 107 (ACM, 2000)), and DNACompress (in Chen, X., Li, M., Ma, B. & Tromp, J. DNACompress: Fast and effective DNA sequence compression. Bioinformatics 18, 1696-1698 (2002)). These compression algorithms exploit the redundancy within DNA sequences and compress the data by identifying highly repetitive subsequences. The next generation sequencing technologies however pose new challenges for the compression of genomic data in terms of data size and structure. Due to the high similarity of DNA sequences among individuals, it is inefficient to store and transfer a newly assembled genomic sequence in its entirety, because more than 99% of the data for two assembled human genomes are redundant. In Christley, S., Lu, Y., Li, C. & Xie, X. Human genomes as email attachments. Bioinformatics 25, 274-275 (2009), Christley et al. proposed DNAzip, a reference-based compression algorithm where only differences to a reference sequence are stored. In next generation sequencing, individual sequenced data are typically organized as millions of short reads that represent short sequences, each of which comprises between 100 and 400 bases (nucleotides). Each genomic position is usually covered by multiple short reads (coverage). Li et al. (Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078-2079 (2009)) proposed to apply a general-purpose compression algorithm, such as BGZF (Blocked GNU Zip Format, http://samtools.github.io/hts-specs/SAMv1.pdf), to these datasets as the basis for the BAM format, the binary version of the SAM format, which is still the de facto standard for storing aligned short reads.
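To make the reference-based idea concrete, here is a minimal editorial sketch, not DNAzip itself and not the patented method. It stores a sample only as the list of positions where it differs from a reference. It assumes the two sequences have equal length and differ by substitutions only; insertions and deletions are deliberately left out. The function names and the toy sequences are made up.

```python
def diff_against_reference(reference: str, sample: str) -> list:
    """Return (index, base) pairs where the sample differs from the reference."""
    if len(reference) != len(sample):
        raise ValueError("this sketch handles substitutions only")
    return [(i, b) for i, (a, b) in enumerate(zip(reference, sample)) if a != b]


def apply_diff(reference: str, diffs: list) -> str:
    """Rebuild the sample from the reference plus the stored differences."""
    seq = list(reference)
    for i, base in diffs:
        seq[i] = base
    return "".join(seq)


reference = "ACGTACGTACGTACGT"
sample = "ACGTACCTACGTATGT"

diffs = diff_against_reference(reference, sample)   # two substitutions
restored = apply_diff(reference, diffs)
```

With more than 99% of two human genomes being identical, the list of differences is far shorter than the sequence it replaces, as long as both sides hold the same reference.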

“More recently, various advanced compression algorithms have been proposed to further improve the compression of high-throughput DNA sequence data, such as Quip (Jones, D. C., Ruzzo, W. L., Peng, X. & Katze, M. G. Compression of next-generation sequencing reads aided by highly efficient de novo assembly. Nucl. Acids Res. gks754 (2012)), Samcomp (Bonfield, J. K. & Mahoney, M. V. Compression of FASTQ and SAM Format Sequencing Data. PLoS ONE 8, e59190 (2013)), HUGO (Li, P. et al. HUGO: Hierarchical Multi-reference Genome Compression for aligned reads. Journal of the American Medical Informatics Association 21, 363-373 (2014)), and CRAM, a reference-based compression algorithm for aligned data (Fritz, M. H.-Y., Leinonen, R., Cochrane, G. & Birney, E. Efficient storage of high throughput DNA sequencing data using reference-based compression. Genome Res. 21, 734-740 (2011)). CRAM is used for instance by the 1000 Genomes Project (http://www.1000genomes.org/). Most of these algorithms use conventional entropy coding techniques, such as Huffman variable-length encoding, Golomb, or arithmetic coding, to compress the metadata text strings (e.g., read name, position, mapping quality, etc.). Recently, Massie et al. (Massie, M. et al. Adam: Genomics formats and processing patterns for cloud scale computing. EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2013-207 (2013)) proposed ADAM, a cloud-computing framework for genomic data compression, which combines various data compression engineering techniques such as dictionary coding and gzip compression with distributed processing to reduce storage costs by 25% compared to the BAM de facto standard. The ADAM scheme also achieves a significant (2-10x) speedup in decompression performance for genomics data access patterns.
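As a rough editorial illustration of the variable-length codes mentioned above, the sketch below implements Rice coding, the special case of Golomb coding whose divisor is a power of two (2^k). It is not taken from any of the cited tools. The function names, the bit-string representation, and the sample values are assumptions chosen for readability rather than speed.

```python
def rice_encode(n: int, k: int) -> str:
    """Encode a non-negative integer as unary quotient + '0' + k-bit remainder."""
    if n < 0 or k < 0:
        raise ValueError("n and k must be non-negative")
    q, r = n >> k, n & ((1 << k) - 1)
    remainder = format(r, f"0{k}b") if k else ""
    return "1" * q + "0" + remainder


def rice_decode_all(bits: str, k: int) -> list:
    """Decode a concatenation of Rice codewords back into integers."""
    values, i = [], 0
    while i < len(bits):
        q = 0
        while bits[i] == "1":      # unary part: count leading ones
            q += 1
            i += 1
        i += 1                     # skip the terminating zero
        r = int(bits[i:i + k], 2) if k else 0
        i += k
        values.append((q << k) | r)
    return values


# Hypothetical small values, e.g. gaps between consecutive positions.
deltas = [3, 0, 7, 12, 1]
encoded = "".join(rice_encode(d, 2) for d in deltas)
decoded = rice_decode_all(encoded, 2)
```

Small values get short codewords, which suits metadata such as position gaps that are mostly small with occasional larger jumps.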

“Genomic Data Security

“Some genomic data encryption solutions have been proposed on top of some compression algorithms, such as for instance the encryption option in cramtools for the CRAM genomic data compression format (http://www.ebi.ac.uk/ena/software/cram-toolkit), but they remain straightforward applications of encryption standards and do not take into consideration the specific genomic data storage and genomic data processing threat models even if the solution uses highly secure encryption primitives (e.g., the AES encryption method). In particular, the data retrieval process may cause incidental leakage of sensitive genomic information. Once leaked, genomic information could be abused in various ways, such as for denial of employment and health insurance, blackmail or even genetic discrimination. Establishing a secure and privacy-preserving solution for genomic data storage is therefore needed in order to facilitate the trusted usage, storage and transmission of genomic data.”

There is additional summary information. Please visit the full patent to read further.

In addition to the background information obtained for this patent, NewsRx journalists also obtained the inventors’ summary information for this patent: “Some embodiments of the present disclosure are directed to methods to encode genomic data alignment information organized as a read-based alignment information data stream, comprising the steps of: transposing, with a processor, the read-based alignment information data stream into a position-based alignment information data stream; encoding, with a processor, the position-based alignment information data stream into a reference-based compressed position data stream; and encrypting, with a processor, the reference-based compressed position data stream into a compressed encrypted alignment data stream.
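The transposition step can be pictured with a small editorial sketch. This is an illustration under simplifying assumptions, not the patented encoding: each read is a made-up tuple of (start position, bases, qualities), every base aligns to consecutive reference positions with no gaps, and the output is a dictionary keyed by position.

```python
from collections import defaultdict


def transpose_reads(reads: list) -> dict:
    """Turn read-based records into position-based columns.

    Each column lists (read_id, base, quality) for every read covering
    that reference position.
    """
    columns = defaultdict(list)
    for read_id, (start, bases, quals) in enumerate(reads):
        for offset, (base, qual) in enumerate(zip(bases, quals)):
            columns[start + offset].append((read_id, base, qual))
    return dict(sorted(columns.items()))


# Two overlapping made-up reads; positions 102 and 103 are covered twice.
reads = [(100, "ACGT", "IIII"), (102, "GTTA", "HHHH")]
columns = transpose_reads(reads)
```

Once the data is organized by position, everything known about one genomic coordinate sits in one place, which is what makes per-position encoding and range retrieval natural.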

“In some embodiments, encoding the position-based alignment information data stream into a reference-based compressed position data stream may comprise a step of differential encoding. In a possible embodiment, differential encoding may comprise recording, for each position in the reference-based compressed position data stream, the alignment differences relative to the alignment reference sequence. In a possible embodiment, encoding the position-based alignment information data stream into a reference-based compressed position data file may comprise a step of entropy coding.

“In some embodiments, encrypting the reference-based compression position data stream into a compressed encrypted alignment data stream may comprise a step of encrypting the position information with an order-preserving encryption scheme. In a possible embodiment, encrypting the reference-based compression position data stream into a compressed encrypted alignment data stream may comprise a step of encrypting the position-based alignment information with a symmetric encryption scheme. The symmetric encryption scheme may be a stream cipher, such as the AES scheme in CTR mode.
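The two-layer idea (order-preserving encryption for positions, symmetric encryption for the content at each position) can be sketched as follows. This is an editorial toy, not the patented scheme and not secure: the position mapping only demonstrates the ordering property and is trivially invertible, and the stream cipher is an HMAC-SHA256 counter-mode keystream standing in for AES in CTR mode, because the Python standard library has no AES. All names, keys, and row contents are hypothetical.

```python
import hashlib
import hmac


def toy_order_preserving(key: bytes, position: int, gap: int = 1 << 16) -> int:
    """Strictly increasing keyed mapping (toy only, offers no real secrecy)."""
    tag = hmac.new(key, position.to_bytes(8, "big"), hashlib.sha256).digest()
    return position * gap + int.from_bytes(tag[:4], "big") % gap


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with a counter-mode keystream; the same call decrypts."""
    out = bytearray()
    for start in range(0, len(data), 32):
        counter = (start // 32).to_bytes(8, "big")
        pad = hmac.new(key, nonce + counter, hashlib.sha256).digest()
        out.extend(b ^ p for b, p in zip(data[start:start + 32], pad))
    return bytes(out)


ope_key, row_key = b"demo-ope-key", b"demo-row-key"
rows = {100: b"A>G q=40", 250: b"ref", 251: b"C>T q=37"}  # made-up row contents

# Each row is encrypted independently, using its position as the nonce.
encrypted = {
    toy_order_preserving(ope_key, pos): keystream_xor(row_key, pos.to_bytes(8, "big"), blob)
    for pos, blob in rows.items()
}
```

Because the position layer keeps its order, the stored rows can be sorted and searched without decrypting their contents, and each row can be decrypted on its own.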

“Some embodiments of the present disclosure are directed to methods to retrieve genomic data alignment information from a compressed encrypted alignment data stream, recorded on a storage unit, comprising the steps of: receiving a genomic alignment range query [Pos1, Pos2] from a genomic data analysis system; retrieving from the storage unit, with a processor, the subset of the compressed encrypted alignment data stream corresponding to the genomic alignment range [Pos1, Pos2] in the compressed encrypted alignment data stream; decrypting, with a processor, the compressed encrypted alignment data stream into a reference-based compressed position data stream corresponding to the genomic alignment range [Pos1, Pos2]; and decoding, with a processor, the reference-based compressed position data stream into a position-based alignment information data stream corresponding to the genomic alignment range [Pos1, Pos2].
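A range query of this kind can be sketched with a binary search over sorted index keys. The sketch is an editorial illustration with loud assumptions: `placeholder_ope` is a hypothetical stand-in for an order-preserving encryption function (any strictly increasing keyed mapping would behave the same way here), and the blobs are opaque byte strings standing in for encrypted rows.

```python
from bisect import bisect_left, bisect_right


def placeholder_ope(position: int) -> int:
    """Hypothetical stand-in for order-preserving encryption (not secure)."""
    return 7 * position + 3


def range_query(index_keys: list, blobs: list, pos1: int, pos2: int) -> list:
    """Return the stored blobs whose positions fall within [pos1, pos2].

    Only the query bounds are mapped; stored rows are never decrypted here.
    """
    lo = bisect_left(index_keys, placeholder_ope(pos1))
    hi = bisect_right(index_keys, placeholder_ope(pos2))
    return blobs[lo:hi]


positions = [100, 101, 105, 230, 231, 400]
index_keys = [placeholder_ope(p) for p in positions]   # sorted, as positions are
blobs = [f"row@{p}".encode() for p in positions]       # stand-ins for ciphertext

hits = range_query(index_keys, blobs, 101, 231)
```

Only the rows returned by the query then need to be decrypted and decoded, which is the point of retrieving a subset rather than the whole stream.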

“In a possible embodiment, retrieving genomic data alignment information from a compressed encrypted alignment data stream, recorded on a storage unit, may further comprise a step of reverse transposing, with a processor, the position-based alignment information data stream into a read-based alignment information data.”
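Reverse transposition can be illustrated with a short editorial sketch that rebuilds reads from position-based columns. It is not the patented procedure. It assumes each column entry is a made-up (read_id, base, quality) tuple and that every read covers consecutive positions without gaps.

```python
def reverse_transpose(columns: dict) -> list:
    """Rebuild (start, bases, qualities) reads from position-based columns."""
    reads = {}
    for position in sorted(columns):
        for read_id, base, qual in columns[position]:
            # The first position at which a read appears is its start.
            start, bases, quals = reads.get(read_id, (position, "", ""))
            reads[read_id] = (start, bases + base, quals + qual)
    return [reads[read_id] for read_id in sorted(reads)]


# Made-up columns: read 0 covers 100-102, read 1 covers 102-103.
columns = {
    100: [(0, "A", "I")],
    101: [(0, "C", "I")],
    102: [(0, "G", "I"), (1, "G", "H")],
    103: [(1, "T", "H")],
}
reads = reverse_transpose(columns)
```

The output has the same read-based shape that downstream tools expecting SAM-like records would consume.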

The claims supplied by the inventors are:

“1. A method to encode genomic data alignment information organized as a read-based alignment information data stream, comprising: Transposing, with a processor, the read-based alignment information data stream into a position-based alignment information data stream, wherein a character is a start marker for each short read in the position-based alignment information data stream, the start marker followed by metadata information regarding at least a nucleotide base identified at a position with an associated quality score; Encoding, with a processor, the position-based alignment information data stream into a reference-based compressed position data stream; and Encrypting, with a processor, the reference-based compressed position data stream into a compressed encrypted alignment data stream, including independently encrypting variant information for each row of a data structure in a storage that stores the compressed encrypted alignment data stream, providing privacy control of specific compressed encrypted alignment data within the stored compressed encrypted alignment data stream, the encrypting the reference-based compressed position data stream into a compressed encrypted alignment data stream comprising, first, an order-preserving encryption scheme, and second, encrypting sensitive information at each position, wherein the method results in increased storage efficiency or faster genomic data queries.

“2. The method of claim 1, wherein encoding the position-based alignment information data stream into a reference-based compressed position data stream comprises differential encoding.

“3. The method of claim 2, wherein differential encoding comprises recording, for each position in the reference-based compressed position data stream, the alignment differences relative to the alignment reference sequence, and wherein only the differences for each position with respect to the reference-based compressed position data stream are recorded.

“4. The method of claim 1, wherein encoding the position-based alignment information data stream into a reference-based compressed position data file further comprises entropy coding.

“5. The method of claim 1, wherein the order preserving encryption scheme is configured to retrieve resulting encrypted data for each row of the data structure without decrypting a whole block data.

“6. The method of claim 1, wherein encrypting the reference-based compression position data stream into a compressed encrypted alignment data stream comprises encrypting the position-based alignment information with a symmetric encryption scheme.

“7. The method of claim 6, wherein the symmetric encryption scheme is a stream cipher.

“8. The method of claim 7, wherein the symmetric encryption scheme is a block cipher operating in a stream cipher mode.

“9. A method to retrieve genomic data alignment information from a compressed encrypted alignment data stream, recorded on a storage, comprising: Receiving a genomic alignment range query [Pos1, Pos2] from a genomic data analysis system; Retrieving from the storage, with a processor, the subset of the compressed encrypted alignment data stream corresponding to the genomic alignment range [Pos1, Pos2] in the compressed encrypted alignment data stream; Decrypting, with a processor, the compressed encrypted alignment data stream into a reference-based compressed position data stream corresponding to the genomic alignment range [Pos1, Pos2], including independently decrypting variant information for each row of a data structure in the storage that stores the compressed encrypted alignment data stream, providing privacy control of specific compressed encrypted alignment data within the stored compressed encrypted alignment data stream; and Decoding, with a processor, the reference-based compressed position data stream into a position-based alignment information data stream corresponding to the genomic alignment range [Pos1, Pos2], wherein the method results in increased storage efficiency or faster genomic data queries, and wherein decoding the reference-based compressed position data stream comprises retrieving a metadata information block and decoding the metadata information block in accordance with an encoding embodiment.

“10. The method of claim 9, further comprising: reverse transposing, with a processor, the position-based alignment information data stream into a read-based alignment information data, wherein a character is a start marker for each short read in the portion-based alignment information data stream, the start marker followed by metadata information regarding at least a nucleotide base identified at a position with an associated quality score.

“11. The method of claim 9, wherein retrieving the subset of the compressed encrypted alignment data stream for the genomic alignment range [Pos1, Pos2] comprises retrieving the symmetric encrypted data and the metadata stored in data blocks between the order-preserving encrypted position associated with Pos1 and the order-preserving encrypted position associated with Pos2.

“12. The method of claim 11, wherein decrypting the compressed encrypted alignment data stream into a reference-based compressed position data stream corresponding to the genomic alignment range [Pos1, Pos2] comprises symmetric decryption of the symmetric encrypted data between the order-preserving encrypted position associated with Pos1 and the order-preserving encrypted position associated with Pos2.

“13. The method of claim 12, wherein the symmetric decryption scheme is a stream decipher.

“14. The method of claim 12, wherein the symmetric decryption scheme is a block decipher operating in a stream decipher mode.

“15. The method of claim 9, wherein decoding the position-based alignment information data stream into reference-based compressed position data stream comprises entropy decoding.

“16. The method of claim 9, wherein decoding the position-based alignment information data stream into reference-based compressed position data stream comprises differential decoding.

“17. The method of claim 1, wherein encoding the position-based alignment information data stream into a reference-based compressed position data file further comprises text coding algorithms, and wherein the reference-based compressed position data file is a compact binary reference-based compressed position data file.

“18. The method of claim 1, wherein encoding the position-based alignment information data stream into a reference-based compressed position data file further comprises variable length coding, wherein the variable length coding is configured to compress differences found in reference-based compression, and wherein the variable length coding is configured to compress differences found in mapping quality scores.

“19. The method of claim 15, wherein the entropy decoding is VLC decoding.

“20. The method of claim 9, wherein the encoding embodiment is a gunzip reverse algorithm, and wherein decoding comprises concatenating the reference-based compressed position data stream, the position-based alignment information data stream, and the metadata information block to reconstruct the genomic data alignment information.”

For the URL and more information on this patent, see: Ayday, Erman. Methods to compress, encrypt and retrieve genomic alignment data. U.S. Patent Number 11393559, filed March 8, 2017, and published online on July 19, 2022. Patent URL: http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=11393559.PN.&OS=PN/11393559RS=PN/11393559

(Our reports deliver fact-based news of research and discoveries from around the world.)
