
Genome accessibility/callability #1219

Open
mufernando opened this issue May 22, 2024 · 1 comment


mufernando commented May 22, 2024

It is important to consider genome accessibility when computing rates from genomic data.

scikit-allel has options to include an "accessibility mask", a boolean array indicating whether each base is accessible, which can be used to properly normalize quantities.
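
To illustrate the normalization idea, here is a minimal sketch in plain NumPy. It is not scikit-allel's implementation; the function name `per_base_rate` and the 0-based, half-open coordinates are assumptions made for the example.

```python
import numpy as np

def per_base_rate(count, is_accessible, start, stop):
    """Divide a raw count by the number of accessible bases in [start, stop).

    is_accessible: boolean array with one entry per reference position
    (0-based indexing is an assumption of this sketch).
    """
    n_accessible = int(np.count_nonzero(is_accessible[start:stop]))
    if n_accessible == 0:
        # No accessible bases, so the rate is undefined for this region.
        return float("nan")
    return count / n_accessible

# Toy mask: 100 positions, of which 50 are accessible.
is_accessible = np.zeros(100, dtype=bool)
is_accessible[10:60] = True

# 5 events over 50 accessible bases, rather than over all 100 bases.
rate = per_base_rate(5, is_accessible, 0, 100)
```

Dividing by the full region length (100) instead would halve the estimate here, which is the bias the mask is meant to avoid.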

I found mentions of implementing this in #341

I am happy to help make this happen, but since I am new to the codebase I'd need some hand-holding... Ideally we would need a way of reading BED files which can be attached to the genotype dataset. Then, when computing per-base statistics, we would need to intersect the accessible intervals with the window intervals to get the right denominator.
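
A rough sketch of the two steps described above (read BED intervals, then intersect them with windows) in pure Python. The helper names `read_bed`, `merge` and `accessible_bases` are hypothetical and not part of any existing API; the code assumes standard BED coordinates (0-based, half-open) and windows expressed the same way.

```python
import io

def read_bed(handle):
    """Parse BED lines into {chrom: [(start, end), ...]}."""
    intervals = {}
    for line in handle:
        line = line.strip()
        # Skip blank lines and BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals.setdefault(chrom, []).append((int(start), int(end)))
    return intervals

def merge(intervals):
    """Merge overlapping or touching intervals so no base is counted twice."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def accessible_bases(windows, accessible):
    """Count accessible bases in each window (the per-window denominator)."""
    acc = merge(accessible)
    counts = []
    for w_start, w_end in windows:
        total = 0
        for a_start, a_end in acc:
            overlap = min(w_end, a_end) - max(w_start, a_start)
            if overlap > 0:
                total += overlap
        counts.append(total)
    return counts

# Toy BED content with two overlapping intervals and one separate interval.
bed = io.StringIO("chr1\t10\t60\nchr1\t50\t120\nchr1\t150\t160\n")
accessible = read_bed(bed)["chr1"]
windows = [(0, 100), (100, 200)]
denominators = accessible_bases(windows, accessible)  # [90, 30]
```

The nested loop is quadratic in the worst case; for genome-scale data a sorted sweep or a cumulative-sum over a boolean mask would likely be a better fit, but the denominators it produces would be the same.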

@jeromekelleher
Collaborator

Sounds like adding a bed2zarr command to vcf2zarr would be a great starting point - fancy taking it on???
