[{"data":1,"prerenderedAt":1002},["ShallowReactive",2],{"content-query-Lz7IV4SBah":3},{"_path":4,"_dir":5,"_draft":6,"_partial":6,"_locale":7,"title":8,"description":9,"date":10,"draft":6,"tags":11,"thumbnail":15,"alt_description":16,"slug":17,"body":18,"_type":996,"_id":997,"_source":998,"_file":999,"_stem":1000,"_extension":1001},"/posts/a-guide-to-plink-data-in-sql","posts",false,"","A Guide to Manage Bioinformatics Data in SQL Database","How to work with genotype data in databases","2025-01-04T00:00:00.000Z",[12,13,14],"sql","plink","bioinformatics","/img/a_guide_to_plink_data_in_sql.png","Getting better with databases as bioinformaticians","a-guide-to-plink-data-in-sql",{"type":19,"children":20,"toc":969},"root",[21,44,49,54,59,79,84,91,98,103,109,114,120,125,131,136,142,147,160,165,174,185,191,196,205,214,223,232,250,255,264,273,282,291,296,305,314,323,332,341,348,353,362,371,377,382,391,398,404,409,418,427,436,445,454,461,466,472,481,486,495,500,509,516,522,527,536,541,550,559,566,572,577,586,591,597,602,611,620,629,636,642,647,656,665,674,681,686,692,708,720,727,736,741,748,753,760,765,770,780,789,798,805,810,819,827,836,841,848,854,859,868,875,880,886,891,900,909,915,920,925,943,948,953,964],{"type":22,"tag":23,"props":24,"children":25},"element","blockquote",{},[26,33,38],{"type":22,"tag":27,"props":28,"children":29},"p",{},[30],{"type":31,"value":32},"text","Dedicated to Tamerlan.",{"type":22,"tag":27,"props":34,"children":35},{},[36],{"type":31,"value":37},"The world belongs to those who believe in the beauty of their dreams",{"type":22,"tag":27,"props":39,"children":41},{"align":40},"right",[42],{"type":31,"value":43},"-- Not Random Indonesian Girl",{"type":22,"tag":27,"props":45,"children":46},{},[47],{"type":31,"value":48},"Many bioinformaticians excel at processing genetic data but have limited exposure to modern database practices. 
This tutorial aims to help laboratory specialists enhance their data management skills by building a practical SQLite database for PLINK genotype data.",{"type":22,"tag":27,"props":50,"children":51},{},[52],{"type":31,"value":53},"PLINK data, widely used in genetic analysis for applications like disease risk assessment and pharmacogenomics, typically exists in text-based formats. We'll demonstrate how to transform this data into a queryable SQL database using Python, following current best practices. This approach will introduce bioinformatics professionals to essential database skills while working with familiar genetic data.",{"type":22,"tag":27,"props":55,"children":56},{},[57],{"type":31,"value":58},"Our step-by-step guide will cover:",{"type":22,"tag":60,"props":61,"children":62},"ul",{},[63,69,74],{"type":22,"tag":64,"props":65,"children":66},"li",{},[67],{"type":31,"value":68},"Setting up a Python project for database operations",{"type":22,"tag":64,"props":70,"children":71},{},[72],{"type":31,"value":73},"Converting PLINK text files to SQLite format",{"type":22,"tag":64,"props":75,"children":76},{},[77],{"type":31,"value":78},"Accessing the database through DBeaver",{"type":22,"tag":27,"props":80,"children":81},{},[82],{"type":31,"value":83},"This tutorial is designed for bioinformaticians and other Data Clerks looking to expand their technical toolkit without disrupting their current workflow.",{"type":22,"tag":85,"props":86,"children":88},"h2",{"id":87},"python-project-components",[89],{"type":31,"value":90},"Python Project Components",{"type":22,"tag":92,"props":93,"children":95},"h3",{"id":94},"fastapi",[96],{"type":31,"value":97},"FastAPI",{"type":22,"tag":27,"props":99,"children":100},{},[101],{"type":31,"value":102},"Think of our web application as a receptionist: whenever someone requests data, FastAPI handles the request quickly (hence the name). It makes it easy to create APIs, which are the way different programs talk to each other. 
In our example, when we want to store PLINK data in a database, FastAPI handles that request and sends back the result.",{"type":22,"tag":92,"props":104,"children":106},{"id":105},"sqlmodel",[107],{"type":31,"value":108},"SQLModel",{"type":22,"tag":27,"props":110,"children":111},{},[112],{"type":31,"value":113},"Think of it as a translator between your Python code and your database. It helps you work with your database and define a precise structure for your PLINK data. Some experienced data specialists may see it as a friendlier alternative to using SQLAlchemy directly, since SQLModel is built on top of it.",{"type":22,"tag":92,"props":115,"children":117},{"id":116},"uv",[118],{"type":31,"value":119},"UV",{"type":22,"tag":27,"props":121,"children":122},{},[123],{"type":31,"value":124},"And last but not least, UV is a Python package manager written in Rust that makes it easy to start a project quickly and cleanly. It can be considered an alternative to pip. It initializes a Git repository, creates a virtual environment, keeps track of your project dependencies and much more.",{"type":22,"tag":85,"props":126,"children":128},{"id":127},"set-up",[129],{"type":31,"value":130},"Set up",{"type":22,"tag":27,"props":132,"children":133},{},[134],{"type":31,"value":135},"First we need to open a terminal, install our components and set up the project. Let's do this by typing the following commands:",{"type":22,"tag":92,"props":137,"children":139},{"id":138},"install-uv",[140],{"type":31,"value":141},"Install UV",{"type":22,"tag":27,"props":143,"children":144},{},[145],{"type":31,"value":146},"If using Linux / Windows",{"type":22,"tag":148,"props":149,"children":154},"pre",{"className":150,"code":152,"language":153,"meta":7},[151],"language-bash","pip install uv\n","bash",[155],{"type":22,"tag":156,"props":157,"children":158},"code",{"__ignoreMap":7},[159],{"type":31,"value":152},{"type":22,"tag":27,"props":161,"children":162},{},[163],{"type":31,"value":164},"or using 
Mac",{"type":22,"tag":148,"props":166,"children":169},{"className":167,"code":168,"language":153,"meta":7},[151],"brew install uv\n",[170],{"type":22,"tag":156,"props":171,"children":172},{"__ignoreMap":7},[173],{"type":31,"value":168},{"type":22,"tag":27,"props":175,"children":176},{},[177,183],{"type":22,"tag":178,"props":179,"children":182},"img",{"alt":180,"src":181},"terminal_installation","/img/plink/plink_1.png",[],{"type":31,"value":184},"\nIn my case it is already installed, so nothing really happens here after the prompt.",{"type":22,"tag":92,"props":186,"children":188},{"id":187},"create-project",[189],{"type":31,"value":190},"Create Project",{"type":22,"tag":27,"props":192,"children":193},{},[194],{"type":31,"value":195},"Now let's initialize the project with UV",{"type":22,"tag":148,"props":197,"children":200},{"className":198,"code":199,"language":153,"meta":7},[151],"uv init plink_data\n",[201],{"type":22,"tag":156,"props":202,"children":203},{"__ignoreMap":7},[204],{"type":31,"value":199},{"type":22,"tag":27,"props":206,"children":207},{},[208,212],{"type":22,"tag":178,"props":209,"children":211},{"alt":180,"src":210},"/img/plink/plink_2.png",[],{"type":31,"value":213},"\nChange directory to the new project via \"cd plink_data\" and type \"ls\" to see the files inside the project.",{"type":22,"tag":148,"props":215,"children":218},{"className":216,"code":217,"language":153,"meta":7},[151],"cd plink_data\nls\n",[219],{"type":22,"tag":156,"props":220,"children":221},{"__ignoreMap":7},[222],{"type":31,"value":217},{"type":22,"tag":27,"props":224,"children":225},{},[226,230],{"type":22,"tag":178,"props":227,"children":229},{"alt":180,"src":228},"/img/plink/plink_3.png",[],{"type":31,"value":231},"\nOnce we switch to the plink_data project, we can see three basic files 
here",{"type":22,"tag":60,"props":233,"children":234},{},[235,240,245],{"type":22,"tag":64,"props":236,"children":237},{},[238],{"type":31,"value":239},"hello.py",{"type":22,"tag":64,"props":241,"children":242},{},[243],{"type":31,"value":244},"pyproject.toml",{"type":22,"tag":64,"props":246,"children":247},{},[248],{"type":31,"value":249},"README.md",{"type":22,"tag":27,"props":251,"children":252},{},[253],{"type":31,"value":254},"UV has also initialized a Git repository for us. Let's explore it first",{"type":22,"tag":148,"props":256,"children":259},{"className":257,"code":258,"language":153,"meta":7},[151],"git status\n",[260],{"type":22,"tag":156,"props":261,"children":262},{"__ignoreMap":7},[263],{"type":31,"value":258},{"type":22,"tag":27,"props":265,"children":266},{},[267,271],{"type":22,"tag":178,"props":268,"children":270},{"alt":180,"src":269},"/img/plink/plink_4.png",[],{"type":31,"value":272},"\nGit says we are on the master branch with no commits and a couple of untracked files. If you don't know what Git is, don't worry and let's carry on with our project. Let's kick it off",{"type":22,"tag":148,"props":274,"children":277},{"className":275,"code":276,"language":153,"meta":7},[151],"uv run hello.py\n",[278],{"type":22,"tag":156,"props":279,"children":280},{"__ignoreMap":7},[281],{"type":31,"value":276},{"type":22,"tag":27,"props":283,"children":284},{},[285,289],{"type":22,"tag":178,"props":286,"children":288},{"alt":180,"src":287},"/img/plink/plink_5.png",[],{"type":31,"value":290},"\nWe just ran our project with CPython, created a virtual environment and received greetings from the plink-data project. 
Good job so far!",{"type":22,"tag":27,"props":292,"children":293},{},[294],{"type":31,"value":295},"Now let's add our project components by running the following command",{"type":22,"tag":148,"props":297,"children":300},{"className":298,"code":299,"language":153,"meta":7},[151],"uv add fastapi sqlmodel python-multipart uvicorn\n",[301],{"type":22,"tag":156,"props":302,"children":303},{"__ignoreMap":7},[304],{"type":31,"value":299},{"type":22,"tag":27,"props":306,"children":307},{},[308,312],{"type":22,"tag":178,"props":309,"children":311},{"alt":180,"src":310},"/img/plink/plink_6.png",[],{"type":31,"value":313},"\nAll components are installed and we can synchronize them",{"type":22,"tag":148,"props":315,"children":318},{"className":316,"code":317,"language":153,"meta":7},[151],"uv sync\n",[319],{"type":22,"tag":156,"props":320,"children":321},{"__ignoreMap":7},[322],{"type":31,"value":317},{"type":22,"tag":27,"props":324,"children":325},{},[326,330],{"type":22,"tag":178,"props":327,"children":329},{"alt":180,"src":328},"/img/plink/plink_7.png",[],{"type":31,"value":331},"\nWe can also see the project's dependency structure",{"type":22,"tag":148,"props":333,"children":336},{"className":334,"code":335,"language":153,"meta":7},[151],"uv tree\n",[337],{"type":22,"tag":156,"props":338,"children":339},{"__ignoreMap":7},[340],{"type":31,"value":335},{"type":22,"tag":27,"props":342,"children":343},{},[344],{"type":22,"tag":178,"props":345,"children":347},{"alt":180,"src":346},"/img/plink/plink_8.png",[],{"type":22,"tag":27,"props":349,"children":350},{},[351],{"type":31,"value":352},"Here is our plink-data project and its components: fastapi depends on pydantic and starlette, sqlmodel depends on sqlalchemy, and so on. Now let's activate our Python virtual environment",{"type":22,"tag":148,"props":354,"children":357},{"className":355,"code":356,"language":153,"meta":7},[151],". 
.venv/bin/activate\n",[358],{"type":22,"tag":156,"props":359,"children":360},{"__ignoreMap":7},[361],{"type":31,"value":356},{"type":22,"tag":27,"props":363,"children":364},{},[365,369],{"type":22,"tag":178,"props":366,"children":368},{"alt":180,"src":367},"/img/plink/plink_9.png",[],{"type":31,"value":370},"\nBy following these steps we managed to set up our project in a couple of minutes, without spending time on creating a Git repository, creating a virtual environment and declaring our dependencies. UV did it for us, and it's badass. Now let's write some source code",{"type":22,"tag":85,"props":372,"children":374},{"id":373},"src",[375],{"type":31,"value":376},"SRC",{"type":22,"tag":27,"props":378,"children":379},{},[380],{"type":31,"value":381},"Let's create a source directory where the main Python code will live",{"type":22,"tag":148,"props":383,"children":386},{"className":384,"code":385,"language":153,"meta":7},[151],"mkdir src\ncd src\n",[387],{"type":22,"tag":156,"props":388,"children":389},{"__ignoreMap":7},[390],{"type":31,"value":385},{"type":22,"tag":27,"props":392,"children":393},{},[394],{"type":22,"tag":178,"props":395,"children":397},{"alt":180,"src":396},"/img/plink/plink_10.png",[],{"type":22,"tag":92,"props":399,"children":401},{"id":400},"database",[402],{"type":31,"value":403},"Database",{"type":22,"tag":27,"props":405,"children":406},{},[407],{"type":31,"value":408},"Here we need to define the database structure",{"type":22,"tag":148,"props":410,"children":413},{"className":411,"code":412,"language":153,"meta":7},[151],"nano database.py\n",[414],{"type":22,"tag":156,"props":415,"children":416},{"__ignoreMap":7},[417],{"type":31,"value":412},{"type":22,"tag":27,"props":419,"children":420},{},[421,425],{"type":22,"tag":178,"props":422,"children":424},{"alt":180,"src":423},"/img/plink/plink_11.png",[],{"type":31,"value":426},"\nHere we need to write the 
following",{"type":22,"tag":148,"props":428,"children":431},{"className":429,"code":430,"language":153,"meta":7},[151],"from sqlmodel import SQLModel, create_engine\n\n# SQLite file, created in the directory the app is started from\nDATABASE_URL = \"sqlite:///genotypes.db\"\nengine = create_engine(DATABASE_URL)\n\ndef create_db_and_tables():\n    # Create a table for every SQLModel class declared with table=True\n    SQLModel.metadata.create_all(engine)\n",[432],{"type":22,"tag":156,"props":433,"children":434},{"__ignoreMap":7},[435],{"type":31,"value":430},{"type":22,"tag":27,"props":437,"children":438},{},[439,443],{"type":22,"tag":178,"props":440,"children":442},{"alt":180,"src":441},"/img/plink/plink_12.png",[],{"type":31,"value":444},"\nThen press Ctrl + X, then \"Y\" and \"ENTER\" to save the content",{"type":22,"tag":148,"props":446,"children":449},{"className":447,"code":448,"language":153,"meta":7},[151],"cat database.py\n",[450],{"type":22,"tag":156,"props":451,"children":452},{"__ignoreMap":7},[453],{"type":31,"value":448},{"type":22,"tag":27,"props":455,"children":456},{},[457],{"type":22,"tag":178,"props":458,"children":460},{"alt":180,"src":459},"/img/plink/plink_13.png",[],{"type":22,"tag":27,"props":462,"children":463},{},[464],{"type":31,"value":465},"I actually use bat, which is an additional tool that has to be installed first. cat gives you the same result, just without syntax highlighting.",{"type":22,"tag":92,"props":467,"children":469},{"id":468},"models",[470],{"type":31,"value":471},"Models",{"type":22,"tag":148,"props":473,"children":476},{"className":474,"code":475,"language":153,"meta":7},[151],"nano models.py\n",[477],{"type":22,"tag":156,"props":478,"children":479},{"__ignoreMap":7},[480],{"type":31,"value":475},{"type":22,"tag":27,"props":482,"children":483},{},[484],{"type":31,"value":485},"The following code creates a GenotypeData class, i.e. the PLINK data structure",{"type":22,"tag":148,"props":487,"children":490},{"className":488,"code":489,"language":153,"meta":7},[151],"from datetime import datetime\nfrom typing import Optional\n\nfrom 
sqlmodel import Field, SQLModel\n\nclass GenotypeData(SQLModel, table=True):\n    id: Optional[int] = Field(default=None, primary_key=True)\n    family_id: str = Field(index=True)\n    individual_id: str = Field(index=True)\n    paternal_id: str\n    maternal_id: str\n    sex: int\n    phenotype: int\n    snp1: str\n    snp2: str\n    snp3: str\n    snp4: str\n    snp5: str\n    uploaded_at: datetime = Field(default_factory=datetime.utcnow)\n",[491],{"type":22,"tag":156,"props":492,"children":493},{"__ignoreMap":7},[494],{"type":31,"value":489},{"type":22,"tag":27,"props":496,"children":497},{},[498],{"type":31,"value":499},"Save it with Ctrl + X, press \"Y\" and \"ENTER\", and check the content",{"type":22,"tag":148,"props":501,"children":504},{"className":502,"code":503,"language":153,"meta":7},[151],"cat models.py\n",[505],{"type":22,"tag":156,"props":506,"children":507},{"__ignoreMap":7},[508],{"type":31,"value":503},{"type":22,"tag":27,"props":510,"children":511},{},[512],{"type":22,"tag":178,"props":513,"children":515},{"alt":180,"src":514},"/img/plink/plink_14.png",[],{"type":22,"tag":92,"props":517,"children":519},{"id":518},"main",[520],{"type":31,"value":521},"Main",{"type":22,"tag":27,"props":523,"children":524},{},[525],{"type":31,"value":526},"Create the main Python file",{"type":22,"tag":148,"props":528,"children":531},{"className":529,"code":530,"language":153,"meta":7},[151],"nano main.py\n",[532],{"type":22,"tag":156,"props":533,"children":534},{"__ignoreMap":7},[535],{"type":31,"value":530},{"type":22,"tag":27,"props":537,"children":538},{},[539],{"type":31,"value":540},"Paste the following code",{"type":22,"tag":148,"props":542,"children":545},{"className":543,"code":544,"language":153,"meta":7},[151],
"from fastapi import FastAPI, UploadFile\nfrom sqlmodel import Session\n\nfrom .database import create_db_and_tables, engine\nfrom .models import GenotypeData\n\n\napp = FastAPI()\n\n\n@app.on_event(\"startup\")\ndef on_startup():\n    create_db_and_tables()\n\n\n@app.post(\"/upload/\")\nasync def upload_file(file: UploadFile):\n    content = (await file.read()).decode()\n    with Session(engine) as session:\n        for line in content.splitlines():\n            fields = line.strip().split()\n            if not fields:  # Skip empty lines\n                continue\n\n            # First six columns are sample metadata; the rest are allele pairs\n            genotype_data = GenotypeData(\n                family_id=fields[0],\n                individual_id=fields[1],\n                paternal_id=fields[2],\n                maternal_id=fields[3],\n                sex=int(fields[4]),\n                phenotype=int(fields[5]),\n                snp1=f\"{fields[6]} {fields[7]}\",\n                snp2=f\"{fields[8]} {fields[9]}\",\n                snp3=f\"{fields[10]} {fields[11]}\",\n                snp4=f\"{fields[12]} {fields[13]}\",\n                snp5=f\"{fields[14]} {fields[15]}\",\n            )\n            session.add(genotype_data)\n        # Commit all rows in a single transaction\n        session.commit()\n\n    return {\"message\": f\"Data from {file.filename} uploaded successfully\"}\n",[546],{"type":22,"tag":156,"props":547,"children":548},{"__ignoreMap":7},[549],{"type":31,"value":544},{"type":22,"tag":148,"props":551,"children":554},{"className":552,"code":553,"language":153,"meta":7},[151],"cat main.py\n",[555],{"type":22,"tag":156,"props":556,"children":557},{"__ignoreMap":7},[558],{"type":31,"value":553},{"type":22,"tag":27,"props":560,"children":561},{},[562],{"type":22,"tag":178,"props":563,"children":565},{"alt":180,"src":564},"/img/plink/plink_15.png",[],{"type":22,"tag":92,"props":567,"children":569},{"id":568},"init",[570],{"type":31,"value":571},"Init",{"type":22,"tag":27,"props":573,"children":574},{},[575],{"type":31,"value":576},"We also need a simple init file so that Python treats the whole src directory as a package",{"type":22,"tag":148,"props":578,"children":581},{"className":579,"code":580,"language":153,"meta":7},[151],"touch __init__.py\n",[582],{"type":22,"tag":156,"props":583,"children":584},{"__ignoreMap":7},[585],{"type":31,"value":580},{"type":22,"tag":27,"props":587,"children":588},{},[589],{"type":31,"value":590},"And that's 
it.",{"type":22,"tag":92,"props":592,"children":594},{"id":593},"create-a-sample-data-or-use-your-own",[595],{"type":31,"value":596},"Create sample data or use your own",{"type":22,"tag":27,"props":598,"children":599},{},[600],{"type":31,"value":601},"I will create a sample file to ingest. If you have your own PLINK data, feel free to put your samples into the same folder we are working in",{"type":22,"tag":148,"props":603,"children":606},{"className":604,"code":605,"language":153,"meta":7},[151],"nano sample.txt\n",[607],{"type":22,"tag":156,"props":608,"children":609},{"__ignoreMap":7},[610],{"type":31,"value":605},{"type":22,"tag":148,"props":612,"children":615},{"className":613,"code":614,"language":153,"meta":7},[151],"FAM1    IND1    0    0    1    2    A A    G G    A C    T T    A G\nFAM1    IND2    0    0    2    2    A G    G T    C C    T T    G G\nFAM2    IND3    0    0    1    1    G G    T T    C C    A T    G G\nFAM2    IND4    0    0    2    1    A G    G T    0 0    T T    A G\nFAM3    IND5    0    0    1    2    A A    G G    C C    T T    G G\n",[616],{"type":22,"tag":156,"props":617,"children":618},{"__ignoreMap":7},[619],{"type":31,"value":614},{"type":22,"tag":148,"props":621,"children":624},{"className":622,"code":623,"language":153,"meta":7},[151],"cat sample.txt\n",[625],{"type":22,"tag":156,"props":626,"children":627},{"__ignoreMap":7},[628],{"type":31,"value":623},{"type":22,"tag":27,"props":630,"children":631},{},[632],{"type":22,"tag":178,"props":633,"children":635},{"alt":180,"src":634},"/img/plink/plink_16.png",[],{"type":22,"tag":92,"props":637,"children":639},{"id":638},"upload-sample",[640],{"type":31,"value":641},"Upload sample",{"type":22,"tag":27,"props":643,"children":644},{},[645],{"type":31,"value":646},"First we need to launch our application with the uvicorn command. Because the module path is src.main, run it from the project root (plink_data), one level above src",{"type":22,"tag":148,"props":648,"children":651},{"className":649,"code":650,"language":153,"meta":7},[151],"uvicorn src.main:app 
--reload\n",[652],{"type":22,"tag":156,"props":653,"children":654},{"__ignoreMap":7},[655],{"type":31,"value":650},{"type":22,"tag":27,"props":657,"children":658},{},[659,663],{"type":22,"tag":178,"props":660,"children":662},{"alt":180,"src":661},"/img/plink/plink_17.png",[],{"type":31,"value":664},"\nCool! The app is live and running. The catch is that we have to keep this terminal running and open another terminal to ingest the file.\nIn the new terminal, from the folder that contains sample.txt, run the following command:",{"type":22,"tag":148,"props":666,"children":669},{"className":667,"code":668,"language":153,"meta":7},[151],"curl -X POST -F \"file=@sample.txt\" http://localhost:8000/upload/\n",[670],{"type":22,"tag":156,"props":671,"children":672},{"__ignoreMap":7},[673],{"type":31,"value":668},{"type":22,"tag":27,"props":675,"children":676},{},[677],{"type":22,"tag":178,"props":678,"children":680},{"alt":180,"src":679},"/img/plink/plink_18.png",[],{"type":22,"tag":27,"props":682,"children":683},{},[684],{"type":31,"value":685},"Congrats! Your data has been ingested.",{"type":22,"tag":85,"props":687,"children":689},{"id":688},"read-data-using-sql",[690],{"type":31,"value":691},"Read data using SQL",{"type":22,"tag":27,"props":693,"children":694},{},[695,697,706],{"type":31,"value":696},"First you need a program that lets you access your database with SQL. My go-to tool is DBeaver, but you can use any other program, such as DataGrip. I have it installed; if you don't, go to the official website to ",{"type":22,"tag":698,"props":699,"children":703},"a",{"href":700,"rel":701},"https://dbeaver.io/download/",[702],"nofollow",[704],{"type":31,"value":705},"download",{"type":31,"value":707}," and install it. 
The Community version is free.",{"type":22,"tag":27,"props":709,"children":710},{},[711,713,718],{"type":31,"value":712},"This is what the interface looks like. Click on the socket icon with the + sign to add the database\n",{"type":22,"tag":178,"props":714,"children":717},{"alt":715,"src":716},"dbeaver_interface","/img/plink/plink_19.png",[],{"type":31,"value":719},"\nChoose SQLite and press Next",{"type":22,"tag":27,"props":721,"children":722},{},[723],{"type":22,"tag":178,"props":724,"children":726},{"alt":715,"src":725},"/img/plink/plink_20.png",[],{"type":22,"tag":27,"props":728,"children":729},{},[730,732],{"type":31,"value":731},"Press Open\n",{"type":22,"tag":178,"props":733,"children":735},{"alt":715,"src":734},"/img/plink/plink_21.png",[],{"type":22,"tag":27,"props":737,"children":738},{},[739],{"type":31,"value":740},"Then choose the genotypes.db file, press Open and then Finish",{"type":22,"tag":27,"props":742,"children":743},{},[744],{"type":22,"tag":178,"props":745,"children":747},{"alt":715,"src":746},"/img/plink/plink_22.png",[],{"type":22,"tag":27,"props":749,"children":750},{},[751],{"type":31,"value":752},"Look at the toolbar, where the genotypes.db connection is selected instead of N/A. 
You have to explicitly choose it.",{"type":22,"tag":27,"props":754,"children":755},{},[756],{"type":22,"tag":178,"props":757,"children":759},{"alt":715,"src":758},"/img/plink/plink_23.png",[],{"type":22,"tag":92,"props":761,"children":762},{"id":12},[763],{"type":31,"value":764},"SQL",{"type":22,"tag":27,"props":766,"children":767},{},[768],{"type":31,"value":769},"Now we can run some basic SELECT statements like so",{"type":22,"tag":148,"props":771,"children":775},{"className":772,"code":774,"language":12,"meta":7},[773],"language-sql","SELECT * FROM genotypedata;\n",[776],{"type":22,"tag":156,"props":777,"children":778},{"__ignoreMap":7},[779],{"type":31,"value":774},{"type":22,"tag":27,"props":781,"children":782},{},[783,787],{"type":22,"tag":178,"props":784,"children":786},{"alt":715,"src":785},"/img/plink/plink_24.png",[],{"type":31,"value":788},"\nNow that we have all the data at hand, let's explore some DML (Data Manipulation Language) functionality. For example, we might want to see how many individuals are in each family",{"type":22,"tag":148,"props":790,"children":793},{"className":791,"code":792,"language":12,"meta":7},[773],"SELECT\n family_id,\n COUNT(*) as individual_count\nFROM genotypedata\nGROUP BY family_id;\n",[794],{"type":22,"tag":156,"props":795,"children":796},{"__ignoreMap":7},[797],{"type":31,"value":792},{"type":22,"tag":27,"props":799,"children":800},{},[801],{"type":22,"tag":178,"props":802,"children":804},{"alt":715,"src":803},"/img/plink/plink_25.png",[],{"type":22,"tag":27,"props":806,"children":807},{},[808],{"type":31,"value":809},"Or let's say we want to see only females with phenotype 2",{"type":22,"tag":148,"props":811,"children":814},{"className":812,"code":813,"language":12,"meta":7},[773],"SELECT *\nFROM genotypedata\nWHERE sex = 2\nAND phenotype = 
2;\n",[815],{"type":22,"tag":156,"props":816,"children":817},{"__ignoreMap":7},[818],{"type":31,"value":813},{"type":22,"tag":27,"props":820,"children":821},{},[822],{"type":22,"tag":178,"props":823,"children":826},{"alt":824,"src":825},"dbeaver interface","/img/plink/plink_26.png",[],{"type":22,"tag":148,"props":828,"children":831},{"className":829,"code":830,"language":12,"meta":7},[773],"SELECT\n family_id,\n COUNT(*) as total_records,\n SUM(CASE WHEN sex = 1 THEN 1 ELSE 0 END) as male_count,\n SUM(CASE WHEN sex = 2 THEN 1 ELSE 0 END) as female_count\nFROM genotypedata\nGROUP BY family_id;\n",[832],{"type":22,"tag":156,"props":833,"children":834},{"__ignoreMap":7},[835],{"type":31,"value":830},{"type":22,"tag":27,"props":837,"children":838},{},[839],{"type":31,"value":840},"This gives the total records per family, split by sex",{"type":22,"tag":27,"props":842,"children":843},{},[844],{"type":22,"tag":178,"props":845,"children":847},{"alt":715,"src":846},"/img/plink/plink_27.png",[],{"type":22,"tag":92,"props":849,"children":851},{"id":850},"advanced-sql",[852],{"type":31,"value":853},"Advanced SQL",{"type":22,"tag":27,"props":855,"children":856},{},[857],{"type":31,"value":858},"Let's say we want to see the genotype distribution by phenotype, to analyze relationships between genotypes and phenotypes",{"type":22,"tag":148,"props":860,"children":863},{"className":861,"code":862,"language":12,"meta":7},[773],"SELECT\n phenotype,\n snp1,\n COUNT(*) as count,\n ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (PARTITION BY phenotype), 2) as percentage\nFROM genotypedata\nGROUP BY phenotype, snp1\nORDER BY phenotype, count DESC;\n",[864],{"type":22,"tag":156,"props":865,"children":866},{"__ignoreMap":7},[867],{"type":31,"value":862},{"type":22,"tag":27,"props":869,"children":870},{},[871],{"type":22,"tag":178,"props":872,"children":874},{"alt":715,"src":873},"/img/plink/plink_28.png",[],{"type":22,"tag":27,"props":876,"children":877},{},[878],{"type":31,"value":879},"We can see that for phenotype 1 the snp1 genotypes are evenly split (50 / 50), but not for phenotype 2, where the split is 67 / 33",
{"type":22,"tag":92,"props":881,"children":883},{"id":882},"hardy-weinberg-equilibrium-hwe-check",[884],{"type":31,"value":885},"Hardy-Weinberg Equilibrium (HWE) Check",{"type":22,"tag":27,"props":887,"children":888},{},[889],{"type":31,"value":890},"The HWE check is based on a fundamental principle: in a stable population, genotype frequencies should follow a predictable pattern (p², 2pq and q² for allele frequencies p and q) unless something is interfering. The query compares the observed snp1 genotype counts with the counts expected under HWE.",{"type":22,"tag":148,"props":892,"children":895},{"className":893,"code":894,"language":12,"meta":7},[773],"-- p = (2*AA + AG) / (2*total) is the frequency of allele A, q = (2*GG + AG) / (2*total) of allele G\nWITH allele_counts AS (\n SELECT\n  COUNT(*) as total,\n  SUM(CASE WHEN snp1 LIKE 'A A' THEN 1 ELSE 0 END) as AA,\n  SUM(CASE WHEN snp1 LIKE 'A G' OR snp1 LIKE 'G A' THEN 1 ELSE 0 END) as AG,\n  SUM(CASE WHEN snp1 LIKE 'G G' THEN 1 ELSE 0 END) as GG\n FROM genotypedata\n)\nSELECT\n AA as observed_AA,\n AG as observed_AG,\n GG as observed_GG,\n ROUND(POWER((2*AA + AG)/(2.0*total), 2) * total, 2) as expected_AA,\n ROUND(2 * ((2*AA + AG)/(2.0*total)) * ((2*GG + AG)/(2.0*total)) * total, 2) as expected_AG,\n ROUND(POWER((2*GG + AG)/(2.0*total), 2) * total, 2) as expected_GG\nFROM allele_counts;\n",[896],{"type":22,"tag":156,"props":897,"children":898},{"__ignoreMap":7},[899],{"type":31,"value":894},{"type":22,"tag":27,"props":901,"children":902},{},[903,907],{"type":22,"tag":178,"props":904,"children":906},{"alt":824,"src":905},"/img/plink/plink_29.png",[],{"type":31,"value":908},"\nThe differences between observed and expected counts aren't large. With only five individuals this is just an illustration, but on real data noticeable deviations warrant attention in quality control processes.",{"type":22,"tag":85,"props":910,"children":912},{"id":911},"wrapping-up-from-lab-benches-to-database-queries",[913],{"type":31,"value":914},"Wrapping Up: From Lab Benches to Database Queries 🧬",{"type":22,"tag":27,"props":916,"children":917},{},[918],{"type":31,"value":919},"We've come a long way from those text-based PLINK files to 
a fully functional SQL database. Pretty cool transformation, right?",{"type":22,"tag":27,"props":921,"children":922},{},[923],{"type":31,"value":924},"Here's what you've accomplished:",{"type":22,"tag":60,"props":926,"children":927},{},[928,933,938],{"type":22,"tag":64,"props":929,"children":930},{},[931],{"type":31,"value":932},"Set up a modern Python project faster than you can say \"nucleotide sequencing\"",{"type":22,"tag":64,"props":934,"children":935},{},[936],{"type":31,"value":937},"Transformed genetic data into queryable gold using SQLite",{"type":22,"tag":64,"props":939,"children":940},{},[941],{"type":31,"value":942},"Learned to use SQL queries (and even tackled Hardy-Weinberg equilibrium!)",{"type":22,"tag":27,"props":944,"children":945},{},[946],{"type":31,"value":947},"The best part? This is just the beginning. With your genetic data now living in a proper database, you've opened up a whole new world of possibilities for analysis and collaboration.",{"type":22,"tag":27,"props":949,"children":950},{},[951],{"type":31,"value":952},"Keep experimenting, keep querying, and most importantly - keep pushing the boundaries of what's possible with your data!",{"type":22,"tag":27,"props":954,"children":955},{},[956,958,962],{"type":31,"value":957},"Yours,",{"type":22,"tag":959,"props":960,"children":961},"br",{},[],{"type":31,"value":963},"\nBad Dog",{"type":22,"tag":27,"props":965,"children":966},{},[967],{"type":31,"value":968},"P.S. Remember: Every great bioinformatician started somewhere. Today, that somewhere was turning PLINK files into SQL magic! 
🪄",{"title":7,"searchDepth":970,"depth":970,"links":971},2,[972,978,982,990,995],{"id":87,"depth":970,"text":90,"children":973},[974,976,977],{"id":94,"depth":975,"text":97},3,{"id":105,"depth":975,"text":108},{"id":116,"depth":975,"text":119},{"id":127,"depth":970,"text":130,"children":979},[980,981],{"id":138,"depth":975,"text":141},{"id":187,"depth":975,"text":190},{"id":373,"depth":970,"text":376,"children":983},[984,985,986,987,988,989],{"id":400,"depth":975,"text":403},{"id":468,"depth":975,"text":471},{"id":518,"depth":975,"text":521},{"id":568,"depth":975,"text":571},{"id":593,"depth":975,"text":596},{"id":638,"depth":975,"text":641},{"id":688,"depth":970,"text":691,"children":991},[992,993,994],{"id":12,"depth":975,"text":764},{"id":850,"depth":975,"text":853},{"id":882,"depth":975,"text":885},{"id":911,"depth":970,"text":914},"markdown","content:posts:a-guide-to-plink-data-in-sql.md","content","posts/a-guide-to-plink-data-in-sql.md","posts/a-guide-to-plink-data-in-sql","md",1775831731495]