2022-12-19

Postgresql Indexes

Indexes

cause, without index every query will be a sequential scan from disk
whole point of index is to make query faster than O(n)
so why not create an index for each type of query, cause indexes makes updating slow as we have to update indexes
it also increases disk usage, slows down backup and restore

uses B+ Tree for creating indexes
default in postgresql
stores pointer to heap file and not actual data
better to store actual data for primary index as reading from heap file is another disk i/o
used for query with operations like <, <= , = ,>=,>
can be used for range queries, first go to the start of range in O(log(n)) and then traverse to next node
can be used for patterns like ^foo or foo% (constant prefix) but not with %foo as BTree are sorted based on first char
not suitable for large values of key as btree copies the key value in nodes -> less keys per node -> tree with more depth

uses hash table instead of btree
PostgreSQL’s hash function maps any database value to a 32-bit integer, the hash code (about 4 billion possible hash codes)
hashfunction(index attribute) -> bucket which contains pointer to the rows -> actual data in table
can be smaller in size than btree
Hash index select and insert performance can be better than a B-Tree index
suitable for update heavy queries which uses equality operator
should be used with equality operator and not with comparative operators, example select * from table where key = 'hello'
suitable for large values as their too are converted to 32 bit int values

Block range index
A block range is a group of pages that are physically adjacent in the table; for each block range, some summary info is stored by the index
ex: a table storing a ZIP code column might have all codes for a city grouped together naturally.
When you set up a BRIN index, PostgreSQL reads your selected column’s maximum and minimum values for each 8k page of stored data. PostgreSQL then stores just 3 pieces of information into the BRIN index, the page number, the minimum value and the maximum value for your chosen column.
more suitable for read heavy data, which is not updated frequently as updating can mess up ranges
used for very large datasets where the data we are searching is in blocks, like timestamps and date ranges.
size of index is very small

Generalized inverted index
used where we need to index composite value
used for jsonb and array and tsvector (full text search)
organizes keys (like normalized words) into btree
node of btree contains lexmins and these lexmins points to the tuple id they exist in
supports query with operators:
- @> (contains)
- <@ (contained)
- && (overlap)
- || (concat)