../
Postgresql Indexes
Indexes
What
- a data structure, can make some queries fast
Why
- cause, without index every query will be a sequential scan from disk
- whole point of index is to make query faster than O(n)
- so why not create an index for each type of query, cause indexes makes updating slow as we have to update indexes
- it also increases disk usage, slows down backup and restore
Single Column Index
- index made on a single attribute
Multi Column Index
- index made on a group of attribute
Partial Index
- index made on a subset of table
- ex: index on all the incomplete appointments in appointments table
B+ Tree Index
- uses B+ Tree for creating indexes
- default in postgresql
- stores pointer to heap file and not actual data
- better to store actual data for primary index as reading from heap file is another disk i/o
- used for query with operations like <, <= , = ,>=,>
- can be used for range queries, first go to the start of range in O(log(n)) and then traverse to next node
- can be used for patterns like ^foo or foo% (constant prefix) but not with %foo as BTree are sorted based on first char
- not suitable for large values of key as btree copies the key value in nodes -> less keys per node -> tree with more depth
Hash Index
- uses hash table instead of btree
- PostgreSQL’s hash function maps any database value to a 32-bit integer, the hash code (about 4 billion possible hash codes)
- hashfunction(index attribute) -> bucket which contains pointer to the rows -> actual data in table
- can be smaller in size than btree
- Hash index select and insert performance can be better than a B-Tree index
- suitable for update heavy queries which uses equality operator
- should be used with equality operator and not with comparative operators, example
select * from table where key = 'hello'
- suitable for large values as their too are converted to 32 bit int values
BRIN Index
- Block range index
- A block range is a group of pages that are physically adjacent in the table; for each block range, some summary info is stored by the index
- ex: a table storing a ZIP code column might have all codes for a city grouped together naturally.
- When you set up a BRIN index, PostgreSQL reads your selected column’s maximum and minimum values for each 8k page of stored data. PostgreSQL then stores just 3 pieces of information into the BRIN index, the page number, the minimum value and the maximum value for your chosen column.
- more suitable for read heavy data, which is not updated frequently as updating can mess up ranges
- used for very large datasets where the data we are searching is in blocks, like timestamps and date ranges.
- size of index is very small
GIN Index
- Generalized inverted index
- used where we need to index composite value
- used for jsonb and array and tsvector (full text search)
- organizes keys (like normalized words) into btree
- node of btree contains lexmins and these lexmins points to the tuple id they exist in
- supports query with operators:
- @> (contains)
- <@ (contained)
- && (overlap)
- || (concat)
GiST Index
- generalized search tree
- a framework, not a single index
- used for spatial data and geometrical data
- postgresql includes GiST operator classes for several 2D geometric data types
- capable of optimizing nearest neighbour searches
- supports query with operators:
- << (left side)
- &< (not exceed to right)
- &> (not exceed to left)
- >> (right side)
- <<| (below)
- &<| (not exceed above)
- |&> (not exceed below)
- |>> (above)
- @> (contains)
- <@ (contained)
- ~= (same)
- && (overlap)
Bloom Index
- sort of like hash but different
- uses Bloom Filters
- used for multicolumn indexing