Postgresql Indexes

Indexes

What

  • a data structure, can make some queries fast

Why

  • cause, without index every query will be a sequential scan from disk
  • whole point of index is to make query faster than O(n)
  • so why not create an index for each type of query, cause indexes makes updating slow as we have to update indexes
  • it also increases disk usage, slows down backup and restore

Single Column Index

  • index made on a single attribute

Multi Column Index

  • index made on a group of attribute

Partial Index

  • index made on a subset of table
  • ex: index on all the incomplete appointments in appointments table

B+ Tree Index

  • uses B+ Tree for creating indexes
  • default in postgresql
  • stores pointer to heap file and not actual data
  • better to store actual data for primary index as reading from heap file is another disk i/o
  • used for query with operations like <, <= , = ,>=,>
  • can be used for range queries, first go to the start of range in O(log(n)) and then traverse to next node
  • can be used for patterns like ^foo or foo% (constant prefix) but not with %foo as BTree are sorted based on first char
  • not suitable for large values of key as btree copies the key value in nodes -> less keys per node -> tree with more depth

Hash Index

  • uses hash table instead of btree
  • PostgreSQL’s hash function maps any database value to a 32-bit integer, the hash code (about 4 billion possible hash codes)
  • hashfunction(index attribute) -> bucket which contains pointer to the rows -> actual data in table
  • can be smaller in size than btree
  • Hash index select and insert performance can be better than a B-Tree index
  • suitable for update heavy queries which uses equality operator
  • should be used with equality operator and not with comparative operators, example select * from table where key = 'hello'
  • suitable for large values as their too are converted to 32 bit int values

BRIN Index

  • Block range index
  • A block range is a group of pages that are physically adjacent in the table; for each block range, some summary info is stored by the index
  • ex: a table storing a ZIP code column might have all codes for a city grouped together naturally.
  • When you set up a BRIN index, PostgreSQL reads your selected column’s maximum and minimum values for each 8k page of stored data. PostgreSQL then stores just 3 pieces of information into the BRIN index, the page number, the minimum value and the maximum value for your chosen column.
  • more suitable for read heavy data, which is not updated frequently as updating can mess up ranges
  • used for very large datasets where the data we are searching is in blocks, like timestamps and date ranges.
  • size of index is very small

GIN Index

  • Generalized inverted index
  • used where we need to index composite value
  • used for jsonb and array and tsvector (full text search)
  • organizes keys (like normalized words) into btree
  • node of btree contains lexmins and these lexmins points to the tuple id they exist in
  • supports query with operators:
    • @> (contains)
    • <@ (contained)
    • && (overlap)
    • || (concat)

GiST Index

  • generalized search tree
  • a framework, not a single index
  • used for spatial data and geometrical data
  • postgresql includes GiST operator classes for several 2D geometric data types
  • capable of optimizing nearest neighbour searches
  • supports query with operators:
    • << (left side)
    • &< (not exceed to right)
    • &> (not exceed to left)
    • >> (right side)
    • <<| (below)
    • &<| (not exceed above)
    • |&> (not exceed below)
    • |>> (above)
    • @> (contains)
    • <@ (contained)
    • ~= (same)
    • && (overlap)

Bloom Index

  • sort of like hash but different
  • uses Bloom Filters
  • used for multicolumn indexing

#database #tech