../

Writing a Dns Resolver from scratch in Haskell

What is DNS anyway

Every device on internet has an ip address associated to it, which is used to make a connection with it, but ip addresses doesn’t mean more than random numbers separated by decimal for humans so we came up with a method to map ip address of these resources to a human readable key

Components of DNS

Their are three parts to DNS

DNS vs. Distributed KV Stores

Anatomy of a DNS Packet

DNS uses two types of messages, queries (which we send to name server) and response (which we receive from nameserver), both of them have same format A DNS packet consist of 5 segments,

DNS Header

     0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   |                      ID                       |
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   |QR|   Opcode  |AA|TC|RD|RA|   Z    |   RCODE   |
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   |                    QDCOUNT                    |
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   |                    ANCOUNT                    |
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   |                    NSCOUNT                    |
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   |                    ARCOUNT                    |
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

headers consist of several flags and number of records in other sections QR, if the packet is a query (0) or response (1) OPCODE can be standard query (0), inverse query (1), or server status request (2) AA, if DNS server is authoritative for the queried host TC, if message is truncated due to excessive length or not RD, if query is a recursive query RA, if nameserver supports recursive query or not Z is reserved for future use RCODE indicates the response code like no error (0), format error (1), server fail (2), domain does not exist (3), etc

DNS Question

    0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   |                     QNAME                     |
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   |                     QTYPE                     |
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
   |                     QCLASS                    |
   +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

QNAME, name of requested resources QTYPE, types of question, A for IPv4 address, AAAA for IPv6 address, MX for mail exchange, CNAME for canonical name etc, our DNS resolver currently supports A record QCLASS, the class of query being made like internet (1, most widely used), chaos (3, rarely used) and hesiod (4, also rarely used)

DNS Resource Record

Each record has a type, expiration date and type specific data

    0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
  |                      NAME                     |
  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
  |                      TYPE                     |
  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
  |                     CLASS                     |
  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
  |                      TTL                      |
  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
  |                    RDLENGTH                   |
  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--|
  |                    RDATA                      |
  +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

NAME, domain name belonging to resource record TYPE, type of resource record similar to type of question CLASS, class of data similar to class of question TTL, time to live RDLENGTH, length of data in RR RDATA, contains specific data associated with type of RR, for A record it will contain IPv4 address

Communicating with a Nameserver

Their are a couple of steps we need for implementing any network protocol like,

DNS protocol specs

So the DNS runs on port 53 and uses UDP as a transport protocol, which is not as reliable as TCP and we might encounter packet loss. But why do DNS chooses to use UDP instead ? The answer is simple, Cost to establish connection. One of the things that makes TCP reliable is a 3 way handshake and it makes TCP connection establishment slower as compared to UDP. It is not acceptable for DNS resolver to do a handshake with nameserver for every fresh query.

Representing the DNS packet

We can easily represent DNS header, question, resource record and the complete DNS packet using custom data types in haskell like shown below

import qualified Data.ByteString.Char8 as BS
import Data.Word (Word16, Word32)

data DNSHeader = DNSHeader
  { dnsHeaderId :: Data.Word.Word16,
    dnsHeaderFlags :: Data.Word.Word16, -- QR OPCODE AA TC RD RA Z RCODE
    dnsHeaderNumQuestion :: Data.Word.Word16,
    dnsHeaderNumAnswer :: Data.Word.Word16,
    dnsHeaderNumAuthority :: Data.Word.Word16,
    dnsHeaderNumAdditional :: Data.Word.Word16
  }
  deriving (Show)

data DNSQuestion = DNSQuestion
  { dnsQuestionName :: BS.ByteString,
    dnsQuestionType :: Data.Word.Word16,
    dnsQuestionClass :: Data.Word.Word16
  }
  deriving (Show)

data DNSRecord = DNSRecord
  { dnsRecordName :: BS.ByteString,
    dnsRecordType :: Data.Word.Word16,
    dnsRecordClass :: Data.Word.Word16,
    dnsRecordTtl :: Data.Word.Word32,
    dnsRecordData :: BS.ByteString
  }
  deriving (Show)

data DNSPacket = DNSPacket
  { dnsPacketHeader :: DNSHeader,
    dnsPacketQuestions :: [DNSQuestion],
    dnsPacketAnswers :: [DNSRecord],
    dnsPacketAuthorities :: [DNSRecord],
    dnsPacketAdditionals :: [DNSRecord]
  }
  deriving (Show)

Construct the request and sending to server

So a DNS query consist of the domain name we want to resolve and the query type for example we want to resolve IPv4 address for google.com which is a type A query we’ll use a helper function to create a DNS packet for our query

encodeQuery :: String -> Data.Word.Word16 -> IO BS.ByteString
encodeQuery domainName recordType = do
  let _id = 1
      recursionDesired = 1 `Data.Bits.shiftL` 8
      header = DNSHeader _id recursionDesired 1 0 0 0
      question = DNSQuestion (encodeDNSName domainName) recordType classIn
      queryBytes = headerToBytes header <> questionToBytes question
  return queryBytes

the above function will return a bytestring of DNS packet which we will send to the nameserver

Decoding and Processing the response

We will receive a DNS packet in bytestring form from the nameserver which may contain the actual answer we are looking for, we just need to decode and process it.

decodeQuery :: BS.ByteString -> Either String DNSPacket
decodeQuery bs =
  case runGetOrFail (getDNSPacket bs) (LBS.fromStrict bs) of
    Left (_, _, err) -> Left err
    Right (_, _, dnsPacket) -> Right dnsPacket


getDNSPacket :: BS.ByteString -> Get DNSPacket
getDNSPacket input = do
  header <- getDNSHeader
  questions <- getDNSQuestionNE input (dnsHeaderNumQuestion header)
  answers <- getDNSRecordList input (dnsHeaderNumAnswer header)
  authorities <- getDNSRecordList input (dnsHeaderNumAuthority header)
  additionals <- getDNSRecordList input (dnsHeaderNumAdditional header)
  return $ DNSPacket header questions answers authorities additionals

getDNSHeader :: Get DNSHeader
getDNSHeader = DNSHeader <$> getWord16be <*> getWord16be <*> getWord16be <*> getWord16be <*> getWord16be <*> getWord16be

getDNSQuestionNE :: BS.ByteString -> Word16 -> Get [DNSQuestion]
getDNSQuestionNE input x = do
  replicateM (fromIntegral x) getDNSQuestion
  where
    getDNSQuestion :: Get DNSQuestion
    getDNSQuestion = DNSQuestion <$> getDomainName input <*> getWord16be <*> getWord16be

getDNSRecordList :: BS.ByteString -> Word16 -> Get [DNSRecord]
getDNSRecordList input count = do
  replicateM (fromIntegral count) getDNSRecord
  where
    getDNSRecord = do
      domain <- getDomainName input
      type' <- getWord16be
      class' <- getWord16be
      ttl <- getWord32be
      data_len <- getInt16be
      data_ <- getRecordData (fromIntegral type') (fromIntegral data_len)
      return $ DNSRecord {dnsRecordName = domain, dnsRecordType = type', dnsRecordClass = class', dnsRecordTtl = ttl, dnsRecordData = data_}
    getRecordData :: Data.Word.Word16 -> Int -> Get BS.ByteString
    getRecordData type_ data_len
      | type_ == typeNs = getDomainName input
      | type_ == typeA = do
          ipBytes <- getByteString $ fromIntegral data_len
          return $ BS.pack $ ipToString $ B.unpack ipBytes
      | otherwise = getByteString $ fromIntegral data_len
    ipToString :: [Word8] -> String
    ipToString = intercalate "." . map show

Now the decoding most of the fields are straightforward but, DNS uses domain name compression for space optimization, for example if we query for domain “google.com”, the string “google.com” will be present multiple times in the DNS packet. Instead of storing it multiple times what DNS do is it stores the pointer to that string present previously. So we also need to take this compression in account while decoding domain name.

getDomainName :: BS.ByteString -> Get BS.ByteString
getDomainName input' = do
  len <- getInt8
  let lengthValue = len Data.Bits..&. 63
  getDomainName' input' len lengthValue
  where
    getDomainName' input len lengthValue
      | len == 0 = return BS.empty
      | isPointer len = do
          d <- getInt8
          let offset = fromIntegral $ lengthValue * 256 + fromIntegral d -- converting bits to int
          decodeCompressed offset input
      | otherwise = do
          label <- getByteString $ fromIntegral lengthValue
          rest <- getDomainName input
          return $
            if BS.null rest
              then label
              else label <> BS.pack "." <> rest
    decodeCompressed :: Int -> BS.ByteString -> Get BS.ByteString
    decodeCompressed offset input = do
      let msg = BS.drop offset input
      case runGetOrFail (getDomainName input) (LBS.fromStrict msg) of
        Left (_, _, err) -> traceShow ("err: " ++ show err) $ return BS.empty
        Right (_, _, domain) -> return domain
    isPointer c = Data.Bits.testBit c 7 && Data.Bits.testBit c 6

if the two MSB are set, then it indicates that remaining 14 bits represents pointer to the domain name backwards

Recursive Nature of DNS

See it is not the case that every time we query for a domain name, nameserver will respond with its ip. DNS resolution can involve multiple steps, as it may recursively query different name servers until it obtains the desired information. It can we possible that it may return ip of another nameserver which many have the answer or even the domain name of that nameserver, so our resolve function will look like this

matchRecordType :: Word16 -> [DNSRecord] -> Maybe String
matchRecordType recordType records = do
  let matchingRecord = L.find (\x -> dnsRecordType x == recordType) records
  (\x -> Just (BS.unpack $ dnsRecordData x)) =<< matchingRecord

getAnswer :: [DNSRecord] -> Maybe String
getAnswer = matchRecordType typeA

getNsIp :: [DNSRecord] -> Maybe String
getNsIp = matchRecordType typeA

getNs :: [DNSRecord] -> Maybe String
getNs = matchRecordType typeNs

resolve :: String -> Data.Word.Word16 -> String -> IO (Maybe String)
resolve domainName recordType nameserver = do
  query <- encodeQuery domainName recordType
  byteString <- sendUDPRequest nameserver 53 query
  case decodeQuery byteString of
    Left err -> do
      traceShow ("Error parsing DNS packet: " ++ err) (return Nothing)
    Right packet -> do
      let mIp = getAnswer $ dnsPacketAnswers packet
      let mNsIp = getNsIp $ dnsPacketAdditionals packet
      let mNsDomain = getNs $ dnsPacketAuthorities packet
      case (mIp, mNsIp, mNsDomain) of
        (Just ip, _, _) -> return $ Just ip
        (_, Just nsIp, _) -> resolve domainName recordType nsIp
        (_, _, Just nsDomain) -> do
          nameserver <- resolve nsDomain typeA nameserver
          resolve domainName recordType $ fromMaybe "" nameserver
        (_, _, _) -> traceShow "Error Occured" $ return Nothing

Final Step (testing)

Our poorman’s DNS resolver is finally completed now, you can test it by querying any domain name.

> cabal run haskell-dns-resolver google.com
Just "142.251.42.78"

> dig @8.8.8.8 google.com
; <<>> DiG 9.10.6 <<>> @8.8.8.8 google.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 5539
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;google.com.                    IN      A

;; ANSWER SECTION:
google.com.             227     IN      A       142.250.70.78

;; Query time: 49 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Mon Mar 18 19:17:16 IST 2024
;; MSG SIZE  rcvd: 55

Reference

/haskell/ /networking/ /codingchallenges/