Mathematical modeling
Problem: We all hate seeing spam in our inbox, but we hate it even more when real emails wind up in the spam folder. Suppose that each time we see a spam email in our inbox, we lose 1 happiness unit, and each time a real email winds up in the spam folder, we lose 5 happiness units. Our happiness is unaffected by correct classifications.
##### How much happiness do we lose using the additive model? How much happiness do we lose using the logistic model? Report your answers as happiness loss per email received. (Use the test data for your calculations.)
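The requested quantity is an average cost over the test set. The count symbols $N$ below are notation introduced here, not taken from the assignment. If, as an assumption, a model's predicted spam probability $p$ is taken at face value, the same costs also imply a classification threshold. This is the standard cost-sensitive decision rule; the problem does not say whether the default 0.5 cutoff or this adjusted one is expected.

$$
\text{loss per email} = \frac{1 \cdot N_{\text{spam}\to\text{inbox}} + 5 \cdot N_{\text{real}\to\text{spam folder}}}{N_{\text{test}}}
$$

For a single email with predicted spam probability $p$, the expected loss of each action is

$$
\mathbb{E}[\text{loss} \mid \text{inbox}] = 1 \cdot p, \qquad \mathbb{E}[\text{loss} \mid \text{spam folder}] = 5\,(1 - p),
$$

so sending it to the spam folder has the lower expected loss exactly when

$$
5\,(1 - p) < p \iff p > \tfrac{5}{6} \approx 0.833.
$$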
Data description
The test data has 58 columns: 57 are independent variables (word and character frequencies, for example), and is.spam is the dependent variable (1 = spam, 0 = not spam). There are 1536 entries in total: 941 with is.spam = 0 and 595 with is.spam = 1.
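Before fitting either model, the class counts above give two reference points for the loss per email. This is a small sketch using only those counts. A model whose loss is not below the "never filter" baseline does no better, under these costs, than having no spam filter at all.

```python
# Class counts in the test set, as described in the data description.
n_real, n_spam = 941, 595
n = n_real + n_spam  # 1536 emails

# Trivial "classifiers" that give reference points for the model losses.
loss_never_filter = 1 * n_spam / n  # every spam email lands in the inbox
loss_filter_all = 5 * n_real / n    # every real email lands in the spam folder

print(f"never filter: {loss_never_filter:.4f}")  # 0.3874
print(f"filter all:   {loss_filter_all:.4f}")    # 3.0632
```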
The column names are as follows (I did not find any other information in the test data that is relevant to the problem):
- Word frequencies (48): word_freq_make, word_freq_address, word_freq_all, word_freq_3d, word_freq_our, word_freq_over, word_freq_remove, word_freq_internet, word_freq_order, word_freq_mail, word_freq_receive, word_freq_will, word_freq_people, word_freq_report, word_freq_addresses, word_freq_free, word_freq_business, word_freq_email, word_freq_you, word_freq_credit, word_freq_your, word_freq_font, word_freq_000, word_freq_money, word_freq_hp, word_freq_hpl, word_freq_george, word_freq_650, word_freq_lab, word_freq_labs, word_freq_telnet, word_freq_857, word_freq_data, word_freq_415, word_freq_85, word_freq_technology, word_freq_1999, word_freq_parts, word_freq_pm, word_freq_direct, word_freq_cs, word_freq_meeting, word_freq_original, word_freq_project, word_freq_re, word_freq_edu, word_freq_table, word_freq_conference
- Character frequencies (6): char_freq_semicolon, char_freq_parens, char_freq_bracket, char_freq_exclamation, char_freq_dollar, char_freq_pound
- Capital-letter run lengths (3): capital_run_length_average, capital_run_length_longest, capital_run_length_total
- Response: is.spam
If needed, the original data set can be found here: https://archive.ics.uci.edu/ml/datasets/spambase
I need to use the functions gam() and glm() in R.
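A minimal Python sketch of the scoring step follows. It assumes both models have already been fitted in R and their predicted spam probabilities on the test set exported. For a glm() or gam() fit, predict(fit, newdata = test, type = "response") returns probabilities. The function name happiness_loss_per_email and the toy numbers are hypothetical, not part of the assignment. Emails whose predicted probability exceeds threshold are treated as sent to the spam folder.

```python
def happiness_loss_per_email(y_true, p_spam, threshold=0.5):
    """Average happiness loss per email (hypothetical helper).

    y_true: 0/1 labels (1 = spam), like the is.spam column.
    p_spam: predicted spam probabilities from a fitted model.
    threshold: emails with p_spam > threshold go to the spam folder.
    """
    COST_SPAM_IN_INBOX = 1  # spam predicted as real (false negative)
    COST_REAL_IN_SPAM = 5   # real email predicted as spam (false positive)
    y_true, p_spam = list(y_true), list(p_spam)
    if not y_true or len(y_true) != len(p_spam):
        raise ValueError("need non-empty inputs of equal length")
    total = 0
    for y, p in zip(y_true, p_spam):
        predicted_spam = p > threshold
        if y == 1 and not predicted_spam:
            total += COST_SPAM_IN_INBOX
        elif y == 0 and predicted_spam:
            total += COST_REAL_IN_SPAM
    return total / len(y_true)


# Toy example with made-up numbers (not the spambase test set).
labels = [1, 1, 0, 0, 0]
probs = [0.9, 0.3, 0.6, 0.1, 0.2]
print(happiness_loss_per_email(labels, probs))                   # 1 FN + 1 FP -> 6 / 5 = 1.2
print(happiness_loss_per_email(labels, probs, threshold=5 / 6))  # 1 FN only -> 1 / 5 = 0.2
```

With the default threshold = 0.5 this matches the usual cutoff. Passing threshold = 5/6 applies the cost-weighted rule, under which an email is filtered only when the expected cost 5(1 − p) of filtering is below the expected cost p of leaving it in the inbox. Running the function once per model gives the two numbers the question asks for.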
Answer
1 Attachment

Thanks!

Left you a tip for the last question you answered, when it was already closed. Thanks!
Bounty: $25.00
Bounty is too low
Sorry, I can't increase it any more. Please help me; it is the only question left out of 35. I have tried everything.
I agree with Schwartstack; the bounty is low. This may take more than an hour to answer.
Could you rewrite this please: "Suppose that each time we see a spam email in our inbox, we get 1 happiness unit more sad each time a real email winds up in the spam folder, we get 5 happiness units more sad."
The original data set has 4601 entries.
Also, should we aim to train the 'best' possible models or just use the default ones? Should we do feature selection or cross validation or just train a simple model with all observations?