> We want to emphasize that all the student authors in this paper worked really hard on what could have been a very interesting and valuable paper had the data been collected with consent. The many problems with the published work were not the fault of the students.
I appreciate the clear stance that MIT has taken regarding where the responsibility lies in this situation. I think some people are missing the context that many of the authors were undergraduate/early career. Research is an iterative process and every paper has to start somewhere. I don't agree that the paper should be withdrawn because arXiv is not technically a publication, but I also wouldn't consider the paper properly peer reviewed. Teachers own the copyright to the exam material. I was taught copyrighted material can't and shouldn't be used as part of an eval dataset.
The followup by three other MIT ('24) seniors is a great peer review.
https://flower-nutria-41d.notion.site/No-GPT4-can-t-ace-MIT-...
No, GPT4 Can’t Ace MIT - https://news.ycombinator.com/item?id=36370685 - June 2023 (120 comments)
“arXiv will not consider removal for reasons such as journal similarity detection, nor failure to obtain consent from co-authors, as these do not invalidate the license applied by the submitter”
The submitter can mark the paper as “withdrawn” but it will remain available
https://news.ycombinator.com/item?id=32780403
DMCA takedown, then? Or not applicable because the data is not part of the publication? Tables 17 and 18 in the Appendix could probably be removed as they seem to verbatim copy course descriptions, as well as, maybe, Figure 4.
> Iddo did not have permission from all the instructors to collect the assignment and exam questions that made up the dataset that was the subject of the paper.
skissane 782 days ago [-]
> DMCA takedown, then? Or not applicable because the data is not part of the publication?
Legally, you are allowed to refuse to comply with DMCA take down notices. If you do, you are increasing your risk of being sued for copyright infringement, and increasing the potential damages if you lose – but, if you decide (in any individual case) that is a risk worth taking, you are free to take that risk. If MIT tried to issue a DMCA takedown to arXiv over this, arXiv might decide that defending their own policies is worth the risk of being sued by MIT.
> Tables 17 and 18 in the Appendix could probably be removed as they seem to verbatim copy course descriptions, as well as, maybe, Figure 4.
Probably a sufficiently small extract from the source material, that it would fall under fair use? (Lack of acknowledgement of the specific source may be an issue; but that can be remedied by adding an acknowledgement, rather than removal.)
anonymouskimmer 782 days ago [-]
IANAL.
For the tables there's very little transformation, and a huge chunk of verbatim text. I don't see how there is any gain versus just publishing the course numbers and titles.
For figure 4 this might fall under "unpublished material" protections, which are: https://www2.archivists.org/publications/brochures/copyright...
> Generally, material is considered unpublished if it was not intended for public distribution or if only a few copies were created and distribution was limited.
> The law distinguishes between published and unpublished material and the courts often afford more copyright protection to unpublished material when an asserted fair use is challenged.
> Rather, courts evaluate fair use cases based on four factors, no one of which is determinative in and of itself:
2) > Courts give more protection to works that are “closer to the core of copyright protection,” such as unpublished
4) > The effect of the use upon the potential market for, or value of, the copyrighted work: This factor assesses how, and to what extent, the use damages the existing and potential market for the original.
Publication of the (possibly) previously unpublished copyrighted work in figure 4 fully and completely destroys its value. I don't know if a fair use claim can overcome such an impact, though that is up to a court to determine.
skissane 782 days ago [-]
> For the tables there's very little transformation, and a huge chunk of verbatim text.
IANAL either–but how is the copyright owner (MIT presumably) harmed by the reproduction of these course descriptions? It isn't like they harm the commercial value of the courses in any way; the course is the actual product here, the description is just sales and marketing collateral, and has minimal value apart from the product it is selling.
Furthermore, given the fact the paper was coauthored by MIT employees – arXiv could argue that MIT (through its employees acting as its agents) had granted them an implied license to reproduce it. Which is the other issue – even if this isn't fair use, MIT may have agreed to license it through its agents. You can still be bound by the actions of your employees, even if those actions violated your own internal policies–especially in dealings with third parties who had no reason to suspect there was any such violation.
> I don't see how there is any gain versus just publishing the course numbers and titles.
"Algebra I" and "Algebra II" don't mean much – what topics do they actually cover? A one sentence/paragraph course description adds a lot, because they tell you what topics are actually covered. Yes, someone could probably look it up on the MIT website – but it saves the reader a lot of effort doing that. Especially if someone is reading this 20 years from now, by which time the content of MIT courses may have changed a lot (despite having the same title), and finding what their content was 20 years ago may require a lot of research effort (if the reader even thinks to do that).
> Publication of the (possibly) previously unpublished copyrighted work in figure 4 fully and completely destroys its value
Figure 4 is likely not the "work", rather a small quote from a much larger work. How does a small quote from a work (even if allegedly unpublished) "fully and completely destroys its value"?
anonymouskimmer 782 days ago [-]
Re: the course descriptions. Yes, I can see a judge buying that defense. And yes, we don't know what license to use exists for MIT faculty. I could also see a judge buying that the research article here doesn't need to publish the course descriptions in order to make its point at all.
> IANAL either, but figure 4 is likely not the "work", rather a small quote from a much larger work. How does a small quote from a work (even if allegedly unpublished) "fully and completely destroys its value"?
Exams are often composites of multiple independent works. Said exams being recomposited periodically (i.e. using a database of questions to create an exam). The argument here is that the individual question is itself a complete work (equivalent to an independent chapter in a book of works on a topic). And here it is not just on its lonesome, but with its answer, too.
skissane 782 days ago [-]
> Exams are often composites of multiple independent works.
If figure 4 came from an exam. For all we know, figure 4 actually came from course notes, assignments, etc. Whether or not issuing those to students counts as "publication", they are easily available to future students in a way that past exam questions are often not, hence their publication does far less damage to their value.
Also, MIT says that "Iddo did not have permission from all the instructors" – for all we know, figure 4 is from one of those instructors for which he did have that permission.
anonymouskimmer 782 days ago [-]
Yep, those sorts of possibilities are what the "(possibly)" in my earlier post was for.
Based on a quick search it seems the figure 4 question and answer have to do with https://en.wikipedia.org/wiki/Markov_decision_process , which seems to be used in computer science. Iddo Drori is an associate professor of CS, so it seems quite likely it's his own question.
behnamoh 782 days ago [-]
That’s what I don’t like about arxiv. The person posting the paper must be able to take it down as well.
So lame they focus on gatekeeping the exams and crying about permission as opposed to challenging the paper’s shit methodology.
caddemon 783 days ago [-]
I mean if the dude submitted the work without the permission of the actual lead authors that's a pretty big violation, which also could explain in part why the methodology was so bad.
rsfern 782 days ago [-]
The person who submitted to arxiv (and promoted the paper on Twitter) is one of the senior authors (listed last in the author block), so I don’t think that’s an excuse for publishing to arxiv before doing due diligence on methodology
Apparently some of the other profs were not in the loop about the arxiv submission though
stefan_ 782 days ago [-]
Senior author is a euphemism for "happens to run the lab". Explains why the other "senior authors" are lashing out: they never expected to have to know what a paper that bears their name is all about.
rsfern 782 days ago [-]
Sure - the point is that the preprint wasn’t posted by a student without permission of their advisor, it was posted by a professor without (apparently) permission of their collaborators, who aren’t part of a monolithic lab for what it’s worth.
However, I generally agree with your take, and the response seems to me to kind of dodge the responsibility for the methodology issues. That responsibility is not really compatible with the “hey, we’re submitting to arXiv in a few days, last chance for comments” approach to collaboration that seems to be the minimum expected bar for signing off on the submission form stating that all authors are aware of and agree to publication.
sheepscreek 782 days ago [-]
Posted by a lecturer, not a professor I believe.
rsfern 782 days ago [-]
He’s an associate professor at BU, and I guess a lecturer at MIT
YeGoblynQueenne 782 days ago [-]
That's nonsense. Senior authors may not contribute text to the paper but they will normally be involved in every other way - reading drafts, making recommendations, directing the research.
I know it's standard on HN to accuse senior academics of exploiting their PhD students and I'm pretty sure there is plenty of that to go around, but it is a very rare PhD student that can write a publishable paper without advice and guidance from their advisor. The name gives a hint, even.
I'm speaking on this as a recently graduated PhD student, btw. I don't have any conflicts of interest (well, not yet, hopefully). It sucks that many students have absolutely rotten relations with their advisors, but that's exactly because the student depends so much on the advisor for direction that it's easy for the power differential to be exploited by unscrupulous individuals.
stefan_ 782 days ago [-]
The main thrust of their complaint (as paper authors!) is that they used the "copyrighted" exam questions. I don't understand how both can be true: to have contributed to this paper, which entirely rests on ChatGPT (not) answering exam questions, and not knowing that that is what the paper is all about.
That is certainly an issue that would be discovered well before anyone is sitting down to write the actual paper.
dgacmu 782 days ago [-]
This may be true in some fields but it's generally not the case in academic computer science. (Generally.)
a_bonobo 782 days ago [-]
>Apparently some of the other profs were not in the loop about the arxiv submission though
Sounds more like they're throwing the one senior prof under the bus to protect their own faces; saying 'I didn't agree to preprint submission' would be the first excuse I'd come up with.
At that point they had been majorly involved in the research; if they were not involved in the preprinting they must have seen several drafts at least, at which point they could've asked to be taken out. But they didn't do that, and that's telling.
rsfern 782 days ago [-]
I tend to agree, but it’s hard to know exactly how it went down from our outsider’s perspective. They’re implying that they asked for changes before submission, and that they weren’t given a final call to ask for their names to be removed if it came to that. Personally I would be less skeptical about this version of events with more transparency about whatever technical issues they had with the manuscript.
az226 782 days ago [-]
But that’s something they should deal with internally. It is quite strange actually. The statement should also come from MIT’s public affairs office and not the professors themselves. So it looks like they are big mad and acting on their own to save face.
simplesamp 782 days ago [-]
The paper is in the NeurIPS template and was possibly submitted to the conference. And now other authors are complaining after other people pointed to it.
23B1 783 days ago [-]
You... think it's lame that an academic institution would focus on protecting their students and professors?
az226 782 days ago [-]
Paper didn’t include the exams themselves, please correct me if my understanding is incorrect.
dr_kretyn 782 days ago [-]
Based on what I watched from Yannick on YouTube (I haven't actually read the paper)... The questions weren't part of the paper, but the authors made a goof and included all of them on GitHub. Shortly after, they deleted these questions, but as part of a new commit, so one could see the change and thus all the questions.
az226 782 days ago [-]
But then this dumb statement should have been talking about the upload to GitHub and not the paper, and how collaborators weren't aware, didn't get permission to run the analysis, etc. Sure, they're all connected to the paper but they're separate issues.
anonymouskimmer 782 days ago [-]
Figure 4 on the last page of the PDF might include a single question.
YeGoblynQueenne 782 days ago [-]
The signatories are some of the paper's authors. It's not common for authors to criticise their own methodology as shitty. Anyway it's clear they didn't think the paper was ready to be made public so what is there to criticise about its methodology? It's a work in progress.
11101010001100 783 days ago [-]
You realize this is only one part of the response?
bluepod4 783 days ago [-]
> We want to emphasize that all the student authors in this paper worked really hard on what could have been a very interesting and valuable paper had the data been collected with consent. The many problems with the published work were not the fault of the students.
This is the conclusion of the memo. The problems with methodology are clearly a secondary concern. They also seem to imply that issues with consent are solely or mainly the cause of the methodology problems. This could be true, but idk.
This is my opinion as an MIT grad who still views OCW content and similar content from other universities from time-to-time.
(Complete speculation: I almost feel as if certain professors or lecturers are “embarrassed” about their content.)
dkqmduems 782 days ago [-]
Working hard does not necessarily mean getting it right.
(Also an mit grad).
bluepod4 782 days ago [-]
No one said it did. Did you mean to reply to someone else?
This thread was about whether the response focused on the consent or methodology problems and it’s my opinion that they are focusing on the consent issues more.
I only copied a quote from the memo.
quadrifoliate 782 days ago [-]
> He did so without the consent of many of his co-authors and despite having been told of problems that should be corrected before publication.
I think they are pointing out consent issues in multiple cases, but also talk about "problems that should be corrected before publication".
I can't really see how there would be problems to be corrected before publication if it were purely a consent issue.
If it's only a consent issue but the rest of the methodology is sound, they should just say "Drori should have waited for the consent to go ahead with publication, but the paper is fine"?
bluepod4 782 days ago [-]
To clarify, when I said this was the _conclusion_ of the memo, I meant that it was the literal closing statement (besides the one-liner/quip about GPT-4 not being able to earn an MIT degree).
I didn’t mean that it was the be-all and end-all of those CSAIL profs’ perspective(s).
The wording is still very suspect to me and seems like the Writing Center would have some constructive feedback for them (unless the meaning of words doesn’t matter).
dkqmduems 782 days ago [-]
I take the stance that getting it right is implicitly understood to be primary, as it should be in science
bluepod4 782 days ago [-]
I agree but not everyone thinks like this even at MIT, as you should know since you said that you’re an alum.
centmot 783 days ago [-]
Precisely.
kleiba 782 days ago [-]
Phew, I'm just happy that OpenAI first gathered consent from all data sources on which they trained GPT4.
data_maan 782 days ago [-]
Here's a more rigorous take on evaluating the output of GPT models on math:
https://arxiv.org/abs/2301.13867
They don't specify GPT-4. The fact that they generically refer to ChatGPT leads me to believe they're assessing 3.5. GPT-4 is a significantly more competent model, and while I won't speculate on its math ability, I'd want to see a follow-up specifically for GPT-4.
Sunhold 782 days ago [-]
Yup. Page 19: "we focused on the 9th-January-2023 version of ChatGPT"
cscurmudgeon 782 days ago [-]
Unfortunately, negative but robust results like this won't get a fraction of the attention this MIT paper initially got.
muds 782 days ago [-]
Putting papers and code on arXiv shouldn't be punished. The incentive to do this is to protect your idea from getting scooped, and also to inform your close community about interesting problems you're working on and get feedback. arXiv is meant for work-in-progress ideas that won't necessarily stand up to the peer review process, but this isn't really acknowledged properly on social media. I highly doubt the Twitter storm would have been this intense if the Twitter posts had explicitly acknowledged this as a "draft publication which hints at X." But I admit that pointing fingers at nobody in general and social media specifically is a pretty lazy solution.
The takeaway IMO seems to be to prepend the abstract with a clear disclaimer sentence conveying the uncertainty of the research in question. For instance, adding a clear "WORKING DRAFT: ..." in the abstract section.
ttpphd 782 days ago [-]
I think you missed the point that data needs to be collected and presented ethically. It's not about it being a work in progress and not peer reviewed.
muds 782 days ago [-]
I agree that the data collection process wasn't ethical, and the professor should definitely be reprimanded for that. It's extremely sad that the coauthors weren't aware of this as well. And I feel terrible for the undergrads: their first research experience was publicly rebuked for no fault of their own.
However, there is no shortage of projects with sketchy data collection methodologies on arXiv that haven't received this amount of attention. The point of putting stuff on arXiv _is_ that the paper will not pass / has not passed peer review in its current form! I might even call arXiv a safe space to publish ideas. We all benefit from this: a lot of interesting papers are only available on arXiv vs. being shared between specific labs.
I'm concerned that this fiasco was enabled by this new paradigm in AI social media reporting, where a project's findings are amplified and all the degrees of uncertainty are repressed. And I'm honestly not sure how to best deal with this other than either amplifying the uncertainty and jankiness in the paper itself to an annoyingly noticeable level, or just going back to the old way of privately sharing ideas.
Maybe this is the best case scenario for these sorts of papers? They pushed a paper to a public venue and got a public "peer review" of the paper. Turns out the community voted "strong reject;" and it also turns out that the stakes for public rejection are (uncomfortably, IMO) higher than for a normal rejection. Maybe this causes the researchers to only publicly release better research, or (more likely) this causes the researchers to privately release all future papers.
caddemon 782 days ago [-]
I agree with the ideal you're speaking of, but the lecturer that uploaded the paper was promoting it on Twitter in a way that really isn't consistent with that sort of intent. This entire scenario seems like a failed attempt to market his own academic brand, without care for the underlying scientific content. It's not a genuine sharing of an unfinished idea or early observation.
xdavidliu 782 days ago [-]
He didn't have permission to post the data. A similar protest would have been made if he had posted the data on GitHub, Stack Overflow, Reddit, or Hacker News, and none of those are as peer reviewed as academic journals.
PeterStuer 782 days ago [-]
It was a "copyright" issue, nothing to do with "ethics".
cscurmudgeon 782 days ago [-]
> The incentive to do this is to protect your idea from getting scooped, and
The other side is flag planting with half-baked ideas and results.
currymj 782 days ago [-]
the official stance of arXiv is that it is intended for papers that are finished and ready for submission to peer review, i.e. the authors believe it's finished and publishable.
in a practical sense, most people don't think of it this way though. putting something on arXiv means "i want people to be able to cite this, and for it to show up on Google Scholar", which might include works in progress, short notes, or lots of other things that don't fit the official criteria.
TL;DW - Their approach is to sequentially test methods, moving on to the next if one fails. However, this strategy is flawed as it requires ground truth, particularly for multiple-choice answers. The analogy could be made to continually rolling a die until landing on six. Similarly, if a question has four potential answers, the model merely has to attempt four times to stumble upon the correct response. And then they report a 100% success rate.
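To make the flaw concrete, here is a minimal sketch (a hypothetical illustration, not code from the paper) of how grading against the ground truth with unlimited retries reports perfect accuracy regardless of model quality:

    import random

    # Hypothetical cascade: keep producing answers until one matches the known
    # ground truth. Because the stopping rule consults the answer key, every
    # question is eventually marked "solved" and reported accuracy is 100%.
    def solve_until_correct(correct_answer, choices):
        attempts = 0
        while True:
            attempts += 1
            guess = random.choice(choices)   # stand-in for any model's answer
            if guess == correct_answer:      # comparison uses the ground truth
                return attempts

    random.seed(0)
    attempts = [solve_until_correct("B", ["A", "B", "C", "D"]) for _ in range(1000)]
    print("reported accuracy:", 1.0)         # by construction
    print("mean attempts per question:", sum(attempts) / len(attempts))

Even a model that guesses uniformly at random "aces" the exam under this grading scheme.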
az226 782 days ago [-]
Yes and no. Some LLMs can’t get the right answer no matter how many rolls of the dice they get.
The bigger BS is that several questions weren't questions, and some didn't have enough detail to be answered, and it somehow got 100% on those. This IMO puts the entire paper into the trash.
caddemon 782 days ago [-]
It graded itself, so that explains that lol
sgt101 782 days ago [-]
Well, someone's in trouble. If a faculty member allowed their name to be put on this thing then that's on them; if they didn't, then whoever did the submission has to go. Also, all the papers of all the co-authors who allowed their name on it are now suspect for me and should be investigated seriously. Sure, if the fact is that they said "don't put my name on stuff that you publish unless I have specifically agreed" or "don't publish this thing, it needs work" and then that message got lost - they are off the hook for me. But there needs to be a look at the culture of the labs involved even in that case. That's brutal because, as I say, there is a possibility that they are victims of someone going over the top, but unfortunately I think MIT has to do it.
In addition, for me there is a much wider ethics issue here. How many papers can really be checked properly by someone claiming authorship even if that author isn't really contributing? I can read a paper like this in about a fortnight because I have a job and a life. A faculty member also has a job and a life - they are doing admin and teaching as well as research. So to me it's impossible for someone to check 25 papers or more a year.
I am seeing far higher counts than this by many academics.
But this is an extremely conservative threshold in my opinion. When I have contributed to scientific papers it has taken me at least three months of solid work each time. Often these papers get rejected (rightly) and then have to be substantially amended (or occasionally just abandoned). I am not that talented, for sure, but I really find it hard to credit that anyone with an actual job (so not a post-doc or a student) can contribute to more than one academic paper a year. Potentially two or three if there is a confluence of papers getting ready for print... but not on a sustained basis.
There are two solutions. Every university and research institute needs to investigate all the publications of academics with high paper counts per year. This is a red flag. I definitely think that if folks are in the top quartile in a department it needs to be looked at carefully.
The other solution is that no academic publishing venue (conference or journal) should accept more than one paper per year from any author.
CamperBob2 783 days ago [-]
Is there a link to the original story? Drori's homepage at MIT seems to have a "Certificate Error" (probably 405: Revoked Diploma.)
I assume the paper itself is long gone.
Not condoning what the author did, but it’s amazing how little the questions answered in the paper have to do with real life software engineering.
Yes, it shows fortitude to pass MIT’s exam, but beyond that, a lot of stress for a test that will soon be forgotten after their undergraduate degree.
tinyhouse 782 days ago [-]
People overreact to this paper because it's from MIT. Lots of crappy papers out there, this is just another one. MIT is not special, they have a lot of average people too.
caddemon 782 days ago [-]
This was written by young undergrads with no actual research experience and a random visiting lecturer who is obviously trying to career climb (and had it backfire). Certainly there is bad research produced by MIT but I'd hardly call this "produced by MIT" in the traditional academic sense. It is extremely easy actually to get yourself some sort of affiliation.
Perhaps they should crack down on that, but I think the relative openness of MIT is overall a good university culture to have, and only a small minority of those with tenuous affiliation are actually grifters.
simplesamp 782 days ago [-]
What I fail to understand is that these authors were presumably okay with sending the paper to NeurIPS (a top-tier conference). I am assuming this based on the paper being in the NeurIPS submission template. And now that the cat is out of the bag they are denouncing it.
artfulmink 782 days ago [-]
Honestly not quite a fair assumption. Some schools have students use neurips templates for class submissions (at least my graduate school did), and so these students may have just been using this format out of habit/familiarity.
simplesamp 782 days ago [-]
Maybe. But the paper being exactly nine pages (the page limit of NeurIPS this year) and being put on arXiv just a month after the NeurIPS deadline is too much of a coincidence.
PeterStuer 782 days ago [-]
One can only dream this level of analysis and falsification would be the norm in academic publishing.
Sadly, this is an extreme outlier for now.
bjourne 782 days ago [-]
> We want to emphasize that all the student authors in this paper worked really hard on what could have been a very interesting and valuable paper had the data been collected with consent. The many problems with the published work were not the fault of the students.
So what did the student authors work "really hard on" if not the same data that was collected without consent? Either all student authors are at fault for working on a paper based on data collected without consent or none are. It's not publishing the paper that is the problem here.
knaik94 782 days ago [-]
The issue isn't that the results are necessarily bad from using a dataset with copyrighted exams. Using it for training data is "fine". But a copyrighted eval dataset makes it basically impossible to reproduce the results. One part of the original paper was validating the use of the MITQ dataset as a way to evaluate various models.
This is separate from the issues related to the methodology not agreeing with the conclusion that GPT4 could "get an MIT degree".
The second issue, the conclusion and methodology, could potentially have been fixed.
bjourne 782 days ago [-]
The vast majority of all ml research is using copyrighted data both for training and evaluation.
knaik94 782 days ago [-]
I don't mean evaluating a model at the end of training, I mean datasets used to quantify the relative performance across models and algorithms.
Copyrighted data would make it harder to share and use, to reproduce results. I don't remember coming across a benchmark dataset that was copyrighted.
bjourne 782 days ago [-]
That's very odd because I can't for the life of me think of an ml benchmark dataset that is not copyrighted. Copyrighted datasets are everywhere in ml research.
knaik94 778 days ago [-]
Just looking at the eval datasets used in the HF leaderboard, ARC is CC BY-SA, TruthfulQA is Apache, and HellaSwag and MMLU are both MIT. The first dataset many people will explore ML with, MNIST, is GPL-3.0 licensed. The audio corpus datasets I've used have all been similarly permissively licensed, for example LibriTTS is CC BY 4.0. LJ Speech, one of the main datasets used in Tacotron, is public domain. Maybe I should have been more explicit and used the term permissively copyrighted instead to help avoid confusion.
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderb...
1. Professors and MIT students attempting to gain fleeting fame by riding the bandwagon without trying to do deep work.
2. Professors at MIT getting upset and lashing out because the claim that GPT-4 can now get an MIT degree devalues said degree.
What has become of academia these days?